
Bioinformatics Colloquium

created by bimadmin, last modified: Apr 10, 2014 05:28 PM

Regular Bioinformatics Colloquium: during the winter and summer terms, Wednesdays every two weeks, 18.00 s.t.

Since the advent of the genome era, bioinformatics has developed into an essential discipline for the analysis of biomolecular data. Initiated by a programme of the Deutsche Forschungsgemeinschaft, the Bioinformatics Curriculum in Munich now enters its 3rd year. While the bioinformatics curriculum of the Munich universities, together with the Max-Planck-Institute for Biochemistry and the GSF, focuses on education and research, the BMBF-funded BFAM programme concentrates on the interdisciplinary collaboration of bioinformaticians, computer scientists, mathematicians, and experimentalists. Partners in BFAM are the Munich universities, the University of Erlangen, the GSF, and three bioinformatics companies, namely Biomax Informatics AG, Genomatix GmbH, and Molecular Networks GmbH.

To establish a regular forum for bioinformatics in the Munich region, we would like to invite you to our Bioinformatics Colloquium, held every other week. In this forum, prominent speakers will present the most up-to-date results of their work and review the many facets of bioinformatics, ranging from theoretical approaches to applications dedicated to biology. Our aim is to present the entire spectrum of bioinformatics. The organizers are also open to nominations of speakers to be invited.

Announcements are sent before every talk via a mailing list.

The location in winter term 2012/13 is lecture hall 102, Richard-Wagner-Str. 10.

April 23, 2014, 18 s.t.: tba

Dr. Stefan Bonn, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE), Göttingen

In our group we are interested in understanding the interplay between genomics and epigenomics and their correlative and causal roles in health and disease. In the first part of my presentation I will present the epigenetic basis of learning and memory in healthy individuals and how this gene-regulatory network changes for different stimuli. The focus of the second part will be the reannotation of the human and murine exome using novel algorithms, the role of the exome in differential splicing, and the impact on patient diagnosis. I will finish with a brief description of our efforts to build new NGS analysis and biomarker discovery tools.

Archive

June 19, 2013, 18 s.t.: Scalable Learning of Networks for Disease Biology

Dr. Sach Mukherjee, The Netherlands Cancer Institute, Amsterdam

An emerging approach in systems biology and personalized medicine is that of relating molecular networks to disease outcomes and treatment. In a nutshell, the idea is that networks that describe biology relevant to disease phenotypes may differ between patients, or between patient subpopulations, such that their systematic characterization could help to explain corresponding variation in disease phenotypes or response to therapy. A major computational and experimental challenge is to develop algorithms and protocols by which to learn such networks. Using protein signalling as a paradigmatic example, we will discuss our ongoing efforts to develop approaches for network inference in this setting, including scalable tools for time-course data, joint estimation of multiple (related) networks, non-linear models and validation frameworks that can be applied in the complex mammalian settings relevant to disease biology. Along the way we will discuss some of the caveats and fundamental concerns in the general area of causal networks for biological applications.

May 15, 2013, 18 s.t.: Accounting for Hidden Confounding and Context in Genetics Studies of Molecular Traits

Dr. Oliver Stegle, EMBL-EBI, Hinxton, Cambridge, UK

Many phenotypic traits of interest are heritable and vary as a function of genetic and external factors. Genome-wide association studies have revealed an abundance of individual genetic loci that are linked to phenotypes, including human diseases. More recently, large-scale studies have begun to complement genotype and phenotype data with readouts from high-throughput molecular profiling techniques, allowing the study of intermediate genetic regulation at the level of transcription and translation. However, despite the success of initial studies, many challenges and open questions remain as to how these high-dimensional data can be analyzed most effectively. As demonstrated by recent work, batch effects, population structure and other confounding factors can both introduce spurious associations and mask true biological relationships. Moreover, genetic effects are often specific to a particular external context, such as the environmental state, which may not be known or measurable with sufficient precision. Despite their relevance, little is understood about how these confounding and external factors affect the analyses and how they can be accounted for. In this talk, I will discuss recent computational approaches that build on statistical modelling to estimate these factors from the observed data and thereby improve genetic studies.

April 24, 2013, 18 s.t.: Genome-Wide Analysis of Gene Expression: Three Studies

Dr. Julien Gagneur, Gene Center Munich, Department of Chemistry and Biochemistry, LMU Munich

Our group, recently started in Munich, is interested in computational approaches to understanding mechanisms of gene regulation and their phenotypic impact from genome-wide assays. I will present three studies illustrating our research activities. (1) The interpretation of data-driven experiments in genomics often involves a search for biological categories that are enriched for the responder genes identified by the experiments. With Model-based Gene Set Analysis (MGSA), we tackle the problem by framing the question differently: instead of searching for all significantly enriched groups, we search for a minimal set of groups that can explain the data. (2) A systematic analysis of sense-antisense expression in response to genetic and environmental changes in yeast showed that antisense expression allows basal levels of gene expression to be "switched off". (3) Systems genetics with environment: using genome-wide functional genomics assays in yeast, we show that the predictive power of eQTL studies for inferring mediating genes is poor unless the studies are performed across multiple environments.
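The "minimal explaining set" idea can be sketched informally. MGSA itself performs Bayesian inference over category activations; the greedy set-cover toy below (all category and gene names invented) only illustrates the shift from testing each category for enrichment to finding a small set of groups that jointly explain the responders.

```python
# Greedy sketch of the "minimal set of groups that explains the responders"
# idea behind MGSA. The real method is Bayesian; this is only an
# illustration with invented categories and genes.

def minimal_explaining_sets(categories, responders):
    """Greedily pick categories until all responder genes are covered."""
    uncovered = set(responders)
    chosen = []
    while uncovered:
        # pick the category explaining the most still-unexplained responders
        best = max(categories, key=lambda c: len(categories[c] & uncovered))
        gain = categories[best] & uncovered
        if not gain:
            break  # remaining responders belong to no annotated category
        chosen.append(best)
        uncovered -= gain
    return chosen

categories = {
    "ribosome biogenesis": {"g1", "g2", "g3"},
    "stress response":     {"g3", "g4"},
    "cell cycle":          {"g5"},
}
responders = {"g1", "g2", "g3", "g4"}
print(minimal_explaining_sets(categories, responders))
# ['ribosome biogenesis', 'stress response'] -- two groups explain everything
```

An enrichment test would flag both overlapping categories separately; the minimal-set view instead reports the smallest combination that accounts for the data.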

January 9, 2013, 18 s.t.: Affinity-Based, Quantitative Proteomic Analysis of Pancreatic Cancer

Dr. Jörg D. Hoheisel, Functional Genome Analysis, Deutsches Krebsforschungszentrum, Heidelberg

Background: Pancreatic adenocarcinoma is one of the most aggressive and malignant tumor entities; mortality is nearly identical to incidence. Most patients die within a year of diagnosis.
Methods: For early and accurate diagnosis and an elucidation of functional aspects, we performed proteome studies by means of a complex antibody microarray with about 1000 binders. We also compared the results to data produced from the very same samples at the levels of DNA sequence, promoter methylation, mRNA and microRNA expression, as well as genome-wide knock-down analyses.
Results: For early diagnosis, several hundred urine and serum samples were studied, yielding diagnostic protein patterns of high accuracy and specificity. Also, variations in protein abundance and structure in 650 tissue samples were analyzed, using antibodies that are specific for cancer-associated isoforms. For comparison, 24 pancreatic cancer cell lines and appropriate controls were investigated, examining the cells at steady state and after induction with various factors such as cytokines. In addition, we measured the resulting variations in the cells' secretome.
Conclusions: The study revealed many new characteristics of the disease including, for example, information about its degree of differentiation, the source of cells and the metastatic potential. The results form a basis for non-invasive and early diagnosis, allow a more accurate grading of the disease and a determination of its metastatic potential. Several therapy-relevant pathways were identified and protein isoforms were revealed that are specific for tumor tissues and have strong relevance to therapy.

December 5, 2012, 18 s.t.: Evolutionary Genomics up a Tree: Using Phylogenies to Infer Complex Selection in Sequence Evolution

Dr. Georgii Bazykin, Dept. of Bioengineering and Bioinformatics, Moscow State University, Moscow, Russia

Darwinian selection is the driving force of functional evolution. Even at the simplest level of nucleotide sequences, however, the pathways and constraints followed by selection remain very poorly understood. The ongoing avalanche of genome-scale data on sequence divergence and polymorphism is relevant here: combining such data from multiple species with known phylogenetic relationships can help decipher complex evolutionary scenarios. In particular, epistatic selection, i.e., the situation in which the fitnesses of different alleles at a site depend on the alleles present at another site, is expected to lead to a non-uniform distribution of allele replacements on the phylogeny. I will describe some of our work in this field. We address a number of questions on the genome-level role of selection and epistasis, using genomes and phylogenies of mammals, insects, viruses, etc. Some of the questions asked are: What fraction of amino acid substitutions are positively selected? Do epistatic interactions between substitutions play a major role in the evolution of coding and non-coding sequence? Do radical mutations that get fixed in a population lead to a series of subsequent small-effect mutations that alleviate their effect?

November 21, 2012, 18 s.t.: MaxQuant: Computational mass spectrometry-based proteomics for the masses

Dr. Jürgen Cox, Max-Planck-Institute of Biochemistry, Martinsried

October 24, 2012, 18 s.t.: Partial-Propensity Simulation Algorithms for Stochastic Chemical Kinetics and the Role of Fluctuations in Mesoscopic Chemical Reaction Systems

Dr. Ivo Sbalzarini, Max Planck Institute of Molecular Cell Biology and Genetics Center of Systems Biology

Chemical reactions are center-stage in systems biology. Due to low copy numbers and constrained environments inside biological cells or organelles, however, biological reaction kinetics in systems biology is frequently not appropriately described by continuous models, such as differential equations. The chemical master equation (CME) provides an exact mesoscopic description, including higher-order fluctuation kinetics. Numerically sampling from the CME traditionally made use of Gillespie-type algorithms. These algorithms, however, are computationally expensive, hampering large simulations and parameter identification. We present a new class of exact stochastic simulation algorithms (SSA) to sample from the CME in linear or constant time. The algorithms are based on the concept of partial propensities, which allow factorizing the problem into less complex ones. We further show applications of this class of algorithms to studying the effects of fluctuations in mesoscopic reaction networks, as frequently found in systems biology. We demonstrate a novel concentration inversion effect in monostable reaction networks and present applications to parameter inference in network models from noisy experimental data.
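As background for partial-propensity methods, it may help to see the Gillespie direct method they accelerate. The sketch below simulates a single reaction A + B → C with arbitrary toy rate and counts; the partial-propensity factorization itself is only noted in a comment, not implemented.

```python
import random

def gillespie_direct(x_a, x_b, k=0.01, t_max=10.0, seed=1):
    """Minimal Gillespie direct-method SSA for the reaction A + B -> C.

    Partial-propensity methods factor the propensity a = k * x_a * x_b
    into per-species parts to reduce the update cost per step; this toy
    version simply recomputes the full propensity every iteration.
    """
    rng = random.Random(seed)
    t, x_c = 0.0, 0
    while x_a > 0 and x_b > 0:
        a = k * x_a * x_b           # total propensity of the only reaction
        tau = rng.expovariate(a)    # exponentially distributed waiting time
        if t + tau > t_max:
            break
        t += tau
        x_a, x_b, x_c = x_a - 1, x_b - 1, x_c + 1
    return x_a, x_b, x_c

a, b, c = gillespie_direct(100, 100)
print(a, b, c)  # remaining A, remaining B, produced C; A + C and B + C stay 100
```

With many reaction channels, the direct method must rescan propensities at each step, which is what makes linear- or constant-time sampling schemes attractive.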

July 23 (Monday!), 2012, 18 s.t.: Studies of Bacteriophage Evolution: What can Genome Sequences tell us?

Dr. David M. Kristensen, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA

Despite the fact that bacteriophages are extremely active players in the global ecosystem, as well as the most abundant biological entities on the planet, much remains unknown about how these viruses function in their natural environments. Advances in full-genome sequencing technologies have generated a large collection of hundreds of genomes, which allows deep insight into their genetic evolution, and metagenomics technologies seem to promise more rewarding glimpses into their lifecycles and community structures. Previously we developed an automated approach to assemble a novel collection of orthologous gene families in phages, which gives us a powerful tool to study these fascinating organisms, and recently we have doubled its size and expanded its scope. Using this resource, we found that more than half of all gene families in phages are not shared with their cellular hosts, making them ideal candidates for diagnostic tests as sensitive and precise markers of specific viral families. Studies of the evolutionary processes of phages also have implications for their use as therapeutic agents, as well as expanding their role as tools for biotechnology and other applications. I will present a network analysis of genes shared between various groups of phages, as well as a procedure to identify such diagnostic marker genes. Together, these approaches reveal potential for bacteriophage taxonomy schemes that use genomic information rather than purely structural characteristics.

June 13, 2012, 18 s.t.: Systems Organization and Pathogen Perturbation of a Plant Interactome Network

Dr. Pascal Falter-Braun, Lehrstuhl für Systembiologie der Pflanzen, TU München

Elucidating mechanisms of life requires analysis of whole systems and understanding the complex interplay of the individual components. Proteins control and mediate the majority of biological activities, and interactions among proteins play a decisive role in the dynamic modulation of cellular behavior. Protein-protein interactions are essential constituents of all cells, and interactome analysis is an important component in the quest for a systems-level understanding of life. Using a high-quality binary interactome mapping pipeline we explore interactome networks for yeast, human and plant at ever increasing completeness and quality. Based on benchmarking and standardized reference sets, combined experimental approaches and mathematical modelling are used to ensure quality and assess the completeness of interactome maps. These models enable a critical assessment of current maps and guide development of a roadmap towards completion. We recently completed mapping of the first binary interactome network for the reference plant Arabidopsis thaliana. Using tools of graph theory, we identify biologically relevant network communities from which a picture of the overall interactome network organization starts to emerge. Combination of interaction and comparative genomics data yielded insights into network evolution, and biological inspection resulted in many hypotheses for unknown proteins and revealed unexpected connectivity between previously studied components of phytohormone signaling pathways. Lastly, it was investigated how viral, bacterial and fungal pathogens perturb their host's network. For plants it was found that pathogen effectors from evolutionarily distant pathogens converge on network hubs, which appear "guarded" by resistance proteins, and which are functionally important for the host's immune responses.
Together, it will be shown how high-quality protein interactome network maps provide us with tools for elucidating fundamental laws underlying biological systems that need to be analyzed using both experimental and computational approaches.

May 23, 2012, 18 s.t.: Protein-Protein and Protein-Ligand Interactions: Combining Large Datasets With Structural Knowledge

Dr. Olga Kalinina, Department for Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken

The growing wave of data on protein sequences and structures poses new questions and opens new prospects for computational biology. In this talk, we will consider several examples of how a structural bioinformatics study can profit from using large datasets. First, we present a novel approach to predicting drug-target interactions that makes use of all available protein-ligand complexes. In another study, we use a large body of known sequences of the HIV capsid protein to gain insight into the quaternary arrangement of the HIV capsid structure.

May 9, 2012, 18 s.t.: Systems Analysis of Cellular Networks Under Uncertainty

Prof. Dr. Jörg Stelling, Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland

Systems biology iteratively combines experimentation with mathematical modeling, and the complexity of cellular networks constitutes an important challenge for this approach. In addition, limited mechanistic knowledge, conflicting hypotheses, and relatively scarce experimental data hamper the development of mathematical models as systems analysis tools. From a structural network analysis point of view, defining and identifying suitable sub-structures, or motifs, in complex interaction networks significantly helps in understanding biological functions. However, current approaches only allow for function prediction at the motif level: by recognizing a known motif in a given network, one can assign its previously established function. We introduce the concept of reaction motifs and develop probabilistic, predictive models for metabolic network functions. Importantly, the framework leverages information 'hidden' in the correlation structure of motifs in metabolic networks to automate, for instance, gap filling and genome annotation. At a more detailed level, methods to systematically develop and discriminate between predictive dynamic models are still lacking. To address this problem, we developed a computational method that incorporates all hypothetical mechanisms about the architecture of a biological system into a single model, and automatically generates a set of simpler models compatible with observational data. For the short-term dynamic control of the transcription factor Msn2 in yeast, iterations between model predictions and rationally designed phosphoproteomics and imaging experiments identified a single, highly plausible circuit topology. Overall, novel mathematical and computational methods may allow for the systematic construction of systems biology models despite the prevailing uncertainty in network components and functions.

January 25, 2012, 18 s.t.: Pharmacogenomics: do we need a systems biology approach?

Prof. Matthias Schwab, Institute of Clinical Pharmacology, Stuttgart and University Tuebingen

Variation in drug disposition and response among patients is a major concern associated with many therapeutic agents used in all disciplines of medicine. The clinical relevance of this variability is most evident with drugs that have a narrow therapeutic window (i.e., the dose used is close to the dose likely to result in drug-related toxicity in most individuals). With the increasing information available from the Human Genome Project and the HapMap Project, pharmacogenomics aims to elucidate the genomic determinants of drug efficacy and toxicity. For instance, variants in genes that are relevant for ADME processes, such as drug metabolizing enzymes, drug transporters and nuclear receptors, have profound effects on patient outcome. Recent clinically important examples are the pharmacogenomics of tamoxifen, a well-established drug for the treatment of postmenopausal breast cancer, and of clopidogrel, an antiplatelet drug. However, it is unlikely that one single gene will exclusively affect disease or treatment outcome, and therefore a more comprehensive approach will be to consider genetic polymorphisms in entire biological/pharmacological pathways. Recently developed '-omics' approaches (e.g. genomics, transcriptomics, proteomics, metabonomics) will be helpful to identify further putative targets for better prediction of drug response and will complement each other. Array technologies (e.g. cDNA arrays, GWA), next-generation sequencing and metabonomics have been shown to be helpful for identifying novel genes, redefining disease diagnosis and predicting therapy response to specific drugs. Finally, non-genetic factors as well as epigenetics (e.g. methylation, miRNA) have to be considered more intensively in the future. Experimental as well as computational approaches are required to obtain holistic, mechanistic information on disease networks and drug response.
Thus, only systems pharmacology allows the integration of a systems-level understanding of drug response with genome medicine, also promoting drug discovery for personalized medicine.

December 14, 2011, 18 s.t.: From Pattern Discovery to Discovery Support

Prof. Dr. Michael Berthold, Department of Computer and Information Science, University of Konstanz

This talk will give an overview of over a decade of research on data analysis in the life sciences. I will present a number of explorative data analysis methods and how we have deployed them to users via the open source data analysis platform KNIME. Most of these methods focus on the discovery of interpretable, local patterns in large data sets.
In the second part of the talk I will present a new direction of research focusing on the exploration of large, heterogeneous information networks, where the focus shifts from classic (local) pattern discovery to creativity support systems. I will discuss a few early examples of methods that help to discover previously unknown and potentially helpful relations in large, diverse information repositories.

November 30, 2011, 18 s.t.: Haystack-omics: Analyzing Huge and Polyploid Cereal Genomes by Genome and Chromosomal Shotgun Sequencing

Dr. Klaus Mayer, MIPS/IBIS, Helmholtz Center Munich

Despite their outstanding economic and sociocultural importance, no cereal genome has been fully sequenced and analyzed thus far. Challenges are their size, which ranges between 5 and 17 gigabases, the richness in long and highly repetitive elements and, in part, the polyploid genomes that significantly complicate dissection of homeologous chromosomal groups. We developed strategies to overcome these limitations that use creative cytogenetics, NGS-based sequencing strategies, massive bioinformatic sequence analysis and, as an important glue, comparative genetics and genomics to gain insight into the gene-omes of barley, wheat and rye. The presentation will strive to illustrate the value of comparative grass and cereal genomics and introduce novel concepts and approaches that make it possible to approach these huge genomes, to dissect similarities and dissimilarities among subgenomes and to address a range of functional and evolutionary questions. In particular, the presentation will report on a synteny-driven cereal genome structuring approach as well as orthologous scaffolds to structure complex WGS data from the hexaploid wheat genome. In addition, new insights into the genesis of supernumerary B chromosomes obtained from chromosome sorting will be reported.

November 16, 2011, 18 s.t.: Using Transcriptome Kinetics to Decipher Cellular Decisions - What do we see, What don't we see?

Dr. Hauke Busch, Freiburg Institute for Advanced Studies (FRIAS, LifeNet), University of Freiburg, Center for Biological Systems Analysis (ZBSA), University of Freiburg

Cells initiate and control decisions like migration, proliferation or differentiation through an intricate, yet coordinated, regulation of large gene interaction networks. Here we show that network topology plays an important role in cellular regulation, imposing constraints on gene regulation. Using in silico stimulus-response simulations of E. coli and yeast gene networks, we find that highly connected network hub genes respond weakly, while strongly responding genes have, on average, a low degree of network connectivity. Being furthermore located at the network periphery, the latter act as effector genes, tightly linked to the cellular phenotype and under the control of the moderately responding hub genes.
As network topology is mostly conserved between species, a similar topology-dynamics relationship is expected in higher organisms. Hence, we applied our approach to migrating primary human keratinocytes under Hepatocyte Growth Factor stimulation as well as to dedifferentiating cells in the moss Physcomitrella patens.
Analysis of time-resolved microarray data from migrating keratinocytes revealed a strong correlation between differentially regulated genes and cell migration. When strongly responding genes were inhibited, a decrease in migratory activity proportional to the genes' response strength was found, in line with our initial hypothesis.
To identify hub genes mediating dedifferentiation in P. patens, we applied the idea of cell attractors in the search for genes that contribute to the coordinated, long-term change in gene expression while responding only moderately to the stimulus. The analysis highlighted nine novel transcription factors that possibly contribute to dedifferentiation, two of which have already been experimentally verified. Taken together, the application of network-theoretic ideas to the transcriptome dynamics of 'deciding' cells identifies a hierarchical organization of key genes crucially involved in the respective cellular response, irrespective of the biological model under investigation.

November 2, 2011, 18 s.t.: A Microfluidic Platform for Massively Parallel Measurements of Biomolecular Interaction Kinetics

Dr. Marcel Geertz, University of Geneva, Ecole Polytechnique Federale de Lausanne

Systems and synthetic biology heavily rely on quantitative data. Most efforts in biology have thus far focused on inventorying and mapping genomes and proteomes. Genome sequencing and gene expression analysis have provided insight into genomic architecture, and functional genomics approaches have mapped network topologies. However, network topologies alone are not sufficient to model complex biological processes. Precise quantitative dynamic information describing each node of a network is instead necessary.
Here we present an integrated microfluidic device based on MITOMI (mechanically induced trapping of molecular interactions), capable of characterizing 768 independent biomolecular association and dissociation rates in parallel. MITOMI is a versatile detection platform capable of measuring a variety of biomolecular interactions including protein-protein, protein-DNA, protein-RNA, and protein-small molecule. We applied our platform to the high-throughput characterization of transcription factor (TF) - DNA interactions.
To measure kinetic rate parameters, our kinetic MITOMI platform (MITOKI) uses rapid and repeated actuations of the MITOMI "buttons" to follow the association and dissociation of fluorescently labeled molecules to surface-bound proteins. Using this process we measured the association and dissociation kinetics of 220 DNA sequences (all 3-mer ZF variants) to ZIF268 in triplicate. We also measured the kinetics of 48 yeast TF DBDs with their cognate DNA binding sequences in parallel on a single device. All 48 TF DBDs were synthesized, immobilized, and characterized on-chip. Our platform captures on average more than 22,000 data points in a single experiment.

May 25, 2011, 18 s.t.: From Assembling Short DNA Reads to Protein Sequencing by Assembling Mass Spectra

Prof. Pavel Pevzner, University of California at San Diego

Increasing read length is viewed as the crucial condition for fragment assembly with next-generation sequencing technologies. However, introducing mate-paired reads (separated by a gap of length GapLength) opens the possibility of transforming short mate-pairs into long mate-reads of length approximately GapLength, and thus raises the question as to whether the read length (as opposed to GapLength) even matters. We introduce Paired de Bruijn graphs that address this issue and provide an attractive alternative to the traditional de Bruijn graph assembly used in existing NGS tools. We further describe recent advances in single-cell DNA sequencing and demonstrate that with appropriate assembly tools, the quality of bacterial single-cell sequencing may approach the quality of traditional multicell sequencing.
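For readers unfamiliar with the underlying construction, a standard (unpaired) de Bruijn graph can be sketched in a few lines. The reads and k below are toy values; the paired variant, which labels nodes with pairs of (k-1)-mers separated by the mate-pair gap, is not implemented here.

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Build a plain de Bruijn graph: nodes are (k-1)-mers, and every
    k-mer occurring in a read adds an edge prefix -> suffix. An assembly
    then corresponds to an Eulerian path through these edges. Paired de
    Bruijn graphs instead use *pairs* of (k-1)-mers separated by the
    mate-pair gap as node labels; this sketch shows only the unpaired case."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])  # edge: prefix -> suffix
    return dict(graph)

g = de_bruijn_graph(["ACGT", "CGTA"], 3)
print(g)  # {'AC': ['CG'], 'CG': ['GT', 'GT'], 'GT': ['TA']}
```

The repeated edge CG -> GT illustrates why short reads alone lose information: repeats collapse onto the same node, which the paired labels are designed to disambiguate.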
We further show how our approach to DNA sequencing can be generalized to Shotgun Protein Sequencing (SPS). We illustrate applications of SPS to sequencing of snake venoms and antibodies and show how mass-spectrometry enables de novo sequencing of peptide-like antibiotics.

May 18, 2011, 18 s.t.: The Role of Molecular Regulation in Transmitting Information in Small Gene Regulatory Networks

Dr. Aleksandra Walczak, Laboratoire de Physique Théorique, ENS Paris

Many of the biological networks inside cells can be thought of as transmitting information from their inputs (e.g., the concentrations of transcription factors or other signaling molecules) to their outputs (e.g., the expression levels of various genes). On the molecular level, the relatively small concentrations of the relevant molecules and the intrinsic randomness of chemical reactions provide sources of noise that set physical limits on this information transmission. Given these limits, not all networks perform equally well, and maximizing information transmission provides an optimization principle from which we might hope to derive the properties of real regulatory networks. Inspired by the precision of the transmission of positional information in the early development of the fly embryo, I will discuss the properties of specific small networks that can transmit the maximum information. Concretely, I will show how the form of molecular noise drives predictions not just of the qualitative network topology but also of the quantitative parameters for the input/output relations at the nodes of the network. I will show how the molecular details of regulation change the network's ability to transmit information.
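The notion of a noise-limited information channel can be made concrete with a toy calculation: a transcription factor that is off or on with equal probability, read out through a two-level expression channel with 10% readout noise. All numbers below are invented for illustration.

```python
from math import log2

def mutual_information(p_x, p_y_given_x):
    """I(X;Y) in bits for a discrete channel: input distribution p_x and
    conditional distribution p_y_given_x[x][y]. The toy channel below
    stands in for a noisy TF -> gene-expression readout."""
    # marginal output distribution p(y) = sum_x p(x) p(y|x)
    p_y = {}
    for x, px in p_x.items():
        for y, pyx in p_y_given_x[x].items():
            p_y[y] = p_y.get(y, 0.0) + px * pyx
    # I(X;Y) = sum_{x,y} p(x) p(y|x) log2( p(y|x) / p(y) )
    info = 0.0
    for x, px in p_x.items():
        for y, pyx in p_y_given_x[x].items():
            if pyx > 0:
                info += px * pyx * log2(pyx / p_y[y])
    return info

# TF absent/present with equal probability; expression misread 10% of the time
p_x = {"off": 0.5, "on": 0.5}
channel = {"off": {"low": 0.9, "high": 0.1},
           "on":  {"low": 0.1, "high": 0.9}}
print(round(mutual_information(p_x, channel), 3))  # 0.531
```

Even this modest noise level cuts the transmitted information well below the 1 bit a noiseless on/off switch would carry, which is the kind of physical limit the talk addresses.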

February 16, 2011, 18 s.t.: Metabolic Network Analysis Helps to Unveil the Biology of TB Infection

Dr. January Weiner, Max Planck Institute for Infection Biology, Berlin

The human metabolic response to infection with M. tuberculosis has been investigated at the level of small metabolites found in the serum. The aim of the study was not only to propose new biomarkers that distinguish between latent infection and clinical TB, but also to study the underlying biological mechanism of disease progression. The characteristic metabolomes of three study groups were compared: healthy controls (TST-), latent infection (TST+) and clinical TB (TBactive). The relative levels of almost 500 distinct small molecular compounds were acquired, including amino acids, short peptides, fatty acids and nucleotides. These were analyzed in the context of TB infection and clinical TB progression. Classification analysis of the data set shows that clinical TB patients can be dependably distinguished from the other groups, and that the levels of some compounds are characteristic of either healthy controls or subjects with latent infection. Several functional groups of compounds that differentiate the study groups could be reliably determined by clustering analysis. The functional links found in this study indicate a role for a number of biological processes in TB progression, and the novel results are confirmed by targeted experiments.

February 9, 2011, 18st: Computer-Aided Drug Repurposing

Prof. Dr. Tudor Oprea, Prof. Biochemistry and Molecular Biology (BMB), University of New Mexico School of Medicine, Albuquerque, NM

Finding new uses for old drugs is a strategy embraced by the pharmaceutical industry, with increasing participation from the academic sector. Current drug repurposing efforts focus on identifying novel modes of action, but not in a systematic manner. With intensive data mining, processing and curation, we aim to apply bio- and cheminformatics tools using an exhaustive DRUGS database, which contains 3,837 unique small molecules annotated on 1,759 proteins that are likely drug targets and antitargets (i.e., associated with adverse drug reactions, ADRs). Preclinical examples of drug repurposing include Raltegravir and Astemizole; we also discuss Cyclobenzaprine and the serotonin syndrome. Text mining algorithms and multivariate statistics were used to process the DailyMed collection (7,684 approved drug labels), matching 1,000 unique small molecule ADLs to 174 ADRs.

February 2, 2011, 18st: Schizophrenia

Prof. Dr. med Dan Rujescu, Psychiatrische Klinik der Ludwig-Maximilians-Universität (LMU) Molekulare und Klinische Neurobiologie

A major challenge in medicine is to understand the genetic, molecular and cellular mechanisms underlying common mental disorders including schizophrenia, which involve complicated genetic and environmental determinants. Schizophrenia is a common mental disorder, affecting 0.5-1% of the population. Schizophrenia mostly presents with several episodes and tends to become chronic. Approximately 30% of patients with schizophrenia require support throughout their lives. Roughly 50% will have lifelong disabilities and social problems. Its direct costs in western countries range between 1.6-2.6% of total health care expenditures. The last few years have witnessed an explosion of interest in the human genetics of complex diseases. The knowledge resulting from the availability of the complete sequence of the human genome, the systematic identification of single nucleotide polymorphisms (SNPs) throughout the genome, and the development of parallel genotyping technology (microarrays) established the conditions that brought about the current revolution in our ability to probe the genome for identifying disease genes. Genome-wide association (GWA) studies have opened a window into the biology of common complex diseases, have provided proof of principle and have yielded several genes showing strong association with complex diseases or traits including Crohn's disease, diabetes and many others. These studies revealed genes involved in pathogenesis and identified entirely unexpected disease pathways. This is of utmost importance given that this knowledge can translate into the development of better treatments or even cures. The talk will especially focus on newly found common and rare genetic variants, presenting the newest and most promising results from large genome-wide efforts including tens of thousands of patients and controls.

December 8 2010, 18st: Analysis of Methylation and Allelic Imbalance Using Next Generation Sequencing

Priv.-Doz. Dr. Jochen Hampe, Med. Klinik I, University Kiel

Next generation sequencing heralds the promise of being a one-stop shop for the technological needs in genomic exploration. In this presentation, some of the methodological challenges in transcriptome sequencing and methylation analysis using NG sequencing are explored. Firstly, using publicly available data from the 1000 genomes project, technology-specific error signatures are explored using an entropy-based statistical framework. Estimates of the false discovery rate for novel variants and for the accuracy of allele calls are derived. This accuracy of allele calls has strong implications for the analysis of allelic imbalance, which may be a promising approach for the interpretation of genomic alterations in cancer using the intermediate RNA phenotype. Lastly, a targeted pipeline for methylation analysis using droplet-based PCR-enrichment is presented.

November 24 2010, 18st: Reducing Belief Simpliciter to Degrees of Belief

Prof. Dr. Hannes Leitgeb, Ludwig-Maximilians-Universität, Munich Center for Mathematical Philosophy

There are two kinds of belief: belief simpliciter - believing that A is the case - and degrees of belief - assigning subjective probabilities to propositions. We prove that, given reasonable assumptions, it is possible to give an explicit definition of belief simpliciter in terms of subjective probability, such that it is neither the case that belief is stripped of any of its usual logical properties, nor is it the case that believed propositions are bound to have probability 1. Belief simpliciter is not to be eliminated in favour of degrees of belief; rather, by reducing it to assignments of consistently high degrees of belief, both quantitative and qualitative belief turn out to be governed by one unified theory. Turning to possible applications and extensions of the theory, we suggest that this will allow us to see: how the Bayesian approach in general philosophy of science can be reconciled with the deductive or semantic conception of scientific theories and theory change; how primitive conditional probability functions (Popper functions) arise from conditionalizing absolute probability measures on maximally strong believed propositions with respect to different cautiousness thresholds; how the assertability of conditionals can become an all-or-nothing affair in the face of non-trivial subjective conditional probabilities; and how high conditional chances may become the truthmakers of counterfactuals.

November 10 2010, 18st: Bioinformatics in Pharmaceutical Industry

Dr. Bertram Weiss, Principal Scientist in Global Drug Discovery Target Discovery, Bayer Schering Pharma AG

Using bioinformatics tools and databases has become a routine job for most biologists and the role of bioinformatics professionals in the pharmaceutical industry has changed accordingly in the last decade. This talk outlines the daily tasks and challenges of a bioinformatician at Bayer Schering Pharma.

One focus is of course the 'bioinformatics service': getting the right bioinformatics resources integrated in a meaningful way in order to make lab scientists more efficient in taking well-informed decisions within their drug discovery projects on a daily basis. Ideally they can do this without input from bioinformaticians. This requires, on the one hand, having all relevant public and in-house information (about drugs, diseases, genes, patents, project data, etc.) available with a few mouse clicks. On the other hand, bioinformatics needs to establish and organise adequate solutions for the collection, storage, analysis and presentation of in-house generated data, especially data around genes, e.g. gene expression or RNAi screening results.

The main task, however, is to apply bioinformatics to identify novel targets within the different indications. This requires a deep understanding of the underlying biology of the diseases. Here, we use the available plethora of in-house and public data to filter out the most promising target candidates based on different criteria. In collaboration with our wet lab scientists we then have to show that intervention at this particular point improves the in-vitro or in-vivo disease parameters.

July 14 2010, 18st: From Human Phenotypes to Drug Targets

Dr. Monika Campillos, EMBL Heidelberg

Drug side effects are human phenotypic information. We have developed a method based on the comparison of side-effect profiles of drugs to predict whether two drugs share a target. Applied to 746 marketed drugs, 261 pairs of dissimilar drugs implicated in different therapeutic indications are predicted to share a target, hinting at new uses of marketed drugs. We experimentally tested 20 of these unexpected drug-drug relations and confirmed 13, implying a physiological relevance of the novel drug-target relations and documenting the feasibility of utilizing phenotypic information for inferring molecular interactions.
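The core idea, comparing side-effect profiles to flag drug pairs that may share a target, can be sketched with a simple set-similarity measure (a hypothetical toy illustration; the actual method and its statistical thresholds are considerably more elaborate):

```python
def side_effect_similarity(se_a, se_b):
    """Jaccard index of two side-effect sets (0 = disjoint, 1 = identical)."""
    a, b = set(se_a), set(se_b)
    return len(a & b) / len(a | b)

# Invented example side-effect profiles, not real drug data.
drug_a = {"nausea", "dizziness", "dry mouth", "sedation"}
drug_b = {"nausea", "dizziness", "sedation", "tremor"}
drug_c = {"rash", "headache"}

print(side_effect_similarity(drug_a, drug_b))  # 0.6 -> candidate shared target
print(side_effect_similarity(drug_a, drug_c))  # 0.0 -> no evidence
```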

July 7, 2010, 18st: Analysis of Methylation and Allelic Imbalance Using Next Generation Sequencing

Priv.-Doz. Dr. Jochen Hampe, Med. Klinik I, Universität Kiel

Canceled


June 9, 2010, 18st: From Systems Biology to Systems Technobiology: Perspective and Challenges from a Modelized Point of View

Prof. Dr.-Ing. Andreas Kremling, Fachgebiet für Systembiotechnologie, Garching


May 12, 2010, 18st: Evolutionary Dynamics of Cancer

Prof. Dr. Niko Beerenwinkel, Computational Biology Group, ETH Zurich, Switzerland

Many human diseases are the result of evolutionary processes on time scales much shorter than the human lifetime. Prominent examples of pathogenic, measurably evolving populations are cancer cells in a tumor and infectious parasites, such as bacteria and viruses. Cancer progression is driven by mutation and selection in an asexually reproducing population of tumor cells. Treatment of these constantly changing ensembles of genetically heterogeneous cells is complicated by evolutionary escape from the selective pressure of drugs and immune responses. We present mathematical models for the evolutionary dynamics of escape and discuss applications to the genetic progression of cancer. A graphical model for the accumulation of genetic changes is shown to improve genetics-based survival predictions in patients with renal cell carcinoma. We also discuss statistical methods for inferring the genetic diversity of tumors from ultra-deep sequencing experiments and compare experimental results to model predictions.

April 28, 2010, 18st: Bio-Imaging Cerebellar Development and Degenerative Diseases in Zebrafish

Dr. Reinhard Köster, Institute of Developmental Genetics, Helmholtz Zentrum, München

In the past decades, zebrafish has been added to the group of vertebrate model organisms commonly used in genetic research. Based on its small size, external development, fast embryogenesis, transparent body and high fecundity, zebrafish has quickly become a popular model in the fields of developmental biology, cell biology and pharmacology. Since the discovery of fluorescent proteins, in vivo microscopy in zebrafish has become a widely used technique to observe cellular and molecular processes in the living organism. We use this combination of genetics and in vivo imaging to investigate the development and differentiation of the vertebrate brain and in particular the cerebellum. Here we aim to understand how migration and terminal differentiation of neurons are orchestrated on the cellular and molecular level. Furthermore, for investigating the initiation and progression of neurodegenerative diseases we develop zebrafish models that are amenable to in vivo diagnosis by bio-imaging. This will not only help to unravel the etiology of severe human diseases, but also offer routes for therapeutic intervention by means of small molecule screening.

March 31, 2010, 16st: Computational Studies Revealed Determinants of the Specificity in the Swine and Avian Flu

Prof. Dr. Nir Ben-Tal, Department of Biochemistry, Tel Aviv University, Israel

Viral strains may differ from each other markedly in their phenotypic properties, in spite of the high similarity of their genomes. For example, the various influenza strains exhibit a wide spectrum of pathogenicities and specificities to different hosts. With the outbreak of the highly pathogenic avian H5N1 influenza in 1996, and the H1N1 swine-origin influenza in 2009, considerable research has focused on studying these strains. We developed a computational approach for the identification of key amino acid positions that determine pathogenicity, species barriers, etc. of selected groups of proteins. This is a novel application to protein sequence analysis of a well-established classification approach. In my seminar I will present applications of the method to the H5N1 and H1N1 influenza strains.

Highly pathogenic in humans, although yet to become widespread in the population, the H5N1 strain constitutes a major threat owing to significant similarity between avian and human infecting viruses. The hemagglutinin (HA) protein of influenza A is the main antigen on the viral surface, mediating binding to the host receptors and virus entry into the cell. An alteration from avian-to-human like recognition via HA is thought to be one of the changes that must take place before avian influenza viruses can replicate efficiently in human cells, thus acquiring the capability to cause a human pandemic. Through a computational approach, using a supervised learning algorithm and the complete H5N1 NCBI sequence database, we successfully identified all the known specificity determinants for avian-to-human transmissibility described in the literature. Interestingly, we also detected residues that form the known H5N1 antigenic sites as host-distinguishing positions, offering a possible immune-related mechanism for virus specificity. Our analysis also identified novel specificity determinant candidates that may help decipher the basis for human vs. avian host selectivity. A structural analysis identified, amongst these novel positions, residues in which mutations can have a direct effect on the binding conformation of the HA receptor. These new findings may provide a better understanding of the species barrier of H5N1 and assist in designing antiviral agents.

Applying the method to the novel swine-origin influenza H1N1 virus, we surprisingly discovered that all identified residues were situated in and around the known H1N1 antigenic sites. Considering that many of the predicted substitutions significantly changed the physicochemical nature of amino acids in these positions, we suggest that the alterations would result in variation of the stereochemistry of the antibody binding sites, interfering with the ability of the immune system to recognize the HA protein. We suggest that the basis for immune avoidance of the novel virus is possibly associated with the signature sites presented in our study, which best discriminate the new isolates of the novel human pandemic from previous ones. The computational analysis presented here is generic and can also be applied to gain insight into the molecular determinants of host discrimination, pathogenicity and transmissibility in other viral proteins and strains. A user-friendly MATLAB implementation is available upon request.
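The position-wise analysis described above can be sketched as scoring each alignment column by how well the residue observed there separates the host classes, for instance via mutual information (a hypothetical toy example, not the authors' actual supervised-learning method; the sequences are invented):

```python
from collections import Counter
import math

def column_host_mi(residues, hosts):
    """Mutual information (bits) between the residue in one alignment column
    and the host label; high values mark candidate specificity determinants."""
    n = len(residues)
    joint = Counter(zip(residues, hosts))
    p_r = Counter(residues)
    p_h = Counter(hosts)
    mi = 0.0
    for (r, h), c in joint.items():
        p_rh = c / n
        mi += p_rh * math.log2(p_rh * n * n / (p_r[r] * p_h[h]))
    return mi

# Toy column: the residue perfectly separates avian from human strains -> 1 bit.
residues = ["K", "K", "K", "E", "E", "E"]
hosts    = ["avian"] * 3 + ["human"] * 3
print(column_host_mi(residues, hosts))  # 1.0
```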

January 20, 2010, 18st: The Era of Network Biology: Understanding Complex Cellular Pathways in Health and Disease

Prof. Erich E. Wanker, Neuroproteomics, Max Delbrueck Center for Molecular Medicine (MDC), Berlin-Buch, Germany

Recently, we have started to integrate data from protein interaction and gene expression studies to predict tissue-specific huntingtin (HTT) interactions that are altered in Huntington’s disease (HD) brains. Applying a method termed INFIDEX (interaction network filtering by differentially expressed genes), a disease-relevant, brain-specific PPI network was created, linking 14 potentially dysregulated proteins directly or indirectly to HTT. Analysis of literature data confirmed the predictive value of this unbiased network modeling strategy. One of the identified proteins that directly associate with HTT is the neuron-specific CRMP1 (collapsin response mediator protein 1), which we predicted to be abnormally down-regulated in HD brains. In a Drosophila model of HD, overproduction of CRMP1 efficiently suppressed polyQ-mediated HTT aggregation and photoreceptor degeneration. Moreover, motor impairment and survival phenotypes were improved by CRMP1 overexpression, indicating that this protein influences both polyQ-induced protein misfolding and neurotoxicity. The results were also confirmed in cell-free and cell-based assays.

Our studies indicate that perturbed, disease-relevant PPIs are predictable by network modeling strategies. We propose that this approach is applicable to a wider range of protein misfolding and other diseases.

December 9, 2009, 18st: In search of function of FTO, a gene strongly implicated in the obesity phenotype

Dr. Vladimir Saudek, Sr. Director Chemical Sciences, sanofi-aventis research & development, France

Drug development in the pharmaceutical industry is a chain of well-defined steps that should lead to bioactive molecules ready to enter clinical trials. The role of bioinformatics in this process will be briefly described. Bioinformatics has a role to play at every stage of the process but is essential at the initial phase of target selection and validation. The concept will be illustrated on a new emerging target. Genome-wide association studies have pointed to the Fat mass and obesity associated (FTO) gene as one of the most relevant non-monogenic genes implicated in the obesity phenotype. The FTO association with obesity has been replicated in numerous studies on various populations, suggesting that FTO might become an attractive drug target. However, before embarking on a costly drug development path, the target needs to be validated. While the association is statistically robust, the physiological role of FTO emerges only slowly and the biochemical mode of function remains elusive. Bioinformatics analysis provided crucial clues for establishing experimentally that the FTO gene codes for a DNA processing enzyme. Further progress can be expected from pooling all available and dispersed information. Bioinformatics can generate unexpected hypotheses, but their experimental testing will always remain essential.

November 25, 2009, 18st: Visualisation — Epac's Impact on Calcium

Dr. Gregor Reither, Cell Biology and Biophysics Unit – Schultz Group European Molecular Biology Laboratory, Heidelberg

Stimulation of G protein coupled receptors activates a complex and highly dynamic network of proteins such as kinases, phosphatases, and small GTPases. In addition, the levels of several intracellular messengers, such as diacylglycerol, calcium ions, or cyclic nucleotides are altered. Cells are able to decode incoming stimuli by generating specific patterns of activity among the contributing signalling elements. Our current understanding of signal transduction is based on a huge amount of data of very different quality. There are data generated by classical biochemical in vitro assays, proteomics, genomics or observations of single living cells. Despite the amount of data we can access, our understanding of the dynamics of signal processing is quite low. One way to address this problem is theoretical understanding by means of simulations. Applied to signalling processes, quantitative data such as molecule concentrations, distributions of activity states, and binding constants are essential. The number of contributing signalling elements makes measurement of the needed constants a challenge, especially under in vivo conditions.

Therefore we introduce a tool utilised in modern research on complexity, based on a dynamical sensitivity analysis, which allows us to combine experimentally generated data of different quality into a qualitative model of signal transduction processes. The mathematics we use is a new approach that offers some advantages over conventional modelling methods: it can describe multiple dependencies and it emphasises the dynamical aspects of the whole system. We could make use of most of the available data sets, and we are able to handle and reduce the underlying complexity without neglecting it. Our current model includes 45 different signalling elements pivoting around G protein coupled receptors. By describing each physical interaction of the signalling elements with a time-independent functional relation, it already shows specific characteristics of a complex system. In particular, we are able to localize more precisely the domains within which the system acts chaotically, showing volatile and dramatic changes of state. Even more importantly, with this approach we can determine and test the conditions for the stability and viability of the whole system. The intrinsic 'behaviour' of our current model, i.e. not trained or fitted to any data sets, already reflects the experimental findings. This dynamic view on a network of proteins and messengers allows us to generate hypotheses and to test them immediately. Guided by this theoretical approach we performed live cell experiments to probe suppositions about new interactions. Here we present data on a previously undescribed effect of novel protein kinases C on Epac, a cAMP dependent guanine exchange factor, and its impact on intracellular calcium handling. Additionally, our model revealed a non-linear and non-monotone relation between Epac activity levels and the peak height of the elicited calcium transients.

November 11, 2009, 18st: Modern imaging concepts for advancing biological discovery

Prof. Dr. Vasilis Ntziachristos, Director of the Institute for Biological and Medical Imaging at the Technische Universität München

Fluorescence imaging is a powerful modality that is increasingly used for gene-expression profiling, probing protein function and elucidating cellular pathways. Fluorescence generated in in-vitro assays can be easily quantified using fluorometers or charge coupled devices (CCD). Similarly, fluorescence of superficial structures has been imaged in vivo using intravital, confocal or multiphoton microscopy. Quantitation and imaging of fluorescence in deeper tissues however has been more elusive.

This talk describes current progress with instruments and methods for in-vivo tomography of whole animals using MultiSpectral Opto-acoustic Tomography (MSOT). We show the capacity to resolve fluorescent objects embedded deep in mouse-like phantoms achieving sub-millimeter resolution. We further demonstrate how quantification and high molecular specificity can be achieved and that penetration depths of several centimetres are feasible. Examples of imaging enzyme up-regulation, induced apoptosis and gene-expression in intact animals are given. Limitations of the method and future directions are also discussed.

October 28, 2009, 18st: Untangling biological complexity using Time-Scale Separation of cellular processes

Dr. Hauke Busch, FRIAS, Albert-Ludwigs-Universität, Freiburg

The development of mathematical models to predict cellular behavior is currently hampered by the enormous complexity of biological systems. Building such models thus goes hand in hand with strategies to reduce biological complexity to a computationally and experimentally manageable amount. Most strategies approach this task in terms of network topology. Cellular gene and/or protein networks are modularized and the individual subnetworks, such as signaling pathways, are then investigated in detail. Here, we propose a different approach to reduce biological complexity based on time scale separation. Dynamic interaction processes within a cell can be categorized according to their characteristic time to complete: from seconds to minutes for protein signaling, to hours, days and months for gene expression kinetics and tissue growth, respectively. Focusing on cellular subsystems evolving on a particular time scale, slower processes then remain quasi static, while fast processes follow instantaneously and can be adiabatically eliminated. The decision time for mammalian cells to differentiate, migrate or proliferate is usually on the time scale of hours. Adiabatically eliminating faster processes such as protein signaling, we show how gene expression kinetics can be employed to obtain a global, holistic view on cellular decisions on this time scale. Taking HGF-induced migration of primary human keratinocytes as an example, we infer a dynamic model from time-resolved microarray data that predicts in silico the time-ordered events necessary and sufficient to start, sustain or stop cell migration. Briefly highlighting further cell-fate examples, we propose that this approach provides a new way of obtaining insight into the dynamic orchestration of diverse signaling pathways and gene expression that control cellular decisions in general.
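The adiabatic elimination described above can be illustrated with a minimal fast-slow toy model (a hypothetical sketch, not from the talk; the functions and rates are invented): when a fast variable relaxes on a time scale eps much shorter than the slow dynamics, it simply tracks its quasi-steady state g(s) and can be dropped from the model:

```python
def g(s):
    """Hypothetical saturating input function: quasi-steady state of the fast variable."""
    return s / (1.0 + s)

def simulate(eps, t_end=5.0, dt=1e-4):
    """Euler integration of a toy fast-slow system:
    ds/dt = 1 - 0.2*s (slow), df/dt = (g(s) - f)/eps (fast)."""
    s, f = 0.0, 0.0
    for _ in range(int(t_end / dt)):
        ds = 1.0 - 0.2 * s        # slow dynamics, time scale ~ 1
        df = (g(s) - f) / eps     # fast dynamics, time scale ~ eps
        s += dt * ds
        f += dt * df
    return s, f

s, f = simulate(eps=0.01)
# For small eps the fast variable has collapsed onto its quasi-steady state:
print(abs(f - g(s)))  # small residual -> f can be adiabatically eliminated
```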

July 16, 2009, 18st: Toolbox model of evolution of prokaryotic metabolic networks and their regulation

Prof. Dr. Sergei Maslov, Brookhaven National Laboratory, Dept. of Condensed Matter Physics and Material Science, New York

It has been reported [1] that the number of transcription factors encoded in a prokaryotic genome scales approximately quadratically with the total number of its genes. As a result, in small genomes (<500 genes) the fraction of regulators among all proteins is less than 0.5%, while in the largest genomes (~10,000 genes) this fraction reaches as high as 10%.

We proposed [2] a simple model explaining this empirical scaling law. In our model the repertoire of proteins encoded in the genome of a prokaryotic organism is viewed as its collection of tools. Adapting to a new environmental condition (e.g. learning to use a new nutrient source) involves acquiring or evolving new genes/enzymes as well as reusing some of the tools that are already encoded in the genome. As the toolbox of an organism grows larger, it can reuse its existing enzymes more often, and thus needs to acquire fewer new enzymes to master each new functional task. From this analogy it follows that, in general, the number of functional tasks an organism can accomplish increases faster than linearly with its number of protein-coding genes. Under the assumption that the number of regulators is proportional to the number of functional tasks, our model explains the quadratic scaling [1] between the number of transcription factors and the number of all genes in prokaryotic genomes. Our model also includes transcriptional regulation of metabolic enzymes, which is assumed to be tightly coordinated with their associated pathways. The distribution of lengths of co-regulated metabolic pathways in our model is in agreement with the empirically observed broad distribution of regulon sizes in E. coli.

[1] E. van Nimwegen, Scaling laws in the functional content of genomes. Trends in Genetics 19:479 (2003)
[2] S. Maslov, S. Krishna, and K. Sneppen, Toolbox model of evolution of prokaryotic metabolic networks and their regulation, PNAS (under review)
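The quadratic scaling quoted above can be checked with back-of-the-envelope arithmetic: if the number of transcription factors grows as N_TF = c·N², the regulator fraction N_TF/N grows linearly in N. Calibrating the constant c on the small-genome end of the abstract reproduces the large-genome end (a sketch with the abstract's round numbers):

```python
# Quadratic scaling N_TF = c * N**2 implies the regulator *fraction* grows
# linearly with genome size: N_TF / N = c * N.  Calibrate c from the
# small-genome end quoted in the abstract (0.5% regulators at ~500 genes).
c = 0.005 / 500           # -> c = 1e-5

frac_small = c * 500      # 0.005, i.e. 0.5% at N = 500
frac_large = c * 10_000   # 0.1,   i.e. 10%  at N = 10,000

print(frac_small, frac_large)  # matches both ends of the reported range
```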

July 1, 2009, 18st: The use of Metabolomics in biomarker discovery and drug mechanism elucidation

Dr. Mike Milburn, Metabolon Inc. Research Triangle Park, North Carolina, USA

Metabolon's platform technology uses a combination of chromatography and mass spectrometry to separate and identify a wide range of biochemicals and metabolites, including amino acids, carbohydrates, lipids, nucleic acids and cofactors. This analysis results in a very large number of biochemicals identified in a given sample. The analytical platform incorporates two separate UHPLC/MS/MS2 methods and a GC/MS method, which increases the overall coverage of small molecules in a biological sample. The resulting MS and/or MS2 data are searched against an in-house generated authentic standard library which includes retention time, molecular mass to charge (m/z), preferred adducts and in-source fragment information as well as the associated MS fragmentation spectra for all biochemicals in the library. The library enables the rapid and accurate identification of the experimentally detected small molecules based on a multi-parameter match without the need for time-consuming further analysis. This global approach allows researchers to measure system-wide changes in biochemistry due to disease, drug or diet. Here we will discuss the applications of Metabolon's platform in the areas of biomarker discovery, drug mechanism of action and early indicators of toxicity.
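The multi-parameter library match described above can be sketched as follows (a hypothetical toy example; compound names, tolerances and values are invented, and the real platform additionally matches adducts and MS/MS fragmentation spectra):

```python
# Toy authentic-standard library: each entry carries several orthogonal
# parameters (retention time in minutes, m/z) rather than mass alone.
LIBRARY = [
    {"name": "alanine",   "rt": 1.92, "mz": 90.055},
    {"name": "glucose",   "rt": 3.40, "mz": 181.071},
    {"name": "palmitate", "rt": 8.75, "mz": 255.233},
]

def match_peak(rt, mz, rt_tol=0.1, mz_tol=0.005):
    """Identify an experimental peak only if every parameter matches a
    library entry within its tolerance (a multi-parameter match)."""
    return [e["name"] for e in LIBRARY
            if abs(e["rt"] - rt) <= rt_tol and abs(e["mz"] - mz) <= mz_tol]

print(match_peak(3.45, 181.070))  # ['glucose']
print(match_peak(3.45, 180.500))  # [] -- mass is off, so no identification
```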

June 17, 2009, 18st: Merging Chemical and Biological Space

Prof. Dr. Gerhard Klebe, Philipps University of Marburg, AG Wirkstoffdesign

Structure-based drug design tries to mutually map the pharmacological space populated by putative target proteins onto the chemical space comprising possible small molecule drug candidates. Both spaces are connected where proteins and ligands recognize each other: in the binding pockets. Therefore it is highly relevant to study the properties of the space composed by binding cavities. Analysis of this space helps to predict possible biochemical functions of novel proteins independently of sequence or fold homologies. It can be used to cluster proteins of a particular family with respect to selectivity-determining features. Furthermore, it can be exploited in structure-based design to propose novel leads as a starting point for a design project.

June 3, 2009, 18st: Computational interpretation of (meta-)genomes

Thomas Rattei, Department of Genome Oriented Bioinformatics, Technische Universität München

The sequencing of many eukaryotic, bacterial, archaeal and viral genomes during the last decade has revolutionized our understanding of biology, ecology and evolution. The emerging technique of metagenomics allows studying the genomes of living communities directly from their natural environments, avoiding the need for isolation and cultivation of individual species.

The exponentially growing number of genomic and metagenomic sequences demanded the implementation of fully automated software systems for genome annotation, such as the recently re-engineered “mips PEDANT” system. Due to their very high computational costs, the calculation of sequence similarities and the detection of protein domains limit the processing and update performance in genome annotation. Therefore we have developed the Similarity Matrix of Proteins (SIMAP) database, providing a comprehensive and up-to-date dataset of a pre-calculated sequence similarity matrix and sequence-based features like InterPro domains for all proteins contained in the major public sequence databases. This includes all 129 metagenomes currently deposited in NCBI GenBank. As of May 2009, SIMAP covers ~43 million protein entries and provides a complete annotation based on InterPro 19. In order to structure the protein sequence space, SIMAP provides an integrated clustering based on sequence similarities and domain architectures. Mapping Gene Ontology Annotations (GOA) of known proteins to these clusters provides reasonable protein function predictions for large parts of the protein sequence space. Besides PEDANT, SIMAP accelerates several other bioinformatics resources, such as PFAM, Gene3D, MEGAN and Blast2GO, and is publicly accessible.

In addition to large-scale computational approaches such as PEDANT and SIMAP, the computational interpretation of genomes has to keep up with the emerging knowledge about the molecular basis of specific biological processes. Our recent development of sequence-based prediction methods for Type III and Type IV secreted effector proteins has addressed two key mechanisms for infection, pathogenesis and modulation of infected hosts by pathogenic bacteria. Although a number of expensive genome-wide screens for novel effector proteins have been performed, no computational model had been published for the general de novo prediction of Type III and Type IV secreted proteins. Based on comprehensive and manually curated databases of known effectors, we discovered sequence signals that are typical for these proteins. Modeling these signals using a machine learning approach enabled genome-wide predictions of Type III and Type IV secretomes in many bacterial genomes and will contribute to the development of novel, specific antibiotics.

May 20, 2009, 18st: Understanding Protein Networks by Analyzing and Visualizing Interacting Protein Regions

Dr. Mario Albrecht, Max Planck Institute for Informatics, Saarbrücken

Proteins are at the center of life's processes. Sophisticated bioinformatics methods are required to integrate, analyze, and visualize the rapidly increasing amounts of molecular data for proteins and their interactions. In this talk, I will present our recent research advances, particularly on interacting protein regions for understanding protein networks and complexes.

April 29, 2009, 18st: Meta-servers at GeneSilico.PL: Predicting structures and their quality

Prof. Dr. Janusz M. Bujnicki, International Institute of Molecular and Cell Biology and Adam Mickiewicz University, Poland

I will review the bioinformatics methods developed at the IIMCB and UAM and available via the genesilico.pl server. In particular, I will focus on meta-servers for the prediction of protein structure and disorder, and for the evaluation of protein structure models. I will also briefly review our recent developments in the area of RNA 2D and 3D structure prediction and software for analyzing contact maps in proteins, nucleic acids and their complexes. I will introduce the GeneSilico TOOLKIT (available at https://genesilico.pl/toolkit/), a suite of tools for protein structural bioinformatics, available as an integrated web server. TOOLKIT provides a 'one-stop shop' for researchers interested in predicting protein tertiary structure from sequence and in accurately estimating to what extent the structural models can be trusted.

January 28, 2009, 18st: Quantitative genetics and systems biology of metabolomic traits of insulin resistance and ageing

Dr. Marc-Emmanuel Dumas, University of Lyon

The study of human multifactorial diseases like insulin resistance, or of complex biological processes such as ageing, represents a real healthcare challenge for the western and developing world. In this regard, high-throughput "-omics" biotechnologies like genomics, transcriptomics and metabolomics are invaluable tools for investigating insulin resistance-related (type 2 diabetes, obesity, non-alcoholic fatty liver diseases) and ageing-related pathologies. Integration of metabolic profiles with genome-wide genotyping and expression profiling data provides a platform to identify biomarkers and susceptibility genes for pathological components of the cardio-metabolic syndrome (glucose intolerance, insulin resistance, dyslipidemia, hypertension, obesity) through the combined use of physiological methods, functional genomic technologies and bioinformatic tools applied to the study of rodent models of the human disease.

January 14, 2009, 18st: Modelling Downstream Effects of Signalling Pathway Deregulation

Prof. Dr. Rainer Spang, Institute for Functional Genomics, University of Regensburg

Functional genomics has a long tradition of inferring the inner workings of a cell through analysis of its response to various perturbations. Observing cellular features after knocking out or silencing a gene reveals which genes are essential for an organism or for a particular pathway. A key obstacle to inferring genetic networks from perturbation screens is that phenotypic profiles generally offer only indirect information on how genes interact. I will discuss an algorithm to infer pathway features based on differential gene expression in silencing assays. In this approach, I distinguish two kinds of genes: the candidate pathway genes, which are silenced by RNAi, and the genes that show the effects of such interventions in expression profiles. I call the first S-genes (S for "silenced" or "signalling") and the second E-genes (E for "effects"). Because large parts of signalling pathways are non-transcriptional, there will be little or no overlap between S-genes and E-genes. Elucidating relationships between S-genes is the focus of our analysis; the E-genes are only needed as reporters for signal flow in the pathway. E-genes can be considered transcriptional phenotypes. S-genes have to be chosen depending on the specific question and pathway of interest. E-genes are identified by comparing measurements of the stimulated and non-stimulated pathway; genes with a large expression change are taken as E-genes. Our approach models how interventions interrupt the flow of information through the pathway: S-genes are silenced while the pathway is stimulated, to see which E-genes are still reached by the signal. I will show the applicability of our methodology on two real-world datasets, an RNAi study of immune response in Drosophila melanogaster and a study of BCR signalling in immature B-cells in mice.
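The nesting logic behind this kind of analysis can be sketched in a few lines. This is a deliberately tiny illustration of the idea, not the published algorithm; the S-gene and E-gene names and effect sets are hypothetical:

```python
# Intuition: silencing a gene upstream in a pathway interrupts the signal early,
# so it should affect a superset of the E-genes affected by silencing a gene
# downstream of it. Nesting of effect sets thus hints at pathway order.

# Hypothetical effect sets: E-genes differentially expressed after silencing
# each candidate S-gene while the pathway is stimulated.
effects = {
    "S1": {"E1", "E2", "E3", "E4"},   # upstream: many reporters affected
    "S2": {"E1", "E2"},               # downstream of S1
    "S3": {"E3"},                     # downstream of S1, parallel to S2
}

def upstream_of(a, b, effects):
    """a is upstream of b if b's effects nest strictly inside a's."""
    return effects[b] < effects[a]

# Reconstruct the pairwise ordering implied by the nesting structure.
order = [(a, b) for a in effects for b in effects
         if a != b and upstream_of(a, b, effects)]
```

With real, noisy screens the comparison is of course probabilistic rather than a strict subset test, but the superset intuition is the same.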

December 17, 2008, 18st: Molecular Biology of Mental Disorders and Systems Biology - basics and perspectives

Prof. Felix Tretter, Dept. of Psychology, LMU, Isar Amper Clinic

Several percent of a population are affected by mental disorders: 1 % have schizophrenia, 2 % are addicted to alcohol and about 10 % have depression. Family studies, twin studies and adoption studies have shown that there is a strong genetic component to the causes of mental disorders -- e.g. for schizophrenia, a 50 to 70 % concordance rate was observed in monozygotic twins, and a more than twofold higher prevalence than expected was observed in adopted children of schizophrenic parents. However, a certain percentage of schizophrenia is still determined by environmental factors. "Soft" diagnostic criteria, rating scales etc. result in different rates of prevalence. New biological methods -- imaging, electro-physiology etc. -- are complementing diagnostic psychiatric procedures. They have also provided new insight into the brain correlates of symptoms of mental disorders. Very recently, molecular biological data sets have been generated by genomics, proteomics etc. The high complexity of these high-throughput data cannot be grasped by inspection, and temporal patterns of gene activation or protein presence cannot be detected without computational support. Additionally, the question arises what the appropriate level of analysis is: networks of neurons, their couplings (synapses), the morphology of dendrites (spines), receptors, re-uptake mechanisms etc.? Therefore, not only computational methods but also systems thinking must be developed in order to obtain a comprehensive understanding of mental disorders. Some research strategies are presented.

December 10, 2008, 18st: Transcriptional feedback regulation in mammalian signal transduction

Dr. Nils Blüthgen, Humboldt University Berlin

Signalling pathways in mammalian cells relay information about the cellular state and the cell's surroundings to nuclear transcription factors that control the expression of genes. In this talk I will illustrate how we used a combination of bioinformatics and mathematical modelling to generate new hypotheses about the feedback design of a particular signalling pathway, the so-called classical MAP-kinase cascade. We predicted that this pathway is controlled by transcriptional feedbacks, and that the proteins involved in the feedback need to have a short half-life. We confirmed this prediction experimentally. We then investigated, by mining expression panels and literature data, whether this feedback design is generic. We found that all major signalling pathways are controlled in a similar manner: through the induction of short-lived negative feedback regulators. The talk will conclude with results from mathematical models showing the functional relevance of transcriptional negative feedback regulation in the swift and reliable expression of target genes.

November 19, 2008, 18st: Impact of genetic variability on regulatory pathways

Dr. Andreas Beyer, Biotechnologisches Zentrum der TU Dresden

Analysis of expression quantitative trait loci (eQTLs) is an emerging technique in which individuals are genotyped across a panel of genetic markers and, simultaneously, phenotyped using DNA microarrays. The eQTL technology is potentially very powerful, as it allows searching for transcriptional regulators of all genes probed on a microarray. However, a range of experimental and statistical problems compromises the interpretation of the results. Existing eQTL detection techniques search for only one correlated marker at a time; we propose models that explicitly take into account the combinatorial effects of several regulators. Moreover, because of the spacing of markers and linkage disequilibrium, each marker may be near many genes, making it difficult to fine-map which of these genes are the causal factors responsible for the observed changes in downstream expression. We present an efficient method for prioritizing candidate genes at a locus. This approach, called 'eQTL electrical diagrams' (eQED), integrates eQTLs with protein interaction networks by modeling the two data sets as a wiring diagram of current sources and resistors.
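The circuit analogy can be made concrete on a toy network. The node names, the four-node topology, and the unit-conductance-per-edge simplification below are all invented for illustration; the published method works on genome-scale interaction networks:

```python
import numpy as np

# Treat the interaction network as a resistor network: inject current at the
# eQTL locus, ground the affected target gene, and rank candidate genes by how
# much current flows through them on the way from locus to target.

nodes = ["locus", "A", "B", "target"]
edges = [("locus", "A"), ("locus", "B"), ("A", "target"), ("B", "A")]
idx = {n: i for i, n in enumerate(nodes)}

# Build the graph Laplacian with unit conductance per interaction edge.
n = len(nodes)
L = np.zeros((n, n))
for u, v in edges:
    i, j = idx[u], idx[v]
    L[i, j] -= 1.0
    L[j, i] -= 1.0
    L[i, i] += 1.0
    L[j, j] += 1.0

# Inject 1 unit of current at the locus; ground the target gene by fixing its
# potential to 0 (remove its row and column before solving).
b = np.zeros(n)
b[idx["locus"]] = 1.0
keep = [i for i in range(n) if i != idx["target"]]
pot = np.zeros(n)
pot[keep] = np.linalg.solve(L[np.ix_(keep, keep)], b[keep])

# Current through each edge equals the potential difference (unit conductance).
current = {(u, v): abs(pot[idx[u]] - pot[idx[v]]) for u, v in edges}
```

In this toy graph all injected current must exit through gene A, so A carries the full unit of current and would rank above B as the candidate causal gene at the locus.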

November 5, 2008, 18st: Genes, Environment and Disease

Prof. Dr. Thomas Meitinger, Institute for Human Genetics, Helmholtz Zentrum, Munich

Genetic variation can cause susceptibility to common diseases. Given constant environmental conditions, the influence of genetic factors can be examined. The ideal situation for measuring the impact of environmental factors on disease phenotypes is constant genetic factors. In both cases, it is necessary to assign quantitative parameters for genetic factors, environmental factors, and disease phenotypes.

The disease phenotype can be classified according to qualitative and quantitative attributes and their progression over time. Environmental factors are more difficult to measure experimentally. Genetic factors are most easily accessible through measures of genomic and transcriptomic variation. Technical progress has allowed genome-wide studies to be performed in large numbers, which makes it possible to address both genetic heterogeneity and the small effect sizes of individual genetic factors. In the presentation, I will give examples from metabolic and neurodegenerative diseases of study designs aimed at a better understanding of the interplay between genes, environment and disease etiology, and of strategies to improve our ability to predict risk from both genetic and environmental factors.

October 22, 2008, 18st: Computational prediction of proteotypic peptides for quantitative proteomics

Prof. Dr. Bernhard Küster, Chair of Bioanalytics, Technische Universität München, Freising

Mass spectrometry-based quantitative proteomics has become an important component of biological and clinical research. Although such analyses typically assume that a protein's peptide fragments are observed with equal likelihood, only a few so-called "proteotypic" peptides are repeatedly and consistently identified for any given protein present in a mixture. Using >600,000 peptide identifications generated by four proteomic platforms, we empirically identified >16,000 proteotypic peptides for 4,030 distinct yeast proteins. Characteristic physicochemical properties of these peptides were used to develop a computational tool that can predict proteotypic peptides for any protein from any organism, for a given platform, with >85% cumulative accuracy. Possible applications of proteotypic peptides include validation of protein identifications, absolute quantification of proteins, annotation of coding sequences in genomes, and characterization of the physical principles governing key elements of mass spectrometric workflows (e.g., digestion, chromatography, ionization and fragmentation).
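The property-based idea can be illustrated with a deliberately tiny sketch. The two-feature model and its weights below are invented for illustration only; the published predictor is trained on hundreds of physicochemical properties across >600,000 identifications. The Kyte-Doolittle hydropathy scale used here is a standard, real reference scale:

```python
# Kyte-Doolittle hydropathy values per amino acid.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

def features(pep):
    """Two toy physicochemical features: length and mean hydropathy."""
    return len(pep), sum(KD[a] for a in pep) / len(pep)

def toy_score(pep):
    # Hypothetical linear model: moderate length and higher hydrophobicity
    # favour detection. The weights are made up for illustration.
    length, hydro = features(pep)
    return -abs(length - 12) * 0.1 + hydro * 0.2

# A tryptic-looking hydrophobic peptide vs. a short, highly charged one.
peptides = ["LVEALYLVCGER", "DDDDKKKK"]
best = max(peptides, key=toy_score)
```

A real predictor replaces `toy_score` with a classifier learned from observed versus unobserved peptides on a given platform, but the principle of scoring peptides by physicochemical features is the same.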

July 9, 2008, 18st: Systems Biology and its Role in Predictive Health and Personalized Medicine

Prof. Dr. Eberhard O. Voit, Director, Integrative BioSystems Institute, Atlanta, Georgia

Recent advances in the natural and computational sciences have made it possible to study health and disease with a new and powerful arsenal of molecular and computational tools. The enormous potential of these tools has focused the spotlight on the possibilities of predicting health processes and personalizing treatment. Bioinformaticians have admirably managed the flood of biomedical data and made methods available for information mining and data driven analysis. However, data and their management alone will be insufficient for a comprehensive understanding of how cells or organisms function and which molecular or physiological changes might lead from health to disease. The reasons for the remaining gap in understanding are manifold but are ultimately associated with the enormous complexity of cells and organisms. A promising approach toward closing the gap in understanding is the construction of integrative mathematical and computational models. With advances in hardware and software as well as in systems analytical methods, these models are on the brink of becoming sufficiently accurate and comprehensive to yield deep insights into specific disease processes. While the current models have not yet reached this point, their potential and utility are clearly visible on the horizon. Under this premise, the presentation will suggest how models can aid our thinking about health and disease beyond the observation that one state is "normal" and the other is somehow abnormal and therefore diseased. Specifically, I will define health and disease simplexes, which represent the multi-factorial nature of health and disease and permit the characterization of interpersonal variability, genetic predisposition, life style choices, risk assessments, the investigation of reversible and irreversible trajectories from health to disease and back, and for assessments of alternative treatment strategies. 
I will formalize these concepts mathematically in the language of Biochemical Systems Theory, which has been used in the past for a variety of analyses of biomedical systems and also provided the first objective rationale for Cox's proportional hazard model and the linear-logistic disease risk model of epidemiology.

June 25, 2008, 18st: Calculation of Protein Structure and Dynamics from NMR Chemical Shifts

Prof. Dr. David Wishart, Departments of Computing Science and Biological Sciences, University of Alberta, Edmonton, AB, Canada

Chemical shifts are the "mileposts" of NMR spectroscopy. Not only are they important as spectral markers, but their dependency on multiple electronic and geometric factors means that chemical shifts can potentially provide a rich source of structural and dynamic information. However, these multiple dependencies also make the interpretation of chemical shifts exceedingly difficult -- particularly for large molecules such as proteins. In this presentation Prof. Wishart will describe a number of developments that have taken place in his laboratory over the past two years that allow chemical shifts to be more fully interpreted and utilized in the direct calculation of protein structure and dynamics. In particular the presentation will focus on the methods and algorithms associated with two programs:

  1. RCI and
  2. CS23D.

The RCI program uses chemical shifts to accurately calculate backbone flexibility and dynamics while the CS23D program uses chemical shifts to accurately calculate protein structures. Some of the strengths and limitations of these programs as well as their potential impact on the future of NMR and structural biology will be discussed.

June 11, 2008, 18st: Sequence Harmony and Multi-RELIEF: two methods for function specificity site prediction with an application to protein-protein interaction

Prof. Dr. Jaap Heringa, Centre for Integrative Bioinformatics, University Amsterdam, Netherlands

In this seminar I will discuss two approaches to the prediction of specificity conferring sites from multiple sequence alignments. The first, Sequence Harmony, is an entropy-based method that focuses on differences between two protein sequence groups within a multiple alignment. The second method, Multi-RELIEF, incorporates a new implementation of a feature selection technique that exploits a notion of evolutionary conservation. The performance of both methods relative to their most widely used contenders will be discussed. An application to protein-protein interaction (PPI) binding site prediction will be presented using a set of multiple sequence alignments, each time comprising a group of binding proteins and a paralogous group of non-binding protein sequences.
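The group-contrast idea behind such entropy-based methods can be sketched with a simple per-column divergence score. Note this uses a generic Jensen-Shannon divergence, not the exact Sequence Harmony formula, and the alignment columns are hypothetical:

```python
import math
from collections import Counter

def column_divergence(group_a, group_b):
    """Jensen-Shannon divergence between the residue distributions of two
    sequence groups at one alignment column (0 = identical residue usage,
    1 = completely disjoint usage)."""
    pa, pb = Counter(group_a), Counter(group_b)
    na, nb = sum(pa.values()), sum(pb.values())
    js = 0.0
    for r in set(pa) | set(pb):
        a, b = pa[r] / na, pb[r] / nb
        m = (a + b) / 2
        if a:
            js += 0.5 * a * math.log2(a / m)
        if b:
            js += 0.5 * b * math.log2(b / m)
    return js

# A column where the two groups use disjoint residues (a candidate
# specificity-conferring site) vs. a column conserved across both groups.
spec = column_divergence("KKKR", "DDEE")
cons = column_divergence("GGGG", "GGGG")
```

Columns scoring near 1 separate the two functional sub-groups and are candidate specificity sites; columns scoring near 0 are either conserved overall or vary without group structure.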

May 28, 2008, 18st: From molecular machines to gene-regulatory networks in mammalian cells

Prof. Thomas Höfer, German Cancer Research Center, Heidelberg

The complexity of regulatory networks in mammalian cells -- exemplified by the vast number of components and their dynamic interactions on a wide range of time and space scales -- requires mathematical models at different levels of organization. I will discuss experimentally-based models directed at understanding: (i) gene-regulatory networks in T-cell differentiation, and (ii) the multi-protein machinery that recognizes and repairs UV-damaged DNA. Iterative theoretical and experimental analyses of network dynamics have led us to uncover novel regulatory interactions and to identify fundamental properties for the action of chromatin-associated molecular machines.

May 7, 2008, 18st: Mapping quantitative traits in mice and plants

Dr. Richard Mott, Wellcome Trust Centre for Human Genetics, Oxford University, GB

I discuss methods for mapping quantitative trait loci (QTL), from both an experimental and a statistical standpoint, in the mouse (the primary animal model) and in Arabidopsis thaliana (the main plant model).

From a statistical viewpoint both species share characteristics which make their analysis similar. The key feature is the availability of stable inbred lines which can be crossed to make experimental populations suitable for mapping. I will also compare the properties of these species with humans, in the context of whole genome association studies.

April 16, 2008, 18st: Stem cell biology at three levels

Dr. Miguel Andrade, Max Delbrück Centre for Molecular Medicine, Berlin

We have produced a database of stem cell microarray data (StemBase) describing gene expression in more than 100 samples of stem cells and their derivatives in mouse and human. I will illustrate how these data can be used to study stem cell biology at three levels: cells, genetic networks, and particular genes. As part of this project we have analyzed the data and developed dedicated methods. For example, we developed a method for the detection of marker genes in large heterogeneous collections of gene expression data, and we studied gene expression during the early differentiation of mouse embryonic stem cells (mESC) to discover genes important for this process.

March 5, 2008, 16st: Decoding regulatory gene networks – towards a systems biology of development

Dr. Ulrike Gaul, The Rockefeller University, New York

Biology in general and Developmental Biology in particular are currently experiencing a paradigm shift - from studying individual genes towards analyzing the behaviour of entire gene networks. We are interested in understanding the regulation of gene expression and pattern formation, which lie at the heart of animal development, at a systems level. Over the past several years, we have developed a range of experimental and computational methodologies aimed at deciphering the 'regulatory code': Where do the cis-regulatory elements lie in the genomic sequence and how do these regulatory elements 'compute' expression? Much of our work focuses on transcription regulation, but our more recent studies on microRNA-mediated translational regulation have revealed interesting mechanistic parallels between the two processes. In my talk, I will discuss these approaches and present a novel thermodynamic model for pattern formation in the early embryo.

February 13, 2008, 18st: Biomedical information extraction, EBI's text mining infrastructure and expected changes to the publishing process

Dr. Dietrich Rebholz-Schuhmann, European Bioinformatics Institute, Hinxton

Nowadays, biomedical scientific literature is made available in electronic form right after the acceptance of the manuscript. Text-processing techniques extract the facts it contains and deliver them to scientists. Electronic biomedical data resources (e.g., scientific databases, ontologies) form an important link between the scientific literature and the bioinformatics infrastructure needed to integrate the results into the workflows of scientists.

This talk will give insights into the infrastructure of text mining solutions available from the EBI for information extraction. All presented services automatically process the documents and interlink the literature with bioinformatics data resources. In addition, they can be integrated into external IT solutions to directly couple experimental results with annotations from the scientific literature. The outlook is concerned with the integration of ontological resources and the changes to the publishing processes. Note: The Rebholz group is offering positions for an internship at the EBI from Spring onwards.

January 23, 2008, 18st: Dynamic Modelling of Cellular Stress Response

Prof. Dr. Edda Klipp, HU Berlin and MPI Molecular Genetics, Berlin

Life is change. In order to study and understand life, it is necessary, but not sufficient to study genes, proteins or metabolites, and networks thereof in static conditions. Instead, we must handle the dynamic action. Stress and external perturbations are means to study the wiring of biochemical networks or signal transduction and to understand the underlying regulatory principles.

Over the last years, we have studied various signal transduction and regulatory pathways in different organisms and investigated the response of cells to external perturbations at various levels. To this end, we have established mathematical models, mainly in the form of systems of ordinary differential equations. Their structure and parameters are based on publicly available information and on new dynamic data measured by our experimental collaborators.

Here, I will focus on results with respect to a model organism, the yeast Saccharomyces cerevisiae. I will discuss new aspects in cell cycle regulation and the interaction of stress-activated signalling pathways with cell cycle progression. The results indicate that yeast cells have developed different mechanisms for coping with external stress during different periods of their life time.

January 9, 2008, 18st: Inferring genomic footprints of adaptation from SNP data

Prof. Dr. Wolfgang Stephan, Ludwig-Maximilians-Universität, München

Drosophila melanogaster consists of ancestral population(s) in Africa and derived populations outside Africa. The colonization of geographical regions outside Africa occurred relatively recently (for instance, in Europe about 10,000 years ago). Using the hitchhiking effect, we are studying the adaptive process associated with this “out-of-Africa” migration. That is, we seek to identify regions along the genome in which nucleotide variation is reduced through the recent occurrence of beneficial substitutions (“selective sweeps”). We have performed scans of single nucleotide polymorphisms (SNPs) in D. melanogaster populations from Africa and Europe, using the X and the third chromosome. This allowed us to infer the recent demographic history of this species (including the ratio of males and females). Furthermore, we performed a genome-wide scan of gene expression variation. Based on this information, we identified genomic regions that have been the target of positive selection in the adaptation of D. melanogaster and estimated the frequency of beneficial substitutions using maximum likelihood approaches.
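The core of such a sweep scan can be sketched very simply: compute nucleotide diversity (pi) in windows along the chromosome and flag windows with unusually low variation. The allele counts, window size, and threshold below are hypothetical toy values, and real scans additionally correct for demography:

```python
# Hypothetical data: per-SNP minor allele counts among n sampled chromosomes,
# ordered along the chromosome. The middle stretch mimics a swept region.
n = 10
minor_counts = [5, 4, 5, 3, 0, 0, 1, 0, 0, 0, 4, 5, 3, 4, 5, 4]

def pi_per_site(k, n):
    """Expected heterozygosity at a site with minor allele count k out of n
    sampled chromosomes (the per-site contribution to nucleotide diversity)."""
    return 2 * k * (n - k) / (n * (n - 1))

# Average pi in non-overlapping windows of 4 SNPs.
window = 4
pis = [sum(pi_per_site(k, n) for k in minor_counts[i:i + window]) / window
       for i in range(0, len(minor_counts), window)]

# Flag windows with pi below an (arbitrary, illustrative) threshold as
# candidate selective sweeps.
sweep_windows = [i for i, p in enumerate(pis) if p < 0.1]
```

Here the second window, where variation has been stripped away, stands out against its neighbours, which is exactly the signature a hitchhiking-based scan looks for.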

December 19, 2007, 18st: Molecular interactions of the GPCR Rhodopsin - From the proteome inventory towards analysis of functional protein networks

Dr. Marius Ueffing, Institute of Human Genetics, GSF, Neuherberg

In spite of the abundance of protein interaction data, information on physiologically meaningful interactomes is limited. How are protein complexes constituted? How are mammalian proteomes wired? Proteomic as well as genomic experimental techniques, each with their own inherent experimental errors, have evolved to address these questions, yet additional functional as well as structural criteria must be considered to delineate a trustworthy protein interaction network from experimental data.

A main objective of our efforts has been to develop adequate, yet generally applicable methods for analysis of membrane-associated proteins and subsequently apply them towards a systematic proteome analysis of specific membrane-rich functional sub-proteomes.

We have applied and newly developed methods for the fractionation of cells, membrane-rich organelles, organellar sub-fractions and large protein complexes. We aimed at generating protein inventories, as well as gaining information on the composition and dynamics of multiprotein complexes and whole, intact organelles. Based on this work, we have attained a systematic description of the protein repertoire of light-receptive membranes from mammalian retinal photoreceptors. The outcome of this analysis can be used for an improved understanding of the timing and tuning of photoreceptor function and cell anatomy, as well as contributing to the understanding of molecular perturbations that lead to visual defects or blinding diseases.

December 5, 2007, 18st: Translational Control: Non-Coding RNAs and Alternative Splicing

Prof. Dr. Rolf Backofen, Lehrstuhl für Bioinformatik, University Freiburg

In the past years, the common view of RNA has evolved from that of a more or less boring intermediate in protein translation to that of a very important player in cell regulation. This changed role was primarily the result of the detection of thousands of non-coding RNAs with many different regulatory functions in post-transcriptional regulation.

The detection of new functional RNAs requires new comparative methods for motif detection since the RNA sequence is much less conserved than the RNA structure. Hence, purely sequence-based methods for finding RNA-motifs such as multiple sequence alignment will fail, and we will discuss various approaches (e.g. MARNA, LocaRNA) to sequence-structure alignment developed in our group.

Another important aspect of translation control is alternative splicing. In the second part of the talk, we will concentrate on modulation of alternative splicing and discuss the importance of RNA structure in this regulation.

November 21, 2007, 18st: Systematic refinement of a global metabolic model of Acinetobacter baylyi using gene essentialities

Vincent Schachter, Genoscope (CEA) CNRS-Genoscope-Université d'Evry, Evry, France

Gene essentiality screens can be used to significantly upgrade our knowledge of the metabolism of a given species. Genome-scale metabolic models can predict reaction essentiality by analyzing the capabilities of the underlying reaction network in a simulated environment; the relationship between these two types of essentiality is encoded in a gene-reaction correspondence. Previous work has shown that the identification of inconsistencies between experimental and predicted phenotypes can guide the expert search for model corrections. We introduce here a method, AutoGPR, that automatically corrects the gene-reaction correspondence in global metabolic models, by reasoning on a suitable model representation together with the set of experimental facts. This refinement strategy was applied to an initial metabolic model reconstruction of Acinetobacter baylyi ADP1, a recently sequenced soil bacterium for which a genome-wide single-gene knock-out library was phenotyped on several growth media. Two rounds of refinement yield a significant set of model corrections and new annotations, showing that large-scale genetics data can be used to rapidly and systematically obtain an accurate metabolic model of a recently sequenced bacterium.
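The comparison that drives this kind of refinement loop can be sketched as follows. The gene names and boolean dictionaries are hypothetical; the actual method reasons over gene-reaction rules and flux simulations rather than precomputed booleans:

```python
# Model-predicted vs. experimentally observed essentiality for three
# hypothetical genes (True = knock-out is lethal on a given medium).
predicted_essential = {"geneA": True, "geneB": False, "geneC": True}
observed_essential  = {"geneA": True, "geneB": True,  "geneC": False}

# Inconsistencies of either sign point at errors in the gene-reaction
# correspondence (or missing reactions) and are the targets of correction.
wrongly_essential = [g for g, p in predicted_essential.items()
                     if p and not observed_essential[g]]     # model too strict
wrongly_viable = [g for g, p in predicted_essential.items()
                  if not p and observed_essential[g]]        # model too lax
```

Each flagged gene becomes a candidate for a model correction, e.g. adding an isoenzyme to explain an unexpectedly viable knock-out, after which predictions are recomputed and the loop repeats.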

November 7, 2007, 18st: Systematic discovery of phosphorylation networks: combining linear motifs and protein interactions

Dr. Lars Juhl Jensen, EMBL, Heidelberg

Protein kinases control cellular decision processes by phosphorylating specific substrates. Thousands of in vivo phosphorylation sites have been identified, mostly by proteome-wide mapping. However, systematically matching these sites to specific kinases is presently infeasible, due to limited specificity of consensus motifs, and the influence of contextual factors, such as protein scaffolds, localization, and expression, on cellular substrate specificity. We have thus developed an approach (NetworKIN) that augments motif-based predictions with a network context of kinases and phosphoproteins. The latter is constructed based on known and predicted functional interactions from the STRING database, which integrates evidence from high-throughput experiments, automatic literature mining and curated pathway databases. The context network provides 60%–80% of the computational capability to assign in vivo substrate specificity, and thereby yields a 2.5-fold improvement in the accuracy with which phosphorylation networks can be constructed.
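The score combination at the heart of this approach can be sketched in miniature. The kinase names, score values, and the simple product combination are all illustrative; the published method unifies motif and context evidence probabilistically:

```python
# For one hypothetical phosphosite: how well does the site match each
# kinase's consensus motif, and how close is each kinase to the substrate
# in a functional-association network (e.g. STRING)?
motif_score = {"CDK1": 0.70, "CK2": 0.65}    # hypothetical motif-match scores
context_score = {"CDK1": 0.90, "CK2": 0.10}  # hypothetical network proximity

def combined_score(kinase):
    # Toy combination: motif evidence weighted by network context.
    return motif_score[kinase] * context_score[kinase]

ranked = sorted(motif_score, key=combined_score, reverse=True)
```

The point of the toy numbers: the two kinases have nearly indistinguishable motif scores, and it is the network context that resolves which kinase plausibly phosphorylates the site in vivo.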

October 24, 2007, 18st: New insights into the evolutionary behaviour of giant virus genomes

Dr. Hiroyuki Ogata, Information Génomique et Structurale, CNRS-UPR2589, Marseille

The study of virus evolution has been neglected to the point where "virus evolution" most often refers to population genetics studies, such as the worldwide inspection of new polymorphisms appearing daily in avian flu viruses, rather than to the fundamental question of where viruses come from. Phylogenetic studies on viruses have long been considered unfeasible for two main reasons: 1) their reputed propensity to randomly acquire genetic material from their hosts and 2) their reputedly very high sequence divergence rate. The generality of this vision (probably inspired by the study of RNA viruses) now deserves to be revisited for DNA viruses in light of the increasing amount of available genomic sequence data. I will first talk about the very old origins of DNA viruses, along with evidence from mimivirus genomics.

Then, I will present a couple of our recent bioinformatics studies. I will show that giant virus genomes are under strong functional constraints that are comparable to those on our own genome. I will argue against frequent genetic transfers, if any, to large DNA viruses from their hosts. I will show that the large genome sizes of these viruses are not simply explained by an increased propensity to acquire foreign genes. Finally, I will talk about a potential dichotomy of the viral gene pool, one part shared with cellular and distantly related viral genomes and the other having been confined for a long period of time in small groups of viruses.

July 18, 2007, 18st: SimulFold: a novel method for predicting conserved RNA structures (as well as a multiple-sequence alignment and an evolutionary tree)

Prof. Dr. Irmtraud Meyer, UBC Bioinformatics Centre & Department of Computer Science, Vancouver, Canada

Computational methods for predicting functional rather than thermodynamic RNA structures have recently attracted increased interest. These methods are not only indispensable for elucidating the regulatory roles of known RNA transcripts, but also for detecting RNA genes. Devising them has been notoriously difficult because a number of computational and conceptual challenges have to be overcome.

In this talk, I will introduce a novel computational framework, called SimulFold, that allows us to detect conserved RNA structures including pseudo-knots while simultaneously predicting a multiple-sequence alignment and a phylogenetic tree.

References:
I.M. Meyer, I. Miklós: SimulFold: Simultaneously Inferring an RNA Structure Including Pseudo-Knots, a Multiple Sequence Alignment and an Evolutionary Tree Using a Bayesian Markov Chain Monte Carlo Framework, PLoS Computational Biology (2007), in press
I.M. Meyer: A practical guide to the art of RNA gene prediction, Briefings in Bioinformatics (2007), in press

June 27, 2007, 18st: Human evolution

Prof. Dr. Peter J. Oefner, Institute of Functional Genomics, University of Regensburg

Binary polymorphisms on the Y-chromosome and mtDNA preserve the paternal and maternal genetic histories of our species. The comparative phylogenetic analysis of 73 kb of unique Y-chromosome and 15.6 kb of mtDNA sequences in 103 globally representative males yielded fairly similar times of coalescence of the mtDNA and Y-chromosome phylogenies of 157,000±12,000 years and 115,000±22,000 years, respectively, while those of the youngest mtDNA and Y-chromosome clades containing both African and non-African sequences were 63,000±6,000 years and 56,000±11,000 years, respectively. Analysis of standardized variance revealed FST values for the Y-chromosome that were significantly higher than those for mtDNA, most likely reflecting a higher female migration rate and effective population size. Interestingly, significant differences in the ratio of gains and losses of threonine and valine residues in mtDNA-encoded proteins were observed between hunter-gatherers and agriculturists. Evidence for the occurrence of directional selection over evolutionary time was also found in several autosomal genes. BRCA1, for instance, shows significant adaptive selection in the RAD51 interaction domain in the human lineage only. The concurrent lack of low-frequency alleles, however, suggests the additional occurrence of balancing selection in this chromosomal region.

Progress in our understanding of human evolution thrives on technological advances in genome analysis. The best example is denaturing HPLC, without which the rapid discovery of hundreds of SNPs on the Y-chromosome would not have been feasible in the nineties. Today we explore the use of DNA arrays for the resequencing of mtDNA and that of mass spectrometry for the simultaneous phasing of short tandem repeats (STRs) and tightly linked SNPs. The latter has shed new light on the still controversial origin of the Samaritans. Estimation of genetic distances between the Samaritans and seven Jewish and three non-Jewish populations from Israel, as well as populations from Africa, Pakistan, Turkey, and Europe, revealed that the Y-chromosomes of Samaritans were closely related to those of Cohanim, supporting the Samaritans' view that they are descendants of the Ten Lost Tribes of Israel who remained in the area of the kingdom of Israel following its conquest by the Assyrians in 722–720 BCE.

June 13, 2007, 18st: Predicting novel structural features of membrane proteins

Prof. Dr. Arne Elofsson, Department of Biochemistry and Biophysics, Stockholm University

For a long time the general view was that membrane proteins in principle existed in a two-dimensional space, with the TM helices penetrating the membrane perpendicularly. But as the increasing number of solved 3D structures shows, TM proteins are often too complex to fit into the constraints of topology, where all transmembrane segments are helices of between 15-35 residues and all loop regions in between are situated on opposite sides of the membrane. In a set of recent papers we have analyzed membrane protein structures and shown that membrane proteins certainly cannot be seen as constrained to two dimensions (Granseth, 2005). Instead, it is clear that membrane proteins contain a similar amount of structural complexity as globular proteins do. For instance, a common feature is re-entrant regions (Viklund, 2006). This poses new challenges for the development of membrane predictors. Using these novel findings, we have developed novel methods to predict 2.5D structural features of membrane proteins (Granseth, 2006).

May 23, 2007, 18st: Recent Tools for Protein and RNA

Prof. Dr. Peter Clote, Computer Science Department, Boston College, USA

Structural bioinformatics is an area of computational biology that deals with the structure, function and evolution of proteins (proteomics) and RNA (RNA-omics). In this talk, we describe a number of structural bioinformatics tools recently developed in our lab. In proteomics, we describe machine learning software to determine cysteine state (disulfide-bonded, ligand-bound, free), disulfide topology and transmembrane beta-barrel structure. In RNA-omics, we describe a paradigm shift from minimum free energy structure computation towards a parametric viewpoint that yields information about the folding landscape, the distribution of kinetic traps and the mutational process of RNA. Since it is now understood that over 90% of the genome is transcribed, and that RNA plays a number of previously unsuspected catalytic and regulatory roles, these tools may help to elucidate the function of new RNA molecules.
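The move away from a single minimum free energy structure builds on classic base-pairing dynamic programs. As a purely illustrative sketch (not code from the Clote lab), here is a minimal Nussinov-style recursion that maximizes the number of nested base pairs in an RNA sequence:

```python
def nussinov_max_pairs(seq, min_loop=3):
    # Nussinov-style dynamic program: best[i][j] is the maximum number
    # of nested base pairs in seq[i..j]; a toy stand-in for the full
    # free-energy models discussed in the talk.
    pairs = {("A", "U"), ("U", "A"), ("C", "G"),
             ("G", "C"), ("G", "U"), ("U", "G")}
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]          # j stays unpaired
            for k in range(i, j - min_loop):  # j pairs with k
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + best[k + 1][j - 1] + 1)
            best[i][j] = score
    return best[0][n - 1]
```

For a hairpin such as "GGGAAACCC" this returns 3 (the three G-C stem pairs); real tools replace the pair count with thermodynamic energies and can also enumerate the whole folding landscape.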

May 9, 2007, 18st: Structure, dynamics and molecular recognition of biomacromolecules using NMR spectroscopy

Prof. Dr. Michael Sattler, EMBL, Heidelberg, Lehrstuhl für Biomolekulare NMR Spektroskopie

NMR spectroscopy is an essential tool to characterize the structure, (conformational) dynamics and molecular recognition of biomacromolecules in solution. (Structural) bioinformatics and computational methods can be combined with NMR analysis to provide a fast characterization of protein structure and molecular interactions. Recent developments in NMR methodology, hardware and biochemistry allow the investigation of high-molecular-weight protein complexes. For these systems, NMR can be combined with complementary techniques, such as X-ray crystallography and small-angle scattering. Examples from our studies of protein-RNA complexes involved in the regulation of gene expression will be presented.

April 25, 2007, 18st: From Data to Network Modeling in Biostatistics

Dr. Dr. Fabian J. Theis, Max Planck Institute for Dynamics and Self-Organization, Göttingen

Explorative statistical methods are discussed for analyzing data sets ranging from neuroscience to systems biology. As extension of the commonly used second-order methods, we focus on higher-order statistics and spatiotemporal patterns. These models involve intricate optimization problems, and corresponding approximative algorithms are devised. After modeling the subspaces in a statistical fashion, graph-theoretic techniques are applied to understand the substructure. We illustrate the resulting information-theoretic algorithms by analyzing networks from genomics and proteomics, epidemiology and software engineering.


January 31, 2007, 18st: De Novo Protein Structure Prediction and Folding with Free Energy Models

Dr. Wolfgang Wenzel, Research Center Karlsruhe, Institute for Nanotechnology

De novo prediction of protein tertiary structure on the basis of amino acid sequence remains one of the outstanding problems in biophysical chemistry. According to the thermodynamic hypothesis, the native conformation of a protein can be predicted as the global optimum of its free energy surface; with stochastic optimization methods this optimum can be located orders of magnitude faster than by direct simulation of the folding process.

We have developed an all-atom free energy forcefield, PFF01/02 [1], which stabilizes a wide array of proteins. With efficient stochastic optimization methods we were able to predictively and reproducibly fold a variety of proteins containing both alpha-helices and beta-sheets from random starting conformations: the trp-cage protein [2], the villin headpiece [3], the HIV accessory protein [4], protein A, the 60-amino-acid four-helix bacterial ribosomal protein L20 [5], several beta-sheet peptides (14-28 amino acids) [6], and zinc-finger motifs [7].

We used several stochastic optimization methods: the stochastic tunnelling method, an adapted version of parallel tempering, basin hopping techniques and distributed evolutionary optimization strategies. We will discuss advantages and limitations with respect to further improvements of this approach to in-silico all-atom protein structure prediction.
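The optimization strategies above share one idea: thermally accepting some uphill moves so the search can escape local minima of the energy surface. A minimal sketch on a hypothetical 1D double-well "energy surface" (a toy stand-in for an all-atom forcefield, not PFF01/02 itself):

```python
import math
import random

def energy(x):
    # Toy 1D double well: global minimum near x = 1.37,
    # shallower local minimum at x = -1.
    return x**4 - 3 * x**2 - 2 * x

def monte_carlo_minimize(x0, steps=20000, temperature=1.0,
                         step_size=0.5, seed=0):
    # Metropolis Monte Carlo: downhill moves are always accepted,
    # uphill moves with probability exp(-dE/T), which lets the
    # walker cross barriers between minima.
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        x_new = x + rng.uniform(-step_size, step_size)
        e_new = energy(x_new)
        if e_new < e or rng.random() < math.exp((e - e_new) / temperature):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e
```

Started in the wrong basin (x0 = -1), the walker crosses the barrier and finds the global well; the methods named in the talk (stochastic tunnelling, parallel tempering, basin hopping) are far more sophisticated variations on this scheme.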

We have also extended our approach to larger proteins by combining our free energy model with heuristic techniques that generate large libraries of protein conformations on the basis of the amino acid sequence. When we ranked ROSETTA decoy sets for 30 different proteins according to their energy in our model, we found that near-native conformations are selected for all high-quality decoy sets (see figures for an example). For low-quality decoy sets, the approach generates usable low-resolution models in over 80% of the cases, but it still has difficulty treating disulfide-bridged proteins, protein-protein complexes, and proteins which are stabilized only in complex with other molecules.


January 24, 2007, 18st: Combining Sequence Information with T-Coffee

Dr. Cedric Notredame, CNRS, Marseille

Well-integrated biological data lends itself to the identification of biologically meaningful patterns. Multiple sequence alignments constitute one of the most powerful ways of carrying out such a task. In this context, the integration takes the form of simultaneously aligning related sequences in order to reveal evolutionarily conserved patterns. Multiple sequence alignments have so many applications that they have become household items in biology, and few data processing pipelines exist that do not require the assembly of an alignment. Yet the wealth of available alternative methods means that the user is not only faced with the problem of selecting and aligning sequences, but also with the necessity of choosing one method or integrating the results delivered by many. In the course of this seminar I will discuss how various methods can be integrated into one. I will also go further and show that a multiple sequence alignment can be used to integrate much more than sequence information, as long as this information is properly mapped onto the sequences. This concept, named template-based multiple sequence alignment, will be illustrated with a simple example: the combination of sequences and structures within multiple sequence alignments. I will finally discuss how multiple sequence alignment methods are currently validated and why I believe we need to challenge these procedures in order to further our understanding of biological sequences. Most of the tools discussed in this talk are available from www.tcoffee.org.

January 10, 2007, 18st: Rationalizing Transcriptional Regulation: From Binding Strength to Combinatorial Control

Dr. Thomas Manke, Max Planck Institute for Molecular Genetics, Berlin

Attempts to rationalize gene expression data through regulatory sequence analysis are plagued by the abundance of potential transcription factor binding sites with very different specificities. In this talk I will describe our work on evolutionary sequence conservation and combinatorial control as guiding principles that help to refine the bioinformatic search for functional binding sites. In the second part I will present a biophysical model which predicts the relative binding strengths of transcription factors to a given sequence region. I will show that this approach makes it possible to rationalize experimental ChIP-chip data more accurately than with traditional hit-based methods.
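Biophysical binding-strength models of this kind typically sum Boltzmann weights over all sequence windows instead of counting discrete "hits", so weak sites still contribute. A minimal sketch with a hypothetical mismatch-energy matrix (illustrative only, not the speaker's actual model):

```python
import math

def binding_strength(sequence, energy_matrix, beta=1.0):
    # Sum exp(-beta * E) over every window of the region, where E is
    # the (hypothetical) binding energy of the factor at that window.
    # A hit-based method would instead count windows with E below a
    # hard threshold.
    width = len(energy_matrix)
    total = 0.0
    for start in range(len(sequence) - width + 1):
        e = sum(energy_matrix[k][sequence[start + k]] for k in range(width))
        total += math.exp(-beta * e)
    return total
```

With a toy matrix whose consensus is "AT", a region containing one perfect site scores far higher than a region with only mismatched windows, yet the mismatched windows still add a small, graded contribution.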

December 20, 2006, 18st: Function of Animal MicroRNAs

Prof. Dr. Nikolaus Rajewsky, Max-Delbrück-Centrum, Berlin

MicroRNAs are a large class of recently discovered trans-acting factors that regulate protein production from mRNAs. I will summarize our work on the identification and characterization of microRNA targets in metazoans. Specific examples regarding the role of microRNAs in metabolism and the immune system will be presented. Finally, I will show that genotyped human single nucleotide polymorphisms (SNPs) and population genetics techniques can be used to infer the functionality of both conserved and non-conserved microRNA targets. This approach has the potential to be extended to other types of cis-regulatory sites, such as transcription factor binding sites.

December 6, 2006, 18st: Apoptosis Signaling: Mathematical Modeling and Analysis as a Bistable System

Dipl.-Biol. Thomas Eißing, Inst. f. Systems Theory and Automatic Control, Univ. of Stuttgart

Biological signal transduction is essential to coordinate behavior. One interesting aspect is that graded input signals can be converted to all-or-none output signals, i.e. certain signaling pathways show a bistable behavior allowing for switching phenomena or memory. We explore simple biochemical mechanisms to generate such a bistable behavior and study one model in more detail. This model represents the core reaction network of an apoptotic pathway. Apoptosis is a form of programmed cell death present in every cell. The program is essential to remove cells that are old, infected or potentially dangerous. Its misregulation is implicated in severe pathological alterations.
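The bistable, all-or-none behavior described above can be illustrated with a generic positive-feedback toy model (not the apoptosis network itself): a Hill-type production term plus first-order degradation yields two stable steady states separated by an unstable threshold.

```python
def hill_switch_rate(x, v_max=1.0, k_half=0.4, k_deg=1.0):
    # Positive feedback with Hill coefficient 2 plus degradation:
    #   dx/dt = v_max * x^2 / (k_half^2 + x^2) - k_deg * x
    # With these (arbitrary, illustrative) parameters the stable
    # steady states are x = 0 ("off") and x = 0.8 ("on"), with an
    # unstable threshold at x = 0.2.
    return v_max * x**2 / (k_half**2 + x**2) - k_deg * x

def integrate(x0, t_end=50.0, dt=0.01):
    # Simple forward-Euler integration to (near) steady state.
    x = x0
    for _ in range(int(t_end / dt)):
        x += dt * hill_switch_rate(x)
    return x
```

Initial conditions below the threshold relax to the "off" state and those above it to the "on" state, which is exactly the graded-to-all-or-none conversion the abstract describes; bifurcation analysis asks how these fixed points appear and disappear as parameters change.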

Using bifurcation studies, we can show minimal requirements for a bistable behavior of the apoptosis model. Combining this information with reported kinetic values, we can further show that the biological data available is not consistent and that the pathway requires additional regulation. We propose an accordingly extended model, which is now supported by recent experimental findings.

The critical role of apoptosis in an uncertain environment with diverse external influences requires a robust performance of the pathway to allow proper biological function. We investigate the robustness of the bistable performance with respect to parameter changes and noise comparing the apoptosis models and the additional biochemical mechanisms introduced.

The insights from the model analysis allow reconciliation of the different kinetics observed during apoptosis at the single-cell and population level, both in terms of understanding and modeling. Several findings are not restricted to apoptosis but are conceptually relevant to bistable cell signaling in general.

November 22, 2006, 18st: Modularity and networks in systems biology

Dr. Thomas Wilhelm, FLI Jena

Modular organization is a ubiquitous phenomenon in nature. Subcellular systems accomplish complex tasks with comparably simple elements by the combinatorial use of different modules. I present our recently published DASS algorithm [1] to explore the combinatorial complexity of subcellular systems. Different applications [2,3] are briefly discussed.

Cellular functioning largely depends on the complex dynamic interplay of different cellular networks, such as protein interaction networks (overlapping with signal transduction networks), transcriptional regulatory networks, and metabolic networks. These networks are analysed to understand their modular organization (which implies some functional annotation), node importance (e.g. gene essentiality), evolutionary conservation (of genes and metabolites), and features underlying the robustness of different systems. I present a new information-theoretic method for network description. Moreover, issues of network complexity and robustness are discussed.

October 25, 2006, 18st: New Methods for Protein Homology Detection

Dr. Johannes Soeding, Max Planck Institute for Developmental Biology, Department Protein Evolution, Tübingen

The seminar will introduce pairwise comparison of hidden Markov models (HMMs) as a generalization of popular methods for sequence similarity search such as PSI-BLAST, HMMER, or FFAS. We present several applications in protein fold recognition, structure modeling and function prediction, as well as preliminary results from the CASP7 benchmark. Several methodological extensions of HMM-HMM comparison that we are working on will be discussed: (1) a very sensitive method for de-novo repeat detection that is able to detect the sequence signature of structural repeats in proteins that have not previously been known to possess internal sequence symmetry (such as TIM barrels or outer membrane beta barrels); (2) a method for exhaustive transitive profile search that can discover whole protein superfamilies starting from a single sequence; (3) a network method for remote homology detection that makes use of the transitivity of homology and which, in contrast to existing network methods, uses rigorous statistics to take into account the degree of redundancy of the information from different network paths.

July 19, 2006, 18st: Exploring the Unseen Majority on Planet Earth: Metagenomics and Functional Analyses of Uncultured Bacteria

Prof. Dr. Michael Wagner, Department of Microbial Ecology, University of Vienna

There are more bacterial cells in a few grams of soil than there are human beings on planet Earth. In every flower pot tens of thousands of bacterial species co-occur. A human body consists of more bacterial than human cells and several thousand different bacterial species colonize our body and influence our immune system. In contrast, to date only about 5000 bacterial species have been validly described by microbiologists, because the vast majority of bacteria, including those responsible for the functioning of all ecosystems, cannot be cultivated in the laboratory.

During the last decade an array of molecular methods has been developed which now allows microbial ecologists to identify and characterize bacterial cells in the environment, independent of their culturability. Most importantly, it is now even possible to retrieve entire genome sequences from uncultured bacteria which thrive in complex microbial communities containing hundreds to thousands of different bacterial genomes. This so-called metagenomics or community genomics approach holds enormous potential for ecology, biotechnology and medicine, but also poses dramatic new challenges for bioinformaticians and microbiologists alike.

July 5, 2006, 18st: Chemical Biology and Chemogenomics in Drug Discovery

Prof. Dr. Hugo Kubinyi, Weisenheim am Sand

Chemical biology, chemical genetics, and chemogenomics are recent strategies in drug discovery. Although definitions in the literature are somewhat diffuse and inconsistent, a differentiation of the terms shall be attempted here: Chemical biology may be defined as the study of biological systems, e.g. whole cells, under the influence of chemical libraries. If a new phenotype is discovered by the action of a certain substance, the next step is the identification of the responsible target. Chemical genetics is the dedicated study of protein function, e.g. signaling chains, under the influence of ligands which bind to certain proteins or interfere with protein-protein interactions; sometimes orthogonal ligand-protein pairs are generated to achieve selectivity for a certain protein. Chemogenomics defines, in principle, the screening of the chemical universe, i.e. all possible chemical compounds, against the target universe, i.e. all proteins and other potential drug targets. Whereas this task can never be fully achieved, due to the almost infinite size of the chemical universe, the systematic screening of libraries of congeneric compounds against members of a target family offers unprecedented chances in the search for compounds with significant target or subtype specificity. The presentation will focus on chemogenomics applications in the search for active and highly selective ligands within families of proteases, GPCRs, nuclear receptors, transporters, and kinases.

June 21, 2006, 18st: Integrated Molecular Network for System Analysis of Cellular Processes

Prof. An-Ping Zeng, Bioprocess and Biochemical Engineering, Technical University of Hamburg-Harburg

Recent advances in genome sequencing and functional genomics make it now possible to reconstruct large-scale biological networks at different molecular levels (e.g. metabolic, regulatory and protein-protein interaction networks). Studies of these genome scale networks have revealed several intrinsic structural and functional properties of biological processes. In this presentation, a brief overview will be first given on the reconstruction and structural analysis (especially network decomposition and modular analysis) of metabolic and regulatory networks from genomic and functional genomic data. Some of our network-based studies of industrially relevant organisms (e.g. E. coli and Bacillus megaterium) will be briefly introduced. It will then be shown in more detail that an integrated analysis of metabolic and regulatory networks is particularly important for understanding both cellular metabolism and its dynamic regulation (e.g. feedback regulation). The need and possibility for a more quantitative understanding of regulatory mechanisms at network level will also be demonstrated with the example of cell cycle of yeast.

June 7, 2006, 18st: Towards an Understanding of Genome Evolution

Dr. David Liberles, Department of Molecular Biology, University of Wyoming

A systematic characterization of Chordate genes has been undertaken. Genes were grouped into gene families, with multiple sequence alignments and phylogenetic trees calculated. The ratio of nonsynonymous to synonymous nucleotide substitution rates was calculated for each branch of every phylogenetic tree and mapped onto the NCBI taxonomy to present a picture of rapidly evolving genes and pathways along each branch of the tree of life. One candidate gene for phenotypic adaptation, myostatin in Artiodactyls, was studied in more detail and that analysis will also be presented.

Gene duplication appears to play a major role in dictating the evolution of novelty in Chordate genomes. Using lattice models with a binding function, an evolutionary model was established to characterize protein functional evolution after duplication, showing how positive selection via neofunctionalization, as well as drift via subfunctionalization, can drive the retention of duplicate copies and how this changes the functionality of the encoded lattice proteins.

May 17, 2006, 18st: From Microarray Data to Gene Networks in Yeast

Dr. Alvis Brazma, European Bioinformatics Institute, Hinxton, UK

We use public datasets for yeast mutants and array-based chromatin immunoprecipitation experiments. We show that the network derived from these data can be used to predict gene functions. We discuss approaches to modelling gene regulation networks, which can be categorized, in order of increasing detail, as network parts lists, network topology models, network control logic models, or dynamic models. We discuss the current state of the art for each of these approaches, study the relationship between different topology models, and introduce a new simple way of describing dynamic models. We explore the gap between the parts list and topology models on the one hand, and network logic and dynamic models on the other. The first two classes of models have reached a genome-wide scale, while for the other model classes high-throughput technologies are yet to make a major impact.

References

  1. T. Schlitt and A. Brazma: Modelling gene networks at different organisational levels, FEBS Letters 579 (2005), 1859-1866
  2. T. Schlitt and A. Brazma: Modelling in molecular biology: describing transcription regulatory networks, Philosophical Transactions of the Royal Society B 361 (2006), 483-494

May 3, 2006, 18st: Protein Movement and Flexibility: Normal Mode Analysis in the Era of Structural Genomics

Dr. Karsten Suhre, CNRS, Information Génomique & Structurale, Marseille

Today it is clear that most proteins undergo more or less important conformational changes to perform their individual tasks in the cell. Even globular proteins have been shown to be far from static rigid bodies. Flexible movement of the entire protein, or of some sub-domain, may for example facilitate catalytic activity, promote protein-protein interactions, or allow for substrate recognition by induced fit. Most interesting in this context is the discovery that functional protein movements can often be well described by only a few, sometimes only one or two, low-frequency normal modes. Moreover, a simple one-parameter elastic network model most often suffices to model these modes accurately, which allows the application of normal mode analysis (NMA) to large protein complexes, such as the ribosome or viral particles.

After a brief introduction to NMA, I will present application examples that exploit this inherent ability of the lowest-frequency modes to model functional movements of a protein precisely. I will show that NMA-perturbed protein models can be used to solve difficult molecular replacement problems in X-ray crystallography. I will further describe a software tool for flexible fitting of high-resolution X-ray models into 3D reconstructions from cryo-electron microscopy. Finally, I will present the elNémo web server (www.igs.cnrs-mrs.fr/elnemo/) that provides a fast and easy-to-use interface to online normal mode analysis.
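The one-parameter elastic network model mentioned above is simple enough to sketch directly: sites within a cutoff are joined by identical springs, and the normal modes are eigenvectors of the resulting Hessian, with the six zero-frequency modes corresponding to rigid-body motion. A minimal illustration with numpy (parameter names and defaults are ours, not elNémo's):

```python
import numpy as np

def anm_hessian(coords, cutoff=8.0, k=1.0):
    # Elastic-network (anisotropic network model) Hessian: every pair
    # of sites closer than the cutoff is connected by a Hookean spring
    # of uniform stiffness k.
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = d @ d
            if r2 > cutoff**2:
                continue
            block = -k * np.outer(d, d) / r2
            hessian[3*i:3*i+3, 3*j:3*j+3] = block
            hessian[3*j:3*j+3, 3*i:3*i+3] = block
            hessian[3*i:3*i+3, 3*i:3*i+3] -= block
            hessian[3*j:3*j+3, 3*j:3*j+3] -= block
    return hessian

def low_frequency_modes(coords, n_modes=2, cutoff=8.0):
    # Diagonalize and skip the six zero eigenvalues (three rigid
    # translations plus three rigid rotations).
    vals, vecs = np.linalg.eigh(anm_hessian(coords, cutoff))
    return vals[6:6 + n_modes], vecs[:, 6:6 + n_modes]
```

For a generic 3D point cloud the six smallest eigenvalues vanish and the following ones give the slow, often functional, collective motions; perturbing a model along those eigenvectors is the basic trick behind the molecular replacement and flexible fitting applications in the talk.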

February 8, 2006, 18st: Dimension reduction, feature selection and visualization - Analysis of gene expression and metabolite data

Prof. Dr. Joachim Selbig, Institute of Biochemistry and Biology, University Potsdam

Modern experimental methods enable the production of large data sets describing molecular cell components and their activities. The analysis and interpretation of these data require bioinformatics methods by means of which patterns and relationships are discovered and correlated with existing knowledge about metabolic and regulatory networks. The availability of information about the totality of genetic material (genome), transcribed genes (transcriptome), translated proteins (proteome) and participating metabolites (metabolome) promises new opportunities for understanding the reaction of organisms to environmental changes on the one hand, but also imposes new requirements on data processing and modeling on the other. In the talk we will discuss issues such as dimension reduction, feature selection and visualization related to the integrated analysis of gene expression and metabolite data.

January 25, 2006, 18st: How to program Molecular Computers

Prof. Dr. Joost N. Kok, Leiden Institute of Advanced Computer Science, Leiden University

Biomolecular computing has emerged as an interdisciplinary field that draws together chemistry, computer science, mathematics, molecular biology, and physics. Molecular computation has many advantages, including small size, a biological interface and massive parallelism.

During our presentation

  • we will give an overview of the different techniques that are used in DNA computing,
  • we will discuss in detail a number of algorithmic problems that have been attacked using DNA computing,
  • we will provide a classification of the five main approaches in DNA computing,
  • we will discuss the experimental challenges and experimental progress,
  • we will look at the implementation of evolutionary computation using DNA.

Finally, we also provide an outlook to the future.

January 11, 2006, 18st: Automatic detection of genome rearrangements

Prof. Dr. Enno Ohlebusch, Fakultät für Informatik, Universität Ulm

During evolution, genomes are subject to genome rearrangements that alter the ordering and orientation of genes on the chromosomes. If a genome consists of a single chromosome (like mitochondrial, chloroplast or bacterial genomes), the biologically relevant genome rearrangements are (1) inversions, where a section of the genome is excised, reversed in orientation, and reinserted, and (2) transpositions, where a section of the genome is excised and reinserted at a new position in the genome; this may or may not also involve an inversion. In order to reconstruct ancient events in the evolutionary history of organisms, one tries to find the most plausible genome rearrangement scenario between (the genomes of) two or multiple species. To achieve this goal, one must first determine either orthologous genes or syntenic regions between the genomes under consideration.

The genomes are then modeled by signed permutations of the genes or syntenic regions, where the sign indicates the orientation (the strand). Given two genomes, one wants to find an optimal sequence of genome rearrangements that transforms one genome into the other. It is well known that this problem is equivalent to the problem of optimally sorting a signed permutation into the identity permutation.

In this talk, it will be shown how to efficiently determine syntenic regions by means of (a) the enhanced suffix array data structure and (b) chaining algorithms that use techniques from computational geometry. Moreover, the talk outlines a new 1.5-approximation algorithm for sorting by weighted inversions and transpositions.
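The signed-permutation model is easy to make concrete: an inversion reverses a segment of the gene order and flips the orientation (sign) of every gene in it. A small sketch with a hypothetical example scenario:

```python
def inversion(genome, i, j):
    # Apply an inversion to a signed permutation: the segment
    # genome[i..j] is reversed and every gene in it changes strand.
    return genome[:i] + [-g for g in reversed(genome[i:j + 1])] + genome[j + 1:]

def apply_scenario(genome, scenario):
    # Replay a rearrangement scenario given as a list of (i, j) inversions.
    for i, j in scenario:
        genome = inversion(genome, i, j)
    return genome
```

For example, three inversions transform the signed permutation (+3 -1 +2) into the identity (+1 +2 +3); sorting by reversals asks for the shortest such scenario, and each operation is its own inverse (applying the same inversion twice restores the genome).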

December 7, 2005, 18st: Regulatory genomic signals: Function and Evolution

Dr. Alexander Kel, BIOBASE, Wolfenbüttel

Regulation of fundamental molecular genetic processes such as replication, transcription, translation and processing is assisted by short DNA signals, which are located in specific genomic sites, provide a so-called genomic punctuation, and are often arranged into complex regulatory regions. A short survey of the wide variety of different regulatory signals in the genomes of pro- and eukaryotic organisms will be given in the talk. The focus will be on the principles of structural organization and functioning of one of the most important classes of regulatory signals - the cis-elements of transcription regulation. It is known that functionally related genes involved in the same genetic, biochemical, or physiological process are often regulated coordinately by combinations of transcription factors (TFs) that bind to specifically arranged binding sites on DNA (composite elements (CEs) and modules (CMs)). Several modern computational approaches will be presented as examples of applying knowledge of the structural organization of regulatory signals to the identification of novel signals and their functional combinations in the genome. Practical examples will be given on analyzing gene expression data with tools provided by the databases on gene regulation: TRANSFAC, TRANSCompel and TRANSPATH. Combinatorial principles of regulation will also be discussed in the broader context of the immanent evolutionary plasticity of gene regulation in multi-cellular organisms.

November 23, 2005, 18st: Spectral Methods for Clustering Proteins and Predicting Protein-Protein Interactions

Dr. Alberto Paccanaro, Molecular Biophysics and Biochemistry Department, Yale University

In the first part of my talk I'll introduce a spectral method for clustering protein sequences according to their functional similarity. I'll analyze the functional groupings defined by SCOP superfamilies by showing the distribution of inter-cluster and intra-cluster distances between pairs of sequences. I'll then describe how to use this information to learn a similarity measure for pairs of protein sequences. These pairwise similarity values are in turn used by a spectral clustering method to cluster the proteins. I'll present the results obtained by this method on a set of difficult problems and show that it can identify proteins with similar functions which are missed by other methods.

In the second part of my talk I'll present a spectral method that uses the topology of protein interaction graphs to predict protein-protein interactions. This is done by computing the diffusion distance between each pair of proteins and then inferring an interaction when that distance is below a given threshold. When applied to experimental data, this method can correctly recover a significant fraction of the protein-protein interactions that have been missed by large-scale experiments.
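
The diffusion-distance step can be sketched on a toy graph; the graph and the t-step random-walk formulation below are illustrative assumptions, and the exact definition used in the talk may differ:

```python
import numpy as np

def diffusion_distances(adj, t=3):
    """All-pairs t-step diffusion distances on a graph.

    adj: symmetric adjacency matrix of the interaction graph.
    D[i, j] is the Euclidean distance between the t-step random-walk
    distributions started at node i and at node j; small values mean
    the two proteins "see" the network in the same way.
    """
    P = adj / adj.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    Pt = np.linalg.matrix_power(P, t)          # t-step walk distributions
    diff = Pt[:, None, :] - Pt[None, :, :]     # pairwise row differences
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical toy graph: edges 0-1, 1-2, 1-3.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 1],
                [0, 1, 0, 0],
                [0, 1, 0, 0]], dtype=float)
D = diffusion_distances(adj, t=3)
predicted = D < 0.5   # infer an interaction where the distance is below a threshold
```

Nodes 2 and 3 have identical neighbourhoods, so their diffusion distance is exactly zero; that is the kind of topological signal the threshold picks up.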

November 09, 2005, 18st: Genome Structure and Dynamics

Prof. Dr. Jens Stoye, Faculty of Technology, Genome Informatics, Bielefeld University

The comparison of genomes based on their gene content and gene order is an important means to track large-scale evolutionary events and to annotate gene function. In this presentation, we will discuss several new models and algorithms for such genome comparison "at a higher level".

We will discuss the use of common intervals, i.e. intervals containing the same set of genes in multiple genomes, and give efficient algorithms to find them in permutations and in sequences. We will also mention a special subtype of common intervals, conserved intervals, which not only serve as the basis of a new whole-genome phylogenetic distance measure, but are also a key feature of genome rearrangement theories such as sorting by reversals or transpositions.
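
The basic test for common intervals of two permutations can be sketched as follows; this is a quadratic-time illustration, not the efficient algorithms of the talk, and the gene orders are hypothetical:

```python
def common_intervals(a, b):
    """Non-trivial common intervals of two permutations of the same genes.

    An interval a[i..j] is reported when its gene set also occupies
    consecutive positions in b. For each start i we extend j rightwards,
    maintaining the min/max position of the set in b: the set is
    consecutive in b exactly when max - min equals j - i. O(n^2) overall.
    """
    pos_b = {gene: k for k, gene in enumerate(b)}
    found = []
    for i in range(len(a)):
        lo = hi = pos_b[a[i]]
        for j in range(i + 1, len(a)):
            p = pos_b[a[j]]
            lo, hi = min(lo, p), max(hi, p)
            if hi - lo == j - i:
                found.append(frozenset(a[i:j + 1]))
    return found

# Two hypothetical gene orders of the same five genes.
ivs = common_intervals([1, 2, 3, 4, 5], [3, 2, 1, 5, 4])
```

For these two orders the common intervals are {1,2}, {2,3}, {1,2,3}, {4,5} and the whole gene set.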

October 26, 2005, 18st: Detection of Alternative Splicing Events Using Machine Learning

Dr. Gunnar Rätsch, Friedrich Miescher Laboratory of the Max Planck Society, Tübingen

Eukaryotic pre-mRNAs are spliced to form mature mRNA. Pre-mRNA alternative splicing greatly increases the complexity of gene expression. Estimates show that more than half of the human genes and at least a third of the genes of less complex organisms such as nematodes or flies are alternatively spliced. In the talk I will present some recent results on employing state-of-the-art machine learning techniques to the problem of in silico predictions of alternative splicing events.

June 29, 2005, 18st: From metabolome analysis and Bioinformatics towards systems biology

Prof. Dr. Dietmar Schomburg, Bioinformatics Centre, Institute for Biochemistry, University of Cologne

Complex biological processes and their failure cannot be understood and predicted based on current molecular biology or cell biology models, which mainly rest on a qualitative description of biological functions. Since these processes are reactions of whole cellular networks, they can only be understood by simulating the cell, or at least the major molecular networks of the cell. While the goal of the "virtual cell", or the simulation of whole organisms in the computer, will not be reached within the coming years, research projects today have to be designed with this "grand challenge" in mind. At the Cologne University Bioinformatics Centre (CUBIC) a number of theoretical and experimental projects are currently underway on the path towards this goal, including metabolome analysis, the development of simulation methods, and the curation of the worldwide unique enzyme database BRENDA. An overview of recent results and developments at CUBIC will be given.

June 15, 2005, 18st: Evolution of protein structure and function: a SCOP perspective

Dr. Alexey G. Murzin, MRC Centre, Centre for Protein Engineering, Cambridge, UK

I will return to the subject of my earlier review (Murzin AG, "How far divergent evolution goes in proteins", Curr Opin Struct Biol 8:380-387, 1998) and present even more striking records of protein evolution that have since been discovered during the classification of new structures in the SCOP database.

May 25, 2005, 18st: When "Unstructure Determines Function"

Dr. Toby J. Gibson, Biological Sequence Analysis, EMBL, Heidelberg

The classic dogmas - DNA makes RNA makes protein; one gene, one enzyme; structure determines function - do not operate sensu stricto: the latter two are overly restrictive outside the realm of intermediary metabolism. Metabolic enzymes are a minority of the proteomes of higher eukaryotes, where a rather fuzzier dogma is much more typical: one gene > multidomain protein > many discrete functional segments. Further, with recent estimates that up to 30% of animal proteomes consist of segments of intrinsically unstructured protein (IUP) that function in a natively unfolded state, the assertion "Unstructure determines function" cannot be negated in biochemistry.

There are several varieties of IUP. An increasing number of "induced fit" modules are being reported by structural biologists: these are solved in complexes - never, of course, as monomeric structures. Proteins such as Tau, that lack any native order, point to another role of IUP: as repositories of probably the most abundant category of protein functional module, the "linear motifs". These are short peptides that embody autonomous function independently of tertiary structure. They are used for regulatory interactions and many are post-translationally modified. Since linear motifs are statistically insignificant in sequence searches, making them hard to handle, they have been to some extent ignored by computational biologists. My guesstimate is in the range 100,000-300,000 instances in the human proteome. The power of linear motifs in regulatory processes derives from a combination of low affinity binding interactions, cooperativity, combinatorics and ease of de novo evolution. I contend that systems biology approaches will not be widely fruitful until our understanding of the role of linear motifs in biology has matured.
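
Computationally, linear motifs are typically matched as short regular expressions over the protein sequence. A minimal sketch follows; the two patterns are deliberately simplified illustrations (PxxP is the canonical SH3-binding core), not curated motif definitions, and the sequence is hypothetical:

```python
import re

# Deliberately simplified, illustrative patterns; curated ELM-style
# definitions carry far more context than a bare regex.
MOTIFS = {
    "SH3_PxxP": r"P..P",
    "NLS_basic": r"K[KR].[KR]",   # crude monopartite nuclear localization signal
}

def scan_motifs(protein):
    """Regex scan for short linear motifs; returns {name: [0-based starts]}.

    Matches are statistically weak on their own, which is why real
    pipelines filter candidates by disorder, conservation and
    subcellular localization.
    """
    return {name: [m.start() for m in re.finditer(pat, protein)]
            for name, pat in MOTIFS.items()}

hits = scan_motifs("MAPPSYPKKRKVED")   # hypothetical sequence
```

Here each motif is found once; the statistical insignificance mentioned above is exactly that such short patterns also match by chance in almost any long sequence.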

In my presentation, I will review the various protein structure classes and present our tools for protein disorder and linear motifs. I will also review the currently unsatisfactory state of the teaching material available in text books for University level teaching. How will the next generation of researchers be made properly aware of the way that protein structure is used in cell signaling (and elsewhere too) if they are not being taught it?

May 11, 2005, 18st: Reticulate Networks: Hybridization and Recombination Phylogenies

Prof. Dr. Daniel Huson, Center for Bioinformatics, University Tübingen

In simple models of evolution, sequences evolve via mutation and speciation events, and phylogenetic trees are the appropriate representation of such histories. More realistic models incorporate gene duplication and loss, or reticulate events such as hybridization, horizontal gene-transfer or recombination. Here, phylogenetic networks play an important role. In this talk, we first give an overview of the different types of phylogenetic networks and then describe a general approach to the problem of computing reticulate networks, that is, networks that explicitly represent an evolutionary history involving reticulate events. We will present new algorithms for computing such networks, in particular hybridization networks and recombination networks, and illustrate their application on various published datasets.

April 20, 2005, 18st: Functional Assessments of Alternative Spliceforms

Prof. Dr. Rolf Backofen, Lehrstuhl für Bioinformatik, Friedrich-Schiller-Universität Jena

It is estimated that up to 60% of human genes are alternatively spliced. Nevertheless, alternative spliceforms are known for only a minority of genes. Splice variants can be found by EST alignment and by lab technologies such as PCR or microarrays.

In this talk, we will discuss two aspects of our work on alternative splicing. First, we will present a homology-based computational approach to the prediction of alternative splice forms that relies on protein domain composition. To this end, we investigate whether any possible concatenation of exons shows homology to protein families from the Pfam database that are currently not associated with the given gene.

Second, we have investigated a special form of short spliceforms that occur at tandem acceptors, which are sequences of the form NAGNAG. Although they introduce only subtle changes, we could demonstrate that the widespread occurrence of alternative splicing at NAGNAG acceptors contributes to proteome plasticity.
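
A minimal scan for NAGNAG tandem acceptor motifs might look like this; the sequence is hypothetical, and real analyses work on annotated intron boundaries rather than raw text:

```python
import re

def nagnag_sites(seq):
    """0-based positions of NAGNAG tandem acceptor motifs (N = any base).

    The two overlapping 3' splice sites of a NAGNAG acceptor differ by
    one NAG, i.e. by a single codon in the mRNA. A lookahead is used so
    that overlapping motifs are not skipped.
    """
    return [(m.start(), m.group(1))
            for m in re.finditer(r'(?=([ACGT]AG[ACGT]AG))', seq.upper())]

# Hypothetical sequence around an intron/exon boundary.
sites = nagnag_sites("ttttcagcaggt")
```

Splicing at either AG of the reported CAGCAG motif would yield proteins differing by a single residue, the subtle change discussed above.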

February 2, 2005, 18st: Efficiently solving large scale matching problems on biosequence databases

Prof. Dr. Stefan Kurtz, Zentrum für Bioinformatik, Universität Hamburg

Many sequence analysis tools developed in the pre-genomic age cannot handle the large data sizes of the biosequence databases of the genomic age. For this reason, we have developed the concept of enhanced suffix arrays. These provide a versatile indexing technique that allows many large-scale sequence comparison and matching problems on current biosequence databases to be solved efficiently. In this talk we will describe a new software tool which provides an efficient implementation of enhanced suffix arrays. We will briefly sketch some algorithmic aspects of this software tool, and focus on applications in the following areas:

  • multiple genome alignment
  • development of signatures for pathogenic bacteria
  • gene structure prediction
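
The lookup idea behind suffix-array indexing can be illustrated with a minimal sketch; it uses a naive construction and plain binary search, not the enhanced-suffix-array implementation described in the talk, and the genome string is a toy example:

```python
def suffix_array(text):
    """Suffix array by naive sorting; real tools use linear-time
    construction plus auxiliary tables ("enhanced" suffix arrays),
    but the lookup principle below is the same."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def occurrences(text, sa, pat):
    """All start positions of pat in text, by binary search for the
    contiguous block of sorted suffixes that begin with pat."""
    def prefix(k):                      # suffix sa[k] truncated to |pat|
        return text[sa[k]:sa[k] + len(pat)]
    lo, hi = 0, len(sa)
    while lo < hi:                      # leftmost suffix >= pat
        mid = (lo + hi) // 2
        if prefix(mid) < pat:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:                      # past the last suffix starting with pat
        mid = (lo + hi) // 2
        if prefix(mid) <= pat:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])

genome = "ACGTACGTAC"
sa = suffix_array(genome)
```

On this toy genome, "ACGT" is found at positions 0 and 4; once the index is built, each query needs only O(|pattern| log n) character comparisons, which is what makes index-based matching viable at genome scale.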

January 19, 2005, 18st: A structural perspective on interactions & complexes

Dr. Rob Russell, Structural Bioinformatics, EMBL, Heidelberg

Protein interactions are central to most biological processes, and are currently the subject of great interest. Yet despite the many recently developed methods to identify protein interactions, little attention has been paid to one of the best sources of data: complexes of known three-dimensional (3D) structure. In this talk I will discuss how such complexes can be used to study and predict protein interactions & complexes, and to interrogate interaction networks proposed by methods such as two-hybrid screens or affinity purifications. I will also discuss how EM data can be combined with bioinformatics & modelling to predict structures for complexes or for parts of interaction networks.

Related Publications:

  1. P. Aloy, B. Bottcher, H. Ceulemans, C. Leutwein, C. Mellwig, S. Fischer, A.C. Gavin, P. Bork, G. Superti-Furga, L. Serrano, R.B. Russell, Structure-based assembly of protein complexes in yeast, Science, 303, 2026-2029, 2004.
  2. P. Aloy, R.B. Russell, Ten thousand interactions for the molecular biologist, Nature Biotechnology, 22, 1317-1321, 2004.
  3. P. Aloy, H. Ceulemans, A. Stark, R.B. Russell, The relationship between sequence and interaction divergence in proteins. J. Mol. Biol., 332, 989-998, 2003.
  4. P. Aloy, R.B. Russell, InterPreTS: Protein interaction prediction through tertiary structure. Bioinformatics, 19, 161-162, 2003.
  5. P. Aloy, R.B. Russell, The third dimension for protein interactions and complexes. Trends Biochem. Sci., 27, 633-638, 2002.
  6. P. Aloy, R.B. Russell, Interrogating protein interaction networks through structural biology. Proc. Natl. Acad. Sci. USA, 99, 5896-5901, 2002.

December 15, 2004, 18st: Analytical frameworks to elucidate structure and functionalities of plant genomes

Dr. Klaus F.X. Mayer, GSF-Institute for Bioinformatics

The availability of large-scale genomic information on plants is a foundation for studying the functions of individual genes and their relations as well as their evolutionary relatedness, and it underpins crop improvement and molecular-assisted plant breeding. In contrast to animals, genome sizes in plants vary 1000-fold, and the in part highly repetitive nature of plant genomes poses severe challenges to current analytical frameworks that aim to analyse and structure the content of plant genomes in a functional and evolutionary context. The presentation will illustrate analytical strategies for plant genomes, with special emphasis on recent work focussed on the analysis and elucidation of the structure and peculiarities of the maize genome. A recent large-scale effort has generated a rich data resource that allows analytical and comparative frameworks to be applied and, for the first time, gives detailed insights into the peculiarities and structure of maize and other grass genomes.

December 1, 2004, 18st: Bioinformatics in Pharma

Dr. Bertram Weiss, Senior Scientist, Schering AG, Berlin

Bioinformatics in a pharmaceutical environment is strongly project-driven and as such can aptly be described as "Applied Bioinformatics". Therefore, bioinformatics in pharma not only tackles the scientific challenges but is also very much focussed on providing projects with critical information at the right time.

Firstly, the talk will present this specific bioinformatics environment and explain the expectations a bioinformatics department is confronted with and also the needs it is supposed to satisfy. Subsequently, challenges and solutions are presented and where possible also their impact on drug discovery projects is discussed. Here, the topics will range from genotype/phenotype relationships, information extraction or microarray analysis to viewing data in genomic context.

Some of these solutions have benefited from internships or bachelor studies of bioinformatics students from different German universities. The talk will highlight the success and impact of their work. By emphasising these bridges between pharma and academia, hopefully, this talk will be encouraging for young and enthusiastic students to perceive an industrial work experience as an interesting opportunity.

November 17, 2004, 18st: Prediction and discrimination of membrane proteins with HMMs and neural networks

Prof. Dr. P. L. Martelli, Laboratory of biocomputing, University of Bologna, Italy

Membrane proteins are a functionally relevant subset of the proteome: it has been estimated that their amount ranges between 20 and 30% of the proteins expressed in a cell. Two types of membrane proteins have been characterized so far: the first includes proteins that interact with the lipid bilayer by means of alpha-helices, the second includes proteins that contain antiparallel beta-strands forming a barrel inserted into the membrane.

Due to the paucity of structures solved at atomic resolution so far, it is difficult to apply standard sequence comparison techniques to discriminate membrane proteins starting from genomic sequences and to predict their structure. For this reason several machine learning tools have been developed for the discrimination of membrane proteins and for the prediction of their topography (i.e. the position of the transmembrane segments along the sequence) and topology (i.e. the position of the N and C termini with respect to the lipid bilayer). In particular, Neural Networks (NNs) are useful for analyzing local information along the sequence, while Hidden Markov Models (HMMs) are efficient at modelling the global grammar of the mapping between sequence and structure. We developed a system based on HMMs for the prediction of beta-barrel membrane proteins and an ensemble system of HMMs and NNs for the prediction of all-alpha membrane proteins. We take advantage of the evolutionary information contained in sequence profiles derived from multiple sequence alignments, and we developed new algorithms to exploit this information with HMMs. These systems correctly predict the topography and the topology of 71% of all-alpha and 73% of beta-barrel membrane proteins, respectively. Moreover, their discriminative capabilities can be exploited to implement an integrated system, called HUNTER, for the annotation of the proteins of Gram-negative bacteria. This suite of programs, which also includes a predictor for signal peptides, correctly classifies 95% of the proteins when tested on the well-annotated proteins of E. coli.
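
The decoding step at the heart of such HMM topography predictors can be sketched on a toy two-state model; the states, parameters and the hydrophobic/polar alphabet below are invented for illustration and are far simpler than the profile-based models described in the talk:

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable state path for obs under a discrete HMM
    (standard Viterbi dynamic programming, in log space)."""
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
          for s in states}]
    back = []
    for o in obs[1:]:
        col, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda q: V[-1][q] + math.log(trans_p[q][s]))
            col[s] = V[-1][prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][o])
            ptr[s] = prev
        V.append(col)
        back.append(ptr)
    state = max(states, key=lambda s: V[-1][s])   # best final state
    path = [state]
    for ptr in reversed(back):                    # follow the back-pointers
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Invented two-state model: 'M' = in membrane, 'L' = loop; residues are
# coarsened to 'h' (hydrophobic) or 'p' (polar).
states = ('M', 'L')
start_p = {'M': 0.1, 'L': 0.9}
trans_p = {'M': {'M': 0.9, 'L': 0.1}, 'L': {'M': 0.1, 'L': 0.9}}
emit_p = {'M': {'h': 0.8, 'p': 0.2}, 'L': {'h': 0.3, 'p': 0.7}}

path = viterbi(list("pphhhhhhpp"), states, start_p, trans_p, emit_p)
```

The hydrophobic run is labelled as one contiguous membrane segment (path "LLMMMMMMLL"), which is the topography idea in miniature; real predictors use many states, profile inputs and grammar constraints on segment lengths.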

November 3, 2004, 18st: Phylogenetic Combinatorics

Prof. Dr. Andreas Dress, Max-Planck-Institut für Mathematik in den Naturwissenschaften, Leipzig

Phylogenetic analysis aims at elucidating kinship relationships between the elements of a given set of objects under consideration, i.e., species, orders, genera, ..., genes, RNA molecules, proteins, ..., languages, words, manuscripts, ... or whatever may result from a process of repeated replication and mutation.

These kinship relationships are usually represented in terms of a phylogenetic tree. However, they can also be represented in terms of

  • split systems,
  • quartet systems, and
  • metrics.

Phylogenetic Combinatorics investigates the mutual relationships between these four classes of objects (phylogenetic trees, split systems, quartet systems, and metrics), their respective relevance for phylogenetic data analysis, their ramifications regarding the construction of phylogenetic trees, including the construction of phylogenetic networks if the "true phylogenetic tree" cannot clearly be discerned (or may -- in view of hybridization, horizontal gene transfer or other forms of "reticulated evolution" -- not even exist), and the resulting algorithmic consequences.

The lecture will present an introduction into the basic concepts of phylogenetic combinatorics and the pertinent taxonomic applications of this rapidly evolving field.

July 21, 2004, 18st: Mathematical Modeling and Simulation of Biosystems - a Contribution to the Quantitative Biosciences

Prof. Dr. Willi Jäger, Interdisciplinary Center for Scientific Computing (IWR), University of Heidelberg & Center for Modeling and Simulation in the Biosciences (BIOMS), Heidelberg

Rapid advances of the biosciences in understanding molecular structures and processes in living organisms have led to significant progress in biotechnology and in medical diagnostics and therapy. In the past decades, modern techniques of biophysics, biochemistry and information processing have enabled the capture and processing of tremendous amounts of data. However, for further progress, it will be necessary to develop and use quantitative methods and tools similar to those used with great success in physics and chemistry. Modeling approaches that build on theoretical and experimental insights, are formulated in mathematical language and can be simulated on a computer need to be developed, for example, for important processes in a cell or in collections of cells. Through a combination of real experiments, model analysis and "virtual" experiments on computers, research can progress systematically, improve the theoretical and quantitative understanding of complex biological systems, plan new experiments more efficiently, and optimize biotechnological processes and medical techniques. This talk will illustrate the importance of mathematical modeling and simulation for more quantitative biosciences through a number of case studies. It will also report on the Heidelberg experience in establishing cooperation between the university and other research institutes such as the German Cancer Research Center (DKFZ), the European Molecular Biology Laboratory (EMBL), the Max Planck Institute for Medical Research (MPI) and the European Media Laboratory (EML), which, with the support of the State of Baden-Württemberg and the Klaus-Tschira-Stiftung, led to the establishment of BIOMS as part of the BIOQUANT research initiative.

July 14, 2004, 18st: From 2D to 4D Bioinformatics

Prof. Dr. Christoph W. Sensen, Faculty of Medicine, Department of Biochemistry and Molecular Biology, University of Calgary

The Sun Center of Excellence for Visual Genomics at the University of Calgary is one of Canada's premier bioinformatics laboratories. The main research focus is on how to deal with the large amounts of data produced by Genomics, Proteomics and Functional Genomics studies. The laboratory develops tools for two-dimensional analyses (e.g. genome analysis and annotation) as well as four-dimensional analyses (e.g. gene chip analysis and post-translational protein modification data processing). Prof. Sensen will introduce both aspects during this talk.

June 30, 2004, 18st: Mathematical models of biochemical networks: a case study

Prof. Dr. Reinhard Laubenbacher, Research Professor, Virginia Bioinformatics Institute, Virginia Tech

The goal of systems biology is to understand organisms at the system level, by examining the structure and dynamics of cellular and organismal function, rather than the characteristics of isolated parts. Recent technological advances have brought this vision within reach. An important component of the systems biology program is the development of analytical tools to analyze data and to organize them into mathematical models of system structure and dynamics. Key to successful modeling projects is to match the experimental data with the modeling method to be employed.

After a brief introduction to different modeling methods for biochemical networks, this talk will describe a project to understand the regulatory network responsible for oxidative stress response in S. cerevisiae, baker's yeast. The experimental data being generated include time series of transcription, protein, and metabolite measurements for the wild type as well as for several deletion mutants. A special feature of the project is that the experiments are designed specifically to be used with a combination of complementary discrete and continuous mathematical modeling approaches.

June 16, 2004, 18st: Protein Association Networks: Prediction, Value and Limitations

Dr. Christian v. Mering, EMBL Heidelberg

For many applications ranging from basic science to drug discovery, knowledge of the complete functional context of a protein is highly desirable. This context is to a large extent defined by the interaction partners of a protein - direct binding partners, but also more indirect, functionally associated partners such as pathway partners or regulators. Today, such protein-protein association information is scarce for many proteins, and is often scattered over a variety of information resources. Here, I will discuss the systematic use of genome comparisons to extend the knowledge about protein-protein interactions. Firstly, genome comparisons can help in transferring interaction information from one organism to another, for example from high-throughput experiments performed in model organisms to the human proteome. Secondly, genome comparisons can uncover shared selective pressures acting on groups of genes - which is often a strong predictor for shared function and protein-protein association. The selective pressures are inferred by examining the "genomic context" of genes, i.e. by searching for recurring genomic neighborhood, gene fusions, and for above-random similarities in species coverage.
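
The species-coverage signal can be illustrated with a minimal phylogenetic-profile comparison; the presence/absence calls and the Jaccard score below are illustrative assumptions, not necessarily the scoring used in the speaker's system:

```python
def profile_similarity(pa, pb):
    """Jaccard similarity of two phylogenetic profiles, i.e.
    presence/absence (1/0) calls for a gene across the same ordered
    list of genomes. Above-random similarity in species coverage is
    taken as evidence of shared selective pressure, and hence of
    functional association."""
    both = sum(1 for x, y in zip(pa, pb) if x and y)
    either = sum(1 for x, y in zip(pa, pb) if x or y)
    return both / either if either else 0.0

# Hypothetical presence/absence calls for three genes in six genomes.
gene1 = [1, 1, 0, 1, 0, 1]
gene2 = [1, 1, 0, 1, 0, 0]   # co-occurs with gene1 in most genomes
gene3 = [0, 0, 1, 0, 1, 0]   # complementary pattern
```

Here gene1 and gene2 share most of their species coverage (similarity 0.75) while gene3 is complementary (0.0), so only the first pair would be proposed as functionally associated.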

A web-based system is introduced which integrates both concepts, the prediction of novel interactions and the collection and transfer of known interactions between organisms. I will demonstrate applications of the system, and discuss limitations and future developments.

May 26, 2004, 18st: The multiprotein machinery for gene transcription: Structure and Function

Prof. Dr. Patrick Cramer, Gene Center, University of Munich

Transcription of all protein-coding genes is conducted by RNA polymerase II, the large central enzyme that synthesizes mRNA. Our lab investigates the mechanism of mRNA transcription by determining three-dimensional structures of transient multiprotein complexes of RNA polymerase II in functional states. Recent achievements include the complete atomic model of the 12-subunit polymerase and the first structure of a polymerase-transcription factor complex. The aim of this work is to obtain a three-dimensional movie of the dynamic transcription machinery in action, and a mechanistic understanding of transcriptional regulation.

May 12, 2004, 18st: Metabolic Pathway Analysis with Special Reference to Evolution

Prof. Dr. Stefan Schuster, Faculty of Biology and Pharmacy, Jena University

A major challenge in biology is to clarify the relationship between structure and function in complex intracellular networks. Topics of current interest include the robustness, optimality and biotechnological relevance of living cells and organisms. Metabolic pathway analysis has recently attracted much interest, partly because it requires only the network topology. A central concept in this analysis is that of elementary flux modes. It is shown that elementary modes are well suited for determining routes enabling maximum yields of bioconversions and for analysing redundancy and robustness properties of living cells. Another application is the assessment of the impact of enzyme deficiencies in medicine.

To understand the present-day architecture of metabolic pathways, evolutionary history should be taken into account. Besides molar yield, synthesis rate is an important objective in evolutionary optimization. There are metabolic pathways, such as fermentative sugar degradation, that allow a high ATP production rate but a low yield, and others, such as respiration, to which the opposite case applies. Two species (or strains) of micro-organisms that use the same nutrient, but may choose between two different pathways of ATP production, can be studied from a game-theoretical point of view. In a certain parameter range, the fitness functions fulfil the conditions for the prisoner's dilemma. Therefore, cooperative behaviour is unlikely to occur, unless additional factors interfere. In fact, the yeast Saccharomyces cerevisiae uses a competitive strategy by fermenting sugars even under aerobic conditions, thus wasting its own resource. Several ideas generalizing the above results are discussed, in particular, with respect to possible scenarios of transition to cooperative behaviour.
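The prisoner's-dilemma condition can be checked on a toy rate-versus-yield model; the uptake rates, yields and resource pool below are invented numbers, not those of the analysis in the talk:

```python
def atp_payoff(J1, Y1, J2, Y2, resource=100.0):
    """ATP gained by strain 1 competing with strain 2 for a shared
    nutrient pool: the resource is split in proportion to uptake
    rate J, and each unit consumed yields Y ATP. A deliberately
    crude model with invented parameters."""
    return J1 / (J1 + J2) * resource * Y1

resp = (1.0, 30.0)   # respiration: slow uptake, high ATP yield
ferm = (5.0, 20.0)   # fermentation: fast uptake, lower ATP yield

R = atp_payoff(*resp, *resp)   # both respire (mutual cooperation)
T = atp_payoff(*ferm, *resp)   # ferment against a respirer (temptation)
S = atp_payoff(*resp, *ferm)   # respire against a fermenter (sucker)
P = atp_payoff(*ferm, *ferm)   # both ferment (mutual defection)
```

With these numbers the ordering T > R > P > S holds, the defining condition of the prisoner's dilemma: fermenting dominates against either opponent, even though both strains would gain more ATP if both respired.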

April 28, 2004, 18st: Gene Prediction by comparative Genomics, the case of Selenoproteins

Roderic Guigó, Institut Municipal d'Investigacio Medica, Barcelona

Although the genome sequence and gene content are available for an increasing number of organisms, eukaryotic selenoproteins remain poorly characterized. In these proteins, selenium (Se) is incorporated in the form of selenocysteine (Sec), the 21st amino acid. Selenocysteine is cotranslationally inserted in response to UGA codons (a stop signal in the canonical genetic code). The alternative decoding is mediated by a stem-loop structure in the 3'UTR of selenoprotein mRNAs (the SECIS element). Selenium is implicated in male infertility, cancer and heart diseases, viral expression and ageing. In addition, most selenoproteins have homologues in which Sec is replaced by cysteine (Cys). Genome biologists rely on the high-quality annotation of genomes to bridge the gap from the sequence to the biology of the organism. However, for selenoproteins, which mediate the biological functions of selenium, the dual role of the UGA codon confounds both automatic annotation pipelines and human curators. In consequence, selenoproteins are misannotated in the majority of genome projects. Furthermore, the finding of novel selenoprotein families remains a difficult task in newly released genome sequences.

In the last few years, we have contributed to the exhaustive description of the eukaryotic selenoproteome (the set of eukaryotic selenoproteins) through the development of a number of ad hoc computational tools. Our approach is based on the capacity to predict SECIS elements, standard genes and genes with an in-frame UGA codon in one or multiple genomes. Indeed, comparative analysis plays an essential role because 1) SECIS sequences are conserved between close species (e.g. human-mouse); and 2) sequence conservation across a UGA codon between genomes at a greater phylogenetic distance strongly suggests a coding function (e.g. human-fugu). Our analysis of the fly, human and fugu genomes has resulted in 8 novel selenoprotein families. Therefore, 19 distinct selenoprotein families have been described in eukaryotes to date. Most of these families are widely (but not uniformly) distributed across eukaryotes, either as true selenoproteins or as Cys-homologues. The recent completion of the Tetraodon nigroviridis and Fugu rubripes genomes has allowed us to investigate the eukaryotic selenoproteome in a restricted and largely unexplored window within the vertebrate phylogeny. Our investigation has resulted in the identification of a novel selenoprotein family, currently under study, which appears to be restricted to actinopterygians among vertebrates.

The correct annotation of selenoproteins is thus providing insight into the evolution of Sec usage. Our data indicate a discrete evolutionary distribution of selenoproteins in eukaryotes and suggest that, contrary to the prevalent thinking of an increase in the number of selenoproteins from less to more complex genomes, Sec-containing proteins are scattered all along the complexity scale. We believe that the particular distribution of each family is mediated by an ongoing process of Sec/Cys interconversion, in which contingent events could play a role as important as functional constraints. The characterization of eukaryotic selenoproteins illustrates some of the most important challenges involved in completing the gene annotation of genomes. Notable among them are the increasing number of exceptions to our standard theory of the eukaryotic gene and the necessity of sequencing genomes at different evolutionary distances to achieve such complete annotation.

February 11, 2004, 18st: The Robot Scientist Project

Prof. Ross King, University of Wales, Aberystwyth

The question of whether it is possible to automate the scientific process is of both great theoretical interest and increasing practical importance because, in many scientific areas, data are being generated much faster than they can be effectively analysed. We describe a physically implemented robotic system that applies techniques from artificial intelligence to carry out cycles of scientific experimentation. The system automatically originates hypotheses to explain observations, devises experiments to test these hypotheses, physically runs the experiments using a laboratory robot, interprets the results to falsify hypotheses inconsistent with the data, and then repeats the cycle.

Here we apply the system to the determination of gene function using deletion mutants of yeast (Saccharomyces cerevisiae) and auxotrophic growth experiments. We built and tested a detailed logical model (involving genes, proteins and metabolites) of the aromatic amino acid synthesis pathway. In biological experiments that automatically reconstruct parts of this model, we show that an intelligent experiment selection strategy is competitive with human performance and significantly outperforms both cheapest-experiment and random-experiment selection, with cost decreases of 3-fold and 100-fold, respectively. We are currently extending this methodology to try to automatically discover new yeast functional genomics knowledge.

January 28, 2004, 18st: Mammalian promoter logic - how to find it, how to use it

Dr. Thomas Werner, CEO & CSO Genomatix Software GmbH München

Mammalian promoters are usually multifunctional: they have to perform differently during development and in differentiated cell types, and have to respond to different signaling pathways. The most important features of promoters responsible for this functional flexibility are transcription factor binding sites. Complex and variable synergistic and antagonistic combinations of such binding sites represent a functional network, including logical AND and exclusive OR combinations. We have discerned five such functional combinations for the RANTES promoter and were able to model these so-called frameworks computationally. Subsequent database searches revealed that promoter frameworks are indeed suitable for detecting functional connections between genes solely by analysis of genomic sequences.

January 14, 2004, 18st: High-accuracy prediction of protein-carbohydrate interactions

Prof. Dr. Oliver Kohlbacher, Department for Simulation of Biological Systems, University Tübingen

Carbohydrates are probably the most seriously neglected class of biomolecules. They encode biological information just as nucleic acids and polypeptides do; however, this information is less obvious to decode and even more complex due to the non-linear nature of polysaccharides. They are known to play a crucial role in cellular recognition in particular. Over the last years, an increasing number of X-ray and NMR structures of sugar-protein complexes have not only shed light on the details of protein-carbohydrate interactions, but also made molecular modelling of these interactions tractable. A wide range of medical and pharmaceutical applications drives the current interest in modelling approaches. Understanding sugar-protein interactions can give rise to new drugs against microbial infection or inflammatory diseases, or to pharmaceutical technology to target specific cell types.

We have performed a thorough analysis of the binding of sugars to a family of sugar-binding proteins (plant lectins). These studies reveal new details of the binding, and based on them we have developed a new scoring function for sugar docking, SLICK (Sugar Lectin InteraCKtion). In contrast to existing scoring functions, SLICK has been designed for the precise prediction of binding energies, as its primary application area is the in silico design of sugar-binding lectin mimetics for targeted drug delivery. We have assembled the most comprehensive benchmark set for sugar-protein interactions currently available. On these test data, SLICK performs remarkably well, with average errors in leave-one-out tests on the order of 1.5 kJ/mol in the predicted binding free energy.

December 10, 2003, 18st: Evolution of Multi-Domain Proteins

Sarah Teichmann, MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK

Two-thirds of all prokaryotic proteins and eighty percent of eukaryotic proteins are multi-domain proteins. The composition and interaction of the domains within a multi-domain protein determine its function. Using structural assignments to the proteins in completely sequenced genomes, we have insight into the domain architectures of a large fraction of all multi-domain proteins. Thus we can investigate the patterns of pairwise domain combinations, as well as the existence of evolutionary units larger than individual protein domains.

Structural assignments provide us with the sequential arrangement of domains along a polypeptide chain. In order to fully understand the structure and function of a multi-domain protein, we also need to know the geometry of the domains relative to each other in three dimensions. By studying multi-domain proteins of known three-dimensional structure, we can gain insight into the conservation of domain geometry, and the prediction of the structures of domain assemblies.
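The extraction of pairwise domain combinations from sequential domain architectures can be sketched as follows; the architectures and domain names are made-up illustrations, not actual structural assignments:

```python
from collections import Counter

# Minimal sketch: given domain architectures (N- to C-terminal domain
# lists, with invented names), count pairwise domain combinations.

architectures = [
    ["SH3", "SH2", "Kinase"],
    ["SH2", "Kinase"],
    ["PH", "SH2", "Kinase"],
]

pair_counts = Counter()
for arch in architectures:
    for a, b in zip(arch, arch[1:]):  # adjacent N->C domain pairs
        pair_counts[(a, b)] += 1

print(pair_counts.most_common(1))  # [(('SH2', 'Kinase'), 3)]
```

Recurrent pairs such as the one above are the kind of signal that points to evolutionary units larger than a single domain.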

November 26, 2003, 18st: Perspectives for Systems Biology in the Munich region

Dr. Eduardo R. Mendoza, Physics Department, Ludwig-Maximilians-University of Munich

In the past few years, Systems Biology has emerged as a unifying concept for diverse efforts in understanding biological systems in general and the cell in particular. Its integrative aspects involve relating different levels of cellular structure as well as data from the various "omics" and are commonly characterized by the "network of interactions" paradigm. Given the complexity of biological systems, Systems Biology also advocates integration on the level of scientific work organization: between experimental and modelling groups as well as between multiple disciplines. In particular, with the growing importance of computational methods, it emphasizes the need to include "systems engineering" approaches both in the analysis and the construction of models. Two key developments here are the Systems Biology Mark-Up Language (SBML) as well as large-scale SBML-based software platforms such as BioSPICE (a $50 million US DARPA project). The talk will

  • discuss these trends in the field of Systems Biology and in particular provide an update on the emerging software standards and platforms
  • present some Systems Biology-relevant experimental and modelling research activities at the LMU Center for NanoScience
  • introduce the proposal for an open forum on Systems Biology in the Munich region

November 5, 2003, 18st: From Domains to Functions and back

Prof. Dr. J. Schultz, Bioinformatik, Am Hubland, Würzburg

July 9, 2003, 18st: From peaks to peptides - Proteomics: Differential analysis of complex mixtures using high-throughput mass spectrometry

Prof. Dr.-Ing. Knut Reinert, Institute of Computer Science, Freie Universität Berlin

In recent years, DNA micro-arrays have become commonly used tools to analyze the behaviour of cells when exposed to certain changes (infection, environmental change, etc.). However, it is well known that mRNA expression can differ greatly from protein expression; hence a direct measurement of protein expression is crucial in proteomics.

In this talk we describe algorithms for analyzing protein expression data obtained from multi-dimensional liquid chromatography coupled with mass spectrometry (LC-MS). We propose new methods to analyze the resulting complex mass spectra of peptides through a thorough analysis of the isotope distribution of the peaks caused by peptides. Based on this detection step, we can then aggregate the peptide signals across multiple LC-MS runs to improve statistical results and also quantify the differential expression of proteins across several samples (e.g. normal vs. pathological state).
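The isotope-distribution idea rests on standard combinatorics. As a hedged illustration (not the authors' algorithm), the carbon-only contribution to a peptide's isotope envelope follows a binomial distribution over the natural abundance of 13C:

```python
import math

# Sketch: for a peptide with n carbon atoms, the probability of observing
# k heavy (13C) carbons is binomial. A full treatment would also include
# H, N, O and S isotopes; this carbon-only model is a simplification.

P_C13 = 0.0107  # natural abundance of 13C

def carbon_isotope_pattern(n_carbons, max_peaks=4):
    return [
        math.comb(n_carbons, k) * P_C13**k * (1 - P_C13)**(n_carbons - k)
        for k in range(max_peaks)
    ]

# A roughly 1 kDa peptide has on the order of 50 carbon atoms.
pattern = carbon_isotope_pattern(50)
print([round(p, 3) for p in pattern])
```

Comparing an observed peak group against such a theoretical envelope is what lets a detection step distinguish peptide signals from noise.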

June 25, 2003, 18st: Interpretation, error modeling and calibration of microarray gene expression data

Dr. Wolfgang Huber, Division of Molecular Genome Analysis, German Cancer Research Center, Heidelberg

Data from microarray experiments is often summarized in the form of logarithmic ratios, or of logarithm-transformed intensities. This amounts to the assumption that an increase from, say, 100 units to 200 units has the same biological significance as one from 1000 to 2000. While this approach is useful for large intensities, it fails when the level of expression of a gene in one of the conditions is small or zero. However, these situations may be biologically relevant, perhaps even the most relevant ones.

We derive a measure of expression and of differential expression that has comparable resolution across the whole dynamic range of expression. Mathematically, this can be expressed in terms of a variance-stabilizing transformation. We present a statistical modelling approach that leads to a robust estimator for the transformation parameters. At the same time, this permits the more precise estimation of between-array or between-dye normalization parameters.
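The variance-stabilizing transformation can be illustrated with the generalized logarithm (arsinh); the parameters below are placeholders for values that, in the approach described, are estimated robustly from the data:

```python
import numpy as np

# Sketch of the variance-stabilizing idea. The offset a and scale b are
# illustrative defaults; in practice they are estimated per array/dye.

def glog(x, a=0.0, b=1.0):
    """Generalized log: behaves like log(2*b*x) for large x, but stays
    finite and roughly linear near zero, unlike the plain logarithm."""
    return np.arcsinh(a + b * x)

x = np.array([0.0, 100.0, 200.0, 1000.0, 2000.0])
h = glog(x)
# For large intensities the transform reproduces log-ratios:
print(h[2] - h[1], np.log(200.0) - np.log(100.0))  # both ~ log 2
```

This is exactly the behaviour the abstract asks for: a 100-to-200 change and a 1000-to-2000 change get the same transformed difference, while an intensity of zero still maps to a finite value.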

On several example data sets, we show that this approach leads to superior sensitivity and specificity for the identification of differentially transcribed genes.

June 11, 2003, 16:30(!): The human genome - a moving target for annotation

Dr. Peer Bork, EMBL (European Molecular Biology Laboratory), Heidelberg

Dr. Peer Bork is senior scientist (bioinformatics)/group leader at the EMBL (Heidelberg); joint coordinator of the EMBL Structural and Computational Biology programme; visiting group leader at the MDC (Berlin-Buch).

May 28, 2003, 18st: Three New Bioinformatics Tools

Prof. Dr. Daniel Huson, Universität Tübingen

The development of powerful visualization tools is a major challenge in bioinformatics. Although many good special-purpose viewers exist, there is a need for configurable meta-viewers that provide enough flexibility to support many different types of data and visualizations. Here we present CGViz, a new software tool that fulfils many of the requirements placed on such a configurable meta-viewer.

There are many different methods and algorithms for performing phylogenetic analysis. Most are based on the assumption that phylogenetic relationships are best described by a tree. However, phylogenetic datasets usually contain a number of different and often incompatible signals (e.g. generated by recombination), and so methods that produce networks rather than just trees can be useful in data exploration. We describe a new Java program, jSplits, that provides a framework for implementing phylogenetic tree and network methods.

Another tool recently developed by our group (in cooperation with Andrei Lupas at the MPI for Developmental Biology in Tübingen) is iPet, an interactive tool for exploring and crafting multiple alignments of protein sequences.

In all three cases, the main design goal was to develop a flexible and extensible framework that supports the rapid development and release of new algorithms in comparative genomics, phylogenetics and protein analysis.

May 14, 2003, 18st: Studying the duplication past of plant and vertebrate genomes

Prof. Dr. Yves Van de Peer, Ghent University

Analysis of the genome sequence of Arabidopsis thaliana shows that its genome, like those of several other eukaryotes, has undergone large-scale gene or even entire-genome duplications in its evolutionary past. However, the high frequency of gene loss after duplication events reduces colinearity and therefore the chance of finding duplicated regions that, in the extreme case, no longer share homologous genes. We have shown that heavily degenerated block duplications that can no longer be recognized by directly comparing both segments, due to differential gene loss, can still be detected through indirect comparison with other segments. When these so-called hidden duplications in Arabidopsis are taken into account, many homologous genomic regions can be found in five to eight copies. This strongly implies that Arabidopsis has undergone three, but probably not more, rounds of genome duplication.

Therefore, adding such hidden blocks to the duplication landscape of Arabidopsis sheds a new light on the number of polyploidy events that this model plant genome has undergone in its evolutionary past. Using similar techniques, we have also analyzed the genome of the monocotyledonous model plant species rice (Oryza sativa), for which a draft of the genomic sequence has recently been published. Although a substantial fraction of all rice genes, i.e. about 15%, are found in duplicated segments, dating of these block duplications, their non-uniform distribution over the different rice chromosomes, and comparison with the duplication history in Arabidopsis suggest that rice is not an ancient polyploid as previously suggested, but an ancient aneuploid that has experienced one large segmental or chromosomal duplication in its evolutionary past, approximately 70 million years ago. This date predates the divergence of most of the cereals and relative dating by phylogenetic analysis indeed shows that the duplication event is shared by most, if not all, of them. Apart from plants such as Arabidopsis and rice, we are also analyzing the Fugu genome. In the Fugu genome, we can still find traces of large scale gene duplications that have occurred in the vertebrate lineage more than 500 million years ago.
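The indirect-comparison idea behind hidden duplications can be reduced to a toy example (gene-family labels invented): two segments that share no families directly are still linked through a third segment that retains genes lost in each of them.

```python
# Illustrative sketch of "hidden duplication" detection: segments A and B
# share no gene families directly, but each shares families with segment
# C, so A and B are inferred to descend from the same duplicated block.

segments = {
    "A": {"f1", "f2", "f3"},
    "B": {"f4", "f5", "f6"},
    "C": {"f1", "f2", "f4", "f5"},  # retains genes lost in A resp. B
}

def share(x, y, min_families=2):
    """Call two segments homologous if they share enough gene families."""
    return len(segments[x] & segments[y]) >= min_families

direct = share("A", "B")                      # False: no common families
hidden = share("A", "C") and share("B", "C")  # True: linked through C
print(direct, hidden)
```

Real analyses of course work with ordered gene lists, statistical significance of colinearity, and dating of blocks; the sketch only shows why a third segment can reveal a relationship that direct comparison misses.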

March 5, 2003, 18st: "Computer Science and Drug Design: New Methods for Computer-Aided Molecular Design"

Prof. Dr. Matthias Rarey, Zentrum für Bioinformatik, Universität Hamburg

Owing to the complexity of living organisms, the development of bioactive compounds is among the most difficult challenges of our time. At the same time, new high-throughput experiments and large-scale undertakings such as the genome projects are expanding our knowledge of genetics and molecular biology at enormous speed. Because of the large data volumes involved, exploiting this knowledge for molecular design is closely tied to computer science: only computational methods can evaluate the data in a targeted way and apply it to the development of new drugs. In this talk I give an overview of our research topics and results at the interface between drug design and computer science. The topics range from the prediction of protein-ligand interactions (molecular docking) and the similarity analysis of molecules and chemical spaces to aspects of visualization. In all areas, the development of efficient, tailor-made algorithms and data structures plays a key role alongside problem modelling. Some results from the practical application of the developed methods will be presented.

February 19, 2003, 18st: "Pair Algebras: A (***)-Lecture on Dynamic Programming"

Prof. Dr. Robert Giegerich, Bielefeld University

Dynamic programming is a well-established technique for solving combinatorial optimization problems, widely used in bioinformatics. The recently developed algebraic discipline of dynamic programming provides a perfect separation between the search space considered and the objective of optimization. The former is described formally by a yield grammar, the latter by an evaluation algebra including the choice function. This separation of concerns provides great flexibility, as various problems over the same search space can be solved merely by a change of the algebra.

We introduce a pairing operation (***) on evaluation algebras, whose crux lies in the definition of the combined choice function. This is a practical convenience for computing multiple results. But it is also more: we show how this technique can be used to validate important properties of a given DP algorithm, such as uniqueness of solutions or canonicity of the recurrences involved. As a practical way to demonstrate algorithmic properties, this helps to understand our programs better, and is beneficially used in teaching dynamic programming.
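The separation of recurrences and evaluation algebra, and the effect of pairing two algebras, can be sketched in miniature on edit distance. The naming below is ours, not the ADP notation: the product of a score algebra with a counting algebra yields the optimal score together with the number of co-optimal alignments in a single pass.

```python
# The recurrence (the "grammar" side) is written once, parametrized by an
# algebra: a value domain plus cost, gap and choice functions.

def edit(a, b, cost, choice, zero, gap):
    n, m = len(a), len(b)
    D = [[None] * (m + 1) for _ in range(n + 1)]
    D[0][0] = zero
    for i in range(1, n + 1):
        D[i][0] = gap(D[i - 1][0])
    for j in range(1, m + 1):
        D[0][j] = gap(D[0][j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = choice([
                cost(D[i - 1][j - 1], a[i - 1] == b[j - 1]),  # (mis)match
                gap(D[i - 1][j]),                             # deletion
                gap(D[i][j - 1]),                             # insertion
            ])
    return D[n][m]

# Paired (score, count) algebra: unit costs; the combined choice function
# keeps the minimal score and sums the counts of all tied alternatives.
def p_cost(v, same):
    s, c = v
    return (s + (0 if same else 1), c)

def p_gap(v):
    s, c = v
    return (s + 1, c)

def p_choice(vs):
    best = min(s for s, _ in vs)
    return (best, sum(c for s, c in vs if s == best))

score, count = edit("AB", "BA", p_cost, p_choice, (0, 1), p_gap)
print(score, count)  # minimal edit distance 2, three co-optimal alignments
```

Swapping in a plain score algebra (integers with `min`) would reuse the same recurrence unchanged, which is the point of the separation.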

February 5, 2003, 18st: "Bioinformatical approaches to pathogenomics"

Prof. Dr. Thomas Dandekar, Biozentrum, Universität Würzburg

Typical challenges in the bioinformatical analysis of genomes in infectious diseases are illustrated and discussed.

• The primary information is often the genome sequence; however, to interpret it correctly, much more than just the sequence information is needed (a requirement for data integration).
• A higher level assembles detected enzyme activities into pathways. To interpret these correctly, comparative genomics becomes important, as well as direct biochemical data.
• Finally, at the network level, the consistency of the putative enzyme interactions has to be tested and rechecked by suitable algorithms to properly understand the network.

These three tasks are illustrated by different examples from current and previous work on pathogenic genomes.

December 11, 2002, 18st: "Highly Specific Protein Motifs for Analyzing Proteins & Proteomes"

Prof. Douglas Brutlag, Department of Biochemistry, Stanford University

Our group has developed three databases of highly specific protein functional motifs: eBLOCKs, eMOTIFs and eMATRICES. These motifs have associated specificities, so one can calculate the expectation that they would occur by chance in a database search. They are so specific that one can search entire proteomes with no false predictions (expectation 0.01). The databases are also very extensive, so that there is a very high sensitivity. Over 70% of the human proteins in the RefSeq database have significant functional assignments using these databases. Over 95% of the training set (SwissProt) has highly significant functional assignments as well.

The eBLOCKs database of protein alignments was built by performing an automatic PSI-BLAST search with each SwissProt sequence compared to all the others. Short ungapped regions of conserved sequence were saved as individual eBLOCKs. This resulted in 20,000 protein families and 81,000 conserved regions. About 30,000 of these overlapped known conserved regions from BLOCKS+, PRINTS, and InterPro. About 52,000 of the eBLOCKs are novel. The eBLOCKs database was converted into a regular expression database (eMOTIFs) and a position-specific scoring matrix database (eMATRIX) for searching proteins and proteomes. We generated a fourth database (ePROTEOME) containing all known protein coding regions (proteomes) from 75 completely sequenced genomes analyzed for function by eMOTIFs and eMATRICES.
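Scanning a proteome with regular-expression motifs can be sketched as follows; the pattern (a Walker-A/P-loop-style toy motif) and the sequences are illustrative, not actual eMOTIF entries:

```python
import re

# Hedged sketch of regex-based motif scanning in the spirit of eMOTIFs.
# [AG].{4}GK[ST] is the classic Walker A (P-loop) consensus; the toy
# "proteome" below is invented for illustration.

motif = re.compile(r"[AG].{4}GK[ST]")

proteome = {
    "prot1": "MRALAXXXXGKTQV",  # contains a P-loop-style match
    "prot2": "MSTNPKPQRKTKRN",  # no match
}

hits = {name: bool(motif.search(seq)) for name, seq in proteome.items()}
print(hits)
```

The database-wide specificity estimates mentioned in the abstract come from scoring how often such a pattern would match by chance, which is what makes proteome-scale searches with essentially no false predictions possible.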

November 27, 2002, 18st: "Towards Discovering Structural Signatures of Protein Folds Based on Logical Hidden Markov Models"

Dr. Stefan Kramer, Institute for Computer Science, Albert-Ludwigs-Universität, Freiburg

With the growing number of determined protein structures and the availability of classification schemes, it becomes increasingly important to develop computer methods that automatically extract structural signatures for classes of proteins. In this talk, I will introduce a new machine learning technique, Logical Hidden Markov Models (LOHMMs), and present its application to the task of finding structural signatures of folds according to the classification scheme SCOP. Our results indicate that LOHMMs are applicable to this task and possess several advantages over other approaches.

November 13, 2002, 18st: "On the evolution of protein domains"

Prof. Dr. Andrei N. Lupas, Department of Protein Evolution, Max-Planck-Institute for Developmental Biology, Tübingen

The domain is generally considered the unit of protein structure. However, the evolution of domains from random polypeptide chains presents substantial conceptual problems. Here, we will discuss an alternative hypothesis, which proposes that domains evolved from peptides with secondary structure propensity. These peptides may have originated as cofactors of ribozymes in a primitive RNA world. This hypothesis originated in the 70s, with the discovery of introns and the observation that some folds could be seen as the result of successive gene duplications. Recent advances in genomics and bioinformatics have made it possible to explore this "evolution from peptides" hypothesis in considerably greater detail. The results suggest that bioinformatics may provide a means to reconstruct an ancestral vocabulary of peptides, from which present-day proteins originated.

October 10, 2002, 18st: "Computational analysis of microarray data"

Prof. Dr. Martin Vingron, Max-Planck-Institut für Molekulare Genetik, Berlin

Recently introduced technology to determine simultaneously the expression levels of large numbers of genes has created new challenges for the computational analysis of the resulting data. This talk will discuss problems such as data normalization, identification of differential genes, and clustering and classification of expression profiles. Special emphasis will be given to a novel normalization method based on variance stabilization and to the application of a planar embedding method, correspondence analysis, to the study of associations between genes and conditions. We will present a case study on yeast cell-cycle associated genes.
