The book introduces the reader to a number of cutting-edge statistical methods that can be used for the analysis of genomic, proteomic, and metabolomic data sets. In the field of systems biology in particular, researchers are trying to analyze as much data as possible in a given biological system (such as a cell or an organ). The appropriate statistical evaluation of these large-scale data is critical for correct interpretation, and different experimental approaches require different approaches to statistical analysis. This book is written by biostatisticians and mathematicians but is intended as a valuable guide for experimental researchers as well as computational biologists, who often lack an appropriate background in statistical analysis.
Page count: 842
Year of publication: 2011
Contents
Cover
Related Titles
Title Page
Copyright
Preface
References
List of Contributors
Part One: Modeling, Simulation, and Meaning of Gene Networks
Chapter 1: Network Analysis to Interpret Complex Phenotypes
1.1 Introduction
1.2 Identification of Important Genes based on Network Topologies
1.3 Inferring Information from Known Networks
1.4 Conclusions
References
Chapter 2: Stochastic Modeling of Gene Regulatory Networks
2.1 Introduction
2.2 Discrete Stochastic Simulation Methods
2.3 Discrete Stochastic Modeling
2.4 Continuous Stochastic Modeling
2.5 Stochastic Models for Both Internal and External Noise
2.6 Conclusions
References
Chapter 3: Modeling Expression Quantitative Trait Loci in Multiple Populations
3.1 Introduction
3.2 IGM Method
3.3 CTWM
3.4 CTWM-GS Method
3.5 Discussion
References
Part Two: Inference of Gene Networks
Chapter 4: Transcriptional Network Inference Based on Information Theory
4.1 Introduction
4.2 Inference Based on Conditional Mutual Information
4.3 Inference Based on Pairwise Mutual Information
4.4 Arc Orientation
4.5 Conclusions
References
Chapter 5: Elucidation of General and Condition-Dependent Gene Pathways Using Mixture Models and Bayesian Networks
5.1 Introduction
5.2 Methodology
5.3 Applications
5.4 Conclusions
References
Chapter 6: Multiscale Network Reconstruction from Gene Expression Measurements: Correlations, Perturbations, and “A Priori Biological Knowledge”
6.1 Introduction
6.2 “Perturbation Method”
6.3 Network Reconstruction by the Correlation Method from Time-Series Gene Expression Data
6.4 Network Reconstruction from Gene Expression Data by A Priori Biological Knowledge
6.5 Examples and Methods of Correlation Network Analysis on Time-Series Data
6.6 Examples and Methods for Pathway Network Analysis
6.7 Discussion
6.8 Conclusions
References
Chapter 7: Gene Regulatory Networks Inference: Combining a Genetic Programming and H∞ Filtering Approach
7.1 Introduction
7.2 Background
7.3 Methodology for Identification and Algorithm Description
7.4 Simulation Evaluation
7.5 Conclusions
Appendix 7.A: Comparison between the Kalman Filter and H∞ Filter
Note
References
Chapter 8: Computational Reconstruction of Protein Interaction Networks
8.1 Introduction
8.2 Protein Interaction Networks
8.3 Characterization of Computed Networks
8.4 Conclusions
References
Part Three: Analysis of Gene Networks
Chapter 9: What if the Fit is Unfit? Criteria for Biological Systems Estimation Beyond Residual Errors
9.1 Introduction
9.2 Model Design
9.3 Concepts and Challenges of Parameter Estimation
9.4 Conclusions
Acknowledgments
References
Chapter 10: Machine Learning Methods for Identifying Essential Genes and Proteins in Networks
10.1 Introduction
10.2 Definitions and Constructions of the Network
10.3 Network Descriptors
10.4 Machine Learning
10.5 Some Examples of Applications
10.6 Conclusions
References
Chapter 11: Gene Coexpression Networks for the Analysis of DNA Microarray Data
11.1 Introduction
11.2 Background
11.3 Construction of GCNs
11.4 Integration of GCNs with Other Data
11.5 Analysis of GCNs
11.6 GCNs for the Study of Cancer
11.7 Conclusions
Acknowledgments
References
Chapter 12: Correlation Network Analysis and Knowledge Integration
12.1 Introduction
12.2 Systems Biology Data Quandaries
12.3 Semantic Web Approaches
12.4 Correlation Network Analysis
12.5 Knowledge Annotation for Networks
12.6 Future Developments
References
Chapter 13: Network Screening: A New Method to Identify Active Networks from an Ensemble of Known Networks
13.1 Introduction
13.2 Methods
13.3 Example Applications
13.4 Discussion
References
Chapter 14: Community Detection in Biological Networks
14.1 Introduction
14.2 Centrality Measures
14.3 Study of Complex Systems
14.4 Overview
14.5 Proposed Algorithm
14.6 Experiments
14.7 Further Improvements
14.8 Conclusions
Acknowledgments
References
Chapter 15: On Some Inverse Problems in Generating Probabilistic Boolean Networks
15.1 Introduction
15.2 Reviews on BNs and PBNs
15.3 Construction of PBNs from a Prescribed Stationary Distribution
15.4 Construction of PBNs from a Prescribed Transition Probability Matrix
15.5 Conclusions
Acknowledgments
References
Chapter 16: Boolean Analysis of Gene Expression Datasets
16.1 Introduction
16.2 Boolean Analysis
16.3 Main Organization
16.4 StepMiner
16.5 StepMiner Algorithm
16.6 BooleanNet
16.7 BooleanNet Algorithm
16.8 Conclusions
Acknowledgements
References
Part Four: Systems Approach to Diseases
Chapter 17: Representing Cancer Cell Trajectories in a Phase-Space Diagram: Switching Cellular States by Biological Phase Transitions
17.1 Introduction
17.2 Beyond Reductionism
17.3 Cell Shape as a Diagram of Forces
17.4 Morphologic Phenotypes and Phase Transitions
17.5 Cancer as an Anomalous Attractor
17.6 Shapes as System Descriptors
17.7 Fractals of Living Organisms
17.8 Fractals and Cancer
17.9 Modifications in Cell Shape Precede Tumor Metabolome Reversion
17.10 Conclusions
References
Chapter 18: Protein Network Analysis for Disease Gene Identification and Prioritization
18.1 Introduction
18.2 Protein Networks and Human Disease
18.3 ToppGene Suite of Applications
18.4 Conclusions
References
Chapter 19: Pathways and Networks as Functional Descriptors for Human Disease and Drug Response Endpoints
19.1 Introduction
19.2 Gene Content Classifiers and Functional Classifiers
19.3 Biological Pathways and Networks Have Different Properties as Functional Descriptors
19.4 Applications of Pathways as Functional Classifiers
19.5 Single Pathway Learning for Identifying Functional Descriptor Pathways
19.6 Multiple-Path Learning (MPL) Algorithm for Pathway Descriptors
19.7 Applications of MPL-Deduced Pathway Descriptors
19.8 Combining Advantages of Pathways and Networks
19.9 Key Upstream and Downstream Interactions of Genetically Altered Genes and “Universal Cancer Genes”
19.10 Conclusions
References
Index
Related Titles
Emmert-Streib, F., Dehmer, M. (eds.)
Medical Biostatistics for Complex Diseases
2010
ISBN: 978-3-527-32585-6
Dehmer, M., Emmert-Streib, F. (eds.)
Analysis of Complex Networks
From Biology to Linguistics
2009
ISBN: 978-3-527-32345-6
Emmert-Streib, F., Dehmer, M. (eds.)
Analysis of Microarray Data
A Network-Based Approach
2008
ISBN: 978-3-527-31822-3
Junker, B. H., Schreiber, F.
Analysis of Biological Networks
2008
ISBN: 978-0-470-04144-4
Stolovitzky, G., Califano, A. (eds.)
Reverse Engineering Biological Networks
Opportunities and Challenges in Computational Methods for Pathway Inference
2007
ISBN: 978-1-57331-689-7
The Editors
Matthias Dehmer
UMIT
Institute for Bioinformatics
and Translational Research
Eduard Wallnöfer Zentrum 1
6060 Hall, Tyrol
Austria
Frank Emmert-Streib
Queen's University Belfast
Center for Cancer Research and Cell Biology
97, Lisburn Road
Belfast BT9 7BL
United Kingdom
Armin Graber
UMIT
Institute for Bioinformatics
and Translational Research
Eduard Wallnöfer Zentrum 1
6060 Hall, Tyrol
Austria
and
Novartis Pharmaceuticals Corporation
Oncology Biomarkers and Imaging
One Health Plaza
East Hanover, NJ 07936
USA
Armindo Salvador
University of Coimbra
Center for Neuroscience and
Cell Biology, Department of Chemistry
3004-535 Coimbra
Portugal
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose.
No warranty can be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Card No.: applied for
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.d-nb.de.
© 2011 Wiley-VCH Verlag & Co. KGaA,
Boschstr. 12, 69469 Weinheim, Germany
Wiley-Blackwell is an imprint of John Wiley & Sons, formed by the merger of Wiley's global Scientific, Technical, and Medical business with Blackwell Publishing.
All rights reserved (including those of translation into other languages). No part of this book may be reproduced in any form – by photoprinting, microfilm, or any other means – nor transmitted or translated into a machine language without written permission from the publishers. Registered names, trademarks, etc. used in this book, even when not specifically marked as such, are not to be considered unprotected by law.
ISBN: 978-3-527-32750-8
Preface
For the field of systems biology to mature, novel statistical and computational analysis methods are needed to deal with the growing amount of high-throughput data from genomics and genetics experiments. This book presents such methods and applications to data from biological and biomedical problems. Nowadays, it is widely recognized that networks form a very fruitful representation for studying problems in systems biology [1, 2]. However, many traditional methods do not make explicit use of a network representation of the data. For this reason, the topics treated in this book explore statistical and computational data analysis aspects of networks in systems biology [3–6].
Biological phenotypes are mediated by very intricate networks of interactions among biological components. This book covers extensively what we view as two complementary but strongly interrelated challenges in network biology. The first lies in inferring networks from experimental observations of state variables of a system. Interactions among molecular components are traditionally characterized through equilibrium binding or kinetic experiments in vitro with dilute solutions of the purified components. However, such experiments are typically low throughput and unable to properly account for the conditions prevailing in vivo, where factors such as molecular crowding, spatial heterogeneity, and the presence of ligands might strongly modify the interactions of interest. The possibility of inferring network connectivity and even quantitative interaction parameters from observations of intact living systems is attracting considerable research interest as a way of escaping such shortcomings. The fact that biological networks are complex, that problems are often poorly constrained, and that data are often high dimensional and noisy makes this challenge daunting. The second and perhaps equally difficult challenge lies in deriving results that are both biologically relevant and reliable from incomplete and uncertain information about biological interaction networks. We hope that the contributions in the subsequent chapters will help the reader understand and meet these challenges.
This book is intended for researchers and graduate and advanced undergraduate students in the interdisciplinary fields of computational biology, biostatistics, bioinformatics, and systems biology studying problems in biological and biomedical sciences. The book is organized in four main parts: Part One: Modeling, Simulation, and Meaning of Gene Networks; Part Two: Inference of Gene Networks; Part Three: Analysis of Gene Networks; and Part Four: Systems Approach to Diseases. Each part consists of chapters that emphasize the topic of the corresponding part without, however, being disconnected from the remainder of the book. Overall, to order the different parts we adopted an intuitive, problem-oriented perspective moving from Modeling, Simulation, and Meaning of Gene Networks to Inference of Gene Networks and Analysis of Gene Networks. The last part presents biomedical applications of various methods in Systems Approach to Diseases.
Each chapter is presented comprehensively and is accessible not only to researchers from this field but also to advanced undergraduate or graduate students. For this reason, each chapter not only presents technical results but also provides the background knowledge necessary to understand the statistical method or the biological problem under consideration. This allows the book to be used as a textbook for an interdisciplinary seminar for advanced students, not only because of the comprehensiveness of the chapters but also because its size is sufficient to fill a complete semester.
Many colleagues, whether consciously or unconsciously, have provided us with input, help, and support before and during the preparation of this book. In particular, we would like to thank Andreas Albrecht, Gökmen Altay, Subhash Basak, Danail Bonchev, Maria Duca, Dean Fennell, Galina Glazko, Martin Grabner, Beryl Graham, Peter Hamilton, Des Higgins, Puthen Jithesh, Patrick Johnston, Frank Kee, Terry Lappin, Kang Li, D. D. Lozovanu, Dennis McCance, James McCann, Alexander Mehler, Abbe Mowshowitz, Ken Mills, Arcady Mushegian, Katie Orr, Andrei Perjan, Bert Rima, Brigitte Senn-Kircher, Ricardo de Matos Simoes, Francesca Shearer, Fred Sobik, John Storey, Simon Tavaré, Shailesh Tripathi, Kurt Varmuza, Bruce Weir, Pat White, Kathleen Williamson, Shu-Dong Zhang, and Dongxiao Zhu, and we apologize to all who have mistakenly not been named. We would also like to thank our editors Andreas Sendtko and Gregor Cicchetti from Wiley-VCH, who have always been available and helpful.
Finally, we hope that this book will help to spread the enthusiasm and joy we have for this field and inspire people regarding their own practical or theoretical research problems.
March 2011
Belfast, Hall/Tyrol, and Coimbra
Matthias Dehmer, Frank Emmert-Streib, Armin Graber, and Armindo Salvador
References
1. Barabasi, A.L. and Oltvai, Z.N. (2004) Network biology: understanding the cell's functional organization. Nat. Rev. Genet., 5, 101–113.
2. Emmert-Streib, F. and Glazko, G. (2011) Network biology: a direct approach to study biological function. WIREs Syst. Biol. Med., in press.
3. Alon, U. (2006) An Introduction to Systems Biology: Design Principles of Biological Circuits, Chapman & Hall/CRC.
4. Bertalanffy, L. von. (1950) An outline of general systems theory. Br. J. Philos. Sci., 1 (2)
5. Kitano, H. (ed.) (2001) Foundations of Systems Biology, MIT Press.
6. Palsson, B.O. (2006) Systems Biology: Properties of Reconstructed Networks, Cambridge University Press.
List of Contributors
Andreas Bernthaler
Vienna University of Technology
Institute of Computer Languages
Theory and Logics Group
Favoritenstrasse 9
1040 Vienna
Austria
Marina Bessarabova
Thomson Reuters
Healthcare & Life Sciences
169 Saxony Road
Encinitas, CA 92024
USA
Mariano Bizzarri
Sapienza University
Department of Experimental Medicine
Viale Regina Elena 324
00161 Rome
Italy
Gianluca Bontempi
Université Libre de Bruxelles
Computer Science Department
Machine Learning Group
Boulevard du Triomphe
1050 Brussels
Belgium
Gastone Castellani
Università di Bologna
Department of Physics
INFN Bologna Section and
Galvani Center for Biocomplexity
40127 Bologna
Italy
Jing Chen
University of Cincinnati
Department of Environmental Health
Cincinnati, OH 45229
USA
Xi Chen
The University of Hong Kong
Department of Mathematics
Pok Fu Lam Road
Hong Kong
China
Wai-Ki Ching
The University of Hong Kong
Department of Mathematics
Pok Fu Lam Road
Hong Kong
China
Zoltan Dezso
Thomson Reuters
Healthcare & Life Sciences
169 Saxony Road
Encinitas, CA 92024
USA
Cathy S. J. Fann
Academia Sinica
Institute of Biomedical Sciences
Academia Road, Nankang
115 Taipei
Taiwan
Raul Fechete
Emergentec Biodevelopment GmbH
Gersthofer Strasse 29-31
1180 Vienna
Austria
Rudolf Freund
Vienna University of Technology
Institute of Computer Languages
Theory and Logics Group
Favoritenstrasse 9
1040 Vienna
Austria
Alessandro Giuliani
Istituto Superiore di Sanità
Department of Environment
and Health
Viale Regina Elena 299
00161 Rome
Italy
Erich Gombocz
IO Informatics Inc.
2550 Ninth Street
Berkeley, CA 94710-2549
USA
Jing-Dong J. Han
Chinese Academy of Sciences
Institute of Genetics and
Developmental Biology
Center for Molecular Systems Biology
Key Laboratory of
Molecular Developmental Biology
Lincui East Road
100101 Beijing
China
and
Chinese Academy of Sciences–
Max Planck Partner Institute for
Computational Biology
Shanghai Institutes for
Biological Sciences
Chinese Academy of Sciences
320 Yue Yang Road
200031 Shanghai
China
Katsuhisa Horimoto
National Institute of Advanced
Industrial Science Technology
Computational Biology Research Center
2-4-7, Aomi, Koto-ku
135-0064 Tokyo
Japan
Ching-Lin Hsiao
Academia Sinica
Institute of Biomedical Sciences
Academia Road, Nankang
115 Taipei
Taiwan
Jialiang Huang
Chinese Academy of Sciences
Institute of Genetics and
Developmental Biology
Center for Molecular Systems Biology
Key Laboratory of
Molecular Developmental Biology
Lincui East Road
100101 Beijing
China
Anil G. Jegga
Cincinnati Children's Hospital
Medical Center
Division of Biomedical Informatics
Cincinnati, OH 45229
USA
and
University of Cincinnati
Department of Biomedical Engineering
Cincinnati, OH 45229
USA
and
University of Cincinnati
College of Medicine
Department of Pediatrics
Cincinnati, OH 45229
USA
Eugene Kirillov
Thomson Reuters
Healthcare & Life Sciences
169 Saxony Road
Encinitas, CA 92024
USA
Younhee Ko
University of Illinois at
Urbana-Champaign
Department of Animal Sciences
1207 W. Gregory Dr.
Urbana, IL 61801
USA
and
University of Illinois at
Urbana-Champaign
Institute for Genomic Biology
1205 W. Gregory Drive
Urbana, IL 61801
USA
Rainer König
University of Heidelberg
Institute of Pharmacy and Molecular
Biotechnology
Bioquant
Im Neuenheimer Feld 267
69120 Heidelberg
Germany
Xiangfang Li
Texas A&M University
Genomic Signal Processing Laboratory
TAMU 3128
College Station, TX 77843
USA
Arno Lukas
Emergentec Biodevelopment GmbH
Gersthofer Strasse 29-31
1180 Vienna
Austria
Bernd Mayer
Emergentec Biodevelopment GmbH
Gersthofer Strasse 29-31
1180 Vienna
Austria
Patrick E. Meyer
Université Libre de Bruxelles
Computer Science Department
Machine Learning Group
Boulevard du Triomphe
1050 Brussels
Belgium
Konrad Mönks
Vienna University of Technology
Institute of Computer Languages
Theory and Logics Group
Favoritenstrasse 9
1040 Vienna
Austria
and
Emergentec Biodevelopment GmbH
Gersthofer Strasse 29-31
1180 Vienna
Austria
Irmgard Mühlberger
Emergentec Biodevelopment GmbH
Gersthofer Strasse 29-31
1180 Vienna
Austria
Tatiana Nikolskaya
Thomson Reuters
Healthcare & Life Sciences
169 Saxony Road
Encinitas, CA 92024
USA
Yuri Nikolsky
Thomson Reuters
Healthcare & Life Sciences
169 Saxony Road
Encinitas, CA 92024
USA
Catharina Olsen
Université Libre de Bruxelles
Computer Science Department
Machine Learning Group
Boulevard du Triomphe
1050 Brussels
Belgium
Paul Perco
Emergentec Biodevelopment GmbH
Gersthofer Strasse 29-31
1180 Vienna
Austria
Kitiporn Plaimas
University of Heidelberg
Institute of Pharmacy and Molecular
Biotechnology
Bioquant
Im Neuenheimer Feld 267
69120 Heidelberg
Germany
Thomas N. Plasterer
Northeastern University
Department of Chemistry and
Chemical Biology
360 Huntington Ave.
Boston, MA 02115
USA
and
Pharmacogenetics Clinical Advisory
Board
2000 Commonwealth Avenue, Suite 200
Auburndale, MA 02466
USA
Lijun Qian
Texas A&M University System
Prairie View A&M University
Department of Electrical and
Computer Engineering
MS2520, POB 519
Prairie View, TX 77446
USA
Daniel Remondini
Università di Bologna
Department of Physics
INFN Bologna Section and
Galvani Center for Biocomplexity
40127 Bologna
Italy
Sandra Rodriguez-Zas
University of Illinois at
Urbana-Champaign
Department of Animal Sciences
1207 W. Gregory Drive
Urbana, IL 61801
USA
Debashis Sahoo
Instructor of Pathology and Siebel
Fellow at Institute of Stem Cell Biology
and Regenerative Medicine
Lorry I. Lokey Stem Cell Research
Building
265 Campus Drive, Rm G3101B
Stanford, CA 94305
USA
Shigeru Saito
Infocom Corp.
Chem & Bio Informatics Department
Sumitomo Fudosan Harajuku Building
2-34-17, Jingumae, Shibuya-ku
150-0001 Tokyo
Japan
Robert Stanley
IO Informatics Inc.
2550 Ninth Street
Berkeley, CA 94710-2549
USA
Gautam S. Thakur
University of Florida
Department of Computer and
Information Science and Engineering
Science
PO Box 116120
Gainesville, FL 32611-6120
USA
Tianhai Tian
University of Glasgow
Department of Mathematics
University Gardens
Glasgow G12 8QW
UK
Nam-Kiu Tsing
The University of Hong Kong
Department of Mathematics
Pok Fu Lam Road
Hong Kong
China
Eberhard O. Voit
Georgia Tech and Emory University
The Wallace H. Coulter Department
of Biomedical Engineering
313 Ferst Drive
Atlanta, GA 30332
USA
Haixin Wang
Fort Valley State University
Department of Mathematics and
Computer Science
CTM 101A
Fort Valley, GA 31030
USA
Matthew Weirauch
University of Toronto
Banting and Best Department
of Medical Research and
Donnelly Centre for
Cellular and Biomolecular Research
160 College Street
Toronto, ON, M5S 3E1
Canada
Hong Yu
Chinese Academy of Sciences
Institute of Genetics and
Developmental Biology
Center for Molecular Systems Biology
Key Laboratory of
Molecular Developmental Biology
Lincui East Road
100101 Beijing
China
Wei Zhang
Chinese Academy of Sciences
Institute of Genetics and
Developmental Biology
Center for Molecular Systems Biology
Key Laboratory of
Molecular Developmental Biology
Lincui East Road
100101 Beijing
China
Part One
Modeling, Simulation, and Meaning of Gene Networks
Chapter 2
Stochastic Modeling of Gene Regulatory Networks
Tianhai Tian
2.1 Introduction
Recent experimental studies have indicated that noise plays a very important role in determining the dynamic behavior of biological systems. Since the research work on stochastic modeling of the regulatory network of phage λ [1, 2], there has been an increasing number of studies in the last decade investigating the origins of noise in biological networks and its crucial role in determining the key properties of biological networks [3–5]. Experimental studies have demonstrated that noise in cellular processes may result from small copy numbers of molecular species, intermittent gene activity, and fluctuations of experimental conditions [6–9]. These empirical discoveries have stimulated explosive research interest in developing stochastic models for a wide range of biological systems, including gene regulatory networks [10–12], cell signaling pathways [13–15], and metabolic pathways [16, 17].
It has been proposed that noise in the form of random fluctuations arises in biological networks in one of two ways: internal (intrinsic) noise or external (extrinsic) noise [18, 19]. The internal noise is mainly derived from the chance events of biochemical reactions in the system due to small copy numbers of certain key molecular species. External noise mainly refers to the environmental fluctuations or the noise propagation from the upstream biological pathways. In addition, there are two major types of response of biological systems to noise. In the first case, living systems are optimized to function in the presence of stochastic fluctuations, and biochemical networks must withstand considerable variations and random perturbations of biochemical parameters [20–22]. Such a property of biological systems is known as “robustness” [23, 24]. On the other hand, biological systems are also sensitive to environmental fluctuations and/or intrinsic noise in certain time periods. For example, noise in gene expression could lead to qualitative differences in a cell's phenotype if the expressed genes act as inputs to downstream regulatory thresholds [8, 25, 26].
One of the major challenges in systems biology is the development of quantitative mathematical models for studying regulatory mechanisms in complex biological systems [27]. Although deterministic models have been widely used for analyzing gene regulatory networks, cell signaling pathways, and metabolic systems [28, 29], a deterministic model can only describe the averaged behavior of a system based on large populations and cannot capture the fluctuations of the system behavior in different cells. Recently, there has been an accelerating interest in the investigation of the effect of noise in genetic regulation through stochastic modeling. Although stochastic models have been developed based on detailed knowledge of biochemical reactions, the available data and regulatory information usually cannot provide a comprehensive picture of biological regulation. In recent years, a number of approaches have been proposed to develop either continuous or discrete stochastic models for the study of noise in large-scale gene regulatory networks. These methods include stochastic Boolean models [30, 31], probabilistic hybrid approaches [32], stochastic Petri nets [33, 34], stochastic differential equations (SDEs) [35, 36], and multiscale (hybrid) models that include both stochastic and deterministic dynamics [37, 38].
Systems of ordinary differential equations (ODEs) have been widely used to model biological systems and there are a large number of well-developed deterministic models for a broad range of biological systems. An important question in stochastic modeling is how to develop stochastic models by introducing stochastic processes into deterministic models for the external and/or internal noise. This chapter will use a number of modeling approaches and biological systems to address this issue. The remaining part of this chapter is organized as follows. Section 2.2 discusses numerical methods for simulating chemical reaction systems. These methods are the theoretical basis for designing stochastic models in the following sections. A general modeling approach for developing discrete stochastic models is discussed in Section 2.3. Section 2.4 provides a number of techniques for designing continuous stochastic models by using SDEs.
2.2 Discrete Stochastic Simulation Methods
Since many cellular processes are governed by effects associated with small numbers of certain key molecules, the standard chemical framework described by systems of ODEs breaks down. The stochastic simulation algorithm (SSA) represents a discrete modeling approach and an essentially exact procedure for numerically simulating the time evolution of a well-stirred reaction system [39]. The advances in stochastic modeling of gene regulatory networks and cell signal transduction pathways have stimulated growing research interest in the development of effective methods for simulating chemical reaction systems. These effective simulation methods have in turn provided innovative methodologies for designing stochastic models of biological systems.
2.2.1 SSA
It is assumed that a chemical reaction system is a well-stirred mixture at constant temperature in a fixed volume $\Omega$. This mixture consists of $N \ge 1$ molecular species $\{S_1, \dots, S_N\}$ that chemically interact through $M \ge 1$ reaction channels $\{R_1, \dots, R_M\}$. The dynamic state of this system is denoted as $x(t) = (x_1(t), \dots, x_N(t))^T$, where $x_i(t)$ is the molecular number of species $S_i$ in the system at time $t$. For each reaction $R_j$ ($j = 1, \dots, M$), a propensity function $a_j(x)$ is defined for a given state $x(t) = x$, and the value of $a_j(x)\,dt$ represents the probability that one reaction $R_j$ will fire somewhere inside $\Omega$ in the infinitesimal time interval $[t, t + dt)$. In addition, a state change vector $\nu_j$ is defined to characterize reaction $R_j$. The element $\nu_{ij}$ of $\nu_j$ represents the change in the copy number of species $S_i$ due to reaction $R_j$. The $N \times M$ matrix $\nu$ with elements $\nu_{ij}$ is called the stoichiometric matrix.
The SSA is a statistically exact procedure for generating the time and index of the next occurring reaction in accordance with the current values of the propensity functions. In each time step, two random numbers are generated to determine the time step and the index of the next reaction. There are several forms of this algorithm. The widely used direct method works as described in Method 2.1.
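Method 2.1 Direct Method [39]
Step 1: Calculate the values of propensity functions $a_j(x)$ based on the system state $x$ at time $t$ and $a_0(x) = \sum_{j=1}^{M} a_j(x)$.
Step 2: Generate a sample $r_1$ of the uniformly distributed random variable $U(0, 1)$ and determine the time of the next reaction:
$$\mu = \frac{1}{a_0(x)} \ln\frac{1}{r_1}.$$
Step 3: Generate an independent sample $r_2$ of $U(0, 1)$ to determine the index $k$ of the next reaction occurring in $[t, t + \mu)$:
$$\sum_{j=1}^{k-1} a_j(x) < r_2\, a_0(x) \le \sum_{j=1}^{k} a_j(x).$$
Step 4: Update the state of the system by:
$$x(t + \mu) = x(t) + \nu_k. \tag{2.1}$$
Step 5: Go to Step 1 with $t = t + \mu$ if $t + \mu < T$, where $T$ is the end time point. Otherwise, the system state at $T$ is $x(T) = x(t)$.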
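The steps of the direct method can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than a reference implementation: the function name `direct_method`, its argument layout, and the reversible isomerization example with its rate constants are all hypothetical and chosen only for demonstration. The index search uses a strict inequality so that a channel with zero propensity is never selected.

```python
import math
import random


def direct_method(x0, stoich, propensities, t_end, rng):
    """Simulate one trajectory up to t_end and return the final state.

    x0           -- initial copy numbers, one entry per species
    stoich       -- stoich[j] is the state change vector of reaction j
    propensities -- function mapping a state to the list of propensities
    rng          -- a random.Random instance (assumed interface)
    """
    t, x = 0.0, list(x0)
    while True:
        a = propensities(x)               # Step 1: propensities and their sum
        a0 = sum(a)
        if a0 <= 0.0:
            return x                      # no reaction can fire any more
        r1 = 1.0 - rng.random()           # uniform on (0, 1], keeps the log finite
        mu = math.log(1.0 / r1) / a0      # Step 2: time to the next reaction
        if t + mu >= t_end:
            return x                      # Step 5: state at T is the current state
        r2 = rng.random()                 # Step 3: index k of the next reaction
        cumulative, k = 0.0, len(a) - 1
        for j, a_j in enumerate(a):
            cumulative += a_j
            if r2 * a0 < cumulative:
                k = j
                break
        x = [x_i + v_i for x_i, v_i in zip(x, stoich[k])]  # Step 4, Eq. (2.1)
        t += mu


# Hypothetical example: reversible isomerization A <-> B.
stoich = [(-1, 1), (1, -1)]               # state change vectors of the two channels


def propensities(x):
    return [0.5 * x[0], 0.3 * x[1]]       # illustrative rate constants


final_state = direct_method([100, 0], stoich, propensities, 10.0, random.Random(1))
```

Because each firing only converts one molecule of A into B or back, the total copy number in this example stays constant, which gives a simple sanity check on any implementation.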
Another exact method is the first reaction method, which uses $M$ random numbers at each step to determine the putative reaction time of each reaction channel [40]. The reaction that fires in the next step is the one with the smallest putative reaction time. Compared to the direct method, the first reaction method is not efficient since it discards $M - 1$ random numbers at each step. To improve the efficiency of the first reaction method, Gibson and Bruck [41] proposed the next reaction method, which recycles the generated random numbers. The putative step size of a reaction channel is updated based on the step size of this channel at the previous step and the values of the propensity function at these two steps. In addition, a so-called dependency graph was designed to reduce the computing time of propensity functions. Numerical results indicated that the next reaction method is effective for simulating systems with many species and reaction channels.
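A single step of the first reaction method might look like the sketch below. The helper name `first_reaction_step` is hypothetical, and the sketch assumes that channels with zero propensity are simply skipped, since their putative reaction time is infinite. It draws one exponential waiting time per active channel and keeps only the smallest, which makes the waste of random numbers explicit.

```python
import math
import random


def first_reaction_step(a, rng):
    """Return (mu, k) for the channel with the smallest putative time.

    a   -- list of propensity values for the current state
    rng -- a random.Random instance (assumed interface)
    Returns None when no channel can fire.
    """
    best = None
    for j, a_j in enumerate(a):
        if a_j <= 0.0:
            continue                          # this channel cannot fire
        r = 1.0 - rng.random()                # uniform on (0, 1]
        mu_j = math.log(1.0 / r) / a_j        # putative time of channel j
        if best is None or mu_j < best[0]:
            best = (mu_j, j)                  # all other samples are discarded
    return best
```

The next reaction method avoids this waste by storing the putative times and rescaling them when the propensities change; that bookkeeping is not shown here.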
The SSA assumes that the next reaction will fire in the next reaction time interval $[t, t + \mu)$ with small values of the step size $\mu$. For systems including both fast and slow reactions, however, this assumption may not be valid if the slow reactions take a much longer time than the fast reactions. The large reaction time of slow reactions should be represented by a time delay if we hope to treat both fast and slow reactions in a system consistently and to study the impact of slow reactions on the system dynamics [42]. Recently, the delay stochastic simulation algorithm (DSSA) was designed to simulate chemical reaction systems with time delays [43–45]. These methods have been used to validate stochastic models for biological systems with slow reactions [46, 47]. However, compared with the significant progress in designing simulation methods for biological systems without time delay [48, 49], only a few simulation methods have been designed to improve the efficiency of the DSSA [50, 51]. As with the effective methods for simulating biological systems without time delay, it is expected that progress in designing effective methods for simulating systems with time delay will also provide methodologies for modeling biological systems with time delay.
2.2.2 Accelerating τ-Leap Methods
Since the SSA can be very computationally inefficient, considerable attention has been paid recently to reducing the computational time for simulating stochastic chemical kinetics. Gillespie [52] proposed the τ-leap methods in order to improve the efficiency of the SSA while maintaining acceptable losses in accuracy. The key idea of the τ-leap methods is to take a larger time step and allow for more reactions to take place in that step. In the Poisson τ-leap method, the number of times that the reaction channel $R_j$ will fire in the time interval $[t, t + \tau)$ is approximated by a Poisson random variable $P(a_j(x)\tau)$ ($j = 1, \dots, M$) based on the present state $x$ at time $t$ [52]. Here, the leap size $\tau$ should satisfy the Leap Condition: a temporal leap by $\tau$ will result in a state change $\lambda$ such that for every reaction channel $R_j$, $|a_j(x + \lambda) - a_j(x)|$ is “effectively infinitesimal” [52]. This method is given in Method 2.2.
Method 2.2 Poisson τ-Leap Method [52]
Step 1: Calculate the values of propensity functions $a_j(x)$ based on the system state $x$ at time $t$.
Step 2: Choose a value for the leap size $\tau$ that satisfies the Leap Condition.
Step 3: Generate a sample value of the Poisson random variable $P(a_j(x)\tau)$ for each reaction channel ($j = 1, \ldots, M$).
Step 4: Perform the updates of the system by:
$$x(t+\tau) = x(t) + \sum_{j=1}^{M} \nu_j P(a_j(x)\tau) \tag{2.2}$$
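The steps of Method 2.2 can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it assumes a fixed leap size supplied by the caller (so Step 2 is not performed), uses a simple Poisson sampler that is only suitable for small means, and does not guard against negative molecular numbers. The function names, the birth-death test system, and its rate constants `k1` and `k2` are all hypothetical.

```python
import math
import random


def poisson(mean, rng):
    """Draw a Poisson sample with Knuth's method (fine for small means only)."""
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def poisson_tau_leap(x0, nu, propensities, tau, t_end, rng):
    """Advance the state from t = 0 to t_end with a fixed leap size tau."""
    x, t = list(x0), 0.0
    while t < t_end:
        a = propensities(x)                               # Step 1
        # Step 2 is skipped: tau is fixed by the caller (an assumption).
        firings = [poisson(a_j * tau, rng) for a_j in a]  # Step 3
        for j, k_j in enumerate(firings):                 # Step 4: state update
            for i, change in enumerate(nu[j]):
                x[i] += change * k_j
        t += tau
    return x


# Hypothetical birth-death system: 0 -> S (rate k1), S -> 0 (rate k2 * x).
k1, k2 = 10.0, 0.1
final = poisson_tau_leap([100], [[1], [-1]],
                         lambda x: [k1, k2 * x[0]],
                         tau=0.1, t_end=10.0, rng=random.Random(1))
print(final)
```

Here `nu[j]` holds the state-change vector of channel j, so the inner loop applies the same update as Step 4 for every species.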
A major step of the Poisson τ-leap method is to choose an appropriate step size that satisfies the Leap Condition. Gillespie first proposed a simple procedure to determine the leap size [52]. In this formula, the expected change of each propensity function during $\tau$ should be bounded by $\varepsilon a_0(x)$, where $a_0(x) = \sum_{j=1}^{M} a_j(x)$ is the sum of all propensities and $\varepsilon$ is a given error control parameter:
$$|a_j(x+\bar{\lambda}) - a_j(x)| \le \varepsilon a_0(x), \quad j = 1, \ldots, M \tag{2.3}$$
where $\bar{\lambda}$ is the expected net change in state in $[t, t+\tau)$, which can be calculated by $\bar{\lambda} = \tau \sum_{j=1}^{M} a_j(x)\nu_j$. Later, more sophisticated methods were proposed either to select the optimal leap size or to avoid possible negative molecular numbers in simulation. For example, Gillespie and Petzold [53] proposed a method that considers both the mean and standard deviation of the expected change in the propensity functions. This method is an extension of the method (Equation 2.3) that only considered the mean of the expected change. It is worth noting that the leap size $\tau$ is a preselected deterministic value and is determined by the error control parameter $\varepsilon$. As in many other numerical methods, the leap size controls the balance between computational efficiency and accuracy. In addition, our simulation results [54] indicated that the computing time for selecting the leap size is about half of the total computing time when using the method of Gillespie and Petzold [53].
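A leap size that respects the bound on the expected change of each propensity can be found numerically. The sketch below assumes the condition takes the standard form $|a_j(x+\bar{\lambda}) - a_j(x)| \le \varepsilon a_0(x)$ with $\bar{\lambda} = \tau \sum_j a_j(x)\nu_j$, and simply halves a trial value until the condition holds. This halving search is a swapped-in illustration, not the closed-form procedure of [52] or the method of [53]. The function name, the defaults `eps=0.03` and `tau_max=1.0`, and the birth-death propensities are hypothetical.

```python
def select_leap_size(x, nu, propensities, eps=0.03, tau_max=1.0):
    """Halve a trial leap size until the mean-change bound is satisfied."""
    a = propensities(x)
    a0 = sum(a)
    if a0 == 0.0:
        return tau_max  # nothing can fire, so any leap is acceptable
    # Expected net change of each species per unit time: sum_j a_j * nu_j.
    drift = [sum(a[j] * nu[j][i] for j in range(len(a)))
             for i in range(len(x))]
    tau = tau_max
    while tau > 0.0:
        shifted = propensities([x_i + tau * d_i for x_i, d_i in zip(x, drift)])
        if all(abs(s - a_j) <= eps * a0 for s, a_j in zip(shifted, a)):
            break
        tau /= 2.0
    return tau


# Hypothetical birth-death system: 0 -> S (rate 10), S -> 0 (rate 0.1 * x).
def birth_death(x):
    return [10.0, 0.1 * x[0]]


print(select_leap_size([1000.0], [[1], [-1]], birth_death))
```

Far from the stationary state (1000 molecules in this example) the search has to shrink the trial value, whereas at the stationary state (100 molecules) the expected change vanishes and the largest allowed leap is returned.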
Since the samples of a Poisson random variable are unbounded, negative molecular numbers may be obtained if certain species have small molecular numbers and the propensity function involving that species has a large value. There are two ways of obtaining negative molecular numbers in stochastic simulations [55]. The first case is that the generated sample of reaction number is greater than one of the molecular numbers in that reaction channel. In the second case, a species involves a number of reaction channels and the total reaction number of these channels is greater than the copy number of that species, although the reaction number of each channel may be smaller than the molecular number.
To tackle the problem of negative numbers, binomial random variables were introduced to avoid the negative numbers of the first case by restricting the possible reaction numbers in the next time interval [55, 56]. In the binomial τ-leap method, the reaction number of channel $R_j$ is defined by a sample value of the binomial random variable $B(N_j, a_j(x)\tau/N_j)$ under the condition $0 \le a_j(x)\tau/N_j \le 1$. The maximal possible reaction number $N_j$ has been defined for the three widely used types of elementary reactions. In addition, a sampling technique was designed for sampling the total reaction number of a group of reaction channels if a reactant species is involved in these reaction channels [55]. The binomial τ-leap method is given in Method 2.3.
Method 2.3 Binomial τ-Leap Method [55]
Step 0: Define the maximal possible reaction number $N_j$ for each reaction channel $R_j$. If a species is involved in two or more reaction channels, define a maximal possible total reaction number for these reaction channels.
Step 1: Calculate the values of propensity functions $a_j(x)$ based on the system state $x$ at time $t$.
Step 2: Use a method to determine the value of leap size $\tau$. Check the step size conditions of the binomial random variables. If necessary, reduce the step size to satisfy these conditions.
Step 3: Generate a sample value $K_j$ of the binomial random variable $B(N_j, a_j(x)\tau/N_j)$ for reaction channels in which species are involved in one single reaction. When a species is involved in two or more reaction channels, generate a total reaction number for these reaction channels and then generate the reaction number $K_j$ of each reaction channel in this group.
Step 4: Perform the updates of the system by:
$$x(t+\tau) = x(t) + \sum_{j=1}^{M} \nu_j K_j \tag{2.4}$$
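A single step of Method 2.3 can be sketched as follows for the simpler situation in which every species is a reactant in at most one channel, so no grouped sampling is needed. The sketch assumes the binomial parameters are $N_j$ trials with success probability $a_j(x)\tau/N_j$, and it halves the leap size whenever that probability would exceed one. The function names, the caller-supplied `max_firings`, and the decay chain S1 → S2 → S3 with its rate constants are hypothetical.

```python
import random


def binomial(n, p, rng):
    """Draw a binomial sample as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))


def binomial_tau_leap_step(x, nu, propensities, max_firings, tau, rng):
    """One binomial tau-leap step; returns the new state and the tau used."""
    n_max = max_firings(x)                       # Step 0
    a = propensities(x)                          # Step 1
    # Step 2: reduce tau until a_j * tau / N_j <= 1 for every channel.
    while any(n > 0 and a_j * tau > n for a_j, n in zip(a, n_max)):
        tau /= 2.0
    new_x = list(x)
    for j, (a_j, n) in enumerate(zip(a, n_max)):
        # Step 3: at most N_j firings, so reactants cannot go negative.
        k_j = binomial(n, a_j * tau / n, rng) if n > 0 else 0
        for i, change in enumerate(nu[j]):       # Step 4: state update
            new_x[i] += change * k_j
    return new_x, tau


# Hypothetical decay chain S1 -> S2 -> S3 with first-order rates 1.0 and 2.0.
nu = [[-1, 1, 0], [0, -1, 1]]
rates = lambda x: [1.0 * x[0], 2.0 * x[1]]
limits = lambda x: [x[0], x[1]]  # N_j = copy number of the single reactant
state, used_tau = binomial_tau_leap_step([50, 5, 0], nu, rates, limits,
                                         0.1, random.Random(2))
print(state, used_tau)
```

Because each firing number is capped by the available reactant copies, the first cause of negative molecular numbers described above cannot occur in this restricted setting.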
In the τ-leap methods, it is assumed that, during a preselected time step $\tau$, the number of firings of each reaction channel is a sample of a random variable. Another major type of leap method is the $k_\alpha$-leap method [52]. In an implementation of the $k_\alpha$-leap method, which is the so-called K-leap method [57], it was proposed to select a predefined number of firings $K$ that may span several reaction channels. Then the leap step $\tau$ of these reactions is a sample of the Γ random variable $\Gamma(K, 1/a_0(x))$. Over the time interval $[t, t+\tau)$, the number of firings $K_j$ of each reaction channel (satisfying $\sum_{j=1}^{M} K_j = K$) follows the correlated binomial distributions. A number of techniques have been proposed in the K-leap method to determine the total reaction number $K$ and to sample the firing number of each reaction channel [57]. A similar approach, which is called the R-leap method, was also proposed to improve computing efficiency over the exact SSA [58].
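The idea of fixing the total number of firings and sampling the elapsed time can be illustrated with a short sketch. It assumes that the leap time is Gamma distributed with shape equal to the total firing number and scale equal to the reciprocal of the summed propensities, and that the firings are split among channels by sequential conditional binomial draws (which together form a multinomial sample). These are assumptions about the general approach, not a transcription of the algorithms in [57] or [58], and the function names are hypothetical.

```python
import random


def binomial(n, p, rng):
    """Draw a binomial sample as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))


def fixed_firings_leap(a, k_total, rng):
    """Sample the leap time and per-channel firings for k_total firings."""
    a0 = sum(a)
    if a0 <= 0.0:
        raise ValueError("no reaction can fire")
    # Time needed for k_total firings: Gamma(shape=k_total, scale=1/a0).
    tau = rng.gammavariate(k_total, 1.0 / a0)
    firings, remaining = [], k_total
    for j, a_j in enumerate(a):
        tail = sum(a[j:])  # propensity mass of channels not yet sampled
        p = a_j / tail if tail > 0.0 else 0.0
        k_j = binomial(remaining, p, rng)  # conditional binomial draw
        firings.append(k_j)
        remaining -= k_j
    return tau, firings


tau, firings = fixed_firings_leap([10.0, 30.0, 0.0, 60.0], 20, random.Random(4))
print(tau, firings)
```

The conditional draws are correlated by construction: every firing assigned to one channel is removed from the pool available to the remaining channels, so the firing numbers always sum to the chosen total.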
2.2.3 Langevin Approach
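When the molecular numbers ($x_i$) in a chemical reaction system are quite large, the value of $a_j(x)\tau$ in the Poisson τ-leap method may be large for an appropriately selected step size $\tau$. In this case, the Poisson random variable can be approximated by a normal random variable with the same mean and variance, given by [59]:
$$P(a_j(x)\tau) \approx N(a_j(x)\tau, a_j(x)\tau) = a_j(x)\tau + \sqrt{a_j(x)\tau}\, N_j(0,1)$$
Then the Poisson τ-leap method (2.2) can be approximated by the following formula with normal random variables:
$$x(t+\tau) = x(t) + \sum_{j=1}^{M} \nu_j a_j(x)\tau + \sum_{j=1}^{M} \nu_j \sqrt{a_j(x)\tau}\, N_j(0,1) \tag{2.5}$$
where the normal variables $N_j(0,1)$ are all statistically independent. The above scheme is the explicit Euler method [60] for solving the chemical Langevin equation:
$$dx = \sum_{j=1}^{M} \nu_j a_j(x)\, dt + \sum_{j=1}^{M} \nu_j \sqrt{a_j(x)}\, dW_j(t)$$
where $W_j(t)$ is the Wiener process. If the molecular numbers in the system are very large, the value of $a_j(x)\tau$ may still be very large for a given step size $\tau$. In this case, compared with the drift term $a_j(x)\tau$, the diffusion term $\sqrt{a_j(x)\tau}\, N_j(0,1)$ in (2.5) is negligible. Dropping it leaves $x(t+\tau) = x(t) + \tau \sum_{j=1}^{M} \nu_j a_j(x)$, which is the explicit Euler method for solving the chemical rate equation:
$$\frac{dx}{dt} = \sum_{j=1}^{M} \nu_j a_j(x)$$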
The chemical Langevin equation links three types of important modeling regimes, namely the discrete stochastic models simulated by the SSA or Poisson τ-leap method, continuous stochastic models in terms of SDEs, and continuous deterministic systems of ODEs. In addition, the Langevin approach provides a method to describe internal noise of chemical reactions in the continuous SDE framework. When a reaction system has relatively large molecular numbers, the SDE models can be used to describe the system dynamics more efficiently than the discrete stochastic models. The chemical Langevin equation is also the theoretical basis of the multiscale simulation methods [61, 62]. Based on the molecular numbers and values of propensity functions, chemical reactions can be partitioned into a few reaction subsets at different time steps and then different simulation methods can be employed to simulate different subsets of chemical reactions. For example, Burrage et al. [63] proposed an adaptive approach to divide a reaction system into slow, intermediate, and fast reaction subsets, and used the SSA, Poisson τ-leap method, and SDEs to simulate the reactions in different subsets. Different partitioning techniques and different simulation methods have led to a number of effective methods and software for simulating chemical reaction systems [64–66].
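The link between the stochastic and deterministic regimes can be made concrete with one explicit Euler step of each kind. The sketch below assumes the usual Langevin form, in which each channel contributes a drift $a_j(x)\tau$ and a noise term $\sqrt{a_j(x)\tau}$ times a standard normal sample. Dropping the noise term gives the Euler step for the rate equation. The function names and the birth-death propensities are hypothetical, and clamping negative propensities to zero under the square root is an added safeguard, not part of the scheme itself.

```python
import math
import random


def langevin_step(x, nu, propensities, tau, rng):
    """One explicit Euler step of the chemical Langevin equation."""
    new_x = list(x)
    for j, a_j in enumerate(propensities(x)):
        drift = a_j * tau
        # Clamp at zero so the square root is defined (added safeguard).
        noise = math.sqrt(max(a_j, 0.0) * tau) * rng.gauss(0.0, 1.0)
        for i, change in enumerate(nu[j]):
            new_x[i] += change * (drift + noise)
    return new_x


def rate_equation_step(x, nu, propensities, tau):
    """One explicit Euler step of the rate equation (drift term only)."""
    new_x = list(x)
    for j, a_j in enumerate(propensities(x)):
        for i, change in enumerate(nu[j]):
            new_x[i] += change * a_j * tau
    return new_x


# Hypothetical birth-death system: 0 -> S (rate 10), S -> 0 (rate 0.1 * x).
nu = [[1], [-1]]
rates = lambda x: [10.0, 0.1 * x[0]]
print(langevin_step([1000.0], nu, rates, 0.1, random.Random(6)))
print(rate_equation_step([1000.0], nu, rates, 0.1))
```

For the large copy number used here the two steps differ only by a fluctuation of a few molecules, which illustrates why the diffusion term becomes negligible relative to the drift as molecular numbers grow.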
2.3 Discrete Stochastic Modeling
Due to the lack of detailed knowledge of biochemical reactions, kinetic rates, and molecular numbers, stochastic models based on elementary chemical reactions may not always be the practical method to study chemical reaction systems. This section discusses a general approach to develop stochastic models based on widely used deterministic ODE models [67]. Instead of studying noise from detailed information of biochemical reactions, stochastic models will be developed by using macroscopic variables at some intermediate levels. Based on the stochastic simulation methods discussed in the previous section, the key idea of this method is to use Poisson random variables to represent chance events in protein synthesis, degradation, molecular diffusion and other biological processes. This technique is also consistent with other stochastic modeling approaches where Poisson random variables have been used for realizing the chance events in transcription and translation [68].
2.3.1 Stochastic Modeling Method
We first use a simple system to illustrate the relationship between a stochastic model simulated by the Poisson τ-leap method and the corresponding deterministic ODE model simulated by the Euler method. This system includes two reactions:
(2.6)
By using the Poisson τ-leap method, the number of molecules within the time interval $[t, t+\tau)$ is updated by
By assuming the independence of and , the mean of molecular numbers in the above Poisson τ-leap method can be obtained by:
which is the Euler method for solving the ODE with respect to , given by:
The above ODE is the chemical kinetic rate equation of species in the reaction system (2.6).
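The same argument can be written out for an assumed birth-death system, which is not necessarily the system in (2.6): a species with copy number $x$ is synthesized at a hypothetical constant rate $k_1$ and degraded at a hypothetical rate $k_2 x$. Taking the mean of the Poisson τ-leap update, and using the fact that a Poisson random variable has mean equal to its parameter, gives the following sketch of the derivation:

```latex
% Poisson tau-leap update for the assumed birth-death system
x(t+\tau) = x(t) + P(k_1 \tau) - P(k_2 x(t) \tau)

% Take expectations; E[P(k_2 x(t) \tau) \mid x(t)] = k_2 x(t) \tau
E[x(t+\tau)] = E[x(t)] + k_1 \tau - k_2 E[x(t)] \tau

% This is the explicit Euler step with step size \tau for
\frac{d\,E[x]}{dt} = k_1 - k_2 E[x]
```

Because both propensities are linear in $x$, the mean equation closes exactly in this example. For nonlinear propensities the mean of the propensity is not the propensity of the mean, and the correspondence holds only approximately.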
A further example is the enzymatic reaction:
(2.7)