Applied Statistics for Network Biology - E-Book

Description

The book introduces the reader to a number of cutting-edge statistical methods that can be used for the analysis of genomic, proteomic, and metabolomic data sets. In the field of systems biology in particular, researchers are trying to analyze as much data as possible in a given biological system (such as a cell or an organ). The appropriate statistical evaluation of these large-scale data is critical for their correct interpretation, and different experimental approaches require different approaches to statistical analysis. This book is written by biostatisticians and mathematicians but is intended as a valuable guide for experimental researchers as well as computational biologists, who often lack an appropriate background in statistical analysis.


Page count: 842

Publication year: 2011




Contents

Cover

Related Titles

Title Page

Copyright

Preface

References

List of Contributors

Part One: Modeling, Simulation, and Meaning of Gene Networks

Chapter 1: Network Analysis to Interpret Complex Phenotypes

1.1 Introduction

1.2 Identification of Important Genes based on Network Topologies

1.3 Inferring Information from Known Networks

1.4 Conclusions

References

Chapter 2: Stochastic Modeling of Gene Regulatory Networks

2.1 Introduction

2.2 Discrete Stochastic Simulation Methods

2.3 Discrete Stochastic Modeling

2.4 Continuous Stochastic Modeling

2.5 Stochastic Models for Both Internal and External Noise

2.6 Conclusions

References

Chapter 3: Modeling Expression Quantitative Trait Loci in Multiple Populations

3.1 Introduction

3.2 IGM Method

3.3 CTWM

3.4 CTWM-GS Method

3.5 Discussion

References

Part Two: Inference of Gene Networks

Chapter 4: Transcriptional Network Inference Based on Information Theory

4.1 Introduction

4.2 Inference Based on Conditional Mutual Information

4.3 Inference Based on Pairwise Mutual Information

4.4 Arc Orientation

4.5 Conclusions

References

Chapter 5: Elucidation of General and Condition-Dependent Gene Pathways Using Mixture Models and Bayesian Networks

5.1 Introduction

5.2 Methodology

5.3 Applications

5.4 Conclusions

References

Chapter 6: Multiscale Network Reconstruction from Gene Expression Measurements: Correlations, Perturbations, and “A Priori Biological Knowledge”

6.1 Introduction

6.2 “Perturbation Method”

6.3 Network Reconstruction by the Correlation Method from Time-Series Gene Expression Data

6.4 Network Reconstruction from Gene Expression Data by A Priori Biological Knowledge

6.5 Examples and Methods of Correlation Network Analysis on Time-Series Data

6.6 Examples and Methods for Pathway Network Analysis

6.7 Discussion

6.8 Conclusions

References

Chapter 7: Gene Regulatory Networks Inference: Combining a Genetic Programming and H∞ Filtering Approach

7.1 Introduction

7.2 Background

7.3 Methodology for Identification and Algorithm Description

7.4 Simulation Evaluation

7.5 Conclusions

Appendix 7.A: Comparison between the Kalman Filter and H∞ Filter

Note

References

Chapter 8: Computational Reconstruction of Protein Interaction Networks

8.1 Introduction

8.2 Protein Interaction Networks

8.3 Characterization of Computed Networks

8.4 Conclusions

References

Part Three: Analysis of Gene Networks

Chapter 9: What if the Fit is Unfit? Criteria for Biological Systems Estimation Beyond Residual Errors

9.1 Introduction

9.2 Model Design

9.3 Concepts and Challenges of Parameter Estimation

9.4 Conclusions

Acknowledgments

References

Chapter 10: Machine Learning Methods for Identifying Essential Genes and Proteins in Networks

10.1 Introduction

10.2 Definitions and Constructions of the Network

10.3 Network Descriptors

10.4 Machine Learning

10.5 Some Examples of Applications

10.6 Conclusions

References

Chapter 11: Gene Coexpression Networks for the Analysis of DNA Microarray Data

11.1 Introduction

11.2 Background

11.3 Construction of GCNs

11.4 Integration of GCNs with Other Data

11.5 Analysis of GCNs

11.6 GCNs for the Study of Cancer

11.7 Conclusions

Acknowledgments

References

Chapter 12: Correlation Network Analysis and Knowledge Integration

12.1 Introduction

12.2 Systems Biology Data Quandaries

12.3 Semantic Web Approaches

12.4 Correlation Network Analysis

12.5 Knowledge Annotation for Networks

12.6 Future Developments

References

Chapter 13: Network Screening: A New Method to Identify Active Networks from an Ensemble of Known Networks

13.1 Introduction

13.2 Methods

13.3 Example Applications

13.4 Discussion

References

Chapter 14: Community Detection in Biological Networks

14.1 Introduction

14.2 Centrality Measures

14.3 Study of Complex Systems

14.4 Overview

14.5 Proposed Algorithm

14.6 Experiments

14.7 Further Improvements

14.8 Conclusions

Acknowledgments

References

Chapter 15: On Some Inverse Problems in Generating Probabilistic Boolean Networks

15.1 Introduction

15.2 Reviews on BNs and PBNs

15.3 Construction of PBNs from a Prescribed Stationary Distribution

15.4 Construction of PBNs from a Prescribed Transition Probability Matrix

15.5 Conclusions

Acknowledgments

References

Chapter 16: Boolean Analysis of Gene Expression Datasets

16.1 Introduction

16.2 Boolean Analysis

16.3 Main Organization

16.4 StepMiner

16.5 StepMiner Algorithm

16.6 BooleanNet

16.7 BooleanNet Algorithm

16.8 Conclusions

Acknowledgements

References

Part Four: Systems Approach to Diseases

Chapter 17: Representing Cancer Cell Trajectories in a Phase-Space Diagram: Switching Cellular States by Biological Phase Transitions

17.1 Introduction

17.2 Beyond Reductionism

17.3 Cell Shape as a Diagram of Forces

17.4 Morphologic Phenotypes and Phase Transitions

17.5 Cancer as an Anomalous Attractor

17.6 Shapes as System Descriptors

17.7 Fractals of Living Organisms

17.8 Fractals and Cancer

17.9 Modifications in Cell Shape Precede Tumor Metabolome Reversion

17.10 Conclusions

References

Chapter 18: Protein Network Analysis for Disease Gene Identification and Prioritization

18.1 Introduction

18.2 Protein Networks and Human Disease

18.3 ToppGene Suite of Applications

18.4 Conclusions

References

Chapter 19: Pathways and Networks as Functional Descriptors for Human Disease and Drug Response Endpoints

19.1 Introduction

19.2 Gene Content Classifiers and Functional Classifiers

19.3 Biological Pathways and Networks Have Different Properties as Functional Descriptors

19.4 Applications of Pathways as Functional Classifiers

19.5 Single Pathway Learning for Identifying Functional Descriptor Pathways

19.6 Multiple-Path Learning (MPL) Algorithm for Pathway Descriptors

19.7 Applications of MPL-Deduced Pathway Descriptors

19.8 Combining Advantages of Pathways and Networks

19.9 Key Upstream and Downstream Interactions of Genetically Altered Genes and “Universal Cancer Genes”

19.10 Conclusions

References

Index

Related Titles

Emmert-Streib, F., Dehmer, M. (eds.)

Medical Biostatistics for Complex Diseases

2010

ISBN: 978-3-527-32585-6

Dehmer, M., Emmert-Streib, F. (eds.)

Analysis of Complex Networks

From Biology to Linguistics

2009

ISBN: 978-3-527-32345-6

Emmert-Streib, F., Dehmer, M. (eds.)

Analysis of Microarray Data

A Network-Based Approach

2008

ISBN: 978-3-527-31822-3

Junker, B. H., Schreiber, F.

Analysis of Biological Networks

2008

ISBN: 978-0-470-04144-4

Stolovitzky, G., Califano, A. (eds.)

Reverse Engineering Biological Networks

Opportunities and Challenges in Computational Methods for Pathway Inference

2007

ISBN: 978-1-57331-689-7

The Editors

Matthias Dehmer

UMIT

Institute for Bioinformatics

and Translational Research

Eduard Wallnöfer Zentrum 1

6060 Hall, Tyrol

Austria

Frank Emmert-Streib

Queen's University Belfast

Center for Cancer Research and Cell Biology

97, Lisburn Road

Belfast BT9 7BL

United Kingdom

Armin Graber

UMIT

Institute for Bioinformatics

and Translational Research

Eduard Wallnöfer Zentrum 1

6060 Hall, Tyrol

Austria

and

Novartis Pharmaceuticals Corporation

Oncology Biomarkers and Imaging

One Health Plaza

East Hanover, NJ 07936

USA

Armindo Salvador

University of Coimbra

Center for Neuroscience and

Cell Biology, Department of Chemistry

3004-535 Coimbra

Portugal

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose.

No warranty can be created or extended by sales representatives or written sales materials. The Advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Card No.: applied for

British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library.

Bibliographic information published by the Deutsche Nationalbibliothek

The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.d-nb.de.

© 2011 Wiley-VCH Verlag & Co. KGaA,

Boschstr. 12, 69469 Weinheim, Germany

Wiley-Blackwell is an imprint of John Wiley & Sons, formed by the merger of Wiley's global Scientific, Technical, and Medical business with Blackwell Publishing.

All rights reserved (including those of translation into other languages). No part of this book may be reproduced in any form – by photoprinting, microfilm, or any other means – nor transmitted or translated into a machine language without written permission from the publishers. Registered names, trademarks, etc. used in this book, even when not specifically marked as such, are not to be considered unprotected by law.

ISBN: 978-3-527-32750-8

Preface

For the field of systems biology to mature, novel statistical and computational analysis methods are needed to deal with the growing amount of high-throughput data from genomics and genetics experiments. This book presents such methods and applications to data from biological and biomedical problems. Nowadays, it is widely recognized that networks form a very fruitful representation for studying problems in systems biology [1, 2]. However, many traditional methods do not make explicit use of a network representation of the data. For this reason, the topics treated in this book explore statistical and computational data analysis aspects of networks in systems biology [3–6].

Biological phenotypes are mediated by very intricate networks of interactions among biological components. This book covers extensively what we view as two complementary but strongly interrelated challenges in network biology. The first lies in inferring networks from experimental observations of state variables of a system. Interactions among molecular components are traditionally characterized through equilibrium binding or kinetic experiments in vitro with dilute solutions of the purified components. However, such experiments are typically low throughput and unable to properly account for the conditions prevailing in vivo, where factors such as molecular crowding, spatial heterogeneity, and the presence of ligands might strongly modify the interactions of interest. The possibility of inferring network connectivity and even quantitative interaction parameters from observations of intact living systems is attracting considerable research interest as a way of escaping such shortcomings. The fact that biological networks are complex, that problems are often poorly constrained, and that data are often high dimensional and noisy makes this challenge daunting. The second and perhaps equally difficult challenge lies in deriving results that are both biologically relevant and reliable from incomplete and uncertain information about biological interaction networks. We hope that the contributions in the subsequent chapters will help the reader understand and meet these challenges.

This book is intended for researchers and graduate and advanced undergraduate students in the interdisciplinary fields of computational biology, biostatistics, bioinformatics, and systems biology studying problems in biological and biomedical sciences. The book is organized in four main parts: Part One: Modeling, Simulation, and Meaning of Gene Networks; Part Two: Inference of Gene Networks; Part Three: Analysis of Gene Networks; and Part Four: Systems Approach to Diseases. Each part consists of chapters that emphasize the topic of the corresponding part without, however, being disconnected from the remainder of the book. Overall, to order the different parts we assumed an intuitive, problem-oriented perspective moving from Modeling, Simulation, and Meaning of Gene Networks to Inference of Gene Networks and Analysis of Gene Networks. The last part presents biomedical applications of various methods in Systems Approach to Diseases.

Each chapter is comprehensively presented and accessible not only to researchers from this field but also to advanced undergraduate or graduate students. For this reason, each chapter not only presents technical results but also provides the background knowledge necessary to understand the statistical method or the biological problem under consideration. This allows the book to be used as a textbook for an interdisciplinary seminar for advanced students, not only because of the comprehensiveness of the chapters but also because its size is sufficient to fill a complete semester.

Many colleagues, whether consciously or unconsciously, have provided us with input, help, and support before and during the preparation of this book. In particular, we would like to thank Andreas Albrecht, Gökmen Altay, Subhash Basak, Danail Bonchev, Maria Duca, Dean Fennell, Galina Glazko, Martin Grabner, Beryl Graham, Peter Hamilton, Des Higgins, Puthen Jithesh, Patrick Johnston, Frank Kee, Terry Lappin, Kang Li, D. D. Lozovanu, Dennis McCance, James McCann, Alexander Mehler, Abbe Mowshowitz, Ken Mills, Arcady Mushegian, Katie Orr, Andrei Perjan, Bert Rima, Brigitte Senn-Kircher, Ricardo de Matos Simoes, Francesca Shearer, Fred Sobik, John Storey, Simon Tavaré, Shailesh Tripathi, Kurt Varmuza, Bruce Weir, Pat White, Kathleen Williamson, Shu-Dong Zhang, and Dongxiao Zhu, and apologize to all whom we have mistakenly not named. We would also like to thank our editors Andreas Sendtko and Gregor Cicchetti from Wiley-VCH, who have always been available and helpful.

Finally, we hope that this book will help to spread the enthusiasm and joy we have for this field and inspire people in their own practical or theoretical research problems.

March 2011

Belfast, Hall/Tyrol, and Coimbra

Matthias Dehmer, Frank Emmert-Streib, Armin Graber, and Armindo Salvador

References

1. Barabasi, A.L. and Oltvai, Z.N. (2004) Network biology: understanding the cell's functional organization. Nat. Rev. Genet., 5, 101–113.

2. Emmert-Streib, F. and Glazko, G. (2011) Network biology: a direct approach to study biological function. WIREs Syst. Biol. Med., in press.

3. Alon, U. (2006) An Introduction to Systems Biology: Design Principles of Biological Circuits, Chapman & Hall/CRC.

4. Bertalanffy, L. von. (1950) An outline of general systems theory. Br. J. Philos. Sci., 1 (2)

5. Kitano, H. (ed.) (2001) Foundations of Systems Biology, MIT Press.

6. Palsson, B.O. (2006) Systems Biology: Properties of Reconstructed Networks, Cambridge University Press.

List of Contributors

Andreas Bernthaler

Vienna University of Technology

Institute of Computer Languages

Theory and Logics Group

Favoritenstrasse 9

1040 Vienna

Austria

Marina Bessarabova

Thomson Reuters

Healthcare & Life Sciences

169 Saxony Road

Encinitas, CA 92024

USA

Mariano Bizzarri

Sapienza University

Department of Experimental Medicine

Viale Regina Elena 324

00161 Rome

Italy

Gianluca Bontempi

Université Libre de Bruxelles

Computer Science Department

Machine Learning Group

Boulevard du Triomphe

1050 Brussels

Belgium

Gastone Castellani

Università di Bologna

Department of Physics

INFN Bologna Section and

Galvani Center for Biocomplexity

40127 Bologna

Italy

Jing Chen

University of Cincinnati

Department of Environmental Health

Cincinnati, OH 45229

USA

Xi Chen

The University of Hong Kong

Department of Mathematics

Pok Fu Lam Road

Hong Kong

China

Wai-Ki Ching

The University of Hong Kong

Department of Mathematics

Pok Fu Lam Road

Hong Kong

China

Zoltan Dezso

Thomson Reuters

Healthcare & Life Sciences

169 Saxony Road

Encinitas, CA 92024

USA

Cathy S. J. Fann

Academia Sinica

Institute of Biomedical Sciences

Academia Road, Nankang

115 Taipei

Taiwan

Raul Fechete

Emergentec Biodevelopment GmbH

Gersthofer Strasse 29-31

1180 Vienna

Austria

Rudolf Freund

Vienna University of Technology

Institute of Computer Languages

Theory and Logics Group

Favoritenstrasse 9

1040 Vienna

Austria

Alessandro Giuliani

Istituto Superiore di Sanità

Department of Environment

and Health

Viale Regina Elena 299

00161 Rome

Italy

Erich Gombocz

IO Informatics Inc.

2550 Ninth Street

Berkeley, CA 94710-2549

USA

Jing-Dong J. Han

Chinese Academy of Sciences

Institute of Genetics and

Developmental Biology

Center for Molecular Systems Biology

Key Laboratory of

Molecular Developmental Biology

Lincui East Road

100101 Beijing

China

and

Chinese Academy of Sciences–

Max Planck Partner Institute for

Computational Biology

Shanghai Institutes for

Biological Sciences

Chinese Academy of Sciences

320 Yue Yang Road

200031 Shanghai

China

Katsuhisa Horimoto

National Institute of Advanced

Industrial Science and Technology

Computational Biology Research Center

2-4-7, Aomi, Koto-ku

135-0064 Tokyo

Japan

Ching-Lin Hsiao

Academia Sinica

Institute of Biomedical Sciences

Academia Road, Nankang

115 Taipei

Taiwan

Jialiang Huang

Chinese Academy of Sciences

Institute of Genetics and

Developmental Biology

Center for Molecular Systems Biology

Key Laboratory of

Molecular Developmental Biology

Lincui East Road

100101 Beijing

China

Anil G. Jegga

Cincinnati Children's Hospital

Medical Center

Division of Biomedical Informatics

Cincinnati, OH 45229

USA

and

University of Cincinnati

Department of Biomedical Engineering

Cincinnati, OH 45229

USA

and

University of Cincinnati

College of Medicine

Department of Pediatrics

Cincinnati, OH 45229

USA

Eugene Kirillov

Thomson Reuters

Healthcare & Life Sciences

169 Saxony Road

Encinitas, CA 92024

USA

Younhee Ko

University of Illinois at

Urbana-Champaign

Department of Animal Sciences

1207 W. Gregory Dr.

Urbana, IL 61801

USA

and

University of Illinois at

Urbana-Champaign

Institute for Genomic Biology

1205 W. Gregory Drive

Urbana, IL 61801

USA

Rainer König

University of Heidelberg

Institute of Pharmacy and Molecular

Biotechnology

Bioquant

Im Neuenheimer Feld 267

69120 Heidelberg

Germany

Xiangfang Li

Texas A&M University

Genomic Signal Processing Laboratory

TAMU 3128

College Station, TX 77843

USA

Arno Lukas

Emergentec Biodevelopment GmbH

Gersthofer Strasse 29-31

1180 Vienna

Austria

Bernd Mayer

Emergentec Biodevelopment GmbH

Gersthofer Strasse 29-31

1180 Vienna

Austria

Patrick E. Meyer

Université Libre de Bruxelles

Computer Science Department

Machine Learning Group

Boulevard du Triomphe

1050 Brussels

Belgium

Konrad Mönks

Vienna University of Technology

Institute of Computer Languages

Theory and Logics Group

Favoritenstrasse 9

1040 Vienna

Austria

and

Emergentec Biodevelopment GmbH

Gersthofer Strasse 29-31

1180 Vienna

Austria

Irmgard Mühlberger

Emergentec Biodevelopment GmbH

Gersthofer Strasse 29-31

1180 Vienna

Austria

Tatiana Nikolskaya

Thomson Reuters

Healthcare & Life Sciences

169 Saxony Road

Encinitas, CA 92024

USA

Yuri Nikolsky

Thomson Reuters

Healthcare & Life Sciences

169 Saxony Road

Encinitas, CA 92024

USA

Catharina Olsen

Université Libre de Bruxelles

Computer Science Department

Machine Learning Group

Boulevard du Triomphe

1050 Brussels

Belgium

Paul Perco

Emergentec Biodevelopment GmbH

Gersthofer Strasse 29-31

1180 Vienna

Austria

Kitiporn Plaimas

University of Heidelberg

Institute of Pharmacy and Molecular

Biotechnology

Bioquant

Im Neuenheimer Feld 267

69120 Heidelberg

Germany

Thomas N. Plasterer

Northeastern University

Department of Chemistry and

Chemical Biology

360 Huntington Ave.

Boston, MA 02115

USA

and

Pharmacogenetics Clinical Advisory

Board

2000 Commonwealth Avenue, Suite 200

Auburndale, MA 02466

USA

Lijun Qian

Texas A&M University System

Prairie View A&M University

Department of Electrical and

Computer Engineering

MS2520, POB 519

Prairie View, TX 77446

USA

Daniel Remondini

Università di Bologna

Department of Physics

INFN Bologna Section and

Galvani Center for Biocomplexity

40127 Bologna

Italy

Sandra Rodriguez-Zas

University of Illinois at

Urbana-Champaign

Department of Animal Sciences

1207 W. Gregory Drive

Urbana, IL 61801

USA

Debashis Sahoo

Instructor of Pathology and Siebel

Fellow at Institute of Stem Cell Biology

and Regenerative Medicine

Lorry I. Lokey Stem Cell Research

Building

265 Campus Drive, Rm G3101B

Stanford, CA 94305

USA

Shigeru Saito

Infocom Corp.

Chem & Bio Informatics Department

Sumitomo Fudosan Harajuku Building

2-34-17, Jingumae, Shibuya-ku

150-0001 Tokyo

Japan

Robert Stanley

IO Informatics Inc.

2550 Ninth Street

Berkeley, CA 94710-2549

USA

Gautam S. Thakur

University of Florida

Department of Computer and

Information Science and Engineering

PO Box 116120

Gainesville, FL 32611-6120

USA

Tianhai Tian

University of Glasgow

Department of Mathematics

University Gardens

Glasgow G12 8QW

UK

Nam-Kiu Tsing

The University of Hong Kong

Department of Mathematics

Pok Fu Lam Road

Hong Kong

China

Eberhard O. Voit

Georgia Tech and Emory University

The Wallace H. Coulter Department

of Biomedical Engineering

313 Ferst Drive

Atlanta, GA 30332

USA

Haixin Wang

Fort Valley State University

Department of Mathematics and

Computer Science

CTM 101A

Fort Valley, GA 31030

USA

Matthew Weirauch

University of Toronto

Banting and Best Department

of Medical Research and

Donnelly Centre for

Cellular and Biomolecular Research

160 College Street

Toronto, ON, M5S 3E1

Canada

Hong Yu

Chinese Academy of Sciences

Institute of Genetics and

Developmental Biology

Center for Molecular Systems Biology

Key Laboratory of

Molecular Developmental Biology

Lincui East Road

100101 Beijing

China

Wei Zhang

Chinese Academy of Sciences

Institute of Genetics and

Developmental Biology

Center for Molecular Systems Biology

Key Laboratory of

Molecular Developmental Biology

Lincui East Road

100101 Beijing

China

Part One

Modeling, Simulation, and Meaning of Gene Networks

Chapter 2

Stochastic Modeling of Gene Regulatory Networks

Tianhai Tian

2.1 Introduction

Recent experimental studies have indicated that noise plays a very important role in determining the dynamic behavior of biological systems. Since the research work on stochastic modeling of the regulatory network of phage λ [1, 2], there have been an increasing number of studies in the last decade investigating the origins of noise in biological networks and its crucial role in determining the key properties of biological networks [3–5]. Experimental studies have demonstrated that noise in cellular processes may result from a small number of molecular species, intermittent gene activity, and fluctuations of experimental conditions [6–9]. These empirical discoveries have stimulated explosive research interest in developing stochastic models for a wide range of biological systems, including gene regulatory networks [10–12], cell signaling pathways [13–15], and metabolic pathways [16, 17].

It has been proposed that noise in the form of random fluctuations arises in biological networks in one of two ways: internal (intrinsic) noise or external (extrinsic) noise [18, 19]. The internal noise is mainly derived from the chance events of biochemical reactions in the system due to small copy numbers of certain key molecular species. External noise mainly refers to the environmental fluctuations or the noise propagation from the upstream biological pathways. In addition, there are two major types of response of biological systems to noise. In the first case, living systems are optimized to function in the presence of stochastic fluctuations, and biochemical networks must withstand considerable variations and random perturbations of biochemical parameters [20–22]. Such a property of biological systems is known as “robustness” [23, 24]. On the other hand, biological systems are also sensitive to environmental fluctuations and/or intrinsic noise in certain time periods. For example, noise in gene expression could lead to qualitative differences in a cell's phenotype if the expressed genes act as inputs to downstream regulatory thresholds [8, 25, 26].

One of the major challenges in systems biology is the development of quantitative mathematical models for studying regulatory mechanisms in complex biological systems [27]. Although deterministic models have been widely used for analyzing gene regulatory networks, cell signaling pathways, and metabolic systems [28, 29], a deterministic model can only describe the averaged behavior of a system based on large populations; it cannot capture fluctuations of the system behavior in different cells. Recently, there has been accelerating interest in investigating the effect of noise in genetic regulation through stochastic modeling. Although stochastic models have been developed based on detailed knowledge of biochemical reactions, data availability and regulatory information usually cannot provide a comprehensive picture of biological regulations. In recent years, a number of approaches have been proposed to develop either continuous or discrete stochastic models for the study of noise in large-scale gene regulatory networks. These methods include stochastic Boolean models [30, 31], probabilistic hybrid approaches [32], stochastic Petri nets [33, 34], stochastic differential equations (SDEs) [35, 36], and multiscale (hybrid) models that include both stochastic and deterministic dynamics [37, 38].

Systems of ordinary differential equations (ODEs) have been widely used to model biological systems and there are a large number of well-developed deterministic models for a broad range of biological systems. An important question in stochastic modeling is how to develop stochastic models by introducing stochastic processes into deterministic models for the external and/or internal noise. This chapter will use a number of modeling approaches and biological systems to address this issue. The remaining part of this chapter is organized as follows. Section 2.2 discusses numerical methods for simulating chemical reaction systems. These methods are the theoretical basis for designing stochastic models in the following sections. A general modeling approach for developing discrete stochastic models is discussed in Section 2.3. Section 2.4 provides a number of techniques for designing continuous stochastic models by using SDEs.

2.2 Discrete Stochastic Simulation Methods

Since many cellular processes are governed by effects associated with small numbers of certain key molecules, the standard chemical framework described by systems of ODEs breaks down. The stochastic simulation algorithm (SSA) represents a discrete modeling approach and an essentially exact procedure for numerically simulating the time evolution of a well-stirred reaction system [39]. The advances in stochastic modeling of gene regulatory networks and cell signal transduction pathways have stimulated growing research interest in the development of efficient methods for simulating chemical reaction systems. These simulation methods have in turn provided innovative methodologies for designing stochastic models of biological systems.

2.2.1 SSA

It is assumed that a chemical reaction system is a well-stirred mixture at constant temperature in a fixed volume $\Omega$. This mixture consists of $N \ge 1$ molecular species $\{S_1, \ldots, S_N\}$ that chemically interact through $M \ge 1$ reaction channels $\{R_1, \ldots, R_M\}$. The dynamic state of this system is denoted as $x(t) = (x_1(t), \ldots, x_N(t))^T$, where $x_i(t)$ is the molecular number of species $S_i$ in the system at time $t$. For each reaction $R_j$ ($j = 1, \ldots, M$), a propensity function $a_j(x)$ is defined for a given state $x(t) = x$ and the value of $a_j(x)\,dt$ represents the probability that one reaction $R_j$ will fire somewhere inside $\Omega$ in the infinitesimal time interval $[t, t + dt)$. In addition, a state change vector $\nu_j$ is defined to characterize reaction $R_j$. The element $\nu_{ij}$ of $\nu_j$ represents the change in the copy number of species $S_i$ due to reaction $R_j$. The $N \times M$ matrix $\nu$ with elements $\nu_{ij}$ is called the stoichiometric matrix.
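To make the notation concrete, the following minimal sketch encodes a state vector, propensity functions, state change vectors, and the resulting stoichiometric matrix for a small system. The two-species gene expression model, its rate constants, and all function names are hypothetical illustrations and are not taken from the chapter.

```python
# Hypothetical two-species gene expression model (illustrative only).
# Species: S1 = mRNA, S2 = protein. Rate constants are assumed values.
K_TX, D_M, K_TL, D_P = 2.0, 0.1, 1.0, 0.05


def propensities(x):
    """Return the propensity values for state x = (mRNA, protein)."""
    m, p = x
    return [
        K_TX,      # R1: 0 -> mRNA (transcription)
        D_M * m,   # R2: mRNA -> 0 (mRNA degradation)
        K_TL * m,  # R3: mRNA -> mRNA + protein (translation)
        D_P * p,   # R4: protein -> 0 (protein degradation)
    ]


# State change vectors, one per reaction channel.
STATE_CHANGES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

# Stoichiometric matrix: one row per species, one column per reaction.
STOICH = [list(row) for row in zip(*STATE_CHANGES)]


def apply_reaction(x, j):
    """Return the state after reaction channel j (0-based) fires once."""
    return tuple(xi + d for xi, d in zip(x, STATE_CHANGES[j]))
```

Storing the state change vectors per reaction and deriving the matrix by transposition keeps the two views consistent; either representation could be used directly in a simulation loop.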

The SSA is a statistically exact procedure for generating the time and index of the next occurring reaction in accordance with the current values of the propensity functions. In each time step, two random numbers are generated to determine the time step and the index of the next reaction. There are several forms of this algorithm. The widely used direct method works as described in Method 2.1.

Method 2.1 Direct Method [39]

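Step 1: Calculate the values of propensity functions $a_j(x)$ based on the system state $x$ at time $t$ and $a_0(x) = \sum_{j=1}^{M} a_j(x)$.

Step 2: Generate a sample $r_1$ of the uniformly distributed random variable $U(0, 1)$ and determine the time of the next reaction:

$$\tau = \frac{1}{a_0(x)} \ln\!\left(\frac{1}{r_1}\right)$$

Step 3: Generate an independent sample $r_2$ of $U(0, 1)$ to determine the index $k$ of the next reaction occurring in $[t, t + \tau)$:

$$\sum_{j=1}^{k-1} a_j(x) < r_2\, a_0(x) \le \sum_{j=1}^{k} a_j(x)$$

Step 4: Update the state of the system by:

$$x(t + \tau) = x(t) + \nu_k \tag{2.1}$$

Step 5: Go to Step 1 if $t + \tau \le T$ (with $t$ replaced by $t + \tau$), where $T$ is the end time point. Otherwise, the system state at $T$ is $x(T) = x(t)$.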
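The steps of the direct method can be sketched in a few lines of Python. This is a minimal illustration that uses only the standard library. The function name `ssa_direct`, its argument layout, and the birth-death test system in the usage note are assumptions made for the example, not part of the chapter.

```python
import math
import random


def ssa_direct(x0, propensities, state_changes, t_end, rng=None):
    """Simulate one trajectory with the direct method.

    propensities(x) returns a list of propensity values; state_changes[j]
    is the state change vector of reaction j. Both are caller-supplied.
    """
    rng = rng or random.Random()
    t, x = 0.0, list(x0)
    times, states = [t], [tuple(x)]
    while True:
        a = propensities(x)                  # Step 1: propensities and their sum
        a0 = sum(a)
        if a0 <= 0.0:                        # no reaction can fire any more
            break
        r1 = 1.0 - rng.random()              # uniform sample in (0, 1]
        tau = math.log(1.0 / r1) / a0        # Step 2: time to the next reaction
        if t + tau > t_end:                  # Step 5: stop at the end time
            break
        target = (1.0 - rng.random()) * a0   # Step 3: pick the reaction index
        k, cumulative = 0, a[0]
        while cumulative < target and k < len(a) - 1:
            k += 1
            cumulative += a[k]
        x = [xi + d for xi, d in zip(x, state_changes[k])]  # Step 4: update
        t += tau
        times.append(t)
        states.append(tuple(x))
    return times, states
```

For example, a hypothetical birth-death process with constant production and first-order decay could be run as `ssa_direct([0], lambda x: [5.0, 0.1 * x[0]], [(1,), (-1,)], 20.0)`. Passing a seeded `random.Random` instance makes a run reproducible.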

Another exact method is the first reaction method, which uses $M$ random numbers at each step to determine the possible reaction time of each reaction channel [40]. The reaction that fires in the next step is the one with the smallest reaction time. Compared to the direct method, the first reaction method is less efficient since it discards $M - 1$ random numbers at each step. To improve the efficiency of the first reaction method, Gibson and Bruck [41] proposed the next reaction method, which recycles the generated random numbers. The putative step size of a reaction channel is updated based on the step size of this channel at the previous step and the values of the propensity function at these two steps. In addition, a so-called dependency graph was designed to reduce the computing time of propensity functions. Numerical results indicated that the next reaction method is efficient for simulating systems with many species and reaction channels.
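A single step of the first reaction method might look as follows. This is a minimal sketch under the usual assumption that the putative time of each channel is exponentially distributed with rate equal to its propensity. The function name and return convention are hypothetical, and the recycling of random numbers and the dependency graph of the next reaction method are not shown.

```python
import math
import random


def first_reaction_step(x, propensities, state_changes, rng):
    """Perform one step of the first reaction method.

    Returns (tau, j, new_state), or None if no reaction can fire.
    """
    a = propensities(x)
    best_tau, best_j = math.inf, None
    for j, aj in enumerate(a):
        if aj <= 0.0:
            continue  # an inactive channel has an infinite putative time
        # One random number per active channel gives its putative firing time.
        tau_j = math.log(1.0 / (1.0 - rng.random())) / aj
        if tau_j < best_tau:
            best_tau, best_j = tau_j, j
    if best_j is None:
        return None
    # Only the smallest putative time is used; the others are discarded.
    new_x = [xi + d for xi, d in zip(x, state_changes[best_j])]
    return best_tau, best_j, new_x
```

The loop makes the cost visible: every active channel consumes a random number at each step, although only one of the resulting times is kept.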

The SSA assumes that the next reaction will fire in the next reaction time interval $[t, t + \tau)$ with small values of $\tau$. For systems including both fast and slow reactions, however, this assumption may not be valid if the slow reactions take a much longer time than the fast reactions. The long reaction time of slow reactions should be represented by a time delay if we hope to treat both fast and slow reactions in a system consistently and to study the impact of slow reactions on the system dynamics [42]. Recently, the delay stochastic simulation algorithm (DSSA) was designed to simulate chemical reaction systems with time delays [43–45]. These methods have been used to validate stochastic models for biological systems with slow reactions [46, 47]. However, compared with the significant progress in designing simulation methods for biological systems without time delay [48, 49], only a few simulation methods have been designed to improve the efficiency of the DSSA [50, 51]. As with the efficient methods for simulating biological systems without time delay, progress in designing efficient methods for simulating systems with time delay is expected to provide methodologies for modeling biological systems with time delay.

2.2.2 Accelerating $\tau$-Leap Methods

Since the SSA can be very computationally inefficient, considerable attention has been paid recently to reducing the computational time for simulating stochastic chemical kinetics. Gillespie [52] proposed the $\tau$-leap methods in order to improve the efficiency of the SSA while maintaining acceptable losses in accuracy. The key idea of the $\tau$-leap methods is to take a larger time step and allow more reactions to take place in that step. In the Poisson $\tau$-leap method, the number of times that the reaction channel $R_j$ will fire in the time interval $[t, t+\tau)$ is approximated by a Poisson random variable $P(a_j(x)\tau)$ ($j = 1, \ldots, M$) based on the present state $x$ at time $t$ [52]. Here, the leap size $\tau$ should satisfy the Leap Condition: a temporal leap by $\tau$ will result in a state change $\lambda$ such that for every reaction channel $R_j$, $|a_j(x+\lambda) - a_j(x)|$ is “effectively infinitesimal” [52]. This method is given in Method 2.2.

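Method 2.2 Poisson $\tau$-Leap Method [52]

Step 1: Calculate the values of propensity functions $a_j(x)$ ($j = 1, \ldots, M$) based on the system state $x$ at time $t$.

Step 2: Choose a value for the leap size $\tau$ that satisfies the Leap Condition.

Step 3: Generate a sample value of the Poisson random variable $P(a_j(x)\tau)$ for each reaction channel $R_j$ ($j = 1, \ldots, M$).

Step 4: Perform the updates of the system by:

$$x(t+\tau) = x(t) + \sum_{j=1}^{M} \nu_j P(a_j(x(t))\tau), \tag{2.2}$$

where $\nu_j$ is the state change vector of reaction channel $R_j$.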
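Steps 1, 3, and 4 of Method 2.2 can be sketched as follows, with the leap size of Step 2 supplied by the caller. This is a minimal illustration under stated assumptions: the names `poisson_sample` and `poisson_tau_leap_step`, the simple arrival-counting Poisson sampler, and the birth-death test system are all hypothetical choices made for this example.

```python
import random

def poisson_sample(rng, mean):
    """Sample a Poisson variate by counting unit-rate arrivals in [0, mean)."""
    count = 0
    arrival = rng.expovariate(1.0)
    while arrival < mean:
        count += 1
        arrival += rng.expovariate(1.0)
    return count

def poisson_tau_leap_step(x, tau, propensities, stoich, rng):
    """One Poisson tau-leap update for a caller-supplied leap size tau."""
    # Step 1: propensities at the current state.
    a = [a_j(x) for a_j in propensities]
    # Step 3: number of firings of each channel during the leap.
    k = [poisson_sample(rng, a_j * tau) for a_j in a]
    # Step 4: apply all firings at once.
    new_x = list(x)
    for k_j, nu_j in zip(k, stoich):
        for i, nu_ji in enumerate(nu_j):
            new_x[i] += k_j * nu_ji
    return new_x, k

# Hypothetical birth-death system: 0 -> S (rate 5), S -> 0 (rate 0.1 * x).
rng = random.Random(42)
propensities = [lambda x: 5.0, lambda x: 0.1 * x[0]]
stoich = [[+1], [-1]]
x0 = [100]
x1, firings = poisson_tau_leap_step(x0, 0.5, propensities, stoich, rng)
```

Nothing in this sketch prevents a species count from becoming negative when molecule numbers are small, because the Poisson samples are unbounded.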

A major step of the Poisson $\tau$-leap method is to choose an appropriate step size that satisfies the Leap Condition. Gillespie first proposed a simple procedure to determine the leap size [52]. In this procedure, the expected change of each propensity function during $[t, t+\tau)$ should be bounded by $\epsilon a_0(x)$ with a given error control parameter $\epsilon$:

$$|a_j(x+\bar{\lambda}) - a_j(x)| \le \epsilon a_0(x), \quad j = 1, \ldots, M, \tag{2.3}$$

where $a_0(x) = \sum_{j=1}^{M} a_j(x)$ and $\bar{\lambda}$ is the expected net change in state in $[t, t+\tau)$, which can be calculated by $\bar{\lambda} = \tau \sum_{j=1}^{M} a_j(x)\nu_j$. Later, more sophisticated methods were proposed either to select the optimal leap size or to avoid possible negative molecular numbers in simulation. For example, Gillespie and Petzold [53] proposed a method that considers both the mean and the standard deviation of the expected change in the propensity functions. This method extends the procedure of Equation 2.3, which considers only the mean of the expected change. It is worth noting that the leap size $\tau$ is a preselected deterministic value and is determined by the error control parameter $\epsilon$. As in many other numerical methods, the leap size governs the balance between computational efficiency and accuracy. In addition, our simulation results [54] indicated that the computing time for selecting the leap size is about half of the total computing time when using the method of Gillespie and Petzold [53].
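The simple leap-size procedure of [52] can be illustrated by linearizing each propensity function around the current state, which gives the largest leap for which the expected change of every propensity stays within $\epsilon a_0(x)$. The sketch below is an assumption-laden illustration: the name `select_leap_size`, the default $\epsilon = 0.03$, and the one-molecule forward difference used in place of analytic partial derivatives are choices made here, not prescriptions from the text.

```python
import math

def select_leap_size(x, propensities, stoich, eps=0.03, h=1.0):
    """Largest tau with |expected change of a_j| <= eps * a_0 (first order)."""
    a = [a_j(x) for a_j in propensities]
    a0 = sum(a)
    n = len(x)
    # Expected net change of each species per unit time: sum_j a_j * nu_j.
    xi = [sum(a_j * nu_j[i] for a_j, nu_j in zip(a, stoich)) for i in range(n)]
    tau = math.inf
    for f, a_j in zip(propensities, a):
        # Directional derivative of a_j along xi (assumed forward difference).
        drift = 0.0
        for i in range(n):
            shifted = list(x)
            shifted[i] += h
            drift += xi[i] * (f(shifted) - a_j) / h
        if drift != 0.0:
            tau = min(tau, eps * a0 / abs(drift))
    return tau

# Hypothetical birth-death system: 0 -> S (rate 5), S -> 0 (rate 0.1 * x).
propensities = [lambda x: 5.0, lambda x: 0.1 * x[0]]
stoich = [[+1], [-1]]
tau = select_leap_size([100], propensities, stoich, eps=0.03)
```

For this system at $x = 100$, the propensities are 5 and 10, the expected net change rate is $-5$, and only the degradation propensity depends on the state, so the bound reduces to $\tau \le 0.03 \times 15 / 0.5$, that is, about 0.9.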

Since the samples of a Poisson random variable are unbounded, negative molecular numbers may be obtained if certain species have small molecular numbers and a propensity function involving those species has a large value. Negative molecular numbers can arise in two ways in stochastic simulations [55]. In the first case, the generated sample of the reaction number is greater than one of the molecular numbers of the reactants in that reaction channel. In the second case, a species is involved in several reaction channels and the total reaction number of these channels is greater than the copy number of that species, although the reaction number of each channel may be smaller than the molecular number.

To tackle the problem of negative numbers, binomial random variables were introduced to avoid the negative numbers of the first case by restricting the possible reaction numbers in the next time interval [55, 56]. In the binomial $\tau$-leap method, the reaction number of channel $R_j$ is defined by a sample value of the binomial random variable $B(N_j, a_j(x)\tau/N_j)$ under the condition $0 \le a_j(x)\tau/N_j \le 1$. The maximal possible reaction number $N_j$ has been defined for the three widely used types of elementary reactions. In addition, a sampling technique was designed for sampling the total reaction number of a group of reaction channels if a reactant species is involved in all of these reaction channels [55]. The binomial $\tau$-leap method is given in Method 2.3.

Method 2.3 Binomial $\tau$-Leap Method [55]

Step 0: Define the maximal possible reaction number $N_j$ for each reaction channel $R_j$. If a species is involved in two or more reaction channels, define a maximal possible total reaction number for these reaction channels.

Step 1: Calculate the values of propensity functions $a_j(x)$ based on the system state $x$ at time $t$.

Step 2: Use a method to determine the value of leap size $\tau$. Check the step size conditions of the binomial random variables. If necessary, reduce the step size to satisfy these conditions.

Step 3: Generate a sample value of the binomial random variable $B(N_j, a_j(x)\tau/N_j)$ for reaction channels whose species are involved in one single reaction. When a species is involved in two or more reaction channels, generate a total reaction number for these reaction channels and then generate the reaction number of each reaction channel in this group.

Step 4: Perform the updates of the system by:

$$x(t+\tau) = x(t) + \sum_{j=1}^{M} \nu_j k_j, \tag{2.4}$$

where $k_j$ is the reaction number of channel $R_j$ generated in Step 3.
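The simplest case of Method 2.3, in which every species is a reactant in at most one reaction channel, can be sketched as follows. The sketch does not cover the grouped sampling for species shared by several channels. The names `binomial_sample` and `binomial_tau_leap_step`, the rule of halving the leap size until the binomial probabilities are valid, and the two-step decay chain are assumptions made for illustration.

```python
import random

def binomial_sample(rng, n, p):
    """Sample B(n, p) as a sum of n Bernoulli trials (simple, not fast)."""
    return sum(1 for _ in range(n) if rng.random() < p)

def binomial_tau_leap_step(x, tau, propensities, max_firings, stoich, rng):
    """One binomial tau-leap update; each species reacts in one channel only."""
    a = [a_j(x) for a_j in propensities]       # Step 1
    n_max = [n_j(x) for n_j in max_firings]    # Step 0, evaluated at x
    # Step 2: halve tau until every probability a_j * tau / N_j is <= 1.
    while any(n_j > 0 and a_j * tau > n_j for a_j, n_j in zip(a, n_max)):
        tau *= 0.5
    # Step 3: bounded number of firings for each channel.
    k = [binomial_sample(rng, n_j, a_j * tau / n_j) if n_j > 0 else 0
         for a_j, n_j in zip(a, n_max)]
    # Step 4: apply all firings at once.
    new_x = list(x)
    for k_j, nu_j in zip(k, stoich):
        for i, nu_ji in enumerate(nu_j):
            new_x[i] += k_j * nu_ji
    return new_x, tau, k

# Hypothetical chain S1 -> S2 (rate 2 * x1), S2 -> 0 (rate 1 * x2).
rng = random.Random(7)
propensities = [lambda x: 2.0 * x[0], lambda x: 1.0 * x[1]]
max_firings = [lambda x: x[0], lambda x: x[1]]  # first-order: N_j = reactant count
stoich = [[-1, +1], [0, -1]]
x1, tau_used, firings = binomial_tau_leap_step(
    [5, 3], 1.0, propensities, max_firings, stoich, rng)
```

Because each reaction number is bounded by the copy number of its reactant, no species count can become negative in this restricted setting, whatever leap size is requested.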

In the $\tau$-leap methods, it is assumed that, during a preselected time step $\tau$, the number of firings of each reaction channel is a sample of a random variable. Another major type of leap method is the $k_\alpha$-leap method [52]. In an implementation of the $k_\alpha$-leap method, the so-called $R$-leap method [57], it was proposed to select a predefined number $L$ of firings that may span several reaction channels. Then the leap step $\tau_L$ of these $L$ reactions is a sample of the Γ random variable $\Gamma(L, 1/a_0(x))$. Over the time interval $[t, t+\tau_L)$, the number of firings $K_j$ of each reaction channel (satisfying $\sum_{j=1}^{M} K_j = L$) follows correlated binomial distributions. A number of techniques have been proposed in the $R$-leap method to determine the total reaction number $L$ and to sample the firing number of each reaction channel [57]. A similar approach, called the $K$-leap method, was also proposed to improve computing efficiency over the exact SSA [58].
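The leap with a predefined total number of firings described above can be sketched as follows: the leap duration is a Γ sample and the firings are split among the channels by sequential conditional binomials, which yields correlated counts that sum to the chosen total. This is a bare illustration that omits the controls proposed in [57] for choosing the total number of firings and for avoiding negative populations. The function names, the symbol `total_firings`, and the test system are hypothetical.

```python
import random

def binomial_sample(rng, n, p):
    """Sample B(n, p) as a sum of n Bernoulli trials (simple, not fast)."""
    return sum(1 for _ in range(n) if rng.random() < p)

def fixed_firings_leap(x, total_firings, propensities, stoich, rng):
    """Leap over a fixed total number of firings, with x frozen in the leap."""
    a = [a_j(x) for a_j in propensities]
    a0 = sum(a)
    if a0 <= 0.0:
        raise ValueError("no reaction can fire in this state")
    # Time needed for total_firings events: Gamma(shape, scale = 1 / a0).
    tau = rng.gammavariate(total_firings, 1.0 / a0)
    # Split the firings among channels with conditional binomials.
    k, remaining = [], total_firings
    for j, a_j in enumerate(a):
        tail = sum(a[j:])  # total propensity of channels j, j+1, ...
        p = min(1.0, a_j / tail) if tail > 0.0 else 0.0
        k_j = binomial_sample(rng, remaining, p)
        k.append(k_j)
        remaining -= k_j
    new_x = list(x)
    for k_j, nu_j in zip(k, stoich):
        for i, nu_ji in enumerate(nu_j):
            new_x[i] += k_j * nu_ji
    return new_x, tau, k

# Hypothetical birth-death system: 0 -> S (rate 5), S -> 0 (rate 0.1 * x).
rng = random.Random(5)
propensities = [lambda x: 5.0, lambda x: 0.1 * x[0]]
stoich = [[+1], [-1]]
x1, tau, firings = fixed_firings_leap([1000], 20, propensities, stoich, rng)
```

In contrast to the $\tau$-leap methods, the number of firings is fixed in advance and the elapsed time is the random quantity.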

2.2.3 Langevin Approach

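When the molecular numbers ($x_i$) in a chemical reaction system are quite large, the value of $a_j(x)\tau$ in the Poisson $\tau$-leap method may be large for an appropriately selected step size $\tau$. In this case, the Poisson random variable can be approximated by a normal random variable with the same mean and variance, given by [59]:

$$P(a_j(x)\tau) \approx N(a_j(x)\tau, a_j(x)\tau) = a_j(x)\tau + \sqrt{a_j(x)\tau}\, N(0, 1).$$

Then the Poisson $\tau$-leap method (2.2) can be approximated by the following formula with normal random variables:

$$x(t+\tau) = x(t) + \sum_{j=1}^{M} \nu_j a_j(x(t))\tau + \sum_{j=1}^{M} \nu_j \sqrt{a_j(x(t))\tau}\, N_j(0, 1), \tag{2.5}$$

where the normal variables $N_j(0, 1)$ are all statistically independent. The above scheme is the explicit Euler method [60] for solving the chemical Langevin equation:

$$dx(t) = \sum_{j=1}^{M} \nu_j a_j(x(t))\, dt + \sum_{j=1}^{M} \nu_j \sqrt{a_j(x(t))}\, dW_j(t),$$

where the $W_j(t)$ are independent Wiener processes. If the molecular numbers in the system are very large, the value of $a_j(x)\tau$ may still be very large for a given step size $\tau$. In this case, compared with the drift term $a_j(x)\tau$, the diffusion term $\sqrt{a_j(x)\tau}$ in (2.5) is negligible. Finally, we obtain the explicit Euler method for solving the chemical rate equation:

$$\frac{dx(t)}{dt} = \sum_{j=1}^{M} \nu_j a_j(x(t)).$$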

The chemical Langevin equation links three types of important modeling regimes, namely the discrete stochastic models simulated by the SSA or Poisson $\tau$-leap method, continuous stochastic models in terms of SDEs, and continuous deterministic systems of ODEs. In addition, the Langevin approach provides a method to describe internal noise of chemical reactions in the continuous SDE framework. When a reaction system has relatively large molecular numbers, the SDE models can be used to describe the system dynamics more efficiently than the discrete stochastic models. The chemical Langevin equation is also the theoretical basis of the multiscale simulation methods [61, 62]. Based on the molecular numbers and values of propensity functions, chemical reactions can be partitioned into a few reaction subsets at different time steps and then different simulation methods can be employed to simulate different subsets of chemical reactions. For example, Burrage et al. [63] proposed an adaptive approach to divide a reaction system into slow, intermediate, and fast reaction subsets, and used the SSA, Poisson $\tau$-leap method, and SDEs to simulate the reactions in different subsets. Different partitioning techniques and different simulation methods have led to a number of effective methods and software for simulating chemical reaction systems [64–66].
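The link between the three regimes can be made concrete by writing one leap of each with the same propensities: the regimes differ only in how the number of firings per channel is represented. The sketch below is illustrative, and the function `leap_step`, its `regime` argument, and the birth-death system are assumptions made for this example. The Langevin branch is a single explicit Euler step of the chemical Langevin equation.

```python
import math
import random

def poisson_sample(rng, mean):
    """Sample a Poisson variate by counting unit-rate arrivals in [0, mean)."""
    count = 0
    arrival = rng.expovariate(1.0)
    while arrival < mean:
        count += 1
        arrival += rng.expovariate(1.0)
    return count

def leap_step(x, tau, propensities, stoich, rng, regime):
    """One step of size tau in the chosen modeling regime."""
    a = [a_j(x) for a_j in propensities]
    if regime == "poisson":      # discrete stochastic: Poisson tau-leap
        k = [poisson_sample(rng, a_j * tau) for a_j in a]
    elif regime == "langevin":   # continuous stochastic: normal approximation
        k = [a_j * tau + math.sqrt(a_j * tau) * rng.gauss(0.0, 1.0) for a_j in a]
    elif regime == "ode":        # continuous deterministic: drift term only
        k = [a_j * tau for a_j in a]
    else:
        raise ValueError("unknown regime: " + regime)
    new_x = [float(x_i) for x_i in x]
    for k_j, nu_j in zip(k, stoich):
        for i, nu_ji in enumerate(nu_j):
            new_x[i] += k_j * nu_ji
    return new_x

# Hypothetical birth-death system: 0 -> S (rate 50), S -> 0 (rate 0.1 * x).
rng = random.Random(3)
propensities = [lambda x: 50.0, lambda x: 0.1 * x[0]]
stoich = [[+1], [-1]]
results = {regime: leap_step([1000], 0.1, propensities, stoich, rng, regime)
           for regime in ("poisson", "langevin", "ode")}
```

The two stochastic regimes have the same mean and variance per channel, and the deterministic step is what remains when the noise term is dropped.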

2.3 Discrete Stochastic Modeling

Due to the lack of detailed knowledge of biochemical reactions, kinetic rates, and molecular numbers, stochastic models based on elementary chemical reactions may not always be the practical method to study chemical reaction systems. This section discusses a general approach to develop stochastic models based on widely used deterministic ODE models [67]. Instead of studying noise from detailed information of biochemical reactions, stochastic models will be developed by using macroscopic variables at some intermediate levels. Based on the stochastic simulation methods discussed in the previous section, the key idea of this method is to use Poisson random variables to represent chance events in protein synthesis, degradation, molecular diffusion and other biological processes. This technique is also consistent with other stochastic modeling approaches where Poisson random variables have been used for realizing the chance events in transcription and translation [68].

2.3.1 Stochastic Modeling Method

We first use a simple system to illustrate the relationship between a stochastic model, simulated by the Poisson $\tau$-leap method, and the corresponding deterministic ODE model, simulated by the Euler method. This system includes two reactions:

(2.6)

By using the Poisson $\tau$-leap method, the number of molecules within the time interval $[t, t+\tau)$ is updated by

By assuming the independence of and , the mean of molecular numbers in the above Poisson $\tau$-leap method can be obtained by:

which is the Euler method for solving the ODE with respect to , given by:

The above ODE is the chemical kinetic rate equation of species in the reaction system (2.6).
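The same argument can be worked through on a small example. The birth-death system below, with rate constants $k_1$ and $k_2$, is a hypothetical system chosen here for simplicity and is not taken from (2.6). It uses only the fact that a Poisson random variable $P(\lambda)$ has mean $\lambda$, applied conditionally on $x(t)$.

```latex
% Hypothetical system (an assumption for illustration):
%   \emptyset \xrightarrow{k_1} S, \qquad S \xrightarrow{k_2} \emptyset
\begin{align*}
x(t+\tau) &= x(t) + P(k_1 \tau) - P(k_2 x(t) \tau)
  && \text{(Poisson $\tau$-leap update)} \\
E[x(t+\tau)] &= E[x(t)] + k_1 \tau - k_2 E[x(t)] \tau
  && \text{(mean of the update)} \\
\frac{dx}{dt} &= k_1 - k_2 x
  && \text{(ODE whose Euler step is the line above)}
\end{align*}
```

Both propensities here are linear in $x$, so the mean closes without any further assumption. When a propensity is nonlinear, for instance a bimolecular term, the mean of a product of molecular numbers appears, and an additional assumption such as independence is needed to replace it by a product of means.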

A further example is the enzymatic reaction:

(2.7)