Principles of Computational Cell Biology - Volkhard Helms - E-Book

Description

Computational cell biology courses are increasingly obligatory for biology students around the world, but they are of course also a must for mathematics and informatics students specializing in bioinformatics. This book, now in its second edition, is geared towards both audiences. In addition to extensive teaching experience, the author, Volkhard Helms, has a strong background in biology and informatics and knows which points are key to making the book accessible to students while still conveying in‐depth knowledge of the subject. About 50% new content has been added for the new edition. Much more room is now given to statistical methods, and several new chapters address protein–DNA interactions, epigenetic modifications, and microRNAs.


Number of pages: 763

Year of publication: 2018

Table of Contents

Cover

Preface of the First Edition

Preface of the Second Edition


1 Networks in Biological Cells

1.1 Some Basics About Networks

1.2 Biological Background

1.3 Cellular Pathways

1.4 Ontologies and Databases

1.5 Methods for Cellular Modeling

1.6 Summary

1.7 Problems

Bibliography

2 Structures of Protein Complexes and Subcellular Structures

2.1 Examples of Protein Complexes

2.2 Complexome: The Ensemble of Protein Complexes

2.3 Experimental Determination of Three‐Dimensional Structures of Protein Complexes

2.4 Density Fitting

2.5 Fourier Transformation

2.6 Advanced Density Fitting

2.7 FFT Protein–Protein Docking

2.8 Protein–Protein Docking Using Geometric Hashing

2.9 Prediction of Assemblies from Pairwise Docking

2.10 Electron Tomography

2.11 Summary

2.12 Problems

Bibliography

3 Analysis of Protein–Protein Binding

3.1 Modeling by Homology

3.2 Properties of Protein–Protein Interfaces

3.3 Predicting Protein–Protein Interactions

3.4 Summary

3.5 Problems

Bibliography

4 Algorithms on Mathematical Graphs

4.1 Primer on Mathematical Graphs

4.2 A Few Words About Algorithms and Computer Programs

4.3 Data Structures for Graphs

4.4 Dijkstra's Algorithm

4.5 Minimum Spanning Tree

4.6 Graph Drawing

4.7 Summary

4.8 Problems

Bibliography

5 Protein–Protein Interaction Networks – Pairwise Connectivity

5.1 Experimental High‐Throughput Methods for Detecting Protein–Protein Interactions

5.2 Bioinformatic Prediction of Protein–Protein Interactions

5.3 Bayesian Networks for Judging the Accuracy of Interactions

5.4 Protein Interaction Networks

5.5 Protein Domain Networks

5.6 Summary

5.7 Problems

Bibliography

6 Protein–Protein Interaction Networks – Structural Hierarchies

6.1 Protein Interaction Graph Networks

6.2 Finding Cliques

6.3 Random Graphs

6.4 Scale‐Free Graphs

6.5 Detecting Communities in Networks

6.6 Modular Decomposition

6.7 Identification of Protein Complexes

6.8 Network Growth Mechanisms

6.9 Summary

6.10 Problems

Bibliography

7 Protein–DNA Interactions

7.1 Transcription Factors

7.2 Transcription Factor‐Binding Sites

7.3 Experimental Detection of TFBS

7.4 Position‐Specific Scoring Matrices

7.5 Binding Free Energy Models

7.6 Cis‐Regulatory Motifs

7.7 Relating Gene Expression to Binding of Transcription Factors

7.8 Summary

7.9 Problems

Bibliography

8 Gene Expression and Protein Synthesis

8.1 Regulation of Gene Transcription at Promoters

8.2 Experimental Analysis of Gene Expression

8.3 Statistics Primer

8.4 Preprocessing of Data

8.5 Differential Expression Analysis

8.6 Gene Ontology

8.7 Similarity of GO Terms

8.8 Translation of Proteins

8.9 Summary

8.10 Problems

Bibliography

9 Gene Regulatory Networks

9.1 Gene Regulatory Networks (GRNs)

9.2 Graph Theoretical Models

9.3 Dynamic Models

9.4 DREAM: Dialogue on Reverse Engineering Assessment and Methods

9.5 Regulatory Motifs

9.6 Algorithms on Gene Regulatory Networks

9.7 Summary

9.8 Problems

Bibliography

10 Regulatory Noncoding RNA

10.1 Introduction to RNAs

10.2 Elements of RNA Interference: siRNAs and miRNAs

10.3 miRNA Targets

10.4 Predicting miRNA Targets

10.5 Role of TFs and miRNAs in Gene‐Regulatory Networks

10.6 Constructing TF/miRNA Coregulatory Networks

10.7 Summary

Bibliography

11 Computational Epigenetics

11.1 Epigenetic Modifications

11.2 Working with Epigenetic Data

11.3 Chromatin States

11.4 The Role of Epigenetics in Cellular Differentiation and Reprogramming

11.5 The Role of Epigenetics in Cancer and Complex Diseases

11.6 Summary

11.7 Problems

Bibliography

12 Metabolic Networks

12.1 Introduction

12.2 Resources on Metabolic Network Representations

12.3 Stoichiometric Matrix

12.4 Linear Algebra Primer

12.5 Flux Balance Analysis

12.6 Double Description Method

12.7 Extreme Pathways and Elementary Modes

12.8 Minimal Cut Sets

12.9 High‐Flux Backbone

12.10 Summary

12.11 Problems

Bibliography

13 Kinetic Modeling of Cellular Processes

13.1 Biological Oscillators

13.2 Circadian Clocks

13.3 Ordinary Differential Equation Models

13.4 Modeling Cellular Feedback Loops by ODEs

13.5 Partial Differential Equations

13.6 Dynamic Phosphorylation of Proteins

13.7 Summary

13.8 Problems

Bibliography

14 Stochastic Processes in Biological Cells

14.1 Stochastic Processes

14.2 Dynamic Monte Carlo (Gillespie Algorithm)

14.3 Stochastic Effects in Gene Transcription

14.4 Stochastic Modeling of a Small Molecular Network

14.5 Parameter Optimization with Genetic Algorithm

14.6 Protein–Protein Association

14.7 Brownian Dynamics Simulations

14.8 Summary

14.9 Problems

Bibliography

15 Integrated Cellular Networks

15.1 Response of Gene Regulatory Network to Outside Stimuli

15.2 Whole‐Cell Model of Mycoplasma genitalium

15.3 Architecture of the Nuclear Pore Complex

15.4 Integrative Differential Gene Regulatory Network for Breast Cancer Identified Putative Cancer Driver Genes

15.5 Particle Simulations

15.6 Summary

Bibliography

16 Outlook

Index

End User License Agreement

List of Tables

Chapter 1

Table 1.1 Data on the genome length and on the number of protein‐coding and RNA ...

Table 1.2 The three graph objects in KEGG.

Table 1.3 Information stored in the BRENDA system for individual biochemical rea...

Table 1.4 Mathematical techniques used in computational cell biology that are co...

Chapter 2

Table 2.1 Composition of Schizosaccharomyces pombe SWI/SNF and RSC complexes comp...

Table 2.2 Key data that can be obtained by various experimental techniques relev...

Chapter 3

Table 3.1 Properties of protein–protein interfaces.

Table 3.2 Modeling various cellular processes requires different levels of detai...

Chapter 4

Table 4.1 Table that keeps track of events during execution of Dijkstra's algori...

Chapter 5

Table 5.1 Some public databases compiling data related to protein interactions.

Table 5.2 Bioinformatics methods to predict protein–protein interactions.

Table 5.3 Presence or absence of proteins P1–P7 in the four organisms in Figure ...

Table 5.4 Hamming distances between profiles of proteins P1–P7.

Table 5.5 Essentiality of protein pairs is weakly associated with their tendency...

Table 5.6 The first four columns contain results from high‐throughput experiment...

Table 5.7 The 10 most highly connected InterPro domains of Methanococcus, E. coli

Chapter 6

Table 6.1 Predicted TF combinations with a significant increase of expression co...

Chapter 7

Table 7.1 Toy example of six DNA sequences that are 4 bp long.

Table 7.2 Frequency of nucleotide bases at the four positions, cf. Table 7.1.

Table 7.3 Score matrix of nucleotide bases at the four positions, cf. Tables 7.1...

Table 7.4 Toy example of five DNA sequences that are 4 bp long, cf. Table 7.1.

Chapter 8

Table 8.1 Example for Fisher's exact test.

Table 8.2 Symbols used in Fisher's exact test.

Table 8.3 Most extreme inequilibrium, cf. Table 8.1.

Table 8.4 Data distribution to illustrate Mann–Whitney U test.

Table 8.5 Ordered list of the values from Table 8.4.

Table 8.6 Normalization schema explaining quantile normalization algorithm.

Table 8.7 Prevalence of cancer in men and women (see Problem 1).

Table 8.8 Another scenario of Table 8.7 (see Problem 1).

Table 8.9 Expression values of genes G1 and G2 at four subsequent time points ...

Chapter 9

Table 9.1 Gene expression in each of the 10 steady gene activation states in the...

Chapter 10

Table 10.1 Different types of RNA molecules.

Chapter 11

Table 11.1 Toy data to connect epigenetic marks and gene expression status (se...

Table 11.2 Toy data for predicting the gene expression status (see Problem 5).

Chapter 12

Table 12.1 Metabolites most frequently found in the central metabolism of E. coli

Table 12.2 Information on E. coli K‐12 strain MG1655 contained in EcoCyc, version...

Table 12.3 Properties of the E. coli network models ColiGS and ColiCore.

Table 12.4 Data for Problem 3(d). Inward and outward directed fluxes b1 to b5.

Chapter 13

Table 13.1 Survey of biochemical oscillators.

List of Illustrations

Chapter 1

Figure 1.1 Is this how we should view a biological cell? The point of this sche...

Figure 1.2 (a) Since the 1950s, a paradigm was established, whereby the informat...

Figure 1.3 Structural organization of transcriptional regulatory networks. (a) T...

Figure 1.4 Major metabolic pathways.

Figure 1.5 The glycolysis pathway as visualized in the KEGG database is connecte...

Chapter 2

Figure 2.1 RNA polymerase II is the central enzyme of gene expression. It synth...

Figure 2.2 Spliceosome: Spliceosome is a cellular “editor” that “cuts and pastes...

Figure 2.3 The ribosome: Model of the large ribosomal subunit from Haloarcula ma...

Figure 2.4 Arp 2/3 complex: The seven‐subunit Arp2/3 complex choreographs the fo...

Figure 2.5 The human apoptosome.

Figure 2.6 Pyruvate dehydrogenase is a huge multienzyme complex comprising 60 co...

Figure 2.7 Schematic representation of the Oct4 homodimer bound to DNA. Oct4 is ...

Figure 2.8 Functional consequences of dimerization and oligomerization. (a) Co...

Figure 2.9 Definition and terminology used to define protein complex architectur...

Figure 2.10 The count of complexes in two Escherichia coli complex data sets and...

Figure 2.11 X‐rays are electromagnetic waves in the ultrashort (“hard”) regime...

Figure 2.12 (a) The chromophore of a cyan fluorescent protein (CFP) absorbs ligh...

Figure 2.13 (Top) In this example, the shapes of the X‐ray object (left) and the...

Figure 2.14 The steps involved in density matching. First, the shapes of the two...

Figure 2.15 Reordering an array (here of length 8) by bit reversal. Bit revers...

Figure 2.16 Schematic view of a Laplacian filter. $a_{i-1,j,k}$, $a_{i,j,k}$, and $a_{i+1,j,k}$...

Figure 2.17 An example illustrating the effect of a Laplacian filter. The left p...

Figure 2.18 The left picture represents the shape of a protein when discretized ...

Figure 2.19 The docking algorithm SnapDock uses the coordinates of Cα/Cβ ...

Figure 2.20 Flowchart of the CombDock algorithm. The protein subunits shown on t...

Figure 2.21 Principles of electron tomography. (a) The electron beam of an elect...

Figure 2.22 Three‐dimensional density reconstruction from electron tomography ...

Chapter 3

Figure 3.1 Plot illustrates the connection between the sequence similarity of...

Figure 3.2 The plot illustrates the structural similarity of complexes A–B to A′...

Figure 3.3 The plot illustrates the computation of the iRMSD between the complex...

Figure 3.4 Computation of the SASA. A small probe is rolled over the complete su...

Figure 3.5 Arrangement of all copies of a binary protein complex in a three‐di...

Figure 3.6 Interface size in transient protein–protein complexes. Histogram of...

Figure 3.7 Residue propensities at protein dimer interfaces and at artificial ...

Figure 3.8 (a) Schematic diagram of core and rim interface regions. Highlighted ...

Figure 3.9 Schematic illustration of possible shapes of the binding interface ...

Figure 3.10 Electrostatic interaction energy of two oppositely charged particl...

Figure 3.11 (a) Surface representation of the RNAse barnase colored according to...

Figure 3.12 Amino acid propensity matrix of transient protein–protein interfac...

Figure 3.13 Relative occurrence for binding partners of (a) leucine, (b) aspar...

Figure 3.14 (a) Radial pair distribution function of finding two alanine residue...

Figure 3.15 (a) Radial pair distribution function of finding two oppositely char...

Figure 3.16 ConSurf analysis of the β subunit of DNA polymerase III from Esche...

Figure 3.17 (a) Schematic drawing of a protein–protein interface involving conta...

Figure 3.18 Identification of correlated mutations. (Top) Family alignments ar...

Figure 3.19 Residue pairs across protein chains with high GREMLIN scores almos...

Chapter 4

Figure 4.1 A mathematical graph consists of vertices and edges. (a) An undirect...

Figure 4.2 Vertices A and D are connected by five paths (A → B → D, A → B → E → ...

Figure 4.3 A labeled tree with seven vertices and six edges connecting them.

Figure 4.7 Example of a minimum spanning tree so that each pair of vertices is c...

Figure 4.8 (a–j) Example illustrating the principles of Kruskal's algorithm. At ...

Figure 4.9 Example illustrating how the force‐directed layout algorithm will dis...

Figure 4.10 Weighted undirected graph (see Problem 4).

Figure 4.11 Schematic graph (see Problem 7).

Chapter 5

Figure 5.1 Schematic result of a gel electrophoresis run. The left lane label...

Figure 5.2 In affinity purification, a protein of interest (bait) is tagged with...

Figure 5.3 The Y2H system is one of the most widely used high‐throughput systems...

Figure 5.4 Results from different methods for complexes involving the cell cyc...

Figure 5.5 The gene cluster method. Genes A, B, and C are arranged linearly as ...

Figure 5.6 The gene neighborhood method analyzes the gene order in different evo...

Figure 5.7 If two protein‐coding genes are separated in some species (Sp1, Sp4, ...

Figure 5.8 (a) Proteins C and D are localized in the same compartment and may in...

Figure 5.9 The presence of protein families in various organisms detected, for e...

Figure 5.10 Graphical representation of the Hamming distances between the phyl...

Figure 5.11 Potentially interacting proteins or functionally related proteins sh...

Figure 5.12 An example of a Bayesian network. A directed arc, e.g. between varia...

Figure 5.13 Nine domains (geometrical objects) are the fundamental units of thes...

Figure 5.14 Connectivity of domains to other domains.

Figure 5.15 Resulting gel from a fictitious TAP experiment (see Problem 4).

Chapter 6

Figure 6.1 (a) Protein A interacts with proteins B, C and D. B also interacts w...

Figure 6.2 (a) Cubic lattice and (b) the corresponding distribution p(k).

Figure 6.3 Degree distribution in random network (a) showing a Poisson distribut...

Figure 6.4 Example illustrating the clustering coefficient on an undirected gr...

Figure 6.5 A clique in a graph is a set of pairwise adjacent vertices or a fully...

Figure 6.6 Result of one of the first large‐scale analyses of the protein intera...

Figure 6.7 In this schematic protein interaction networks, proteins are colored ...

Figure 6.8 Three communities formed by densely connected vertices (circles with ...

Figure 6.9 Modified version of Figure 6.8 where the three dark vertices mediate ...

Figure 6.10 (a) The friendship network from Zachary's karate club study. The ins...

Figure 6.11 The left vertex has a degree of 7 and the right vertex has a degree ...

Figure 6.12 A graph and its modules. In addition to the trivial modules {a}, {b}...

Figure 6.13 (A) Modular decomposition and (B) resulting tree. Vertices a, b and

Figure 6.14 Modular decomposition of four alternative phosphatase 2A complexes. ...

Figure 6.15 Schematic protein interaction network to illustrate cohesiveness m...

Figure 6.16 The gray nodes in this domain–domain interaction network are the pro...

Figure 6.17 Cell cycle expression profiles of all genes targeted by MET4 or MET3...

Figure 6.18 Fifty‐one representative subgraphs of length 8 (out of 148 subgrap...

Figure 6.19 Partitioning based on edge betweenness (see Problem 14).

Chapter 7

Figure 7.1 X‐ray crystal structures of common structural topologies of eukaryot...

Figure 7.2 Sequence logos for the DNA‐binding motifs that the transcription fa...

Figure 7.3 Resulting gel of an EMSA assay. The “shifted” band shows that the p...

Figure 7.4 Main steps of DNAse footprinting assay.

Figure 7.5 Schema of protein‐binding microarray experiments.

Figure 7.6 Methods for the detection of cis‐regulatory modules (CRMs): (a) CRM...

Figure 7.7 The ENCODE project studied how well the occupancy of transcription fa...

Chapter 8

Figure 8.1 Typical promoter region of a prokaryotic gene. The TTGACA and TATAAT...

Figure 8.2 Eukaryotic genomic region containing three genes A, B, and C. Differe...

Figure 8.3 Basic steps of a microarray experiment.

Figure 8.4 Standard normal distribution. μ is the mean of the (symmetric) n...

Figure 8.5 Schematic representation of the expression levels of a particular gen...

Figure 8.6 A “volcano plot” visualizes the results of differential expression an...

Figure 8.7 The structure of the gene ontology branch “biological process” is ill...

Figure 8.8 Protocol to determine synthesis rates and protein/mRNA lifetimes. Mou...

Figure 8.9 Kinetic schema to analyze experimental results of Figure 8.8. mRNAs a...

Figure 8.10 (a) Distribution of calculated mRNA transcription rate constants and...

Chapter 9

Figure 9.1 An example of a gene regulatory network. Solid arrows indicate direc...

Figure 9.2 Graph representation of the gene network corresponding to the biochem...

Figure 9.3 Graph representation of the E. coli transcriptional regulatory netw...

Figure 9.4 These three gene connectivities may lead to similar observed coexpr...

Figure 9.5 Gene network architecture determining the fate of the floral organ ...

Figure 9.6 Yeast1‐size10 network to test the GRN reconstruction algorithm in t...

Figure 9.7 Two regulation events that were missed by the noise models of the YYA...

Figure 9.8 Connectivity matrix for causal regulation of target gene j (row) by...

Figure 9.9 Example of an FFL (L‐arabinose utilization in E. coli). The global ...

Figure 9.10 Example of a single‐input motif (SIM) system (arginine biosynthesi...

Figure 9.11 Example of a DOR. (a) In this motif, many inputs regulate many out...

Figure 9.12 Largest subnetwork identified as downregulated in the caudate nucl...

Figure 9.13 An illustration of the MDS and MCDS solutions of an example networ...

Figure 9.14 Tightly interwoven network of 17 transcription factors and target ge...

Figure 9.15 Toy Boolean network (see Problem 3).

Chapter 10

Figure 10.1 Basic structural motifs of RNA secondary structure. This RNA consis...

Figure 10.2 Three‐dimensional structure of the VS ribozyme. This ribozyme from t...

Figure 10.3 MicroRNAs (miRNAs) recognize their targets by Watson–Crick base pair...

Figure 10.4 Schematic representation of miRNA biogenesis.

Figure 10.5 Stem‐loop structures of C. elegans, Drosophila melanogaster, and H...

Figure 10.6 Regulatory networks of miRNAs and proteins involved in the control o...

Figure 10.7 Bioinformatics tools are used for different purposes in miRNA resear...

Figure 10.8 FFL and FBL types. (a) Three types of FFLs classified by the master ...

Figure 10.9 Schematic model for TF‐miRNA coregulatory network in cell proliferat...

Figure 10.10 Based on deregulated genes and microRNAs in breast cancer patients,...

Chapter 11

Figure 11.1 Epigenetic marks around the NANOG gene after two days of directed...

Figure 11.2 (Left) Unmethylated cytosine and (right) cytosine methylated in its ...

Figure 11.3 Reversible changes in chromatin organization influence gene expressi...

Figure 11.4 (Left) C5‐methylated cytosine and (right) thymine. The deamination r...

Figure 11.5 Atomic structure of the nucleosome core particle. The two strands ...

Figure 11.6 (Left) Lysine amino acid and (to the right) methylated versions of...

Figure 11.7 Main experimental steps of the ChIP‐seq protocol that is used to ide...

Figure 11.8 Schematic example of CpG methylation in five genes. The sticks indic...

Figure 11.9 Association of comethylation of genes with genomic distance. Only pa...

Figure 11.10 Principle of the DNase 1 hypersensitivity assay.

Figure 11.11 Histone marks may have either activating (a) or repressive (b) effe...

Figure 11.12 The ENCODE project studied how well histone modifications are corre...

Figure 11.13 Basic architecture of an HMM. X1 to X3 are the possible states of t...

Figure 11.14 A “double‐negative gate” realized in both Drosophila and sea urchin...

Figure 11.15 A transcriptional regulatory circuit involving nine transcription f...

Figure 11.16 Schematic illustration of DNA methylation levels at CpG loci surrou...

Figure 11.17 Toy data for protein intensities (see Problem 3).

Figure 11.18 Hierarchy of hematopoietic differentiation stages (see Problem 4).

Chapter 12

Figure 12.1 Flow chart to automatically reconstruct metabolic networks from a ...

Figure 12.2 Simple network and the corresponding stoichiometric matrix. In thi...

Figure 12.3 This example is slightly more complicated than the one in Figure 12....

Figure 12.4 (a) A “pointed” cone spanned by five generating vectors that interse...

Figure 12.5 Strategy for determining optimal states of a biochemical network by ...

Figure 12.6 A torch light illuminates a dark room through an open door.

Figure 12.7 (a) Points belonging to the gray‐shaded area fulfill the condition x

Figure 12.8 Cube abcdefgh represents a cone of solutions satisfying all inequa...

Figure 12.9 Simple metabolic network with four metabolites A–D connected by fo...

Figure 12.10 Construction of the first tableau in step 1.

Figure 12.11 By going from the upper to the lower tableau, rows are being transf...

Figure 12.12 In the upper part of the picture, the tableau T(E) with the exchang...

Figure 12.13 Three extreme pathways are found that span the solution space of th...

Figure 12.14 The figure shows the same network as in Figure 12.3. (a) This is th...

Figure 12.15 Example network with five internal metabolites and eight reaction...

Figure 12.16 For the example network of Fig. 12.15, one obtains six elementary f...

Figure 12.17 Minimal cut sets for repressing synthesis of P in the example netw...

Figure 12.18 (a) Example network of Figure 12.15. (b) Participation of individua...

Figure 12.19 (a) Calculated flux distribution for optimized biomass production o...

Figure 12.20 Schematic illustration of two hypothetical scenarios in which eit...

Figure 12.21 Measured kY(k) shown as a function of k for incoming and outgoing...

Figure 12.22 HFB in the metabolic network of E. coli as optimized by flux bala...

Figure 12.23 Example of a metabolic network leading to the production of biomass...

Figure 12.24 Simple metabolic network as in Figure 12.9 (see Problem 2).

Figure 12.25 Simple metabolic network with five exchange fluxes (see Problem 3).

Chapter 13

Figure 13.1 Schematic illustration of an oscillating output of a biological cloc...

Figure 13.2 Minimal components of the mammalian clock. First, the two transcript...

Figure 13.3 Force diagram for a mathematical pendulum consisting of a weight att...

Figure 13.4 Simple model for the synthesis of protein R (“response”) under act...

Figure 13.5 For the example of linear response, the steady‐state response $R_{ss}$ de...

Figure 13.6 Simple model for the equilibrium between the phosphorylated form of ...

Figure 13.7 Steady‐state concentration of phosphorylated protein $R_P$ as a functi...

Figure 13.8 Response in a phosphorylation/dephosphorylation equilibrium with Mic...

Figure 13.9 (a) Coupling of the initial response pathway via R with a second sig...

Figure 13.10 Example of a positive feedback system built from a protein E and th...

Figure 13.11 Positive feedback system. As S increases, the response is low until...

Figure 13.12 (a, b) Positive feedback system, termed “toggle switch.”

Figure 13.13 (a, b) Negative feedback system.

Figure 13.14 (a) Three‐component system with feedback loop. (b) Feedback loop le...

Figure 13.15 Wiring diagram of the cell cycle regulation in eukaryotes. Major ev...

Figure 13.16 (a) Three‐component system with feedback loop (cf. Figure 13.10). (...

Figure 13.17 Toggle switch in G2/M phase involving the activating interaction of...

Figure 13.18 Spatial segregation of two opposing enzymes in a protein modificati...

Figure 13.19 Topology of a Boolean network for fission yeast.

Figure 13.20 Schematic view of the two ODE model (see Problem 1).

Chapter 14

Figure 14.1 Schema illustrating detailed balance for a system with two states,

Figure 14.2 (a) Histogram showing the expression level of a fluorescent reporter...

Figure 14.3 Stochastic simulation of single‐gene expression using the Gillespie ...

Figure 14.4 Toggle switch design by Gardner and Collins. Repressor 1 inhibits tr...

Figure 14.5 In stochastic simulations of the system shown in Figure 14.4, initia...

Figure 14.6 Having both inducers present, the two genes switch on and off, depen...

Figure 14.7 Artistic textbook‐style rendering of the photosynthetic apparatus of...

Figure 14.8 (a) A reconstructed chromatophore model vesicle of 45 nm diameter. T...

Figure 14.9 “Pools‐and‐proteins” view of bacterial photosynthesis. The network i...

Figure 14.10 Reactions modeled in the RC. The respective rate constants are A1 (...

Figure 14.11 Reactions modeled in the bc1 complex. The respective rate constants...

Figure 14.12 Rate of ATP production per second as a function of light intensity....

Figure 14.13 Number of reduced cytochrome c2 particles in the stochastic simulat...

Figure 14.14 Optimization of kinetic system parameters by an evolutionary algori...

Figure 14.15 Comparison between different experimental data for the time‐depende...

Figure 14.16 Schematic illustration of particles undergoing undirected Brownian ...

Figure 14.17 Definition of different criteria to describe the relative orientati...

Figure 14.18 Schematic representation of the free energy for protein–protein int...

Figure 14.19 Occupancy maps for barstar at various distances $cd_{avg}$ from barnase....

Figure 14.20 (a) Electrostatic interaction energy between barnase and barstar at...

Figure 14.21 A simple reaction network (see Problem 1).

Figure 14.22 Constructing a stoichiometric matrix for reaction network of Figure...

Figure 14.23 Concentrations of metabolites A and D at various time points.

Figure 14.24 Simplified signaling cascade (Problem 2). The reactions labeled R0 t...

Chapter 15

Figure 15.1 Dynamic representation of the transcriptional network of Saccharomy...

Figure 15.2 M. genitalium whole‐cell model consisting of 28 integrated submodels...

Figure 15.3 DNA‐binding and dissociation dynamics of the oriC‐DnaA complex (red)...

Figure 15.4 Whole‐cell simulations of mutant M. genitalium strains. Single‐gene ...

Figure 15.5 Determining the architecture of the nuclear pore complex by integrat...

Figure 15.6 Representation of the optimization process of the NPC molecular arch...

Figure 15.7 Integrative network‐based approach to understand breast carcinogenes...

Figure 15.8 Gene network modules of TF–gene interactions. (a) Topological overla...

Figure 15.9 Regulatory interactions involving the 17 key driver genes identified...

Figure 15.10 (a) Schematic illustration of Mycoplasma genitalium (MG). (b) MGh s...


Principles of Computational Cell Biology

From Protein Complexes to Cellular Networks

Volkhard Helms

Second Edition

Copyright

Author

Volkhard Helms

Universität des Saarlandes

Zentrum für Bioinformatik

66041 Saarbrücken

Germany

All books published by Wiley‐VCH are carefully produced. Nevertheless, authors, editors, and publisher do not warrant the information contained in these books, including this book, to be free of errors. Readers are advised to keep in mind that statements, data, illustrations, procedural details or other items may inadvertently be inaccurate.

Library of Congress Card No.: applied for

British Library Cataloguing‐in‐Publication Data

A catalogue record for this book is available from the British Library.

Bibliographic information published by the Deutsche Nationalbibliothek

The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at <http://dnb.d-nb.de>.

© 2019 Wiley‐VCH Verlag GmbH & Co. KGaA, Boschstr. 12, 69469 Weinheim, Germany

All rights reserved (including those of translation into other languages). No part of this book may be reproduced in any form – by photoprinting, microfilm, or any other means – nor transmitted or translated into a machine language without written permission from the publishers. Registered names, trademarks, etc. used in this book, even when not specifically marked as such, are not to be considered unprotected by law.

Print ISBN: 978‐3‐527‐33358‐5

ePDF ISBN: 978‐3‐527‐81033‐8

ePub ISBN: 978‐3‐527‐81032‐1

Preface of the First Edition

This book grew out of a course for graduate students in the first year of the MSc bioinformatics program that the author teaches every year at Saarland University. Also included is some material from a special lecture on cell simulations. The book is designed as a textbook, placing emphasis on transmitting the main ideas of a problem, outlining algorithmic strategies for solving it, and describing possible complications or connections to other parts of the book. The main challenge in writing the book was to concentrate on conceptual points of general educational value rather than to include the latest research results of this fascinating, fast‐moving field. For a textbook, giving a cohesive picture was considered more important than mentioning all possible drawbacks and special cases where particular general guidelines may not apply. We apologize to those whose work could not be mentioned because of space constraints.

The intended audience includes students of bioinformatics and of the life science disciplines. Consequently, some basic knowledge of molecular biology is taken for granted. The language used is not very formal. Previous knowledge of computer science is not required, but a certain adeptness in basic mathematics is necessary. The book introduces all of the mathematical concepts needed to understand the material covered. In particular, Chapter 2 introduces mathematical graphs and algorithms on graphs used in classifying protein–protein interaction networks. Chapter 6 introduces linear and convex algebra as typically used in the description of metabolic networks. Chapter 7 discusses ordinary and stochastic differential equations used in the kinetic modeling of signal transduction pathways. Chapter 8 introduces the method of Fourier transformation for protein–protein docking and pattern matching. Also introduced are Bayesian networks in Chapter 4, as a way to judge the reliability of protein–protein interactions, and inference techniques to model gene regulatory networks. We note, however, that the emphasis of this book is placed on discrete mathematics rather than on statistical methods. Not included yet are classical network flow algorithms such as Menger's theorem or the max‐flow min‐cut theorem, as they are currently rarely used in cellular modeling. The book focuses on proteins and the genes coding for them, as well as on metabolites. Less room is given to DNA, RNA, or lipid membranes, which would, of course, also deserve a great deal of attention. The main reason for this was to provide a homogeneous background for discussing algorithmic concepts.

The author is very grateful to Dr. Tihamér Geyer, who coordinated the assignments for the lectures, for valuable comments on the manuscript and for many solved examples and problems for this book. The following coworkers from Saarbrücken and elsewhere have provided valuable suggestions on different portions of the text: Kerstin Kunz, Jan Christoph, and Florian Lauck. I thank Dr. Hawoong Jeong, Dr. Julio Collado‐Vides, Dr. Agustino Martínez‐Antonio, Dr. Ruth Sperling, Dr. James R. Williamson, Dr. Joanna Trylska, Dr. Claude Antony, and Dr. Nicholas Luscombe for sending me high‐resolution versions of their graphics. I thank Dr. Andreas Sendtko and the publishing staff at Wiley‐VCH for their generous support of this book project, for their seemingly endless patience during the revision stage, and for excellent typesetting.

I also thank the Center of Theoretical Biophysics at the University of California, San Diego, for their hospitality during a sabbatical visit in summer 2007 that finally allowed me to complete this work. Finally, this book would not have been possible without the support and patience of my wife Regina and our two daughters.

March 2008

Volkhard Helms

Center for Bioinformatics

Saarland University

Saarbrücken, Germany

Preface of the Second Edition¹

About 10 years after the publication of the first edition, I finally managed to prepare this expanded second edition of this book. Its main spirit has remained the same: it is designed as a textbook, placing emphasis on transmitting the main ideas of a problem, outlining algorithmic strategies for solving it, and describing possible complications or connections to other parts of the book. Following feedback from colleagues, I have reordered the content, now starting in Chapter 2 with an introduction to the structures of protein–protein complexes before we enter the world of protein interaction networks. I refrain from listing all the rearrangements here. Usually, I tried to keep subsections intact and simply shifted them around. A few sections were removed from the text because I now felt that they were too specialized. About 50% of new content has been added. In terms of mathematical methods, much more room is now given to statistical methods. In terms of biology, several new chapters now address protein–DNA interactions, epigenetic modifications, and microRNAs. Still not covered are biophysical topics related to intracellular transport, cytoskeletal dynamics, and processes taking place at and across biological membranes. Maybe there will eventually be a need for a third edition?

In addition to those who contributed to the first edition, the author is very grateful to Thorsten Will and Maryam Nazarieh for solved examples and problems for this book. The following coworkers from Saarbrücken and elsewhere have provided valuable suggestions on different portions of the text: Mohamed Hamed Fahmy, Dania Humaidan, Olga Kalinina, Heiko Rieger, and Thorsten Will. I thank my group members of the past years with whom I had the privilege to work on exciting research projects related to the content of this book and I thank our secretary Kerstin Gronow‐Pudelek for technical assistance.

April 2018

Volkhard Helms

Center for Bioinformatics

Saarland University

Saarbrücken, Germany

Note

¹ Problems: To really absorb the content of this textbook, it is advisable to also try to solve some of the enclosed problems.

1 Networks in Biological Cells

Modern molecular and cell biology has worked out many important cellular processes in considerable detail, although other areas are less well understood. What often remains to be understood is how the individual parts are connected, and this is exactly the focus of this book. Figure 1.1 displays a cartoon of a cell as a highly viscous soup containing a complicated mixture of many particles. Certainly, several important features that introduce a partial order, such as the cytoskeleton and the organelles of eukaryotic cells, are left out here. Figure 1.1 reminds us that a myriad of biomolecular interactions take place in biological cells at all times and that it is pretty amazing how much order is achieved in many cellular processes, all of which are based on pairwise molecular interactions.

Figure 1.1 Is this how we should view a biological cell? The point of this schematic picture is that about 30% of the volume of a biological cell is taken up by millions of individual proteins. Therefore, biological cells are really “full.” Of course, however, such pictures do not tell us much about the organization of biological processes. As we will see later in this book, there are many different hierarchies of order in such a cell.

The focus of this book is placed on presenting mathematical descriptions developed in recent years to describe various levels of cellular networks. We will learn that many biological processes are tightly interconnected, and this is exactly where many links still need to be discovered in further experimental studies. Many researchers in the field of molecular biology believe that only combined efforts of modern experimental techniques, mathematical modeling, and bioinformatics analysis will be able to arrive at a sufficient understanding of the biological networks of cells and organisms.

In this chapter, we will start with some principles of mathematical networks and their relationship with biological networks. Then, we will briefly look at several biological key players to be used in the rest of this book (cells, compartments, proteins, and pathways). Without going into any further detail, we will directly move into the field of network theory with the amazing “small‐world phenomenon.”

1.1 Some Basics About Networks

Network theory is a branch of applied mathematics and, increasingly, of physics that uses the concepts of graph theory. Its development has been driven by applications to real‐world examples in the areas of social networks (such as networks of acquaintances or of scientists with joint publications), technological networks (such as the World Wide Web, a network of web pages; the Internet, a network of computers and routers; or power grids), and biological networks (such as neural networks and metabolic networks).

1.1.1 Random Networks

In a random network, every possible link between two “vertices” (or nodes) A and B is established according to a given probability distribution irrespective of the nature and connectivity of the two vertices A and B. This is what is “random” about these networks. If the network contains n vertices in total, the maximal number of undirected edges (links) between them is n × (n − 1)/2. This is because we can pick each of the n vertices as the first vertex of an edge, and there are (n − 1) other vertices that this vertex can be connected to. In this way, we will actually consider each edge twice, using each end point as the first vertex. Therefore, we need to divide the number of edges by 2.

If every edge is established with a probability p ∈ [0, 1], the expected total number of edges in an undirected graph is p × n × (n − 1)/2. The mathematics of random graphs was developed by the two Hungarian mathematicians Erdős and Rényi. However, the analysis of real networks showed that such networks often differ significantly from random graphs in their characteristics. We will return to random graphs in Section 6.3.
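The edge counts above can be checked numerically. The following Python sketch (the function name `erdos_renyi_edges` and the parameter values are illustrative choices, not taken from the text) builds a random graph by visiting each of the n(n − 1)/2 vertex pairs once and keeping it with probability p, so the number of edges it produces fluctuates around p × n × (n − 1)/2.

```python
import random

def erdos_renyi_edges(n, p, seed=None):
    """Edge list of a random graph G(n, p): each of the n*(n-1)/2 possible
    undirected edges (u, v) with u < v is kept independently with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]

n, p = 1000, 0.01
max_edges = n * (n - 1) // 2        # 499500 possible undirected edges
expected_edges = p * max_edges      # about 4995 edges on average
edges = erdos_renyi_edges(n, p, seed=42)
print(max_edges, expected_edges, len(edges))
```

Each pair is visited exactly once (u < v), which is the same “divide by 2” argument as in the text: no edge is counted from both of its end points.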

1.1.2 Small‐World Phenomenon

The term small‐world phenomenon was coined to describe the observation that everyone in the world is linked to any other person through a short chain of social acquaintances. In a small‐world experiment, the psychologist Stanley Milgram found in 1967 that, on average, any two randomly picked US citizens were connected to each other by a chain of only about six acquaintances. In small‐world networks, vertices have short average distances. Usually, the average distance between vertices scales logarithmically with the total number n of vertices.
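The logarithmic scaling can be illustrated with random graphs of fixed mean degree ⟨k⟩, for which the average distance is known to grow roughly like ln n / ln⟨k⟩. The sketch below is an illustration under stated assumptions rather than a model of social networks: it builds G(n, p) graphs with mean degree 10 (the helper names, sizes, and seeds are hypothetical choices), measures distances by breadth‐first search from 50 sampled source vertices, and prints each result next to that rough estimate.

```python
import math
import random
from collections import deque

def random_graph_adjacency(n, mean_degree, seed=None):
    """Adjacency lists of a G(n, p) random graph with p = mean_degree / (n - 1)."""
    rng = random.Random(seed)
    p = mean_degree / (n - 1)
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj

def average_distance(adj, sources):
    """Mean breadth-first-search distance from each source to every vertex it can reach."""
    total, pairs = 0, 0
    for s in sources:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:          # first visit = shortest distance
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1             # exclude the source itself
    return total / pairs

results = {}
for n in (250, 500, 1000, 2000):
    adj = random_graph_adjacency(n, mean_degree=10, seed=n)
    results[n] = average_distance(adj, range(0, n, n // 50))   # 50 sampled sources
    print(n, round(results[n], 2), round(math.log(n) / math.log(10), 2))
```

Increasing n eightfold from 250 to 2000 should add only about one step to the average distance, which is what “scales logarithmically” means in practice.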

In a paper published in the journal Nature in 1998, the two mathematicians Duncan J. Watts and Steven H. Strogatz (Watts and Strogatz, 1998) reported that small‐world networks are common in many different areas ranging from neuronal connections of the worm Caenorhabditis elegans to power grids.

1.1.3 Scale‐Free Networks

Only one year after the work of Watts and Strogatz, Albert‐László Barabási from the Physics Department at the University of Notre Dame introduced an even simpler model for the emergence of the small‐world phenomenon (Barabási and Albert 1999). Although Watts and Strogatz's model was able to explain the short average path length and the high clustering coefficient of a small world (all these terms will be introduced in Chapter 6), it did not explain another property that is typical of real‐world networks such as the Internet: these networks are scale‐free. In simple terms, this means that although the vast majority of vertices are weakly connected, there also exist some highly interconnected super‐vertices, or hubs. The term scale‐free expresses that the ratio of highly to weakly connected vertices remains the same irrespective of the total number of links in the network. We will see in Section 6.4 that the connectivity of scale‐free networks follows a power law. If a network is scale‐free, it is also a small world.

In this paper, Barabási and Albert presented a strikingly simple and intuitive algorithm that generates networks with a scale‐free topology. It has two essential elements:

Growth. The network is started from a small number of (at least two) connected vertices. At every iteration step, a new vertex is added that forms links to m of the existing vertices.

Preferential attachment. One assumes that the probability of a link between a newly added vertex and an existing vertex i depends on the degree of i (the number of existing links between vertex i and other vertices). The more connections i already has, the more likely the new vertices will link to i. This behavior is described by the saying “the rich get richer.” Let us motivate this with the fictitious example of the early days of air traffic. Initially, two airports need to be built so that a first regular flight connection can be established between them. Eventually, a third airport is built. Most likely, it will initially offer only one new flight, to either one of the existing airports. Now the situation is unbalanced: one airport is connected to two other cities, whereas the airports of those two cities are each connected to only one city. There is a certain chance that, after some time, the “missing” connection between the new airport and the other airport will be introduced, which would restore a balanced situation. Alternatively, a fourth airport could emerge that would also start by establishing only one flight to one of the existing airports. Now the airport that already has two connections has an obvious practical advantage because passengers taking this route simply have more options to carry on. Therefore, the chance that the new flight goes to this airport is higher than for the other airports. This is exactly the idea captured by the concept of preferential attachment.

The same growth mechanism applies, for example, to the World Wide Web. Obviously, this network grows constantly over time, and many new pages are added to it every moment. We know from our own experience that once a new web page is created, its owner will most likely include links to other popular pages (hubs) on the new page so that the second “rule” is also fulfilled.
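The two rules can be turned into a short simulation. The Python sketch below is one common way to implement preferential attachment, under assumptions not fixed by the text: the network starts as a complete graph on m + 1 vertices, each new vertex links to exactly m distinct existing vertices, and the function name `barabasi_albert` is hypothetical. Keeping a list in which every vertex appears once per incident edge makes a uniform draw from that list equivalent to choosing a vertex with probability proportional to its degree.

```python
import random
from collections import Counter

def barabasi_albert(n, m, seed=None):
    """Grow an undirected network of n vertices by preferential attachment.

    Assumptions of this sketch: the network starts as a complete graph on
    m + 1 vertices, and every new vertex links to m distinct existing
    vertices chosen with probability proportional to their degree.
    """
    if m < 1 or n <= m:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    m0 = m + 1
    edges = []
    stubs = []  # each vertex appears once per incident edge
    for u in range(m0):
        for v in range(u + 1, m0):
            edges.append((u, v))
            stubs.extend((u, v))
    for new in range(m0, n):
        targets = set()
        while len(targets) < m:  # redraw until m distinct targets are found
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

edges = barabasi_albert(2000, 2, seed=1)
degree = Counter(v for edge in edges for v in edge)
print(len(edges))  # 3997: 3 initial edges + 1997 new vertices x 2 links
print(min(degree.values()), max(degree.values()))
```

With n = 2000 and m = 2, every vertex ends up with at least two links, while the oldest vertices typically accumulate dozens, which is the hub structure described above.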

In the early exciting days of network theory when the study of large‐scale networks took off like a storm, it was even suggested that the scale‐free network model may be something like a law of nature that controls how natural small‐world networks are formed. However, subsequent work on integrated biological networks showed that the concept of scale‐free networks may rather be of theoretical value and that it may not be directly applicable to certain biological networks. For the moment, we will consider the idea of network topology (scale‐free networks and small‐world phenomenon) as a powerful concept that is useful for understanding the mechanism of network growth and vulnerability.

Figure 1.2 (a) Since the 1950s, a paradigm prevailed according to which information flows from DNA via RNA to protein synthesis, which then gives rise to particular phenotypes. (b) The emergence of structural biology (the first crystal structure of the protein myoglobin was determined in 1960) emphasized the importance of the three‐dimensional structures of proteins in determining their function. (c) Today, we have realized the central role played by molecular interactions that influence all other elements.

1.2 Biological Background

Until recently, the paradigm of molecular biology was that genetic information is read from the genomic DNA by the RNA polymerase complex and transcribed into the corresponding RNA. Ribosomes then bind to messenger RNA (mRNA) molecules and synthesize the corresponding polypeptide chains. This process is called translation. Importantly, the paradigm held that this entire process is unidirectional; see Figure 1.2.

1.2.1 Transcriptional Regulation

It is now well established that this system also contains many feedback loops, e.g. provided by the proteins known as transcription factors that bind to sequence motifs on the genomic DNA and mediate (activate or repress) the transcription of certain genomic segments. Important discoveries of the past 20 years showed that cellular mRNA concentrations are also strongly affected by small RNA molecules termed microRNAs and that the chromatin structure is shaped by epigenetic modifications of the DNA and of histone proteins that control the accessibility of genomic regions. The cellular network therefore certainly appears much more complicated today than it did 60 years ago.

This brings us to the world of gene regulatory networks. Collecting the required information on the regulation of individual genes is a subject of intense research. For example, the ENCODE project for human cells and the modENCODE project for the model organisms C. elegans and Drosophila melanogaster mapped the binding sites of hundreds of transcription factors throughout the genomes. Also, the FANTOM initiative, started in Japan, is a worldwide collaborative project aiming at identifying all the functional elements in mammalian genomes. However, occupancy maps of transcription factors alone are not considered compelling evidence of biologically functional regulation. To really prove or disprove which gene is activated or repressed by a particular transcription factor (or microRNA), one could create a knockout organism lacking the gene coding for this transcription factor and see which genes are no longer expressed or are now expressed in excess. Such genome‐wide deletion libraries have actually been produced for the model organism Saccharomyces cerevisiae. However, in this way, we can only discover those combinations that are not lethal for the organism. Also, pairs or larger assemblies of transcription factors often need to bind simultaneously. It simply appears impossible to discover the full connectivity of this regulatory network by a traditional one‐by‐one approach. Fortunately, modern microarray and RNA‐seq experiments probe the expression levels of many genes simultaneously. Ongoing challenges are the noisy nature of such large‐scale data and the fact that genes do not actually interact directly with each other. Analysis of gene expression data will be discussed in Chapter 8.

In this book, we will be mostly concerned with the following four types of biological cellular networks: protein–protein interaction networks, gene regulatory networks, signal transduction networks, and metabolic networks. We will discuss them at different hierarchical levels as shown in Figure 1.3 using the example of regulatory networks.

Figure 1.3 Structural organization of transcriptional regulatory networks. (a) The “basic unit” comprises the transcription factor, its target gene with a DNA recognition site, and the regulatory interaction between them. (b) Units are often organized into network “motifs” that comprise specific patterns of inter‐regulation that are overrepresented in networks. Examples of motifs include single‐input/multiple output (SIM), multiple input/multiple output (MIM), and feed‐forward loop (FFL) motifs. (c) Network motifs can be interconnected to form semi‐independent “modules,” many of which have been identified by integrating regulatory interaction data with gene expression data and imposing evolutionary conservation. The next level consists of the entire network (not shown).

Source: Babu et al. (2004). Drawn with permission of Elsevier.
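A feed‐forward loop consists of three vertices X, Y, Z in which X regulates Y, X regulates Z, and Y also regulates Z. The following Python sketch enumerates such triples in a directed regulatory network given as a list of (regulator, target) pairs; the function name and the toy network with its gene names are hypothetical. Counting the motif is only a first step, since deciding whether it is overrepresented requires a comparison with randomized networks, which is not shown here.

```python
def feed_forward_loops(edges):
    """Return all (x, y, z) with x -> y, x -> z and y -> z in a directed network.

    `edges` is an iterable of (regulator, target) pairs; triples with repeated
    vertices (e.g. arising from autoregulation) are skipped.
    """
    targets = {}
    for regulator, target in edges:
        targets.setdefault(regulator, set()).add(target)
    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:
            for z in targets.get(y, ()):
                if z in x_targets and len({x, y, z}) == 3:
                    loops.append((x, y, z))
    return sorted(loops)

# Hypothetical toy network: TF_A regulates TF_B and gene_1; TF_B regulates gene_1 and gene_2.
toy = [("TF_A", "TF_B"), ("TF_A", "gene_1"), ("TF_B", "gene_1"), ("TF_B", "gene_2")]
print(feed_forward_loops(toy))  # [('TF_A', 'TF_B', 'gene_1')]
```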

1.2.2 Cellular Components

Cells can be described at various levels of detail. We will mostly use three different levels of description:

Inventory lists and lists of processes: proteins in particular compartments, proteins forming macromolecular complexes, biomolecular interactions, regulatory interactions, and metabolic reactions.

Structural descriptions: structures of single proteins, topologies of protein complexes, and subcellular compartments.

Dynamic descriptions: cellular processes ranging from nanosecond dynamics for the association of two biomolecules up to processes occurring in seconds and minutes such as the cell division of yeast cells.

We will assume that the reader has a basic knowledge about the organic molecules commonly found within living cells and refer those who do not to basic books on biochemistry or molecular biology. Depending on their role in metabolism, the biomolecules in a cell can be grouped into several classes.

Macromolecules, including nucleic acids, proteins, polysaccharides, and certain lipids.

The building blocks of macromolecules include sugars as the precursors of polysaccharides, amino acids as the building blocks of proteins, nucleotides as the precursors of nucleic acids (and therefore of DNA and RNA), and fatty acids that are incorporated into lipids. Interestingly, only a small fraction of all theoretically synthesizable macromolecules exists in a biological cell at any given time point. At any moment during a normal cell cycle, many new macromolecules need to be synthesized from their building blocks, and this is meticulously controlled by the complex gene expression machinery. Even when the cell is in a steady state, there is a constant turnover of macromolecules.

Metabolic intermediates (metabolites). Many molecules in a biological cell have complex chemical structures and must be synthesized in several reactions from specific starting materials, which may be taken up as energy sources. In the cell, connected chemical reactions are often grouped into metabolic pathways (Section 1.3).

Molecules of miscellaneous function, including vitamins, steroid hormones, energy‐storage molecules such as ATP, regulatory molecules, and metabolic waste products.

Almost all biological materials that are needed to construct a biological cell are either synthesized by the RNA polymerase and ribosome machinery of the cell or are taken up from the outside via the cell membrane. Therefore, as a minimum inventory, every cell needs to contain the construction plan (DNA), a processing unit to transcribe this information into mRNA (polymerase), a processing unit to translate these mRNA molecules into protein (ribosome), and transporter proteins embedded in the cell membrane that move material across it.

1.2.3 Spatial Organization of Eukaryotic Cells into Compartments

Organization into various compartments greatly simplifies the temporal and spatial process flow in eukaryotic cells. As mentioned above, at each time point during a cell cycle, only a small fraction of all potential proteins is being synthesized (and not yet degraded). Also, many proteins are only available in very small concentrations, possibly with only a few copies per cell. However, by localizing these proteins to particular spots in the cell, e.g. by attaching them to the cytoskeleton or by partitioning them into lipid rafts, their local concentrations may become much higher. We assume that the reader is vaguely familiar with the compartmentalization of eukaryotic cells involving the lysosome, plasma membrane, Golgi complex, nucleus, smooth endoplasmic reticulum, mitochondrion, nucleolus, rough endoplasmic reticulum, and cytoskeleton.

An important element of cellular organization is the active transport of macromolecules along the microtubules of the cytoskeleton that is carried out by molecular motor proteins such as kinesin and dynein. Here, we will not address the activities of molecular motors because this is rather a research topic in biophysics.

1.2.4 Considered Organisms

Table 1.1 presents some statistics of the organisms considered in this book.

Table 1.1 Data on the genome length and on the number of protein‐coding and RNA genes are taken from the Kyoto Encyclopedia of Genes and Genomes database (April 2018); data on the number of putative transporter proteins are taken from www.membranetransport.org.

Organism | Length of genome (Mb) | Number of protein‐coding genes | Number of RNA genes | Number of transporter proteins
Prokaryotes
Mycoplasma genitalium G37 | 0.6 | 476 | 43 | 53
Bacillus subtilis BSN5 | 4.2 | 4 145 | 113 | 552
Escherichia coli APEC01 | 4.6 | 4 890 | 93 | 665
Eukaryotes
Saccharomyces cerevisiae S288C | 1.3 | 6 002 | 425 | 341
Drosophila melanogaster | 12 | 13 929 | 3 209 | 662
Caenorhabditis elegans | 100.2 | 20 093 | 24 969 | 669
Homo sapiens | 3 150 | 20 338 | 19 201 | 1 467

1.3 Cellular Pathways

1.3.1 Biochemical Pathways

Metabolism denotes the entirety of biochemical reactions that occur within a cell (Figure 1.4). In the past century, many of these reactions were organized into metabolic pathways. Each pathway consists of a sequence of chemical reactions that are catalyzed by specific enzymes, and the product of one reaction is the input for the next one. Unraveling the individual enzymatic reactions was one of the big successes of applying biochemical methods to cellular processes. Metabolic pathways can be divided into two broad types. Catabolic pathways break down complex molecules into simpler ones, which can be reused for synthesizing other molecules. Also, catabolic pathways provide the chemical energy required for many cellular processes. This energy may be stored temporarily as high‐energy phosphates (primarily in ATP) or as high‐energy electrons (primarily in NADPH). Conversely, anabolic pathways synthesize more complex substances from simpler starting reagents by utilizing the chemical energy generated by exergonic catabolic pathways.

Figure 1.4 Major metabolic pathways.

The traditional biochemical pathways were often derived from studying simple organisms where these pathways constitute a dominating part of the metabolic activity. For example, the glycolysis pathway was discovered in yeast (and in muscle) in the 1930s. It describes the disassembly of the nutrient glucose that is taken up by many microorganisms from the outside. Figure 1.5 shows the glycolysis pathway in Homo sapiens as represented in the KEGG database (Kanehisa et al. 2016).

Figure 1.5 The glycolysis pathway as visualized in the KEGG database is connected to many other cellular pathways.

Source: From http://www.genome.ad.jp/kegg.

1.3.2 Enzymatic Reactions

Enzymes are proteins that catalyze biochemical reactions so that they proceed much faster than in aqueous solution, e.g. by factors of thousands to billions. As is the case for any catalyst, the enzyme remains intact after the reaction is complete and can therefore continue to function. Enzymes lower the activation energy of a reaction, but this affects the forward and backward reactions in the same manner. Hence, the free energy difference and the equilibrium between products and reagents are not affected. Compared to other catalysts, enzymes act in a highly stereo‐, regio‐, and chemoselective and specific manner.

For the binding reaction P + L ↔ PL of a protein P and a ligand L, the binding constant $k_d$,

$$k_d = \frac{[P]\,[L]}{[PL]},$$

determines how much of the ligand concentration [L] is bound by the protein (with concentration [P]) under equilibrium conditions. [PL] is the concentration of the protein:ligand complex. The binding constant has the unit M. In the case of a “nanomolar inhibitor,” for example, where a blocking ligand binds to a protein with a $k_d$ on the order of 10⁻⁹ M, the product of the concentrations of free protein and free ligand is numerically 10⁹ times smaller than the concentration of the protein–ligand complex. Thus, the equilibrium is very strongly shifted to the complexed form, and only a few free ligand molecules exist. The binding constant is also the ratio of the rate constants of the backward and forward reactions, $k_d = k_{\mathrm{off}}/k_{\mathrm{on}}$. The units of the two rate constants are M⁻¹ s⁻¹ for the forward reaction ($k_{\mathrm{on}}$) and s⁻¹ for the backward reaction ($k_{\mathrm{off}}$).
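In practice, one usually knows the total concentrations of protein and ligand rather than the free ones. Combining the definition of $k_d$ with mass conservation, [P] = [P]_tot − [PL] and [L] = [L]_tot − [PL], gives a quadratic equation for [PL]. The Python sketch below solves it for illustrative concentrations (1 µM of each partner, an assumption) and three values of $k_d$; the function name and the rate constants are hypothetical choices, and the last printed column re‐evaluates [P][L]/[PL] as a consistency check.

```python
import math

def complex_concentration(p_total, l_total, kd):
    """Equilibrium [PL] for P + L <-> PL with dissociation constant kd (all in M).

    Mass conservation [P] = p_total - [PL], [L] = l_total - [PL] together with
    kd = [P][L]/[PL] gives [PL]^2 - (p_total + l_total + kd)[PL] + p_total*l_total = 0.
    The physically meaningful (smaller) root is written in a form that avoids
    numerical cancellation when kd is tiny.
    """
    b = p_total + l_total + kd
    disc = b * b - 4.0 * p_total * l_total   # always >= 0
    return 2.0 * p_total * l_total / (b + math.sqrt(disc))

k_on, k_off = 1e6, 1e-3        # illustrative rate constants in M^-1 s^-1 and s^-1
kd_from_rates = k_off / k_on   # kd = k_off / k_on, here a "nanomolar" binder

p_tot, l_tot = 1e-6, 1e-6      # illustrative: 1 µM protein, 1 µM ligand
for kd in (kd_from_rates, 1e-6, 1e-3):
    pl = complex_concentration(p_tot, l_tot, kd)
    p_free, l_free = p_tot - pl, l_tot - pl
    print(f"kd = {kd:.0e} M: bound fraction = {pl / p_tot:.3f}, "
          f"[P][L]/[PL] = {p_free * l_free / pl:.2e} M")
```

For the nanomolar binder almost all protein is complexed, whereas for a millimolar binder at these concentrations only about one protein in a thousand carries a ligand.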

Understanding the fine details of enzymatic reactions is one of the main branches of biochemistry. Fortunately, in the context of cellular simulations, we need not be concerned with the enzymatic mechanisms themselves. Instead, it is important here to characterize the chemical diversity of the substrates a particular enzyme can turn over and to collect the thermodynamic and kinetic constants of all relevant catalytic and binding reactions. A rigorous system for classifying enzymatic function is the Enzyme Commission (EC) numbering scheme. Each EC number has four levels: the first denotes one of the main enzyme classes (the six classical classes oxidoreductases, transferases, hydrolases, lyases, isomerases, and ligases, to which translocases were added as a seventh class in 2018), and the following three denote successively finer subclassifications.
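EC numbers are hierarchical strings such as 2.7.1.1 (hexokinase, the first enzyme of glycolysis). As a small illustration of this four‐level structure, the hypothetical helper below splits an EC number into its levels and looks up the main class; it assumes plain numeric levels or “-” for an unspecified level and does not handle other notations.

```python
# Main EC classes; class 7 (translocases) was added in 2018.
EC_MAIN_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}

def parse_ec(ec):
    """Split an EC number such as 'EC 2.7.1.1' into its four levels.

    Returns the levels as a list (None for a '-' placeholder) together with
    the name of the main class given by the first level.
    """
    ec = ec.strip()
    if ec.startswith("EC "):
        ec = ec[3:]
    parts = ec.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated levels, got {ec!r}")
    levels = [None if part == "-" else int(part) for part in parts]
    return levels, EC_MAIN_CLASSES[levels[0]]

print(parse_ec("EC 2.7.1.1"))  # ([2, 7, 1, 1], 'Transferases')
print(parse_ec("3.-.-.-"))     # ([3, None, None, None], 'Hydrolases')
```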

1.3.3 Signal Transduction

Here, we denote by signal transduction the transmission of a chemical signal, such as the phosphorylation of a target amino acid. Signal transduction is a very important subdiscipline of cell biology. Hundreds of research groups study separate aspects of signal transduction, and large research consortia such as the Alliance for Cellular Signaling have been formed in the past. In humans, about 70% of all proteins are phosphorylated at specific residues under certain conditions. Many proteins can be phosphorylated multiple times at different amino acids. A phosphorylation step often marks a transition between active and inactive states. The fraction of phosphorylated versus unphosphorylated proteins can be measured experimentally on a proteome‐wide level by mass spectrometry.

1.3.4 Cell Cycle

The cell cycle describes a series of processes in a prokaryotic or eukaryotic cell that leads from one cell division to the next one. The cell cycle is regulated by two types of proteins termed cyclins and cyclin‐dependent kinases. In 2001, the Nobel Prize in Physiology or Medicine was awarded to Leland H. Hartwell, R. Timothy Hunt, and Paul M. Nurse, who discovered these central molecules. Broadly speaking, the cell cycle can be divided into three stages termed interphase, mitosis, and cytokinesis. These can be further split into the following phases:

The G0 phase. This is a resting phase outside the regular “cell cycle” where the cells exist in a quiescent state.

The G1 phase