Computational cell biology courses are increasingly obligatory for biology students around the world, and they are also a must for mathematics and informatics students specializing in bioinformatics. This book, now in its second edition, is geared towards both audiences. The author, Volkhard Helms, has, in addition to extensive teaching experience, a strong background in biology and informatics, and knows exactly what the key points are in making the book accessible for students while still conveying in-depth knowledge of the subject. About 50% of new content has been added for the new edition. Much more room is now given to statistical methods, and several new chapters address protein-DNA interactions, epigenetic modifications, and microRNAs.
Number of pages: 763
Year of publication: 2018
Cover
Preface of the First Edition
Preface of the Second Edition
1 Networks in Biological Cells
1.1 Some Basics About Networks
1.2 Biological Background
1.3 Cellular Pathways
1.4 Ontologies and Databases
1.5 Methods for Cellular Modeling
1.6 Summary
1.7 Problems
Bibliography
2 Structures of Protein Complexes and Subcellular Structures
2.1 Examples of Protein Complexes
2.2 Complexome: The Ensemble of Protein Complexes
2.3 Experimental Determination of Three‐Dimensional Structures of Protein Complexes
2.4 Density Fitting
2.5 Fourier Transformation
2.6 Advanced Density Fitting
2.7 FFT Protein–Protein Docking
2.8 Protein–Protein Docking Using Geometric Hashing
2.9 Prediction of Assemblies from Pairwise Docking
2.10 Electron Tomography
2.11 Summary
2.12 Problems
Bibliography
3 Analysis of Protein–Protein Binding
3.1 Modeling by Homology
3.2 Properties of Protein–Protein Interfaces
3.3 Predicting Protein–Protein Interactions
3.4 Summary
3.5 Problems
Bibliography
4 Algorithms on Mathematical Graphs
4.1 Primer on Mathematical Graphs
4.2 A Few Words About Algorithms and Computer Programs
4.3 Data Structures for Graphs
4.4 Dijkstra's Algorithm
4.5 Minimum Spanning Tree
4.6 Graph Drawing
4.7 Summary
4.8 Problems
Bibliography
5 Protein–Protein Interaction Networks – Pairwise Connectivity
5.1 Experimental High‐Throughput Methods for Detecting Protein–Protein Interactions
5.2 Bioinformatic Prediction of Protein–Protein Interactions
5.3 Bayesian Networks for Judging the Accuracy of Interactions
5.4 Protein Interaction Networks
5.5 Protein Domain Networks
5.6 Summary
5.7 Problems
Bibliography
6 Protein–Protein Interaction Networks – Structural Hierarchies
6.1 Protein Interaction Graph Networks
6.2 Finding Cliques
6.3 Random Graphs
6.4 Scale‐Free Graphs
6.5 Detecting Communities in Networks
6.6 Modular Decomposition
6.7 Identification of Protein Complexes
6.8 Network Growth Mechanisms
6.9 Summary
6.10 Problems
Bibliography
7 Protein–DNA Interactions
7.1 Transcription Factors
7.2 Transcription Factor‐Binding Sites
7.3 Experimental Detection of TFBS
7.4 Position‐Specific Scoring Matrices
7.5 Binding Free Energy Models
7.6 Cis‐Regulatory Motifs
7.7 Relating Gene Expression to Binding of Transcription Factors
7.8 Summary
7.9 Problems
Bibliography
8 Gene Expression and Protein Synthesis
8.1 Regulation of Gene Transcription at Promoters
8.2 Experimental Analysis of Gene Expression
8.3 Statistics Primer
8.4 Preprocessing of Data
8.5 Differential Expression Analysis
8.6 Gene Ontology
8.7 Similarity of GO Terms
8.8 Translation of Proteins
8.9 Summary
8.10 Problems
Bibliography
9 Gene Regulatory Networks
9.1 Gene Regulatory Networks (GRNs)
9.2 Graph Theoretical Models
9.3 Dynamic Models
9.4 DREAM: Dialogue on Reverse Engineering Assessment and Methods
9.5 Regulatory Motifs
9.6 Algorithms on Gene Regulatory Networks
9.7 Summary
9.8 Problems
Bibliography
10 Regulatory Noncoding RNA
10.1 Introduction to RNAs
10.2 Elements of RNA Interference: siRNAs and miRNAs
10.3 miRNA Targets
10.4 Predicting miRNA Targets
10.5 Role of TFs and miRNAs in Gene‐Regulatory Networks
10.6 Constructing TF/miRNA Coregulatory Networks
10.7 Summary
Bibliography
11 Computational Epigenetics
11.1 Epigenetic Modifications
11.2 Working with Epigenetic Data
11.3 Chromatin States
11.4 The Role of Epigenetics in Cellular Differentiation and Reprogramming
11.5 The Role of Epigenetics in Cancer and Complex Diseases
11.6 Summary
11.7 Problems
Bibliography
12 Metabolic Networks
12.1 Introduction
12.2 Resources on Metabolic Network Representations
12.3 Stoichiometric Matrix
12.4 Linear Algebra Primer
12.5 Flux Balance Analysis
12.6 Double Description Method
12.7 Extreme Pathways and Elementary Modes
12.8 Minimal Cut Sets
12.9 High‐Flux Backbone
12.10 Summary
12.11 Problems
Bibliography
13 Kinetic Modeling of Cellular Processes
13.1 Biological Oscillators
13.2 Circadian Clocks
13.3 Ordinary Differential Equation Models
13.4 Modeling Cellular Feedback Loops by ODEs
13.5 Partial Differential Equations
13.6 Dynamic Phosphorylation of Proteins
13.7 Summary
13.8 Problems
Bibliography
14 Stochastic Processes in Biological Cells
14.1 Stochastic Processes
14.2 Dynamic Monte Carlo (Gillespie Algorithm)
14.3 Stochastic Effects in Gene Transcription
14.4 Stochastic Modeling of a Small Molecular Network
14.5 Parameter Optimization with Genetic Algorithm
14.6 Protein–Protein Association
14.7 Brownian Dynamics Simulations
14.8 Summary
14.9 Problems
Bibliography
15 Integrated Cellular Networks
15.1 Response of Gene Regulatory Network to Outside Stimuli
15.2 Whole‐Cell Model of Mycoplasma genitalium
15.3 Architecture of the Nuclear Pore Complex
15.4 Integrative Differential Gene Regulatory Network for Breast Cancer Identified Putative Cancer Driver Genes
15.5 Particle Simulations
15.6 Summary
Bibliography
16 Outlook
Index
End User License Agreement
Chapter 1
Table 1.1 Data on the genome length and on the number of protein‐coding and RNA ...
Table 1.2 The three graph objects in KEGG.
Table 1.3 Information stored in the BRENDA system for individual biochemical rea...
Table 1.4 Mathematical techniques used in computational cell biology that are co...
Chapter 2
Table 2.1 Composition of Schizosaccharomyces pombe SWI/SNF and RSC complexes comp...
Table 2.2 Key data that can be obtained by various experimental techniques relev...
Chapter 3
Table 3.1 Properties of protein–protein interfaces.
Table 3.2 Modeling various cellular processes requires different levels of detai...
Chapter 4
Table 4.1 Table that keeps track of events during execution of Dijkstra's algori...
Chapter 5
Table 5.1 Some public databases compiling data related to protein interactions.
Table 5.2 Bioinformatics methods to predict protein–protein interactions.
Table 5.3 Presence or absence of proteins P1–P7 in the four organisms in Figure ...
Table 5.4 Hamming distances between profiles of proteins P1–P7.
Table 5.5 Essentiality of protein pairs is weakly associated with their tendency...
Table 5.6 The first four columns contain results from high‐throughput experiment...
Table 5.7 The 10 most highly connected InterPro domains of Methanococcus, E. coli
Chapter 6
Table 6.1 Predicted TF combinations with a significant increase of expression co...
Chapter 7
Table 7.1 Toy example of six DNA sequences that are 4 bp long.
Table 7.2 Frequency of nucleotide bases at the four positions, cf. Table 7.1.
Table 7.3 Score matrix of nucleotide bases at the four positions, cf. Tables 7.1...
Table 7.4 Toy example of five DNA sequences that are 4 bp long, cf. Table 7.1.
Chapter 8
Table 8.1 Example for Fisher's exact test.
Table 8.2 Symbols used in Fisher's exact test.
Table 8.3 Most extreme inequilibrium, cf. Table 8.1.
Table 8.4 Data distribution to illustrate Mann–Whitney U test.
Table 8.5 Ordered list of the values from Table 8.4.
Table 8.6 Normalization schema explaining quantile normalization algorithm.
Table 8.7 Prevalence of cancer in men and women (see Problem 1).
Table 8.8 Another scenario of Table 8.7 (see Problem 1).
Table 8.9 Expression values of genes G1 and G2 at four subsequent time points ...
Chapter 9
Table 9.1 Gene expression in each of the 10 steady gene activation states in the...
Chapter 10
Table 10.1 Different types of RNA molecules.
Chapter 11
Table 11.1 Toy data to connect epigenetic marks and gene expression status (se...
Table 11.2 Toy data for predicting the gene expression status (see Problem 5).
Chapter 12
Table 12.1 Metabolites most frequently found in the central metabolism of E. coli
Table 12.2 Information on E. coli K‐12 strain MG1655 contained in EcoCyc, version...
Table 12.3 Properties of the E. coli network models ColiGS and ColiCore.
Table 12.4 Data for Problem 3(d). Inward and outward directed fluxes b_1 to b_5.
Chapter 13
Table 13.1 Survey of biochemical oscillators.
Chapter 1
Figure 1.1 Is this how we should view a biological cell? The point of this sche...
Figure 1.2 (a) Since the 1950s, a paradigm was established, whereby the informat...
Figure 1.3 Structural organization of transcriptional regulatory networks. (a) T...
Figure 1.4 Major metabolic pathways.
Figure 1.5 The glycolysis pathway as visualized in the KEGG database is connecte...
Chapter 2
Figure 2.1 RNA polymerase II is the central enzyme of gene expression. It synth...
Figure 2.2 Spliceosome: Spliceosome is a cellular “editor” that “cuts and pastes...
Figure 2.3 The ribosome: Model of the large ribosomal subunit from Haloarcula ma...
Figure 2.4 Arp 2/3 complex: The seven‐subunit Arp2/3 complex choreographs the fo...
Figure 2.5 The human apoptosome.
Figure 2.6 Pyruvate dehydrogenase is a huge multienzyme complex comprising 60 co...
Figure 2.7 Schematic representation of the Oct4 homodimer bound to DNA. Oct4 is ...
Figure 2.8 Functional consequences of dimerization and oligomerization. (a) Co...
Figure 2.9 Definition and terminology used to define protein complex architectur...
Figure 2.10 The count of complexes in two Escherichia coli complex data sets and...
Figure 2.11 X‐rays are electromagnetic waves in the ultrashort (“hard”) regime...
Figure 2.12 (a) The chromophore of a cyan fluorescent protein (CFP) absorbs ligh...
Figure 2.13 (Top) In this example, the shapes of the X‐ray object (left) and the...
Figure 2.14 The steps involved in density matching. First, the shapes of the two...
Figure 2.15 Reordering an array (here of length 8) by bit reversal. Bit revers...
Figure 2.16 Schematic view of a Laplacian filter. a_{i−1jk}, a_{ijk}, and a_{i+1jk}...
Figure 2.17 An example illustrating the effect of a Laplacian filter. The left p...
Figure 2.18 The left picture represents the shape of a protein when discretized ...
Figure 2.19 The docking algorithm SnapDock uses the coordinates of Cα/Cβ...
Figure 2.20 Flowchart of the CombDock algorithm. The protein subunits shown on t...
Figure 2.21 Principles of electron tomography. (a) The electron beam of an elect...
Figure 2.22 Three‐dimensional density reconstruction from electron tomography ...
Chapter 3
Figure 3.1 Plot illustrates the connection between the sequence similarity of...
Figure 3.2 The plot illustrates the structural similarity of complexes A–B to A′...
Figure 3.3 The plot illustrates the computation of the iRMSD between the complex...
Figure 3.4 Computation of the SASA. A small probe is rolled over the complete su...
Figure 3.5 Arrangement of all copies of a binary protein complex in a three‐di...
Figure 3.6 Interface size in transient protein–protein complexes. Histogram of...
Figure 3.7 Residue propensities at protein dimer interfaces and at artificial ...
Figure 3.8 (a) Schematic diagram of core and rim interface regions. Highlighted ...
Figure 3.9 Schematic illustration of possible shapes of the binding interface ...
Figure 3.10 Electrostatic interaction energy of two oppositely charged particl...
Figure 3.11 (a) Surface representation of the RNAse barnase colored according to...
Figure 3.12 Amino acid propensity matrix of transient protein–protein interfac...
Figure 3.13 Relative occurrence for binding partners of (a) leucine, (b) aspar...
Figure 3.14 (a) Radial pair distribution function of finding two alanine residue...
Figure 3.15 (a) Radial pair distribution function of finding two oppositely char...
Figure 3.16 ConSurf analysis of the β subunit of DNA polymerase III from Esche...
Figure 3.17 (a) Schematic drawing of a protein–protein interface involving conta...
Figure 3.18 Identification of correlated mutations. (Top) Family alignments ar...
Figure 3.19 Residue pairs across protein chains with high GREMLIN scores almos...
Chapter 4
Figure 4.1 A mathematical graph consists of vertices and edges. (a) An undirect...
Figure 4.2 Vertices A and D are connected by five paths (A → B → D, A → B → E → ...
Figure 4.3 A labeled tree with seven vertices and six edges connecting them.
Figure 4.7 Example of a minimum spanning tree so that each pair of vertices is c...
Figure 4.8 (a–j) Example illustrating the principles of Kruskal's algorithm. At ...
Figure 4.9 Example illustrating how the force‐directed layout algorithm will dis...
Figure 4.10 Weighted undirected graph (see Problem 4).
Figure 4.11 Schematic graph (see Problem 7).
Chapter 5
Figure 5.1 Schematic result of a gel electrophoresis run. The left lane label...
Figure 5.2 In affinity purification, a protein of interest (bait) is tagged with...
Figure 5.3 The Y2H system is one of the most widely used high‐throughput systems...
Figure 5.4 Results from different methods for complexes involving the cell cyc...
Figure 5.5 The gene cluster method. Genes A, B, and C are arranged linearly as ...
Figure 5.6 The gene neighborhood method analyzes the gene order in different evo...
Figure 5.7 If two protein‐coding genes are separated in some species (Sp1, Sp4, ...
Figure 5.8 (a) Proteins C and D are localized in the same compartment and may in...
Figure 5.9 The presence of protein families in various organisms detected, for e...
Figure 5.10 Graphical representation of the Hamming distances between the phyl...
Figure 5.11 Potentially interacting proteins or functionally related proteins sh...
Figure 5.12 An example of a Bayesian network. A directed arc, e.g. between varia...
Figure 5.13 Nine domains (geometrical objects) are the fundamental units of thes...
Figure 5.14 Connectivity of domains to other domains.
Figure 5.15 Resulting gel from a fictitious TAP experiment (see Problem 4).
Chapter 6
Figure 6.1 (a) Protein A interacts with proteins B, C and D. B also interacts w...
Figure 6.2 (a) Cubic lattice and (b) the corresponding distribution p(k).
Figure 6.3 Degree distribution in random network (a) showing a Poisson distribut...
Figure 6.4 Example illustrating the clustering coefficient on an undirected gr...
Figure 6.5 A clique in a graph is a set of pairwise adjacent vertices or a fully...
Figure 6.6 Result of one of the first large‐scale analyses of the protein intera...
Figure 6.7 In this schematic protein interaction networks, proteins are colored ...
Figure 6.8 Three communities formed by densely connected vertices (circles with ...
Figure 6.9 Modified version of Figure 6.8 where the three dark vertices mediate ...
Figure 6.10 (a) The friendship network from Zachary's karate club study. The ins...
Figure 6.11 The left vertex has a degree of 7 and the right vertex has a degree ...
Figure 6.12 A graph and its modules. In addition to the trivial modules {a}, {b}...
Figure 6.13 (A) Modular decomposition and (B) resulting tree. Vertices a, b and
Figure 6.14 Modular decomposition of four alternative phosphatase 2A complexes. ...
Figure 6.15 Schematic protein interaction network to illustrate cohesiveness m...
Figure 6.16 The gray nodes in this domain–domain interaction network are the pro...
Figure 6.17 Cell cycle expression profiles of all genes targeted by MET4 or MET3...
Figure 6.18 Fifty‐one representative subgraphs of length 8 (out of 148 subgrap...
Figure 6.19 Partitioning based on edge betweenness (see Problem 14).
Chapter 7
Figure 7.1 X‐ray crystal structures of common structural topologies of eukaryot...
Figure 7.2 Sequence logos for the DNA‐binding motifs that the transcription fa...
Figure 7.3 Resulting gel of an EMSA assay. The “shifted” band shows that the p...
Figure 7.4 Main steps of DNAse footprinting assay.
Figure 7.5 Schema of protein‐binding microarray experiments.
Figure 7.6 Methods for the detection of cis‐regulatory modules (CRMs): (a) CRM...
Figure 7.7 The ENCODE project studied how well the occupancy of transcription fa...
Chapter 8
Figure 8.1 Typical promoter region of a prokaryotic gene. The TTGACA and TATAAT...
Figure 8.2 Eukaryotic genomic region containing three genes A, B, and C. Differe...
Figure 8.3 Basic steps of a microarray experiment.
Figure 8.4 Standard normal distribution. μ is the mean of the (symmetric) n...
Figure 8.5 Schematic representation of the expression levels of a particular gen...
Figure 8.6 A “volcano plot” visualizes the results of differential expression an...
Figure 8.7 The structure of the gene ontology branch “biological process” is ill...
Figure 8.8 Protocol to determine synthesis rates and protein/mRNA lifetimes. Mou...
Figure 8.9 Kinetic schema to analyze experimental results of Figure 8.8. mRNAs a...
Figure 8.10 (a) Distribution of calculated mRNA transcription rate constants and...
Chapter 9
Figure 9.1 An example of a gene regulatory network. Solid arrows indicate direc...
Figure 9.2 Graph representation of the gene network corresponding to the biochem...
Figure 9.3 Graph representation of the E. coli transcriptional regulatory netw...
Figure 9.4 These three gene connectivities may lead to similar observed coexpr...
Figure 9.5 Gene network architecture determining the fate of the floral organ ...
Figure 9.6 Yeast1‐size10 network to test the GRN reconstruction algorithm in t...
Figure 9.7 Two regulation events that were missed by the noise models of the YYA...
Figure 9.8 Connectivity matrix for causal regulation of target gene j (row) by...
Figure 9.9 Example of an FFL (L‐arabinose utilization in E. coli). The global ...
Figure 9.10 Example of a single‐input motif (SIM) system (arginine biosynthesi...
Figure 9.11 Example of a DOR. (a) In this motif, many inputs regulate many out...
Figure 9.12 Largest subnetwork identified as downregulated in the caudate nucl...
Figure 9.13 An illustration of the MDS and MCDS solutions of an example networ...
Figure 9.14 Tightly interwoven network of 17 transcription factors and target ge...
Figure 9.15 Toy Boolean network (see Problem 3).
Chapter 10
Figure 10.1 Basic structural motifs of RNA secondary structure. This RNA consis...
Figure 10.2 Three‐dimensional structure of the VS ribozyme. This ribozyme from t...
Figure 10.3 MicroRNAs (miRNAs) recognize their targets by Watson–Crick base pair...
Figure 10.4 Schematic representation of miRNA biogenesis.
Figure 10.5 Stem‐loop structures of C. elegans, Drosophila melanogaster, and H...
Figure 10.6 Regulatory networks of miRNAs and proteins involved in the control o...
Figure 10.7 Bioinformatics tools are used for different purposes in miRNA resear...
Figure 10.8 FFL and FBL types. (a) Three types of FFLs classified by the master ...
Figure 10.9 Schematic model for TF‐miRNA coregulatory network in cell proliferat...
Figure 10.10 Based on deregulated genes and microRNAs in breast cancer patients,...
Chapter 11
Figure 11.1 Epigenetic marks around the NANOG gene after two days of directed...
Figure 11.2 (Left) Unmethylated cytosine and (right) cytosine methylated in its ...
Figure 11.3 Reversible changes in chromatin organization influence gene expressi...
Figure 11.4 (Left) C5‐methylated cytosine and (right) thymine. The deamination r...
Figure 11.5 Atomic structure of the nucleosome core particle. The two strands ...
Figure 11.6 (Left) Lysine amino acid and (to the right) methylated versions of...
Figure 11.7 Main experimental steps of the ChIP‐seq protocol that is used to ide...
Figure 11.8 Schematic example of CpG methylation in five genes. The sticks indic...
Figure 11.9 Association of comethylation of genes with genomic distance. Only pa...
Figure 11.10 Principle of the DNase 1 hypersensitivity assay.
Figure 11.11 Histone marks may have either activating (a) or repressive (b) effe...
Figure 11.12 The ENCODE project studied how well histone modifications are corre...
Figure 11.13 Basic architecture of an HMM. X_1 to X_3 are the possible states of t...
Figure 11.14 A “double‐negative gate” realized in both Drosophila and sea urchin...
Figure 11.15 A transcriptional regulatory circuit involving nine transcription f...
Figure 11.16 Schematic illustration of DNA methylation levels at CpG loci surrou...
Figure 11.17 Toy data for protein intensities (see Problem 3).
Figure 11.18 Hierarchy of hematopoietic differentiation stages (see Problem 4).
Chapter 12
Figure 12.1 Flow chart to automatically reconstruct metabolic networks from a ...
Figure 12.2 Simple network and the corresponding stoichiometric matrix. In thi...
Figure 12.3 This example is slightly more complicated than the one in Figure 12....
Figure 12.4 (a) A “pointed” cone spanned by five generating vectors that interse...
Figure 12.5 Strategy for determining optimal states of a biochemical network by ...
Figure 12.6 A torch light illuminates a dark room through an open door.
Figure 12.7 (a) Points belonging to the gray‐shaded area fulfill the condition x
Figure 12.8 Cube abcdefgh represents a cone of solutions satisfying all inequa...
Figure 12.9 Simple metabolic network with four metabolites A–D connected by fo...
Figure 12.10 Construction of the first tableau in step 1.
Figure 12.11 By going from the upper to the lower tableau, rows are being transf...
Figure 12.12 In the upper part of the picture, the tableau T(E) with the exchang...
Figure 12.13 Three extreme pathways are found that span the solution space of th...
Figure 12.14 The figure shows the same network as in Figure 12.3. (a) This is th...
Figure 12.15 Example network with five internal metabolites and eight reaction...
Figure 12.16 For the example network of Fig. 12.15, one obtains six elementary f...
Figure 12.17 Minimal cut sets for repressing synthesis of P in the example netw...
Figure 12.18 (a) Example network of Figure 12.15. (b) Participation of individua...
Figure 12.19 (a) Calculated flux distribution for optimized biomass production o...
Figure 12.20 Schematic illustration of two hypothetical scenarios in which eit...
Figure 12.21 Measured kY(k) shown as a function of k for incoming and outgoing...
Figure 12.22 HFB in the metabolic network of E. coli as optimized by flux bala...
Figure 12.23 Example of a metabolic network leading to the production of biomass...
Figure 12.24 Simple metabolic network as in Figure 12.9 (see Problem 2).
Figure 12.25 Simple metabolic network with five exchange fluxes (see Problem 3).
Chapter 13
Figure 13.1 Schematic illustration of an oscillating output of a biological cloc...
Figure 13.2 Minimal components of the mammalian clock. First, the two transcript...
Figure 13.3 Force diagram for a mathematical pendulum consisting of a weight att...
Figure 13.4 Simple model for the synthesis of protein R (“response”) under act...
Figure 13.5 For the example of linear response, the steady‐state response R_ss de...
Figure 13.6 Simple model for the equilibrium between the phosphorylated form of ...
Figure 13.7 Steady‐state concentration of phosphorylated protein R_P as a functi...
Figure 13.8 Response in a phosphorylation/dephosphorylation equilibrium with Mic...
Figure 13.9 (a) Coupling of the initial response pathway via R with a second sig...
Figure 13.10 Example of a positive feedback system built from a protein E and th...
Figure 13.11 Positive feedback system. As S increases, the response is low until...
Figure 13.12 (a, b) Positive feedback system, termed “toggle switch.”
Figure 13.13 (a, b) Negative feedback system.
Figure 13.14 (a) Three‐component system with feedback loop. (b) Feedback loop le...
Figure 13.15 Wiring diagram of the cell cycle regulation in eukaryotes. Major ev...
Figure 13.16 (a) Three‐component system with feedback loop (cf. Figure 13.10). (...
Figure 13.17 Toggle switch in G2/M phase involving the activating interaction of...
Figure 13.18 Spatial segregation of two opposing enzymes in a protein modificati...
Figure 13.19 Topology of a Boolean network for fission yeast.
Figure 13.20 Schematic view of the two ODE model (see Problem 1).
Chapter 14
Figure 14.1 Schema illustrating detailed balance for a system with two states,
Figure 14.2 (a) Histogram showing the expression level of a fluorescent reporter...
Figure 14.3 Stochastic simulation of single‐gene expression using the Gillespie ...
Figure 14.4 Toggle switch design by Gardner and Collins. Repressor 1 inhibits tr...
Figure 14.5 In stochastic simulations of the system shown in Figure 14.4, initia...
Figure 14.6 Having both inducers present, the two genes switch on and off, depen...
Figure 14.7 Artistic textbook‐style rendering of the photosynthetic apparatus of...
Figure 14.8 (a) A reconstructed chromatophore model vesicle of 45 nm diameter. T...
Figure 14.9 “Pools‐and‐proteins” view of bacterial photosynthesis. The network i...
Figure 14.10 Reactions modeled in the RC. The respective rate constants are A_1 (...
Figure 14.11 Reactions modeled in the bc1 complex. The respective rate constants...
Figure 14.12 Rate of ATP production per second as a function of light intensity....
Figure 14.13 Number of reduced cytochrome c2 particles in the stochastic simulat...
Figure 14.14 Optimization of kinetic system parameters by an evolutionary algori...
Figure 14.15 Comparison between different experimental data for the time‐depende...
Figure 14.16 Schematic illustration of particles undergoing undirected Brownian ...
Figure 14.17 Definition of different criteria to describe the relative orientati...
Figure 14.18 Schematic representation of the free energy for protein–protein int...
Figure 14.19 Occupancy maps for barstar at various distances cd_avg from barnase....
Figure 14.20 (a) Electrostatic interaction energy between barnase and barstar at...
Figure 14.21 A simple reaction network (see Problem 1).
Figure 14.22 Constructing a stoichiometric matrix for reaction network of Figure...
Figure 14.23 Concentrations of metabolites A and D at various time points.
Figure 14.24 Simplified signaling cascade (Problem 2). The reactions labeled R_0 t...
Chapter 15
Figure 15.1 Dynamic representation of the transcriptional network of Saccharomy...
Figure 15.2 M. genitalium whole‐cell model consisting of 28 integrated submodels...
Figure 15.3 DNA‐binding and dissociation dynamics of the oriC‐DnaA complex (red)...
Figure 15.4 Whole‐cell simulations of mutant M. genitalium strains. Single‐gene ...
Figure 15.5 Determining the architecture of the nuclear pore complex by integrat...
Figure 15.6 Representation of the optimization process of the NPC molecular arch...
Figure 15.7 Integrative network‐based approach to understand breast carcinogenes...
Figure 15.8 Gene network modules of TF–gene interactions. (a) Topological overla...
Figure 15.9 Regulatory interactions involving the 17 key driver genes identified...
Figure 15.10 (a) Schematic illustration of Mycoplasma genitalium (MG). (b) MG h s...
Volkhard Helms
Second Edition
Author
Volkhard Helms
Universität des Saarlandes
Zentrum für Bioinformatik
66041 Saarbrücken
Germany
All books published by Wiley‐VCH are carefully produced. Nevertheless, authors, editors, and publisher do not warrant the information contained in these books, including this book, to be free of errors. Readers are advised to keep in mind that statements, data, illustrations, procedural details or other items may inadvertently be inaccurate.
Library of Congress Card No.: applied for
British Library Cataloguing‐in‐Publication Data
A catalogue record for this book is available from the British Library.
Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at <http://dnb.d‐nb.de>.
© 2019 Wiley‐VCH Verlag GmbH & Co. KGaA, Boschstr. 12, 69469 Weinheim, Germany
All rights reserved (including those of translation into other languages). No part of this book may be reproduced in any form – by photoprinting, microfilm, or any other means – nor transmitted or translated into a machine language without written permission from the publishers. Registered names, trademarks, etc. used in this book, even when not specifically marked as such, are not to be considered unprotected by law.
Print ISBN: 978‐3‐527‐33358‐5
ePDF ISBN: 978‐3‐527‐81033‐8
ePub ISBN: 978‐3‐527‐81032‐1
This book grew out of a course for graduate students in the first year of the MSc bioinformatics program that the author teaches every year at Saarland University. Also included is some material from a special lecture on cell simulations. The book is designed as a textbook, placing emphasis on transmitting the main ideas of a problem, outlining algorithmic strategies for solving these, and describing possible complications or connections to other parts of the book. The main challenge during the writing of the book was the concentration on conceptual points that may be of general educative value rather than including the latest research results of this fascinating fast‐moving field. It is considered more important for a textbook to give a cohesive picture rather than mentioning all possible drawbacks and special cases where particular general guidelines may not apply. We apologize to those whose work could not be mentioned because of space constraints.
The intended audience includes students of bioinformatics and of life science disciplines. Consequently, some basic knowledge of molecular biology is taken for granted. The language used is not very formal. Previous knowledge of computer science is not required, but a certain adeptness in basic mathematics is necessary. The book introduces all of the mathematical concepts needed to understand the material covered. In particular, Chapter 2 introduces mathematical graphs and algorithms on graphs used in classifying protein–protein interaction networks. Chapter 6 introduces linear and convex algebra typically used in the description of metabolic networks. Chapter 7 discusses ordinary and stochastic differential equations used in the kinetic modeling of signal transduction pathways. Chapter 8 introduces the method of Fourier transformation for protein–protein docking and pattern matching. Also introduced are Bayesian networks in Chapter 4 as a way to judge the reliability of protein–protein interactions and inference techniques to model gene regulatory networks. We note, however, that the emphasis of this book is placed on discrete mathematics rather than on statistical methods. Not included yet are classical network flow results such as Menger's theorem or the max‐flow min‐cut theorem, as they are currently rarely used in cellular modeling. The book focuses on proteins and the genes coding for them, as well as on metabolites. Less room is given to DNA, RNA, or lipid membranes that would, of course, also deserve a great deal of attention. The main reason for this was to provide a homogeneous background for discussing algorithmic concepts.
The author is very grateful to Dr. Tihamér Geyer who coordinated the assignments for the lectures for valuable comments on the manuscript and for many solved examples and problems for this book. The following coworkers from Saarbrücken and elsewhere have provided valuable suggestions on different portions of the text: Kerstin Kunz, Jan Christoph, and Florian Lauck. I thank Dr. Hawoong Jeong, Dr. Julio Collado‐Vides, Dr. Agustino Martínez‐Antonio, Dr. Ruth Sperling, Dr. James R. Williamson, Dr. Joanna Trylska, Dr. Claude Antony, and Dr. Nicholas Luscombe for sending me high‐resolution versions of their graphics. I thank Dr. Andreas Sendtko and the publishing staff at Wiley‐VCH for their generous support of this book project, for their seemingly endless patience during the revision stage, and for excellent typesetting.
I also thank the Center of Theoretical Biophysics at the University of California, San Diego, for their hospitality during a sabbatical visit in summer 2007 that finally allowed me to complete this work. Finally, this book would not have been possible without the support and patience of my wife Regina and our two daughters.
March 2008
Volkhard Helms
Center for Bioinformatics
Saarland University
Saarbrücken, Germany
About 10 years after the publication of the first edition, I finally managed to prepare this expanded second edition of this book. Its main spirit remained the same: it is designed as a textbook, placing emphasis on transmitting the main ideas of a problem, outlining algorithmic strategies for solving these, and describing possible complications or connections to other parts of the book. Because of the feedback from colleagues, I have reordered the content, starting now in Chapter 2 with an introduction into the structures of protein–protein complexes before we enter into the world of protein interaction networks. I refrain from listing all the rearrangements here. Usually, I tried to keep subsections intact and simply shifted them around. A few sections were removed from the text because I now felt that they were too specialized. About 50% of new content has been added. In terms of mathematical methods, much more room is now given to statistical methods. In terms of biology, several new chapters now address protein–DNA interactions, epigenetic modifications, and microRNAs. Still not covered are biophysical topics related to intracellular transport, cytoskeletal dynamics, and processes taking place at and across biological membranes. Maybe, there will be a need for a third edition eventually?
In addition to those who contributed to the first edition, the author is very grateful to Thorsten Will and Maryam Nazarieh for solved examples and problems for this book. The following coworkers from Saarbrücken and elsewhere have provided valuable suggestions on different portions of the text: Mohamed Hamed Fahmy, Dania Humaidan, Olga Kalinina, Heiko Rieger, and Thorsten Will. I thank my group members of the past years with whom I had the privilege to work on exciting research projects related to the content of this book and I thank our secretary Kerstin Gronow‐Pudelek for technical assistance.
April 2018
Volkhard Helms
Center for Bioinformatics
Saarland University
Saarbrücken, Germany
1 Networks in Biological Cells
Problems: To really absorb the content of this textbook, it is advisable to also try to solve some of the problems enclosed.
Modern molecular and cell biology has worked out many important cellular processes in great detail, although some other areas are known to a lesser extent. What often remains to be understood is how the individual parts are connected, and this is exactly the focus of this book. Figure 1.1 displays a cartoon of a cell as a highly viscous soup containing a complicated mixture of many particles. Certainly, several important details that introduce a partial order are left out here, such as the cytoskeleton and organelles of eukaryotic cells. Figure 1.1 reminds us that a myriad of biomolecular interactions take place in biological cells at all times and that it is pretty amazing how considerable order is achieved in many cellular processes that are all based on pairwise molecular interactions.
Figure 1.1 Is this how we should view a biological cell? The point of this schematic picture is that about 30% of the volume of a biological cell is taken up by millions of individual proteins. Therefore, biological cells are really “full.” However, of course, such pictures do not tell us much about the organization of biological processes. As we will see later in this book, there are many different hierarchies of order in such a cell.
The focus of this book is placed on presenting mathematical descriptions developed in recent years to describe various levels of cellular networks. We will learn that many biological processes are tightly interconnected, and this is exactly where many links still need to be discovered in further experimental studies. Many researchers in the field of molecular biology believe that only combined efforts of modern experimental techniques, mathematical modeling, and bioinformatics analysis will be able to arrive at a sufficient understanding of the biological networks of cells and organisms.
In this chapter, we will start with some principles of mathematical networks and their relationship with biological networks. Then, we will briefly look at several biological key players to be used in the rest of this book (cells, compartments, proteins, and pathways). Without going into any further detail, we will directly move into the field of network theory with the amazing “small‐world phenomenon.”
Network theory is a branch of applied mathematics and, even more so, of physics that uses the concepts of graph theory. Its developments are led by application to real‐world examples in the areas of social networks (such as networks of acquaintances or among scientists having joint publications), technological networks (such as the World Wide Web, which is a network of web pages; the Internet, which is a network of computers and routers; or power grids), and biological networks (such as neural networks and metabolic networks).
In a random network, every possible link between two “vertices” (or nodes) A and B is established according to a given probability distribution irrespective of the nature and connectivity of the two vertices A and B. This is what is “random” about these networks. If the network contains n vertices in total, the maximal number of undirected edges (links) between them is n × (n − 1)/2. This is because we can pick each of the n vertices as the first vertex of an edge, and there are (n − 1) other vertices that this vertex can be connected to. In this way, we will actually consider each edge twice, using each end point as the first vertex. Therefore, we need to divide the number of edges by 2.
If every edge is established with a probability p ∈ [0, 1], the expected total number of edges in an undirected graph is p × n × (n − 1)/2. The mathematics of random graphs was developed and elucidated by the two Hungarian mathematicians Erdős and Rényi. However, the analysis of real networks showed that such networks often differ significantly from the characteristics of random graphs. We will return to random graphs in Section 6.3.
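The counting argument above can be checked numerically. The following is a minimal sketch, not taken from the book: the function names `max_edges` and `random_graph` and the optional `seed` parameter are assumptions made for illustration. It enumerates all n × (n − 1)/2 vertex pairs and keeps each one independently with probability p.

```python
import random
from itertools import combinations


def max_edges(n):
    """Maximal number of undirected edges between n vertices: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def random_graph(n, p, seed=None):
    """Return the edge list of an undirected random graph on vertices 0..n-1.

    Every possible pair (a, b) with a < b is considered exactly once and is
    kept with probability p, independently of all other pairs.
    The seed argument is a hypothetical convenience for reproducible runs.
    """
    rng = random.Random(seed)
    return [(a, b) for a, b in combinations(range(n), 2) if rng.random() < p]
```

For example, with n = 200 and p = 0.1, `len(random_graph(200, 0.1))` should fluctuate around the expected value `0.1 * max_edges(200)`, without being exactly equal to it in a single run.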
The term small‐world phenomenon was coined to describe the observation that everyone in the world is linked to some other person through a short chain of social acquaintances. In a small‐world experiment, the psychologist Stanley Milgram found in 1967 that, on average, any two US citizens randomly picked were connected to each other by only six acquaintances. In network terms, the vertices of a small‐world network have short average distances. Usually, the average distance between the vertices scales logarithmically with their total number, n.
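To make the notion of "short average distances" concrete, the sketch below computes the mean shortest-path length of a small undirected graph by breadth-first search. It is only an illustration under stated assumptions: the graph is stored as a dictionary mapping each vertex to a set of neighbors, and the helper names `bfs_distances`, `average_distance`, `ring`, and `add_edge` are hypothetical, not part of the book.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_distance(adj):
    """Mean shortest-path length over all ordered pairs of distinct, connected vertices."""
    total, pairs = 0, 0
    for source in adj:
        dist = bfs_distances(adj, source)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


def ring(n):
    """Ring lattice: vertex i is linked to its two neighbors i - 1 and i + 1."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def add_edge(adj, a, b):
    """Insert an undirected edge between a and b."""
    adj[a].add(b)
    adj[b].add(a)
```

On a plain ring the average distance grows linearly with n. Adding even a single long-range link, e.g. `add_edge(g, 0, 50)` on `g = ring(100)`, already lowers `average_distance(g)`, which hints at why a few shortcuts can turn a regular lattice into a small world.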
In a paper published in the journal Nature in 1998, the two mathematicians Duncan J. Watts and Steven H. Strogatz (Watts and Strogatz, 1998) reported that small‐world networks are common in many different areas ranging from neuronal connections of the worm Caenorhabditis elegans to power grids.
Only one year after the discovery of Watts and Strogatz, Albert‐László Barabási from the Physics Department at the University of Notre Dame introduced an even simpler model for the emergence of the small‐world phenomenon (Barabási and Albert 1999). Although Watts and Strogatz's model was able to explain the short average path length and the high clustering coefficient of a small world (all these terms will be introduced in Chapter 6), it did not manage to explain another property that is typical for real‐world networks such as the Internet: these networks are scale‐free. In simple terms, this means that although the vast majority of vertices are weakly connected, there also exist some highly interconnected super‐vertices or hubs. The term scale‐free expresses that the ratio of highly to weakly connected vertices remains the same irrespective of the total number of links in the network. We will see in Section 6.4 that the connectivity of scale‐free networks follows a power law. If a network is scale‐free, it is also a small world.
In this paper, Barabási and Albert presented a strikingly simple and intuitive algorithm that generates networks with a scale‐free topology. It has two essential elements:
Growth. The network is started from a small number of (at least two) connected vertices. At every iteration step, a new vertex is added that forms links to m of the existing vertices.
Preferential attachment. One assumes that the probability of a link between a newly added vertex and an existing vertex i depends on the degree of i (the number of existing links between vertex i and other vertices). The more connections i has already, the more likely the new vertices will link to i. This behavior is described by the saying “the rich become richer.” Let us motivate this with the fictitious example of the early days of air traffic. Initially, one needs to build two airports so that a first regular flight connection can be established between them. Eventually, a third airport is established. Most likely, initially, only one new flight will go to either one of the existing airports. Now, the situation is unbalanced: one airport is connected to two other cities, whereas the airports of those cities are each connected to only one city. There is a certain chance that, after some time, the “missing” connection between the new airport and the other airport would be introduced, which would lead to a balanced situation again. Alternatively, a fourth airport could emerge that would also start by establishing only one flight to one of the existing airports. Now, the airport that already has two connections would have an obvious practical advantage because passengers taking this route simply have more options to carry on. Therefore, the chance that this flight is established is higher than for the other connections. Exactly this idea is captured by the concept of preferential attachment.
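The two ingredients, growth and preferential attachment, can be sketched in a few lines. The code below is a minimal illustration and not the original implementation of Barabási and Albert. It makes two assumptions that are not fixed by the text: the seed network is a fully connected set of m + 1 vertices, and the linking probability is taken to be directly proportional to the degree. The names `barabasi_albert` and `degrees` are hypothetical.

```python
import random
from collections import Counter


def barabasi_albert(n, m, seed=None):
    """Grow an undirected network with n vertices by preferential attachment.

    Assumption of this sketch: the seed network consists of m + 1 fully
    connected vertices, so every new vertex can find m distinct targets.
    """
    if m < 1 or n < m + 1:
        raise ValueError("need m >= 1 and n >= m + 1")
    rng = random.Random(seed)
    edges = [(a, b) for a in range(m + 1) for b in range(a + 1, m + 1)]
    # Every vertex appears in 'stubs' once per incident edge, so a uniform
    # draw from this list picks a vertex with probability proportional to
    # its current degree ("the rich become richer").
    stubs = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:          # m distinct existing vertices
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.append((t, new))
            stubs.extend((t, new))
    return edges


def degrees(edges):
    """Number of links per vertex."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg
```

Calling `degrees(barabasi_albert(10000, 2, seed=1)).most_common(5)` lists the best-connected vertices. In typical runs these are early vertices whose degree far exceeds m, i.e. the hubs mentioned above, whereas most late vertices keep a degree close to m.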
The same growth mechanism applies, for example, to the World Wide Web. Obviously, this network grows constantly over time, and many new pages are added to it every moment. We know from our own experience that once a new web page is created, its owner will most likely include links to other popular pages (hubs) on the new page so that the second “rule” is also fulfilled.
In the early exciting days of network theory when the study of large‐scale networks took off like a storm, it was even suggested that the scale‐free network model may be something like a law of nature that controls how natural small‐world networks are formed. However, subsequent work on integrated biological networks showed that the concept of scale‐free networks may rather be of theoretical value and that it may not be directly applicable to certain biological networks. For the moment, we will consider the idea of network topology (scale‐free networks and small‐world phenomenon) as a powerful concept that is useful for understanding the mechanism of network growth and vulnerability.
Figure 1.2 (a) Since the 1950s, a paradigm was established, whereby the information flows from DNA over RNA to protein synthesis, which then gives rise to particular phenotypes. (b) The emergence of structural biology – the first crystal structure of the protein myoglobin was determined in 1960 – emphasized the importance of the three‐dimensional structures of proteins determining their function. (c) Today, we have realized the central role played by molecular interactions that influence all other elements.
Until recently, the paradigm of molecular biology was that genetic information is read from the genomic DNA by the RNA polymerase complex and is transcribed into the corresponding RNA. Ribosomes then bind to messenger RNA (mRNA) snippets and produce amino acid strands. This process is called translation. Importantly, the paradigm involved the notion that this entire process is unidirectional, see Figure 1.2.
It is now well established that many feedback loops are provided in this system too, e.g. by the proteins known as transcription factors that bind to sequence motifs on the genomic DNA and mediate (activate or repress) transcription of certain genomic segments. Important discoveries of the past 20 years showed that cellular mRNA concentrations are also largely affected by small RNA snippets termed microRNAs and that the chromatin structure is shaped by epigenetic modifications of the DNA and histone proteins that control the accessibility of genomic regions. The cellular network therefore certainly appears much more complicated today than it did 60 years ago.
This brings us to the world of gene regulatory networks. Collecting the required information on the regulation of individual genes is a subject of intense active research. For example, the ENCODE project for human cells and the modENCODE project for the model organisms C. elegans and Drosophila melanogaster mapped the binding sites of hundreds of transcription factors throughout the genomes. Also, the FANTOM initiative started in Japan is a worldwide collaborative project aiming at identifying all the functional elements in mammalian genomes. However, occupancy maps of transcription factors alone are not considered compelling evidence of biologically functional regulation. To really prove or disprove which gene is activated or repressed by a particular transcription factor (or microRNA), one could create a knockout organism lacking the gene coding for this transcription factor and see which genes are no longer expressed or are now expressed in excess. Such genome‐wide deletion libraries have actually been produced for the model organism Saccharomyces cerevisiae. However, in this way, we can only discover those combinations that are not lethal for the organism. Also, pairs or larger assemblies of transcription factors often need to bind simultaneously. It simply appears impossible to discover the full connectivity of this regulatory network by a traditional one‐by‐one approach. Fortunately, modern microarray and RNAseq experiments probe the expression levels of many genes simultaneously. Ongoing challenges are the noisy nature of the large‐scale data and the fact that genes actually do not interact directly with each other. Analysis of gene expression data will be discussed in Chapter 8.
In this book, we will be mostly concerned with the following four types of biological cellular networks: protein–protein interaction networks, gene regulatory networks, signal transduction networks, and metabolic networks. We will discuss them at different hierarchical levels as shown in Figure 1.3 using the example of regulatory networks.
Figure 1.3 Structural organization of transcriptional regulatory networks. (a) The “basic unit” comprises the transcription factor, its target gene with a DNA recognition site, and the regulatory interaction between them. (b) Units are often organized into network “motifs” that comprise specific patterns of inter‐regulation that are overrepresented in networks. Examples of motifs include single‐input/multiple output (SIM), multiple input/multiple output (MIM), and feed‐forward loop (FFL) motifs. (c) Network motifs can be interconnected to form semi‐independent “modules,” many of which have been identified by integrating regulatory interaction data with gene expression data and imposing evolutionary conservation. The next level consists of the entire network (not shown).
Source: Babu et al. (2004). Drawn with permission of Elsevier.
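As a small illustration of the motif level in Figure 1.3, the following sketch searches a list of regulatory interactions for feed‐forward loops, here understood in the usual sense that a regulator X controls a second regulator Y and both X and Y control a common target Z. The function name `feed_forward_loops`, the input format (pairs of regulator and target), and the gene names in the example are assumptions made for illustration only.

```python
def feed_forward_loops(regulations):
    """Return all (X, Y, Z) triples with X -> Y, Y -> Z, and X -> Z.

    regulations: iterable of (regulator, target) pairs, i.e. directed edges
    of a transcriptional regulatory network.
    """
    targets = {}
    for regulator, target in regulations:
        targets.setdefault(regulator, set()).add(target)

    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:
            if y == x:
                continue  # ignore autoregulation
            for z in targets.get(y, ()):
                # Z must be a third gene that X also regulates directly.
                if z != x and z != y and z in x_targets:
                    loops.append((x, y, z))
    return sorted(loops)
```

For the hypothetical interaction list `[("X", "Y"), ("X", "Z"), ("Y", "Z"), ("Z", "W")]` the function reports the single triple `("X", "Y", "Z")`, whereas a pure single‐input/multiple output arrangement such as `[("A", "B"), ("A", "C")]` contains no feed‐forward loop.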
Cells can be described at various levels of detail. We will mostly use three different levels of description:
Inventory lists and lists of processes: proteins in particular compartments, proteins forming macromolecular complexes, biomolecular interactions, regulatory interactions, and metabolic reactions.
Structural descriptions: structures of single proteins, topologies of protein complexes, and subcellular compartments.
Dynamic descriptions: cellular processes ranging from nanosecond dynamics for the association of two biomolecules up to processes occurring in seconds and minutes such as the cell division of yeast cells.
We will assume that the reader has a basic knowledge about the organic molecules commonly found within living cells and refer those who do not to basic books on biochemistry or molecular biology. Depending on their role in metabolism, the biomolecules in a cell can be grouped into several classes.
Macromolecules including nucleic acids, proteins, polysaccharides, and certain lipids.
The building blocks of macromolecules include sugars as the precursors of polysaccharides, amino acids as the building blocks of proteins, nucleotides as the precursors of nucleic acids (and therefore of DNA and RNA), and fatty acids that are incorporated into lipids. Interestingly, in biological cells, only a small number of theoretically synthesizable macromolecules exist at a given time point. At any moment during a normal cell cycle, many new macromolecules need to be synthesized from their building blocks, and this is meticulously controlled by the complex gene expression machinery. Even during a steady state of the cell, there exists a constant turnover of macromolecules.
Metabolic intermediates (metabolites). Many molecules in a biological cell have complex chemical structures and must be synthesized in several reactions from specific starting materials that may be taken up as the energy source. In the cell, connected chemical reactions are often grouped into metabolic pathways (Section 1.3).
Molecules of miscellaneous function including vitamins, steroid hormones, molecules that can store energy such as ATP, regulatory molecules, and metabolic waste products.
Almost all biological materials that are needed to construct a biological cell are either synthesized by the RNA polymerase and ribosome machinery of the cell or are taken up from the outside via the cell membrane. Therefore, as a minimum inventory, every cell needs to contain the construction plan (DNA), a processing unit to transcribe this information into mRNA (polymerase), a processing unit to translate these mRNA pieces into protein (ribosome), and transporter proteins inside the cell membrane that transport material through the cell membrane.
Organization into various compartments greatly simplifies the temporal and spatial process flow in eukaryotic cells. As mentioned above, at each time point during a cell cycle, only a small subfraction of all potential proteins is being synthesized (and not yet degraded). Also, many proteins are only available in very small concentrations, possibly with only a few copies per cell. However, by localizing these proteins to particular spots in the cell, e.g. by attaching them to the cytoskeleton or by partitioning them into lipid rafts, the cell can raise their local concentrations considerably. We assume that the reader is vaguely familiar with the compartmentalization of eukaryotic cells involving the lysosome, plasma membrane, cell membrane, Golgi complex, nucleus, smooth endoplasmic reticulum, mitochondrion, nucleolus, rough endoplasmic reticulum, and cytoskeleton.
An important element of cellular organization is the active transport of macromolecules along the microtubules of the cytoskeleton that is carried out by molecular motor proteins such as kinesin and dynein. Here, we will not address the activities of molecular motors because this is rather a research topic in biophysics.
Table 1.1 presents some statistics of the organisms considered in this book.
Table 1.1 Data on the genome length and on the number of protein‐coding and RNA genes are taken from the Kyoto Encyclopedia of Genes and Genomes database (April 2018); data on the number of putative transporter proteins are taken from www.membranetransport.org.
| Organism | Length of genome (Mb) | Number of protein‐coding genes | Number of RNA genes | Number of transporter proteins |
| --- | --- | --- | --- | --- |
| Prokaryotes | | | | |
| Mycoplasma genitalium G37 | 0.6 | 476 | 43 | 53 |
| Bacillus subtilis BSN5 | 4.2 | 4 145 | 113 | 552 |
| Escherichia coli APEC01 | 4.6 | 4 890 | 93 | 665 |
| Eukaryotes | | | | |
| Saccharomyces cerevisiae S288C | 1.3 | 6 002 | 425 | 341 |
| Drosophila melanogaster | 12 | 13 929 | 3 209 | 662 |
| Caenorhabditis elegans | 100.2 | 20 093 | 24 969 | 669 |
| Homo sapiens | 3 150 | 20 338 | 19 201 | 1 467 |
Metabolism denotes the entirety of biochemical reactions that occur within a cell (Figure 1.4). In the past century, many of these reactions have been organized into metabolic pathways. Each pathway consists of a sequence of chemical reactions that are catalyzed by specific enzymes, and the outcome of one reaction is the input for the next one. Unraveling the individual enzymatic reactions was one of the big successes of applying biochemical methods to cellular processes. Metabolic pathways can be divided into two broad types. Catabolic pathways disintegrate complex molecules into simpler ones, which can be reused for synthesizing other molecules. Also, catabolic pathways provide chemical energy required for many cellular processes. This energy may be stored temporarily as high‐energy phosphates (primarily in ATP) or as high‐energy electrons (primarily in NADPH). Conversely, anabolic pathways synthesize more complex substances from simpler starting reagents by utilizing the chemical energy generated by exergonic catabolic pathways.
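The statement that the outcome of one reaction is the input for the next one suggests a very simple data structure for a linear pathway. The sketch below is an illustration under simplifying assumptions: each reaction is reduced to one main substrate and one main product (cofactors such as ATP are left out), and the names `Reaction`, `is_linear_pathway`, `pathway_metabolites`, and `GLYCOLYSIS_START` are hypothetical. The example uses the first three steps of glycolysis.

```python
from typing import NamedTuple


class Reaction(NamedTuple):
    """One enzymatic step, reduced to its main substrate and main product."""
    enzyme: str
    substrate: str
    product: str


# First three steps of glycolysis (cofactors omitted for simplicity).
GLYCOLYSIS_START = [
    Reaction("hexokinase", "glucose", "glucose 6-phosphate"),
    Reaction("glucose-6-phosphate isomerase", "glucose 6-phosphate", "fructose 6-phosphate"),
    Reaction("phosphofructokinase", "fructose 6-phosphate", "fructose 1,6-bisphosphate"),
]


def is_linear_pathway(reactions):
    """True if the product of each reaction is the substrate of the next one."""
    return all(prev.product == nxt.substrate
               for prev, nxt in zip(reactions, reactions[1:]))


def pathway_metabolites(reactions):
    """Ordered list of metabolites visited along a linear pathway."""
    if not reactions:
        return []
    return [reactions[0].substrate] + [r.product for r in reactions]
```

Real pathways branch and share metabolites with other pathways, so a full description needs a network rather than a chain. The linear form shown here is only meant to mirror the textbook picture of a single pathway.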
Figure 1.4 Major metabolic pathways.
The traditional biochemical pathways were often derived from studying simple organisms where these pathways constitute a dominating part of the metabolic activity. For example, the glycolysis pathway was discovered in yeast (and in muscle) in the 1930s. It describes the disassembly of the nutrient glucose that is taken up by many microorganisms from the outside. Figure 1.5 shows the glycolysis pathway in Homo sapiens as represented in the KEGG database (Kanehisa et al. 2016).
Figure 1.5 The glycolysis pathway as visualized in the KEGG database is connected to many other cellular pathways.
Source: From http://www.genome.ad.jp/kegg.
Enzymes are proteins that catalyze biochemical reactions so that they proceed much faster than in aqueous solution, e.g. by factors of many thousands to billions. As is the case for any catalyst, the enzyme remains intact after the reaction is complete and can therefore continue to function. Enzymes reduce the activation energy of a reaction, but this affects the forward and the backward reaction in the same manner. Hence, the relative free energy difference and the equilibrium between the products and reagents are not affected. Compared to other catalysts, enzymes carry out reactions in a highly stereo‐, regio‐, and chemoselective and specific manner.
For the binding reaction P + L ↔ PL of a protein P and a ligand L, the binding constant $k_d$,

$$k_d = \frac{[\mathrm{P}]\,[\mathrm{L}]}{[\mathrm{PL}]},$$

determines how much of the ligand concentration [L] is bound by the protein (with concentration [P]) under equilibrium conditions. [PL] is the concentration of the protein:ligand complex. The binding constant has the unit M. In the case of a “nanomolar inhibitor,” for example, where a blocking ligand binds to a protein with a $k_d$ on the order of $10^{-9}$ M, the product of the concentrations of free protein and of free ligand is $10^{9}$ times smaller than the concentration of the protein–ligand complex. Thus, the equilibrium is very strongly shifted to the complexed form, and only a few free ligand molecules exist. The binding constant $k_d$ is also the ratio of the kinetic rates for the backward and forward reactions, $k_\mathrm{off}$ and $k_\mathrm{on}$. The units of the two kinetic rates are M$^{-1}$ s$^{-1}$ for the forward reaction and s$^{-1}$ for the backward reaction.
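A short calculation may help to get a feeling for these numbers. Assuming the usual definition of the binding constant as [P][L]/[PL], and writing the total concentrations as P_tot = [P] + [PL] and L_tot = [L] + [PL], the complex concentration x = [PL] solves the quadratic equation x² − (P_tot + L_tot + k_d) x + P_tot L_tot = 0. The sketch below evaluates the physically meaningful root. The function names and the example concentrations are assumptions chosen for illustration, not values from the book.

```python
import math


def complex_concentration(p_total, l_total, kd):
    """Equilibrium concentration [PL] (in M) for the reaction P + L <-> PL.

    p_total, l_total: total protein and ligand concentrations in M.
    kd: binding constant [P][L]/[PL] in M (must be positive).
    Uses the numerically stable form of the smaller root of
    x^2 - (p_total + l_total + kd) * x + p_total * l_total = 0.
    """
    b = p_total + l_total + kd
    return 2.0 * p_total * l_total / (b + math.sqrt(b * b - 4.0 * p_total * l_total))


def kd_from_rates(k_off, k_on):
    """Binding constant in M from k_off (1/s) and k_on (1/(M s))."""
    return k_off / k_on
```

For instance, `complex_concentration(1e-6, 1e-6, 1e-9)` gives the amount of complex formed when 1 µM protein meets 1 µM of a nanomolar inhibitor. Dividing the result by the total protein concentration yields the bound fraction, which in this case is close to, but not exactly, one.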
Understanding the fine details of enzymatic reactions is one of the main branches of biochemistry. Fortunately, in the context of cellular simulations, we need not be concerned with the enzymatic mechanisms themselves. Here, instead, it is important to characterize the chemical diversity of the substrates a particular enzyme can turn over and to collect the thermodynamic and kinetic constants of all relevant catalytic and binding reactions. A rigorous system to classify enzymatic function is the Enzyme Commission (EC) scheme. An EC number consists of four levels: a main enzyme class followed by three successively finer levels of subclassification.
Here, we denote by signal transduction the transmission of a chemical signal such as phosphorylation of a target amino acid. Signal transduction is a very important subdiscipline of cell biology. Hundreds of working groups are looking at separate aspects of signal transduction, and large research consortia such as the Alliance for Cellular Signaling have been formed in the past. In humans, about 70% of all proteins get phosphorylated at specific residues in certain conditions. Many proteins can be phosphorylated multiple times at different amino acids. A phosphorylation step often characterizes a transition between active and inactive states. The fraction of phosphorylated versus unphosphorylated proteins can be detected experimentally by mass spectrometry on a genome‐wide level.
The cell cycle describes a series of processes in a prokaryotic or eukaryotic cell that leads from one cell division to the next one. The cell cycle is regulated by two types of proteins termed cyclins and cyclin‐dependent kinases. In 2001, the Nobel Prize in Physiology or Medicine was awarded to Leland H. Hartwell, R. Timothy Hunt, and Paul M. Nurse who discovered these central molecules. Broadly speaking, a cell cycle can be grouped into three stages termed interphase, mitosis, and cytokinesis. These can be further split into the following:
The G0 phase. This is a resting phase outside the regular “cell cycle” where the cells exist in a quiescent state.
The G1 phase
