Crumbling Genome - Alexey S. Kondrashov - E-Book

Crumbling Genome E-Book

Alexey S. Kondrashov

Description

A thought-provoking exploration of deleterious mutations in the human genome and their effects on human health and wellbeing

Despite all of the elaborate mechanisms that a cell employs to handle its DNA with the utmost care, a newborn human carries about 100 new mutations that originated in their parents, about 10 of which are deleterious. A mutation replacing just one of the more than three billion nucleotides in the human genome may lead to synthesis of a dysfunctional protein, and this can be inconsistent with life or cause a tragic disease. Several percent of even young people suffer from diseases that are caused, exclusively or primarily, by pre-existing and new mutations in their genomes, including both a wide variety of genetically simple Mendelian diseases and diverse complex diseases such as birth anomalies, diabetes, and schizophrenia. Milder, but still substantial, negative effects of mutations are even more pervasive. As of now, we possess no means of reducing the rate at which mutations appear spontaneously. However, the recent flood of genomic data, made possible by next-generation methods of DNA sequencing, has enabled scientists to explore the impacts of deleterious mutations on humans with previously unattainable precision and to begin developing approaches to managing them. Written by a leading researcher in the field of evolutionary genetics, Crumbling Genome reviews the current state of knowledge about deleterious mutations and their effects on humans for those in the biological sciences and medicine, as well as for readers with only a general scientific literacy and an interest in human genetics.
* Provides an extensive introduction to the fundamentals of evolutionary genetics with an emphasis on mutation and selection
* Discusses the effects of pre-existing and new mutations on human genotypes and phenotypes
* Provides a comprehensive review of the current state of knowledge in the field and considers crucial unsolved problems
* Explores key ethical, scientific, and social issues likely to become relevant in the near future as the modification of human germline genotypes becomes technically feasible

Crumbling Genome is must-reading for students and professionals in human genetics, genomics, bioinformatics, evolutionary biology, and biological anthropology. It is certain to have great appeal among all those with an interest in the links between genetics and evolution and how they are likely to influence the future of human health, medicine, and society.


Page count: 640

Year of publication: 2017




Table of Contents

Cover

Title Page

Preface

1 Genotypes and Phenotypes

1.1 DNA is a Text

1.2 Genomes Small and Large

1.3 Genes and Intergenic Regions

1.4 Cells, Mitosis, and Meiosis

1.5 From Genotype to Phenotype

Further Reading

2 Mendelian Inheritance and Population Genetics

2.1 Inheritance is Discrete

2.2 Populations are Genetically Variable

2.3 Loci and Genes

2.4 Effects of Alleles on Phenotypes

2.5 Mendelian Traits and Diseases

Further Reading

3 Complex Traits and Their Inheritance

3.1 Complex Inheritance of Phenotypes

3.2 Properties of a Complex Trait

3.3 Complex Traits in Populations

3.4 Effects of Heredity and Environment on Complex Traits

3.5 Polymorphic Loci Behind Complex Variation

Further Reading

4 Unavoidable Mutation

4.1 Phenomenon of Mutation

4.2 Kinds of Mutations

4.3 Spontaneous Mutation

4.4 Evolution of Mutation Rates

4.5 Artificial Mutagenesis and Antimutagenesis

Further Reading

5 Struggle for Fidelity

5.1 Fidelity of DNA Replication

5.2 Cleaning Up After the Replisome

5.3 Dealing with DNA Damages

5.4 Harms of Broken Maintenance

5.5 Mechanisms of Mutation

Further Reading

6 Mutation Rates

6.1 Measuring Mutation Rates

6.2 Data on Mutation Rates

6.3 Guilty Older Men

6.4 Rates of Phenotypically Drastic Mutations

6.5 Mild Mutations and Mutational Pressures

Further Reading

7 Natural Selection

7.1 Vulnerable Adaptations and Their Evolutionary Origin

7.2 Two Basic Characteristics of Selection

7.3 Measuring Natural Selection

7.4 Selection at a Polymorphic Locus

7.5 Selection on a Quantitative Trait

Further Reading

8 Functioning DNA and Junk DNA

8.1 Selective Neutrality and Random Drift

8.2 Effective Population Size

8.3 Junk DNA Provides the Simplest Evidence for Evolution

8.4 Finding Functioning Genome Segments

8.5 The Genomic Rate of Deleterious Mutations

Further Reading

9 It Takes All the Running You Can Do

9.1 Middle Class Neighborhood for Drosophila

9.2 Selection Against Deleterious Alleles

9.3 Mutation–Selection Equilibrium

9.4 Inbreeding Depression

9.5 Dangerous Slightly Deleterious Alleles

Further Reading

10 Phenomenon of Imperfection

10.1 Phenotypic and Genotypic Imperfection

10.2 Five Evolutionary Causes of Imperfection

10.3 Weakly Perfect Human Genotypes and Phenotypes

10.4 Native, Novel, and Optimal Environments

10.5 Factors, Exacerbating Mutation Imperfection

Further Reading

11 Our Imperfect Fitness

11.1 Properties of an Allele

11.2 Human Derived Alleles

11.3 Average Imperfection of a Genotype

11.4 Variation Among Genotypes

11.5 Selection in Modern Human Populations

Further Reading

12 Our Imperfect Wellness

12.1 Qualitative Characteristics of Wellness

12.2 Quantitative Traits

12.3 Contributions of Heredity and Environment

12.4 Wellness‐impairing Alleles

12.5 Genetic Architecture of Wellness

Further Reading

13 Mutational Pressure on Our Species

13.1 Mutational Pressure on Diseases

13.2 Mutational Pressure on Quantitative Traits

13.3 Possible Increase of the Mutational Pressure

13.4 De Novo Mutations and Human Wellness

13.5 Optimistic and Pessimistic Scenarios

Further Reading

14 Ethical Issues

14.1 Lessons from History

14.2 Modern Practices

14.3 Humanist Ethics and the Main Concern

14.4 The Main Concern and Ethical Dilemmas

14.5 Role of Scientists

Further Reading

15 What to Do?

15.1 Conditionally Beneficial or Unconditionally Deleterious?

15.2 Mutationless Utopia: What Could It Be?

15.3 Mutationless Utopia: Is It Ever Going to Happen?

15.4 What Can I Do Without Germline Genotype Modification?

15.5 Prognosis

Further Reading

Index

End User License Agreement

List of Tables

Chapter 01

Table 1.1 Key properties of some representative genomes.

Table 1.2 Genetic code.

Chapter 02

Table 2.1 Genetic diversity within populations of several species.

Table 2.2 A sample of derived alleles affecting the factor IX‐encoding gene.

Chapter 03

Table 3.1 Data on some complex traits in human populations.

Table 3.2 The case of no inheritance of genetic influences (see text).

Chapter 05

Table 5.1 Spontaneous damages to a human genotype.

Table 5.2 Mechanisms of spontaneous mutation.

Chapter 06

Table 6.1 Data on rates of small‐scale mutations in some multicellular eukaryotes.

Table 6.2 Mutational pressures on some quantitative traits.

Chapter 08

Table 8.1 Estimated effective sizes of populations of some species.

Chapter 09

Table 9.1 Mutational pressures measured in the MCN experiment.

Chapter 10

Table 10.1 Five evolutionary causes of imperfection.

Chapter 11

Table 11.1 Correlations between the five properties of derived alleles.

Table 11.2 Numbers of human polymorphisms, observed in a sample of 10 000 diploid genotypes, classified by the kinds of DNA sequence alterations.

Table 11.3 Numbers of human polymorphisms, observed in a sample of 10 000 diploid genotypes, classified by their effects on molecular function.

Table 11.4 Average per diploid genotype numbers of pre‐existing derived alleles and de novo mutations, classified by the kinds of the DNA sequence alteration.

Table 11.5 Average per diploid genotype numbers of pre‐existing derived alleles and de novo mutations, classified by their effects on molecular function.

Table 11.6 Selection against human deleterious pre‐existing alleles and de novo mutations.

Chapter 12

Table 12.1 Birth frequencies of Mendelian diseases.

Table 12.2 Frequencies of birth anomalies in Europe (after Dolk, Loane & Garne, Advances in Experimental Medicine and Biology, vol. 686, p. 349, 2010).

Table 12.3 Frequencies of diseases and impairments that are manifested before the age of 30.

Table 12.4 Self‐reported disability in Canada in 2012 (data from http://www.statcan.gc.ca/pub/89‐654‐x/89‐654‐x2013002‐eng.htm).

Table 12.5 Self‐reported health in Canada in 2012 (data from http://www4.hrsdc.gc.ca/.3ndic.1t.4r@‐eng.jsp?iid=10#M_3).

Chapter 13

Table 13.1 Mutational pressures on complex diseases.

List of Illustrations

Chapter 01

Figure 1.1 Phenomenon of heredity. (a) Mother and Father Beagle dogs stare with dismay at their kittens (joke). (b) A tiny mammalian sperm approaching an egg.

Figure 1.2 The fundamentals of DNA. (a) A single DNA strand with bases A, T, G, and C attached to it. P stands for a phosphate, and dR for a deoxyribose. (b) A scheme of double‐stranded DNA, consisting of two complementary strands (“>” shows the 5' > 3' direction of a strand, and “:” shows weak bonds connecting the two strands to each other). (c) A double‐stranded DNA shown in its actual shape of a right‐handed helix. (d) A double‐stranded DNA shown as a sequence of nucleotides in one of its strands, or a text written in a four‐letter alphabet. For a geneticist, this simplistic representation is usually enough.

Figure 1.3 Semiconservative DNA replication. New strands are shown light.

Figure 1.4 Alignment of short pieces of 12 genotypes (shown in lower case) of the fungus Schizophyllum commune, the most genetically variable species known (see Chapter 2). Gaps of length 2 were inserted into genotypes 2 and 9, a gap of length 1 into genotype 6, instead of lost segments of the corresponding lengths. Gaps of length 1 were inserted into all genotypes, except genotype 3, because genotype 3 gained nucleotide g at the corresponding location. Consensus of these genotypes (a piece of S. commune genome) is shown in upper case at the bottom. Deviations of genotypes from the genome are shown in grey.

Figure 1.5 Assembly of a long DNA sequence from relatively short overlapping reads. Reads are in lower case, the assembled sequence is in upper case, and the numbers show how many times each nucleotide of the sequence is covered by reads.

Figure 1.6 A normal human genotype. Each individual carries two copies of chromosomes 1–22, called autosomes, one inherited from the mother and the other from the father. A woman also carries two X chromosomes, one inherited from each parent, and a man carries one X chromosome, inherited from the mother, and one Y chromosome, inherited from the father. X and Y chromosomes are called sex chromosomes. Thus, a human genotype normally consists of 46 linear chromosomes, as well as of many small circular mitochondrial chromosomes, which are inherited only from the mother. Here, linear chromosomes are shown after DNA replication and before cell division, so that each of them consists of two identical copies, still joined to each other at one point. Bands on linear chromosomes roughly correspond to how they look under a light microscope. Mitochondrial chromosomes (mt) are too small to be seen in this way, and are shown disproportionally large.

Figure 1.7 A single‐stranded RNA molecule which forms a hairpin secondary structure, due to local self‐complementarity.

Figure 1.8 Scheme of DNA transcription.

Figure 1.9 Two adjacent coding genes, arranged head‐to‐head, use different DNA strands as templates for transcription.

Figure 1.10 Scheme of RNA splicing.

Figure 1.11 (a) An amino acid sequence of a very short protein, human hormone vasopressin, which, among other things, is involved in regulation of blood pressure. (b) A model of the spatial structure of vasopressin. (c) A scheme of the spatial structure of DNA polymerase ε, which plays a key role in DNA replication (see Chapter 5). https://en.wikipedia.org/wiki/Vasopressin#/media/File:Arginine_vasopressin3d.png. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0094835.

Figure 1.12 (a) Translation of a mature mRNA. (b) Relationship between a protein‐coding gene and the encoded protein.

Figure 1.13 Coordination between DNA replication and cell division.

Figure 1.14 (a) Hypothetical one‐step meiosis with crossing‐over. Maternal and paternal genotypes, each consisting of two chromosomes, each made of double‐stranded DNA, are in black and grey, respectively. (b) Segments of maternal and paternal genotypes (left), and segments of two genotypes obtained as a result of their reciprocal recombination (right) (differences between the two genotypes are in upper case).

Figure 1.15 Sexual life cycle. Haploid and diploid phases, separated from each other by meiosis (M) and fertilization (F), are shown by n and 2n. Reduction of a phase is shown by a dotted line.

Figure 1.16 Levels of organization.

Figure 1.17 A human fertilized egg – this is what is transmitted from parents to offspring, not a naked DNA – and a human, or identical twins, have to develop from this entity.

Figure 1.18 (a) Folding of a protein and (b) stages of human development (weeks).

Figure 1.19 The scheme of a small portion of a network of interactions which regulate transcription of protein‐coding genes in yeast, Saccharomyces cerevisiae. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0106479.

Figure 1.20 Norm of reaction: different phenotypes can develop on the basis of the same genotype.

Figure 1.21 Genotype > phenotype map, the correspondence between genotypes and phenotypes, under a particular environment.

Chapter 02

Figure 2.1 (a) The expected outcome of a simple cross between parents with two contrasting phenotypes, if inheritance were continuous. (b) The actual outcome of such a cross in the simplest case.

Figure 2.2 The outcome of a simple cross, at the level of phenotypes, with the explanation that Mendel proposed, at the level of genotypes.

Figure 2.3 The same as Figure 2.2, if allele A is dominant and allele a is recessive.

Figure 2.4 A cross between parents which differ from each other by two Mendelian traits, controlled by two unlinked loci. All 16 possible combinations of maternal and paternal gametes, each occurring with probability 1/16, are shown in F2, although they represent only nine different genotypes, as long as we do not care if an allele came from the mother or from the father.
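The counting in the caption of Figure 2.4 can be checked with a minimal sketch. It assumes that both F1 parents are AaBb double heterozygotes and that the two loci are unlinked, so each of the four gamete types is equally likely; the helper name `genotype` and the string encoding of alleles are illustrative choices, not taken from the book.

```python
from collections import Counter
from itertools import product

# Four equally likely gamete types produced by an AaBb parent (unlinked loci).
gametes = [a + b for a, b in product("Aa", "Bb")]  # AB, Ab, aB, ab

def genotype(maternal, paternal):
    """Unordered diploid genotype: the parental origin of each allele is ignored."""
    locus_a = "".join(sorted(maternal[0] + paternal[0]))
    locus_b = "".join(sorted(maternal[1] + paternal[1]))
    return locus_a + locus_b

# All 16 maternal x paternal gamete combinations, each with probability 1/16.
counts = Counter(genotype(m, p) for m, p in product(gametes, repeat=2))

for name, n in sorted(counts.items()):
    print(f"{name}: {n}/16")
print("distinct genotypes:", len(counts))
```

Sorting the two alleles at each locus is what collapses, for example, "A from the mother, a from the father" and the reverse into the single genotype Aa.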

Figure 2.5 (a) Analyzing (or test) cross between AaBb (obtained from a AABB × aabb cross, so that alleles A and B, and a and b, were transmitted in the same gamete) and aabb parents produces offspring of four genotypes. The top numbers are proportions of offspring if loci A and B are unlinked, and the bottom numbers are proportions of offspring if loci A and B are linked, with recombination coefficient 0.2. (b) A diagram showing what happens if loci A and B are on the same chromosome, in which case recombinant offspring appear only if these loci recombined, due to an odd number (1, 3, 5, …) of cross‐overs between them.

Figure 2.6 Development of monozygotic twins. Twins in humans can be identical (monozygotic) or fraternal (dizygotic). Monozygotic twins are genetically identical to each other and appear when, at some point during the first 2 weeks after conception, a developing embryo splits into two (or, very rarely, up to five) embryos. Dizygotic twins develop from independently formed zygotes and are not genetically different from ordinary siblings. Comparison of monozygotic and dizygotic twins helps to study the effects of genetic variation on phenotypes (see Chapter 3).

Figure 2.7 Determining which allele is ancestral and which is derived by using an outgroup. At a human polymorphic locus, the allele that is also present in the chimpanzee genome (dark grey) is likely to be ancestral and the one unique to humans (light grey) is probably derived.

Figure 2.8 Small‐scale polymorphic loci (in upper case), revealed by comparison of two aligned genotypes (the one carrying derived alleles is on top): an A > G single‐nucleotide substitution, a deletion of five nucleotides, an insertion of one nucleotide, and a complex polymorphism (CG > A).

Figure 2.9 Large‐scale polymorphic loci (ancestral allele on the left and derived allele on the right): a deletion (top), an insertion (duplication) (middle), and an inversion (bottom). The boundaries of a locus are marked by “|”, and its 5' and 3' ends are in upper case and in italics, respectively.

Figure 2.10 The relationship between protein‐coding genes and polymorphic loci. Small‐scale and large‐scale polymorphic loci are shown on separate lines below the genome.

Figure 2.11 A hypothetical short protein‐coding gene (“|” symbols within the sequence separate its functional parts), which encodes a protein consisting of eight amino acids, and derived alleles at polymorphic loci within it. From top to bottom (deviations are in upper case): a start‐abolishing substitution, a nonsense substitution, a missense substitution, a synonymous substitution, an inframe deletion of three nucleotides, a frameshift deletion of two nucleotides (new stop is in italics), a stop‐abolishing substitution (new stop is in italics), and a splicing‐disturbing insertion.

Figure 2.12 Three inversions within a genome segment, shown in Figure 2.10. The top inversion is unlikely to affect the function of either of the flanking genes, the second one destroys the function of the right gene, and the bottom one destroys the function of both the left and the right gene. Function of the central gene may be unaffected in all three cases.

Figure 2.13 Recessive lethals that are present in natural populations of fish usually cause death before or soon after hatching, and produce drastic morphological changes. The embryo on the right is normal, and the other two carry different homozygous recessive lethal alleles (McCune et al., Science, vol. 296, p. 2398, 2002).

Figure 2.14 (a) In classical genetics, we say that allele A that causes purple flower color is dominant because Aa has the same phenotype as AA. (b) In population and medical genetics, allele A that causes achondroplasia is called dominant because Aa causes a phenotype different from wild‐type aa. However, homozygotes AA are lethal, and, thus, are also phenotypically different from Aa, so that neither of these alleles is dominant in the classical, Mendelian sense.

Figure 2.15 Pedigrees showing offspring of different kinds of related parents: (a) one individual acting as both the mother and the father, (b) siblings, and (c) first cousins.

Figure 2.16 The simplest way of exposing recessive drastic alleles, by multiple matings within a sibship.

Figure 2.17 Approximately Mendelian inheritance, in the course of production of gametes, of alleles at two loci within the same gene, which together form a compound heterozygote.

Figure 2.18 Two loss‐of‐function alleles of different genes (top) and of the same gene (bottom), where they form a compound heterozygote. If both alleles are recessive, in the first case neither of them affects the phenotype, because each allele is complemented by a wild‐type allele of the same gene. By contrast, in the second case the compound heterozygote affects the phenotype.

Figure 2.19 (a) Autosomal recessive Mendelian disease alkaptonuria, caused by drastic alleles of the HGD gene, leads, among other things, to pigmented sclera, due to accumulation of homogentisic acid. (b) Autosomal dominant Mendelian disease aniridia, whose most salient manifestation is the absence of the iris, is caused by drastic alleles of the PAX6 gene.

Chapter 03

Figure 3.1 Mendelian (a) and complex (b) inheritance of a quantitative trait which can accept states 0, 1, …, 10. One parent had phenotype 0, and the other parent had phenotype 10. Frequencies of the trait states in F1 (top) and F2 (bottom) are shown.

Figure 3.2 A hypothetical pedigree illustrating incomplete penetrance of an autosomal allele. Individuals who carry this allele in a heterozygous state are marked by circles, and the phenotype which appears only in some of them is shown by gray.

Figure 3.3 Gaussian distribution. σ stands for standard deviation. The exact properties of this fundamentally important distribution are not important for us. It is enough to remember that Gaussian distribution is symmetric about its mean and that the probability of a particular deviation from the mean declines very rapidly with the size of the deviation.

Figure 3.4 An example of actual, inherited, and transmittable values of a trait of an individual (vertical bars). Curves show distributions of actual values of the trait in multiple individuals having this genotype and in multiple offspring of individuals having this genotype.

Figure 3.5 Distributions of two‐, four‐, and seven‐state traits. (a) Autistic spectrum disorder: absent (0) or present (1) (at frequencies 0.99 and 0.01, respectively). (b) Intellectual disability: absent (0), mild (1), moderate (2), or severe (3) (0.98, 0.012, 0.005, 0.003). (c) Lifetime number of children of a woman in the USA (0.178, 0.184, 0.362, 0.178, 0.061, 0.025, 0.012).

Figure 3.6 Distributions of discrete‐valued quantitative traits which appear when height of adult males in Canada is measured with low (a) and intermediate (b) precision which approximate the essentially Gaussian distribution (c) of this continuous quantitative trait. Data from https://tall.life/height‐percentile‐calculator‐age‐country/.

Figure 3.7 Distributions of actual, inherited, and transmittable value of a two‐state (a) and quantitative (b) complex traits within a population.

Figure 3.8 Evolvability (a) and heritability (b) of transmittable values of a trait.

Figure 3.9 Data on evolvabilities and heritabilities (Hansen et al., Evolutionary Biology, vol. 38, p. 258, 2011). Each point represents evolvability and heritability of transmittable values of a quantitative trait. Heritabilities above 1 are statistical artifacts.

Figure 3.10 The line which shows linear dependence of the life‐time fecundity of Drosophila individuals on that of their mothers (Long et al., Journal of Evolutionary Biology, vol. 22, p. 637, 2009). The slope of this line, the parent–offspring regression coefficient, provides the simplest way to estimate V_t.

Figure 3.11 A positive relationship between the proportion of shared alleles and similarity indicates that inheritance contributes to phenotypic variation and can be used to estimate V_t.

Figure 3.12 Direct detection of CTLs by a GWAS.

Figure 3.13 Within‐locus epistasis in the case of one diploid locus (a) and between‐loci epistasis in the case of two haploid loci (b). In the case of monotonic epistasis between derived alleles, shown in lower case, two such alleles, either within the same diploid locus, or at different haploid loci, reinforce effects of each other (shown in black). In the case of sign epistasis, derived alleles present alone (genotype Aa in the case of one diploid locus, and genotypes Ab and aB in the case of two haploid loci) and derived alleles present together (aa and ab) affect the trait in the opposite directions (shown in gray).

Figure 3.14 Dependence of the inherited value of a two‐state (shown in gray) and quantitative (shown in black) complex trait on the trait potential. In the first case, the same increase of a trait potential due to a particular allele replacement substantially increases the probability of state 1 only when the trait potential is already high. In the second case, as long as epistasis is absent (as shown in the figure), the relative effect of a particular allele replacement is always the same.

Chapter 04

Figure 4.1 In the course of mutation (process), a mutation (event) results in a mutation (new allele). The altered nucleotide is in upper case. Single‐nucleotide substitution is by far the most common kind of mutation.

Figure 4.2 Depending on the genetic composition of the population, the same mutation, shown on top, produces a new polymorphic locus (left column), a new allele at an already‐present polymorphic locus (center column), and an already‐present allele (right column).

Figure 4.3 Reciprocal recombination between two genotypes that differ from each other at just one locus (in upper case) does not produce anything new.

Figure 4.4 A single‐nucleotide substitution, a deletion, an insertion, and a complex event. Three nucleotides are shown on each side of each mutation.

Figure 4.5 A complex mutation (after Chen et al., Human Mutation, vol. 30, p. 1435, 2009). Barred sequences denote deleted nucleotides whereas nucleotide substitutions are indicated below the original sequence.

Figure 4.6 Sequencing of the genotype of a boy with severe congenital abnormalities, together with genotypes of his parents, revealed a paternal mutation that involved a complex rearrangement of 16 segments from chromosomes 1, 4, and 10. The figure shows a simplified scheme of three derived, rearranged chromosomes, present in the boy’s genotype. Multiple short deletions and insertions at junctions between the rearranged segments are not shown (after Kloosterman et al., Human Molecular Genetics, vol. 20, p. 1916, 2011).

Figure 4.7 Singleton (a) and cluster (b) germline mutation. Cells carrying the mutation are shown in grey. The moments when mutations occurred are shown by horizontal bars.

Figure 4.8 God does play dice.

Figure 4.9 Structural formulae of two of the most powerful, and widely used, chemical mutagens: (a) ethyl methanesulfonate (EMS); and (b) N‐ethyl‐N‐nitrosourea (ENU).

Figure 4.10 Genotype editing by means of CRISPR/Cas9 (see text).

Chapter 05

Figure 5.1 A replisome, moving rightward, at the replication fork.

Figure 5.2 A DNA topoisomerase type I allows the two DNA strands to pass through each other, by temporarily cutting one strand, moving the other strand through the cut, and sealing the cut. A DNA topoisomerase type II (not shown) cuts both strands.

Figure 5.3 A crude scheme of processes that take place in the course of DNA replication. Two strands of the original DNA are unwound by topoisomerase and detached from each other by helicase; composite RNA‐DNA primers are synthesized by primase, which forms a complex with DNA polymerase alpha, only to be soon removed and replaced with “permanent” DNA; the leading strand is replicated continuously by DNA polymerase epsilon; and the lagging strand is replicated discontinuously by DNA polymerase delta, with the resulting Okazaki fragments being joined by ligase (Lujan et al., Trends in Cell Biology, vol. 26, p. 640, 2016).

Figure 5.4 Watson–Crick A:T and G:C nucleotide pairs have very similar overall shapes and, thus, fit well into the double‐stranded DNA molecule, regardless of its sequence.

Figure 5.5 Proofreading 3'‐to‐5' exonuclease activity of a DNA polymerase. A nucleotide which was incorrectly attached to the growing 3'‐end of the nascent DNA strand is usually removed, and another nucleotide is attached.

Figure 5.6 Scheme of mismatch repair. Proteins that perform MMR bind the DNA molecule around a mismatch and replace a segment of one of its two strands.

Figure 5.7 Correct (a) and mutagenic (b) repair of a mismatch (site in upper case) between DNA strands.

Figure 5.8 Contributions of steps 2, 3, and 4 in the overall fidelity of DNA replication (after Fijalkowska, FEMS Microbiology Reviews, vol. 36, p. 1105, 2012).

Figure 5.9 After replication of a DNA molecule which contains a mismatch (site in upper case) between an old nucleotide a and a new nucleotide g, two DNA molecules emerge, an unchanged and a mutant. Thus, fertilization of an egg by a mismatch‐carrying sperm can produce a half‐body mosaic (see cover).

Figure 5.10 Three general kinds of hard DNA damages.

Figure 5.11 The major mechanism of interaction between two DNA molecules (maternal and paternal, shown in black and gray) which results in their recombination. (i) A DSB is introduced into one molecule, after which its strands that face this DSB by their 5'‐ends are shortened, producing 3′ single‐stranded ends. (ii) One single‐stranded DNA end invades into the other, intact DNA molecule, by pairing with its (approximately) complementary strand. Double‐stranded DNA that consists of two strands of different origin is called a heteroduplex, and the resulting structure is called a D‐loop. (iii) Gaps and strand discontinuities are repaired by DNA synthesis and ligation. As a result, the two original DNA molecules become connected by two structures, called Holliday junctions (HJs). (iv) Each HJ can be resolved by strand cleavage and ligation in orientation 1 or 2. When the two junctions are resolved in opposite orientations (e.g., left HJ in orientation 1 and right HJ in orientation 2, as shown), crossing‐over occurs. In contrast, when both junctions are resolved in the same orientation (e.g., in orientation 1, as shown), crossing‐over does not occur (after Cromie et al., Cell, vol. 127, p. 1167, 2006).

Figure 5.12 Pipeline of the base excision repair which can involve synthesis of a long (left) and a short (right) patch. Newly synthesized DNA is shown in gray.

Figure 5.13 Possible sources of information for gene conversion in a diploid cell: another allele of the same locus, another locus of the same genotype, situated either on the same or on a different chromosome, and another locus of the other genotype.

Figure 5.14 Origin of a long deletion (and a circular piece of DNA, to be lost in the course of the next cell division) and of a long deletion and a long insertion due to ectopic recombination between similar sequence segments within one genotype (left) and in two haploid genotypes of a diploid cell (right). Similar segments are shown by thick line segments.

Chapter 06

Figure 6.1 Detecting de novo mutations within a parent–offspring duo (asexual haploids) and a mother–father–offspring trio (sexual diploids). Sites that are polymorphic within parents are in italics, and mutations are in upper case.

Figure 6.2 If hemophilia in an affected boy is due to a de novo mutation, his brothers would be mostly unaffected (a). In contrast, if hemophilia is inherited from the mother, who is, therefore, a heterozygous carrier, a brother of an affected boy has a 50% chance of also having the disease (b). X and Y chromosomes are shown by long and short rods, and disease‐causing alleles are shown as open circles.

Figure 6.3 Detecting recessive visible de novo mutations. A number of wild‐type siblings, produced by wild‐type parents (white) are crossed with individuals homozygous for recessive alleles of one or, better, several genes (black). A singleton mutation that occurred in one of these siblings is manifested as one mutant offspring (gray) from one cross (left). A cluster of mutations is manifested as several mutant offspring from one cross (center). If no mutations occurred in the genes under study, which is by far the most common case, all the offspring in the cross are wild‐type (right). A pre‐existing mutant allele (inherited by some wild‐type siblings from one of their parents) would lead to ~50% mutant offspring from ~50% of crosses (not shown).

Figure 6.4 Detecting recessive lethals on the X chromosome (see text). Balancer chromosome is shown by a loop.

Chapter 07

Figure 7.1 Lateral view of the human brain.

Figure 7.2 Fitness landscapes over the spaces of (a) genotypes or (b) phenotypes.

Figure 7.3 Individuals from cave populations of fish Astyanax mexicanus possess only vestigial, functionless eyes or are totally eyeless (a), in contrast to individuals from surface populations (b).

Figure 7.4 Distribution of fitness q(w) can be characterized by evolvability of relative fitness E = 0.25/4 = 1/16 and by imperfection I = (5 − 2)/2 = 1.5.

Figure 7.5 Selection which leads to the minimal I, under a particular E. If, as it is shown in the figure, 40% of genotypes do not reproduce at all and 60% have the same, non‐zero fitness, I = E = 2/3. Under any other mode of selection that produces the same value of E, I is higher.

Figure 7.6 A likely relationship between distributions of actual, inherited, and transmittable values of fitness of individuals, p(w), q(w), and r(w). Densities of inherited and transmittable values of fitness, which cannot be observed directly, are shown by broken lines.

Figure 7.7 Life‐time reproductive success of females and of males in a wild population of collared flycatcher (a) and of females in a semi‐wild population of rhesus macaques (b) (after Merilä and Sheldon, American Naturalist, vol. 155, p. 301, 2000 and Blomquist, Evolutionary Ecology, vol. 24, p. 657, 2010).

Figure 7.8 The frequency of an initially rare beneficial allele a, produced by a mutation, increases and eventually reaches 100% in the course of a positive selection‐driven allele replacement.

Figure 7.9 Phenomenon of hitch‐hiking. Neutral or mildly deleterious alleles located close to the site of the de novo mutation that produced a strongly beneficial allele will attain higher frequencies or, in the case of very tight linkage, even fixation. While the beneficial allele goes toward fixation, recombination chips away the edges of the hitch‐hiked genotype segment. From left to right: genetic compositions of the population at the moment when the beneficial allele appeared, half‐way through the process of positive selection‐driven allele replacement, and soon after fixation of the beneficial allele.

Figure 7.10 Generic (a) and fitness potential‐mediated (b) fitness landscapes.

Figure 7.11 Directional, stabilizing, and disruptive fitness landscapes and modes of selection acting on a quantitative trait. A directional landscape is monotonic, a stabilizing landscape has a maximum close to the population mean, and a disruptive landscape has a minimum close to the population mean.

Figure 7.12 Action of truncation directional selection on a quantitative trait. The black and dark gray curves are densities of the quantitative trait after and before selection, respectively. The selection differential Δ is the difference between their mean values.

Figure 7.13 Action of selection (gray fitness landscape) on a quantitative trait (black distribution) produces a distribution of fitness.

Figure 7.14 (a) Real stabilizing selection (dark gray stabilizing fitness landscape) – those with intermediate values of a quantitative trait (black distribution) have higher fitness. (b) Apparent stabilizing selection – those with large numbers of deleterious alleles (black distribution) have lower fitnesses (dark gray directional fitness landscape) and, on average, more deviating values of the quantitative trait (light gray dependency), which results in the same dependence of expected fitness on the value of the quantitative trait as in (a) which, however, does not reflect real causation.

Chapter 08

Figure 8.1 Three trajectories of allele A frequencies in a population of 100 individuals with no mutation or selection. The initial frequency of allele A is 0.5, and it changes due to random drift alone.

Figure 8.2 Alignment of sequences of the same protein‐coding gene in human (top) and murine (bottom) genomes. Correspondence between highly similar genome segments is shown by dark gray shading, and between moderately similar segments by light gray shading.

Chapter 09

Figure 9.1 Three generations of life of an MA line of selfing worms and outcrossing flies.

Figure 9.2 A middle class neighborhood population.

Figure 9.3 Various modes of selection against deleterious alleles. The light gray curve shows nonepistatic, exponential selection. Dark gray curves show different modes of selection with synergistic epistasis between deleterious alleles. The black curve shows a mode of selection with diminishing returns epistasis.

Figure 9.4 Proportions of offspring sired by inbred (black) and outbred (gray) male mice under a semi‐wild environment where males were competing with each other (a) and in the laboratory where they were held separately, under benign conditions (b). In both cases, there were equal numbers of inbred and outbred males. Inbred males were produced by brother–sister mating (F = 0.25) (after Meagher et al., Proceedings of the National Academy of Sciences of the USA, vol. 97, p. 3324, 2000).

Figure 9.5 Li–Akashi effect. Mutations impairing the function of a molecule will be fixed, due to too weak opposition from negative selection, until enough of them accumulate to make the coefficient of selection against the next mutation ~1/N_e.

Chapter 10

Figure 10.1 Frequency‐dependent fitness landscape. When apple (orange) eaters are rare, they are more fit. At equilibrium, a population that depends on 2 apple trees and 1 orange tree consists of 2/3 of apple‐eaters and 1/3 of orange‐eaters, both having the same fitness.

Figure 10.2 Fundamental, lag, segregation, mutation, and drift imperfection of the present genotype(s), shown by open circle(s). (a) The perfect genotype corresponds to a different fitness peak, which can be reached only after crossing a fitness valley. (b) The perfect (under the current environment) genotype corresponds to the nearest fitness peak, but the current genotypes did not have time to reach it. (c) The perfect genotype is heterozygous, and Mendelian segregation keeps producing imperfect genotypes. (d) Rare substantially deleterious alleles make all the present genotypes imperfect. (e) Fixed slightly deleterious alleles make all the present genotypes imperfect.

Figure 10.3 Production of a deleterious alleles‐free gamete by a diploid genotype carrying many heterozygous deleterious alleles is extremely improbable, as it requires multiple, independent recombination events (thick lines) to occur only at exactly prescribed locations. Deleterious alleles are shown by filled circles and normal alleles by open circles.

Figure 10.4 Four human environments – (a) Pleistocene, (b) Preindustrial, (c) Industrialized, and (d) optimal.

Figure 10.5 The phenomenon of G × E: on a good food (a), a genetically inferior fly (bottom) has only a slightly inferior phenotype, in comparison with a genetically superior fly. However, on a poor food (b) the difference between phenotypes produced by the same two genotypes is much larger.

Chapter 11

Figure 11.1 Estimating the age of a derived allele (shown by open circles) at a locus. (Left) In different genotypes, copies of a young derived allele are still embedded into a long segment of the genotype (bounded by bars) in which the mutation that produced this allele first appeared. (Right) Copies of an old derived allele are embedded only into a short segment of the original genotype. Derived alleles at other loci are shown by black circles.

Figure 11.2 Spectrum of DAFs in genotypes of 10 000 humans (after 1000 Genomes Project Consortium, Nature, vol. 526, p. 68, 2015).

Figure 11.3 Average ages of derived alleles from different functional classes in Europeans (shown in black) and Africans (shown in gray) (after Fu et al., Nature, vol. 493, p. 216, 2013).

Figure 11.4 Distribution of the per genotype number of missense alleles predicted to be deleterious in human populations (after Fu et al., American Journal of Human Genetics, vol. 95, p. 421, 2014).

Figure 11.5 Strengths of associations between alleles at different loci decline with the distance between them more rapidly in African (shown in gray) than in non‐African (shown in black) populations (after 1000 Genomes Project Consortium, Nature, vol. 526, p. 68, 2015).

Chapter 12

Figure 12.1 Estimated probability of a person born in Japan in 2014 to live until a particular age (data from http://www.ipss.go.jp/p‐toukei/JMD/00/index‐en.html).

Figure 12.2 Under a stabilizing wellness landscape, imperfection appears only because values of the quantitative trait in some individuals deviate from the average. Thus, a wider distribution of the trait leads to a larger imperfection. In contrast, under (effectively) a directional wellness landscape, imperfection appears because all individuals deviate from the perfect value of the trait in the same direction and, thus, does not depend much on variation of the trait. As a result, the two distributions of the trait shown in the figure produce essentially the same imperfection.

Figure 12.3 Distribution of actual values of a two‐state trait with mean P is always concentrated in two points, 0 (“well”) with probability 1 − P, and 1 (“unwell”) with probability P (gray bars). Assuming that P is small, the variance of this distribution is ~P. I_gen is less than P only if the minimal inherited value of the trait is above zero. This minimal value is the highest, under the condition that the variance of the distribution of inherited values of the trait is Ph_i^2, when the distribution is concentrated at just two points: P(1 − h_i^2) with probability 1 − Ph_i^2, and 1 with probability Ph_i^2 (black bars).

Figure 12.4 Current understanding of the joint distribution of frequencies and effects of human disease‐causing alleles: alleles with large effects tend to be rarer (after Stahl et al., Nature Genetics, vol. 44, p. 483, 2012).

Chapter 13

Figure 13.1 The effect of epistasis on the mutational pressure on a quantitative trait. If a quantitative trait depends on genotype contamination exponentially (no epistasis), going from its distribution 1 to 2 leads to the same relative decline of the average value of the trait as going from distribution 2 to 3. In contrast, if deleterious alleles affect the trait synergistically, its relative decline is larger in the second case. Of course, an exponential dependence is linear on the logarithmic scale.

Figure 13.2 The effect of epistasis on the mutational pressure on a two‐state trait. Going from the distribution of genotype contamination 2 to 3 leads to a larger decline of the probability of the desired state of a two‐state trait than going from 1 to 2.

Chapter 14

Figure 14.1 Replacement of the nuclear genotype of the donor egg with that from the mother’s egg.

Chapter 15

Figure 15.1 Within‐gene incompatibility. Two loci (nucleotide sites) within exons of the same gene each harbor alleles G and C. These sites interact within the RNA molecule encoded by this gene, so that Watson–Crick (see Chapter 1) pairs G:C and C:G confer higher fitness and non‐Watson–Crick pairs G:G and C:C confer lower fitness (crossed out). Thus, G (C) at one locus is superior when the other locus carries C (G).

Figure 15.2 Sign epistasis is an unavoidable result of real stabilizing selection. An allele which increases the value of the trait (black arrows) moves the genotype towards the fitness optimum if other alleles produce a small value of the trait and away from it in the opposite case. Of course, the opposite pattern is present if an allele decreases the value of the trait (gray arrows).

Figure 15.3 When the average genotype contamination is high enough, genotypes with low contamination are absent from the population (black distribution). Thus, we cannot study properties of the corresponding portions of the fitness (or any other quality) landscape (dashed lines), and have to make guesses. Without epistasis, the quality of the mutation‐free genotype is the highest, in comparison with all other genotypes, and is very high; under narrowing, synergistic (but not sign!) epistasis it is the highest, but not very high; and under sign epistasis it is not even the highest.

Figure 15.4 Estimating weak imperfection of the population with some distribution of genotype contamination (solid line) by studying a population with incremented genotype contaminations (broken line). Unfortunately, the magnitude of weak imperfection implied by a particular decline of fitness due to increased genotype contamination depends strongly on epistasis (1, no epistasis; 2, moderate epistasis; and 3, strong epistasis).

Crumbling Genome

The Impact of Deleterious Mutations on Humans

Alexey S. Kondrashov

This edition first published 2017

© 2017 John Wiley & Sons, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Alexey S. Kondrashov to be identified as the author of this work has been asserted in accordance with law.

Registered Office: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office: 111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty

The publisher and the authors make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of fitness for a particular purpose. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for every situation. In view of on‐going research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. The fact that an organization or website is referred to in this work as a citation and/or potential source of further information does not mean that the author or the publisher endorses the information the organization or website may provide or recommendations it may make. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. No warranty may be created or extended by any promotional statements for this work. Neither the publisher nor the author shall be liable for any damages arising herefrom.

Library of Congress Cataloguing‐in‐Publication data applied for

ISBN: 9781118952115

Cover Image: A mosaic mutant fly strives to reach perfection. Image by Glafira Kolbasova.

Cover Design: Wiley

Preface

Before he died, or rather passed into Mahaparinirvana, Buddha gave final instructions to his disciples. Canonical text in Pali records his last sentence as: “vayadhammā saṅkhārā appamādena sampādetha”, which can be translated as “Everything consisting of parts crumbles – but you will succeed through diligence”.

Buddha’s words provide a perfect summary of this book. Our subject is how complex and vulnerable our genes and bodies are, how they crumble under the relentless pressure of spontaneous mutation, and what can be done to mitigate the impact of deleterious mutations on individuals and populations.

Indeed, we are amazingly complex. The human genome consists of DNA molecules with a combined length of over 3 billion nucleotides and contains over 20 000 protein‐coding genes, each encoding either one protein or a set of similar proteins. These genes and proteins interact with each other, forming staggering networks of mutual influences. The human body, which develops from a single cell on the basis of instructions provided by the genome, comprises many billions of interconnected cells. Verily, we consist of parts.

Thus, it is hardly surprising that we are very vulnerable to all kinds of insults. A mutation that replaces a single nucleotide within a protein‐coding gene may lead to synthesis of a dysfunctional protein, and this can be inconsistent with life or cause a tragic disease. Chaos is common, but order is rare, and chances of a typo improving Hamlet are slim. Thus, deleterious mutations vastly outnumber beneficial mutations.

Far from being a moot point, vulnerability of our genome is constantly exposed by mutation. Despite all of the elaborate mechanisms that a cell employs to handle its DNA with the utmost care, a newborn human carries about 100 new (de novo) mutations that originated in the germline of their parents, about 10% of which are substantially deleterious. Several percent of even young people suffer from overt diseases that are caused, exclusively or primarily, by pre‐existing and de novo mutations in their genomes, including both a wide variety of genetically simple Mendelian diseases and even more diverse complex diseases such as birth anomalies, diabetes, and schizophrenia. Milder, but still substantial, negative effects of mutations are harder to detect, but are even more pervasive. As of now, we possess no means of reducing the rate at which mutations appear spontaneously.

In the course of the long evolution of life, natural selection eliminated deleterious mutations with cruel efficiency, establishing what is known as the mutation–selection equilibrium. Still, even genotypes that exist at this equilibrium are far from being perfect. In the average human genotype, over 100 genes are dysfunctional or altogether missing, and over 1000 genes are substantially impaired.

Moreover, the Industrial Revolution radically improved the environments of many human populations. Although it is not yet clear how strong selection against deleterious mutations is in developed countries, it can hardly remain as comprehensive as it used to be when a newborn was expected to live only 25 years. If so, populations in these countries cannot be at the mutation–selection equilibrium and are accumulating deleterious mutations.

After the basic laws of heredity were discovered in the early 20th century, a number of geneticists, including Hermann J. Muller and James F. Crow, developed a conceptual framework for studying negative effects of deleterious mutations on individual humans and on human populations. Throughout this book, I will refer to these effects as the “Main Concern”. It is clear that both pre‐existing mutations, which appeared some generations ago and have not yet been purged from the population, and de novo mutations, which are happening now, do a lot of harm. However, until very recently, estimates of the scale of this problem were rather fuzzy.

The ongoing avalanche of genome‐level data is rapidly changing this situation. Now we know much more than even 10 years ago about the current contribution of deleterious mutations to human suffering and imperfection, and about potential consequences of accumulation of such mutations in the future. Substantial uncertainties remain, but the post‐genomic biology will likely eliminate many of them in the next 20–30 years.

My wish is to enable a motivated lay person to thoroughly comprehend the Main Concern. With this goal in mind, I provide about equal coverage to concepts, data, and unsolved problems in this book. This book is intended for three types of readers: those who are simply fascinated by the life sciences and want to learn more about genetics and evolution; those who are worried about challenges that deleterious mutations pose to humanity; and those who are personally touched by a disease with a large genetic component and need a background for understanding it.

Only a general scientific literacy is expected from a reader, and the necessary biological basics are presented in the first three chapters. After this, three chapters deal with different facets of mutation, and the next four with different facets of selection against mutations. The last five chapters directly address the Main Concern. I tried to follow the sage advice commonly attributed to Einstein: “Everything should be made as simple as possible, but not simpler”. You will succeed through diligence.

I would like to thank Raquel Assis, Charlie Baer, Laura Eidietis, Eugene Koonin, Michael Lynch, Sergey Mirkin, Vladimir Seplyarsky, Mashaal Sohail, and Shamil Sunyaev, whose comments on the manuscript have spared me a number of embarrassments. I am most thankful to Glafira Kolbasova for producing almost all illustrations.

Alexey S. Kondrashov

University of Michigan

1 Genotypes and Phenotypes

I grew my own body… Nobody else did it for me. So if I grew it, I must have known how to grow it.

Jerome D. Salinger, Teddy, 1953.

Hereditary information of a cell is stored in double‐stranded DNA molecules which, for a geneticist, are texts written in a four‐letter alphabet {A, T, G, C}. Recent technological advances have made it possible to efficiently read long DNA texts. Before dividing, a cell produces two copies of its hereditary information by DNA replication. From each parent, an individual human receives a haploid genotype, a 3.2‐billion‐letter‐long DNA text which is subdivided into 23 molecules, chromosomes. Also, tiny mitochondrial chromosomes are received from the mother only. The consensus of genotypes within a species is called the genome of the species. Some segments of the genome are transcribed into RNAs, and some segments of these RNAs are translated into proteins. Parental genotypes recombine in the course of meiosis, a form of cell division which halves the amount of DNA in a cell. In the course of sexual reproduction, a multicellular diploid organism develops from a single cell, zygote. Together with the environment, the genotype of the zygote determines the phenotype of the organism.

1.1 DNA is a Text

Inheritance is a salient phenomenon. Dogs beget dogs, and Beagles beget Beagles. Because an instruction on how to make a unique individual Beagle dog must be a lengthy one, there must be something, transmitted from parents to offspring, which carries an enormous amount of information. In the course of sexual reproduction, a multicellular organism develops from a single cell, called a zygote, which is produced by fertilization, fusion of two gametes, an egg and a sperm. A mammalian egg is just ~0.2 mm in diameter, and the head of a mammalian sperm is still much smaller, being only ~0.005 mm across. And yet even this tiny sperm is large enough for the job: mammals are about as similar to their fathers as to their mothers (Figure 1.1).

Figure 1.1 Phenomenon of heredity. (a) Mother and Father Beagle dogs stare with dismay at their kittens (joke). (b) A tiny mammalian sperm approaching an egg.

Thus, hereditary information must be packed at the molecular scale. Which molecule carries instructions from parents to offspring? In 1944, DNA emerged as a likely answer when it was shown that bacteria may acquire traits of other bacteria after ingesting their DNA. Then, in 1953, came the discovery, by Raymond Gosling, Rosalind Franklin, Maurice Wilkins, James Watson, and Francis Crick, of three properties of DNA which make it exquisitely suitable for storing and propagating information (Figure 1.2).

Figure 1.2 The fundamentals of DNA. (a) A single DNA strand with bases A, T, G, and C attached to it. P stands for a phosphate, and dR for a deoxyribose. (b) A scheme of double‐stranded DNA, consisting of two complementary strands (“>” shows the 5' > 3' direction of a strand, and “:” shows weak bonds connecting the two strands to each other). (c) A double‐stranded DNA shown in its actual shape of a right‐handed helix. (d) A double‐stranded DNA shown as a sequence of nucleotides in one of its strands, or a text written in a four‐letter alphabet. For a geneticist, this simplistic representation is usually enough.

First, DNA is a linear polymer. A strand of DNA consists of a regular alternation of phosphates (residues of phosphoric acid, H3PO4) and sugars called deoxyribose, to each of which one of the four possible bases (known as A, T, G, and C, for adenine, thymine, guanine, and cytosine) is attached at a side. A and G are bigger molecules, called purines, and T and C are smaller molecules, called pyrimidines. The only thing we care about in deoxyribose is that its two ends, by which it is attached to two adjacent phosphates, are different, and one is called 5' and the other 3' (both “ends” of a phosphate are identical). All deoxyriboses within a DNA strand have the same orientation, which provides direction to the whole strand. Traditionally, a single DNA strand is shown in the 5' > 3' direction. Together, a phosphate, a deoxyribose, and a base are called a nucleotide, and nucleotides are denoted by the same letters as bases. Chemical details are not too important for a geneticist: I do not remember the exact structures of A, T, G, or C.

Secondly, there is no firm limit on the length of a DNA strand, which can consist of hundreds of millions of nucleotides. Moreover, there are also no restrictions on the order of nucleotides within a strand. Thus, if we think of a sequence of nucleotides as a text, written in a four‐letter alphabet {A, T, G, C}, any message (such as ACCATCATCGATGACT…) is chemically possible. A four‐letter alphabet is perfectly sufficient to store information: computers are content with a two‐letter alphabet {0, 1}, and a 26‐letter English alphabet is just a luxury.
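To make the comparison with a computer's two‐letter alphabet concrete, here is a minimal Python sketch. It assumes an arbitrary assignment of two bits to each nucleotide (the particular codes, and the names CODE, dna_to_bits, and bits_to_dna, are illustrative only and not any standard) and shows that a DNA text can be converted to bits and back without loss. The last lines estimate how much storage a haploid human genotype of 3.2 billion letters would need under this simple encoding.

```python
# Assumption: this 2-bit assignment is arbitrary; any one-to-one mapping works.
CODE = {"A": "00", "C": "01", "G": "10", "T": "11"}
DECODE = {bits: base for base, bits in CODE.items()}


def dna_to_bits(seq: str) -> str:
    """Rewrite a DNA text in the two-letter alphabet {0, 1}."""
    return "".join(CODE[base] for base in seq)


def bits_to_dna(bits: str) -> str:
    """Recover the DNA text by reading the bit string two bits at a time."""
    return "".join(DECODE[bits[i:i + 2]] for i in range(0, len(bits), 2))


message = "ACCATCATCGATGACT"
encoded = dna_to_bits(message)

# The round trip is lossless: nothing is gained or lost by changing alphabets.
assert bits_to_dna(encoded) == message
print(len(message), "nucleotides are stored in", len(encoded), "bits")

# A haploid human genotype, at 2 bits per nucleotide and 8 bits per byte.
HAPLOID_LENGTH = 3_200_000_000
megabytes = HAPLOID_LENGTH * 2 / 8 / 1_000_000
print("haploid human genotype: about", megabytes, "megabytes")
```

Each nucleotide carries at most two bits, so a longer alphabet only shortens the text; it does not make new messages possible.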

Thirdly, two DNA strands can nicely fit each other, if (i) they are arranged side‐by‐side in the opposite (antiparallel) directions, 5' end to 3' end and vice versa, and (ii) their nucleotide sequences are complementary to each other. Complementarity means that A’s in one strand are opposed by T’s in the other (and vice versa), and G’s are opposed by C’s (and vice versa). Overall shapes of A:T and G:C nucleotide pairs (referred to as complementary, or Watson–Crick pairs) are very similar, leading to a nearly perfect fit of strands with complementary sequences (called complementary strands). Within a Watson–Crick nucleotide pair, A (large) and T (small), or G (large) and C (small), are attached to each other by weak chemical bonds, known as hydrogen bonds (two of them in A:T pairs, and three in G:C pairs). DNA in all cells exists not as single‐stranded but as double‐stranded molecules, consisting of two complementary strands. Each strand in such a molecule carries all the information, because the other strand can always be reconstructed, using the simple rules of complementarity. Two complementary strands coil around each other, forming the famous DNA double helix. This helix is conventionally called right‐handed: when you look at either end of it, and think of a bright dot moving away from you along a strand, this dot rotates clockwise, and not counterclockwise, as would be the case for a left‐handed helix.
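The rules of complementarity are simple enough to write down as a short Python sketch. The function names below are illustrative assumptions, not part of any library. Because the two strands are antiparallel, the complementary strand, when written in the usual 5' > 3' direction, is obtained by reading the original strand backwards and swapping A with T and G with C.

```python
# Watson-Crick pairing rules: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand: str) -> str:
    """Return the complementary strand, written 5' > 3'.

    The input is also read 5' > 3'; the strands are antiparallel,
    so the input is traversed in reverse.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def hydrogen_bonds(strand: str) -> int:
    """Count hydrogen bonds holding a strand to its complement:
    two per A:T pair and three per G:C pair."""
    return sum(2 if base in "AT" else 3 for base in strand)


strand = "ACCATCATCGATGACT"
partner = complementary_strand(strand)
print("5'", strand, "3'")
print("5'", partner, "3'")

# Either strand carries all the information: complementing twice
# reconstructs the original text exactly.
assert complementary_strand(partner) == strand
print("hydrogen bonds in this duplex:", hydrogen_bonds(strand))
```

Applying the function twice returns the original text, which is the sense in which each strand alone carries all the information.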

Hereditary information which parents pass to their offspring consists primarily of DNA sequences. Because all living beings must propagate their kind, on pain of extinction, DNA molecules must also be somehow propagated, to be supplied to each offspring. Complementarity of DNA strands suggests a mechanism for this: a new DNA strand can be synthesized as a complementary one to the pre‐existing template strand. In 1958 Matthew Meselson and Franklin Stahl showed that this is what, indeed, happens in living cells: after a cell division, each of the two daughter cells contains DNA molecules in which one strand is old and the other strand is new. Accordingly, propagation of DNA is called semiconservative replication (Figure 1.3). One can say that DNA is double‐stranded because, after its replication, the old template strand and the new complementary strand stay together until the next replication, when both of them will act as templates.

Figure 1.3 Semiconservative DNA replication. New strands are shown light.
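Semiconservative replication can be followed over several cell divisions with a small bookkeeping sketch in Python. It is a simplified model under stated assumptions: a double‐stranded molecule is represented only as a pair of labels, each label being the generation in which that strand was synthesized, and the names replicate and grow are hypothetical. Sequences themselves are ignored here.

```python
def replicate(molecule, generation):
    """Split a double-stranded molecule and pair each old strand
    with a strand newly synthesized in the given generation."""
    strand_a, strand_b = molecule
    return [(strand_a, generation), (strand_b, generation)]


def grow(generations):
    """Start from one molecule whose two strands both date from
    generation 0, and replicate every molecule once per generation."""
    molecules = [(0, 0)]
    for generation in range(1, generations + 1):
        molecules = [
            daughter
            for molecule in molecules
            for daughter in replicate(molecule, generation)
        ]
    return molecules


for n in range(1, 4):
    population = grow(n)
    with_original = sum(0 in molecule for molecule in population)
    print(
        "after", n, "replication(s):", len(population), "molecules,",
        with_original, "of them still carrying an original strand",
    )
```

In this model the number of molecules doubles each generation, while exactly two molecules always retain one of the two original strands, and every molecule contains one strand made in the most recent replication.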

DNA is the most striking manifestation of the fundamental unity of life on Earth. All cells contain double‐stranded DNA molecules built according to the same chemical rules, with only occasional minor secondary modifications. Moreover, some segments of DNA from all kinds of living beings, including bacteria, protists, plants, and animals, have rather similar sequences. This is the case, for example, for many DNA segments which carry instructions on how to make ribosomes, protein‐synthesizing molecular machines.

Of course, there are also a lot of differences between DNA sequences from individuals that belong to different species. These differences are responsible for one individual being a dog and another one being the dog’s owner. Even different individuals of the same species possess slightly different DNA sequences. These differences are responsible for one dog being a Beagle and another a Collie, as well as for hereditary variation within a breed. The DNA sequence present in every cell of an individual is called its genotype (there could be minor differences even between DNA sequences from different cells of an individual, see Chapters 4 and 5).