Population Genetics E-Book

Matthew B. Hamilton

Publisher: John Wiley & Sons
Category: Science and New Technologies
Language: English

Description

This book aims to make population genetics approachable, logical and easily understood. To achieve these goals, the book's design emphasizes well-explained introductions to key principles and predictions. These are augmented with case studies as well as illustrations along with introductions to classical hypotheses and debates. Pedagogical features in the text include:

* Interact boxes that guide readers step-by-step through computer simulations using public domain software.
* Math boxes that fully explain mathematical derivations.
* Methods boxes that give insight into the use of actual genetic data.
* Numerous Problem boxes that are integrated into the text to reinforce concepts as they are encountered.
* Dedicated website at www.wiley.com/go/hamiltongenetics

This text also offers a highly accessible introduction to coalescent theory, the major conceptual advance in population genetics of the last two decades.

Details

Page count: 1188

Year of publication: 2011

Sample

Contents

Preface

CHAPTER 1: Thinking like a population geneticist

1.1 Expectations

1.2 Theory and assumptions

1.3 Simulation

CHAPTER 2: Genotype frequencies

2.1 Mendel’s model of particulate genetics

2.2 Hardy–Weinberg expected genotype frequencies

2.3 Why does Hardy–Weinberg work?

2.4 Applications of Hardy–Weinberg

2.5 The fixation index and heterozygosity

2.6 Mating among relatives

2.7 Gametic disequilibrium

CHAPTER 3: Genetic drift and effective population size

3.1 The effects of sampling lead to genetic drift

3.2 Models of genetic drift

3.3 Effective population size

3.4 Parallelism between drift and inbreeding

3.5 Estimating effective population size

3.6 Gene genealogies and the coalescent model

3.7 Effective population size in the coalescent model

CHAPTER 4: Population structure and gene flow

4.1 Genetic populations

4.2 Direct measures of gene flow

4.3 Fixation indices to measure the pattern of population subdivision

4.4 Population subdivision and the Wahlund effect

4.5 Models of population structure

4.6 The impact of population structure on genealogical branching

CHAPTER 5: Mutation

5.1 The source of all genetic variation

5.2 The fate of a new mutation

5.3 Mutation models

5.4 The influence of mutation on allele frequency and autozygosity

5.5 The coalescent model with mutation

CHAPTER 6: Fundamentals of natural selection

6.1 Natural selection

6.2 General results for natural selection on a diallelic locus

6.3 How natural selection works to increase average fitness

CHAPTER 7: Further models of natural selection

7.1 Viability selection with three alleles or two loci

7.2 Alternative models of natural selection

7.3 Combining natural selection with other processes

7.4 Natural selection in genealogical branching models

CHAPTER 8: Molecular evolution

8.1 The neutral theory

8.2 Measures of divergence and polymorphism

8.3 DNA sequence divergence and the molecular clock

8.4 Testing the molecular clock hypothesis and explanations for rate variation in molecular evolution

8.5 Testing the neutral theory null model of DNA sequence evolution

8.6 Molecular evolution of loci that are not independent

CHAPTER 9: Quantitative trait variation and evolution

9.1 Quantitative traits

Components of phenotypic variation

9.2 Evolutionary change in quantitative traits

9.3 Quantitative trait loci (QTL)

CHAPTER 10: The Mendelian basis of quantitative trait variation

10.1 The connection between particulate inheritance and quantitative trait variation

10.2 Mean genotypic value in a population

10.3 Average effect of an allele

10.4 Breeding value and dominance deviation

10.5 Components of total genotypic variance

10.6 Genotypic resemblance between relatives

CHAPTER 11: Historical and synthetic topics

11.1 Historical controversies in population genetics

11.2 Shifting balance theory

Appendix

Statistical uncertainty

Covariance and correlation

Further reading

List Of Figures

References

Index

For my wife and best friend, I-Ling.

Blackwell Publishing was acquired by John Wiley & Sons in February 2007. Blackwell’s publishing program has been merged with Wiley’s global Scientific, Technical and Medical business to form Wiley-Blackwell.

Registered office: John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial offices: 9600 Garsington Road, Oxford, OX4 2DQ, UK The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK 111 River Street, Hoboken, NJ 07030-5774, USA

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com/wiley-blackwell

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloguing-in-Publication Data

Hamilton, Matthew B.

Population genetics / Matthew B. Hamilton.

p. ; cm.

Includes bibliographical references and index.

ISBN 978-1-4051-3277-0 (hbk. : alk. paper) 1. Population genetics. I. Title.

[DNLM: 1. Genetics, Population. QU 450 H219p 2009]

QH455.H35 2009

576.5′8—dc22

Preface and acknowledgments

This book was born of two desires, one simple and the other more ambitious, both of which were motivated by my experiences learning and teaching population genetics. My first desire was to create a more up-to-date survey text of the field of population genetics. Several of the widely employed and respected standard texts were originally conceived in the mid-1980s. Although these texts have been revised over time, aspects of their organization and content are inherently dated. At the same time, I set out with the more ambitious goal of offering an alternative body of materials to enrich the manner in which population genetics is taught and learned.

Much of population genetics during the twentieth century was hypothesis-rich but data-poor. The theory developed between about 1920 and 1980 spawned manifold predictions about basic evolutionary processes. However, most of these predictions could not be tested, or could be tested with only very limited power, for lack of appropriate or sufficient genetic data. In the last two decades, population genetics has become a field that is no longer data-limited. With the collection and open sharing of massive amounts of genomic data and the technical ability to collect large amounts of genetic information rapidly from almost any organism, population genetics has now become data-rich but relatively hypothesis-poor. Why? Because mainstream population genetics has struggled to develop and employ alternative testable hypotheses in addition to those offered by traditional null models. Innovation in developing context-specific and testable alternative population genetic models is as much a requirement for hypothesis testing as empirical data. Such innovation, of course, first requires a sound understanding of the traditional and well-accepted models and hypotheses.

It is often repeated that the major advance in population genetics over the last decade or two is the availability of huge amounts of genetic data generated by the ability to collect genetic data and to sequence entire genomes. It is certainly true that advances in molecular biology, DNA sequencing technology, and bioinformatics have provided a wealth of genetic data, some of it in the form of divergence or polymorphism data that is grist for the mill of population genetics hypothesis testing.

An equally fundamental advance in population genetics has been the emergence of new models and expectations to match the genetic data that are now readily available. Coalescent or genealogical branching theory is primary among these conceptual advances. During the past two decades, coalescent theory has moved from an esoteric problem pursued for purely mathematical reasons to an important conceptual tool used to make testable predictions. Nonetheless, teaching of coalescent theory in undergraduate and graduate population genetics courses has not kept pace with the growing influence of coalescent theory in hypothesis testing. A major impediment has been the lack of teaching materials that make coalescent theory truly accessible to students learning population genetics for the first time. One of my goals was to construct a text that met this need with a systematic and thorough introduction to the concepts of coalescent theory and its applications in hypothesis testing. The chapter sections on coalescent theory are presented along with traditional theory of identity by descent on the same topics to help students see the commonality of the two approaches. However, the coalescence chapter sections could easily be assigned as a group.

Another of my primary goals for this text was to offer material to engage the various learning styles possessed by individuals. Learning conceptual population genetics in the language of mathematics is often relatively easy for abstract and mathematical learners. However, my aim was to cater to a wide range of learning styles by building a range of features into the text. A key pedagogical feature in the book is formed by boxes set off from the main text that are designed to engage the various learning styles. These include Interact boxes that guide students through structured exercises in computer simulation utilizing software in the public domain. The simulation problems are active rather than reflective and should appeal to trial-and-error or visual learners. Additionally, simulations uniquely demonstrate the outcome of stochastic processes where the evaluation of numerous replicates is required before a pattern or generalization can be seen. Because understanding the biological impact of stochastic processes is a major hurdle for many students, the Interact boxes should improve learning and retention. Problem boxes placed in the text rather than at the end of chapters are designed to provide practice and to reinforce concepts as they are encountered, appealing to experiential learners. Math boxes that fully explain mathematical derivations appeal to mathematical and logical learners and also provide a great deal of insight for all readers into the many mathematical approximations employed in population genetics. Finally, the large number of two-color illustrations in the text were designed to appeal to and help cultivate visual learning.

The teaching strategy employed in this text to cope with mathematics proficiency deserves further explanation. The undergraduate biology curriculum employed at most US institutions has students take calculus in their first year and usually does not require the application of much if any mathematics within biology courses. This leads to students who have difficulty in or who avoid courses in biological disciplines that require explicit mathematical reasoning. Population genetics is built on basic mathematics and, in my experience, students obtain a much richer and more nuanced understanding of the subject with some comprehension of these mathematical foundations. Therefore, I have attempted to deconstruct and offer step-by-step explanations of the basic mathematics (mostly probability) required for a sound understanding of population genetics. For those readers with more interest or facility in mathematics, such as graduate students, the book also presents more difficult and detailed mathematical derivations in boxes that are separated from the main narrative of the text as well as chapter sections containing more mathematically rigorous content. These sections can be assigned or skipped depending on the level and scope of a course using this text. The Appendix further provides some very basic background in statistical concepts that are useful throughout the book and especially in Chapter 3 on genetic drift and Chapters 9 and 10 on quantitative genetics. This approach will hopefully provide students with the tools to develop their abilities in basic mathematics through application, and at the same time learn population genetics more fully.

Members of my laboratory and the students who have taken my population genetics course provided a range of feedback on chapter drafts, figures, and effective means to explain the concepts herein. This feedback was absolutely invaluable and helped me shape the text into a more useful and usable resource for students. James Crow graciously reviewed each chapter and offered many insightful comments on points both nuanced and technical. Rachel Adams, Genevieve Croft, and Paulo Nuin provided many useful comments on each of the chapters as I wrote them. A.W.F. Edwards reviewed the material on the fundamental theorem in Chapter 6 and also provided the photograph of R.A. Fisher. Sivan Rottenstreich and Judy Miller patiently helped me with numerous mathematical points and derivations, including material included in the Math boxes. John Braverman supplied me with insights and thought provoking discussions that contributed to this book. Ronda Rolfes and Martha Weiss also provided comments and suggestions. I also thank Paulo Nuin for his collaboration and hard work on the creation of PopGene.S2. I also thank the anonymous reviewers from Aberdeen University, Arkansas State University, Cambridge University, Michigan State University, University of North Carolina, and University of Nottingham who provided feedback on some or all of the draft chapters.

John Epifanio provided the allozyme gel picture in Chapter 2. Eric Delwart provided the original data used to draw a figure in Chapter 6. Michel Veuille shared information on Drosophila simulans DNA sequences used in an Interact box in Chapter 8. Peter Armbruster shared unpublished mosquito pupal mass data used in Chapter 9. John Dudley and Stephen Moose generously shared the Illinois Long-Term Selection experiment data used in Chapter 9. Robert J. Robbins kindly provided high-resolution scans from Sewall Wright’s chapter in an original copy of the Proceedings of the Sixth International Congress of Genetics (see www.esp.org).

I am grateful to Nancy Wilton for pushing me at the right times and for getting this project off the ground initially. Elizabeth Frank, Haze Humbert, and Karen Chambers of Wiley-Blackwell helped bring this book to fruition. I thank Nik Prowse for his expertise as a copy editor. I owe everyone at the Mathworks an enormous debt of gratitude since all of the simulations and many of the figures for this text were produced using Matlab.

Matthew B. Hamilton

September 2008

Interact boxes

A companion website is available with interactive computer simulations for each chapter at: www.wiley.com/go/hamiltongenetics

CHAPTER 1: Thinking like a population geneticist

All scientific fields possess a body of concepts that define their domain as well as a specialized vocabulary used to express these concepts precisely. Population genetics is no different and the entirety of this book is designed to introduce, explain, and demonstrate these concepts and their vocabulary. What may be unique about population genetics among the natural sciences is the way that its practitioners approach questions about the biological world. Population genetics is a dialog between predictions based on principles of Mendelian inheritance and results obtained from empirical measurement of genotype and allele frequencies that form the basis of hypothesis tests. Idealized predictions stemming from general principles form the basis of hypotheses that can be tested. At the same time, empirical patterns observed within and among populations require explanation through the comparison of various processes that might have caused a pattern. This first chapter will explore some of the ways in which population genetics approaches and defines problems that are relevant to the topics in all chapters. The chapter is also intended to give some insight into how to approach the study of population genetics.

1.1 Expectations

What do we expect to happen?
Expectations are the basis of understanding cause and effect.

In our everyday lives there are many things that we expect to occur or not to occur based on knowledge of our surroundings and past experience. For example, you probably do not expect to get hit by a meteorite walking to your next population genetics class. Why not? Meteorites do impact the surface of the Earth and on occasion strike something noticeable to people nearby. A few times in the distant past, in fact, large meteors have hit the Earth and left evidence like the Barringer Meteor crater in Arizona, USA. What influences your lack of concern? It is probably a combination of basic knowledge of the principles of physics that apply to meteors as well as your empirical observations of the frequency and location of meteor strikes. Basic physics tells us that a small meteor on a collision course with Earth is unlikely to hit the surface since most objects burn up from the friction they experience traveling through the Earth’s atmosphere. You might also reason that even if the object is big enough to pass through the atmosphere intact, and there are many fewer of these, then the Earth is a large place and just by chance the impact is unlikely to be even remotely near you. Finally, you have most probably never witnessed a large meteorite impact or even heard of one occurring during your lifetime. You have combined your knowledge of the physical world and your experience to arrive (perhaps unconsciously) at a prediction or an expectation: meteorite strikes are possible but are so infrequent that the risk of being struck on the way to class is minuscule. In this very same way, you have constructed models of many events and processes in your physical and social world and used the resulting predictions to make comparisons and decisions.

Expectation: The expected value of a random variable, especially the average; a prediction or forecast.

The study of population genetics similarly revolves around constructing and testing expectations for genetic variation in populations of individual organisms. Expectations attempt to predict things like how much genetic variation is present in a population, how genetic variation in a population changes over time, and the pattern of genetic variation that might be left behind by a given biological process that acts over time or through space. Building these expectations involves the use of first principles or the set of very basic rules and assumptions that define how natural systems work at their lowest, most basic levels. A first principle in physics is the force of gravity. In population genetics, first principles are the very basic mechanisms of Mendelian particulate inheritance and processes such as mutation, mating patterns, gene flow, and natural selection that increase, decrease, and shape genetic variation. These foundational rules and processes are used and combined in population genetics with the ultimate goal of building a comprehensive set of predictions that can be applied to any species and any genetic system.

Empirical study in population genetics also plays a central role in constructing and evaluating predictions. In population genetics as in all sciences, empirical evidence is not just from informal experiences, but is drawn from intentional observations, cleverly constructed comparisons, and experiments. Genetic patterns observed in actual populations are compared with expected patterns to test models constructed using general principles and assumptions. For example, we could construct a mathematical or computer simulation model of random genetic drift (change in allele frequency due to sampling from finite populations) based on abstract principles of sampling from a finite population and biological reproduction. We could then compare the predictions of such a model to the observed change in allele frequency through time in a laboratory population of Drosophila melanogaster (fruit flies). If the change in allele frequency in the fruit fly population matched the change in allele frequency predicted using the model of genetic drift, then we could conclude that the model effectively summarizes the biological sampling processes that take place in fruit fly populations.
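The kind of computer simulation model of genetic drift mentioned above can be sketched in a few lines. This is only an illustrative sketch, not the simulation software that accompanies the book: it assumes a constant population of N diploid individuals in which each of the 2N gene copies in the next generation is drawn independently according to the current allele frequency (binomial sampling, often called the Wright-Fisher model). The function name `simulate_drift` and its parameters are hypothetical names chosen for the example.

```python
import random

def simulate_drift(p0, n_diploid, generations, seed=None):
    """Return the allele frequency trajectory under random sampling of 2N gene copies.

    Assumption for this sketch: constant population size, non-overlapping
    generations, and no mutation, selection, or gene flow.
    """
    rng = random.Random(seed)
    copies = 2 * n_diploid
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Each gene copy in the next generation is an independent draw
        # that is an A allele with probability equal to the current frequency.
        count_a = sum(1 for _ in range(copies) if rng.random() < p)
        p = count_a / copies
        trajectory.append(p)
    return trajectory

# Five replicate populations that all start at the same allele frequency.
replicates = [simulate_drift(0.5, 20, 50, seed=s) for s in range(5)]
for trajectory in replicates:
    print(round(trajectory[-1], 3))
```

Running several replicates from the same starting frequency gives a spread of outcomes, which is the model's expectation that an observed laboratory population could then be compared against.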

It is also possible to use well-tested and accepted model expectations as a basis to hypothesize what processes caused an observed pattern in a biological population. Again to use a Drosophila population as an example, we might ask whether an observed change in allele frequency over some generations in a wild population could be explained by genetic drift. If the observed allele frequency change is within the range of the predicted change in allele frequencies based on a model of genetic drift, then we have identified a possible cause of the observed pattern. Comparing expected and observed genetic patterns in populations often requires modifications to existing models or the construction of novel models in order to develop appropriate expectations. For example, a model of genetic drift constructed for Drosophila might naturally assume that all individuals in the population are diploid (individuals possess paired sets of homologous chromosomes). If we wanted to use that same model to predict genetic drift in a population of honey bees, we would have to account for the fact that in honey bees males are haploid (individuals possess single copies of each chromosome) while females are diploid. This change in reproductive biology could be taken into account by altering the assumptions of the model of genetic drift to make the prediction appropriate for honey bee populations. Note that without some modification, a single model of genetic drift would not accurately predict allele frequencies over time in both fruit flies and honey bees since their patterns of reproduction and chromosomal inheritance are different.
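One way to ask whether an observed change falls within the range predicted by drift is to simulate many replicate populations and see where the observation lands among them. The sketch below does this under stated assumptions: an all-diploid population of constant size with binomial sampling of gene copies, and a central 95% range of simulated outcomes as the yardstick. The cutoff, the function names, and the example numbers are illustrative choices, not values from the text, and the haplodiploid case of honey bees would need different sampling assumptions than those used here.

```python
import random

def final_frequency(p0, n_diploid, generations, rng):
    """Allele frequency after the given number of generations of drift."""
    copies = 2 * n_diploid
    p = p0
    for _ in range(generations):
        p = sum(1 for _ in range(copies) if rng.random() < p) / copies
    return p

def drift_range(p0, n_diploid, generations, replicates=1000, seed=1):
    """Central 95% range of final allele frequencies across simulated replicates."""
    rng = random.Random(seed)
    finals = sorted(
        final_frequency(p0, n_diploid, generations, rng) for _ in range(replicates)
    )
    lower = finals[int(0.025 * replicates)]
    upper = finals[int(0.975 * replicates) - 1]
    return lower, upper

def consistent_with_drift(observed_final, interval):
    """True when the observed frequency lies inside the simulated range."""
    lower, upper = interval
    return lower <= observed_final <= upper

# Hypothetical example: start at 0.5, 50 diploid individuals, 10 generations.
interval = drift_range(p0=0.5, n_diploid=50, generations=10)
print(interval, consistent_with_drift(0.55, interval))
```

An observation inside the range identifies drift only as a possible cause; it does not rule out other processes that could produce a similar change.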

Parameters and parameter estimates

While developing the expectations of population genetics in this book, we will most often be working with idealized quantities. For example, allele frequency in a population is a fundamental quantity. For a genetic locus with two alleles, A and a, it is common to say that p equals the frequency of the A allele and q equals the frequency of the a allele. In mathematics, parameter is another term for an idealized quantity like an allele frequency. It is assumed that parameters have an exact value. Put another way, parameters are idealized quantities where the messy, real-life details of how to measure the quantities they represent are completely ignored.

Empirical population genetics measures quantities such as allele frequencies to give parameter estimates by sampling and then measuring the alleles and genotypes present in actual populations. All experiments, observations, and even simulations in population genetics produce parameter estimates of some sort. There is a subtle notational convention used to indicate an estimate, the hat or ˆ character above a variable. Estimates wear hats whereas parameters do not. Using allele frequency as an example, we would say $\hat{p}$ (pronounced “p hat”) equals the number of A alleles sampled divided by the total number of alleles sampled. Intuitively, we can see from the denominator in the expression for $\hat{p}$ that the allele frequency estimate will depend on the sample we gather to make the estimate.
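The calculation of the estimate (p hat) described above can be written out explicitly. The sketch below assumes the sample consists of diploid individuals scored as AA, Aa, or aa, so that each individual contributes two alleles; the function name and the genotype counts are hypothetical values chosen for illustration.

```python
def estimate_allele_frequency(n_AA, n_Aa, n_aa):
    """Estimate the frequency of allele A from sampled diploid genotype counts."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)
    if total_alleles == 0:
        raise ValueError("sample must contain at least one individual")
    # AA individuals carry two A alleles and Aa individuals carry one.
    count_A = 2 * n_AA + n_Aa
    return count_A / total_alleles

# Hypothetical sample of 40 individuals: 44 A alleles out of 80 sampled.
p_hat = estimate_allele_frequency(n_AA=12, n_Aa=20, n_aa=8)
print(p_hat)
```

A different sample of 40 individuals from the same population would generally give different genotype counts, and therefore a different estimate.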

In all populations a parameter has one true value. For the allele frequency p, knowing this true value would require examining the genotype of every individual and counting all A and a alleles to determine their frequency in the population. This task is impractical or impossible in most cases. Instead, we rely on an estimate of allele frequency, $\hat{p}$, obtained from a sample of individuals from the population. Sampling leads to some uncertainty in parameter estimates because repeating the sampling and parameter estimate process would likely lead to a somewhat different parameter estimate each time. Quantifying this uncertainty is important to determine whether repeated sampling might change a parameter estimate by just a little or change it by a lot. When dealing with parameters, we might expect that $p + q = 1$ exactly if there are only two alleles with allele frequencies p and q. However, if we are dealing with estimates we might say the two allele frequency estimates should sum to approximately one since each allele frequency is estimated with some error. The more uncertain the estimates $\hat{p}$ and $\hat{q}$, the less we should be surprised to find that their sum does not equal the expected value of one.
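The effect of repeated sampling on an estimate can be seen in a small simulation. This sketch assumes that sampled alleles are independent draws from a population with a known true allele frequency, which is only possible in a simulation, and it uses the standard binomial standard error, the square root of p(1 - p)/n, as one common way to quantify the uncertainty. The true frequency of 0.6, the sample sizes, and the function names are hypothetical choices for the example.

```python
import math
import random

def sample_p_hat(true_p, n_alleles, rng):
    """Draw one sample of alleles and return the resulting frequency estimate."""
    count_A = sum(1 for _ in range(n_alleles) if rng.random() < true_p)
    return count_A / n_alleles

def binomial_standard_error(p, n_alleles):
    """Standard error of an allele frequency estimate under independent sampling."""
    return math.sqrt(p * (1 - p) / n_alleles)

rng = random.Random(42)
true_p = 0.6
for n_alleles in (20, 200, 2000):
    # Five repeated samples of the same size from the same population.
    estimates = [sample_p_hat(true_p, n_alleles, rng) for _ in range(5)]
    standard_error = binomial_standard_error(true_p, n_alleles)
    print(n_alleles, [round(e, 3) for e in estimates], round(standard_error, 3))
```

The repeated estimates scatter around the true value, and the scatter shrinks as the number of sampled alleles grows, which is why large samples change an estimate by just a little while small samples can change it by a lot.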

Parameter: A variable or constant appearing in a mathematical expression; a value (usually unknown) used to represent a certain population characteristic; any factor that defines a system and determines or limits its performance.

Estimate: An indication of the value of an unknown quantity based on observed data; an approximation of a true score, parameter, or value; a statistical estimate of the value of a parameter.

It could be said that statistics sits at the intersection of theoretical and empirical population genetics. Parameters and parameter estimates are fundamentally different things. Estimation requires effort to understand sampling variation and quantify sources of error and bias in samples and estimates. The distinction between parameters and estimates is critical when comparing actual populations with expectations to test hypotheses. When large, random samples can be taken, estimates are likely to have minimal error. However, there are many cases where estimates have a great deal of uncertainty, which limits the ability to evaluate expectations. There are also instances where very different processes may produce very similar expected results. In such cases it may be difficult or impossible to distinguish the different potential causes of a pattern due to the approximate nature of estimates. While this book focuses mostly on parameters, it is useful to bear in mind that testing or comparing expectations requires the use of parameter estimates and statistics that quantify sampling error. The Appendix provides a review of some basic statistics that are used in the text.

Inductive and deductive reasoning

Population genetics employs both inductive and deductive reasoning in an effort to understand the biological processes operating in actual populations as well as to elucidate the general processes that cause population genetic phenomena. The inductive approach to population genetics involves assembling measures of genetic variation (parameter estimates) from various populations to build up evidence that can be used to identify the underlying processes that produced the observed patterns. This approach is logically identical to that used by Isaac Newton, who used knowledge of how objects fall to the surface of the Earth as well as knowledge of the movement of planets to arrive at the general principles of gravity. Application of inductive reasoning requires detailed familiarity with the various empirical data types in population genetics, such as DNA sequences, along with the results of studies that report observed patterns of genetic variation. From this accumulated empirical information it is then possible to draw more general conclusions about the qualities and quantities of genetic variation in populations. Model organisms like D. melanogaster and Arabidopsis thaliana play a large role in population genetic conclusions reached by inductive reasoning. Because model organisms receive a large amount of scientific effort, to completely sequence their genomes for example, a great deal of available genetic data are accumulated for these species. Based on this evidence, many firm conclusions have been made about the population genetics of particular model species. Although model organisms provide very rich sources of empirical information, the number of species is limited by definition so that any generalizations may not apply universally to all species.

Deductive reasoning: Using general principles to reach conclusions about specific instances.

Inductive reasoning: Utilizing the knowledge of specific instances or cases to arrive at general principles.

The study of population genetics can also be approached using deductive reasoning. The actions of general processes such as genetic drift, mutation, and natural selection are represented by parameters in the mathematical equations that make up population genetic models. These models can then be used to make predictions about the quantity of genetic variation and patterns of genetic variation in space and time. Such population genetic models make general predictions about things like rates of change in allele frequency, the eventual equilibrium of allele or genotype frequencies, and the net outcome of several processes operating at the same time. These predictions are very general in that they apply to any population of any species since the predictions arose from general principles in the first place. At the same time, such general predictions may not be directly applicable to a specific population because the general principles and assumptions used to make the prediction are not specific enough to match an actual population.

Historically, the field of population genetics has developed from an interplay between arguments and evidence developed using both inductive and deductive reasoning approaches. Nonetheless, most of the major ideas in population genetics can be first approached with deductive reasoning by learning and understanding the expectations that arise from the principles of Mendelian heredity. This book stresses the process of deductive reasoning to arrive at these fundamental predictions. Empirical evidence related to expectations is included to illustrate predictions and also to demonstrate hypothesis tests that result from expectations. Because the body of empirical results in population genetics is very large, readers should resist the temptation to generalize too much from the limited number of empirical studies that are presented. Detailed reviews of particular areas of population genetics, many of which are cited in the Further reading sections at the end of each chapter, are a better source for comprehensive summaries of empirical studies.

In the next chapter we will start by building expectations for the frequencies of diploid genotypes based on the foundation of particulate inheritance: that alleles are passed unaltered from parents to offspring. There is ample support for particulate inheritance both from molecular biology, which identifies DNA as the hereditary molecule, and from allele and genotype frequencies that can be observed in actual populations. The general principle of particulate inheritance has been used to formulate a wide array of expectations about allele and genotype frequencies in populations.

1.2 Theory and assumptions

What is a theory and what are assumptions?

How can theories be useful with so many assumptions?

In colloquial usage, the word theory refers to something that is known with uncertainty, or a quantity that is approximate. On a day you are running late leaving work you might say, “In theory, I am supposed to be home at 6:00 pm.” In science, theory has a very different meaning. Theory is the accumulation of expectations and observations that have withstood tests and critical scrutiny and are accepted by at least some practitioners of a scientific field. Theory is the collection of all of the expectations developed for specific cases or individual biological processes that together form a more comprehensive set of general principles. The combination of Darwin’s hypothesis of natural selection with the laws of Mendelian particulate inheritance is often called the modern synthesis of evolutionary biology since it is a comprehensive theory to explain the causes of evolutionary change. The modern synthesis can offer causal explanations for biological phenomena ranging from antibiotic resistance in bacteria to the behavior of elephants to the rate of DNA sequence change as well as make predictions to guide animal and plant breeders. In all of the modern synthesis, population genetics plays a central role.

It is common for the uninitiated to ask the question “what good is theory if it is based on so many assumptions?” A body of theory is a useful tool to articulate assumptions and generate testable predictions. Theory that generates many testable predictions about the world also offers many opportunities to falsify its predictions and assumptions. Since hypotheses cannot be proven directly, but alternative hypotheses can be disproven, the generation of plausible, testable alternative hypotheses is a requirement for scientific inquiry. Strong theories are able to make accurate predictions, offer causal explanations for diverse observations, and generate alternative hypotheses based on revised assumptions.

The words theory and assumption can seem abstract, but you should not be intimidated by them. Theories are just collections of expectations, each with a set of assumptions that place bounds on the prediction being made. If you understand what motivates an expectation, its predictions, and its assumptions, then you understand theory. Most expectations in population genetics will have at least a few, and often many, assumptions used to define and bound the situation. For example, we might assume something about the size of a population or the absence of mutation, or that all genotypes are diploid with two alleles. This is a way of limiting the prediction to appropriate circumstances and also a way of defining which quantities and conditions can vary and which are fixed. Each of these assumptions can influence the generality of an expectation. Each assumption can also be relaxed or altered to see how strongly it influences the expectation. To return to the example in the last section, if one day meteorites were falling around us with regularity we would be forced to call into question some of the basic assumptions originally used to formulate our expectation that meteorite strikes should be rare events. In this way, assumptions are useful tools to ask “what if … ?” as part of the process of developing a prediction. If our initial “what if … ?” conditions are badly off the mark, then the resulting prediction will probably also be poor.

In population genetics, as in much of science where theory and expectations are involved, empirical data and model expectations are routinely compared. Imagine observing a set of genotype frequencies in a biological population. It would then be natural to construct an idealized population using theory that approximates the biological population. This is an attempt to construct an idealized population that is equivalent to the actual population from the perspective of the processes influencing genotype frequencies. For example, a large population may behave exactly like a small, randomly mating ideal population in terms of genotype frequencies. This equivalence allows us to use expectations for ideal populations with one or a few variables specified in order to describe an actual population where there are many more, usually unknown, parameters. What we strive to do is to focus on those variables that strongly influence genotype frequencies in the actual population. In this way it is often possible to reduce the complexity of a real population and determine the key variables that strongly influence a property like genotype frequencies. The ideal population is not meant to match the actual population in every detail.

Theory A scheme or system of ideas or statements held as an explanation or account of a group of facts or phenomena; the general laws, principles, or causes of something known or observed.

Infer To draw a conclusion or make a deduction based on facts or indications; to have as a logical consequence.

From the comparison of expectation and observation, we infer that the first principles used to construct the expectation are sound if they can be used to explain patterns observed in the biological world. For example, before we had detailed knowledge of the processes involved in plant and animal reproduction, organisms were thought to be produced by so-called spontaneous generation from non-living materials. Eventually, controlled experiments revealed that the expectations of spontaneous generation were not met. The emergence of flies from pieces of rotting meat was taken as proof of spontaneous generation (a phenomenon that by itself turned out to be consistent with expectations from two incompatible theories). However, when meat was placed in screened containers and allowed to rot, no flies emerged. This latter evidence was inconsistent with the expectations of spontaneous generation and indicated that some of the assumptions behind the theory were not accurate. It was Louis Pasteur in 1859 who used heat sterilization to show conclusively that the expectations of spontaneous generation were not met.

If observations match what is expected from theory in an ideal population, we infer that the observed pattern may be caused by the processes represented in the ideal population. However, there is a major distinction between considering an actual and an idealized population equivalent and considering them identical. This is seen in cases where the observed pattern in an actual population is consistent with the expectations from several model populations built around distinct and incompatible assumptions. In such cases, it is not possible to infer the processes that cause a given pattern without additional information. Common examples in population genetics are genetic patterns that are potentially consistent with the random process of genetic drift and at the same time consistent with some form of the deterministic process of natural selection. In such cases unambiguous inference of the underlying cause of a pattern is not possible without additional empirical information or more precise expectations.

1.3 Simulation

A method of practice, trial and error learning, and exploration.

Imagine learning to play the piano without ever touching a piano or practicing the hand movements required to play. What if you were expected to play a difficult concerto after extensive exposure (perhaps a semester) to only verbal and written descriptions of how other people play? Such a teaching style would make learning to play the piano very difficult because there would be no opportunity for practice, trial and error, or exploration. You would not have the opportunity for direct experience or incremental improvement of your understanding. Unfortunately, this is to some degree how science courses are taught. You are expected to learn and remember concepts with only limited opportunity for directly observing principles in action. In fairness, this is partly due to the difficulty of carrying out some of the experiments or observations that originally led someone to discover and understand an important principle.

In the field of population genetics, computer simulations can be used to demonstrate many fundamental genetic processes effectively. In fact, computer simulations are an important research tool in population genetics. Therefore, when you conduct simulations you are both learning by direct experience and learning using the same methods that are used by researchers. Simulations allow us to view how quantities like allele frequencies change over time, observe their dynamics, and determine whether a stable end point is reached: an equilibrium. With simulations we can view dynamics (change over time) and equilibria over very long periods of time and under a vast array of conditions in an effort to reach general conclusions. Without simulations, it would be impossible for us to directly observe allele frequencies over such long periods of time and in such diverse biological situations.
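To make the idea concrete, here is a minimal sketch in Python of a simulation that follows the frequency of one allele over time. It assumes a very simple model in which each generation is formed by randomly sampling gene copies from the previous generation, which is one way to represent genetic drift. The function name `simulate_allele_frequency`, its parameters, and the sampling scheme are illustrative assumptions and are not taken from the simulation programs used in the Interact boxes.

```python
import random


def simulate_allele_frequency(num_gene_copies, start_freq, generations, seed=None):
    """Return a list of allele frequencies, one entry per generation.

    Hypothetical illustration: in each generation every gene copy is drawn
    at random, carrying the focal allele with probability equal to the
    allele frequency in the previous generation.
    """
    rng = random.Random(seed)  # a seed makes a run repeatable
    freq = start_freq
    trajectory = [freq]
    for _ in range(generations):
        # Count how many of the sampled gene copies carry the focal allele.
        count = sum(1 for _ in range(num_gene_copies) if rng.random() < freq)
        freq = count / num_gene_copies
        trajectory.append(freq)
    return trajectory


if __name__ == "__main__":
    trajectory = simulate_allele_frequency(
        num_gene_copies=40, start_freq=0.5, generations=50, seed=1
    )
    # Print every tenth generation to show the dynamics over time.
    for generation, freq in enumerate(trajectory):
        if generation % 10 == 0:
            print(f"generation {generation:3d}: frequency = {freq:.3f}")
```

In this sketch a frequency of 0 or 1 is a stable end point: once the allele is lost or is the only allele present, random sampling alone cannot change its frequency. Running the script with different seeds shows that individual runs can follow quite different paths.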

Simulations are an effective means to understand some of the fundamental predictions of population genetics. Mathematical expressions are frequently used to express dynamics and equilibria in population genetics, but the equations alone can be opaque at first. Simulations provide a means to explore the relationships among variables that are summarized in the compact language of mathematics. Many people feel that a set of mathematical equations is much more meaningful after having the chance to explore what they describe with some actual numerical values. Simulation provides the means to explore what equations predict and can make learning population genetics an easier, more rewarding experience.

Carrying out simulations has the potential to make the expectations of population genetics much more accessible and understandable. Conducting simulations is not much extra work, especially once you get into the practice of using the text and simulation software in concert. You can approach simulations as if they are games, where each one shows a visual scene that helps to solve a puzzle. In addition, simulations can help you develop a more intuitive understanding of population genetic predictions so you do not have to approach the expectations of population genetics as disembodied or unanimated “facts.”

It is important to approach simulations in a systematic and organized fashion, not as just a collection of buttons to press and text entry boxes to be filled in on a whim. It is absolutely imperative that you understand the meaning behind each variable that you can control as well as the meaning of the results you obtain. To do so successfully you will need to be aware of both specific details and larger patterns, or both individual trees and the forest that they compose. For example, in a simulation that presents results as a graph, it is important that you understand the details of what variables are represented on each axis and the range of axis values. These details are not always obvious in simulation software, requiring you to use both your intuition and knowledge of the population genetic processes being simulated.

Once you are comfortable with the details of a simulation, you will also want to keep track of the “big picture” patterns that emerge as you view simulation results. Seeing these patterns will often require that you examine the results over a range of conditions. Try approaching simulations as experiments by changing only one variable at a time until you understand its effects on the outcome. Changing several things all at once can lead to confusion and an inability to see cause-and-effect relationships, unless you have fully understood the effects of individual variables. Finally, try writing down parameter values you have tried in a simulation and sketching or tabulating results on paper as you work with a simulation. Use all of your skills as a scientist and student when conducting simulations and they will become a powerful learning tool. Eventually, you may even use scripting and programming to carry out your own simulations specifically designed to explore your own genetic hypotheses.
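The advice to change one variable at a time and tabulate the results can be sketched as a short script. The Python example below is a hypothetical illustration: it assumes a simple random-sampling model of genetic drift, varies only the number of gene copies in the population, and holds the starting allele frequency, the number of generations, and the number of replicate populations fixed. The function names `final_frequency` and `fraction_fixed_or_lost` and all default values are assumptions chosen for illustration.

```python
import random


def final_frequency(num_gene_copies, start_freq, generations, rng):
    """Simulate random sampling of gene copies and return the final frequency."""
    freq = start_freq
    for _ in range(generations):
        # Each gene copy carries the focal allele with probability freq.
        count = sum(1 for _ in range(num_gene_copies) if rng.random() < freq)
        freq = count / num_gene_copies
    return freq


def fraction_fixed_or_lost(num_gene_copies, start_freq=0.5, generations=100,
                           replicates=100, seed=0):
    """Fraction of replicate populations whose allele frequency ended at 0 or 1."""
    rng = random.Random(seed)
    ended = 0
    for _ in range(replicates):
        freq = final_frequency(num_gene_copies, start_freq, generations, rng)
        if freq in (0.0, 1.0):
            ended += 1
    return ended / replicates


if __name__ == "__main__":
    # Vary only the number of gene copies; every other setting stays fixed.
    print("gene copies | fraction fixed or lost")
    for num_gene_copies in (10, 50, 250):
        result = fraction_fixed_or_lost(num_gene_copies)
        print(f"{num_gene_copies:11d} | {result:.2f}")
```

Because only one setting differs between rows of the printed table, any difference among rows can be attributed to population size under the assumptions of this sketch. Once that effect is understood, the same loop could be rewritten to vary the starting frequency or the number of generations instead.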

Interact box 1.1 The textbook website

Throughout this book you will encounter Interact boxes. These boxes contain opportunities for you to interact directly with the material in the text using computer simulations designed to demonstrate fundamental concepts of population genetics. Each box will contain step-by-step instructions for you to follow in order to carry out a simulation. By following the instructions you will get started with the simulation. However, always feel free to use your own imagination and intuition. After following the instructions in the Interact box and understanding the point at hand, enter different values, push more buttons, and even read the documentation. You can also return to Interact boxes at a later time, perhaps after you have read and understood more of the text, to reconsider a simulation or view it in a different light. You can also use the simulations to answer questions that may occur to you, or to test hypotheses that you may have. Questions in population genetics that start off “What would happen if … ?” are often begging to be answered with simulation.

Using Interact boxes will require that you are in front of a computer with a connection to the internet. You will also need user privileges to download and install programs in some cases. Some simulation programs have versions for multiple operating systems (e.g. Windows and Macintosh) whereas others can be used on only one operating system. It might be a good idea to think of locations now where you have access to computers with various operating systems. Finally, bear in mind that the simulation programs have all been donated to the scientific community by their respective authors and were often written in the author’s spare time. Don’t be surprised if the programs have a few rough edges or even bugs: focus on the population genetics concepts and remember that someone devoted their time to help you learn.

You will begin many of the Interact boxes by connecting to this textbook’s website, whereas for others you will use a program downloaded in an earlier Interact box. The worldwide web address (URL) for each of the simulation programs will be given on the textbook website rather than in the text itself. This prevents problems if web addresses change because the textbook website can be updated while your copy of the text cannot.

Step 1 Open a web browser and enter http://www.wiley.com/go/hamiltongenetics.

Step 2 If you are working on a computer you use regularly, bookmark the text website so you can reach it easily in the future.

Step 3 Click on the link to Interact boxes.

Step 4 Verify that the page gives links for each of the Interact boxes listed by their number. You could also bookmark this page so you can reach it directly in the future.

Congratulations! You have completed the first Interact box.

Chapter 1 review

General principles and direct measurements taken in actual populations combine to form comprehensive expectations about amounts, patterns, and cause-and-effect relationships in population genetics.

The theory of population genetics is the collection of well-accepted expectations used to articulate a wide array of predictions about the biological processes that shape genetic variation.

Parameters are idealized quantities that are exact while parameter estimates wear notational “hats” to remind us that they have statistical uncertainty.

Population genetics uses both inductive reasoning to generalize from knowledge of specifics and deductive reasoning to build up predictions from general principles that can be applied to specific situations.

Population genetics is not a spectator sport! Direct participation in computer simulation provides the opportunity to see population genetic processes in action. You can learn by trial and error and test your own understanding by making predictions and then comparing them with simulation results.