The field of whole genome selection has quickly developed into the breeding methodology of the future. As efforts to map a wide variety of animal genomes have matured and full genomes are now available for many species, animal scientists and breeders are looking to apply these techniques to livestock production. Providing a comprehensive, forward-looking review of animal genomics, Genomic Selection in Animals covers genomic selection in a variety of economically important species, including cattle, swine, and poultry. The historical foundations of genomic selection are followed by chapters that review and assess current techniques. The final chapter looks toward the future and what lies ahead for the field as the application of genomic selection becomes more widespread. A concise, useful summary of the field by one of the world's leading researchers, Genomic Selection in Animals fills an important gap in the literature of animal breeding and genomics.
Page count: 401
Year of publication: 2016
Cover
Title Page
Preface: Welcome to the “Promised Land”
1 Historical Overview
Introduction
The Mendelian Theory of Genetics
The Mendelian Basis of Quantitative Variation
Detection of QTL with Morphological and Biochemical Markers
DNA-Level Markers, 1974–1994
DNA-Level Markers Since 1995: SNPs and CNV
QTL Detection Prior to Genomic Selection
MAS Prior to Genomic Selection
Summary
2 Types of Current Genetic Markers and Genotyping Methodologies
Introduction
From Biochemical Markers to DNA-Level Markers
DNA Microsatellites
Single Nucleotide Polymorphisms
Copy Number Variation
Complete Genome Sequencing
Summary
3 Advanced Animal Breeding Programs Prior to Genomic Selection
Introduction
Within a Breed Selection: Basic Principles and Equations
Traditional Selection Schemes for Dairy Cattle
Crossbreeding Schemes: Advantages and Disadvantages
Summary
4 Economic Evaluation of Genetic Breeding Programs
Introduction
National economy versus competition among breeders
Criteria for Economic Evaluation: Profit Horizon, Interest Rate, and Return on Investment
Summary
5 Least Squares, Maximum Likelihood, and Bayesian Parameter Estimation
Introduction
Least Squares Parameter Estimation
ML Estimation for a Single Parameter
ML Multiparameter Estimation
Methods to Maximize Likelihood Functions
Confidence Intervals and Hypothesis Testing for MLE
Bayesian Estimation
Parameter Estimation via the Gibbs Sampler
Summary
6 Trait-Based Genetic Evaluation
Introduction
Principles of Selection Index
The Mixed Linear Model
The Mixed Model Equations
Solving the Mixed Model Equations
Important Properties of Mixed Model Solutions
Multivariate Mixed Model Analysis
The Individual Animal Model
Yield Deviations and Daughter Yield Deviations
Analysis of DYD as the Dependent Variable
Summary
7 Maximum Likelihood and Bayesian Estimation of QTL Parameters with Random Effects Included in the Model
Introduction
Maximum Likelihood Estimation of QTL Effects with Random Effects Included in the Model, the Daughter Design
The Granddaughter Design
Determination of Prior Distributions of the QTL Parameters for the Granddaughter Design
Formula for Bayesian Estimation and Tests of Significance of a Segregating QTL in a Granddaughter Design
Summary
8 Maximum Likelihood, Restricted Maximum Likelihood, and Bayesian Estimation for Mixed Models
Introduction
Derivation of Solutions to the Mixed Model Equations by Maximum Likelihood
Estimation of the Mixed Model Variance Components
Maximum Likelihood Estimation of Variance Components
Restricted Maximum Likelihood Estimation of Variance Components
Estimation of Variance Components via the Gibbs Sampler
Summary
9 Distribution of Genetic Effects, Theory, and Results
Introduction
Modeling the Polygenic Variance
The Effective Number of QTL
The Case of the Missing Heritability
Methods for Determination of Causative Mutations for QTL in Animals and Humans
Determination of QTN in Dairy Cattle
Estimating the Number of Segregating QTL Based on Linkage Mapping Studies
Results of Genome Scans of Dairy Cattle by Granddaughter Designs
Results of Genome-Wide Association Studies in Dairy Cattle by SNP Chips
Summary
10 The Multiple Comparison Problem
Introduction
Multiple Markers and Whole Genome Scans
QTL Detection by Permutation Tests
QTL Detection Based on the False Discovery Rate
A Priori Determination of the Proportion of False Positives
Biases with Estimation of Multiple QTL
Bayesian Estimation of QTL from Whole Genome Scans: Theory
Bayes A and Bayes B Models
Bayesian Estimation of QTL from Whole Genome Scans: Simulation Results
Summary
11 Linkage Mapping of QTL
Introduction
Interval Mapping by Nonlinear Regression: The Backcross Design
Interval Mapping for Daughter and Granddaughter Designs
Computation of Confidence Intervals
Simulation Studies of CIs
Empirical Methods to Estimate CIs, Parametric and Nonparametric Bootstrap, and Jackknife Methods
Summary
12 Linkage Disequilibrium Mapping of QTL
Introduction
Estimation of Linkage Disequilibrium in Animal Populations
Linkage Disequilibrium QTL Mapping: Basic Principles
Joint Linkage and Linkage Disequilibrium Mapping
Multitrait and Multiple QTL LD Mapping
Summary
13 Marker-Assisted Selection
Introduction
Situations in Which Selection Index is Inefficient
Potential Contribution of MAS for Selection within a Breed: General Considerations
Phenotypic Selection versus MAS for Individual Selection
MAS for Sex-Limited Traits
MAS Including Marker and Phenotypic Information on Relatives
Maximum Selection Efficiency of MAS with All QTL Known, Relative to Trait-Based Selection, and the Reduction in RSE Due to Sampling Variance
Marker Information in Segregating Populations
Inclusion of Marker Information in “Animal Model” Genetic Evaluations
Predicted Genetic Gains with Genomic Estimated Breeding Values: Results of Simulation Studies
Summary
14 Genetic Evaluation Based on Dense Marker Maps
Introduction
The Basic Steps in Genomic Evaluation
Evaluation of Genomic Estimated Breeding Values
Sources of Bias in Genomic Evaluation
Marker Effects Fixed or Random?
Individual Markers versus Haplotypes
Total Markers versus Usable Markers
Deviation of Genotype Frequencies from Their Expectations
Inclusion of All Markers versus Selection of Markers with Significant Effects
The Genomic Relationship Matrix
Summary
15 Genetic Evaluation Based on Analysis of Genetic Evaluations or Daughter-Yield Evaluations
Introduction
Comparison of Single-Step and Multistep Models
Derivation and Properties of Daughter Yields and DYD
Computation of “Deregressed” Genetic Evaluations
Analysis of DYD as the Dependent Variable with All Markers Included as Random Effects
Computation of Reliabilities for Genomic Estimated Breeding Values
Bayesian Weighting of Marker Effects
Additional Bayesian Methods for Genomic Evaluation
Summary
16 Genomic Evaluation Based on Analysis of Production Records
Introduction
Single-Step Methodologies: The Basic Strategy
Computation of the Modified Relationship Matrix when only a Fraction of the Animals are Genotyped: The Problem
Criteria for Valid Genetic Relationship Matrices
Computation of the Modified Relationship Matrix when only a Fraction of the Animals are Genotyped, the Solution
Solving the Mixed Model Equations without Inverting H
Inverting the Genomic Relationship Matrix
Estimation of Reliabilities for Genomic Breeding Values Derived by Single-Step Methodologies
Single-Step Computation of Genomic Evaluations with Unequally Weighted Marker Effects
Summary
17 Validation of Methods for Genomic Estimated Breeding Values
Introduction
Criteria for Evaluation of Estimated Genetic Values
Methods Used to Validate Genomic Genetic Evaluations
Evaluation of Two-Step Methodology Based on Simulated Dairy Cattle Data
Evaluation of Multistep Methodology Based on Actual Dairy Cattle Data
Evaluation of Single-Step Methodologies Based on Actual Dairy Cattle Data
Evaluation of Single- and Multistep Methodologies Based on Actual Poultry Data
Evaluation of Single- and Multistep Methodologies Based on Actual Swine Data
Evaluation of GEBV for Plants Based on Actual Data
Summary
18 By-Products of Genomic Analysis
Introduction
The Effects of Incorrect Parentage Identification on Breeding Programs
Principles of Parentage Verification and Identification with Genetic Markers
Paternity Validation Prior to High-Density SNP Chips
Paternity Validation and Determination with SNP Chips
Validation of More Distant Relationships
Pedigree Reconstruction with High-Density Genetic Markers
Summary
19 Imputation of Missing Genotypes
Introduction
Determination of Haplotypes for Imputation
Imputation in Humans versus Imputation in Farm Animals
Algorithms Proposed for Imputation in Human and Animal Populations
Comparisons of Accuracy and Speed of Imputation Methods
Effect of Imputation on Genomic Genetic Evaluations
Summary
20 Detection and Validation of Quantitative Trait Nucleotides
Introduction
GWAS for Economic Traits in Commercial Animals
Detection of QTN: Is It Worth the Effort?
QTN Determination in Farm Animals: What Constitutes Proof?
Concordance between DNA-Level Genotypes and QTL Status
Determination of Concordance by the “APGD”
Determination of Phase for Grandsires Heterozygous for the QTL
Determination of Recessive Lethal Genes by GWAS and Effects Associated with Heterozygotes
Verification of QTN by Statistical and Biological Methods
Summary
21 Future Directions and Conclusions
Introduction
More Markers versus More Individuals with Genotypes
Computation of Genomic Evaluations for Cow and Female Calves
Improvement of Genomic Evaluation Methods
Long-Term Considerations
Weighting Evaluations of Old versus Young Bulls
Direct Genetic Manipulation in Farm Animals
Velogenetics: The Synergistic Use of MAS and Germ-Line Manipulation
Summary
References
Index
End User License Agreement
Chapter 03
Table 3.1 Expected annual genetic gains in units of the genetic standard deviation for the half-sib (HS) and progeny test (PT) designs for a trait with a heritability of 0.25.
Chapter 10
Table 10.1 Estimation of FDR for granddaughter design results.
Chapter 03
Figure 3.1 Typical half-sib test breeding program.
Figure 3.2 Typical progeny test breeding program.
Chapter 05
Figure 5.1 The backcross design.
Chapter 07
Figure 7.1 The daughter design.
Figure 7.2 The granddaughter design.
Chapter 10
Figure 10.1 The q value (solid line), FWER (dotted line), and comparison-wise type I error rate (CWER) (dashed line) for the analysis of the granddaughter design data.
Figure 10.2 The q value (solid line), FWER (dotted line), and comparison-wise type I error rate (CWER) (dashed line) for the permuted granddaughter design data.
Chapter 11
Figure 11.1 The backcross design with flanking markers.
Chapter 19
Figure 19.1 Illustration of imputation based on common haplotypes for a specific chromosomal segment. Only two markers are assumed to be genotyped on the low-density (3K) chip and 21 markers on the medium-density (50K) chip. The two haplotypes of a single individual genotyped for the low-density chip are shown in the top part of the figure, and the pair of haplotypes for five individuals genotyped for the medium-density chip are illustrated in the lower part of the figure. Only the second haplotype listed for the medium-density chip corresponds to the first haplotype of the individual genotyped for the low-density chip and is indicated with a dotted line box. Three haplotypes of the medium-density chip correspond to the second haplotype for the individual genotyped for the low-density chip, and these are indicated with solid line boxes. All three have the same haplotype. Thus by imputation we can assume that the missing genotypes of the individual genotyped for the low-density chip correspond to the second and third rows in the lower part of the figure.
Chapter 20
Figure 20.1 Manhattan plot of the a posteriori granddaughter design for net merit. Nominal P (−log10) is plotted as a function of haplotype segment. The dotted line at 4.3 corresponds to genome-wide P = 0.05.
JOEL IRA WELLER
Institute of Animal Sciences
Agricultural Research Organization
Bet Dagan, Israel
Copyright © 2016 by John Wiley & Sons, Inc. All rights reserved
Published by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada
Library of Congress Cataloging-in-Publication Data
Names: Weller, Joel Ira, author.
Title: Genomic selection in animals / Joel Ira Weller.
Description: Hoboken, New Jersey : John Wiley & Sons Inc., [2015] | Includes bibliographical references and indexes.
Identifiers: LCCN 2015042390 | ISBN 9780470960073 (cloth)
Subjects: | MESH: Animals, Domestic–genetics. | Breeding. | Genetic Markers–genetics. | Quantitative Trait Loci–genetics. | Selection, Genetic.
Classification: LCC SF105.3 | NLM SF 105 | DDC 636.08/21–dc23
LC record available at http://lccn.loc.gov/2015042390
Cover credit: Domestic animals © Jevtic/iStock/Getty Image Plus; Colorful smooth twist light lines background © VikaSuh/iStockphoto; Genes © Ingram Publishing/Getty Images
For Elisha Eliyahu, my special grandson
…And I saw a man who was standing in the gate. He looked as if he were bronze. In his hands, he had a string and a measuring stick…. The stick that the man had was 6 long cubits. But each cubit was a cubit plus the width of a hand. The man measured the wall. It was one stick high and one stick wide….
Ezekiel 40: 3–5
I have been involved in the field of genetic markers and quantitative trait loci since I began my doctorate under the direction of Prof. Morris Soller and Dr. Thomas Brody in 1977. In my doctoral thesis we grew 2000 tomato plants and used morphological and biochemical markers (isozymes). From the early 1980s, Dr. Soller was convinced that marker-assisted selection was “just around the corner.” Now I can say without any exaggeration that we have arrived in the “promised land.” Marker-assisted selection, now generally termed genomic selection, has become a reality over the last 5 years for most of the important farm animals, especially dairy cattle. However, genomic evaluation is still very much a “work in progress.” Although there is definitely sufficient material in the literature to justify a text of this nature for graduate students, I am quite sure that a similar text in 5 years will look quite different.
When writing a book of this nature, one is always confronted with the problem of what to assume is already known by the reader and what has to be explained. Generally with respect to biology, very little is required of the reader. Anyone with a B.A. or B.Sc. in biology should have no problem with any biological concepts presented. Specifically with respect to genetics, I am assuming that the reader has a basic understanding of quantitative genetics, such as could be obtained from the classic Quantitative Genetics of Falconer (1964), or Genetics and Analysis of Quantitative Traits by Lynch and Walsh (1998). With respect to mathematics, I am assuming that the reader is familiar with both differential and integral calculus and has a basic familiarity with matrix algebra. Applications of matrix algebra specific to animal breeding are explained in some detail, even though this information has become quite standard for any graduate student in applied genetics. Detailed explanation of the physics and chemistry of current technologies used to genotype large numbers of markers and whole genome sequencing is outside the scope of this book.
Finally, I want to thank those people who made this book possible. I have already mentioned my teachers Morris Soller and Thomas Brody, and I would also add the late Ram Moav and Reuven Bar-Anan. I also thank my colleagues in both Israel and the United States, especially Micha Ron, Ephraim Ezra, Ignacy Misztal, George Wiggans, Paul VanRaden, and John Cole. I also thank my editors Justin Jeffryes and Stephanie Dollan, whom I have yet to meet face-to-face, and last but not least my family, and especially my lovely wife Hedva, who has given me every support in this and in all my other endeavors.
Menachem Av, 5775
Genomic selection is based on the synthesis of statistical and molecular genetics that occurred during the last three decades. In this introductory chapter we will review the landmark breakthroughs that led to this synthesis. The first section reviews the milestones in the synthesis of Mendelian and quantitative genetics. The next section reviews the early experiments of quantitative trait locus (QTL) detection using morphological and biochemical markers, beginning with Sax’s landmark experiment with beans (Phaseolus vulgaris). The following sections describe the development of DNA-level markers, from restriction fragment length polymorphisms (RFLPs) to single nucleotide polymorphisms (SNPs) and copy number variations (CNV). The final sections describe QTL detection and marker-assisted selection (MAS) prior to genomic selection.
Modern genetics is usually considered to have started with the rediscovery of Mendel’s paper in 1900. The rediscovery of Mendel’s laws led to a rapid first synthesis of genetics, statistics, and cytology. Boveri (1902) and Sutton (1903) first proposed the “chromosomal theory of inheritance,” namely that the Mendelian factors are located on the chromosomes. Using Drosophila, Morgan (1910) demonstrated that Mendelian genes were linked and could be mapped into linear linkage groups of a number equal to the haploid number of chromosomes. Hardy (1908) and Weinberg (1908) independently derived their famous equation to describe the distribution of genotypes in a segregating population at equilibrium. That is, for a locus with two alleles with frequencies p and 1 − p, the genotype frequencies will be p², 2p(1 − p), and (1 − p)² for homozygotes for the first allele, heterozygotes, and homozygotes for the other allele, respectively.
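The Hardy–Weinberg proportions can be illustrated with a short numerical sketch. The function name and the example allele frequencies below are illustrative assumptions and are not taken from the text; the calculation itself is only the expansion of (p + (1 − p))².

```python
def hardy_weinberg(p: float) -> tuple[float, float, float]:
    """Expected genotype frequencies for a two-allele locus at equilibrium.

    p is the frequency of the first allele; the tuple holds the frequencies
    of first-allele homozygotes, heterozygotes, and other-allele homozygotes.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


if __name__ == "__main__":
    # Hypothetical allele frequencies, chosen only for illustration.
    for p in (0.1, 0.5, 0.9):
        hom1, het, hom2 = hardy_weinberg(p)
        print(f"p = {p:.1f}: {hom1:.2f}, {het:.2f}, {hom2:.2f}")
```

The three frequencies always sum to one, and the heterozygote frequency reaches its maximum of 0.5 when both alleles are equally frequent.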
In 1919 Haldane derived a formula to convert recombination frequencies into additive “map units” denoted “Morgans” or “centimorgans,” assuming a random distribution of events of recombination along the chromosome. The Haldane mapping function (Haldane, 1919) is based on the assumption of zero “interference” throughout the genome. That is, all events of recombination are statistically independent. In this case the number of events of recombination in any given chromosomal segment corresponds to a Poisson distribution. The map distance between two genes in Morgans, M, which is a function of the frequency of observed recombination between them, R, is derived as follows:
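The equation itself does not appear in this excerpt. In its standard textbook form, Haldane's mapping function is M = −½ ln(1 − 2R), with inverse R = ½(1 − e^(−2M)). The inverse follows from the Poisson assumption: recombination is observed only when an odd number of crossovers falls between the two genes. The sketch below assumes this standard form; the function names are illustrative.

```python
import math


def haldane_map_distance(r: float) -> float:
    """Map distance in Morgans for recombination frequency r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination frequency must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * r)


def haldane_recombination(m: float) -> float:
    """Recombination frequency for a map distance of m Morgans (m >= 0)."""
    if m < 0.0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * m))


if __name__ == "__main__":
    # For tightly linked genes M and R nearly coincide; for distant genes
    # R approaches its upper limit of 0.5 while M keeps growing.
    for r in (0.01, 0.10, 0.30, 0.45):
        print(f"R = {r:.2f} -> M = {haldane_map_distance(r):.4f} Morgans")
```

Unlike recombination frequencies, map distances computed this way are additive along the chromosome, which is the property that makes them usable as "map units."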
Unlike the morphological traits analyzed first by Mendel and then by Morgan, most traits of economic interest in agricultural species display continuous variation, rather than the discrete distribution associated with Mendelian genes. Despite the early synthesis between Mendelian genetics and cytogenetics, there seemed to be no apparent connection between Mendelian genetics on the one hand and quantitative variation and natural selection on the other.
Experiments by Johannsen (1903) with beans demonstrated that environmental factors are a major source of variation in quantitative traits, leading to the conclusion that the phenotype for these traits is not a reliable indicator for the genotype. Yule in 1906 first suggested that continuous variation could be explained by the cumulative action of many Mendelian genes, each with a small effect on the trait. (Many different terminologies have been employed for these genes. I will use the term “QTL” throughout.) Fisher in 1918 demonstrated that segregation of QTL in an outcrossing population would generate correlations between relatives. Payne (1918) demonstrated that the X chromosome from selected lines of Drosophila contains multiple factors, which influenced scutellar bristle number. Thus, by 1920, the basic theory necessary for detection of individual genes affecting quantitative traits was in place.
In 1923 Sax demonstrated with beans that the effect of an individual locus on a quantitative trait could be isolated through a series of crosses, resulting in randomization of the genetic background with respect to all genes not linked to the genetic markers under observation. Even though all of his markers were morphological seed markers with complete dominance, he was able to show a significant effect on seed weight associated with some of his markers.
During the next 50 years, there were relatively few successful experiments that found marker–QTL linkage in plant and animal populations, and of these even fewer were independently repeated. A major problem was the relatively small size of most experiments. In most cases in which QTL effects were not found, power was too low to find segregating QTL of a reasonable magnitude (Soller et al., 1976).
In 1961 Neimann-Sørensen and Robertson proposed a half-sib design for QTL detection in commercial dairy cattle populations. Although the actual results were disappointing, this was the first attempt to detect QTL in an existing segregating population. All previous studies were based on experimental populations produced specifically for QTL detection. This study was also groundbreaking in other aspects. It was the first study to use blood groups rather than morphological markers, and the proposed statistical analyses, a χ² (chi-squared) test based on a sum of squared normal deviates and ANOVA, were also unique. This was the first study that attempted to estimate the power to detect QTL and to consider the problem of multiple comparisons when several traits and markers were analyzed jointly.
Lewontin and Hubby showed in 1966 that electrophoresis could be used to disclose large quantities of naturally occurring enzyme polymorphisms in Drosophila. Almost all enzymes analyzed showed some polymorphism that could be detected by the speed of migration in an electric field. Studies with domestic plant and animal species found that electrophoretic polymorphisms were much less common in agricultural populations. During the 1980s there were a number of QTL detection studies in agricultural plants based on isozymes using crosses between different strains or even species in order to generate sufficient electrophoretic polymorphism (Tanksley et al., 1982; Kahler and Wehrhahn, 1986; Edwards et al., 1987; Weller et al., 1988). It was clear, though, that naturally occurring biochemical polymorphisms were insufficient for complete genome analyses in populations of interest.
The first detected DNA-level polymorphisms were RFLPs. Grodzicker et al. (1974) first showed that restriction fragment band patterns could be used to detect genetic differences in viruses. Solomon and Bodmer (1979) and Botstein et al. (1980) proposed RFLP as a general source of polymorphism that could be used for genetic mapping. Although RFLPs are diallelic, initial theoretical studies demonstrated that they might be present throughout the genome. Beckmann and Soller (1983) proposed using RFLP for detection and mapping of QTL. The first genome-wide scan for QTL using RFLP was performed on tomatoes by Paterson et al. (1988). In animal species, however, RFLP markers were homozygous in most individuals and therefore have not been as useful for QTL mapping.
A major breakthrough came at the end of the decade with the discovery of DNA microsatellites. Mullis et al. (1986) proposed the “polymerase chain reaction” (PCR) to specifically amplify any particular short DNA sequence. Using the PCR, large enough quantities of DNA could be generated so that standard analytical methods could be applied to detect polymorphisms consisting of only a single nucleotide. Since the 1960s, it has been known that the DNA of higher organisms contains extensive repetitive sequences. In 1989 three laboratories independently found that short sequences of repetitive DNA were highly polymorphic with respect to the number of repeats of the repeat unit (Litt and Luty, 1989; Tautz, 1989; Weber and May, 1989). The most common of these repeat sequences was poly(TG), which was found to be very prevalent in all higher species. These sequences were denoted “simple sequence repeats” (SSR) or “DNA microsatellites.”
Microsatellites were prevalent throughout all genomes of interest. Nearly all poly(TG) sites were polymorphic, even within commercial animal populations. These markers, unlike most morphological markers, were by definition “codominant.” That is, the heterozygote genotype could be distinguished from either homozygote. Furthermore, microsatellites were nearly always polyallelic. That is, more than two alleles were present in the population. Thus, most individuals were heterozygous. Relatively dense genetic maps based on microsatellites were generated in nearly all agricultural species (e.g., Ihara et al., 2004), and these markers were also used to detect and map segregating QTL. The weaknesses of microsatellites are twofold. First, their distribution throughout the genome is not sufficiently dense for determination of causative polymorphisms responsible for observed QTL. (The causative polymorphisms will be denoted “quantitative trait nucleotides” (QTN).) Second, due to the repeat structure of microsatellites, PCR amplification was generally not exact, and “stutter bands” with varying numbers of the repeat unit were generated. Various rules were developed to estimate the actual genotype from the observed PCR product, but the analysis could not be fully automated. A technician still had to review each individual genotype, and error rates in genotype determination were in the range of 1–5%.
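The link between polyallelism and heterozygosity noted above can be made concrete. Assuming Hardy–Weinberg proportions, the expected fraction of heterozygous individuals at a locus is one minus the sum of the squared allele frequencies. The sketch below uses this standard result; the function name and the allele frequencies are hypothetical and chosen only for illustration.

```python
def expected_heterozygosity(allele_freqs: list[float]) -> float:
    """Expected heterozygote frequency under Hardy-Weinberg proportions."""
    if any(f < 0.0 for f in allele_freqs):
        raise ValueError("allele frequencies must be non-negative")
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Homozygotes for allele i occur with frequency f_i squared.
    return 1.0 - sum(f * f for f in allele_freqs)


if __name__ == "__main__":
    # Hypothetical loci: a diallelic marker and an eight-allele microsatellite.
    diallelic = [0.5, 0.5]
    microsatellite = [0.125] * 8
    print("diallelic marker:", expected_heterozygosity(diallelic))
    print("eight-allele microsatellite:", expected_heterozygosity(microsatellite))
```

A diallelic marker can never exceed an expected heterozygosity of 0.5, whereas a marker with eight equally frequent alleles reaches 0.875, so far more individuals are informative for linkage analysis.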
Since 2000, SNPs (reviewed by Brookes, 1999) have supplanted microsatellites as the marker of choice for genetic analysis. An SNP is generally defined as a DNA base pair location at which the frequency of the most common base pair is lower than 99%. Unlike microsatellites, which usually have multiple alleles, SNPs are generally biallelic, but are much more prevalent throughout the genome, with an estimated frequency of one SNP per 300–500 base pairs. In human populations differences in the base pair sequence of any two randomly chosen individuals occur at a frequency of approximately one per 1000 bp (Brookes, 1999). Thus, SNPs can be found in genomic regions that are microsatellite poor. SNPs are apparently more stable than microsatellites, with lower frequencies of mutation. Beginning in 2005, methods were developed for automated scoring of first thousands and then hundreds of thousands of SNPs per individual. Genotyping error rates are in the range of 0.01–0.05% with “BeadChip” technology (Weller et al., 2010). A detailed description of the technologies developed for high-throughput SNP analysis is beyond the scope of the current text. For details, see Matukumalli et al. (2009).
Generally both natural and commercial populations are at linkage equilibrium for the vast majority of the genome. The exception is genomic sites that are closely linked on the same chromosome. Unlike genetic linkage within families, which extends over tens of centimorgans (cM), population-wide linkage disequilibrium (LD) extends in animals over less than 1 cM (Sargolzaei et al., 2008; Qanbari et al., 2010). Therefore, unless a segregating genetic marker is closely linked to a QTL segregating in the population with an effect on some trait of interest, no effect will be associated with the marker genotypes. Thus naturally occurring LD could not be exploited prior to the advent of high-density genome scans. To detect the effect of a single QTL in outbred populations prior to high-density genome scans, it was necessary to generate LD.
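One way to see why population-wide LD persists only between closely linked sites is the standard textbook recursion for the decay of disequilibrium, D_t = D_0 (1 − r)^t, where r is the recombination frequency between two loci and t is the number of generations. This recursion is not taken from the chapter and assumes random mating in a large population without selection; the function name and the numerical values below are illustrative assumptions.

```python
def ld_after_generations(d0: float, r: float, generations: int) -> float:
    """Disequilibrium remaining after a number of generations of random mating."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination frequency must lie in [0, 0.5]")
    if generations < 0:
        raise ValueError("number of generations must be non-negative")
    # Each generation, a fraction r of the remaining disequilibrium is lost.
    return d0 * (1.0 - r) ** generations


if __name__ == "__main__":
    d0 = 0.25  # hypothetical initial disequilibrium
    for label, r in (("tightly linked, r = 0.01", 0.01),
                     ("loosely linked, r = 0.30", 0.30),
                     ("unlinked, r = 0.50", 0.50)):
        remaining = ld_after_generations(d0, r, generations=20)
        print(f"{label}: D after 20 generations = {remaining:.6f}")
```

Under these assumptions disequilibrium between unlinked or loosely linked loci is halved or nearly so every few generations, while tightly linked pairs retain most of it, which is consistent with LD being detectable only over very short map distances.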
In an analysis of inbred lines we are confronted with the opposite problem. That is, a significant effect associated with a genetic marker may be due to many genes throughout the genome and not necessarily to genes linked to the genetic markers. In crosses between inbred lines it was necessary to devise an experimental design that isolates the effects of the chromosomal segments linked to the segregating genetic markers.
Experimental designs can be divided into designs that are appropriate for crosses between inbred lines and those designs that can be used for segregating populations. Most early analyses performed to detect QTL have been based on planned crosses, although studies on humans, large farm animals, and trees have used existing populations. For humans, most species of domestic animals, and fruit trees, it is impractical to produce inbred lines. Instead, experimental designs were based on the analysis of families within existing populations. Three basic types of analyses have been proposed: the “sib-pair” analysis for analysis of many small full-sib families, the “full-sib” design for analysis of large full-sib families, and the “half-sib” or “daughter design” analysis for large half-sib families.
Prior to genomic selection, two MAS breeding programs were initiated in dairy cattle based on microsatellites in German and French Holsteins (Bennewitz et al., 2004b; Boichard et al., 2006). Both programs computed marker-assisted genetic evaluations (MA-BLUP) based on the algorithm of Fernando and Grossman (1989).
In the German program, markers on three chromosomes were used. The evaluations were distributed to Holstein breeders who used these evaluations for selection of bull dams and preselection of sires for progeny testing. The algorithm only included equations for bulls and bull dams, and the dependent variable was the bull’s daughter yield deviation (VanRaden and Wiggans, 1991; derivation and use of daughter yield deviations will be discussed in detail in Chapter 6). Linkage equilibrium throughout the population was assumed. To close the gap between the grandsire families analyzed in the German granddaughter design and the bulls in use in 2004, 3600 bulls were genotyped in 2002. Until 2008, about 800 bulls were evaluated each year. Only bulls and bull dams were genotyped, since tissue samples were already collected for paternity testing. Thus additional costs due to MAS were low, and even a very modest genetic gain could be economically justified. This scheme was similar to the “top-down” scheme of Mackinnon and Georges (1998) in that evaluation of the sons was used to determine which grandsires were heterozygous for the QTL and their linkage phase. This information was then used to select grandsons based on which haplotype was passed from their sires. It differed from the scheme of Mackinnon and Georges (1998) in that the grandsons were preselected for progeny test based on MA-BLUP evaluations (Fernando and Grossman, 1989), which include general pedigree information in addition to genotypes.
The French MAS program included elements of both the “top-down” and “bottom-up” MAS designs. Similar to the German program, genetic evaluations including marker information were computed by a variant of MA-BLUP, and only genotyped animals and nongenotyped connecting ancestors were included in the algorithm. Genotyped females were characterized by their average performance based on precorrected records (with the appropriate weight), whereas males were characterized by twice the “yield deviations” of their nongenotyped daughters (yield deviations will also be explained in Chapter 6). Twelve chromosomal segments, ranging in length from 5 to 30 cM, were analyzed. Regions with putative QTL affecting milk production or composition were assumed to be located on bovine chromosomes 3, 6, 7, 14, 19, 20, and 26; segments affecting mastitis resistance on chromosomes 10, 15, and 21; and chromosomal segments affecting fertility on chromosomes 1, 7, and 21. Each region was found to affect one to four traits, and on average three regions with segregating QTL were found for each trait. Each region was monitored by two to four evenly spaced microsatellites, and each animal included in the MAS program was genotyped for at least 43 markers. Sires and dams of candidates for selection, all male AI ancestors, up to 60 AI uncles of candidates, and sampling daughters of bull sires and their dams were genotyped. The number of genotyped animals was 8000 in 2001 and 50,000 in 2006.
Guillaume et al. (2008) estimated by simulation the efficiency of the French program. Breeding values and new records were simulated based on the existing population structure and knowledge of the variances and allelic frequencies of the QTL under MAS. Reliabilities of genetic values of animals less than 1 year old obtained with and without marker information were compared. Mean gains of reliability ranged from 0.015 to 0.094 and from 0.038 to 0.114 in 2004 and 2006, respectively. The larger number of animals genotyped and the use of a new set of genetic markers can explain the improvement of MAS reliability from 2004 to 2006. This improvement was also observed by the analysis of information content for young candidates. The gain of MAS reliability with respect to classical selection was larger for sons of sires with genotyped daughters with records.
By 2005 dense genetic maps based on DNA-level genetic markers were developed for nearly all economically important animal species. Numerous studies demonstrated that QTL affecting traits of economic importance could be detected via linkage to genetic markers. Theory was developed for MAS based on selection of a relatively small number of chromosomal segments, and several MAS breeding programs for dairy cattle were implemented in two countries. The “rules of the game” were to change dramatically in 2006 with the development of high-throughput SNP chips, which will be discussed in detail in the next chapter.
Although a detailed description of DNA technology is outside the scope of this book, a brief discussion of the types of markers that were used for marker-assisted selection and the markers currently used for genomic selection has been included, as the characteristics of these markers affect the methodologies that have been developed for marker-assisted and genomic selection. In the final section we briefly review the current state of complete genome sequencing, which in all likelihood is the “wave of the future.”
As noted in the previous chapter, the first study to use biochemical markers (as opposed to morphological markers) to detect segregating QTL was the study of Neimann-Sørensen and Robertson (1961), which used blood groups as genetic markers. During the 1960s it became clear that there was considerable variation in enzyme sequence that could be detected by electrophoresis. A number of studies were conducted during the 1980s using electrophoretic markers to detect segregating QTL in plant species (e.g., Weller et al., 1988). However, electrophoretic markers were not polymorphic in commercial animal species. In addition to blood group markers, polymorphisms were also found in milk proteins, and several studies were performed to detect QTL via linkage to these markers (e.g., Bovenhuis and Weller, 1994).
The first DNA-level genetic markers found in animal species were restriction fragment length polymorphisms (RFLP). Although several studies were performed in plants to detect QTL via linkage to RFLP (Paterson et al., 1988), these markers were not found to be very polymorphic in domestic animal species. A major breakthrough occurred with the development of the polymerase chain reaction (PCR) (Mullis et al., 1986). Via the PCR it was possible to specifically amplify any particular short DNA sequence, provided unique primer sequences could be constructed. Thus large enough quantities of DNA could be generated so that standard analytical methods could be applied to detect polymorphisms consisting of only a single nucleotide.
Since the 1960s it has been known that the DNA of higher organisms contains extensive repetitive sequences. In 1989 three laboratories independently found that short sequences of repetitive DNA were highly polymorphic with respect to the number of repeats of the repeat unit (Litt and Luty, 1989; Tautz, 1989; Weber and May, 1989). The most common of these repeat sequences was poly(TG), which was found to be very prevalent in all higher species. These sequences were denoted “simple sequence repeats” (SSR) or “DNA microsatellites.” Microsatellites were prevalent throughout all genomes of interest. Nearly all poly(TG) sites were polymorphic in the number of TG repeats, even within commercial animal populations. These markers were by definition “codominant.” That is, the heterozygote genotype could be distinguished from either homozygote. Furthermore, microsatellites were nearly always polyallelic. That is, more than two different alleles were present in the population. Thus, most individuals were heterozygous.
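The link between polyallelic markers and a high proportion of heterozygous individuals can be illustrated with a short calculation. The sketch below is not taken from the original text: it assumes random mating (Hardy-Weinberg proportions), under which the expected fraction of heterozygotes is one minus the sum of the squared allele frequencies. The function name and the allele frequencies are hypothetical and chosen only for illustration.

```python
def expected_heterozygosity(allele_freqs):
    """Expected fraction of heterozygotes at one locus.

    Assumes Hardy-Weinberg proportions (random mating); this assumption
    is introduced for the illustration and is not stated in the text.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Homozygote frequency for each allele is p^2; the rest are heterozygotes.
    return 1.0 - sum(p * p for p in allele_freqs)


# Hypothetical frequencies: a diallelic marker versus a five-allele marker.
diallelic = expected_heterozygosity([0.5, 0.5])
five_alleles = expected_heterozygosity([0.2, 0.2, 0.2, 0.2, 0.2])
print(diallelic, five_alleles)
```

Under these assumptions a diallelic marker can never have more than half of the individuals heterozygous, whereas a marker with several alleles at moderate frequencies exceeds that limit, which is why a polyallelic microsatellite is informative in most individuals.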
During the 1990s genotyping costs were reduced from approximately $10 per genotype to about $1 per genotype, owing to the development of machines designed for this purpose, in particular the ABI DNA sequencer, which implemented nonradioactive analysis methods. In addition, costs were reduced by multiplexing of PCR, in which several PCRs are run in the same sample, and by improved software for analysis. Dense genetic maps based on microsatellites were generated for most agricultural species, and genome scans for segregating QTL were performed for most agricultural animal populations of interest (reviewed by Weller (2007)).
Despite these advantages, microsatellites had several significant drawbacks, due chiefly to the prevalence of “stutter bands.” (These bands are generated by “mistakes” in DNA replication during the PCR, in which a unit of the repeat motif is either deleted or added. Thus instead of a single clear band for each allele, secondary bands are also generated.) First, although software was developed to determine genotypes from the banding pattern, genotyping could not be completely automated. It was still necessary for a qualified technician to review the software results and make corrections. Second, genotyping error rates were often unacceptably high (e.g., Weller et al., 2004). Finally the average density of microsatellites in the genome was not sufficient to capture population-wide linkage disequilibrium, which we will see is now the basis of genomic selection.
Since 1995 new classes of markers have also come into use. Chief among them are “single nucleotide polymorphisms” (SNP) (reviewed by Brookes (1999)). An SNP is generally defined as a base pair location at which the frequency of the most common base pair is lower than 99%. Unlike microsatellites, which usually have multiple alleles, SNPs are generally diallelic, but are much more prevalent throughout the genome, with an estimated frequency of one SNP per 300–500 base pairs. In human populations differences in the base pair sequence of any two randomly chosen individuals occur at a frequency of approximately one per 1000 base pairs (Brookes, 1999). Thus, SNPs can be found in genomic regions that are microsatellite poor. SNPs are apparently more stable than microsatellites, with lower frequencies of mutation. Ranade et al. (2001) first described conditions for genotyping large numbers of individuals for any SNP and computational methods that allow genotypes to be assigned automatically.
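The frequency-based definition of an SNP given above can be written as a simple check. This is only a sketch of that definition: the function name, the input format (a mapping from base to the number of times it was observed at one site), and the example counts are assumptions made for illustration.

```python
def is_snp(base_counts, max_major_freq=0.99):
    """Return True if the site meets the definition in the text:
    the most common base has a frequency below 99%.

    base_counts is a hypothetical input format: a dict mapping each
    observed base to its count among the sampled chromosomes.
    """
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("no observations at this site")
    major_freq = max(base_counts.values()) / total
    return major_freq < max_major_freq


# Hypothetical counts from 1000 sampled chromosomes.
print(is_snp({"A": 950, "G": 50}))   # major allele frequency 0.95
print(is_snp({"A": 995, "G": 5}))    # major allele frequency 0.995
```

In practice the observed frequency depends on the sample, so a site close to the threshold may be classified differently in different populations.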
Several companies developed genotyping platforms for high-throughput genotyping of tens and even hundreds of thousands of SNPs simultaneously. By 2008 genotyping costs for SNPs were reduced to below $0.01 per genotype and are currently approximately $0.002 per genotype. Currently the leading technology for high-throughput SNP genotyping is the “Infinium HD assay” (http://support.illumina.com/content/dam/illumina-support/documents/myillumina/67f59f89-51ee-44d6-b1bb-a53dcb5bd01e/infinium_hd_ultra_user_guide_11328087_revb.pdf). Based on this technology, “mid-density BeadChips” that include 50–60 thousand markers have been developed for all the major agricultural animal species. The “BovineHD BeadChip” (Illumina, Inc., San Diego, CA), which includes 777,000 SNPs that span the entire cattle genome, has also been developed. A poultry array with over 580,000 markers is commercially available as well (http://www.affymetrix.com/catalog/prod670010/AFFY/Axiom%26%23174%3B+Genome%26%2345%3BWide+Chicken+Genotyping+Array#1_1). “High-density” marker arrays with more than half a million markers are under development for other major agricultural species.
DNA copy number variation (CNV) has long been associated with specific chromosomal rearrangements and genomic disorders, but its ubiquity in mammalian genomes was not fully realized until 2006. Copy number variants account for a substantial amount of genetic variation. Since many CNVs include genes that result in differential levels of gene expression, CNVs may account for a significant proportion of normal phenotypic variation (Freeman et al., 2006). A total of 1447 copy number variable regions, which can encompass overlapping or adjacent gains or losses, covering 360 megabases (12% of the human genome), were identified (Redon et al., 2006). These sequences contained hundreds of genes, disease loci, functional elements, and segmental duplications. Notably, the copy number variable regions encompassed more nucleotide content per genome than SNPs, underscoring the importance of CNV in genetic diversity and evolution.
To date CNV has not been used significantly as a source of genetic polymorphism for detection or analysis of QTL. However, Maher (2008) proposed CNV as one of the reasons that only a small fraction of the total additive genetic variation in human height could be explained by genes detected in genome scans based on SNP.
The ultimate method for determining all variation in DNA is complete sequencing of the genome. The first DNA sequences were obtained in the early 1970s using laborious methods based on two-dimensional chromatography. Following the development of fluorescence-based sequencing methods with automated analysis, DNA sequencing became easier and orders of magnitude faster. Several new methods for high-throughput DNA sequencing were developed in the mid to late 1990s and were implemented in commercial DNA sequencers by the year 2000. In general these methods are termed “next-generation sequencing.” Resequencing is necessary, because the genome of a single individual of a species will not indicate all of the genome variations among other individuals of the same species. All of these methods parallelize the sequencing process, producing thousands or millions of sequences concurrently. In ultrahigh-throughput sequencing as many as 500,000 sequencing-by-synthesis operations may be run in parallel.
These techniques have drastically lowered the cost of sequencing a complete genome. A $3-billion project to sequence the human genome was launched in 1990 by the US Department of Energy and the National Institutes of Health and was expected to take 15 years. A “rough draft” of the genome was finished in 2000. Ongoing sequencing led to the announcement of the essentially complete genome on April 14, 2003, 2 years earlier than planned. By 2015 the cost of complete genome sequencing had been reduced to several thousand dollars per individual.
The 1000 Genomes Project was launched in January 2008 to sequence the genomes of at least one thousand anonymous participants from a number of different ethnic groups within 3 years. McVean et al. (2012) reported on the completion of the sequencing of 1092 human genomes. The complete genome of an individual cow was first sequenced in 2009 (Bovine Genome Sequencing and Analysis Consortium et al., 2009). In 2012 the 1000 bull genomes project was initiated. Daetwyler et al. reported in 2014 on the complete sequencing of 234 bulls from different breeds to an average of 8.3-fold genome coverage.
In this chapter we reviewed the major milestones in the development of methodologies for high-throughput genotyping of large numbers of markers per individual. Since the original discovery in 1989 of microsatellites, the first class of polymorphisms that made genome scans possible, costs have been reduced from $10 per genotype to $0.002 per genotype. Among the SNP chips that are currently available for cattle are arrays that genotype 3000, 8000, 54,609, 139,480, 640,000, and 777,000 markers. Through 2015 genotyping costs have continued to decrease, making possible complete genome analyses based on next-generation sequencing methodologies at costs of several thousand dollars on the one hand and genotypes of several thousand markers at costs attractive to the individual farmer on the other.
Before considering how marker-assisted selection (MAS) or genomic selection can be applied to animal breeding programs, it is necessary to understand the basic mechanics of animal breeding programs prior to MAS. All animal breeding programs are based on the principles of quantitative genetics, which will not be considered in detail in this book. Advanced animal breeding programs can be divided into two groups: within-breed selection and programs based on crossbreeding among different breeds. Within-breed selection has been applied and studied most extensively for dairy cattle. Breeding programs based on crossbreeding are the norm for beef cattle, poultry, and swine. Crossbreeding programs can be further divided into those programs that are based on crossing two, three, or four breeds. The main advantages of crossbreeding schemes are twofold: utilization of heterosis and the fact that economic traits have different values in males and females. The disadvantage is the cost of maintaining the pure lines.
In the next section we will describe the basic principles used to evaluate selection within a breed. In the section after that we will apply these principles to the specific problems related to dairy cattle breeding and the major breeding schemes that have been applied or proposed. Finally, we will consider in more detail the advantages and limitations of crossbreeding programs, especially as related to MAS.
The genetic gain due to selection within a breed per generation, \(\Phi\), will be a function of the selection intensity, \(i_s\); the accuracy of the evaluation, \(a_c\); and the additive genetic standard deviation, \(\sigma_g\). In most animal breeding schemes, the selection intensity and the accuracy of the evaluation will be different along the four paths of inheritance: sire to son, sire to daughter, dam to son, and dam to daughter. In general the genetic gain per generation along the four paths of inheritance can be computed by the following equation:
\[ \Phi_i = i_{si}\, a_{ci}\, \sigma_g \]
where \(\Phi_i\) is the genetic gain per generation for path \(i\), \(i_{si}\) is the selection intensity for path \(i\), \(a_{ci}\) is the accuracy of the genetic evaluation for path \(i\), and \(\sigma_g\) is the genetic standard deviation, which will be the same for all four paths of inheritance. The selection intensity is the difference between the mean of the individuals selected as parents and the general population mean in units of the standard normal distribution. If a fraction, \(p\), of the population is selected to be parents of the next generation, then \(i_s\) can be computed as the density of the standard normal curve at the point of truncation divided by \(p\) (Falconer, 1964). The accuracy of the genetic evaluation is defined as the correlation between the genetic evaluation and the actual genetic value. Although the actual genetic value is unknown, the accuracy of the evaluation can be estimated as will be explained in Chapter 6, Section “Important Properties of Mixed Model Solutions.” The square of the accuracy is termed the “reliability” of the evaluation. The annual gain for the entire population is then computed as
\[ \text{annual genetic gain} = \frac{\Phi_{ss} + \Phi_{sd} + \Phi_{ds} + \Phi_{dd}}{G_{ss} + G_{sd} + G_{ds} + G_{dd}} \]
where \(\Phi_{ss}\), \(\Phi_{sd}\), \(\Phi_{ds}\), and \(\Phi_{dd}\) are the genetic gains per generation along the four paths of inheritance and \(G_{ss}\), \(G_{sd}\), \(G_{ds}\), and \(G_{dd}\) are the generation intervals in years along the four paths.
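The calculation described in this section can be sketched in a few lines of Python. The sketch assumes the standard form in which the gain along each path is the product of selection intensity, accuracy, and genetic standard deviation, and in which the annual gain is the sum of the four path gains divided by the sum of the four generation intervals. The function names and all numerical values for the four paths are hypothetical and serve only to show the mechanics.

```python
from statistics import NormalDist


def selection_intensity(p):
    """Selection intensity when the top fraction p is selected:
    standard normal density at the truncation point, divided by p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    std_normal = NormalDist()
    truncation_point = std_normal.inv_cdf(1.0 - p)
    return std_normal.pdf(truncation_point) / p


def path_gain(p, accuracy, sigma_g):
    """Genetic gain per generation for one path (assumed product form)."""
    return selection_intensity(p) * accuracy * sigma_g


def annual_gain(paths, sigma_g):
    """Annual gain: summed path gains over summed generation intervals.

    paths is a list of (fraction selected, accuracy, generation interval
    in years), one tuple per path of inheritance.
    """
    total_gain = sum(path_gain(p, ac, sigma_g) for p, ac, _ in paths)
    total_interval = sum(interval for _, _, interval in paths)
    return total_gain / total_interval


# Hypothetical values, for illustration only (not taken from the text).
paths = [
    (0.05, 0.90, 7.0),  # sire to son
    (0.20, 0.90, 6.0),  # sire to daughter
    (0.02, 0.60, 5.0),  # dam to son
    (0.90, 0.50, 6.0),  # dam to daughter
]
print(annual_gain(paths, sigma_g=1.0))  # in genetic standard deviations per year
```

Because the generation intervals appear in the denominator, a path with high accuracy but a long interval may contribute less to annual gain than a less accurate path with a shorter interval.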
