GENOTYPING BY SEQUENCING FOR CROP IMPROVEMENT
A thoroughly up-to-date exploration of genotyping-by-sequencing technologies and related methods in plant science
In Genotyping by Sequencing for Crop Improvement, a team of distinguished researchers delivers an in-depth and current exploration of the latest advances in genotyping-by-sequencing (GBS) methods, the statistical approaches used to analyze GBS data, and the applications of GBS, including quantitative trait loci (QTL) mapping, genome-wide association studies (GWAS), and genomic selection (GS) in crop improvement. This edited volume includes insightful contributions on a variety of relevant topics, such as advanced molecular markers, high-throughput genotyping platforms, whole-genome resequencing, QTL mapping with advanced mapping populations, analytical pipelines for GBS analysis, and more. The contributors explore traditional and advanced markers used in plant genotyping in extensive detail. They also discuss advanced genotyping platforms that cater to specific research purposes, as well as the whole-genome resequencing (WGR) methodology. The included chapters also examine the applications of these technologies in several different crop categories, including cereals, pulses, oilseeds, and commercial crops.
Genotyping by Sequencing for Crop Improvement also offers:
* A thorough introduction to molecular marker techniques and recent advancements in the technology
* Comprehensive explorations of the genotyping of seeds while preserving their viability, as well as advances in genomic selection
* Practical discussions of opportunities and challenges relating to high-throughput genotyping in polyploid crops
* In-depth examinations of recent advances and applications of GBS, GWAS, and GS in cereals, pulses, oilseeds, millets, and commercial crops
Perfect for practicing plant scientists with an interest in genotyping-by-sequencing technology, Genotyping by Sequencing for Crop Improvement will also earn a place in the libraries of researchers and students seeking a one-stop reference on the foundational aspects of, and recent advances in, genotyping-by-sequencing, genome-wide association studies, and genomic selection.
Page count: 900
Year of publication: 2022
Cover
Title Page
Copyright Page
Dedication
List of Contributors
Preface
1 Molecular Marker Techniques and Recent Advancements
1.1 Introduction
1.2 What is a Molecular Marker?
1.3 Classes of Molecular Markers
1.4 Sequencing‐based Markers
1.5 Recent Advances in Molecular Marker Technologies
1.6 SNP Databases
1.7 Application of Molecular Markers
1.8 Summary
References
2 High‐throughput Genotyping Platforms
2.1 Introduction
2.2 SNP Genotyping Platforms
References
3 Opportunity and Challenges for Whole‐Genome Resequencing‐based Genotyping in Plants
3.1 Introduction
3.2 Basic Steps Involved in Whole‐Genome Sequencing and Resequencing
3.3 Whole‐Genome Resequencing Mega Projects in Different Crops
3.4 Whole‐Genome Pooled Sequencing
3.5 Pinpointing Gene Through Whole‐Genome Resequencing‐based QTL Mapping
3.6 Online Resources for Whole‐Genome Resequencing Data
3.7 Applications and Successful Examples of Whole‐Genome Resequencing
3.8 Challenges for Whole‐Genome Resequencing Studies
3.9 Summary
References
4 QTL Mapping Using Advanced Mapping Populations and High‐throughput Genotyping
4.1 Introduction
4.2 The Basic Objectives of QTL Mapping
4.3 QTL Mapping Procedure
4.4 The General Steps for QTL Mapping
4.5 Factors Influencing QTL Analysis
4.6 QTL Mapping Approaches
4.7 Statistical Methods for QTL Mapping
4.8 Software for QTL Mapping
4.9 Bi‐parental Mapping Populations
4.10 QTL Mapping Using Bi‐parental Populations
4.11 Multiparental Mapping Populations
4.12 QTL Mapping Using Multiparental Populations
4.13 Use of High‐throughput Genotyping for QTL Mapping
4.14 Next‐Generation Sequencing‐based Genotyping
4.15 Challenges with QTL Mapping Using Multiparental Populations and High‐throughput Genotyping
References
5 Genome‐Wide Association Study: Approaches, Applicability, and Challenges
5.1 Introduction
5.2 Methodology to Conduct GWAS in Crops
5.3 Statistical Modeling in GWAS
5.4 Efficiency of GWAS with Different Marker Types
5.5 Computational Tools for GWAS
5.6 GWAS Challenges for Complex Traits
5.7 Factors Challenging the GWAS for Complex Traits
5.8 GWAS Applications in Major Crops
5.9 Candidate Gene Identification at GWAS Loci
5.10 Meta‐GWAS
5.11 GWAS vs. QTL Mapping
References
6 Genotyping of Seeds While Preserving Their Viability
6.1 Introduction
6.2 Genotyping‐by‐Sequencing with Minimum DNA
6.3 DNA Extraction from Half Grain
6.4 GBS with Half Seed
6.5 Applications of GBS as Diagnostic Tool
6.6 Summary
References
7 Genomic Selection: Advances, Applicability, and Challenges
7.1 Introduction
7.2 Natural Selection
7.3 Breeding Selection
7.4 Marker‐assisted Selection
7.5 Genomic Selection
7.6 Genotyping for Genomic Selection
7.7 Integration of Genomic Selection in MAS Program
7.8 The Efficiency of Genomic Selection for Complex Traits
7.9 Integration of Genomic Selection in the Varietal Trial Program
7.10 Cost Comparison of GS vs MAS
References
8 Analytical Pipelines for the GBS Analysis
8.1 Introduction
8.2 Applications of NGS
8.3 NGS Sequencing Platforms
8.4 Tools for NGS Data Analysis
8.5 Generalized Procedure for NGS Data Analysis
8.6 Variant Annotation
8.7 Role of NGS Informatics in Identifying Variants
8.8 Genotyping by Sequencing
8.9 Analytical Pipelines for GBS
8.10 Comparison of GBS Pipelines
References
9 Recent Advances and Applicability of GBS, GWAS, and GS in Maize
9.1 Introduction
9.2 Maize Genetics
9.3 Importance of Genomics and Genotyping‐based Applications in Maize Breeding Programs
9.4 GBS‐based QTL Mapping in Maize
9.5 GBS Protocols and Analytical Pipelines for Maize
9.6 Maize Genome Sequencing and Resequencing
9.7 Genotyping‐by‐Sequencing‐based GWAS and GS Efforts in Maize
9.8 Summary
References
10 Recent Advances and Applicability of GBS, GWAS, and GS in Soybean
10.1 Introduction
10.2 GBS Efforts in Soybean
10.3 High‐Density Linkage Maps in Soybean
10.4 GBS Protocols and Analytical Pipelines for Soybean
10.5 GBS‐based QTL Mapping Efforts in Soybean
10.6 Soybean Genome Sequencing and Resequencing
10.7 GBS‐based GWAS Efforts in Soybean
10.8 GBS‐based Genomic Selection Efforts in Soybean
References
11 Advances and Applicability of Genotyping Technologies in Cotton Improvement
11.1 Introduction
11.2 Challenges due to Polyploidy in Cotton
11.3 Applications of Genomics and Genotyping for Cotton Breeding Programs
11.4 Genotyping Efforts in Cotton
11.5 High‐Density Linkage Maps in Cotton
11.6 Whole‐Genome Sequencing of Cotton Germplasm
11.7 Application of GBS Technology in Cotton Research
11.8 GBS‐based Bi‐Parental QTL Mapping and Association Mapping in Cotton
11.9 Summary and Outlook
References
12 Recent Advances and Applicability of GBS, GWAS, and GS in Millet Crops
12.1 Introduction
12.2 GBS Efforts in Millet Crops
12.3 High‐density Linkage Maps in Millet Crops
12.4 GBS‐based QTL Mapping Efforts in Millet Crops
12.5 Genome Sequencing and Resequencing of Millet Crops
12.6 GBS‐based GWAS Efforts in Millet Crops
12.7 GBS‐based Genomic Selection (GS) Efforts in Millet Crops
12.8 Summary
References
13 Recent Advances and Applicability of GBS, GWAS, and GS in Pigeon Pea
13.1 Introduction
13.2 Pigeon Pea Sequencing and Resequencing
13.3 Development of Pigeon Pea High‐density Genotyping Platforms
13.4 Development of High‐density Linkage Maps in Pigeon Pea
13.5 QTL Analysis Using High‐density Genotyping Platforms and GBS
13.6 GWAS Efforts in Pigeon Pea
13.7 Genomic Selection (GS) Efforts in Pigeon Pea
13.8 Summary
References
14 Opportunity and Challenges for High‐throughput Genotyping in Sugarcane
14.1 Introduction
14.2 Sugarcane Genome and Genetics
14.3 Genetic Studies and Marker Systems
14.4 Genotyping‐by‐Sequencing (GBS)
14.5 SNP Calling Using GBS Pipelines
14.6 Sugarcane Genome Sequencing
14.7 Linkage and QTL Mapping in Sugarcane
14.8 GWAS in Sugarcane
14.9 Genomic Selection in Sugarcane
14.10 Summary
References
15 Recent Advances and Applicability of GBS, GWAS, and GS in Polyploid Crops
15.1 Introduction
15.2 Challenges for Genotyping in Polyploidy Crops
15.3 Genotyping Platforms for Barley
15.4 Long‐Read Sequencing‐based Genotyping in Polyploid Canola
15.5 Peanut Genotyping with Targeted Amplicon Sequencing
15.6 SNP Genotyping Methods and Platforms Available for Sugarcane
15.7 Recent Advances and Applicability of GBS, GWAS, and GS in Polyploidy Crop Species
15.8 Haplotype‐based Genotyping
15.9 GBS Analytical Pipelines for Polyploids
15.10 GBS‐based QTL Mapping Efforts in Polyploids
15.11 GWAS and GS Using High‐throughput Genotyping in Polyploidy Crops
References
16 Recent Advances and Applicability of GBS, GWAS, and GS in Oilseed Crops
16.1 Introduction
16.2 GBS Efforts in Oilseed Crops
16.3 High‐density Linkage Maps for Oilseed Crops
16.4 GBS Protocols and Analytical Pipelines
16.5 GBS‐based QTL Mapping Efforts in Oilseed Crops
16.6 GBS‐based GWAS Efforts in Oilseed Crops
References
Index
End User License Agreement
Chapter 1
Table 1.1 Details of the other important molecular markers.
Table 1.2 Comparison between different marker techniques commonly used in p...
Table 1.3 List of important online SNP databases.
Chapter 2
Table 2.1 Customized SNP array details in plant species.
Chapter 4
Table 4.1 The different software used for quantitative loci (QTL) mapping....
Table 4.2 Studies that utilized high‐throughput genotyping for QTL mapping....
Chapter 5
Table 5.1 Popular bioinformatics software and tools available for GWAS anal...
Table 5.2 Genome‐wide association studies (GWAS) conducted for dissection o...
Table 5.3 Candidate genes identified through genome‐wide association studie...
Chapter 7
Table 7.1 Table showing different crop plants where GBS was used (Adapted f...
Table 7.2 Table showing various GS studies carried out in different crops (...
Chapter 8
Table 8.1 Different variant identification tools.
Table 8.2 Tools for variant annotation.
Chapter 9
Table 9.1 Linkage map developed in maize using genotyping by sequencing (GB...
Table 9.2 List of GBS‐based QTL mapping studies in maize.
Table 9.3 GBS‐based GWAS efforts in maize.
Chapter 10
Table 10.1 High‐throughput genomics platforms used for soybean genotyping....
Table 10.2 List of GBS‐based QTL mapping studies in soybean.
Table 10.3 Details of efforts performed for whole‐genome resequencing and r...
Table 10.4 List of GBS‐based GWAS studies in soybean.
Chapter 11
Table 11.1 List of genomic resources available in cotton.
Table 11.2 Development of various interspecific and intraspecific linkage (...
Table 11.3 List of genome wide association studies (GWAS) in cotton.
Table 11.4 List of GBS‐based QTL mapping studies in cotton.
Chapter 12
Table 12.1 High‐throughput genomic platforms used for genotyping of millet ...
Table 12.2 List of GBS‐based QTL mapping studies in millet crops.
Table 12.3 Details of efforts performed for whole‐genome resequencing and r...
Table 12.4 Summarized millet genome assembly statistics.
Table 12.5 The list of GBS‐based GWAS studies in millets.
Chapter 13
Table 13.1 Whole‐genome resequencing studies in pigeon pea.
Table 13.2 List of high‐density genotyping platforms developed for pigeon p...
Table 13.3 High‐density linkage maps generated in pigeon pea by using the r...
Table 13.4 List of significant QTLs identified using GBS and high‐density S...
Chapter 14
Table 14.1 Studies on linkage and QTL mapping in sugarcane through GBS‐base...
Table 14.2 Studies on GWAS for various traits in sugarcane using GBS marker...
Chapter 16
Table 16.1 List of GBS‐based QTL mapping studies conducted in oilseed crops.
Table 16.2 List of GBS‐based GWAS studies in oilseed crops.
Chapter 1
Figure 1.1 An example of GBS and GBS data analysis workflow for identificati...
Figure 1.2 Steps in KASP reaction: (a) annealing: allele‐specific primer bin...
Chapter 2
Figure 2.1 A pipeline for SNP discovery (S1–S10 are different diverse access...
Figure 2.2 A schematic representation of different SNP genotyping technologi...
Figure 2.3 Illustration of various steps involved in the generation of RAD‐b...
Figure 2.4 Schematic illustration of work‐flow in a MALDI‐TOF MS
Chapter 3
Figure 3.1 Diagrammatic representation of various high‐throughput‐sequencing...
Figure 3.2 Genome‐wide association studies (GWAS) in rice seedling for salt‐...
Chapter 4
Figure 4.1 Steps involved in bulked segregant analysis (BSA) used for QTL ma...
Figure 4.2 Steps involved in MutMap approach used to map the QTLs of target ...
Figure 4.3 Steps involved in the QTL‐seq approach used to map the QTLs of th...
Figure 4.4 Steps involved in bulked segregant RNA sequencing (BSR‐Seq) used ...
Figure 4.5 Steps involved in Indel sequencing used to map the QTLs of the ta...
Figure 4.6 Schematic representation of different types of biparental mapping...
Figure 4.7 Genotyping of segregating population using KASPar assay.
Figure 4.8 Genotyping of segregating population using Sequenom MassARRAY sys...
Chapter 5
Figure 5.1 Methodology to conduct GWAS in crops. It can be divided into thre...
Figure 5.2 (a) The figure illustrating the traits for which genome‐wide asso...
Chapter 6
Figure 6.1 DNA extraction from seed endosperm.
Figure 6.2 Seed DNA‐based genotyping‐by‐sequencing using laser microdissecti...
Chapter 7
Figure 7.1 Overview of estimate marker effects in order to get a genomic est...
Figure 7.2 The methodology involved in marker‐assisted selection (MAS) and gen...
Chapter 8
Figure 8.1 Evolution of next‐generation sequencing.
Figure 8.2 Sequencing and assembly of DNA.
Figure 8.3 The workflow showing the steps involved in NGS data analysis.
Chapter 10
Figure 10.1 World soybean production and productivity in 2019–2020. (a) Prod...
Figure 10.2 World soybean oil production and soymeal export in the year 2019...
Figure 10.3 Various pipelines and steps are involved in analyzing GBS data. ...
Figure 10.4 Schematic representations of steps involved in association mappi...
Figure 10.5 General steps involved in the GBS protocol for plant breeding....
Figure 10.6 Diagrammatic representation of application of genotyping by sequ...
Chapter 11
Figure 11.1 Integrated genomics and breeding approaches for cotton improveme...
Figure 11.2 Name of the nine intra‐specific and four inter‐specific datasets...
Chapter 12
Figure 12.1 Schematic representation of the important characteristics of mil...
Figure 12.2 Applications of whole‐genome sequence (WGS) of millets.
Figure 12.3 General steps involved in genomic selection.
Chapter 13
Figure 13.1 Depicting the genomics and phenomics for the exploitation of pig...
Chapter 14
Figure 14.1 GBS adapters, PCR and sequencing primers. (a) Sequences of doubl...
Figure 14.2 Steps in GBS library construction. Note: Up to 96 DNA samples ca...
Figure 14.3 Schematic representation of genomic selection processes from tra...
Chapter 15
Figure 15.1 GBS data of seven barley chromosomes 1H–7H showing genetic diver...
Figure 15.2 Illustration to depict that the simplex markers show similar mod...
Figure 15.3 Diagrammatic overview of UGbs‐Flex Pipeline developing GBS refer...
Chapter 16
Figure 16.1 Genomic distribution of single‐nucleotide polymorphism (SNPs) ma...
Edited by
Humira Sonah
National Agri‐Food Biotechnology Institute
Punjab, India
Vinod Goyal
CCS Haryana Agriculture University
Hisar, India
S.M. Shivaraj
Laval University
Quebec City, QC, Canada
Rupesh K. Deshmukh
National Agri‐Food Biotechnology Institute
Punjab, India
This edition first published 2022
© 2022 John Wiley & Sons Ltd
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Humira Sonah, Vinod Goyal, S.M. Shivaraj, and Rupesh K. Deshmukh to be identified as the authors of the editorial material in this work has been asserted in accordance with law.
Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
Editorial Office
The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
The contents of this work are intended to further general scientific research, understanding, and discussion only and are not intended and should not be relied upon as recommending or promoting scientific method, diagnosis, or treatment by physicians for any particular patient. In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of medicines, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each medicine, equipment, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read.
Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging‐in‐Publication Data
Names: Sonah, Humira, editor. | Goyal, Vinod, editor. | Shivaraj, S. M., editor. | Deshmukh, Rupesh editor.
Title: Genotyping by sequencing for crop improvement / edited by Humira Sonah, Vinod Goyal, S. M. Shivaraj, Rupesh K. Deshmukh.
Description: First edition. | Hoboken, NJ, USA : John Wiley & Sons, Inc., [2022] | Includes bibliographical references and index.
Identifiers: LCCN 2021046855 (print) | LCCN 2021046856 (ebook) | ISBN 9781119745655 (hardback) | ISBN 9781119745662 (adobe pdf) | ISBN 9781119745679 (epub)
Subjects: LCSH: Genetics–Technique. | Gene mapping. | Genomics. | Plant genomes.
Classification: LCC QK981.45 .G46 2021 (print) | LCC QK981.45 (ebook) | DDC 572.8/62–dc23/eng/20211001
LC record available at https://lccn.loc.gov/2021046855
LC ebook record available at https://lccn.loc.gov/2021046856
Cover Design: Wiley
Cover Image: © Billion Photos/Shutterstock
Dedicated to the two most eminent agricultural scientists of Canada whose work in plant genomics and breeding helped in food security and inspired many young scientists worldwide.
Prof. Richard Bélanger Département de phytologie Université Laval, Canada
Prof. François Belzile Département de phytologie Université Laval, Canada
Dr. Humira Sonah
Dr. Vinod Goyal
Dr. S.M. Shivaraj
Dr. Rupesh K. Deshmukh
Alish Alisha, Department of Gene Expression, Faculty of Biology, Adam Mickiewicz University, Poznań, Poland
Gagandeep Singh Bajwa, Department of Plant Breeding and Genetics, Punjab Agricultural University, Ludhiana, Punjab, India
Vitthal T. Barvkar, Department of Botany, Savitribai Phule Pune University, Pune, Maharashtra, India
Shubham Bhardwaj, National Agri‐Food Biotechnology Institute (NABI), Mohali, Punjab, India
National Institute of Plant Genome Research (NIPGR), New Delhi, India
Dharminder Bhatia, Department of Plant Breeding and Genetics, Punjab Agricultural University, Ludhiana, Punjab, India
Bharat Char, Mahyco Research Centre, Mahyco Private Limited, Jalna, Maharashtra, India
Viswanathan Chinnusamy, Division of Plant Physiology, ICAR‐IARI, New Delhi, India
Shalu Choudhary, Mahyco Research Centre, Mahyco Private Limited, Jalna, Maharashtra, India
Rupesh K. Deshmukh, Agricultural Biotechnology, National Agri‐Food Biotechnology Institute (NABI), Mohali, Punjab, India
Vikas Devkar, Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance (IGCAST), Texas Tech University, Lubbock, TX, USA
Pallavi Dhiman, Department of Agriculture Biotechnology, National Agri‐Food Biotechnology Institute (NABI), Mohali, Punjab, India
Kishor Gaikwad, ICAR – National Institute for Biotechnology, New Delhi, India
Naina Garewal, Department of Biotechnology, Panjab University, Chandigarh, India
Dhananjay Narayanrao Gotarkar, International Rice Research Institute, Los Baños, Philippines
Md Aminul Islam, Department of Botany, Majuli College, Majuli, Assam, India
Priyanka Jain, ICAR – National Institute for Biotechnology, New Delhi, India
Riya Joon, Department of Biotechnology, Panjab University, Chandigarh, India
Swapnil B. Kadam, Department of Botany, Savitribai Phule Pune University, Pune, Maharashtra, India
Ravindra Ramrao Kale, ICAR‐Indian Institute of Rice Research, Hyderabad, Telangana, India
Ravneet Kaur, Department of Biotechnology, Panjab University, Chandigarh, India
Suneetha Kota, ICAR‐IIRR, Hyderabad, Telangana, India
Amit Kumar, National Agri‐Food Biotechnology Institute (NABI), Mohali, Punjab, India
Kuldeep Kumar, ICAR – Indian Institute of Pulses Research, Kanpur, Uttar Pradesh, India
Manish Kumar, Department of Seed Science and Technology, Dr. Yashwant Singh Parmar University of Horticulture and Forestry, Solan, Himachal Pradesh, India
Sandeep Kumar, Xcelris Lab Pvt Ltd., Ahmedabad, Gujarat, India
Virender Kumar, Department of Agriculture Biotechnology, National Agri‐Food Biotechnology Institute (NABI), Mohali, Punjab, India
Surbhi Kumawat, Department of Agriculture Biotechnology, National Agri‐Food Biotechnology Institute (NABI), Mohali, Punjab, India
Brij Kishore Kushwaha, Department of Molecular Biology and Genetic Engineering, Bihar Agricultural University, Sabour Bhagalpur, Bihar, India
Omkar Maharudra Limbalkar, ICAR‐Division of Genetics, Indian Agriculture Research Institute, New Delhi, India
Rushil Mandlik, Department of Agriculture Biotechnology, National Agri‐Food Biotechnology Institute (NABI), Mohali, Punjab, India
Venugopal Mikkilineni, Mahyco Research Centre, Mahyco Private Limited, Jalna, Maharashtra, India
Pankaj S. Mundada, Department of Botany, Savitribai Phule Pune University, Pune, Maharashtra, India
Department of Biotechnology, Yashavantrao Chavan Institute of Science, Satara Maharashtra, India
Narender Negi, Department of Fruit Science, ICAR‐NBPGR Regional Station, Shimla, Himachal Pradesh, India
Anupama A. Pable, Department of Microbiology, Savitribai Phule Pune University, Maharashtra, India
Gunashri Padalkar, Department of Agriculture Biotechnology, National Agri‐Food Biotechnology Institute (NABI), Mohali, Punjab, India
Jayendra Padiya, Mahyco Research Centre, Mahyco Private Limited, Jalna, Maharashtra, India
Arushi Padiyal, Department of Seed Science and Technology, Dr. Yashwant Singh Parmar University of Horticulture and Forestry, Solan, Himachal Pradesh, India
Brajendra Parmar, ICAR‐IIRR, Hyderabad, Telangana, India
Gunvant B. Patil, Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance (IGCAST), Texas Tech University, Lubbock, TX, USA
Vinaykumar Rachappanavar, Department of Seed Science and Technology, Dr. Yashwant Singh Parmar University of Horticulture and Forestry, Solan, Himachal Pradesh, India
Department of Agriculture, MS Swaminathan School of Agriculture, Shoolini University, Solan, Himachal Pradesh, India
Nitika Rajora, Department of Agriculture Biotechnology, National Agri‐Food Biotechnology Institute (NABI), Mohali, Punjab, India
Santosh Rathod, ICAR‐IIRR, Hyderabad, Telangana, India
Gaurav Raturi, Department of Agriculture Biotechnology, National Agri‐Food Biotechnology Institute (NABI), Mohali, Punjab, India
Rita, ICAR – National Institute for Biotechnology, New Delhi, India
Akshay S. Sakhare, Division of Plant Physiology, ICAR‐IARI, New Delhi, India
Swati Saxena, ICAR – National Institute for Biotechnology, New Delhi, India
Senthilkumar Shanmugavel, Crop Improvement Division, ICAR – Sugarcane Breeding Institute, Coimbatore, Tamil Nadu, India
Jitender Kumar Sharma, Department of Agriculture, School of Agriculture, Baddi University of Emerging Sciences & Technology, Baddi, Himachal Pradesh, India
Sandhya Sharma, ICAR – National Institute for Biotechnology, New Delhi, India
Shivani Sharma, National Agri‐Food Biotechnology Institute (NABI), Mohali, Punjab, India
Yogesh Sharma, Department of Agriculture Biotechnology, National Agri‐Food Biotechnology Institute (NABI), Mohali, Punjab, India
Prashant Raghunath Shingote, Vasantrao Naik College of Agricultural Biotechnology, Dr. Panjabrao Deshmukh Krishi Vidyapeeth, Akola, Maharashtra, India
Harsha Srivastava, ICAR – National Institute for Biotechnology, New Delhi, India
Anuradha Singh, Department of Genomics, ICAR – National Institute on Plant Biotechnology, New Delhi, India
Kashmir Singh, Department of Biotechnology, Panjab University, Chandigarh, India
Manipal Singh, Department of Agriculture Biotechnology, National Agri‐Food Biotechnology Institute (NABI), Mohali, Punjab, India
Nisha Singh, Department of Genomics, ICAR – National Institute on Plant Biotechnology, New Delhi, India
Avinash Singode, ICAR – Indian Institute of Millets Research, Hyderabad, Telangana, India
Sweta Sinha, Department of Molecular Biology and Genetic Engineering, Bihar Agricultural University, Sabour Bhagalpur, Bihar, India
Sreeja Sudhakaran, Department of Agriculture Biotechnology, National Agri‐Food Biotechnology Institute (NABI), Mohali, Punjab, India
Lakshmipathy Thalambedu, Crop Improvement Division, ICAR – Sugarcane Breeding Institute, Coimbatore, Tamil Nadu, India
Vandana Thakral, Department of Agriculture Biotechnology, National Agri‐Food Biotechnology Institute (NABI), Mohali, Punjab, India
Prathima P. Thirugnanasambandam, Crop Improvement Division, ICAR – Sugarcane Breeding Institute, Coimbatore, Tamil Nadu, India
Anshuman Tiwari, Mahyco Research Centre, Mahyco Private Limited, Jalna, Maharashtra, India
Kishor Tribhuvan, ICAR – Indian Institute of Agricultural Biotechnology, Ranchi, Jharkhand, India
Abhijit Ubale, Mahyco Research Centre, Mahyco Private Limited, Jalna, Maharashtra, India
Sanskriti Vats, Agricultural Biotechnology, National Agri‐Food Biotechnology Institute (NABI), Mohali, Punjab, India
Regional Centre for Biotechnology, Faridabad, Haryana (NCR Delhi), India
Joshita Vijayan, ICAR – National Institute for Biotechnology, New Delhi, India
Dhiraj Lalji Wasule, Vasantrao Naik College of Agricultural Biotechnology, Dr. Panjabrao Deshmukh Krishi Vidyapeeth, Akola, Maharashtra, India
Himanshu Yadav, Department of Agriculture Biotechnology, National Agri‐Food Biotechnology Institute (NABI), Mohali, Punjab, India
Recent advances in sequencing technology and computational resources have accelerated genomics and translational research in crop science. These technological advances have provided many opportunities in genomics‐assisted plant breeding to address issues related to food security. Among the several applications, genotyping‐by‐sequencing (GBS) technology has evolved as one of the frontier areas facilitating high‐throughput plant genotyping. GBS approaches have proved effective in genotyping‐based applications such as quantitative trait loci (QTL) mapping, genome‐wide association studies (GWAS), genomic selection (GS), and marker‐assisted breeding (MAB). Considering the current state of plant breeding, we decided to compile the advances in GBS methods, the statistical approaches used to analyze GBS data, and the applications of GBS, including QTL mapping, GWAS, and GS, in crop improvement.
Presently, the food produced around the world is adequate for the existing population. However, the constantly increasing population is putting mounting pressure on the food production system. Hence, efficient utilization of technological advances and existing knowledge is essential to enhance food production to match the growing food demand. In this direction, most countries around the globe have adopted advanced genomic methodologies to breed superior plant genotypes. Among such technological advances, high‐throughput genotyping using GBS has shown promising results in different crop plants. GBS has predominantly been used for germplasm evaluation, evolutionary studies, development of dense linkage maps, QTL mapping, GWAS, GS, and MAB. Its cost‐effectiveness and whole‐genome coverage make GBS more reliable than other next‐generation sequencing (NGS) techniques.
This book describes advanced molecular markers, high‐throughput genotyping platforms, whole‐genome resequencing (WGR), QTL mapping using advanced mapping populations, analytical pipelines for the GBS analysis, advances in GWAS, advances in GS, application of GBS, GWAS, and GS in different crop plants. The different marker types including traditional and advanced markers used in plant genotyping have been presented in great detail. DNA extraction directly from seeds without germination can save time and effort. Several modified and crop‐specific nondestructive seed DNA extraction protocols have been compiled and presented. Many advanced genotyping platforms are now available which cater to specific research purposes because of the differences in terms of reaction chemistry involved, cost, method of signal detection, and flexibility in the protocols. Such advanced platforms along with their principles have been discussed. The WGR methodology and available resources have been covered in detail. The WGR has emerged as a powerful method to identify genetic variation among individuals. The recent advancement in WGR includes pool‐Seq which provides an alternative to individual sequencing and a cost‐effective method for GWAS. Compared to biparental populations the multi‐parental population provides an opportunity to interrogate multiple alleles and to provide an increased level of recombination and mapping resolution of QTLs. The use of such improved populations in the era of high‐throughput genotyping has been presented in one of the chapters. The dedicated section focused on the basic principle of GWAS, the efficiency of different markers, candidate gene identification, meta‐GWAS, and statistical methods involved in GWAS analysis has been included. For genetic mapping, and marker‐assisted selection, rapid and quality DNA isolation is mandatory to accelerate the whole process. 
A focused section about GS has been included which gives an account of the basic concept, advances, applicability, and challenges of GS. Similarly, a separate chapter is included which discusses the analytical pipelines used for GBS data. Application of technologies such as GBS, GWAS, and GS in different crop categories like cereals, pulses, oilseeds, and commercial crops has been discussed in different chapters.
Here, we have tried to compile basic aspects and recent advances in GBS, GWAS, and GS in plant breeding. We believe that the book will help researchers and scientists to understand the methods and plan future experiments. This book will enable plant scientists to explore GBS applications more efficiently for basic research as well as for applied aspects of various crop improvement projects.
Editors
Dr. Humira Sonah
Dr. Vinod Goyal
Dr. S. M. Shivaraj
Dr. R. K. Deshmukh
Dharminder Bhatia and Gagandeep Singh Bajwa
Department of Plant Breeding and Genetics, Punjab Agricultural University, Ludhiana, Punjab, India
Plant selection and systematic breeding efforts led to the development of present‐day improved cultivars of crop plants. From a historical perspective, increased crop yield is the result of genetic improvement (Fehr 1984). Markers play an important role in the selection of traits of interest. Markers can be morphological, biochemical, or molecular in nature. Morphological markers are visual phenotypic characters such as growth habit of the plant, seed shape, seed color, flower color, etc. Biochemical markers are isozyme‐based markers characterized by variation in the molecular form of an enzyme, which shows as a difference in mobility on an electrophoresis gel. Very few morphological and biochemical markers are available in plants, and they are influenced by developmental stage and environmental factors. Since a large number of economically important traits are quantitative in nature and are affected by both genetic and environmental factors, selection of traits based on morphological and biochemical markers may not be very reliable. The subsequent discovery of abundantly available DNA‐based markers made possible the selection of almost any trait of interest. DNA‐based markers are not affected by the environment. Besides, these markers are highly reproducible across labs and show high polymorphism to distinguish between two genetically different individuals or species.
In the last four decades, DNA‐based molecular marker technology has witnessed several advances from low throughput hybridization‐based markers to high‐throughput sequencing‐based markers. These advances have been possible due to critical discoveries such as polymerase chain reaction (PCR) (Mullis et al. 1986), Sanger sequencing method (Sanger et al. 1977), automation of Sanger sequencing (Shendure et al. 2011), next‐generation sequencing (NGS) technologies (Mardis 2008), and development of bioinformatics tools. This chapter will briefly discuss different types of molecular markers while particularly focusing on recent developments in molecular marker technologies. These developments have expedited the mapping and cloning of several loci governing important traits, precise trait selection, and transfer into elite germplasm.
A DNA or molecular marker is a fragment of DNA that is associated with a particular trait in an individual. Molecular markers aid in determining the location of genes that control key traits.
Generally, molecular markers do not represent the gene of interest but act as “flags” or “signs.” Similar to genes, all the molecular markers occupy a specific position within the chromosomes. Molecular markers located close to genes (i.e. tightly linked) are referred to as “gene tags.”
DNA‐based molecular markers are the most widely used markers predominantly due to their abundance. They arise from different classes of DNA mutations such as substitution mutations (point mutations), rearrangements (insertions or deletions), or errors in replication of tandemly repeated DNA. These markers are selectively neutral because they are usually located in noncoding regions of DNA. Unlike morphological and biochemical markers, DNA markers are practically unlimited in number and are not affected by environmental factors and/or the developmental stage of the plant.
DNA markers show genetic differences that can be visualized by gel electrophoresis followed by staining with ethidium bromide or by hybridization with radioactive or colorimetric probes. Markers that can identify the difference between two individuals are referred to as polymorphic markers, whereas those that do not distinguish the individuals are called monomorphic markers. Based on how polymorphic markers discriminate between individuals, they are described as codominant or dominant. Codominant markers indicate differences in size, whereas dominant markers reveal differences based on their presence or absence. The different forms of a DNA marker, seen as different band sizes on gels, are known as marker “alleles.” A dominant marker has only two alleles, whereas a codominant marker may have many alleles.
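The practical difference between the two marker classes can be illustrated with a minimal sketch. The two‐letter genotype encoding and the function names are hypothetical and are used only for illustration; the sketch assumes that a codominant marker gives one band per allele and that a dominant marker amplifies only one of the two alleles.

```python
def score_codominant(genotype):
    """Return the set of bands (alleles) visible for a codominant marker."""
    # Each allele produces its own band size, so a heterozygote shows two bands.
    return set(genotype)


def score_dominant(genotype, amplified_allele="A"):
    """Return True if a band is present for a dominant marker."""
    # Only one allele amplifies; the other allele leaves no band at all.
    return amplified_allele in genotype


for genotype in ("AA", "Aa", "aa"):
    print(genotype, sorted(score_codominant(genotype)), score_dominant(genotype))
```

Under these assumptions the heterozygote "Aa" is distinguishable from "AA" with the codominant marker but gives the same presence score with the dominant marker.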
Based on the method of their detection, DNA markers are broadly classified into three groups: (i) hybridization‐based, (ii) PCR‐based, and (iii) DNA sequence‐based molecular markers. Molecular markers have been discussed earlier in several reviews (Collard et al. 2005; Semagn et al. 2006; Gupta and Rustgi 2004) and book chapters (Mir et al. 2013; Singh and Singh 2015), which readers can also consult for more details. However, a brief description of each of these markers has been presented below.
Restriction fragment length polymorphism (RFLP) markers were the first molecular markers, used by Grodzicker et al. (1975) in adenovirus and Botstein et al. (1980) in human genome mapping. They were first used in plants by Helentjaris et al. (1986). In this type of marker, polymorphism is detected by cutting DNA into fragments with restriction enzymes, followed by hybridization with radioactively labeled DNA probes, which are single or low copy DNA fragments, and visualization by autoradiography. DNA probes could be genomic clones, cDNA clones, or even cloned genes. The RFLP markers show co‐dominance and are highly reliable in linkage analysis and breeding (Semagn et al. 2006). However, this technique requires a large quantity of DNA and is labor‐intensive, relatively expensive, and hazardous. RFLP shows polymorphism between two different species if they differ due to point mutations, insertion/deletion, inversion, translocation, or duplication.
Diversity array technology (DArT) is a high‐throughput DNA polymorphism analysis method that combines microarray and restriction‐based PCR methods. It is similar to AFLP, except that hybridization is used for the detection of polymorphism. It can provide comprehensive genome coverage even in organisms for which no genome sequence information is available (Jaccoud et al. 2001). DArT is a solid‐state open platform method for analyzing DNA polymorphism. The DArT procedure includes (i) generating a diversity panel and (ii) genotyping using the diversity panel. The diversity panel is generated using a set of lines representing the breadth of variability in germplasm (~10 lines). An equal quantity of DNA from each representative line is pooled, followed by restriction with two to three restriction endonucleases (REs) and ligation of RE‐specific adaptors. The DNA fragments are then amplified using adaptor‐complementary primers. The representation fragments are ligated into a vector and transformed into Escherichia coli cells. The transformed cells with recombinant DNA are selected and the inserts are amplified using M13 forward and reverse primers. The amplified DNA is isolated and purified. The purified DNA is spotted onto polylysine‐coated glass slides to generate a diversity array.
For genotyping, the representation fragments of the target genotypes are prepared in the same way as for the diversity panel. The DNA fragments are column purified and fluorescently labeled with one of two dyes (Cy3 or Cy5). The labeled DNA fragments are used for hybridization onto the diversity array. Two representations, one labeled with Cy3 and the other with Cy5, can be hybridized simultaneously, and hybridization signal intensities are measured for each spot. DArT thus detects DNA polymorphism at several hundred genomic loci in a single array without relying on sequence information.
Simple‐sequence repeats (SSRs) (Litt and Luty 1989) are also known as microsatellites, short tandem repeats (STRs), or simple sequence length polymorphisms (SSLPs). These are widely used markers and are also referred to as the mother of all markers. The repeat units are generally one to eight nucleotides long. SSRs are found dispersed throughout the genome and are hypervariable. The repeat regions are flanked by unique sequences that are highly conserved. The flanking unique sequences are used to design complementary primers, which can be assayed with PCR. SSRs are highly polymorphic and codominant markers. They show polymorphism as a result of the variable number of repeat units. Before the era of genome sequencing, it was difficult to develop SSRs due to the extensive cost and labor involved in the identification of repeat regions and flanking unique sequences. However, with the availability of genome sequences of several organisms, the development of SSRs has become very easy; it involves in silico identification of STRs, design of SSR primers from the flanking unique sequences, and validation through experimentation. SSR markers have found immense application in population genetic analysis, gene mapping, and cloning due to their abundance in the genome, high polymorphism, and very high reproducibility across labs. SSR‐based linkage maps have been developed in several important crop plants such as rice (Temnykh et al. 2000; McCouch et al. 2002; Orjuela et al. 2010), wheat (Roder et al. 1998), maize (Sharopova et al. 2002), potato (Milbourne et al. 1998), etc.
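The in silico identification step mentioned above can be sketched with a regular expression that looks for perfect tandem repeats. This is only a minimal illustration and not one of the dedicated SSR mining tools; the function name `find_ssrs`, the threshold of four repeats, and the test sequence are assumptions made for this example.

```python
import re


def find_ssrs(seq, max_unit=8, min_repeats=4):
    """Return (start, motif, repeat_count) for perfect tandem repeats in seq."""
    seq = seq.upper()
    hits = []
    for k in range(1, max_unit + 1):
        # A motif of length k followed by at least (min_repeats - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit (e.g. "AA", "ATAT").
            if any(motif == motif[:j] * (k // j) for j in range(1, k) if k % j == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


example = "GGCTT" + "AG" * 6 + "CCGTA" + "CAT" * 5 + "GGT"
print(find_ssrs(example))
```

Once a repeat is located, primers would be designed from the unique sequence on either side of it; that step is not shown here.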
Sequence‐tagged sites (STSs) were first developed for physical mapping of the human genome by Olsen et al. (1989). STSs are short unique sequences developed from a polymorphic RFLP probe or AFLP fragment that is linked to a desirable trait. The RFLP probes or AFLP fragments showing polymorphism are end‐sequenced, and primers are designed to specifically amplify these fragments. STS markers are co‐dominant and highly reproducible. For example, STS markers have been developed for RFLP markers linked with the bacterial blight resistance genes xa5, xa13, and Xa21 (Huang et al. 1997). One major limitation of these markers is that they show lower polymorphism than the corresponding RFLP probe.
Random amplified polymorphic DNA (RAPD) markers were first developed by Williams et al. (1990) to amplify DNA without prior sequence information. In this type of marker, arbitrary decamer sequences are used as primers at low annealing temperatures for DNA amplification. These markers are referred to as dominant markers because the polymorphism is determined based on the presence or absence of a particular amplified fragment. Polymorphism may also appear as varying brightness of bands at a particular locus due to copy number differences. These markers have been used for constructing linkage maps in several species (Hunt 1997; Laucou et al. 1998) and also for tagging genes of economic importance. However, due to their dominant nature, they may not be appropriate for genetic mapping and marker‐assisted selection (MAS). One major limitation of these markers is the lack of repeatability in certain cases. Variations of RAPD include AP‐PCR (arbitrarily primed PCR) and DAF (DNA amplification fingerprinting) (Table 1.1).
Table 1.1 Details of the other important molecular markers.

| Marker | Description |
| --- | --- |
| Variable number tandem repeat (VNTR) or minisatellites | A short DNA sequence (10–100 bp) is present as tandem repeats with a highly variable copy number |
| DNA amplification fingerprinting (DAF) | A variation of RAPD, where a single arbitrary primer of 4–5 bp is used to detect polymorphism |
| Arbitrary‐primed PCR (AP‐PCR) | A variation of RAPD, where a single arbitrary primer of 18–32 bp is used to detect polymorphism |
| Inter‐simple sequence repeat (ISSR) | Primers are designed based on the repeat region of microsatellites. These primers are used to amplify the region between two microsatellites. The stretches of unique DNA in between or flanking the SSRs are amplified. A single SSR‐based primer is used to prime PCR |
| Selective amplification of microsatellite polymorphic loci (SAMPL) | A modification of ISSR, where an SSR‐based primer is used along with an AFLP primer. The template is identical to the AFLP template and the rare cutter primer is replaced by the SSR‐based primer |
| Cleaved amplified polymorphic sequences (CAPS) | These markers are also called PCR‐RFLP, where the amplified PCR product is digested with endonucleases to reveal polymorphism. These are used when the PCR product itself does not show polymorphism but a restriction enzyme site present in the amplified product may detect polymorphism |
| Derived cleaved amplified polymorphic sequences (dCAPS) | A variation of CAPS, where a primer containing one or more mismatches to the template DNA is used to create a restriction enzyme recognition site in one allele but not in the other due to the presence of an SNP. The PCR product thus obtained is subjected to restriction enzyme digestion to determine the presence or absence of the SNP |
| Single‐strand conformational polymorphism (SSCP) | DNA fragments ranging in size from 200 to 800 bp are amplified by PCR using specific primers (20–25 bp), followed by gel electrophoresis of single‐strand DNA to detect nucleotide sequence variation. The method is based on the principle that the secondary structure of a single‐strand DNA molecule changes significantly if it harbors a mutation. This method detects nucleotide variation without sequencing a DNA sample |
| Denaturing/temperature gradient gel electrophoresis (DGGE, TGGE) | These methods reveal polymorphism due to differential movement of the same genomic double‐stranded region with different base‐pair composition. As an example, an AT‐rich region would have a lower melting temperature than a GC‐rich region |
| Target region amplification polymorphism (TRAP) | This method employs primers designed from the EST database for detecting polymorphism around a selected candidate gene. It uses two primers of 18 bp, one of which is designed from the targeted EST and the other is an arbitrary primer |
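The CAPS principle described in Table 1.1 can be mimicked in silico. The sketch below is a simplified illustration: the two allele sequences are invented for this example, and `digest` is a hypothetical helper that handles one strand and one enzyme. It uses the EcoRI recognition site GAATTC, which is cut after the first G.

```python
def digest(seq, site, cut_offset):
    """Cut seq at every occurrence of site; cut_offset is the cut position within the site."""
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


# Two hypothetical amplicons of equal length that differ by one SNP inside the site.
allele_a = "ATGCCGTA" + "GAATTC" + "GGCTAAGT"   # site intact
allele_b = "ATGCCGTA" + "GAGTTC" + "GGCTAAGT"   # site abolished by the SNP

for name, amplicon in (("allele_a", allele_a), ("allele_b", allele_b)):
    print(name, [len(fragment) for fragment in digest(amplicon, "GAATTC", 1)])
```

The undigested amplicons are the same size and would look monomorphic on a gel, whereas after digestion one allele yields two fragments and the other remains intact.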
Sequence‐characterized amplified region (SCAR) markers overcome the limitations of RAPDs. In this case, the RAPD fragments that are linked to a gene of interest are cloned and sequenced. Based on the terminal sequences, longer primers (20 mer) are designed. These SCAR primers amplify a particular locus more specifically. SCAR markers are similar to STS markers in design and application. The presence or absence of the band indicates variation in sequence. The SCAR markers are thus dominant in nature. They can, however, be converted to codominant markers in certain cases by digesting the amplified fragment with restriction enzymes that recognize tetranucleotide sites. There are several examples where RAPD markers linked to a gene of importance have been converted to SCAR markers (Joshi et al. 1999; Liu et al. 1999; Kasai et al. 2000; Akkurt et al. 2007; Chao et al. 2018).
The amplified fragment length polymorphism (AFLP) technique was developed by Vos et al. (1995) and is patented by Keygene (www.keygene.com). In this technique, DNA is cut into fragments by a combination of frequent (four‐base) and rare (six‐base) cutter restriction enzymes that generate restriction overhangs on both sides of the fragments. This is followed by the annealing of short double‐stranded oligonucleotide adapters to the respective restriction overhangs. The oligonucleotide adapters are designed in such a way that the original restriction sites are not reinstated, and they also provide the PCR amplification sites. The fragments are PCR amplified and visualized on agarose gel. This method produces many restriction fragments, enabling the detection of polymorphism. The number of amplified DNA fragments can be controlled by selecting a different number or composition of bases in the adapters. The stringent reaction conditions used for primer annealing make this technique more reliable. This method is a combination of the RFLP and PCR techniques and is extremely useful in the detection of polymorphism between closely related genotypes. Like RAPD, AFLP is a dominant marker and is not preferred for genetic mapping studies and MAS. AFLP maps have been constructed in several species and integrated into already existing RFLP maps, e.g. tomato (Haanstra et al. 1999), rice (Cho et al. 1997), and wheat (Lotti et al. 2000).
These markers are developed by end sequencing (generally 200–300 bp) of random cDNA clones. The sequences thus obtained are referred to as expressed sequence tags (ESTs). A large number of ESTs have been generated in several crop plants and are available in the EST database at NCBI (https://www.ncbi.nlm.nih.gov/dbEST/). These markers were originally developed to identify gene transcripts and have played an important role in the identification of several genes and the development of markers such as RFLP, SSR, SNPs, CAPS, etc. (Semagn et al. 2006). However, EST‐based SSRs show less polymorphism than genomic DNA‐based SSRs. Since EST markers are derived from expressed sequence regions, they are highly conserved among species and can be used for synteny mapping. Most of these could also be functional genes. A large number of EST markers have been used in rice for developing a high‐density linkage map (Harushima et al. 1998) and for chromosome bin mapping in wheat using deletion stocks (Qi et al. 2003). In addition to these, several other molecular marker variants have been developed. The description of those markers is presented in Table 1.1.
Single‐nucleotide polymorphisms (SNPs) are the most abundant markers and result from single‐base pair variations. They are distributed throughout the whole genome and can tag almost any gene or locus (Brookes 1999). However, the frequency of SNPs varies among species, with 1 SNP per 60–120 bp in maize (Ching et al. 2002) and 1 SNP per 1000 bp in humans (Sachidanandam et al. 2001). SNPs are more prevalent in the noncoding region. In the coding region, SNPs could be synonymous or nonsynonymous. In synonymous SNPs, there is no change in the amino acid and usually no phenotypic difference. However, phenotypic differences could still be produced due to modified mRNA splicing (Richard and Beckman 1995). In nonsynonymous SNPs, the change in amino acid can result in phenotypic differences. SNPs are mostly bi‐allelic and cause polymorphism due to nucleotide base substitution. Two types of nucleotide base substitution result in SNPs. A transition substitution occurs between purines (A, G) or between pyrimidines (C, T). This type of substitution constitutes two‐thirds of all SNPs. A transversion substitution occurs between a purine and a pyrimidine. SNPs can be detected by aligning the corresponding genomic region of two different species. SNPs have only two alleles compared with the typically multiallelic SSLPs; however, this disadvantage can be compensated for by the high density of SNPs.
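The detection of SNPs from an alignment and their classification into transitions and transversions can be sketched as follows. The two aligned sequences are invented for illustration, the function names are hypothetical, and the sketch assumes gap‐free sequences of equal length.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def find_snps(seq_a, seq_b):
    """Return (position, base_a, base_b) for every mismatch between two aligned sequences."""
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def classify_substitution(ref, alt):
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt must differ")
    # Purine <-> purine or pyrimidine <-> pyrimidine is a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


line_1 = "ACGTTAGC"
line_2 = "ACATTCGC"
for position, base_1, base_2 in find_snps(line_1, line_2):
    print(position, base_1, base_2, classify_substitution(base_1, base_2))
```

Real alignments contain gaps and sequencing errors, so production SNP callers apply quality and depth filters that are omitted here.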
Initially, identification of SNP markers was laborious and expensive and involved allele‐specific sequencing (Ganal et al. 2009). This includes sequencing of unigene‐derived amplicons from two or more lines using Sanger’s method. In one experiment, about 350 bp of the RFLP clone A‐519 was end sequenced in soybean and flanking amplification primers were designed (Coryell et al. 1999). The primers were used to screen ten genotypes for allele diversity by PCR, and the amplicons were sequenced and compared to identify SNPs. SNPs were also identified by mining the large number of EST sequences in EST databases, which were generated through improved sequencing technologies (Soleimani et al. 2003). These SNPs were further validated using PCR (Batley et al. 2003). These approaches allowed the identification of mainly gene‐based SNPs, but their frequency is generally low. Additionally, SNPs located in low‐copy noncoding regions and intergenic spaces could not be identified.
Several assays have been developed for genotyping based on identified SNPs, including allele‐specific hybridization, primer extension, oligonucleotide ligation, and invasive cleavage (Sobrino et al. 2005). Besides, DNA chips, allele‐specific PCR, and primer extension were also attractive options since these are suitable for automation and can be used for the development of dense genetic maps. Allele‐specific hybridization was used for the identification of polymorphism in 570 genotypes of soybean (Coryell et al. 1999).
The improvement of Sanger sequencing technology in the 1990s combined with the beginning of EST and genome sequencing projects in model plants led to the spurt in the identification of variation at the single‐base resolution (Wang et al. 1998). From 2005 onward, the emergence of NGS platforms such as Roche 454, Illumina HiSeq2500, ABI 5500xl SOLiD, Ion Torrent, PacBio RS, Oxford Nanopore, and advances in bioinformatics tools simplified the process of identification of genome‐wide SNPs and changed the face of molecular marker technology. NGS‐based genotyping platforms such as genotyping‐by‐sequencing (GBS), whole‐genome resequencing (WGR), and high‐density SNP arrays helped to type thousands of SNPs in a single reaction in hundreds of individuals.
GBS is an NGS‐based reduced representation sequencing technique for the identification of genome‐wide SNPs and genotyping of large populations (Bhatia et al. 2013). GBS is a one‐step approach for the identification and utilization of markers in a single reaction. It is a complexity reduction procedure where a combination of restriction enzymes is used to separate low copy sequences from high copy repetitive regions. In general, GBS involves sequencing, on an NGS platform, the fragments generated through restriction digestion of the genome. In this process, the DNA of each individual in the population is digested with an RE, followed by ligation of RE‐specific adaptors containing genotype‐specific barcode sequences and sites for binding PCR and sequencing primers (Figure 1.1). The fragments thus generated are PCR amplified, and equal volumes of PCR product from different individuals are pooled in a tube. The fragments in the pool can be selected based on their size and sequenced on the NGS platform. The choice of restriction enzymes depends upon the complexity and size of the genome. Presently, different versions of GBS are available, which include RAD‐seq (restriction site‐associated DNA sequencing), ddRAD‐seq (double‐digest restriction site‐associated DNA sequencing), SLAF‐seq (specific‐locus amplified fragment sequencing), Rest‐seq (restriction DNA sequencing), and Skim GBS (skim‐based GBS) (Bhatia 2020). These versions differ with respect to fragment size selection, the extent of complexity reduction, and genome coverage. Since GBS is a population‐dependent genotyping method, low‐depth sequencing is adopted to make it cost‐effective, which causes a high rate of missing data. The low‐depth sequencing also makes it an ineffective genotyping approach in heterozygous populations. In addition, GBS has low genome coverage due to reduced representation sequencing.
Figure 1.1 An example of GBS and GBS data analysis workflow for identification of SNP markers.
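Because barcoded fragments from many individuals are pooled before sequencing, the first analysis step is to assign each read back to its sample. The following minimal sketch shows the idea; the barcodes, sample names, and reads are invented for illustration, and `demultiplex` is a hypothetical helper rather than a function of any GBS pipeline. It assumes error‐free barcodes at the start of each read and that no barcode is a prefix of another.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Assign each read to a sample by its leading barcode and trim the barcode off."""
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched: keep the read aside instead of guessing.
            by_sample["unassigned"].append(read)
    return dict(by_sample)


barcodes = {"ACGT": "line_1", "TGCA": "line_2"}   # hypothetical barcode-to-sample map
reads = ["ACGTCAGCTTAG", "TGCACAGCATAG", "ACGTCAGCTTGG", "GGGGCAGCTTAG"]
print(demultiplex(reads, barcodes))
```

Published pipelines additionally tolerate sequencing errors in the barcode and check for the remnant of the restriction site that follows it.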
GBS is being widely used to capture SNPs and other marker variations by NGS. GBS has overtaken conventional genotyping procedures involving traditional markers such as RAPD, AFLP, SSR, and many others in terms of the time, labor, and cost involved. As an example, GBS can generate data for thousands of markers in a large population in a week, which can be analyzed in a month (Bhatia et al. 2018). The approach has been utilized in the mapping of several economically important traits in a number of crop plants (Poland and Rife 2012). Most of the developing countries have in‐house computational facilities that are being used for GBS analysis. A few online servers are also available where GBS analysis can be done using in‐built pipelines, such as CyVerse (www.cyverse.org); however, these are unable to analyze large datasets. Further, the speed of analysis depends upon the internet speed. Alignment of NGS‐based reads and calling of SNPs and Indels are the two major steps in GBS analysis, for which several pipelines are publicly available, such as Stacks, IGST, GB‐eaSY, TASSEL‐GBS, FAST‐GBS, UNEAK, etc. (Wickland et al. 2017).
Another important pipeline widely used for NGS data analysis is the dDocent pipeline (www.dDocent.com), a simple bash wrapper to quality‐filter, assemble, map, and call SNPs from almost any kind of RAD sequencing data (Puritz et al. 2014). However, most of these pipelines are hard to use for a student with little bioinformatics background. They also vary with respect to the genome complexity they can handle and the computational space they require. Besides, there are several bioinformatics tools, such as BWA, Bowtie2, SAMtools, GATK, and BCFtools, along with a set of Perl utility scripts (Kagale et al. 2016), that can be used for GBS data analysis. However, knowledge of the installation and usage of these tools is needed to utilize them properly in data analysis. With the advancements in NGS approaches, GBS has become a widely used approach in plant breeding and genetics, particularly for understanding complex quantitative traits.
DArT‐seq GBS (https://www.diversityarrays.com/technology‐and‐resources/dartseq/) overcomes the limitation of missing data points to some extent. The technique is an extension of traditional DArT technology in which DArT representations are sequenced on an NGS platform. Fragment sequencing enables a dramatic increase in the number of genomic fragments analyzed and in the number of reported markers, thus making it a more cost‐effective technology than the initial DArT method.
WGR with high coverage and depth overcomes the limitations of GBS due to missing data points and heterozygous calls. In general, WGR involves the sequencing of enough DNA fragments (>5× to 20× depth) to cover the whole genome of an organism. Due to the sequencing cost, the technique is suitable for crop plants with smaller genome sizes, such as rice. In such cases, GBS can be replaced by resequencing of a larger population at 5–6× depth. However, WGR for a few samples can be done at a much higher read depth of 10–20×, as in the case of the BSA‐seq approach (Nguyen et al. 2019). One of the important BSA‐seq‐based approaches is quantitative trait loci (QTL)‐seq, developed by Takagi et al. (2015) in rice. This technique has since been widely used in several crop plants. Takagi et al. (2015) developed a pipeline for analysis of the whole genome sequence of bulks and identification of causative variants. WGR has been used in several studies for identification of genome‐wide SNPs, genotyping of mapping populations for construction of high‐density linkage maps and QTL mapping, linkage and genome‐wide association studies (GWASs), reference genome improvement, and genomic selection (Poland and Rife 2012; Bhatia et al. 2013; Chung et al. 2017; Nguyen et al. 2019).
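The depth figures quoted above follow from simple arithmetic: mean depth is the total number of sequenced bases divided by the genome size. The sketch below works through one example. The genome size of 400 Mb is only a rough, assumed value for rice, the read length of 150 bp is an assumed sequencing setting, and the function names are hypothetical.

```python
import math


def mean_depth(n_reads, read_length, genome_size):
    """Mean sequencing depth = total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


def reads_needed(target_depth, read_length, genome_size):
    """Smallest number of reads that reaches the target mean depth."""
    return math.ceil(target_depth * genome_size / read_length)


GENOME_SIZE = 400_000_000   # assumed approximate rice genome size in bp
READ_LENGTH = 150           # assumed read length in bp

print(mean_depth(20_000_000, READ_LENGTH, GENOME_SIZE))   # depth from 20 million reads
print(reads_needed(10, READ_LENGTH, GENOME_SIZE))         # reads for 10x depth
```

Mean depth is an average; actual coverage is uneven along the genome, which is one reason why higher depth is preferred when heterozygous calls matter.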
Along with GBS, high‐density DNA array‐based SNP chips or SNP arrays have become a widely used SNP detection platform for highly multiplexed genotyping. SNP arrays work by hybridization of DNA fragments with allele‐specific oligonucleotide probes (SNP probes) and fluorescence‐based detection of signals. In general, SNP arrays can be roughly categorized into two types based on the SNP detection method: (i) nonenzymatic differential hybridization, including allele‐specific hybridization, and (ii) enzymatic reactions, including primer extension and mini‐sequencing (Ding and Jin 2009). For making SNP arrays, the first step is the identification of genome‐wide SNPs by sequencing (preferably WGR) of a large diverse panel. SNP arrays may include SNPs from coding (genic) regions only and/or genome‐wide SNPs from noncoding regions. The SNPs are validated in silico with several custom tools, and a final filtered set of SNPs is identified. The oligonucleotide probes containing the SNP alleles are designed and bound to a solid glass plate surface. SNP chips can be custom designed commercially on two widely used platforms: Affymetrix (www.affymetrix.com), as Axiom SNP chips (Affymetrix/Thermo Fisher Axiom®), or Illumina (https://www.illumina.com/science/technology/microarray.html), as the Illumina Infinium assay (Illumina Infinium®). The Affymetrix SNP array relies on differential hybridization due to the different melting temperatures of matched and mismatched SNP probes binding to the target DNA sequence. On the other hand, the Illumina Infinium assay uses Illumina BeadArray technology, which relies on primer extension to distinguish the two SNP alleles. The Affymetrix SNP array uses 25‐mers for SNP calling, while the Illumina BeadArray uses 50‐mers for target capture. In rice, a high‐resolution 44K Affymetrix array, a 50K Infinium array, and a 700K high‐density array are available for SNP genotyping (McCouch et al. 2010; Tung et al. 2010; Chen et al. 2013; McCouch et al. 2015).
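The idea of allele‐specific oligonucleotide probes can be sketched by extracting, for each SNP allele, a 25‐mer from the reference sequence with the variable base placed in the middle. This is an illustrative assumption only: the function `allele_probes` is hypothetical, the reference sequence is invented, and commercial array design involves additional criteria (probe placement, melting temperature, uniqueness) that are not modeled here.

```python
def allele_probes(reference, snp_pos, alleles, probe_len=25):
    """Build one probe per allele, with the SNP base at the center of the probe."""
    flank = probe_len // 2
    if snp_pos - flank < 0 or snp_pos + flank >= len(reference):
        raise ValueError("SNP is too close to the end of the reference sequence")
    left = reference[snp_pos - flank:snp_pos]
    right = reference[snp_pos + 1:snp_pos + flank + 1]
    # The probes are identical except for the allele-specific central base.
    return {allele: left + allele + right for allele in alleles}


reference = "ACGT" * 10          # invented 40 bp reference sequence
probes = allele_probes(reference, snp_pos=20, alleles=("A", "G"))
for allele, probe in probes.items():
    print(allele, probe, len(probe))
```

A perfectly matched probe hybridizes more stably than a probe with a central mismatch, which is the signal difference that differential hybridization arrays exploit.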
Additionally, high‐density SNP arrays have been developed for other crop plants such as maize (Ganal et al. 2011) and sunflower (Bachlava et al. 2012), as well as for domestic animal species, including cattle (Gibbs et al. 2009; Matukumalli et al. 2009) and pig (Ramos et al. 2009). One major advantage of SNP arrays is the reproducibility of data points, where GBS does have some shortcomings. However, their disadvantages are lower polymorphism compared to GBS and WGR and the detection of only those alleles present on the array (Table 1.2).
Table 1.2 Comparison between different marker techniques commonly used in plant research.

| Feature | SSR | GBS | WGR | SNP array | KASP™ |
| --- | --- | --- | --- | --- | --- |
| DNA quality | Moderate | High | High | High | High |
| PCR‐based | Yes | Yes | No | No | No |
| Allele detection | High | High | High | Low | Low |
| Polymorphism | High | High | High | Low | Low |
| Ease of use | Easy | Not easy | Not easy | Easy | Easy |
| Reproducibility | High | Low | High | High | High |
| Cost | Moderate | Low to moderate | High | High | Moderate |
| Cost for analysis | High | High | High | Low | Low |
| Suitability for different approaches | | | | | |
| Genetic diversity analysis | High | Moderate | High (cost concerns) | High | High |
| Bi‐parental QTL mapping | High | High | High | High | High |
| Genome‐wide association analysis | Moderate | High | High | High | Low |
| Genomic selection | Low | Moderate | High (cost concerns) | High | Low |
KASP™ is a trademark technology of KBiosciences (http://www.kbioscience.co.uk/) or LGC genomics (http://www.lgcgenomics.com).
