Molecular Analysis and Genome Discovery, Second Edition is a completely revised and updated new edition of this successful book. The text provides a comprehensive overview of recent developments in the fast-moving field of molecular-based diagnostics of disease markers. Key concepts and applications are provided alongside practical information on techniques currently being researched and developed. Each chapter offers an up-to-date analysis of its subject, encompassing the very latest technology platforms, making the book an essential reference for researchers in the field looking for a current overview. The book will also be an indispensable resource for those working in the biotechnology and pharmaceutical industries. New for this edition: chapters on Genotyping through Mutation Detection; Differential Gene Expression; Haplotyping and Molecular Profiling.
Page count: 593
Year of publication: 2011
Table of Contents
Title Page
Copyright
Preface
Contributors
Chapter 1: Overview of Genotyping
Introduction
Methods for interrogating SNPs
Commercial platforms for SNP genotyping
Practical recommendations
SNP databases
Methylation analysis
Copy number variation analysis
Second generation sequencing technologies
Conclusions
References
Chapter 2: DNA Chip Analysis in Genome Discovery
Introduction
Interrogating a genome
Cross-species hybridization
Comparative genomic hybridization and microarray-based genotyping
Barcodes, DNA microarrays and organism identification
Concluding remarks
References
Chapter 3: qPCR, Theory, Reliability and Use in Molecular Analysis
Sample preparation
RNA quality
Reagents
Assay design
Transparency of published data
Further considerations
Conclusion
References
Chapter 4: DNA Analysis in Droplet-Based Microfluidic Devices
Introduction
Continuous-flow microPCR chips
PCR inhibition and carryover contamination
PCR in droplets
Conclusions
References
Chapter 5: High-Resolution Melt Profiling
Introduction
Basic concepts of melt profiling
HRMP and polymerase chain reaction
DNA specimens and HRMP
Determining a temperature window for melting
Dyes and platforms for melt profiling
Scanning PCR products for sequence variation
Genotyping with high-resolution melt profiling
Other applications of HRMP
Final notes
References
Chapter 6: Massively Parallel Sequencing
Sanger sequencing
Massively parallel sequencing
Commercially available massively parallel sequencers
Future technologies
Paired-end or mate-paired reads
Target-enrichment strategies for MPS
Applications of MPS
Summary
References
Chapter 7: Aptamers for Analysis: Nucleic Acids Ligands in the Post-Genomic Era
Introduction
SELEX
Aptamers in analysis
Imaging with aptamers
Conclusions, outlooks and perspectives
References
Chapter 8: Use of Nanotechnology for Enhancing of Cancer Biomarker Discovery and Analysis: A Molecular Approach
Introduction
Proteomics and nanotechnology
Nanoscale multicomponent separation
Nanoscale protein detection strategies
Surface-enhanced Raman scattering (SERS)
References
Chapter 9: Chip-Based Proteomics
Introduction
Lab-on-a-chip
Arrays
Chip-based mass spectrometry
Surface plasmon resonance (SPR) and quartz crystal microbalance (QCM) chip instruments
Microfluidics
Conclusion
References
Chapter 10: Antibody Microarrays in Proteome Profiling
Introduction
Technical aspects
Antibody array applications
Summary
Acknowledgements
References
Chapter 11: Biomarker Detection and Molecular Profiling by Multiplex Microbead Suspension Array Based Immunoproteomics
Introduction
Principles of microbead-based multiplexing
Experimental aspects of the multiplex microbead assay
Multiplex microbead assay design and comparison with other methods
Applications of the multiplex microbead assay system for biomedical research and clinical studies
Selected investigational fields for multiplex analysis and examples of applications
Challenges and current limitations
Summary and future directions
Acknowledgements
References
Chapter 12: Mass Spectrometry in Metabolomics
Introduction
Sample collection and preparation
Data acquisition
Data analysis
Applications
Conclusion
Acknowledgement
References
Index
This edition first published 2012 © 2012 by John Wiley & Sons
Wiley-Blackwell is an imprint of John Wiley & Sons, formed by the merger of Wiley's global Scientific, Technical and Medical business with Blackwell Publishing.
Registered office:
John Wiley & Sons, Ltd., The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
Editorial offices:
9600 Garsington Road, Oxford, OX4 2DQ, UK
The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
111 River Street, Hoboken, NJ 07030-5774, USA
For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com/wiley-blackwell.
The right of the author to be identified as the author of this work has been asserted in accordance with the UK Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.
Library of Congress Cataloguing-in-Publication Data
Molecular analysis and genome discovery / edited by Ralph Rapley and Stuart Harbron. 2nd ed.
p. ; cm.
Includes bibliographical references and index.
ISBN 978-0-470-75877-9 (cloth)
1. Molecular diagnosis. 2. Genomics. 3. Proteomics. 4. Pharmacogenomics. 5. Polymerase chain reaction. 6. DNA microarrays. I. Rapley, Ralph. II. Harbron, Stuart.
[DNLM: 1. Genomics methods. 2. Drug Design. 3. Genetic Techniques. 4. Metabolomics methods. 5. Proteomics methods. QU 58.5]
RB43.7.M595 2011
615'.7 dc23
A catalogue record for this book is available from the British Library.
This book is published in the following electronic formats: ePDF 9781119977445; Wiley Online Library 9781119977438; ePub 9781119978442; Mobi 9781119978459
Preface
In our preface to the first edition of Molecular Analysis and Genome Discovery we indicated that the face of diagnostics and drug discovery had changed beyond recognition over the past decade. With the publication of this second edition this statement is even more apposite. There have been numerous advances in the technology and in the discovery of biological systems yielding new areas of analysis such as transcriptomics and metabolomics. There can be no doubt that these continued advances will lead to the ultimate goal of the development and use of personalized and stratified medicines.
This book aims to build upon the discovery and analysis aspects of the first edition by detailing the way in which techniques have been further developed or new methods implemented in the areas of molecular analysis and genome discovery. Following an updated overview of the important areas of genotyping, there are a number of chapters dealing with the methods of DNA analysis. These include the further use of DNA chips and qPCR, two mainstays of the area. Further analysis methods are presented including the use of microfluidic devices, high-resolution melt profiling and the ability to analyse DNA on a large scale with parallel sequencing systems. Analysis of nucleic acids using aptamers has also been revisited and updated, providing further exciting analytical approaches for the post-human genome era. A chapter on nanotechnology in cancer biomarker discovery essentially bridges the nucleic acid analysis and discovery aspects and leads into chapters that are more orientated to proteins. Indeed, the emergence of nanotechnology has been spectacular, typifying our opening statement. The advancement of quantum dots, carbon nanotubes and nanoengineering presented in this chapter is a facet which thirty years ago would have been in the realms of science fiction. Chip analysis follows on from the perspective of protein analysis and discovery, after which antibody arrays in proteome profiling and multiplex microbead suspension array based immunoproteomics are addressed. The application of mass spectrometry to metabolomics is detailed in the final chapter.
In compiling this second edition of Molecular Analysis and Genome Discovery we have sought again to combine both current and emerging approaches to the analysis of genomes and proteomes. This has been undertaken with an eye on how they may be of benefit for areas such as drug and biomarker discovery. We are again indebted to the panel of expert and distinguished authors who have provided vital insights into these important and exciting areas.
Ralph Rapley
Stuart Harbron
Contributors
Ahmed, Farid E.
GEM Tox Consultants & Labs, Inc., Greenville, NC 27834, USA
Alhamdani, Mohamed Saiel Saeed
Division of Functional Genome Analysis, Deutsches Krebsforschungszentrum (DKFZ), Im Neuenheimer Feld 580, 69120 Heidelberg, Germany
Bailes, Julian
School of Biological Sciences, Royal Holloway, University of London, Egham, Surrey, TW20 0EX, UK
Bayés, Mònica
Centro Nacional de Análisis Genómico, C/Baldiri Reixac 4, 08028 Barcelona, Spain
Bustin, Stephen A.
Academic Surgical Unit, 3rd Floor Alexandra Wing, Royal London Hospital, Whitechapel, London E1 1BB, UK
Dobrowolski, Steven F.
Department of Pathology, University of Utah, School of Medicine, Salt Lake City, Utah, USA
Friedman, Jan M.
Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, V6H 3N1 Canada and Child & Family Research Institute, Vancouver, British Columbia, V5Z 4H4 Canada
Griffiths, William J.
Institute of Mass Spectrometry, School of Medicine, Room 352 Grove Building, Swansea University, Singleton Park, Swansea SA2 8PP, Wales, UK
Gut, Ivo Glynne
Centro Nacional de Análisis Genómico, C/Baldiri Reixac 4, 08028 Barcelona, Spain
Hoheisel, Jörg D.
Division of Functional Genome Analysis, Deutsches Krebsforschungszentrum (DKFZ), Im Neuenheimer Feld 580, 69120 Heidelberg, Germany
Khan, Imran H.
Center for Comparative Medicine, and Department of Pathology and Laboratory Medicine, University of California, Davis CA 95616, USA
Krishhan, V. V.
Department of Chemistry, California State University, Fresno CA 93740 and Center for Comparative Medicine, and Department of Pathology and Laboratory Medicine, University of California, Davis, CA 95616, USA
Luciw, Paul A.
Center for Comparative Medicine, Department of Pathology and Laboratory Medicine, and California National Primate Research Center, University of California, Davis, CA 95616, USA
Marra, Marco
Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, V6H 3N1 Canada and BC Cancer Agency Genome Sciences Centre, Vancouver, British Columbia, V5Z 4S6 Canada
Milnthorpe, Andrew
School of Biological Sciences, Royal Holloway, University of London, Egham, Surrey, TW20 0EX, UK
Murphy, Jamie
Academic Surgical Unit, 3rd Floor Alexandra Wing, Royal London Hospital, Whitechapel, London E1 1BB, UK
Nadal, Pedro
Department d'Enginyeria Quimica, Universitat Rovira i Virgili, Avinguda Països Catalans 26, 43007 Tarragona, Spain
Nazar, Ross N.
Department of Molecular and Cellular Biology, University of Guelph, Guelph, Ontario, Canada N1G 2W1
O'Sullivan, Ciara K.
Department d'Enginyeria Quimica, Universitat Rovira i Virgili, Avinguda Països Catalans 26, 43007 Tarragona, Spain and Institució Catalana de Recerca i Estudis Avançats, Passeig Lluís Companys, 23, 08010 Barcelona, Spain
Ozdemir, Pinar
Department of Mechanical Engineering, University of Strathclyde, Glasgow, G1 1XJ, UK
Pinto, Alessandro
Department d'Enginyeria Quimica, Universitat Rovira i Virgili, Avinguda Països Catalans 26, 43007 Tarragona, Spain
Robb, Jane
Department of Molecular and Cellular Biology, University of Guelph, Guelph, Ontario, Canada N1G 2W1
Svobodova, Marketa
Department d'Enginyeria Quimica, Universitat Rovira i Virgili, Avinguda Països Catalans 26, 43007 Tarragona, Spain
Smieszek, Sandra
School of Biological Sciences, Royal Holloway, University of London, Egham, Surrey, TW20 0EX, UK
Soloviev, Mikhail
School of Biological Sciences, Royal Holloway, University of London, Egham, Surrey, TW20 0EX, UK
Tucker, Tracy
Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, V6H 3N1 Canada
Wang, Yuqin
Institute of Mass Spectrometry, School of Medicine, Room 352 Grove Building, Swansea University, Singleton Park, Swansea SA2 8PP, Wales, UK
Wittwer, Carl T.
Chapter 1
Overview of Genotyping
Mónica Bayés and Ivo Glynne Gut
Introduction
Several types of variants exist in the human genome: single nucleotide polymorphisms (SNPs), short tandem repeats (STRs) also called microsatellites, small insertions or deletions (InDels), copy number variants (CNVs) and other structural variants (SVs) (Figure 1.1). SNPs are changes in a single base at a specific position in the genome, in most cases with only two alleles (Brookes 1999). By definition, the rarer allele of a SNP should have a frequency of more than 1% in the general population; variants below this threshold are referred to as mutations. SNPs are found at a frequency of about one every 100–300 bases in the human genome. Since the completion of the Human Genome Project (HGP) (International Human Genome Sequencing Consortium 2004), SNPs have been discovered at an unprecedented rate and currently there are more than 24 million human reference SNP (rs) entries in the most extensive SNP database (dbSNP Build 132, www.ncbi.nlm.nih.gov/projects/SNP/). SNPs, however, are not randomly distributed across the genome and occur much less frequently in coding sequences than in noncoding regions. SNPs located in regulatory or protein coding regions are more likely to alter the biological function of a gene than those in intergenic regions.
Figure 1.1 Types of genetic variants. Each arrow represents a DNA segment of more than 1 kb
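The 1% convention and the density of one SNP every 100–300 bases can be made concrete with a little arithmetic. The following is a minimal sketch only: the function names are illustrative, and the genome length of roughly 3.1 billion bases is an assumption that is not stated in the text.

```python
# Illustrative sketch; the function names and the genome length are assumptions.
GENOME_LENGTH_BP = 3_100_000_000  # approximate haploid human genome length (assumed)


def classify_variant(minor_allele_frequency: float) -> str:
    """Apply the 1% convention: a rarer allele above 1% is a SNP, otherwise a mutation."""
    if not 0.0 <= minor_allele_frequency <= 0.5:
        raise ValueError("minor allele frequency must lie between 0 and 0.5")
    return "SNP" if minor_allele_frequency > 0.01 else "mutation"


def expected_snp_count(genome_length_bp: int, bases_per_snp: int) -> int:
    """Expected number of SNPs if one occurs every `bases_per_snp` bases."""
    return genome_length_bp // bases_per_snp


low = expected_snp_count(GENOME_LENGTH_BP, 300)
high = expected_snp_count(GENOME_LENGTH_BP, 100)
print(classify_variant(0.12), classify_variant(0.004))
print(f"between {low:,} and {high:,} SNPs expected genome-wide")
```

Under these assumptions the expected total falls in the tens of millions, which is the same order of magnitude as the number of dbSNP entries quoted above.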
Genotyping is the process of assigning different variants in an otherwise conserved DNA region. The relative simplicity of methods for SNP genotyping, the abundance of SNPs in the human genome and their low mutation rates have made them very popular in the past decade. SNP genotyping currently has many applications: disease gene localization and identification of disease-causing variants, quantitative trait loci (QTL) mapping, pharmacogenetics and identity testing based on genetic fingerprinting, to mention just the major ones. Genotyping applications extend beyond human genetics to animals and plants.
Although some SNP alleles confer susceptibility to complex disorders (asthma, cardiovascular disease, diabetes, etc.), most SNPs are not solely responsible for a disease state. Instead, they serve as biological markers for identifying disease-related variants on the human genome map, based on the fact that alleles of SNPs that are located nearby tend to be inherited together (Jorde 1995). This is termed linkage disequilibrium. For disease gene identification two basic strategies are applied. In the linkage study, related individuals are genotyped with several hundreds to thousands of polymorphisms distributed throughout the genome and attempts are made to identify genetic markers that cosegregate with the disease. Genetic linkage methods have been applied successfully to identify the mutated gene in Mendelian diseases (Risch 1991). If investigating the genetic basis of complex disorders, the association or linkage disequilibrium approach is more powerful (Risch and Merikangas 1996). It involves establishing genotype–phenotype correlations in unrelated individuals that are solely selected on the basis of being affected by a phenotype or not (Clark 2003).
Genetic association studies require a large number of samples to achieve statistically significant results that indicate that a particular allele in a particular region of the genome confers an increased risk of developing the disorder. Many association studies based on the analysis of candidate genes that involve genotyping of tens or hundreds of SNPs in hundreds or thousands of samples have been published. In the past five years, the ability to assay for more than 100 000 SNPs distributed across the genome has enabled the systematic study of complex disorders under a whole genome approach, without any preconceived hypothesis or candidates. Successful genome-wide association studies (GWAS) have been conducted for common diseases such as age-related macular degeneration, rheumatoid arthritis, asthma, Crohn's disease, bipolar disorder, coronary heart disease, type 1 and type 2 diabetes among many others (Klein et al. 2005; Wellcome Trust Case Control Consortium 2007; Moffatt et al. 2007, 2010; Hindorff et al. 2009). A list of all GWAS and associated polymorphisms is kept up to date at www.genome.gov/gwastudies.
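Why large sample numbers are needed can be illustrated with a simple allelic test on one SNP. The sketch below is not the method of any particular study cited above: it assumes a Pearson chi-square test on a 2 × 2 table of allele counts, a nominal significance level of 0.05, and a Bonferroni correction for 100 000 tested SNPs. The allele counts are invented for illustration.

```python
import math


def allelic_chi_square(case_a: int, case_b: int,
                       control_a: int, control_b: int) -> tuple[float, float]:
    """Pearson chi-square test (1 d.f.) on a 2x2 table of allele counts.

    Returns the test statistic and its p-value.
    """
    n = case_a + case_b + control_a + control_b
    cases, controls = case_a + case_b, control_a + control_b
    total_a, total_b = case_a + control_a, case_b + control_b
    if 0 in (cases, controls, total_a, total_b):
        raise ValueError("every row and column total must be non-zero")
    stat = n * (case_a * control_b - case_b * control_a) ** 2 / (
        cases * controls * total_a * total_b)
    # Survival function of the chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: 1000 cases and 1000 controls (2000 alleles each),
# risk allele frequency 0.30 in cases versus 0.25 in controls.
stat, p_value = allelic_chi_square(600, 1400, 500, 1500)

# Assumed Bonferroni threshold for a scan of 100 000 SNPs.
threshold = 0.05 / 100_000
print(f"chi-square = {stat:.2f}, p = {p_value:.2e}, significant: {p_value < threshold}")

# The same allele frequencies in a tenfold larger cohort.
stat_large, p_large = allelic_chi_square(6000, 14000, 5000, 15000)
print(f"chi-square = {stat_large:.2f}, significant: {p_large < threshold}")
```

With these invented numbers the smaller cohort gives a nominally significant result that does not survive correction for 100 000 tests, whereas the larger cohort does. Real studies use more elaborate models and quality filters.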
In the next sections the most popular methods and platforms for SNP genotyping are discussed, highlighting some practical aspects. Other related applications such as methylation, copy number analysis and second generation sequencing using the same underlying molecular approaches are covered thereafter.
Methods for interrogating SNPs
There are many mature SNP genotyping technologies that have been integrated into large-scale genotyping operations. SNP genotyping methods are still being improved, perfected and integrated, and new methods are emerging to satisfy the needs of genomics and epidemiology. No one SNP genotyping method fulfils the requirements of every study that might be undertaken. The choice of a method depends on the scale of the envisioned genotyping project and the resources available. A project might require genotyping of a limited number of SNP markers in a large population or the analysis of a large number of SNP markers in a few samples. Flexibility in choice of SNP markers and DNAs to be genotyped or the possibility to precisely quantify an allele frequency in pooled DNA samples might also be issues.
SNP genotyping methods are very diverse (Syvänen 2001; Kim and Misra 2007). Broadly, each method can be separated into two elements, the biochemical method for discriminating SNP alleles and the actual analysis or measurement of the allele-specific products, which can be an array reader, a plate reader, a mass spectrometer, a gel separator/reader system, or other. In addition, most technologies also require a PCR amplification step to increase the number of target SNP-containing DNA molecules and to reduce the complexity of the template material used for the allele discrimination step.
The most popular methods for allele discrimination are restriction endonuclease digestion, primer extension, hybridization and oligonucleotide ligation (Figure 1.2a).
Figure 1.2 SNP genotyping technologies separated into allele discrimination methods (A) and detection of allele-specific products (B). Arrows denote genotyping assays that combine different allele discrimination and detection methods. 1 Restriction endonuclease digestion; 2 SNPlex; 3 iPLEX GOLD assay; 4 GoldenGate assay; 5 Infinium assay; 6 TaqMan assay; 7 GeneChip assay
Restriction endonuclease digestion
Restriction fragment length polymorphisms (RFLPs) are one of the first typing methods described and by far predate the coining of the term SNP (Botstein et al. 1980). Restriction endonuclease digestion is still a common format for SNP genotyping in a standard laboratory (Parsons and Heflich 1997). PCR products are digested with restriction endonucleases that are specifically chosen for the base change at the position of the SNP, resulting in a restriction cut for one allele but not the other (Figure 1.2a). In some cases, specific restriction sites can be created during the amplification step by using primers with minor changes in the sequence. Digestion patterns are used for allele assignment after gel electrophoresis. Major limitations of the restriction method are that it is only applicable to a fraction of SNPs and that it does not lend itself to automation.
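The logic of allele assignment from digestion patterns can be sketched in a few lines. The example below is hypothetical throughout: the 50-bp amplicon is invented, and EcoRI (recognition site GAATTC, cutting after the first G) is used only as a familiar enzyme whose site is created by one allele and destroyed by the other.

```python
def digest(amplicon: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths after cutting at every occurrence of `site`.

    `cut_offset` is the position of the cut within the recognition site.
    """
    fragments, start = [], 0
    pos = amplicon.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = amplicon.find(site, pos + 1)
    fragments.append(len(amplicon) - start)
    return fragments


# Invented amplicon: the SNP (C or T) is the last base of a potential EcoRI site.
LEFT_FLANK = "ACGT" * 5
RIGHT_FLANK = "TGCA" * 6


def amplicon_for(allele: str) -> str:
    return LEFT_FLANK + "GAATT" + allele + RIGHT_FLANK


def gel_bands(genotype: tuple[str, str]) -> list[int]:
    """Distinct band sizes expected on a gel for a diploid genotype."""
    bands = set()
    for allele in genotype:
        bands.update(digest(amplicon_for(allele), "GAATTC", 1))
    return sorted(bands, reverse=True)


for genotype in (("C", "C"), ("C", "T"), ("T", "T")):
    print(genotype, gel_bands(genotype))
```

The C allele is cut into two fragments, the T allele stays intact, and a heterozygote shows all three bands. The sketch also makes the limitation visible: it only works when the SNP happens to fall within a recognition site.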
Primer extension
Primer extension is a stable and reliable way of distinguishing alleles of a SNP. Nucleotides are added by a DNA polymerase generating allele-specific products (Syvänen 1999). Allele-specific primer extension (ASPE) is based on the ability of DNA polymerases to extend with high efficiency those oligonucleotides with 3′ perfectly matched ends (Figure 1.2a). It requires two allele-specific primers that have the nucleotide that corresponds to the allelic variant at their 3′ ends. In single base primer extension (SBE) an oligonucleotide hybridizes immediately before the SNP nucleotide and the DNA polymerase incorporates a single nucleotide that is complementary to the SNP allele (Figure 1.2a). SBE uses dideoxynucleotides (ddNTP) as terminators.
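The difference between the two primer designs can be sketched as follows. This is a simplified illustration rather than a design tool: the sequence is invented, the primer length of 20 is an arbitrary assumption, and practical considerations such as melting temperature and secondary structure are ignored.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def aspe_primers(upstream: str, alleles: tuple[str, str],
                 length: int = 20) -> dict[str, str]:
    """ASPE: one primer per allele, each carrying the allelic base at its 3' end.

    `upstream` is the sequence 5' of the SNP, read 5'->3' on the primer's own strand.
    """
    if length < 2 or len(upstream) < length - 1:
        raise ValueError("upstream sequence too short for the requested primer length")
    shared = upstream[-(length - 1):]
    return {allele: shared + allele for allele in alleles}


def sbe_primer(upstream: str, length: int = 20) -> str:
    """SBE: a single primer whose 3' end sits immediately before the SNP."""
    if length < 1 or len(upstream) < length:
        raise ValueError("upstream sequence too short for the requested primer length")
    return upstream[-length:]


def sbe_terminator(template_base: str) -> str:
    """ddNTP incorporated opposite the given template base at the SNP position."""
    return "dd" + template_base.translate(COMPLEMENT) + "TP"


# Invented 28-nt sequence upstream of a hypothetical A/G SNP.
upstream = "ATGGCCATTGTAATGGGCCGCTGAAAGG"
print(aspe_primers(upstream, ("A", "G")))
print(sbe_primer(upstream), sbe_terminator("T"), sbe_terminator("C"))
```

In ASPE the allele is encoded in which of two primers is extended; in SBE a single primer is used and the allele is encoded in which terminator is incorporated.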
Hybridization
Alleles differing by one base can be distinguished by hybridizing complementary oligonucleotide sequences to the target DNA (ASO or allele-specific oligonucleotide hybridization), without any enzymatic reaction (Figure 1.2a). As the two alleles of a SNP are very similar in sequence, significant cross-talk can occur. Several approaches have been taken to overcome this problem: the use of multiple probes per SNP, the use of modified oligonucleotides such as peptide nucleic acids (PNAs) (Egholm et al. 1993) or locked nucleic acids (LNAs) (Ørum et al. 1999) that increase stability of DNA–DNA complexes, the real-time monitoring of the hybridization kinetics or the combination of hybridization and 5′ nuclease activity of polymerases.
Oligonucleotide ligation (OLA)
OLA relies on the specificity of DNA ligases to repair DNA nicks. For OLA, two oligonucleotides adjacent to each other are ligated enzymatically by a DNA ligase when the bases next to the ligation position are fully complementary to the template strand (Barany 1991; Jarvius et al. 2003) (Figure 1.2a). The assay requires three probes to be designed: two allele-specific probes that have at their 3′ ends the nucleotide complementary to the SNP variants and one common probe that anneals to the target DNA immediately adjacent to them. Padlock is a variant of OLA that employs two allele-specific oligonucleotides with target complementary sequences separated by a linker. When perfectly annealed to the target sequence, padlock probes are circularized by ligation (Nilsson et al. 1994).
Major detection methods include gel electrophoresis, mass spectrometry, fluorescence analysis, and chemiluminescence detection (Figure 1.2b). Nearly all of the above-described methods for allele-distinction have been combined with all of these analysis formats.
Gel electrophoresis
Allele-specific DNA fragments of different sizes can be separated by electrophoretic migration through gels (Szántai and Guttman 2006). Throughput and resolution can be increased if ‘gel-filled’ capillaries are used. Advantages of capillary systems over slab gel systems include the potential for 24-hour unsupervised operation, the elimination of cumbersome gel pouring and loading, and that no lane tracking is required. Instrumentation with 96- or 384-capillaries is commercially available.
Mass spectrometry
MALDI-TOF MS (matrix-assisted laser desorption/ionization time-of-flight mass spectrometry) can be used to measure the mass of the allele-specific products. It has been demonstrated as an analysis tool for SNP genotyping (Haff and Smirnov 1997; Tost and Gut 2005). The allele-specific products are deposited onto a matrix on the surface of a chip, ionized with a short laser pulse and accelerated towards the detector (Jurinke et al. 2002). The time-of-flight of a product to the detector is directly related to its mass. High resolution and speed are major advantages of the MALDI-TOF MS detection method. Resolution of the current generation of mass spectrometers allows the distinction of base substitutions in the range of 1000–6000 Da (this corresponds to product sizes of 3–20 bases; the smallest mass difference for a base change, thymine to adenine, is 9 Da).
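The 9 Da figure can be checked from the masses of the four nucleotides. The sketch below uses commonly quoted average masses of nucleotide residues within a DNA chain; these values are an assumption brought in for illustration and do not come from the text. It also includes the idealized relation for a linear time-of-flight instrument, in which flight time scales with the square root of the mass-to-charge ratio.

```python
import math
from itertools import combinations

# Commonly quoted average residue masses in Da (assumed values, not from the text).
RESIDUE_MASS_DA = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def mass_differences() -> dict[tuple[str, str], float]:
    """Absolute mass difference for every pair of bases, rounded to 0.01 Da."""
    return {
        (x, y): round(abs(RESIDUE_MASS_DA[x] - RESIDUE_MASS_DA[y]), 2)
        for x, y in combinations(sorted(RESIDUE_MASS_DA), 2)
    }


def flight_time_ratio(mass_1: float, mass_2: float) -> float:
    """Ratio of flight times for two ions of equal charge (idealized linear TOF)."""
    return math.sqrt(mass_1 / mass_2)


diffs = mass_differences()
closest_pair = min(diffs, key=diffs.get)
for pair, delta in sorted(diffs.items(), key=lambda item: item[1]):
    print(pair, delta)
print("hardest substitution to resolve:", closest_pair)
```

Because flight time grows only with the square root of mass, a fixed difference of 9 Da becomes a smaller relative shift as the product gets heavier, which is consistent with the upper size limit quoted above.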
Fluorescent analysis
Allele-specific products can be labelled with different fluorescent dyes and detected using fluorescent readout systems, either microtitre plate or array based (Landegren et al. 1998). Most readers use a white light source and optical filters to select specific excitation and emission wavelengths. Some of them can also measure parameters such as fluorescence polarization (FP, measures the increase in polarization of fluorescence caused by the decreased mobility of larger molecules) (Kwok 2002) and Förster resonance energy transfer (FRET, measures the changes in fluorescence due to the separation of two dyes of a donor/acceptor system) (Tong et al. 2001). The most popular fluorescent dyes used in SNP genotyping are Cy3 and Cy5.
Most current genotyping methods are based on the combination of one of the allele discrimination methods and one of the detection methods described above (Figure 1.2). Very different methods often share elements: for example, a fluorescent tag read out in a plate reader, or the primer extension reaction, whose products can be analysed in many different formats.
Commercial platforms for SNP genotyping
A plethora of SNP genotyping platforms is currently commercially available (Ragoussis 2009). Many of them require purchasing expensive proprietary equipment and expensive laboratory set-up. However, they offer streamlined laboratory and analysis workflows. They range from individual SNP genotyping platforms (Life Technologies TaqMan) to focused content genotyping (Sequenom iPLEX Gold, Illumina GoldenGate) and to platforms for whole genome genotyping (WGG) (Illumina Infinium and Affymetrix GeneChips) (Fan et al. 2006). WGG arrays contain from 100 000 to 2.5 million SNPs selected by different approaches and with minor allele frequencies >0.05 in the general population.
TaqMan assay
The TaqMan assay (Life Technologies, www.appliedbiosystems.com) is based on allele-specific hybridization coupled with the 5′ nuclease activity of Taq polymerase during PCR (Holland et al. 1991; Livak 2003; Livak et al. 1995). The detection is performed by measuring the decrease of FRET from a donor fluorophore to an acceptor-quencher molecule. TaqMan probes are allele-specific probes labelled with a fluorescent reporter at the 5′ end and a common quencher attached to the 3′ end that virtually eliminates the fluorescence in the intact probe. Each assay uses two TaqMan probes that differ at the SNP site, and one pair of PCR primers. During PCR, successful hybridization of the TaqMan probe due to matching with one allele of the SNP results in its degradation by the 5′- to 3′-nuclease activity of the employed DNA polymerase whereby the fluorescent dye and quencher are separated, which promotes fluorescence. TaqMan probes can be designed to detect multiple nucleotide polymorphisms (MNPs) and insertion/deletions (InDels). Because of the simplicity in chemistry, the reaction set-up can be easily automated using liquid handling robots. The 7900HT Fast Real-Time PCR system (Life Technologies) allows up to eighty-four 384-well plates to be processed without manual intervention in less than 4 days. It is a very contamination-safe procedure as plates do not need to be opened after PCR for reading. In contrast, the limiting factors of the technology are the low SNP multiplexing level and the relatively high cost of the dual-labelled probes. Life Technologies has developed a library with 4.5 million genome-wide human TaqMan assays (of which 160 000 are validated assays) for which reagents are commercially available.
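Since each allele-specific probe releases its own reporter dye, the genotype of a sample can be read from the relative strength of the two signals. The sketch below is a deliberately simplified illustration of that idea and is not the algorithm of any vendor software, which typically clusters all samples on a plate together. The signal threshold and the angular window for heterozygotes are arbitrary assumptions.

```python
import math


def call_genotype(signal_allele_1: float, signal_allele_2: float,
                  min_signal: float = 0.5,
                  het_window: tuple[float, float] = (30.0, 60.0)) -> str:
    """Call a genotype from two reporter signals using the angle in the signal plane.

    Thresholds are hypothetical and would need tuning for real data.
    """
    if math.hypot(signal_allele_1, signal_allele_2) < min_signal:
        return "no call"  # too little fluorescence, e.g. failed amplification
    angle = math.degrees(math.atan2(signal_allele_2, signal_allele_1))
    if angle < het_window[0]:
        return "1/1"  # mostly reporter 1: homozygous for allele 1
    if angle > het_window[1]:
        return "2/2"  # mostly reporter 2: homozygous for allele 2
    return "1/2"      # both reporters released: heterozygous


# Invented end-point signals for four wells.
for well, signals in {"A1": (2.0, 0.1), "A2": (1.5, 1.4),
                      "A3": (0.1, 2.0), "A4": (0.1, 0.1)}.items():
    print(well, call_genotype(*signals))
```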
In recent years, a couple of high-throughput real-time PCR instruments have been introduced. The Biomark system (Fluidigm, www.fluidigm.com) contains integrated fluidic circuits or ‘dynamic arrays’ that allow setting up 9216 genotyping reactions in a single experiment (Wang et al. 2009). The user simply dispenses 96 DNA samples and 96 TaqMan genotyping assays, and the dynamic array then assembles them in all possible combinations. The OpenArray system (www.appliedbiosystems.com) (Morrison et al. 2006) can also perform SNP analysis using TaqMan probes. The OpenArray plate contains 3072 reaction through-holes generated by a differential coating process that deposits hydrophilic coatings on the interior of each through-hole and hydrophobic coatings on the exterior. This enables OpenArray plates to hold solutions in the open through-holes via capillary action. The company provides the researcher with OpenArray plates that are preloaded with the selected TaqMan probes (from 16 to 256 different assays per plate depending on the plate format). The main advantages of the Biomark and OpenArray systems compared to conventional thermocyclers are higher throughput, small sample requirement, low reagent consumption and less liquid handling.
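The reaction counts quoted for these instruments follow from simple multiplication, as the short sketch below shows. The function names are illustrative. The OpenArray calculation is only an upper bound, assuming that every through-hole holds exactly one sample and assay combination; actual plate formats may accommodate fewer samples.

```python
def reactions(units_a: int, units_b: int) -> int:
    """Number of reactions when every unit of one kind is paired with every unit of the other."""
    return units_a * units_b


def max_samples_per_plate(through_holes: int, assays_per_plate: int) -> int:
    """Upper bound on samples per plate if each hole holds one sample-assay pair."""
    return through_holes // assays_per_plate


# Dynamic array: 96 samples combined with 96 assays.
print("dynamic array reactions:", reactions(96, 96))

# 7900HT: eighty-four 384-well plates processed in one unattended run.
print("wells per unattended run:", reactions(84, 384))

# OpenArray: 3072 through-holes shared between 16 or 256 preloaded assays.
for assays in (16, 256):
    print(assays, "assays -> at most", max_samples_per_plate(3072, assays), "samples")
```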
iPLEX GOLD assay
The iPLEX Gold reaction (Sequenom, www.sequenom.com) is a method for detecting insertions, deletions, substitutions, and other polymorphisms that combines multiplex PCR followed by a single-base extension and MALDI-TOF MS detection (Jurinke et al. 2002; Oeth et al. 2009). After the PCR, remaining nucleotides are deactivated using shrimp alkaline phosphatase (SAP). The SAP cleaves a phosphate from the unincorporated dNTPs, converting them to dNDPs, which renders them unavailable to future polymerization reactions. Next, a single base primer extension step is performed incorporating one of the four terminator nucleotides into the SNP site. The extension products are desalted and transferred onto chips containing 384 matrix spots. The allele-specific extension products of different masses are analysed using MALDI-TOF MS. In theory up to 40 different SNPs can be assayed together if the different allele-products have distinct masses; however, generally multiplexes on the order of 24 are more realistic. The whole lab workflow is highly automated and it takes less than 10 hours to process one 384-well plate. The MassARRAY Analyzer 4 (Sequenom) can analyse from dozens to over 100 000 genotypes per day, and from tens to thousands of samples. Significant advantages of the method are that it requires only standard unmodified oligonucleotides, which are cheap and easy to come by, that it is very sensitive with low input sample requirements, and that it generates highly accurate data because it relies on the direct detection of the allele-specific product.
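The requirement that all allele products in a multiplex have distinct masses can be expressed as a simple check on the planned product masses. The sketch below is an assumption-laden illustration, not the vendor's assay design procedure: the product masses are invented and the minimum gap of 9 Da is taken as a hypothetical resolution limit.

```python
def min_mass_gap(product_masses_da: list[float]) -> float:
    """Smallest difference between any two product masses in a multiplex."""
    ordered = sorted(product_masses_da)
    return min(later - earlier for earlier, later in zip(ordered, ordered[1:]))


def multiplex_is_resolvable(product_masses_da: list[float],
                            min_gap_da: float = 9.0) -> bool:
    """True if every pair of products differs by at least `min_gap_da` (assumed limit)."""
    if len(product_masses_da) < 2:
        return True
    return min_mass_gap(product_masses_da) >= min_gap_da


def genotypes_per_chip(spots: int = 384, plex: int = 24) -> int:
    """Genotypes obtained from one chip when every spot carries one multiplex reaction."""
    return spots * plex


# Invented masses for the allele products of two SNPs in one reaction.
compatible = [5200.0, 5216.0, 6100.5, 6125.5]
clashing = [5200.0, 5204.0, 6100.5, 6125.5]
print(multiplex_is_resolvable(compatible), multiplex_is_resolvable(clashing))
print(genotypes_per_chip(plex=24), genotypes_per_chip(plex=40))
```

At a realistic multiplexing level of 24, one 384-spot chip yields a little over nine thousand genotypes under these assumptions.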
GoldenGate assay
The GoldenGate assay (Illumina, www.illumina.com) (Shen et al. 2005) can interrogate 48, 96, 144, 192, 384, 768 or 1536 SNPs simultaneously. The assay combines allele-specific primer extension and ligation for generating allele-specific products followed by PCR amplification with universal primers. Three oligonucleotides are designed for each SNP locus, two of which are allele-specific (ASO) with the SNP allele on their 3′ end, and a locus specific oligonucleotide (LSO) that hybridizes several bases downstream of the SNP site. The LSO primer also contains a unique address sequence that allows separating the SNP assay products for individual readout. In the protocol, during the hybridization process, the oligonucleotides hybridize to the genomic DNA that has been first immobilized on a solid support. The complementary ASO is extended and ligated to the LSO, providing high locus specificity. The ligated products are then amplified using universal PCR primers P1, P2 and P3. Primers P1 and P2 are specific for each ASO and carry a fluorescent tag that is used for allele calling.
The separation of the assay products in solution onto a solid format is done using Veracode technology (48, 96, 144, 192 or 384-plex) (Lin et al. 2009). It uses cylindrical glass microbeads (240 microns in length) with unique digital holographic codes and coated with capture oligonucleotides that are complementary to one of the addresses present in the PCR products. When excited by a laser, each VeraCode bead emits a unique holographic code image. The BeadXpress reader (Illumina) can identify the individual bead types and in addition detect the results from the two-colour genotyping assay. The Veracode technology contains assay replicates of 20–30 beads per bead type, providing a high level of quality control.
Infinium assay
The Infinium II assay (Illumina, www.illumina.com) uses a two-colour SBE protocol for allelic discrimination coupled with the BeadChip technology for assay detection (Steemers and Gunderson 2007). Whole genome amplified (WGA) samples are hybridized to 50-mer oligonucleotide probes covalently attached to microspheres (beads) that are randomly assembled in microwells on planar silica slides (BeadChips). After the hybridization, the SNP locus-specific oligonucleotides are extended with the corresponding fluorescently labelled dideoxynucleotides. The fluorescence intensities of the beads are detected by the iScan Reader (Illumina).
Currently available BeadChips for human allow profiling samples with 300 000 to 2.5 M SNPs distributed throughout the genome. SNP selection in these chips is based on results from the HapMap project (www.hapmap.org), providing high coverage across the genome (see ‘SNP databases’). New arrays with up to 5 M common and rare variants from the 1000 Genomes Project (www.1000genomes.org) are in development. The Infinium assay can also be used to develop BeadChips with customized SNP content (iSelect). Genome-wide genotyping BeadChips are also available for other species such as cattle, pigs and dogs.
GeneChip assay
In the GeneChip assay (Affymetrix, www.affymetrix.com) allelic discrimination is achieved by direct hybridization of labelled DNA to arrays containing allele-specific oligonucleotides. These 25-mer probes are synthesized in an ordered fashion on a solid surface by a light-directed chemical process (photolithography) (Fodor et al. 1991). Oligonucleotides covering the complementary sequence of the two alleles of a SNP are on specific positions of the array. Multiple probes for each SNP are used to increase the genotyping accuracy. The hybridization pattern of all oligonucleotides spanning the SNP is used to evaluate positive and negative signals.
Genomic DNA is digested with a restriction endonuclease and ligated to adaptors that recognize the cohesive 4 bp overhangs. The ligation products are then amplified by PCR using a single universal primer and creating a reduced representation of the genome (Kennedy et al. 2003). Next, PCR amplicons are fragmented, end-labelled and hybridized to the array under stringent conditions. After extensive washing steps, the remaining fluorescence signal is automatically recorded by the GeneChip 3000 scanner (Affymetrix). A specific fluidics station and a hybridization oven are also required to carry out the procedure.
Affymetrix has developed several microarrays designed specifically to interrogate SNPs distributed throughout the human genome. The most comprehensive array, the Genome-Wide Human SNP Array 6.0 has 1.8 million genetic markers, including 906 600 SNPs. The median inter-marker distance over all 1.8 million SNP and copy number markers combined is less than 700 bases. Affymetrix has also launched a new high-throughput genotyping assay, the Axiom Genotyping Solution. It is based on a 96-sample format and can process more than 750 samples per week. The initial Axiom Genome-Wide Human Array contains more than 560 000 SNPs.
Other popular platforms for SNP genotyping are SNPstream (Beckman Coulter) and Pyrosequencing (Qiagen) (Table 1.1) (Syvänen 2001; Sobrino et al. 2005; Ragoussis 2009).
Table 1.1 Characteristics of commercially available genotyping systems
Practical recommendations
Different aspects have to be taken into consideration when setting up a genotyping platform: DNA quality assessment, contamination control, automation and data quality control measures.
In high-throughput laboratories, liquid handling automation is essential both for the SNP allele-discrimination and allele-detection processes (Gut 2001). It not only speeds up the genotyping process but also reduces errors introduced by human handling and pipetting and minimizes the possibility of cross-contamination of samples. Many suppliers of laboratory robotics offer liquid handling robots that can be integrated into high-throughput genotyping workflows. In general, the ease of automation is inversely related to the complexity of an SNP genotyping protocol. Steps such as gel-filtration and manipulation of magnetic beads can be more problematic to automate. Current liquid handling robots can support both plate and slide microarray formats.
One of the biggest challenges in running SNP genotyping at high-throughput is the management of the production line. A Laboratory Information Management System (LIMS) is a software tool for keeping track of samples, laboratory users, instruments, lab processes, quality standards, and results. Originally, LIMS were developed in-house but currently there are several commercial solutions available such as Biotracker (Ocimum Biosolutions), Geneus (GenoLogics) and StarLIMS (StarLIMS Corporation). Complete systems for the entire high-throughput SNP genotyping process, with automation and LIMS, are marketed as off-the-shelf products. Examples of this are systems from Affymetrix, Sequenom and Illumina. In addition, all platforms discussed in the previous section have developed analysis software for fully automatic scoring of alleles and genotypes and monitoring the performance of all controls (Figure 1.3).
Figure 1.3 Genotype cluster plot for one SNP genotyped across 270 samples using the GoldenGate assay and the Veracode technology. Each data point represents one sample, the y-axis is normalized signal intensity (sum of intensities of the two fluorescent signals) and the x-axis is the theta value that indicates the allelic angle. The software automatically clusters the DNA samples into two homozygous clusters (red and blue) and a heterozygote cluster (yellow). Points depicted in black are unsuccessfully genotyped samples
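The two axes of such a cluster plot can be derived from the two fluorescence intensities of each sample. The sketch below assumes a commonly used polar transformation, in which the intensity is the sum of the two normalized signals and theta is the angle scaled to the range 0 to 1; the function name and the example values are illustrative, and the vendor software applies its own normalization before this step.

```python
import math


def polar_coordinates(x_intensity, y_intensity):
    """Convert two normalized allele intensities into (theta, r).

    r     : summed signal intensity (y-axis of the cluster plot)
    theta : allelic angle scaled to [0, 1] (x-axis of the cluster plot);
            values near 0 or 1 suggest a homozygote, values near 0.5
            suggest a heterozygote.
    Assumes both intensities are non-negative.
    """
    r = x_intensity + y_intensity
    theta = (2.0 / math.pi) * math.atan2(y_intensity, x_intensity)
    return theta, r


# Illustrative samples: allele-A homozygote, heterozygote, allele-B homozygote.
for x, y in [(1.0, 0.05), (0.6, 0.6), (0.04, 0.9)]:
    theta, r = polar_coordinates(x, y)
    print(f"x={x:.2f} y={y:.2f} -> theta={theta:.2f} r={r:.2f}")
```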
One of the greatest concerns in optimizing a genotyping laboratory is the control of PCR contamination. High throughput and the repeated use of common primer pairs mean that contaminating amplicons are easily amplified. The most important recommendation for preventing contamination is to maintain separate areas, dedicated equipment and supplies for pre-PCR steps (sample preparation and PCR set-up) and post-PCR steps (thermocycling and analysis of PCR products). The rule of thumb should be never to bring amplified PCR products into the PCR set-up area. Uracil-DNA glycosylase (UNG) can also be used to prevent carryover contamination by PCR products (Longo et al. 1990). If dUTP is used instead of dTTP in all PCRs, UNG treatment prevents the reamplification of carryover PCR products by removing any uracil incorporated into the amplicons; the DNA is then cleaved at the resulting abasic sites. Finally, laboratory practices such as the use of disposable filter tips, positive-displacement pipettes, non-contact dispensing options and periodic lab and instrument cleaning also help reduce the risk of carryover contamination (S. Kwok and Higuchi 1989).
Genotyping errors have a deleterious effect on the statistical analysis of the data. To address this issue several quality controls should be carried out in each genotyping experiment: negative controls to monitor cross-contamination, positive controls to check concordance with publicly available data and replicate DNA samples to account for intra- and inter-plate reproducibility (Pompanon et al. 2005). Analysis statistics such as deviation from the Hardy–Weinberg equilibrium, Mendelian inconsistencies in pedigrees or the number of inferred recombinants can also be of great value for identifying potential genotyping errors. Finally, it is also recommended to check regularly a subset of SNPs with at least two different platforms to evaluate platform performance (Lahermo et al. 2006). Most of the common genotyping platform vendors described in the previous section provide extensive quality measures of several protocol steps to ensure an overall assay accuracy of >99%.
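As an illustration of the Hardy–Weinberg check mentioned above, the following sketch computes a chi-square statistic from the three genotype counts of a biallelic SNP. It is a minimal example rather than a recommended pipeline: the function name is hypothetical, and exact tests are often preferred when genotype counts are small.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for deviation from Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb are the observed counts of the two homozygous
    genotypes and the heterozygous genotype. The statistic has one
    degree of freedom for a biallelic marker.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic markers).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# A SNP in perfect equilibrium and one with no heterozygotes at all.
print(hwe_chi_square(25, 50, 25))
print(hwe_chi_square(50, 0, 50))
```

A statistic above 3.84 corresponds to p < 0.05 for one degree of freedom; in genome-wide studies a much stricter threshold is normally applied because many SNPs are tested.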
Monitoring the quality of DNA samples prior to genotyping is the most important factor for achieving optimum genotyping results. DNA samples of low quantity and/or quality negatively affect the call rate (proportion of SNPs receiving a genotype call) and also lead to a higher number of genotyping errors. The DNA in a reaction must provide sufficient representation of the two alleles: 1 ng of human genomic DNA corresponds to approximately 300 copies of the genome. This is more than sufficient starting material for genotyping an individual polymorphism. Reducing the amount of genomic DNA starting material may result in allele dropout and an increased risk of contamination. High-multiplex genotyping methods tend to be economical in terms of DNA requirements per polymorphism.
DNA quantification and quality control are often conducted with a UV spectrophotometer at wavelengths of 260 nm and 280 nm. The ratio of absorbance readings at the two wavelengths (A260/280) should be between 1.8 and 2.2, while protein contamination can be assessed by measuring the A260/230 ratio (1.6–2.4). A more precise quantification of the double-stranded DNA target can be obtained using a fluorescent nucleic acid stain such as PicoGreen (Invitrogen) and a fluorometer (excitation and emission wavelengths of 502 nm and 523 nm, respectively) or by real-time qPCR using a single-copy gene as a copy number reference. Finally, the integrity and molecular weight of DNA are measured by gel electrophoresis using either agarose gels or an instrument such as the Agilent Bioanalyzer.
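The figure of about 300 genome copies per nanogram can be reproduced with a short calculation. The constants below are approximations chosen for this sketch (a haploid human genome of about 3.1 × 10^9 bp and an average mass of about 650 g/mol per base pair); they are not values given in the text.

```python
AVOGADRO = 6.022e23          # molecules per mole
MEAN_BP_MASS = 650.0         # g/mol per base pair (approximation)
HAPLOID_GENOME_BP = 3.1e9    # human haploid genome size in bp (approximation)


def haploid_genome_copies(mass_ng, genome_bp=HAPLOID_GENOME_BP):
    """Estimate the number of haploid genome copies in mass_ng of DNA."""
    genome_mass_g = genome_bp * MEAN_BP_MASS / AVOGADRO  # grams per copy
    return mass_ng * 1e-9 / genome_mass_g


# Roughly 300 copies in 1 ng, so a 10 ng reaction holds about 3000 copies.
print(round(haploid_genome_copies(1.0)))
print(round(haploid_genome_copies(10.0)))
```

The same arithmetic shows why very low inputs are risky: at a few tens of picograms only a handful of copies of each allele are present, and one allele can fail to amplify by chance.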
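The absorbance criteria above are simple to apply programmatically when many samples are screened. The sketch below uses the acceptance ranges quoted in the text; the function name, the result layout and the example readings are hypothetical.

```python
def purity_report(a230, a260, a280):
    """Compute absorbance ratios and compare them with the ranges
    quoted in the text (A260/280: 1.8-2.2, A260/230: 1.6-2.4)."""
    ratio_280 = a260 / a280
    ratio_230 = a260 / a230
    return {
        "A260/280": ratio_280,
        "A260/230": ratio_230,
        "pass_260_280": 1.8 <= ratio_280 <= 2.2,
        "pass_260_230": 1.6 <= ratio_230 <= 2.4,
    }


# Hypothetical readings for a clean sample and a sample with a low A260/280.
print(purity_report(a230=0.5, a260=1.0, a280=0.5))
print(purity_report(a230=0.5, a260=1.0, a280=0.8))
```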
SNP databases
SNP databases such as dbSNP and HapMap are essential resources for the study of human complex disorders and for evolutionary studies.
The Single Nucleotide Polymorphism database (dbSNP, www.ncbi.nlm.nih.gov/projects/SNP) was launched in 1998 as a public-domain archive of simple genetic polymorphisms. It contains SNP-related information such as SNP flanking DNA sequences, alleles, allele frequencies, validation status and functional relationships to genes (Sherry et al. 2001). As of build 132 (September 2010), dbSNP has collected over 244 million submissions corresponding to more than 87 million reference SNP clusters (refSNP) from 100 organisms, including Homo sapiens, Mus musculus, Gallus gallus, Oryza sativa, Zea mays and many other species. A full list of organisms and the number of reference SNP clusters for each can be found at www.ncbi.nlm.nih.gov/SNP/snp_summary.cgi. The data of dbSNP is also included in repositories such as ENSEMBL (www.ensembl.org) and the UCSC Genome Browser (genome.ucsc.edu).
The international HapMap project (www.hapmap.org) started in 2002 with the aim of cataloguing the vast amount of genetic variation in humans and describing how it is organized in short stretches of strong linkage disequilibrium (haplotype blocks) that coincide with ancient ancestral recombination events (Couzin 2002; Pääbo 2003). Since then more than 3 million SNPs (with an average density of 1 SNP per kb and minor allele frequency >0.05) have been analysed in 270 individuals from populations with African, Asian and European ancestry (International HapMap Consortium 2007). HapMap results provide researchers with a selection of SNP markers that tag haplotype blocks, reducing the number of genotypes that have to be measured for a genome-wide association study (more than 500 000 tag SNPs are required to capture all Phase II SNPs with r² ≥ 0.8 in a population from Northern Europe (CEU)). In Phase III, 1,184 reference individuals from 11 global populations have been genotyped for 1.6 million SNPs (International HapMap 3 Consortium 2010).
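The tagging criterion refers to the squared correlation between the alleles at two SNPs. A minimal sketch of this calculation from haplotype and allele frequencies is given below; the function name and the example frequencies are illustrative, and real analyses estimate haplotype frequencies from phased or unphased genotype data.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Squared correlation (r^2) between two biallelic SNPs.

    p_ab : frequency of the haplotype carrying allele A at SNP 1
           and allele B at SNP 2
    p_a  : frequency of allele A at SNP 1
    p_b  : frequency of allele B at SNP 2
    """
    denominator = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denominator == 0:
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denominator


# Perfectly correlated SNPs, independent SNPs, and an intermediate case.
print(ld_r_squared(0.5, 0.5, 0.5))
print(ld_r_squared(0.25, 0.5, 0.5))
print(ld_r_squared(0.45, 0.5, 0.5))
```

Under the threshold quoted above, a genotyped SNP would be treated as a tag for a second SNP when the value returned is at least 0.8.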
Recent improvements in sequencing technology (see ‘Second generation sequencing’) fostered the creation of the 1000 Genomes Project (1000 GP, www.1000genomes.org) in 2008. The aim of the project is to obtain a nearly complete catalogue of all human genetic variations with frequencies greater than 1% by sequencing the genomes of 2500 individuals from different populations. Data from three pilot projects is already available: low coverage sequencing of 180 individuals, sequencing at deep coverage of six individuals and sequencing gene regions in 900 individuals. 1000 GP data is further improving the process of identification of disease-associated regions.
Resources such as dbSNP, HapMap and 1000 GP have unquestionably saved medical researchers a lot of time and cost in their projects. All of the information generated by these projects is rapidly released into the public domain. In addition, DNA samples used in the HapMap and 1000 Genomes projects are also publicly available through Coriell Institute (ccr.coriell.org).
Methylation analysis
In mammals, epigenetic modifications are known to play a critical role in the regulation of gene expression across the genome and in maintaining genomic stability (Bernstein et al. 2007). Many studies have implicated aberrant methylation in the aetiology of common human diseases, including cancer, multiple sclerosis, diabetes and schizophrenia (Tost 2010). Alterations in DNA methylation can be used as biomarkers for early cancer detection, to discriminate among tumour subtypes or to predict disease outcome (Shames et al. 2007).
In animals and plants, methylation occurs preferentially at the 5 position of cytosines within CpG dinucleotides. Reversible methylation of cytosine in the sequence context CpG adds a dynamic component to DNA because it can act as a switch for transcription of a gene if the CpGs are in the promoter region of the gene (Suzuki and Bird 2008). CpGs that can be methylated or not are termed methylation variable positions (MVPs). Methylation information is lost during PCR or primer extension reactions. Nonetheless, the degree of DNA methylation can be measured by virtually any SNP genotyping method if the genomic DNA sample is first treated with bisulfite (L. Shen and Waterland 2007). Bisulfite treatment of DNA results in conversion of non-methylated C into U, while methylated C remains unchanged. After bisulfite conversion and PCR amplification, the degree of methylation at a given MVP in the genomic DNA sample can be determined by quantifying the relative amounts of C and T at that position. The quantitative resolution of the genotyping method determines the accuracy of measurement that can be achieved. One of the most widely used methods for quantifying MVPs is pyrosequencing (Dupont et al. 2004). Pyrosequencing is a sequencing-by-synthesis method based on the detection of pyrophosphate (PPi), which is released during DNA synthesis in a quantity equimolar to the amount of incorporated nucleotide. Assay set-up is straightforward and the accuracy of quantification is better than 2% if using the PyroMark MD instrument (Qiagen, www.pyrosequencing.com). MALDI-TOF mass spectrometry can also be used to detect cytosine methylation using bisulfite conversion biochemistry, followed by PCR and a base-specific cleavage process that generates a distinct signal pattern from the methylated and non-methylated template DNA (Sequenom, www.sequenom.com) (Ehrich et al. 2005).
There are several array-based methods to identify global patterns of CpG methylation (Beck and Rakyan 2008; Laird 2010). The most popular are the HumanMethylation27 and HumanMethylation450 BeadChips developed by Illumina (www.illumina.com). They use the Infinium II assay (see ‘Commercial platforms for SNP genotyping’) to interrogate bisulfite-converted DNA for up to 450 000 CpG methylation sites from 99% of RefSeq genes. The assay reliably detects a difference of 20% in methylation.
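Because unmethylated cytosines are read as T after bisulfite conversion and PCR, the degree of methylation at an MVP follows directly from the relative C and T signals. The sketch below shows this arithmetic; the function name and the signal values are hypothetical, and it assumes complete bisulfite conversion and signals that are proportional to the amount of each base.

```python
def methylation_fraction(c_signal, t_signal):
    """Fraction of methylated template at one CpG after bisulfite
    conversion: methylated C stays C, unmethylated C is read as T."""
    total = c_signal + t_signal
    if total <= 0:
        raise ValueError("no signal at this position")
    return c_signal / total


# Hypothetical peak heights at three CpG positions of one amplicon.
for c, t in [(30.0, 70.0), (5.0, 95.0), (88.0, 12.0)]:
    print(f"C={c:.0f} T={t:.0f} -> {100 * methylation_fraction(c, t):.0f}% methylated")
```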
The first complete maps of DNA methylation (or methylome) with a one base pair resolution were obtained for Arabidopsis thaliana (Lister et al. 2008) and two human cell types (Lister et al. 2009) using bisulfite treatment and second generation sequencing technologies (see ‘Second generation sequencing’).
Copy number variation analysis
Copy number variants (CNVs) are a common form of genetic variation in human populations (Database of Genomic Variants, projects.tcag.ca/variation) (Redon et al. 2006; McCarroll et al. 2008). By analogy to the standard definition of a SNP, a CNV is a copy number polymorphism that ranges from one kilobase to several megabases in size and has a minor allele frequency of 1% or greater.
CNVs may alter gene function by affecting gene dosage, positional effect or by directly interrupting genes. Although most CNVs are neutral polymorphic variants, some of them have been demonstrated to be associated with human diseases such as autism, schizophrenia, mental retardation or psoriasis (Zhang et al. 2009). Screening for CNVs, in addition to SNP genotyping, in disease gene identification studies is a major trend in current research projects.
There are several methods to quantify copy number variation across the genome for research and diagnostic purposes. Array comparative genomic hybridization (array CGH) is a powerful tool for discovering previously unrecognized submicroscopic aberrations in cancer and genomic disorders just by measuring hybridization intensities (Carter 2007; Gresham et al. 2008). Current arrays for CGH contain up to 4.2 million oligonucleotide probes that enable genome-wide detection of CNVs down to 1.5–5 kb resolution (Nimblegen, www.nimblegen.com) (Agilent, www.agilent.com).
Current WGG SNP arrays, although originally designed for SNP genotyping, can also be used to capture CNVs at a genome-wide scale (Cooper et al. 2008; Carter 2007). SNP-CGH, unlike conventional array CGH, can detect copy-neutral genetic abnormalities such as uniparental disomy (UPD) and loss of heterozygosity (LOH). HumanOmni2.5 Beadarrays (Illumina) contain nearly 2.5 million genetic markers (median spacing between markers is 1.5 kb), including 60 000 CNV-targeted markers. The Human SNP Array 6.0 (Affymetrix) features 1.8 million genetic markers, including more than 946 000 probes for the detection of copy number variation. The Cytogenetics Whole-Genome 2.7 M Array (Affymetrix) provides a comprehensive analysis of structural variation. It contains 2.7 million copy number markers, including 2.3 million non-polymorphic markers and 400 000 SNPs.
Over the past few years, a number of software packages have been developed to detect changes in copy number by using SNP-CGH data and analysing total signal intensities and allelic intensity ratios (Winchester et al. 2009). Different methods based on Hidden Markov Models, circular binary segmentation or mixed models are used to detect CNV segment boundaries. In a second step particular segments that are different in copy number compared with values from a reference individual or group of individuals are identified. For robust detection, a CNV interval requires significant ratio shifts in several consecutive probes.
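The requirement for ratio shifts in several consecutive probes can be illustrated with a deliberately simple run-based caller. This sketch is not one of the published algorithms (it is neither a Hidden Markov Model nor circular binary segmentation); the function name, the log2 ratio threshold and the minimum number of probes are arbitrary values chosen for the example.

```python
def call_cnv_segments(log2_ratios, threshold=0.3, min_probes=3):
    """Report runs of consecutive probes whose log2 ratio (sample versus
    reference) is shifted beyond +/- threshold.

    Returns a list of (first_probe_index, last_probe_index, "gain"/"loss").
    Runs shorter than min_probes are ignored as likely noise.
    """
    segments = []
    state = 0      # 1 = gain, -1 = loss, 0 = no shift
    start = 0
    # A trailing 0.0 acts as a sentinel that closes a run ending at the last probe.
    for i, value in enumerate(list(log2_ratios) + [0.0]):
        if value >= threshold:
            current = 1
        elif value <= -threshold:
            current = -1
        else:
            current = 0
        if current != state:
            if state != 0 and i - start >= min_probes:
                segments.append((start, i - 1, "gain" if state > 0 else "loss"))
            start = i
            state = current
    return segments


# Probes 1-3 form a gain, probe 5 is an isolated outlier, probes 7-9 form a loss.
ratios = [0.0, 0.5, 0.6, 0.4, 0.0, -0.5, 0.1, -0.6, -0.7, -0.5]
print(call_cnv_segments(ratios))
```

The isolated shifted probe at index 5 is not reported, which mirrors the reasoning in the text: a single deviating probe is more likely to reflect noise than a true copy number change.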
Because of the relatively low signal-to-noise ratio and high experimental variation that characterizes many of the platforms, candidate CNVs should be validated by alternative low-throughput techniques such as multiplex ligation-dependent probe amplification (MLPA) or real-time PCR.
Second generation sequencing technologies
Second generation sequencing (2ndGS) technologies have dramatically increased the throughput and reduced the costs of DNA sequencing compared with conventional Sanger sequencing methods. Although the cost is still one order of magnitude higher than whole genome genotyping (WGG), the general goal of sequencing a human genome for $1000 in less than one day seems realistic in the near-term. This will open the door for WGG by whole genome sequencing (WGS).
2ndGS platforms allow sequencing of millions of clonally amplified and spatially separated DNA fragments simultaneously. The sequencing process itself is a repetition of cycles of enzymatic reactions (polymerase-based nucleotide incorporations or oligonucleotide ligations, depending on the platform) and imaging-based data collection (Shendure and Ji 2008). The resulting sequence tags or reads are then aligned to a reference genome and genetic polymorphisms (SNPs, InDels, SVs) identified. 2ndGS instruments from Roche/454 (Genome FLX, www.454.com), Illumina (HiSeq2000, www.illumina.com) and Life Technologies (SOLiD 5500xl, solid.appliedbiosystems.com) are commercially available. The best performing of these instruments can generate tens of gigabases of sequence per day.
The main limitations of some of the 2ndGS technologies for WGS are the short read lengths (50 to 500 bp depending on the platform) and the relatively high error rate. The first problem can be partially circumvented by using an approach that generates sequences from both ends of each DNA molecule (paired-end or mate-pair sequencing) thus facilitating the alignment process. Nevertheless, analysis of large and highly repetitive regions is still not feasible. Increasing the depth of coverage (obtaining multiple reads from the same region) is the best option for improving the consensus read accuracy and ensuring high confidence in determination of genetic variants.
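The effect of depth on consensus accuracy can be sketched with a simple majority-vote model. The model is an assumption made for illustration only: it treats every read as an independent vote with the same error probability and counts ties as wrong, whereas real variant callers use base qualities, mapping qualities and genotype likelihoods.

```python
import math


def consensus_accuracy(depth, per_read_error):
    """Probability that a strict majority of `depth` independent reads
    shows the correct base, given the per-read error probability."""
    needed = depth // 2 + 1  # strict majority; ties count as incorrect
    correct = 1.0 - per_read_error
    return sum(
        math.comb(depth, k) * correct ** k * per_read_error ** (depth - k)
        for k in range(needed, depth + 1)
    )


# With a 1% per-read error, the expected consensus error falls rapidly with depth.
for depth in (1, 3, 9, 15):
    print(depth, 1.0 - consensus_accuracy(depth, 0.01))
```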
WGS enables the cataloguing of all kinds of genetic variation. The sequences of several individual genomes using 2ndGS technologies have been reported recently. Between 3 and 4 million SNPs per genome were identified using different algorithms (Metzker 2010). A large number of InDels that were undetectable with any of the high-throughput genotyping systems described above are starting to emerge in the databases thanks to 2ndGS methods. Finally, paired-end and mate-pair sequencing methods are also able to discern, at one base pair resolution, many CNVs and other SVs such as inversions and translocations in individual genomes. Recently, de novo assembly of two human genomes has been reported, allowing the discovery of new SVs in an unbiased manner (Li et al. 2010).
A third generation of sequencing technologies based on single-molecule sequencing is under development (Check Hayden 2009). Companies such as Pacific Biosciences (www.pacificbiosciences.com) and Oxford Nanopore Technologies (www.nanoporetech.com) are currently leading this sector. It is likely that routine DNA sequencing of whole genomes for clinical or research purposes will happen in the near future.
Conclusions
Methods for genotyping and sequencing have come a long way in the past decade. Microsatellites, which were the markers of choice a decade ago, are no longer used and have been replaced by high-resolution SNP genotyping. Association studies that were very difficult to carry out and for which few positive results had been achieved a decade ago are now used routinely with great success. Several commercial solutions for WGG using hundreds of thousands of SNPs exist. Use of such methods has reached a level where it is now possible to join datasets that were produced in different laboratories by different groups. Meta-analysis has been very successful and added much insight. All of this has revolutionised molecular genetics and resulted in the identification of an unprecedented number of disease-associated genes. However, markers on the commercial genome-wide genotyping arrays have been selected based on the ‘common disease–common variant’ hypothesis, which holds that common pathologies are associated with variants that are present at quite a high level in the general population, that is, variants with a high minor allele frequency. Results of GWAS have nearly exclusively been variants that confer only marginal additional risk. Very few of the associated variants lead to amino acid changes and could thus be assigned a change of function. Thinking is now shifting towards the idea that rare variants, which by chance are distributed unevenly onto frequent haplotypes, might be causative. However, including rarer polymorphisms in the genome-wide genotyping system is a game of diminishing returns. It still holds the risk that disease-causing variants might be very rare and private to a few families and thus not represented on the arrays. With the advent of second generation sequencing methods, scientists are starting to look towards identifying and genotyping private and rare variants by sequencing.
There is a crossover point where sequencing technologies (see ‘Second generation sequencing’) will become more cost effective than genotyping methods and a point where high-resolution WGG and WGS will have the same price tag and possibly comparable throughput. At this point it will be of interest to consider merging linkage, association and population genetic strategies in study design.
The ever-expanding toolbox for genetics has added refinement and standardization to the procedures. Methods for genome-wide, custom and quantitative genotyping exist. They can be applied for the reliable genotyping of SNPs, CNVs, InDels, and DNA methylation. Based on this, many interesting results have already been generated and with the continuing improvement of technologies the era of genomics is well underway.
References
Barany, F. (1991) Genetic disease detection and DNA amplification using cloned thermostable ligase. Proc Natl Acad Sci USA88: 189–193.
Beck, S. and Rakyan, V.K. (2008) The methylome: approaches for global DNA methylation profiling. Trends Genet24: 231–237.
Bernstein, B.E., Meissner, A. and Lander, E.S. (2007) The mammalian epigenome. Cell128: 669–681.
Botstein, D. White, R.L., Skolnick, M. and Davis, R.W. (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet32: 314–331.
Brookes, A.J. (1999) The essence of SNPs. Gene234: 177–186.
Carter, N.P. (2007) Methods and strategies for analyzing copy number variation using DNA microarrays. Nat Genet 39(7 Suppl): S16–21.
Check Hayden, E. (2009) Genome sequencing: the third generation. Nature457: 768–769.
Clark, A.G. (2003) Finding genes underlying risk of complex disease by linkage disequilibrium mapping. Curr Opin Genet Dev13: 296–302.
Cooper, G.M., Zerr, T., Kidd, J.M., Eichler, E.E. and Nickerson, D.A. (2008) Systematic assessment of copy number variant detection via genome-wide SNP genotyping. Nat Genet40: 1199–203.
Couzin, J. (2002) Human genome. HapMap launched with pledges of 100 M$. Science298: 941–942.
Dupont, J.M., Tost, J., Jammes, H. and Gut, I.G. (2004) De novo quantitative bisulfite sequencing using the pyrosequencing technology. Anal Biochem333: 119–127.
Egholm, M., Buchardt, O., Christensen, L. et al. (1993) PNA hybridizes to complementary oligonucleotides obeying the Watson–Crick hydrogen bonding rules. Nature365: 566–568.
Ehrich, M., Nelson, M.R., Stanssens, P. et al. (2005) Quantitative high-throughput analysis of DNA methylation patterns by base-specific cleavage and mass spectrometry. Proc Natl Acad Sci USA102: 15785–15790.
Fan, J.B., Chee, M.S. and Gunderson, K.L. (2006) Highly parallel genomic assays. Nat Rev Genet7: 632–644.
Fodor, S.P., Read, J.L., Pirrung, M.C. et al. (1991) Light-directed, spatially addressable parallel chemical synthesis. Science251: 767–773.
Gresham, D., Dunham, M.J. and Botstein, D. (2008) Comparing whole genomes using DNA microarrays. Nat Rev Genet9: 291–302.
Gut, I.G. (2001) Automation in genotyping of single nucleotide polymorphisms. Human Mut17: 475–492.
Haff, L. and Smirnov, I.P. (1997) Single-nucleotide polymorphism identification assays using a thermostable DNA polymerase and delayed extraction MALDI-TOF mass spectrometry. Genome Res7: 378–388.
Hindorff, L.A., Sethupathy, P., Junkins, H.A. et al. (2009) Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci USA106: 9362–9367.
Holland, P.M., Abramson, R.D., Watson, R. and Gelfand, D.H. (1991) Detection of specific polymerase chain reaction product by utilizing the 5′–3′ exonuclease activity of Thermus aquaticus DNA polymerase. Proc Natl Acad Sci USA88: 7276–7280.
International HapMap Consortium. (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature449: 851–861.
International HapMap 3 Consortium. (2010) Integrating common and rare genetic variation in diverse human populations. Nature467: 52–8.
International Human Genome Sequencing Consortium (2004) Finishing the euchromatic sequence of the human genome. Nature431: 931–945.
Jarvius, J., Nilsson, M. and Landegren, U. (2003) Oligonucleotide ligation assay. Methods Mol Biol212: 215–228.
Jorde, L.B. (1995) Linkage disequilibrium as a gene-mapping tool. Am J Hum Genet56: 11–14.
Jurinke, C., van den Boom, D., Cantor, C.R. and Köster, H. (2002) The use of MassARRAY technology for high throughput genotyping. Adv Biochem Eng Biotechnol77: 57–74.
Kennedy, G.C., Matsuzaki, H., Dong, S. et al. (2003) Large-scale genotyping of complex DNA. Nat Biotechnol21: 1233–1237.
Kim, S. and Misra, A. (2007) SNP genotyping: technologies and biomedical applications. Annu Rev Biomed Eng9: 289–320.
Klein, R.J., Zeiss, C., Chew, E.Y. et al. (2005) Complement factor H polymorphism in age-related macular degeneration. Science308: 385–289.
Kwok, P.Y. (2002) SNP genotyping with fluorescence polarization detection. Hum Mutat19: 315–23.
Kwok, S. and Higuchi, R. (1989) Avoiding false positives with PCR. Nature339: 237–238.
Lahermo, P., Liljedahl, U., Alnaes, G. et al. (2006) A quality assessment survey of SNP genotyping laboratories. Hum Mutat27: 711–714.
Laird, P.W. (2010) Principles and challenges of genome-wide DNA methylation analysis. Nat Rev Genet11: 191–203.
Landegren, U., Nilson, M. and Kwok, P.-Y. (1998) Reading bits of genetic information: methods for single-nucleotide polymorphism analysis. Genome Res8: 769–776.
Li, R., Zhu, H., Ruan, J. et al. (2010) De novo assembly of human genomes with massively parallel short read sequencing. Genome Res20: 265–272.
Lin, C.H., Yeakley, J.M., McDaniel, T.K. and Shen, R. (2009) Medium- to high-throughput SNP genotyping using VeraCode microbeads. Methods Mol Biol496: 129–142.
Lister, R., O'Malley, R.C., Tonti-Filippini, J. et al. (2008) Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell133: 523–536.
Lister, R., Pelizzola, M., Dowen, R.H. et al. (2009) Human DNA methylomes at base resolution show widespread epigenomic differences. Nature462: 315–322.
Livak, K., Marmaro, J. and Todd, J.A. (1995) Towards fully automated genome-wide polymorphism screening. Nat Genet9: 341–342.
Livak, K.J. (2003) SNP genotyping by the 5′-nuclease reaction. Methods Mol Biol212: 129–147.
Longo, M.C., Berninger, M.S. and Hartley, J.L. (1990) Use of uracil DNA glycosylase to control carry-over contamination in polymerase chain reactions. Gene93: 125–128.
McCarroll, S.A., Kuruvilla, F.G., Korn, J.M. et al. (2008) Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat Genet40: 1166–74.
Metzker, M.L. (2010) Sequencing technologies—the next generation. Nat Rev Genet11: 31–46.
Moffatt, M.F., Gut, I.G., Demenais, F. et al.; GABRIEL Consortium (2010) A large-scale genomewide association study of asthma. New Engl J Med363: 1211–1221.
Moffatt, M.F., Kabesch, M., Liang, L. et al. (2007) Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma. Nature 448: 470–473.
Morrison, T., Hurley, J., Garcia, J. et al. (2006) Nanoliter high throughput quantitative PCR. Nucleic Acids Res 34: e123.
Nilsson, M., Malmgren, H., Samiotaki, M. et al. (1994) Padlock probes: circularizing oligonucleotides for localized DNA detection. Science 265: 2085–2088.
Oeth, P., del Mistro, G., Marnellos, G. et al. (2009) Qualitative and quantitative genotyping using single base primer extension coupled with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MassARRAY). Methods Mol Biol 578: 307–343.
Ørum, H., Jakobsen, M.H., Koch, T. et al. (1999) Detection of the factor V Leiden mutation by direct allele-specific hybridization of PCR amplicons to photoimmobilized locked nucleic acids. Clin Chem 45: 1898–1905.
Pääbo, S. (2003) The mosaic that is our genome. Nature 421: 409–412.
Parsons, B.L. and Heflich, R.H. (1997) Genotypic selection methods for the direct analysis of point mutations. Mutat Res 387: 97–121.
Pompanon, F., Bonin, A., Bellemain, E. and Taberlet, P. (2005) Genotyping errors: causes, consequences and solutions. Nat Rev Genet 6: 847–859.
Ragoussis, J. (2009) Genotyping technologies for genetic research. Annu Rev Genom Hum Genet 10: 117–133.
Redon, R., Ishikawa, S., Fitch, K.R. et al. (2006) Global variation in copy number in the human genome. Nature 444: 444–454.
Risch, N. (1991) Developments in gene mapping with linkage methods. Curr Opin Genet Dev 1: 93–98.
Risch, N. and Merikangas, K. (1996) The future of genetic studies of complex human diseases. Science 273: 1516–1517.
Shames, D.S., Minna, J.D. and Gazdar, A.F. (2007) DNA methylation in health, disease, and cancer. Curr Mol Med 7: 85–102.
Shen, L. and Waterland, R.A. (2007) Methods of DNA methylation analysis. Curr Opin Clin Nutr Metab Care 10: 576–581.
Shen, R., Fan, J.B., Campbell, D. et al. (2005) High-throughput SNP genotyping on universal bead arrays. Mutat Res 573: 70–82.
Shendure, J. and Ji, H. (2008) Next-generation DNA sequencing. Nat Biotechnol 26: 1135–1145.
Sherry, S.T., Ward, M.H., Kholodov, M. et al. (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 29: 308–311.
Sobrino, B., Brión, M. and Carracedo, A. (2005) SNPs in forensic genetics: a review on SNP typing methodologies. Forensic Sci Int 154: 181–194.
Steemers, F.J. and Gunderson, K.L. (2007) Whole genome genotyping technologies on the BeadArray platform. Biotechnol J 2: 41–49.
Suzuki, M.M. and Bird, A. (2008) DNA methylation landscapes: provocative insights from epigenomics. Nat Rev Genet 9: 465–476.
Syvänen, A.-C. (1999) From gels to chips: ‘minisequencing’ primer extension for analysis of point mutations and single nucleotide polymorphisms. Hum Mutat 13: 1–10.
Syvänen, A.-C. (2001) Accessing genetic variation: genotyping single nucleotide polymorphisms. Nat Rev Genet 2: 930–942.
Szántai, E. and Guttman, A. (2006) Genotyping with microfluidic devices. Electrophoresis 27: 4896–4903.
Tong, A.K., Li, Z., Jones, G.S. et al. (2001) Combinatorial fluorescence energy transfer tags for multiplex biological assays. Nat Biotechnol 19: 756–759.
Tost, J. (2010) DNA methylation: an introduction to the biology and the disease-associated changes of a promising biomarker. Mol Biotechnol 44: 71–81.
Tost, J. and Gut, I.G. (2005) Genotyping single nucleotide polymorphisms by MALDI mass spectrometry in clinical applications. Clin Biochem 38: 335–350.
Wang, J., Lin, M., Crenshaw, A. et al. (2009) High-throughput single nucleotide polymorphism genotyping using nanofluidic Dynamic Arrays. BMC Genomics 10: 561.
Wellcome Trust Case Control Consortium (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447: 661–678.
Winchester, L., Yau, C. and Ragoussis, J. (2009) Comparing CNV detection methods for SNP arrays. Brief Funct Genomic Proteomic 8: 353–366.
Zhang, F., Gu, W., Hurles, M.E. and Lupski, J.R. (2009) Copy number variation in human health, disease, and evolution. Annu Rev Genom Hum Genet 10: 451–481.
Chapter 2
DNA Chip Analysis in Genome Discovery
Ross N. Nazar and Jane Robb
Introduction
