A Statistical Approach to Genetic Epidemiology - Andreas Ziegler - E-Book

A Statistical Approach to Genetic Epidemiology E-Book

Andreas Ziegler

Description

A Statistical Approach to Genetic Epidemiology

After studying statistics and mathematics at the University of Munich and obtaining his doctoral degree from the University of Dortmund, Andreas Ziegler received the Johann-Peter-Süssmilch-Medal of the German Association for Medical Informatics, Biometry and Epidemiology for his post-doctoral work on “Model Free Linkage Analysis of Quantitative Traits” in 1999. In 2004, he was one of the recipients of the Fritz-Linder-Forum-Award from the German Association for Surgery.


Number of pages: 938

Year of publication: 2011




Contents

Copyright

Preface

Acknowledgments

1 Molecular Genetics

1.1 WHAT IS THE NATURE OF GENETIC INFORMATION?

1.2 HOW IS GENETIC INFORMATION TRANSMITTED FROM GENERATION TO GENERATION?

1.3 WHAT IS INDIVIDUAL VARIATION IN GENETIC INFORMATION?

1.4 PROBLEMS

URLs

2 Formal Genetics

2.1 WHAT ARE MENDEL’S LAWS?

2.2 HOW ARE PHENOTYPES TRANSMITTED IN FAMILIES?

2.3 WHICH COMPLICATIONS TO THE GENERAL INHERITANCE PATTERNS EXIST?

2.4 WHAT IS THE LAW DETECTED BY HARDY AND WEINBERG?

2.5 PROBLEMS

URLs

3 Genetic Markers

3.1 WHAT IS A GENETIC MARKER?

3.2 WHAT TYPES OF GENETIC MARKERS ARE THERE?

3.3 WHAT ARE GENOTYPING METHODS FOR SINGLE NUCLEOTIDE POLYMORPHISMS?

3.4 PROBLEMS

URLs

4 Data Quality

4.1 HOW CAN PEDIGREE ERRORS BE DETECTED?

4.2 HOW CAN GENOTYPING ERRORS BE DETECTED IN FAMILY-BASED STUDIES?

4.3 HOW SHOULD GENOTYPING ERRORS BE CHECKED IN POPULATION-BASED STUDIES USING THE HARDY–WEINBERG EQUILIBRIUM?

Algorithm 4.1.

Algorithm 4.2.

Algorithm 4.3.

4.4 HOW CAN GENOTYPING ERRORS BE DETECTED IN HIGH-THROUGHPUT GENOTYPING STUDIES?

4.5 HOW SHOULD CLUSTER PLOTS AND HOW CAN THE QUALITY OF CLUSTERS BE INVESTIGATED IN HIGH-THROUGHPUT GENOTYPING STUDIES?

4.6 PROBLEMS

URLs

5 Genetic Map Distances

5.1 WHAT IS PHYSICAL DISTANCE?

5.2 WHAT IS MAP DISTANCE?

5.3 WHAT ARE LINKAGE DISEQUILIBRIUM UNITS?

5.4 PROBLEMS

URLs

6 Familiality, Heritability, and Segregation Analysis

6.1 WHAT IS THE DIFFERENCE BETWEEN THE FAMILY HISTORY METHOD AND THE FAMILY STUDY METHOD?

6.2 HAS THE PHENOTYPE OF INTEREST A FAMILIAL COMPONENT? WHAT ARE RECURRENCE RISK RATIOS?

6.3 WHAT IS THE CONCEPT OF HERITABILITY?

6.4 WHAT ARE TWIN STUDIES? WHAT ARE ADOPTION STUDIES?

6.5 WHAT ARE CRITICAL ASPECTS WHEN INVESTIGATING FAMILIAL RESEMBLANCE?

6.6 HOW CAN EVIDENCE FOR A MAJOR GENE EFFECT BE ESTABLISHED? HOW CAN A SEGREGATION PATTERN FOLLOWING MENDELIAN INHERITANCE BE DETERMINED?

6.7 PROBLEMS

URLs

7 Model-Based Linkage Analysis

7.1 HOW CAN THE RECOMBINATION FRACTION BE ESTIMATED BETWEEN TWO GENETIC MARKERS?

7.2 HOW CAN THE RECOMBINATION FRACTION BE ESTIMATED BETWEEN A GENETIC MARKER AND A DISEASE?

7.3 WHAT IS SIGNIFICANT EVIDENCE OF LINKAGE?

7.4 PROBLEMS

URLs

8 Model-Free Linkage Analysis for Dichotomous Traits

8.1 WHAT IS THE BASIC IDEA OF MODEL-FREE LINKAGE ANALYSIS?

8.2 WHY IS AFFECTED SIB-PAIR ANALYSIS REASONABLE?

8.3 WHAT ARE COMMON TESTS FOR AFFECTED SIB-PAIR ANALYSIS?

8.4 IS THERE AN OPTIMAL AFFECTED SIB-PAIR TEST? ARE AFFECTED SIB-PAIR TESTS RELATED TO MODEL-BASED LINKAGE TESTS?

8.5 HOW CAN SAMPLE SIZE OR POWER BE CALCULATED FOR AN AFFECTED SIB-PAIR STUDY?

8.6 HOW ARE MODEL-FREE METHODS EXTENDED TO MULTIPLE MARKER LOCI?

8.7 WHAT ARE STANDARD APPROACHES FOR THE ANALYSIS OF LARGE SIBSHIPS?

8.8 HOW CAN THE AFFECTED SIB-PAIR METHOD BE EXTENDED TO ARBITRARY UNILINEAL RELATIONSHIPS?

8.9 WHAT ARE POSSIBLE EXTENSIONS OF MODEL-FREE METHODS FOR DICHOTOMOUS TRAITS?

8.10 PROBLEMS

URLs

9 Model-Free Linkage Analysis for Quantitative Traits

9.1 WHAT ARE ADVANTAGES AND DISADVANTAGES OF QUANTITATIVE TRAITS?

9.2 WHAT IS THE HASEMAN–ELSTON METHOD?

9.3 WHAT ARE COMMON EXTENSIONS OF THE HASEMAN–ELSTON METHOD?

9.4 WHAT ARE VARIANCE COMPONENTS MODELS FOR LINKAGE ANALYSIS?

9.5 HOW SHOULD SIB-PAIRS BE ASCERTAINED FOR LINKAGE ANALYSIS?

9.6 HOW CAN P-VALUES BE DETERMINED EMPIRICALLY?

Algorithm 9.1.

9.7 PROBLEMS

URLs

10 Fundamental Concepts of Association Analyses

10.1 WHAT IS GENETIC ASSOCIATION?

10.3 PROBLEMS

URLs

11 Association Analysis with Unrelated Individuals

11.1 HOW SHOULD CASES AND CONTROLS BE SELECTED?

11.2 HOW CAN GENETIC ASSOCIATION BE TESTED?

Algorithm 11.1.

11.3 WHAT SAMPLE SIZE IS REQUIRED TO TEST FOR ASSOCIATION?

11.4 WHAT IS POPULATION STRATIFICATION, AND WHAT CAN BE DONE?

11.5 WHAT IS INTERACTION?

Algorithm 11.2.

11.6 PROBLEMS

URLs

12 Association Analysis in Families

12.1 WHAT IS THE HAPLOTYPE RELATIVE RISK METHOD?

12.2 WHAT IS THE TRANSMISSION DISEQUILIBRIUM TEST?

12.3 HOW CAN THE RISK BE ESTIMATED FROM FAMILY DATA?

12.4 HOW CAN SAMPLE SIZE AND POWER BE CALCULATED FOR TRIO DATA?

12.5 WHAT ARE ALTERNATIVE STATISTICS TO THE TRANSMISSION DISEQUILIBRIUM TEST?

12.6 HOW CAN MORE THAN TWO ALLELES AT THE MARKER BE ANALYZED IN THE TDT?

Algorithm 12.1.

12.7 HOW CAN DIFFERENT FAMILY STRUCTURES BE ANALYZED?

12.8 HOW CAN ASSOCIATION WITH QUANTITATIVE TRAITS IN FAMILIES BE ANALYZED?

12.9 PROBLEMS

URLs

13 Haplotypes in Association Analyses

13.1 WHY ARE HAPLOTYPES INTERESTING?

13.2 HOW CAN WE INFER HAPLOTYPES?

Algorithm 13.1.

Algorithm 13.2.

13.3 HOW CAN WE TEST FOR ASSOCIATION USING HAPLOTYPES?

13.4 HOW CAN HAPLOTYPES AND LINKAGE DISEQUILIBRIUM STRUCTURE BE UTILIZED TO SELECT MARKERS FOR STUDY?

Algorithm 13.3.

13.5 PROBLEMS

URLs

14 Genome-wide Association Studies

14.1 WHAT DESIGN OPTIONS ARE THERE?

14.2 HOW CAN WE IMPUTE MISSING DATA?

14.3 HOW DO WE EVALUATE THE DATA STATISTICALLY?

14.4 HOW CAN WE DEAL WITH THE PROBLEM OF MULTIPLE TESTING?

Algorithm 14.1.

14.5 HOW DO WE USE ACCUMULATING DATA FROM GENOME-WIDE ASSOCIATION STUDIES?

14.6 WHAT IS THE CLINICAL IMPACT OF RESULTS FROM A GWA STUDY?

14.7 WHAT MAY WE CONCLUDE FROM GENOME-WIDE ASSOCIATION STUDIES, AND WHAT COMES NEXT?

14.8 PROBLEMS

URLs

Appendix Algorithms Used in Linkage Analyses

A.1 WHAT IS THE ELSTON–STEWART ALGORITHM?

A.2 WHAT IS THE LANDER–GREEN ALGORITHM?

A.3 WHAT IS THE CARDON–FULKER ALGORITHM?

A.4 PROBLEM

Solutions to Study Problems

Solution 1.1

Solution 1.2

Solution 1.3

Solution 1.4

Solution 2.1

Solution 2.2

Solution 2.3

Solution 3.1

Solution 3.2

Solution 3.3

Solution 4.1

Solution 4.2

Solution 4.3

Solution 4.4

Solution 4.5

Solution 4.6

Solution 4.7

Solution 4.8

Solution 4.9

Solution 4.10

Solution 5.1

Solution 5.2

Solution 5.3

Solution 5.4

Solution 6.1

Solution 6.2

Solution 7.1

Solution 7.2

Solution 7.3

Solution 7.4

Solution 7.5

Solution 7.6

Solution 7.7

Solution 7.8

Solution 8.1

Solution 8.2

Solution 8.3

Solution 8.4

Solution 8.5

Solution 8.6

Solution 8.7

Solution 8.8

Solution 9.1

Solution 9.2

Solution 9.3

Solution 9.4

Solution 10.1

Solution 10.2

Solution 10.3

Solution 10.4

Solution 10.5

Solution 10.6

Solution 10.7

Solution 10.8

Solution 11.1

Solution 11.2

Solution 11.3

Solution 11.4

Solution 11.5

Solution 11.6

Solution 11.7

Solution 12.1

Solution 12.2

Solution 12.1

Solution 12.2

Solution 12.3

Solution 13.1

Solution 13.2

Solution 14.1

Solution 14.2

Solution A.1

References

Index

The Authors

Prof. Dr. Andreas Ziegler

Dr. Inke R. König

Institut für Medizinische Biometrie und Statistik

Universität zu Lübeck

Maria-Goeppert-Str. 1

23562 Lübeck

Germany

All books published by Wiley-VCH are carefully produced. Nevertheless, authors, editors, and publisher do not warrant the information contained in these books, including this book, to be free of errors. Readers are advised to keep in mind that statements, data, illustrations, procedural details or other items may inadvertently be inaccurate.

Library of Congress Card No.: applied for

British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library.

Bibliographic information published by the Deutsche Nationalbibliothek

The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at <http://dnb.d-nb.de>.

© 2010 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

All rights reserved (including those of translation into other languages). No part of this book may be reproduced in any form – by photoprinting, microfilm, or any other means – nor transmitted or translated into a machine language without written permission from the publishers. Registered names, trademarks, etc. used in this book, even when not specifically marked as such, are not to be considered unprotected by law.

Composition Uwe Krieg, Berlin

Printing and Binding betz-druck GmbH, Darmstadt

Cover Design Formgeber, Eppelheim

Printed in the Federal Republic of Germany

Printed on acid-free paper

ISBN: 978-3-527-32389-0

To Anke

To Marita and Nico Böddeker

Foreword to the First Edition

The field of genetic epidemiology is barely 50 years old and, concomitant with the spectacular technological advances in molecular biology that have taken place over the past 25 years, there have been many advances in statistical methodology developed for the analysis of data specific to this field. Although there is no dearth of textbooks on either statistical methods or epidemiology in general, very little is available for the person who already has a basic statistical background but is new to the special needs of genetic epidemiology. It is therefore with great pleasure that I write a foreword to this new text, written by two expert statisticians who have gathered together the important concepts and methods into a comprehensive introductory textbook for upper-class undergraduate or graduate students. The next decade will see an enormous need for persons to analyze the massive amounts of genetic information that will be produced as a result of the Human Genome Project and the HapMap Project and I see this new text, complete with examples, homework questions and copious references, as filling a need in the training of such persons.

ROBERT C. ELSTON

Cleveland, OH, USA, December 2005

Foreword to the Second Edition

The pace of discovery in human genetics over the past three years has been extraordinary. Before 2006, family- and population-based studies had identified a handful of genetic markers associated with complex human traits and diseases. Now over 1200 markers associated with over 185 traits are known, and it seems every issue of the top genetics journals contains at least one paper linking a new locus to a clinically relevant trait.

The pace of technological change has also been exceptional. The Human Genome Project took 12 years to produce 3 billion basepairs of sequence; a single well-equipped lab can now produce that much sequence in a week.

While these are exciting times for researchers interested in human genetics, they are also challenging times, especially for students, teachers, and textbook authors. When a paper can go from cutting edge to passé in the time from submission to review, how much more difficult to keep a textbook current!

Drs. Ziegler and König have done a great service by thoroughly updating and revising their textbook. The second edition touches on the most important developments of the last few years, most notably the success of genome-wide association studies. And it keeps the strengths of the previous edition, which include its compatibility with the classroom (see, e.g., the companion web course and the exercises) and the list of internet resources at the end of each chapter. These references are important, because publicly available databases like dbSNP, the HapMap, and the 1000 Genomes Project are central to the practice of modern genetic epidemiology.

These (for the uninitiated) hard-to-find resources and specialized terminology can make the study of genetic epidemiology appear more difficult than it is. This book gives those looking for an introduction to this vibrant field the basic tools they need to find their own way around current research and to learn by doing.

PETER KRAFT

Boston, MA, USA, December 2009

Preface

This book presents the second edition of a statistical approach to genetic epidemiology. But what do we mean by genetic epidemiology? According to the definition of Khoury, Beaty, and Cohen [353], it is the discipline investigating genetic and environmental factors that influence the development and distribution of diseases. It differs from epidemiology in that genetic factors and similarities within families are explicitly taken into account. On the other hand, it can be distinguished from medical genetics by considering populations rather than single patients or families.

As the name implies, genetic epidemiology is an interdisciplinary subject, and it is a working field for scientists from different backgrounds. In contrast to many outstanding textbooks on molecular genetic techniques, such as Human Molecular Genetics 3 by Strachan and Read [636], this book introduces statistical concepts to current approaches used in genetic epidemiology. It is written at a level that should make it useful to undergraduate and graduate students as well as researchers. The necessary background in statistics is an introductory course on statistical testing and estimation. Excellent books allowing one to revive this knowledge include, for example, Ref. [498]; for a little knowledge about likelihood ratio, score, and Wald tests, see, for example, Hills [295], Section “Hypothesis Testing,” or Kleinbaum [358], pp. 128–136.

Just like Gaul at Caesar’s time [99], this volume is divided into three different parts. The material in this text assumes no background in biology, molecular biology, or genetics. The required fundamentals of these topics are described in the first part of this book, covering Chapters 1 through 4. Chapter 1 provides an introduction to the basic molecular genetic mechanisms that are required to serve as a background for understanding the statistical methods in the later chapters. However, for comprehensive overviews on human molecular genetics the reader may refer to standard textbooks [254, 636]. Mendel’s laws and their consequences for familial inheritance patterns as well as the important population genetic Hardy–Weinberg law are discussed in Chapter 2 on formal genetics. Genetic variability is the key to studying the genetic architecture of a disease, and it can be measured by genetic markers, which are the topic of Chapter 3. Two different types of molecular genetic markers are at present in use, and we discuss these together with standard nomenclature. Because different molecular biological technologies have various pros and cons, we have integrated a short description of the most important technologies, and this section has been written by our colleague Jeanette Erdmann, an experienced molecular biologist. Before we introduce the techniques for statistically analyzing genetic disorders, Chapter 4 considers data quality control in genetic epidemiological studies. Owing to the new chip technologies, data quality control has become dramatically more important, and this chapter has therefore more than tripled in size compared to our first edition.

The second part of this book on segregation and linkage analysis starts with Chapter 5 on genetic distance. Many colleagues have suggested including an additional chapter on family studies not involving genetic markers. We have therefore added a corresponding new Chapter 6 and restructured the subsequent chapters on linkage analysis. An in-depth understanding of segregation analysis relies on a specific standard algorithm, which is described in the Appendix owing to its high technical level. Model-based linkage analysis is the topic of Chapter 7, while model-free linkage approaches to dichotomous and quantitative traits are considered in Chapters 8 and 9, respectively. An in-depth understanding of the linkage methods relies on some standard algorithms enabling one to deduce genetic marker information. Because these are quite technical, they have been placed in the Appendix.

While linkage analysis relies on segregation information in families, association studies focus on differences at the population level. They are the subject of the last part of this book, starting with Chapter 10, where the fundamental concepts of association analyses are discussed. The analysis of association itself using single markers is presented in Chapters 11 and 12. Chapter 11 deals with studies utilizing data from unrelated individuals, and it has undergone a substantial revision and extension. Specifically, tests and estimates are described separately, and population-based measures are described in greater detail. The different test statistics are derived, and equivalent formulations are presented. The section on controlling population stratification has been supplemented by new methods, and a section on studying interactions has been added. Chapter 12 is concerned with family-based association studies, and Chapter 13 is devoted to the topic of studying haplotypes. Genome-wide association studies have been a great success for unraveling the genetic basis of complex genetic diseases in the past three years, and they are described in the new Chapter 14.

At the end of each chapter, the reader will find a number of problems covering both theoretical derivations and practical calculations. They can be solved by paper and pencil and require only the help of a pocket calculator. However, we refer to the software that could be utilized for computation in the text and list the relevant URLs at the end of each chapter. The solutions to all problems are detailed at the end of the book. Throughout the book, new, relevant terminology is indicated in italics where first introduced.

An important change to the first edition both in content and in presentation is the distinction between two different versions of this book. While one version is the standard printed one, the other version provides access to an online course developed and implemented by our colleague Friedrich Pahlke. The online course has been tested over several years, and we have found it a very useful resource for students, covering the first and the third part of the book, i.e., the molecular genetic background and association analysis. We teach the complete book in a two-semester course with 8 points according to the European Credit Transfer and Accumulation System (ECTS) and the online course with 4 ECTS points on an annual basis. The present form of the distance learning course is a beginner’s course. It starts with a distance learning phase for Chapters 1 through 4, followed by a 2-day phase of attendance for Chapters 9, 10, 11, 14 and either 12 or 13. This course has the value of 4 ECTS. If the course is taught with a 3-day phase of attendance, we value it at 5.5 ECTS. For the additional day, we recommend a more intensive discussion of Chapters 11 and 14.

ANDREAS ZIEGLER AND INKE R. KÖNIG

Lübeck, Germany

Acknowledgments

This book would not have been completed without the help of many colleagues from our Institute of Medical Biometry and Statistics. Friedrich Pahlke again wins the pole position for preparing the online course available with one version of this book and almost all of the figures. He was funded by the grant “Training in Genetic Epidemiology” from the German Ministry of Education and Research within the National Genome Research Network. We are extremely grateful to Jeanette Erdmann who has written the section on molecular genetic technologies. Many of the new results presented in the chapter on data quality emerged from joint work of A.Z. with our colleague Arne Schillert. We express our special thanks to our PhD student Christina Loley and our former PhD student Anika Großhennig who contributed to several parts of the book.

Innumerable suggestions for improvement were made by our Computational Life Science students from the University at Lübeck during our two-semester-long courses. We are also grateful to all colleagues who have been trained remotely and in Lübeck in our course “Training in Genetic Epidemiology” over the past four years. We sincerely apologize for all the errors of the first edition, but at the same time we are very grateful to all colleagues who have made us aware of these.

Wolfgang Lieb, Jeanette Erdmann, Katrina A. B. Goddard, Suzanne Margaret Leal, Lize van der Merwe, and Konstantin Strauch helped us substantially to improve the manuscript by commenting on specific chapters. Furthermore, we gratefully acknowledge Andrew Collins, Jochen Hampe, Tanja Zeller, and the Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory for providing data for our independent analysis. We are very grateful to Yvonne Eckel-Fiedler, president of the Retriever Club Europa, for providing beautiful photos of her dogs.

Many thanks go to our collaborators who over the past years have supplied us with ideas and problems as well as data and figures for this book. They have motivated us to further work in this area, and we are especially grateful to Jeanette Erdmann and Heribert Schunkert from the Medical Clinic II Lübeck, Tanja Zeller and Stefan Blankenberg from the Department of Medicine II Mainz, the Cardiogenics Project Group, Hans Konrad Schackert and Guido Fitze from the Arbeitsgruppe Chirurgische Forschung, Dresden, and Christine Klein from the Clinical Molecular Neurogenetics Group, Lübeck.

This work has its roots in many courses, and the one that can be seen as its foundation was held by A.Z. in 2000 as a 1-week course in Pelotas, Brazil. It was outstandingly organized by Hiram Larangeira de Almeida Jr., and A.Z. will never forget the hummingbirds and the fresh fruit on his hacienda. More fruitful discussions helped the first edition of this book on its way during the sabbatical of A.Z. in 2005 at the Division of Molecular and Genetic Epidemiology, headed by Robert C. Elston, in Cleveland, Ohio.

We are also indebted to Hugo Marth from the Graphical Office at the University at Lübeck for his expert design of the book cover and to Gregor Cicchetti and Andreas Sendtko from Wiley-VCH for their constant excellent support. With this edition, the copyright of the figures has been transferred to Wiley-VCH. However, we still have permission to use all of our own material for other publications.

Finally, acknowledgments go to our families. From I.R.K. to my husband Peter and my son Timotheus for sacrificing weekends and bolstering me up with loving patience and trust. Also, to my father-in-law Karl for sharing helpful experience. From A.Z. to my daughters Rebecca Elisabeth and Sarah Johanna for pointing out that there is a life beyond writing a book and, finally, to my loving wife Anke who accepted the unpredictable hours and always strongly encouraged me to complete this work.

A. Z. and I. R. K.