This ready reference discusses different methods for statistically analyzing and validating data created with high-throughput methods. As opposed to other titles, this book focuses on systems approaches, meaning that the analysis is based not on a single gene or protein but on a more or less complex biological network. From a methodological point of view, the well-balanced contributions describe a variety of modern supervised and unsupervised statistical methods applied to various large-scale datasets from genomics and genetics experiments. Furthermore, since the availability of sufficient computing power in recent years has shifted attention from parametric to nonparametric methods, the methods presented here make use of such computer-intensive approaches as the bootstrap, Markov chain Monte Carlo, or general resampling methods. Finally, due to the large amount of information available in public databases, a chapter on Bayesian methods is included, which also provides a systematic means to integrate this information. A welcome guide for mathematicians and the medical and basic research communities.
Number of pages: 527
Year of publication: 2012
Contents
Cover
Titles of the Series: “Quantitative and Network Biology”
Related Titles
Title Page
Copyright
Preface
References
List of Contributors
Part One: General Overview
Chapter 1: Control of Type I Error Rates for Oncology Biomarker Discovery with High-Throughput Platforms
1.1 Brief Summary
1.2 Introduction
1.3 High-Throughput Platforms
1.4 Analysis of Experiments
1.5 Multiple Testing Type I Errors
1.6 Discussion
1.7 Perspective
References
Chapter 2: Overview of Public Cancer Databases, Resources, and Visualization Tools
2.1 Brief Overview
2.2 Introduction
2.3 Different Cancer Types are Genetically Related
2.4 Incidence and Mortality Rates of Cancer
2.5 Cancer and Disorder Databases
2.6 Visualization and Network-Based Analysis Tools
2.7 Conclusions
2.8 Perspective
References
Part Two: Bayesian Methods
Chapter 3: Discovery of Expression Signatures in Chronic Myeloid Leukemia by Bayesian Model Averaging
3.1 Brief Introduction
3.2 Chronic Myeloid Leukemia (CML)
3.3 Variable Selection on Gene Expression Data
3.4 Bayesian Model Averaging (BMA)
3.5 Case Study: CML Progression Data
3.6 The Power of iBMA
3.7 Laboratory Validation
3.8 Conclusions
3.9 Perspective
3.10 Publicly Available Resources
Acknowledgments
References
Chapter 4: Bayesian Ranking and Selection Methods in Microarray Studies
4.1 Brief Summary
4.2 Introduction
4.3 Hierarchical Mixture Modeling and Empirical Bayes Estimation
4.4 Ranking and Selection Methods
4.5 Simulations
4.6 Application
4.7 Concluding Remarks
4.8 Perspective
4.9 Appendix: The EM Algorithm
References
Chapter 5: Multiclass Classification via Bayesian Variable Selection with Gene Expression Data
5.1 Brief Summary
5.2 Introduction
5.3 Matrix Variate Distribution
5.4 Method
5.5 Real Data Analysis
5.6 Discussion
5.7 Perspective
References
Chapter 6: Semisupervised Methods for Analyzing High-dimensional Genomic Data
6.1 Brief Summary
6.2 Motivation
6.3 Existing Approaches
6.4 Data Application: Mesothelioma Cancer Data Set
6.5 Perspective
References
Part Three: Network-Based Approaches
Chapter 7: Colorectal Cancer and Its Molecular Subsystems: Construction, Interpretation, and Validation
7.1 Brief Summary
7.2 Colon Cancer: Etiology
7.3 Colon Cancer: Development
7.4 The Pathway Paradigm
7.5 Cancer Subtypes and Therapies
7.6 Molecular Subsystems: Introduction
7.7 Molecular Subsystems: Construction
7.8 Molecular Subsystems: Interpretation
7.9 Molecular Subsystems: Validation
7.10 Worked Example: Label-Free Proteomics
7.11 Conclusions
7.12 Perspective
References
Chapter 8: Network Medicine: Disease Genes in Molecular Networks
8.1 Brief Summary
8.2 Introduction
8.3 Genetic Architecture of Human Diseases
8.4 Systems Properties of Disease Genes
8.5 Disease Gene Prioritization
8.6 Conclusion
8.7 Perspectives
8.8 Acknowledgments
References
Chapter 9: Inference of Gene Regulatory Networks in Breast and Ovarian Cancer by Integrating Different Genomic Data
9.1 Brief Summary
9.2 Introduction
9.3 Theory and Contents of Gene Regulatory Network
9.4 Inference of Gene Regulatory Networks in Human Cancer
9.5 Conclusions
9.6 Perspective
References
Chapter 10: Network-Module-Based Approaches in Cancer Data Analysis
10.1 Brief Summary
10.2 Introduction
10.3 Notation and Terminology
10.4 Network Modules Containing Functionally Similar Genes or Proteins
10.5 Network Module Searching Methods
10.6 Applications of Network-Module-Based Approaches in Cancer Studies
10.7 The Reactome FI Cytoscape Plug-in
10.8 Conclusions
10.9 Perspective
References
Chapter 11: Discriminant and Network Analysis to Study Origin of Cancer
11.1 Brief Summary
11.2 Introduction
11.3 Overview of Relevant Machine Learning Techniques
11.4 Methods
11.5 Experiments and Results
11.6 Conclusion
11.7 Perspective
References
Chapter 12: Intervention and Control of Gene Regulatory Networks: Theoretical Framework and Application to Human Melanoma Gene Regulation
12.1 Brief Summary
12.2 Gene Regulatory Network Models
12.3 Intervention in Gene Regulatory Networks
12.4 Optimal Perturbation Control of Gene Regulatory Networks
12.5 Human Melanoma Gene Regulatory Network
12.6 Perspective
References
Part Four: Phenotype Influence of DNA Copy Number Aberrations
Chapter 13: Identification of Recurrent DNA Copy Number Aberrations in Tumors
13.1 Introduction
13.2 Genetic Background
13.3 Analyzing DNA Copy Number: Single Sample Methods
13.4 Analyzing DNA Copy Number Data: Multiple Sample Methods to Detect Recurrent CNAs
13.5 Analyzing DNA Copy Number Data with DiNAMIC
13.6 Open Questions
References
Chapter 14: The Cancer Cell, Its Entropy, and High-Dimensional Molecular Data
14.1 Brief Summary
14.2 Introduction
14.3 Background
14.4 Entropy Increase
14.5 Statistical Arguments
14.6 Statistical Methodology
14.7 Simulation
14.8 Application to Cancer Data
14.9 Conclusion
14.10 Perspective
14.11 Software
Acknowledgment
References
Index
Titles of the Series
“Quantitative and Network Biology”
Volume 1
Dehmer, M., Emmert-Streib, F., Graber, A., Salvador, A. (eds.)
Applied Statistics for Network Biology
Methods in Systems Biology
2011
ISBN: 978-3-527-32750-8
Volume 2
Dehmer, M., Varmuza, K., Bonchev, D. (eds.)
Statistical Modelling of Molecular Descriptors in QSAR/QSPR
2012
ISBN: 978-3-527-32434-7
Related Titles
Zhou, X.-H., Obuchowski, N. A., McClish, D. K.
Statistical Methods in Diagnostic Medicine
2011
ISBN: 978-0-470-18314-4
Azuaje, F.
Bioinformatics and Biomarker Discovery
“Omic” Data Analysis for Personalized Medicine
2010
ISBN: 978-0-470-74460-4
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty can be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Card No.: applied for
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.d-nb.de.
©2013 Wiley-VCH Verlag GmbH & Co. KGaA, Boschstr. 12, 69469 Weinheim, Germany
All rights reserved (including those of translation into other languages). No part of this book may be reproduced in any form – by photoprinting, microfilm, or any other means – nor transmitted or translated into a machine language without written permission from the publishers. Registered names, trademarks, etc. used in this book, even when not specifically marked as such, are not to be considered unprotected by law.
Print ISBN: 978-3-527-33262-5
ePDF ISBN: 978-3-527-66544-0
ePub ISBN: 978-3-527-66545-7
mobi ISBN: 978-3-527-66546-4
oBook ISBN: 978-3-527-66547-1
Typesetting Thomson Digital, Noida, India
Cover Design Grafik-Design Schulz, Fußgönheim
Preface
The data revolution in biology and medicine not only provides opportunities to enhance our fundamental understanding of biological processes, patho- and tumorigenesis, and epidemiology but also poses a considerable challenge for the analysis of the resulting data. For this reason, novel statistical and computational approaches are required to unravel the mass of data provided by contemporary sequencing and array technologies [4, 5].
The aim of the book Statistical Diagnostics for Cancer: Analyzing High-Dimensional Data is to present statistical methods focusing on a systems level that can be applied to a wide spectrum of genetics and genomics data from high-throughput experiments in cancer research. Owing to the breathtaking progress in biology in recent years, many experimental approaches that originated in molecular and cell biology are now on the verge of entering medical research. For this reason, the major goal of the present book is to advocate and promote novel analysis methods that hold great promise for prognostic and diagnostic purposes in biomedical research.
On the way toward this goal, we face several principal problems that need to be addressed systematically [1, 2, 3]. Three of them are of particular importance. First, in contrast to traditional clinical data, data from high-throughput experiments are very high dimensional, involving thousands or even tens of thousands of variables. Usually, this requires dimension reduction or variable selection to tame the computational complexity of such high-dimensional problems. Second, because gene products depend on each other at the molecular level, there is a nonnegligible heterogeneity in these data, posing difficulties for parametric statistical models. For this reason, nonparametric methods, for example, bootstrap or other resampling methods, are frequently used. Third, it is increasingly common for high-throughput data from different technologies to be available simultaneously, which requires their meaningful integration.
According to the World Health Organization (WHO), cancer is one of the leading causes of death in developed countries. For this reason, this book focuses entirely on this menace to health, and the chapters discuss a large variety of methods applied to different cancer types. For example, investigations of breast cancer, cervical cancer, colorectal cancer, lung cancer, leukemia, lymphoma, melanoma, ovarian cancer, and prostate cancer are presented in a way that highlights the genetic and molecular understanding gained of these complex diseases while also providing a thorough explanation of the statistical methods.
This book is intended for researchers, graduate students, and advanced undergraduate students in the interdisciplinary fields of computational biology, biostatistics, bioinformatics, and systems biology who study problems in the biological and biomedical sciences. Each chapter is comprehensively presented and accessible not only to researchers from this field but also to interested students or scientists specialized in related areas. To enable this, each chapter presents not only technical results but also the background knowledge necessary to understand the statistical method or the biological problem under consideration. In addition, each chapter starts with a section called “Brief Summary” and finishes with a section called “Perspective.” These sections are nontechnical in nature and provide the reader with a brief overview of the presented topic. These features allow the book to be used as a textbook, for example, for an interdisciplinary seminar for advanced students.
In Figure 1, we show an overview of all chapters in this book. Due to the complexity of general approaches to cancer, it is not possible to categorize the chapters uniquely by just one keyword. For this reason, we provide in Figure 1 a three-dimensional categorization, which is based on (1) the data types used, (2) the statistical and computational methods, and (3) the cancer types studied. For each of these conceptual categories, we use a color code, as provided in Figure 1. In addition, the book is organized in four parts. In the first part, the chapters present a general overview of generic methods and data used in the remainder of the book. The second part focuses on Bayesian methods and the third part on network-based approaches. Finally, part four contains chapters describing the influence of DNA copy number aberrations on the phenotype. This conceptual overview may help the reader to quickly find a specific chapter that deals with a particular subset of cancer types or statistical methods.
Figure 1 Brief overview of the book chapters with respect to the three major conceptual topics: high-throughput data types (a), statistical and computational methods (b), and cancer types (c).
Many colleagues, whether consciously or unconsciously, have provided us with input, help, and support before and during the preparation of the present book. In particular, we would like to thank Andreas Albrecht, Gökmen Altay, Subhash Basak, Jaine Blayney, Danail Bonchev, Frederick Campbell, Aedin Culhane, Maria Duca, Dean Fennell, Galina Glazko, Armin Graber, Beryl Graham, Benjamin Haibe-Kains, Peter Hamilton, Des Higgins, Maria Hughes, Patrick Johnston, Frank Kee, Declan Kieran, Chang Sik Kim, Terry Lappin, Kang Li, D. D. Lozovanu, Florian Markowetz, Darragh McArt, Dennis McCance, James McCann, Abbe Mowshowitz, Ken Mills, Paul Mullan, Arcady Mushegian, Katie Orr, Andrei Perjan, John Quackenbush, Andre Ribeiro, Bert Rima, Sudhakar Sahoo, Ricardo de Matos Simoes, Francesca Shearer, John Story, Simon Tavaré, Shailesh Tripathi, Peter Valent, Kurt Varmuza, Yinhai Wang, Kathleen Williamson, and Shu-Dong Zhang, and we apologize to all whom we have mistakenly failed to name. We would also like to thank our editors Andreas Sendtko and Gregor Cicchetti from Wiley-Blackwell, who have always been available and helpful.
Finally, we hope this book helps to spread the enthusiasm and joy we have for this field and inspires readers in their own practical or theoretical research problems.
Belfast and Hall/Tyrol, May 2012
Frank Emmert-Streib and Matthias Dehmer
References
1. Alon, U. (2006) An Introduction to Systems Biology: Design Principles of Biological Circuits, Chapman & Hall/CRC, Boca Raton, FL.
2. Barabasi, A. L. and Oltvai, Z. N. (2004) Network biology: Understanding the cell's functional organization. Nat. Rev. Genet., 5, 101–113.
3. von Bertalanffy, L. (1950) An outline of general systems theory. Brit. J. Philos. Sci., 1 (2), 134–165.
4. Emmert-Streib, F. and Glazko, G. (2011) Network biology: a direct approach to study biological function. WIREs Syst. Biol. Med., 3 (4), 379–391.
5. Palsson, B.O. (2006) Systems Biology: Properties of Reconstructed Networks, Cambridge University Press, Cambridge, UK.
List of Contributors
Yang Aijun
Nanjing Audit University
School of Finance
Jiangsu, 211815
China
Nidhal Bouaynaya
University of Arkansas at Little Rock
Department of Systems Engineering
ETAS 300 H., 2801 S. University Ave.
Little Rock, AR 72204
USA
Mark R. Chance
Case Western Reserve University
Center for Proteomics and Bioinformatics
10900 Euclid Ave., BRB 930
Cleveland, OH 44106-4988
USA
Sreenivas Chavali
MRC Laboratory of Molecular Biology
Hills Road
Cambridge CB2 0QH
UK
Li Chen
Johns Hopkins University
Department of Pathology
School of Medicine
1550 Orleans Street
Baltimore, MD 21231
USA
Matthias Dehmer
UMIT
Institut für Bioinformatik und Translationale Forschung
Eduard Wallnöfer Zentrum 1
6060 Hall/Tyrol
Austria
Ricardo de Matos Simoes
Max F. Perutz Laboratories
Center for Integrative Bioinformatics Vienna
Dr. Bohr Gasse 9
1030 Vienna
Austria
Frank Emmert-Streib
Queen's University Belfast
Computational Biology and Machine Learning Lab
Center for Cancer Research and Cell Biology
97 Lisburn Road
Belfast BT9 7BL
UK
Hassan M. Fathallah-Shaykh
University of Alabama at Birmingham
Department of Neurology School of Medicine
FOT 1020, 510 20th St South
Birmingham, AL 35294-3410
USA
Fei Gu
The Ohio State University
Department of Biomedical Informatics
Columbus, OH 43210
USA
D. Neil Hayes
University of North Carolina at Chapel Hill
UNC Lineberger Comprehensive Cancer Center
School of Medicine CB#7295
450 West Drive
Chapel Hill, NC 27599-7295
USA
Victor X. Jin
The Ohio State University
Department of Biomedical Informatics
Columbus, OH 43210
USA
Kartiek Kanduri
Turku Centre for Biotechnology
Turku
Finland
Devin C. Koestler
Dartmouth Medical School One Medical Center Drive
Section of Biostatistics & Epidemiology
7927 Rubin Building
Lebanon, NH 03756
USA
Song Liu
University at Buffalo
Department of Biostatistics
723 Kimball Tower
New York, NY 14214
USA
Shigeyuki Matsui
Kyoto University School of Public Health
Department of Biostatistics
Yoshida Konoe-cho, Sakyo-ku
Kyoto, 606-8501
Japan
Jeffrey Miecznikowski
University at Buffalo
Department of Biostatistics
723 Kimball Tower
New York, NY 14214
USA
David J. Miller
Johns Hopkins University
Department of Pathology
School of Medicine
1550 Orleans Street
Baltimore, MD 21231
USA
Andrew B. Nobel
University of North Carolina at Chapel Hill
Department of Statistics and Operations Research
Hanes Hall, CB#3260
Chapel Hill, NC 27599-3260
USA
Hisashi Noma
Kyoto University School of Public Health
Department of Biostatistics
Yoshida Konoe-cho, Sakyo-ku
Kyoto, 606-8501
Japan
Vishal N. Patel
Case Western Reserve University
Center for Proteomics and Bioinformatics
10900 Euclid Ave., BRB 930
Cleveland, OH 44106-4988
USA
Dan Schonfeld
UIC College of Engineering
Electrical and Computer Engineering
851 S. Morgan M/C 154
Chicago, IL 60607
USA
Ie-Ming Shih
Johns Hopkins University
Department of Pathology
School of Medicine
1550 Orleans Street
Baltimore, MD 21231
USA
Roman Shterenberg
University of Alabama at Birmingham
Department of Mathematics
452 Campbell Hall 1300 University Boulevard
Birmingham, AL 35294-1170
USA
Lincoln Stein
Stony Brook University
Department of Biomedical Engineering
Stony Brook, NY 11794
USA
Binhua Tang
The Ohio State University
Department of Biomedical Informatics
Columbus, OH 43210
USA
Ye Tian
Virginia Tech Research Center – Arlington
The Bradley Department of Electrical and Computer Engineering
900 N. Glebe Road
Arlington, VA 22201
USA
Shailesh Tripathi
Queen's University Belfast
Computational Biology and Machine Learning Lab
Center for Cancer Research and Cell Biology
97 Lisburn Road
Belfast BT9 7BL
UK
Aad W. van der Vaart
Vrije Universiteit
Department of Mathematics
Faculty of Sciences
De Boelelaan 1081a
1081 HV Amsterdam
The Netherlands
Wessel N. van Wieringen
Vrije Universiteit
Department of Mathematics
Faculty of Sciences
De Boelelaan 1081a
1081 HV Amsterdam
The Netherlands
Vonn Walter
University of North Carolina at Chapel Hill
UNC Lineberger Comprehensive Cancer Center
School of Medicine CB#7295
450 West Drive
Chapel Hill, NC 27599-7295
USA
Dan Wang
University at Buffalo
Department of Biostatistics
723 Kimball Tower
New York, NY 14214
USA
Yue Wang
Virginia Tech
Virginia Tech Research Center – Arlington
The Bradley Department of Electrical and Computer Engineering
900 N. Glebe Road
Arlington, VA 22201
USA
Fred A. Wright
University of North Carolina at Chapel Hill
Department of Biostatistics
4115B McGavran-Greenberg
135 Dauer Drive, Campus Box 7420
Chapel Hill, NC 27599-7420
USA
Guanming Wu
MaRS Centre
Ontario Institute for Cancer Research
South Tower
101 College Street, Suite 800
Toronto, ON M5G 0A3
Canada
Song Xinyuan
The Chinese University of Hong Kong
Department of Statistics
Hong Kong SAR
The People's Republic of China
Ka Yee Yeung
University of Washington
Department of Microbiology
Seattle, WA 98195-8070
USA
Guoqiang Yu
Johns Hopkins University
Department of Pathology
School of Medicine
1550 Orleans Street
Baltimore, MD 21231
USA
Li Yunxian
Yunnan University of Economics and Finance
School of Finance
Yunnan
China
Part One
General Overview
1
Control of Type I Error Rates for Oncology Biomarker Discovery with High-Throughput Platforms
Jeffrey Miecznikowski, Dan Wang, and Song Liu
This chapter provides an overview of the genetic and proteomic high-throughput platforms and the statistical methods used to evaluate molecular biomarkers for cancer diagnosis. These experimental platforms are commonly used in cancer diagnosis, where the biomarkers can help to determine cancer subtypes and thus potential treatments. Because of the large amount of data from these platforms, accurate testing methods are necessary. In this chapter, we highlight the statistical methods used to evaluate each potential biomarker and to limit the number of false positives under a specific error rate.
Since the invention of microarray technology and related high-throughput technologies, researchers have been able to compile large amounts of information. This information enables researchers to uncover potentially new targets for therapies or to enhance our knowledge of biological systems. These high-throughput platforms have become commonly used experimental platforms in the biological realm [1]. A high-throughput platform is designed to measure large numbers (thousands or millions) of signatures in a biological organism at a given time point. These platforms are a product of the postgenomic era and are often used to determine how genomic expression is regulated or involved in biological processes. They often rely on hybridization- and sequence-based technologies such as gene expression microarrays and RNA-Seq platforms.
Specifically, these platforms and technologies have revolutionized the way researchers study cancer, especially with regard to diagnosis and prognosis. Current cancer classification comprises more than 200 subtypes of cancer [2]. For a patient to receive the most appropriate therapy, the clinician must identify the cancer subtype, stage, and/or grade as accurately as possible. Clinicians commonly use morphologic characteristics of biopsy specimens, but “it gives very limited information and clearly misses much important tumor aspects such as rate of proliferation, capacity for invasion and metastases, and development of resistance mechanisms to certain treatment agents” [3]. Therefore, new molecular diagnostic methods are needed to improve these classification methods. A major advantage of high-throughput platforms is the huge amount of molecular information that can be extracted and integrated to find common patterns. These new technologies will allow researchers to enhance cancer diagnostics by (1) classifying tumor samples into known and new taxonomic categories, (2) discovering new diagnostic and therapeutic markers, and (3) identifying new subtypes that correlate with treatment outcome.