160,99 €
This book represents the most comprehensive and up-to-date collection of information on the topic of computational molecular biology. Bringing the most recent research into the forefront of discussion, Algorithms in Computational Molecular Biology studies the most important and useful algorithms currently being used in the field, and provides related problems. It also succeeds where other titles have failed, in offering a wide range of information from the introductory fundamentals right up to the latest, most advanced levels of study.
Sie lesen das E-Book in den Legimi-Apps auf:
Seitenzahl: 1927
Veröffentlichungsjahr: 2011
CONTENTS
COVER
HALF TITLE PAGE
TITLE PAGE
COPYRIGHT PAGE
DEDICATION
PREFACE
CONTRIBUTORS
SERIES PAGE
PART I: STRINGS PROCESSING AND APPLICATION TO BIOLOGICAL SEQUENCES
CHAPTER 1: STRING DATA STRUCTURES FOR COMPUTATIONAL MOLECULAR BIOLOGY
1.1 INTRODUCTION
1.2 MAIN STRING INDEXING DATA STRUCTURES
1.3 INDEX STRUCTURES FOR WEIGHTED STRINGS
1.4 INDEX STRUCTURES FOR INDETERMINATE STRINGS
1.5 STRING DATA STRUCTURES IN MEMORY HIERARCHIES
1.6 CONCLUSIONS
REFERENCE
CHAPTER 2: EFFICIENT RESTRICTED-CASE ALGORITHMS FOR PROBLEMS IN COMPUTATIONAL BIOLOGY
2.1 THE NEED FOR SPECIAL CASES
2.2 ASSESSING EFFICIENT SOLVABILITY OPTIONS FOR GENERAL PROBLEMS AND SPECIAL CASES
2.3 STRING AND SEQUENCE PROBLEMS
2.4 SHORTEST COMMON SUPERSTRING
2.5 LONGEST COMMON SUBSEQUENCE
2.6 COMMON APPROXIMATE SUBSTRING
2.7 CONCLUSION
REFERENCES
CHAPTER 3: FINITE AUTOMATA IN PATTERN MATCHING
3.1 INTRODUCTION
3.2 DIRECT USE OF DFA IN STRINGOLOGY
3.3 NFA SIMULATION
3.4 FINITE AUTOMATON AS MODEL OF COMPUTATION
3.5 FINITE AUTOMATA COMPOSITION
3.6 SUMMARY
REFERENCES
CHAPTER 4: NEW DEVELOPMENTS IN PROCESSING OF DEGENERATE SEQUENCES
4.1 INTRODUCTION
4.2 BACKGROUND
4.3 BASIC DEFINITIONS
4.4 REPETITIVE STRUCTURES IN DEGENERATE STRINGS
4.5 CONSERVATIVE STRING COVERING IN DEGENERATE STRINGS
4.6 CONCLUSION
REFERENCE
CHAPTER 5: EXACT SEARCH ALGORITHMS FOR BIOLOGICAL SEQUENCES
5.1 INTRODUCTION
5.2 SINGLE PATTERN MATCHING ALGORITHMS
5.3 ALGORITHMS FOR MULTIPLE PATTERNS
5.4 APPLICATION OF EXACT SET PATTERN MATCHING FOR READ MAPPING AND COMPARISON WITH MAPPING TOOLS
5.5 CONCLUSIONS
REFERENCE
CHAPTER 6: ALGORITHMIC ASPECTS OF ARC-ANNOTATED SEQUENCES
6.1 INTRODUCTION
6.2 PRELIMINARIES
6.3 LONGEST ARC-PRESERVING COMMON SUBSEQUENCE
6.4 ARC-PRESERVING SUBSEQUENCE
6.5 MAXIMUM ARC-PRESERVING COMMON SUBSEQUENCE
6.6 EDIT DISTANCE
REFERENCE
CHAPTER 7: ALGORITHMIC ISSUES IN DNA BARCODING PROBLEMS
7.1 INTRODUCTION
7.2 TEST SET PROBLEMS: A GENERAL FRAMEWORK FOR SEVERAL BARCODING PROBLEMS
7.3 A SYNOPSIS OF BIOLOGICAL APPLICATIONS OF BARCODING
7.4 SURVEY OF ALGORITHMIC TECHNIQUES ON BARCODING
7.5 INFORMATION CONTENT APPROACH
7.6 SET-COVERING APPROACH
7.7 EXPERIMENTAL RESULTS AND SOFTWARE AVAILABILITY
7.8 CONCLUDING REMARKS
ACKNOWLEDGMENTS
REFERENCES
CHAPTER 8: RECENT ADVANCES IN WEIGHTED DNA SEQUENCES
8.1 INTRODUCTION
8.2 PRELIMINARIES
8.3 INDEXING
8.4 PATTERN MATCHING
8.5 APPROXIMATE PATTERN MATCHING
8.6 REPETITIONS, COVERS, AND TANDEM REPEATS
8.7 MOTIF DISCOVERY
8.8 CONCLUSIONS
REFERENCES
CHAPTER 9: DNA COMPUTING FOR SUBGRAPH ISOMORPHISM PROBLEM AND RELATED PROBLEMS
9.1 INTRODUCTION
9.2 DEFINITIONS OF SUBGRAPH ISOMORPHISM PROBLEM AND RELATED PROBLEMS
9.3 DNA COMPUTING MODELS
9.4 THE STICKER-BASED SOLUTION SPACE
9.5 ALGORITHMS FOR SOLVING PROBLEMS
9.6 EXPERIMENTAL DATA
9.7 CONCLUSION
REFERENCE
PART II: ANALYSIS OF BIOLOGICAL SEQUENCES
CHAPTER 10: GRAPHS IN BIOINFORMATICS
10.1 GRAPH THEORY—ORIGIN
10.2 GRAPHS AND THE BIOLOGICAL WORLD
10.3 CONCLUSION
REFERENCES
CHAPTER 11: A FLEXIBLE DATA STORE FOR MANAGING BIOINFORMATICS DATA
11.1 INTRODUCTION
11.2 DATA MODEL AND SYSTEM OVERVIEW
11.3 REPLICATION AND LOAD BALANCING
11.4 EVALUATION
11.5 RELATED WORK
11.6 SUMMARY
REFERENCE
CHAPTER 12: ALGORITHMS FOR THE ALIGNMENT OF BIOLOGICAL SEQUENCES
12.1 INTRODUCTION
12.2 ALIGNMENT ALGORITHMS
12.3 SCORE FUNCTIONS
12.4 BENCHMARKS
12.5 CONCLUSION
ACKNOWLEDGMENTS
REFERENCES
CHAPTER 13: ALGORITHMS FOR LOCAL STRUCTURAL ALIGNMENT AND STRUCTURAL MOTIF IDENTIFICATION
13.1 INTRODUCTION
13.2 PROBLEM DEFINITION OF LOCAL STRUCTURAL ALIGNMENT
13.3 VARIABLE-LENGTH ALIGNMENT FRAGMENT PAIR (VLAFP) ALGORITHM
13.4 STRUCTURAL ALIGNMENT BASED ON CENTER OF GRAVITY: SACG
13.5 SEARCHING STRUCTURAL MOTIFS
13.6 USING SACG ALGORITHM FOR CLASSIFICATION OF NEW PROTEIN STRUCTURES
13.7 EXPERIMENTAL RESULTS
13.8 ACCURACY RESULTS
13.9 CONCLUSION
ACKNOWLEDGMENTS
REFERENCE
CHAPTER 14: EVOLUTION OF THE CLUSTAL FAMILY OF MULTIPLE SEQUENCE ALIGNMENT PROGRAMS
14.1 INTRODUCTION
14.2 CLUSTAL-CLUSTALV
14.3 CLUSTALW
14.4 CLUSTALX
14.5 CLUSTALW AND CLUSTALX 2.0
14.6 DBCLUSTAL
14.7 PERSPECTIVES
REFERENCES
CHAPTER 15: FILTERS AND SEEDS APPROACHES FOR FAST HOMOLOGY SEARCHES IN LARGE DATASETS
15.1 INTRODUCTION
15.2 METHODS FRAMEWORK
15.3 LOSSLESS FILTERS
15.4 LOSSY SEED-BASED FILTERS
15.5 CONCLUSION
15.6 ACKNOWLEDGMENTS
REFERENCES
CHAPTER 16: NOVEL COMBINATORIAL AND INFORMATION-THEORETIC ALIGNMENT-FREE DISTANCES BIOLOGICAL DATA MINING
16.1 INTRODUCTION
16.2 INFORMATION-THEORETIC ALIGNMENT-FREE METHODS
16.3 COMBINATORIAL ALIGNMENT-FREE METHODS
16.4 ALIGNMENT-FREE COMPOSITIONAL METHODS
16.5 ALIGNMENT-FREE EXACT WORD MATCHES METHODS
16.6 DOMAINS OF BIOLOGICAL APPLICATION
16.7 DATASETS AND SOFTWARE FOR EXPERIMENTAL ALGORITHMICS
16.8 CONCLUSIONS
REFERENCES
CHAPTER 17: IN SILICO METHODS FOR THE ANALYSIS OF METABOLITES AND DRUG MOLECULES
17.1 INTRODUCTION
17.2 MOLECULAR DESCRIPTORS
17.3 DATABASES
17.4 METHODS AND DATA ANALYSIS ALGORITHMS
17.5 CONCLUSIONS
ACKNOWLEDGMENTS
REFERENCES
PART III: MOTIF FINDING AND STRUCTURE PREDICTION
CHAPTER 18: MOTIF FINDING ALGORITHMS IN BIOLOGICAL SEQUENCES
18.1 INTRODUCTION
18.2 PRELIMINARIES
18.3 THE PLANTED (l, d)-MOTIF PROBLEM
18.4 THE EXTENDED (l, d)-MOTIF PROBLEM
18.5 THE EDITED MOTIF PROBLEM
18.6 THE SIMPLE MOTIF PROBLEM
18.7 CONCLUSION
REFERENCE
CHAPTER 19: COMPUTATIONAL CHARACTERIZATION OF REGULATORY REGIONS
19.1 THE GENOME REGULATORY LANDSCAPE
19.2 QUALITATIVE MODELS OF REGULATORY SIGNALS
19.3 QUANTITATIVE MODELS OF REGULATORY SIGNALS
19.4 DETECTION OF DEPENDENCIES IN SEQUENCES
19.5 REPOSITORIES OF REGULATORY INFORMATION
19.6 USING PREDICTIVE MODELS TO ANNOTATE SEQUENCES
19.7 COMPARATIVE GENOMICS CHARACTERIZATION
19.8 SEQUENCE COMPARISONS
19.9 COMBINING MOTIFS AND ALIGNMENTS
19.10 EXPERIMENTAL VALIDATION
19.11 SUMMARY
REFERENCES
CHAPTER 20: ALGORITHMIC ISSUES IN THE ANALYSIS OF CHIP-SEQ DATA
20.1 INTRODUCTION
20.2 MAPPING SEQUENCES ON THE GENOME
20.3 IDENTIFYING SIGNIFICANTLY ENRICHED REGIONS
20.4 DERIVING ACTUAL TRANSCRIPTION FACTOR BINDING SITES
20.5 CONCLUSIONS
REFERENCES
CHAPTER 21: APPROACHES AND METHODS FOR OPERON PREDICTION BASED ON MACHINE LEARNING TECHNIQUES
21.1 INTRODUCTION
21.2 DATASETS, FEATURES, AND PREPROCESSES FOR OPERON PREDICTION
21.3 MACHINE LEARNING PREDICTION METHODS FOR OPERON PREDICTION
21.4 CONCLUSIONS
21.5 ACKNOWLEDGMENTS
REFERENCES
CHAPTER 22: PROTEIN FUNCTION PREDICTION WITH DATA-MINING TECHNIQUES
22.1 INTRODUCTION
22.2 PROTEIN ANNOTATION BASED ON SEQUENCE
22.3 PROTEIN ANNOTATION BASED ON PROTEIN STRUCTURE
22.4 PROTEIN FUNCTION PREDICTION BASED ON GENE EXPRESSION DATA
22.5 PROTEIN FUNCTION PREDICTION BASED ON PROTEIN INTERACTOME MAP
22.6 PROTEIN FUNCTION PREDICTION BASED ON DATA INTEGRATION
22.7 CONCLUSIONS AND PERSPECTIVES
REFERENCES
CHAPTER 23: PROTEIN DOMAIN BOUNDARY PREDICTION
23.1 INTRODUCTION
23.2 PROFILING TECHNIQUE
23.3 RESULTS
23.4 DISCUSSION
23.5 CONCLUSIONS
REFERENCES
CHAPTER 24: AN INTRODUCTION TO RNA STRUCTURE AND PSEUDOKNOT PREDICTION
24.1 INTRODUCTION
24.2 RNA SECONDARY STRUCTURE PREDICTION
24.3 RNA PSEUDOKNOTS
24.4 CONCLUSIONS
REFERENCE
PART IV: PHYLOGENY RECONSTRUCTION
CHAPTER 25: PHYLOGENETIC SEARCH ALGORITHMS FOR MAXIMUM LIKELIHOOD
25.1 INTRODUCTION
25.2 COMPUTING THE LIKELIHOOD
25.3 ACCELERATING THE PLF BY ALGORITHMIC MEANS
25.4 ALIGNMENT SHAPES
25.5 GENERAL SEARCH HEURISTICS
25.6 COMPUTING THE ROBINSON FOULDS DISTANCE
25.7 CONVERGENCE CRITERIA
25.8 FUTURE DIRECTIONS
REFERENCES
CHAPTER 26: HEURISTIC METHODS FOR PHYLOGENETIC RECONSTRUCTION WITH MAXIMUM PARSIMONY
26.1 INTRODUCTION
26.2 DEFINITIONS AND FORMAL BACKGROUND
26.3 METHODS
26.4 CONCLUSION
REFERENCES
CHAPTER 27: MAXIMUM ENTROPY METHOD FOR COMPOSITION VECTOR METHOD
27.1 INTRODUCTION
27.2 MODELS AND ENTROPY OPTIMIZATION
27.3 APPLICATION AND DICUSSION
27.4 CONCLUDING REMARKS
REFERENCES
PART V: MICROARRAY DATA ANALYSIS
CHAPTER 28: MICROARRAY GENE EXPRESSION DATA ANALYSIS
28.1 INTRODUCTION
28.2 DNA MICROARRAY TECHNOLOGY AND EXPERIMENT
28.3 IMAGE ANALYSIS AND EXPRESSION DATA EXTRACTION
28.4 DATA PROCESSING
28.5 MISSING VALUE IMPUTATION
28.6 TEMPORAL GENE EXPRESSION PROFILE ANALYSIS
28.7 CYCLIC GENE EXPRESSION PROFILES DETECTION
28.8 SUMMARY
ACKNOWLEDGMENTS
REFERENCES
CHAPTER 29: BICLUSTERING OF MICROARRAY DATA
29.1 INTRODUCTION
29.2 TYPES OF BICLUSTERS
29.3 GROUPS OF BICLUSTERS
29.4 EVALUATION FUNCTIONS
29.5 SYSTEMATIC AND STOCHASTIC BICLUSTERING ALGORITHMS
29.6 BIOLOGICAL VALIDATION
29.7 CONCLUSION
REFERENCES
CHAPTER 30: COMPUTATIONAL MODELS FOR CONDITION-SPECIFIC GENE AND PATHWAY INFERENCE
30.1 INTRODUCTION
30.2 CONDITION-SPECIFIC PATHWAY IDENTIFICATION
30.3 DISEASE GENE PRIORITIZATION AND GENETIC PATHWAY DETECTION
30.4 MODULE NETWORKS
30.5 SUMMARY
ACKNOWLEDGEMENTS
REFERENCES
CHAPTER 31: HETEROGENEITY OF DIFFERENTIAL EXPRESSION IN CANCER STUDIES: ALGORITHMS AND METHODS
31.1 INTRODUCTION
31.2 NOTATIONS
31.3 DIFFERENTIAL MEAN OF EXPRESSION
31.4 DIFFERENTIAL VARIABILITY OF EXPRESSION
31.5 DIFFERENTIAL EXPRESSION IN COMPENDIUM OF TUMORS
31.6 DIFFERENTIAL EXPRESSION BY CHROMOSOMAL ABERRATIONS: THE LOCAL PROPERTIES
31.7 DIFFERENTIAL EXPRESSION IN GENE INTERACTOME
31.8 DIFFERENTIAL COEXPRESSION: GLOBAL MULTIDIMENSIONAL INTERACTOME
ACKNOWLEDGMENTS
REFERENCES
PART VI: ANALYSIS OF GENOMES
CHAPTER 32: COMPARATIVE GENOMICS: ALGORITHMS AND APPLICATIONS
32.1 INTRODUCTION
32.2 NOTATIONS
32.3 ORTHOLOG ASSIGNMENT
32.4 GENE CLUSTER AND SYNTENY DETECTION
32.5 CONCLUSIONS
REFERENCES
CHAPTER 33: ADVANCES IN GENOME REARRANGEMENT ALGORITHMS
33.1 INTRODUCTION
33.2 PRELIMINARIES
33.3 SORTING BY REVERSALS
33.4 SORTING BY TRANSPOSITIONS
33.5 OTHER OPERATIONS
33.6 SORTING BY MORE THAN ONE OPERATION
33.7 FUTURE RESEARCH DIRECTIONS
33.8 NOTES ON SOFTWARE
REFERENCES
CHAPTER 34: COMPUTING GENOMIC DISTANCES : AN ALGORITHMIC VIEWPOINT
34.1 INTRODUCTION
34.2 INTERVAL-BASED CRITERIA
34.3 CHARACTER-BASED CRITERIA
34.4 CONCLUSION
REFERENCE
CHAPTER 35: WAVELET ALGORITHMS FOR DNA ANALYSIS
35.1 INTRODUCTION
35.2 DNA REPRESENTATION
35.3 STATISTICAL CORRELATIONS IN DNA
35.4 WAVELET ANALYSIS
35.5 HAAR WAVELET COEFFICIENTS AND STATISTICAL PARAMETERS
35.6 ALGORITHM OF THE SHORT HAAR DISCRETE WAVELET TRANSFORM
35.7 CLUSTERS OF WAVELET COEFFICIENTS
35.8 CONCLUSION
REFERENCE
CHAPTER 36: HAPLOTYPE INFERENCE MODELS AND ALGORITHMS
36.1 INTRODUCTION
36.2 PROBLEM STATEMENT AND NOTATIONS
36.3 COMBINATORIAL METHODS
36.4 STATISTICAL METHODS
36.5 PEDIGREE METHODS
36.6 EVALUATION
36.7 DISCUSSION
REFERENCES
PART VII: ANALYSIS OF BIOLOGICAL NETWORKS
CHAPTER 37: UNTANGLING BIOLOGICAL NETWORKS USING BIOINFORMATICS
37.1 INTRODUCTION
37.2 TYPES OF BIOLOGICAL NETWORKS
37.3 NETWORK DYNAMIC, EVOLUTION AND DISEASE
37.4 FUTURE CHALLENGES AND SCOPE
ACKNOWLEDGMENTS
REFERENCES
CHAPTER 38: PROBABILISTIC APPROACHES FOR INVESTIGATING BIOLOGICAL NETWORKS
38.1 PROBABILISTIC MODELS FOR BIOLOGICAL NETWORKS
38.2 INTERPRETATION AND QUANTITATIVE ANALYSIS OF PROBABILISTIC MODELS
38.3 CONCLUSION
ACKNOWLEDGMENTS
REFERENCE
CHAPTER 39: MODELING AND ANALYSIS OF BIOLOGICAL NETWORKS WITH MODEL CHECKING
39.1 INTRODUCTION
39.2 PRELIMINARIES
39.3 ANALYZING GENETIC NETWORKS WITH MODEL CHECKING
39.4 PROBABILISTIC MODEL CHECKING FOR BIOLOGICAL SYSTEMS
REFERENCES
APPENDIX
CHAPTER 40: REVERSE ENGINEERING OF MOLECULAR NETWORKS FROM A COMMON COMBINATORIAL APPROACH
40.1 INTRODUCTION
40.2 REVERSE-ENGINEERING OF BIOLOGICAL NETWORKS
40.3 CLASSICAL COMBINATORIAL ALGORITHMS: A CASE STUDY
40.4 CONCLUDING REMARKS
ACKNOWLEDGMENTS
REFERENCES
CHAPTER 41: UNSUPERVISED LEARNING FOR GENE REGULATION NETWORK INFERENCE FROM EXPRESSION DATA: A REVIEW
41.1 INTRODUCTION
41.2 GENE NETWORKS: DEFINITION AND PROPERTIES
41.3 GENE EXPRESSION: DATA AND ANALYSIS
41.4 NETWORK INFERENCE AS AN UNSUPERVISED LEARNING PROBLEM
41.5 CORRELATION-BASED METHODS
41.6 PROBABILISTIC GRAPHICAL MODELS
41.7 CONSTRAINT-BASED DATA MINING
41.8 VALIDATION
41.9 CONCLUSION AND PERSPECTIVES
REFERENCES
CHAPTER 42: APPROACHES TO CONSTRUCTION AND ANALYSIS OF MICRORNA-MEDIATED NETWORKS
42.1 INTRODUCTION
42.2 FUNDAMENTAL COMPONENT INTERACTION RESEARCH: PREDICTING MIRNA GENES, REGULATORS, AND TARGETS
42.3 IDENTIFYING MIRNA-MEDIATED NETWORKS
42.4 GLOBAL AND LOCAL ARCHITECTURE ANALYSIS IN MIRNA-CONTAINING NETWORKS
42.5 CONCLUSION
REFERENCES
INDEX
ALGORITHMS IN COMPUTATIONAL MOLECULAR BIOLOGY
Wiley Series on
Bioinformatics: Computational Techniques and Engineering
A complete list of the titles in this series appears at the end of this volume.3
Copyright © 2011 by John Wiley & Sons, Inc. All rights reserved
Published by John Wiley & Sons, Inc., Hoboken, New JerseyPublished simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and the author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor the author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information about our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data is available.
ISBN: 978-0-470-50519-9
To our families, for their patience and support.
PREFACE
Computational molecular biology has emerged from the Human Genome Project as an important discipline for academic research and industrial application. The exponential growth of the size of biological databases, the complexity of biological problems, and the necessity to deal with errors in biological sequences require the development of fast, low-memory requirement and high-performance algorithms. This book is a forum of such algorithms, based on new/improved approaches and/or techniques. Most of the current books on algorithms in computational molecular biology either lack technical depth or focus on specific narrow topics. This book is the first overview on algorithms in computational molecular biology with both a wide coverage of this field and enough depth to be of practical use to working professionals. It surveys the most recent developments, offering enough fundamental and technical information on these algorithms and the related problems without overloading the reader. So, this book endeavors to strike a balance between theoretical and practical coverage of a wide range of issues in computational molecular biology. Of course, the list of topics that is explored in this book is not exhaustive, but it is hoped that the topics covered will get the reader to think of the implications of the presented algorithms on the developments in his/her own field. The material included in this book was carefully chosen for quality and relevance. This book also presents a mixture of experiments and simulations that provide not only qualitative but also quantitative insights into the rich field of computational molecular biology. It is hoped that this book will increase the interest of the algorithmics community in studying a wider range of combinatorial problems that originate in computational molecular biology. This should enable researchers to deal with more complex issues and richer data sets.
Ideally, the reader of this book should be someone who is familiar with computational molecular biology and would like to learn more about algorithms that deal with the most studied, the most important, and/or the newest topics in the field of computational molecular biology. However, this book could be used by a wider audience such as graduate students, senior undergraduate students, researchers, instructors, and practitioners in computer science, life science, and mathematics. We have tried to make the material of this book self-contained so that the reader would not have to consult a lot of external references. Thus, the reader of this book will certainly find what he/she is looking for or at least a clue that will help to make an advance in his/her research. This book is quite timely, because the field of computational molecular biology as a whole is undergoing many changes, and will be of a great use to the reader.
This book is organized into seven parts: Strings Processing and Application to Biological Sequences, Analysis of Biological Sequences, Motif Finding and Structure Prediction, Phylogeny Reconstruction, Microarray Data Analysis, Analysis of Genomes, and Analysis of Biological Networks. The 42 chapters, that make up the seven parts of this book, were carefully selected to provide a wide scope with minimal overlap between the chapters in order to reduce duplication. Each contributor was asked that his/her chapter should cover review material as well as current developments. In addition, we selected authors who are leaders in their respective fields.
MOURAD ELLOUMI AND ALBERT Y. ZOMAYA
CONTRIBUTORS
Bassam A. Alqaralleh Faculty of IT, Al-Hussien Bin Talal University, Jordan.
Srinivas Aluru Department of Electrical and Computer Engineering, Iowa State University, Ames, IA, USA; and Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Mumbai, India.
Mohamed Radhouene Aniba Institute of Genetics and Molecular and Cellular Biology, Illkirch, France.
Pavlos Antoniou Department of Computer Science, King’s College, London, UK.
Wassim Ayadi Unit of Technologies of Information and Communication (UTIC) and University of Tunis-El Manar, Tunisia.
Enrique Blanco Department of Genetics, Institute of Biomedicine of the University of Barcelona, Spain.
Guillaume Blin IGM, University Paris-Est, Champs-sur-Marne, Marne-la-Vallée, France.
Dragan Bosnacki Eindhoven University of Technology, The Netherlands.
Jérémie Bourdon LINA, University of Nantes and INRIA Rennes-Bretagne-Atlantique, France.
Carlo Cattani Department of Mathematics, University of Salerno, Italy.
Elsa Chacko Department of Chemistry and Biomolecular Sciences and ARC Centre of Excellence in Bioinformatics, Macquarie University, Sydney, Australia.
Raymond H. F. Chan Department of Mathematics, The Chinese University of Hong Kong, Shatin, Hong Kong, China.
Luonan Chen Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.
Hsin-Hung Chou Department of Information Management, Chang Jung Christian University, Tainan, Taiwan.
Manolis Christodoulakis Department of Electrical and Computer Engineering, University of Cyprus, Nicosia, Cyprus; and Department of Computer Science, King’s College London, London, UK.
Adrian Cootes Macquarie University, Sydney, Australia.
Maxime Crochemore IGM, University Paris-Est, Champs-sur-Marne, Marne-la-Vallée, France.
Bhaskar DasGupta Department of Computer Science, University of Illinois at Chicago, USA.
Amitava Datta School of Computer Science and Software Engineering, The University of Western Australia, Perth, Australia.
Erik P. de Vink Eindhoven University of Technology, The Netherlands.
Wei Du College of Computer Science and Technology, Jilin University, Changchun, China.
Mohamed Elati Institute of Systems and Synthetic Biology, Evry University - Genopole, Evry, France.
Mourad Elloumi Unit of Technologies of Information and Communication (UTIC) and University of Tunis-El Manar, Tunisia.
Chiara Epifanio Department of Mathematics and Applications, University of Palermo, Italy.
Patricia A. Evans Faculty of Computer Science, University of New Brunswick, Fredericton, Canada.
Damien Eveillard LINA, University of Nantes and INRIA Rennes-Bretagne-Atlantique, France.
Tarek El Falah Unit of Technologies of Information and Communication (UTIC) and University of Tunis-El Manar, Tunisia.
Guillaume Fertin LINA UMR CNRS 6241, University of Nantes, France.
Alessandra Gabriele Department of Mathematics and Applications, University of Palermo, Italy.
Jennifer Gamble Vascular Biology Laboratory, Centenary Institute, Sydney, Australia.
Xiangchao Gan The Wellcome Trust Center for Human Genetics, University of Oxford, UK.
Raffaele Giancarlo Department of Mathematics and Applications, University of Palermo, Italy.
Mathieu Giraud LIFL, University of Lille 1 and INRIA Lille - Nord Europe, Villeneuve d’Ascq, France.
Adrien Goëffon LERIA, University of Angers, France.
Jin-Kao Hao LERIA, University of Angers, France.
Masud Hasan Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology (BUET), Dhaka, Bangladesh.
Peter A. J. Hilbers Eindhoven University of Technology, The Netherlands.
Jan Holub Department of Theoretical Computer Science, Faculty of Information Technology, Czech Technical University in Prague, Czech Republic.
Sun-Yuan Hsieh Department of Computer Science and Information Engineering, Institute of Medical Informatics, Institute of Manufacturing Information and Systems, National Cheng Kung University, Tainan, Taiwan.
Chao-Wen Huang Department of Computer Science and Information Engineering, National Cheng Kung University Tainan, Taiwan.
Costas S. Iliopoulos Department of Computer Science, King’s College London, London, UK & Digital Ecosystems & Business Intelligence Institute, Curtin University, Perth, Australia.
Ming-Yang Kao Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL, USA
Radha Krishna Murthy Karuturi Computational and Systems Biology, Genome Institute of Singapore.
Varun Khanna Department of Chemistry and Biomolecular Sciences, and ARC Centre of Excellence in Bioinformatics, Macquarie University Sydney, Australia.
Gaurav Kumar Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, Australia
Vamsi Kundeti Department of Computer Science and Engineering, University of Connecticut, Storrs, USA.
Thierry Lecroq LITIS, University of Rouen, France.
Yanchun Liang College of Computer Science and Technology, Jilin University, Changchun, China.
Jana Sperschneider School of Computer Science and Software Engineering, The University of Western Australia, Perth, Australia.
Alan Wee-Chung Liew School of Information and Communication Technology, Griffith University, Australia.
Christos Makris Computer Engineering and Informatics Department, University of Patras, Rio, Greece.
Ion Mandoiu Computer Science & Engineering Department, University of Connecticut, Storrs, CT, USA.
Ronny S. Mans Eindhoven University of Technology, The Netherlands.
Ahmed Mokaddem Unit of Technologies of Information and Communication (UTIC) and University of Tunis-El Manar, Tunisia.
Giulio Pavesi Department of Biomolecular Sciences and Biotechnology, University of Milan, Italy.
Pierre Peterlongo INRIA Rennes Bretagne Atlantique, Campus de Beaulieu, Rennes, France.
Nadia Pisanti Dipartimento di Informatica, University of Pisa, Italy.
Yu-Qing Qiu Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.
Mohammed S. Rahman Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology (BUET), Dhaka, Bangladesh.
Sanguthevar Rajasekaran Department of Computer Science and Engineering, University of Connecticut, Storrs, USA.
Shoba Ranganathan Department of Chemistry and Biomolecular Sciences, and ARC Centre of Excellence in Bioinformatics, Macquarie University Sydney, Australia and Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore.
Jean-Michel Richer LERIA, University of Angers, France.
Eric Rivals LIRMM, University Montpellier 2, France.
Céline Rouveirol LIPN, UMR CNRS, Institute Galilée, University Paris-Nord, France.
Irena Rusu LINA UMR CNRS 6241, University of Nantes, France.
Leena Salmela Department of Computer Science, University of Helsinki, Finland.
Martin Schiller School of Life Sciences, University of Nevada Las Vegas, USA.
Marinella Sciortino Department of Mathematics and Applications, University of Palermo, Italy.
Eduardo Sontag Department of Mathematics, Rutgers, The State University of New Jersey, Piscataway, NJ, USA.
Jana Sperschneider School of Computer Science and Software Engineering, The University of Western Australia, Perth, Australia.
Alexandros Stamatakis The Exelixis Lab, Department of Computer Science, Technische Universität München, Germany.
Jorma Tarhio Department of Computer Science and Engineering, Aalto University, Espoo, Finland.
Evangelos Theodoridis Computer Engineering and Informatics Department, University of Patras, Rio, Greece.
Julie Thompson Institute of Genetics and Molecular and Cellular Biology, Illkirch, France.
Mathew Vadas Vascular Biology Laboratory, Centenary Institute, Sydney, Australia.
Paola Vera-Licona Institut Curie and INSERM, Paris, France.
Stéphane Vialette IGM, University Paris-Est, Champs-sur-Marne, Marne-la-Vallée, France.
Chen Wang CSIRO ICT Centre, Australia.
Roger W. Wang Department of Mathematics, The Chinese University of Hong Kong, Shatin, Hong Kong, China.
Shuqin Wang College of Computer Science and Technology, Jilin University, Changchun, China.
Yan Wang College of Computer Science and Technology, Jilin University, Changchun, China.
H. Todd Wareham Department of Computer Science, Memorial University of Newfoundland, St. John’s, Canada.
Jeff C. F. Wong Department of Mathematics, The Chinese University of Hong Kong, Shatin, Hong Kong, China.
Ling-Yun Wu Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.
Xiao Yang Department of Electrical and Computer Engineering, Bioinformatics and Computational Biology program, Iowa State University, Ames, IA, USA.
Paul D. Yoo School of information Technologies, The University of Sydney, Australia.
Federico Zambelli Department of Biomolecular Sciences and Biotechnology, University of Milan, Italy.
Chen Zhang College of Computer Science and Technology, Jilin University, Changchun, China.
Shihua Zhang Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.
Xiang-Sun Zhang Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.
Xing-Ming Zhao Institute of Systems Biology, Shanghai University, China.
Bing Bing Zhou School of information Technologies, The University of Sydney, Australia.
Chunguang Zhou College of Computer Science and Technology, Jilin University, Changchun, China.
You Zhou College of Computer Science and Technology, Jilin University, Changchun, China.
Albert Y. Zomaya School of Information Technologies, The University of Sydney, Australia.
Wiley Series on
Bioinformatics: Computational Techniques and Engineering
Bioinformatics and computational biology involve the comprehensive application of mathematics, statistics, and computer science to the understanding of living systems. Research and development in these areas require cooperation among specialists from the fields of biology, computer science, mathematics, statistics, physics, and related sciences. The objective of this book series is to provide timely treatments of the different aspects of bioinformatics spanning theory, new and established techniques, technologies and tools, and application domains. This series emphasizes algorithmic, mathematical, statistical, and computational methods that are central in bioinformatics and computational biology.
Series Editors: Professor Yi Pan and Professor Albert Y. Zomayapan@cs.gsu.edu and albert.zomaya@sydney.edu.au
Knowledge Discovery in Bioinformatics: Techniques, Methods, and ApplicationsXiaohua Hu and Yi Pan
Grid Computing for Bioinformatics and Computational BiologyEdited by El-Ghazali Talbi and Albert Y. Zomaya
Bioinformatics Algorithms: Techniques and ApplicationsIon Mandiou and Alexander Zelikovsky
Analysis of Biological NetworksEdited by Björn H. Junker and Falk Schreiber
Computational Intelligence and Pattern Analysis in Biological InformaticsEdited by Ujjwal Maulik, Sanghamitra Bandyopadhyay, and Jason T. L. Wang
Mathematics of Bioinformatics: Theory, Practice, and ApplicationsMatthew He and Sergey Petoukhov
Algorithms in Computational Molecular Biology: Techniques, Approaches and ApplicationsEdited by Mourad Elloumi and Albert Y. Zomaya
PART I
STRINGS PROCESSING AND APPLICATION TO BIOLOGICAL SEQUENCES
CHAPTER 1
STRING DATA STRUCTURES FOR COMPUTATIONAL MOLECULAR BIOLOGY
Christos Makris and Evangelos Theodoridis
1.1 INTRODUCTION
The topic of the chapter is string data structures with applications in the field of computational molecular biology. Let Σ be a finite alphabet consisting of a set of characters (or symbols). The cardinality of the alphabet denoted by |Σ| expresses the number of distinct characters in the alphabet. A string or word is an ordered list of zero or more characters drawn from the alphabet. A word or string w of length n is represented by , where and |w| denotes the length of w. The empty word is the empty sequence (of zero length) and is denoted by ε. A list of characters of w, appearing in consecutive positions, is called a substring of w, denoted by w[i···j], where i and j are the starting and ending positions, respectively. If the substring starts at position 1, then it is called a prefix, whereas if it ends at position n, then it is called a suffix of w. However, an ordered list of characters of w that are not necessarily consecutive is called a subsequence of w.
Strings and subsequences appear in a plethora of computational molecular biology problems because the basic types of DNA, RNA, and protein molecules can be represented as strings—pieces of DNA as strings over the alphabet {,,,}(representing the four bases adenine, cytosine, guanine, and thymine, respectively), pieces of RNA as strings over the alphabet {A,C,G,U} (with uracil replacing thymine), and proteins as strings over an alphabet of 20, corresponding to the 20 amino acid residues.
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
