Algorithms in Computational Molecular Biology - Mourad Elloumi - E-Book

Algorithms in Computational Molecular Biology E-Book

Mourad Elloumi

0,0
160,99 €

oder
-100%
Sammeln Sie Punkte in unserem Gutscheinprogramm und kaufen Sie E-Books und Hörbücher mit bis zu 100% Rabatt.

Mehr erfahren.
Beschreibung

This book represents the most comprehensive and up-to-date collection of information on the topic of computational molecular biology. Bringing the most recent research into the forefront of discussion, Algorithms in Computational Molecular Biology studies the most important and useful algorithms currently being used in the field, and provides related problems. It also succeeds where other titles have failed, in offering a wide range of information from the introductory fundamentals right up to the latest, most advanced levels of study.

Sie lesen das E-Book in den Legimi-Apps auf:

Android
iOS
von Legimi
zertifizierten E-Readern

Seitenzahl: 1927

Veröffentlichungsjahr: 2011

Bewertungen
0,0
0
0
0
0
0
Mehr Informationen
Mehr Informationen
Legimi prüft nicht, ob Rezensionen von Nutzern stammen, die den betreffenden Titel tatsächlich gekauft oder gelesen/gehört haben. Wir entfernen aber gefälschte Rezensionen.



CONTENTS

COVER

HALF TITLE PAGE

TITLE PAGE

COPYRIGHT PAGE

DEDICATION

PREFACE

CONTRIBUTORS

SERIES PAGE

PART I: STRINGS PROCESSING AND APPLICATION TO BIOLOGICAL SEQUENCES

CHAPTER 1: STRING DATA STRUCTURES FOR COMPUTATIONAL MOLECULAR BIOLOGY

1.1 INTRODUCTION

1.2 MAIN STRING INDEXING DATA STRUCTURES

1.3 INDEX STRUCTURES FOR WEIGHTED STRINGS

1.4 INDEX STRUCTURES FOR INDETERMINATE STRINGS

1.5 STRING DATA STRUCTURES IN MEMORY HIERARCHIES

1.6 CONCLUSIONS

REFERENCE

CHAPTER 2: EFFICIENT RESTRICTED-CASE ALGORITHMS FOR PROBLEMS IN COMPUTATIONAL BIOLOGY

2.1 THE NEED FOR SPECIAL CASES

2.2 ASSESSING EFFICIENT SOLVABILITY OPTIONS FOR GENERAL PROBLEMS AND SPECIAL CASES

2.3 STRING AND SEQUENCE PROBLEMS

2.4 SHORTEST COMMON SUPERSTRING

2.5 LONGEST COMMON SUBSEQUENCE

2.6 COMMON APPROXIMATE SUBSTRING

2.7 CONCLUSION

REFERENCES

CHAPTER 3: FINITE AUTOMATA IN PATTERN MATCHING

3.1 INTRODUCTION

3.2 DIRECT USE OF DFA IN STRINGOLOGY

3.3 NFA SIMULATION

3.4 FINITE AUTOMATON AS MODEL OF COMPUTATION

3.5 FINITE AUTOMATA COMPOSITION

3.6 SUMMARY

REFERENCES

CHAPTER 4: NEW DEVELOPMENTS IN PROCESSING OF DEGENERATE SEQUENCES

4.1 INTRODUCTION

4.2 BACKGROUND

4.3 BASIC DEFINITIONS

4.4 REPETITIVE STRUCTURES IN DEGENERATE STRINGS

4.5 CONSERVATIVE STRING COVERING IN DEGENERATE STRINGS

4.6 CONCLUSION

REFERENCE

CHAPTER 5: EXACT SEARCH ALGORITHMS FOR BIOLOGICAL SEQUENCES

5.1 INTRODUCTION

5.2 SINGLE PATTERN MATCHING ALGORITHMS

5.3 ALGORITHMS FOR MULTIPLE PATTERNS

5.4 APPLICATION OF EXACT SET PATTERN MATCHING FOR READ MAPPING AND COMPARISON WITH MAPPING TOOLS

5.5 CONCLUSIONS

REFERENCE

CHAPTER 6: ALGORITHMIC ASPECTS OF ARC-ANNOTATED SEQUENCES

6.1 INTRODUCTION

6.2 PRELIMINARIES

6.3 LONGEST ARC-PRESERVING COMMON SUBSEQUENCE

6.4 ARC-PRESERVING SUBSEQUENCE

6.5 MAXIMUM ARC-PRESERVING COMMON SUBSEQUENCE

6.6 EDIT DISTANCE

REFERENCE

CHAPTER 7: ALGORITHMIC ISSUES IN DNA BARCODING PROBLEMS

7.1 INTRODUCTION

7.2 TEST SET PROBLEMS: A GENERAL FRAMEWORK FOR SEVERAL BARCODING PROBLEMS

7.3 A SYNOPSIS OF BIOLOGICAL APPLICATIONS OF BARCODING

7.4 SURVEY OF ALGORITHMIC TECHNIQUES ON BARCODING

7.5 INFORMATION CONTENT APPROACH

7.6 SET-COVERING APPROACH

7.7 EXPERIMENTAL RESULTS AND SOFTWARE AVAILABILITY

7.8 CONCLUDING REMARKS

ACKNOWLEDGMENTS

REFERENCES

CHAPTER 8: RECENT ADVANCES IN WEIGHTED DNA SEQUENCES

8.1 INTRODUCTION

8.2 PRELIMINARIES

8.3 INDEXING

8.4 PATTERN MATCHING

8.5 APPROXIMATE PATTERN MATCHING

8.6 REPETITIONS, COVERS, AND TANDEM REPEATS

8.7 MOTIF DISCOVERY

8.8 CONCLUSIONS

REFERENCES

CHAPTER 9: DNA COMPUTING FOR SUBGRAPH ISOMORPHISM PROBLEM AND RELATED PROBLEMS

9.1 INTRODUCTION

9.2 DEFINITIONS OF SUBGRAPH ISOMORPHISM PROBLEM AND RELATED PROBLEMS

9.3 DNA COMPUTING MODELS

9.4 THE STICKER-BASED SOLUTION SPACE

9.5 ALGORITHMS FOR SOLVING PROBLEMS

9.6 EXPERIMENTAL DATA

9.7 CONCLUSION

REFERENCE

PART II: ANALYSIS OF BIOLOGICAL SEQUENCES

CHAPTER 10: GRAPHS IN BIOINFORMATICS

10.1 GRAPH THEORY—ORIGIN

10.2 GRAPHS AND THE BIOLOGICAL WORLD

10.3 CONCLUSION

REFERENCES

CHAPTER 11: A FLEXIBLE DATA STORE FOR MANAGING BIOINFORMATICS DATA

11.1 INTRODUCTION

11.2 DATA MODEL AND SYSTEM OVERVIEW

11.3 REPLICATION AND LOAD BALANCING

11.4 EVALUATION

11.5 RELATED WORK

11.6 SUMMARY

REFERENCE

CHAPTER 12: ALGORITHMS FOR THE ALIGNMENT OF BIOLOGICAL SEQUENCES

12.1 INTRODUCTION

12.2 ALIGNMENT ALGORITHMS

12.3 SCORE FUNCTIONS

12.4 BENCHMARKS

12.5 CONCLUSION

ACKNOWLEDGMENTS

REFERENCES

CHAPTER 13: ALGORITHMS FOR LOCAL STRUCTURAL ALIGNMENT AND STRUCTURAL MOTIF IDENTIFICATION

13.1 INTRODUCTION

13.2 PROBLEM DEFINITION OF LOCAL STRUCTURAL ALIGNMENT

13.3 VARIABLE-LENGTH ALIGNMENT FRAGMENT PAIR (VLAFP) ALGORITHM

13.4 STRUCTURAL ALIGNMENT BASED ON CENTER OF GRAVITY: SACG

13.5 SEARCHING STRUCTURAL MOTIFS

13.6 USING SACG ALGORITHM FOR CLASSIFICATION OF NEW PROTEIN STRUCTURES

13.7 EXPERIMENTAL RESULTS

13.8 ACCURACY RESULTS

13.9 CONCLUSION

ACKNOWLEDGMENTS

REFERENCE

CHAPTER 14: EVOLUTION OF THE CLUSTAL FAMILY OF MULTIPLE SEQUENCE ALIGNMENT PROGRAMS

14.1 INTRODUCTION

14.2 CLUSTAL-CLUSTALV

14.3 CLUSTALW

14.4 CLUSTALX

14.5 CLUSTALW AND CLUSTALX 2.0

14.6 DBCLUSTAL

14.7 PERSPECTIVES

REFERENCES

CHAPTER 15: FILTERS AND SEEDS APPROACHES FOR FAST HOMOLOGY SEARCHES IN LARGE DATASETS

15.1 INTRODUCTION

15.2 METHODS FRAMEWORK

15.3 LOSSLESS FILTERS

15.4 LOSSY SEED-BASED FILTERS

15.5 CONCLUSION

15.6 ACKNOWLEDGMENTS

REFERENCES

CHAPTER 16: NOVEL COMBINATORIAL AND INFORMATION-THEORETIC ALIGNMENT-FREE DISTANCES BIOLOGICAL DATA MINING

16.1 INTRODUCTION

16.2 INFORMATION-THEORETIC ALIGNMENT-FREE METHODS

16.3 COMBINATORIAL ALIGNMENT-FREE METHODS

16.4 ALIGNMENT-FREE COMPOSITIONAL METHODS

16.5 ALIGNMENT-FREE EXACT WORD MATCHES METHODS

16.6 DOMAINS OF BIOLOGICAL APPLICATION

16.7 DATASETS AND SOFTWARE FOR EXPERIMENTAL ALGORITHMICS

16.8 CONCLUSIONS

REFERENCES

CHAPTER 17: IN SILICO METHODS FOR THE ANALYSIS OF METABOLITES AND DRUG MOLECULES

17.1 INTRODUCTION

17.2 MOLECULAR DESCRIPTORS

17.3 DATABASES

17.4 METHODS AND DATA ANALYSIS ALGORITHMS

17.5 CONCLUSIONS

ACKNOWLEDGMENTS

REFERENCES

PART III: MOTIF FINDING AND STRUCTURE PREDICTION

CHAPTER 18: MOTIF FINDING ALGORITHMS IN BIOLOGICAL SEQUENCES

18.1 INTRODUCTION

18.2 PRELIMINARIES

18.3 THE PLANTED (l, d)-MOTIF PROBLEM

18.4 THE EXTENDED (l, d)-MOTIF PROBLEM

18.5 THE EDITED MOTIF PROBLEM

18.6 THE SIMPLE MOTIF PROBLEM

18.7 CONCLUSION

REFERENCE

CHAPTER 19: COMPUTATIONAL CHARACTERIZATION OF REGULATORY REGIONS

19.1 THE GENOME REGULATORY LANDSCAPE

19.2 QUALITATIVE MODELS OF REGULATORY SIGNALS

19.3 QUANTITATIVE MODELS OF REGULATORY SIGNALS

19.4 DETECTION OF DEPENDENCIES IN SEQUENCES

19.5 REPOSITORIES OF REGULATORY INFORMATION

19.6 USING PREDICTIVE MODELS TO ANNOTATE SEQUENCES

19.7 COMPARATIVE GENOMICS CHARACTERIZATION

19.8 SEQUENCE COMPARISONS

19.9 COMBINING MOTIFS AND ALIGNMENTS

19.10 EXPERIMENTAL VALIDATION

19.11 SUMMARY

REFERENCES

CHAPTER 20: ALGORITHMIC ISSUES IN THE ANALYSIS OF CHIP-SEQ DATA

20.1 INTRODUCTION

20.2 MAPPING SEQUENCES ON THE GENOME

20.3 IDENTIFYING SIGNIFICANTLY ENRICHED REGIONS

20.4 DERIVING ACTUAL TRANSCRIPTION FACTOR BINDING SITES

20.5 CONCLUSIONS

REFERENCES

CHAPTER 21: APPROACHES AND METHODS FOR OPERON PREDICTION BASED ON MACHINE LEARNING TECHNIQUES

21.1 INTRODUCTION

21.2 DATASETS, FEATURES, AND PREPROCESSES FOR OPERON PREDICTION

21.3 MACHINE LEARNING PREDICTION METHODS FOR OPERON PREDICTION

21.4 CONCLUSIONS

21.5 ACKNOWLEDGMENTS

REFERENCES

CHAPTER 22: PROTEIN FUNCTION PREDICTION WITH DATA-MINING TECHNIQUES

22.1 INTRODUCTION

22.2 PROTEIN ANNOTATION BASED ON SEQUENCE

22.3 PROTEIN ANNOTATION BASED ON PROTEIN STRUCTURE

22.4 PROTEIN FUNCTION PREDICTION BASED ON GENE EXPRESSION DATA

22.5 PROTEIN FUNCTION PREDICTION BASED ON PROTEIN INTERACTOME MAP

22.6 PROTEIN FUNCTION PREDICTION BASED ON DATA INTEGRATION

22.7 CONCLUSIONS AND PERSPECTIVES

REFERENCES

CHAPTER 23: PROTEIN DOMAIN BOUNDARY PREDICTION

23.1 INTRODUCTION

23.2 PROFILING TECHNIQUE

23.3 RESULTS

23.4 DISCUSSION

23.5 CONCLUSIONS

REFERENCES

CHAPTER 24: AN INTRODUCTION TO RNA STRUCTURE AND PSEUDOKNOT PREDICTION

24.1 INTRODUCTION

24.2 RNA SECONDARY STRUCTURE PREDICTION

24.3 RNA PSEUDOKNOTS

24.4 CONCLUSIONS

REFERENCE

PART IV: PHYLOGENY RECONSTRUCTION

CHAPTER 25: PHYLOGENETIC SEARCH ALGORITHMS FOR MAXIMUM LIKELIHOOD

25.1 INTRODUCTION

25.2 COMPUTING THE LIKELIHOOD

25.3 ACCELERATING THE PLF BY ALGORITHMIC MEANS

25.4 ALIGNMENT SHAPES

25.5 GENERAL SEARCH HEURISTICS

25.6 COMPUTING THE ROBINSON FOULDS DISTANCE

25.7 CONVERGENCE CRITERIA

25.8 FUTURE DIRECTIONS

REFERENCES

CHAPTER 26: HEURISTIC METHODS FOR PHYLOGENETIC RECONSTRUCTION WITH MAXIMUM PARSIMONY

26.1 INTRODUCTION

26.2 DEFINITIONS AND FORMAL BACKGROUND

26.3 METHODS

26.4 CONCLUSION

REFERENCES

CHAPTER 27: MAXIMUM ENTROPY METHOD FOR COMPOSITION VECTOR METHOD

27.1 INTRODUCTION

27.2 MODELS AND ENTROPY OPTIMIZATION

27.3 APPLICATION AND DICUSSION

27.4 CONCLUDING REMARKS

REFERENCES

PART V: MICROARRAY DATA ANALYSIS

CHAPTER 28: MICROARRAY GENE EXPRESSION DATA ANALYSIS

28.1 INTRODUCTION

28.2 DNA MICROARRAY TECHNOLOGY AND EXPERIMENT

28.3 IMAGE ANALYSIS AND EXPRESSION DATA EXTRACTION

28.4 DATA PROCESSING

28.5 MISSING VALUE IMPUTATION

28.6 TEMPORAL GENE EXPRESSION PROFILE ANALYSIS

28.7 CYCLIC GENE EXPRESSION PROFILES DETECTION

28.8 SUMMARY

ACKNOWLEDGMENTS

REFERENCES

CHAPTER 29: BICLUSTERING OF MICROARRAY DATA

29.1 INTRODUCTION

29.2 TYPES OF BICLUSTERS

29.3 GROUPS OF BICLUSTERS

29.4 EVALUATION FUNCTIONS

29.5 SYSTEMATIC AND STOCHASTIC BICLUSTERING ALGORITHMS

29.6 BIOLOGICAL VALIDATION

29.7 CONCLUSION

REFERENCES

CHAPTER 30: COMPUTATIONAL MODELS FOR CONDITION-SPECIFIC GENE AND PATHWAY INFERENCE

30.1 INTRODUCTION

30.2 CONDITION-SPECIFIC PATHWAY IDENTIFICATION

30.3 DISEASE GENE PRIORITIZATION AND GENETIC PATHWAY DETECTION

30.4 MODULE NETWORKS

30.5 SUMMARY

ACKNOWLEDGEMENTS

REFERENCES

CHAPTER 31: HETEROGENEITY OF DIFFERENTIAL EXPRESSION IN CANCER STUDIES: ALGORITHMS AND METHODS

31.1 INTRODUCTION

31.2 NOTATIONS

31.3 DIFFERENTIAL MEAN OF EXPRESSION

31.4 DIFFERENTIAL VARIABILITY OF EXPRESSION

31.5 DIFFERENTIAL EXPRESSION IN COMPENDIUM OF TUMORS

31.6 DIFFERENTIAL EXPRESSION BY CHROMOSOMAL ABERRATIONS: THE LOCAL PROPERTIES

31.7 DIFFERENTIAL EXPRESSION IN GENE INTERACTOME

31.8 DIFFERENTIAL COEXPRESSION: GLOBAL MULTIDIMENSIONAL INTERACTOME

ACKNOWLEDGMENTS

REFERENCES

PART VI: ANALYSIS OF GENOMES

CHAPTER 32: COMPARATIVE GENOMICS: ALGORITHMS AND APPLICATIONS

32.1 INTRODUCTION

32.2 NOTATIONS

32.3 ORTHOLOG ASSIGNMENT

32.4 GENE CLUSTER AND SYNTENY DETECTION

32.5 CONCLUSIONS

REFERENCES

CHAPTER 33: ADVANCES IN GENOME REARRANGEMENT ALGORITHMS

33.1 INTRODUCTION

33.2 PRELIMINARIES

33.3 SORTING BY REVERSALS

33.4 SORTING BY TRANSPOSITIONS

33.5 OTHER OPERATIONS

33.6 SORTING BY MORE THAN ONE OPERATION

33.7 FUTURE RESEARCH DIRECTIONS

33.8 NOTES ON SOFTWARE

REFERENCES

CHAPTER 34: COMPUTING GENOMIC DISTANCES : AN ALGORITHMIC VIEWPOINT

34.1 INTRODUCTION

34.2 INTERVAL-BASED CRITERIA

34.3 CHARACTER-BASED CRITERIA

34.4 CONCLUSION

REFERENCE

CHAPTER 35: WAVELET ALGORITHMS FOR DNA ANALYSIS

35.1 INTRODUCTION

35.2 DNA REPRESENTATION

35.3 STATISTICAL CORRELATIONS IN DNA

35.4 WAVELET ANALYSIS

35.5 HAAR WAVELET COEFFICIENTS AND STATISTICAL PARAMETERS

35.6 ALGORITHM OF THE SHORT HAAR DISCRETE WAVELET TRANSFORM

35.7 CLUSTERS OF WAVELET COEFFICIENTS

35.8 CONCLUSION

REFERENCE

CHAPTER 36: HAPLOTYPE INFERENCE MODELS AND ALGORITHMS

36.1 INTRODUCTION

36.2 PROBLEM STATEMENT AND NOTATIONS

36.3 COMBINATORIAL METHODS

36.4 STATISTICAL METHODS

36.5 PEDIGREE METHODS

36.6 EVALUATION

36.7 DISCUSSION

REFERENCES

PART VII: ANALYSIS OF BIOLOGICAL NETWORKS

CHAPTER 37: UNTANGLING BIOLOGICAL NETWORKS USING BIOINFORMATICS

37.1 INTRODUCTION

37.2 TYPES OF BIOLOGICAL NETWORKS

37.3 NETWORK DYNAMIC, EVOLUTION AND DISEASE

37.4 FUTURE CHALLENGES AND SCOPE

ACKNOWLEDGMENTS

REFERENCES

CHAPTER 38: PROBABILISTIC APPROACHES FOR INVESTIGATING BIOLOGICAL NETWORKS

38.1 PROBABILISTIC MODELS FOR BIOLOGICAL NETWORKS

38.2 INTERPRETATION AND QUANTITATIVE ANALYSIS OF PROBABILISTIC MODELS

38.3 CONCLUSION

ACKNOWLEDGMENTS

REFERENCE

CHAPTER 39: MODELING AND ANALYSIS OF BIOLOGICAL NETWORKS WITH MODEL CHECKING

39.1 INTRODUCTION

39.2 PRELIMINARIES

39.3 ANALYZING GENETIC NETWORKS WITH MODEL CHECKING

39.4 PROBABILISTIC MODEL CHECKING FOR BIOLOGICAL SYSTEMS

REFERENCES

APPENDIX

CHAPTER 40: REVERSE ENGINEERING OF MOLECULAR NETWORKS FROM A COMMON COMBINATORIAL APPROACH

40.1 INTRODUCTION

40.2 REVERSE-ENGINEERING OF BIOLOGICAL NETWORKS

40.3 CLASSICAL COMBINATORIAL ALGORITHMS: A CASE STUDY

40.4 CONCLUDING REMARKS

ACKNOWLEDGMENTS

REFERENCES

CHAPTER 41: UNSUPERVISED LEARNING FOR GENE REGULATION NETWORK INFERENCE FROM EXPRESSION DATA: A REVIEW

41.1 INTRODUCTION

41.2 GENE NETWORKS: DEFINITION AND PROPERTIES

41.3 GENE EXPRESSION: DATA AND ANALYSIS

41.4 NETWORK INFERENCE AS AN UNSUPERVISED LEARNING PROBLEM

41.5 CORRELATION-BASED METHODS

41.6 PROBABILISTIC GRAPHICAL MODELS

41.7 CONSTRAINT-BASED DATA MINING

41.8 VALIDATION

41.9 CONCLUSION AND PERSPECTIVES

REFERENCES

CHAPTER 42: APPROACHES TO CONSTRUCTION AND ANALYSIS OF MICRORNA-MEDIATED NETWORKS

42.1 INTRODUCTION

42.2 FUNDAMENTAL COMPONENT INTERACTION RESEARCH: PREDICTING MIRNA GENES, REGULATORS, AND TARGETS

42.3 IDENTIFYING MIRNA-MEDIATED NETWORKS

42.4 GLOBAL AND LOCAL ARCHITECTURE ANALYSIS IN MIRNA-CONTAINING NETWORKS

42.5 CONCLUSION

REFERENCES

INDEX

ALGORITHMS IN COMPUTATIONAL MOLECULAR BIOLOGY

Wiley Series on

Bioinformatics: Computational Techniques and Engineering

A complete list of the titles in this series appears at the end of this volume.3

Copyright © 2011 by John Wiley & Sons, Inc. All rights reserved

Published by John Wiley & Sons, Inc., Hoboken, New JerseyPublished simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and the author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor the author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information about our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data is available.

ISBN: 978-0-470-50519-9

To our families, for their patience and support.

PREFACE

Computational molecular biology has emerged from the Human Genome Project as an important discipline for academic research and industrial application. The exponential growth of the size of biological databases, the complexity of biological problems, and the necessity to deal with errors in biological sequences require the development of fast, low-memory requirement and high-performance algorithms. This book is a forum of such algorithms, based on new/improved approaches and/or techniques. Most of the current books on algorithms in computational molecular biology either lack technical depth or focus on specific narrow topics. This book is the first overview on algorithms in computational molecular biology with both a wide coverage of this field and enough depth to be of practical use to working professionals. It surveys the most recent developments, offering enough fundamental and technical information on these algorithms and the related problems without overloading the reader. So, this book endeavors to strike a balance between theoretical and practical coverage of a wide range of issues in computational molecular biology. Of course, the list of topics that is explored in this book is not exhaustive, but it is hoped that the topics covered will get the reader to think of the implications of the presented algorithms on the developments in his/her own field. The material included in this book was carefully chosen for quality and relevance. This book also presents a mixture of experiments and simulations that provide not only qualitative but also quantitative insights into the rich field of computational molecular biology. It is hoped that this book will increase the interest of the algorithmics community in studying a wider range of combinatorial problems that originate in computational molecular biology. This should enable researchers to deal with more complex issues and richer data sets.

Ideally, the reader of this book should be someone who is familiar with computational molecular biology and would like to learn more about algorithms that deal with the most studied, the most important, and/or the newest topics in the field of computational molecular biology. However, this book could be used by a wider audience such as graduate students, senior undergraduate students, researchers, instructors, and practitioners in computer science, life science, and mathematics. We have tried to make the material of this book self-contained so that the reader would not have to consult a lot of external references. Thus, the reader of this book will certainly find what he/she is looking for or at least a clue that will help to make an advance in his/her research. This book is quite timely, because the field of computational molecular biology as a whole is undergoing many changes, and will be of a great use to the reader.

This book is organized into seven parts: Strings Processing and Application to Biological Sequences, Analysis of Biological Sequences, Motif Finding and Structure Prediction, Phylogeny Reconstruction, Microarray Data Analysis, Analysis of Genomes, and Analysis of Biological Networks. The 42 chapters, that make up the seven parts of this book, were carefully selected to provide a wide scope with minimal overlap between the chapters in order to reduce duplication. Each contributor was asked that his/her chapter should cover review material as well as current developments. In addition, we selected authors who are leaders in their respective fields.

MOURAD ELLOUMI AND ALBERT Y. ZOMAYA

CONTRIBUTORS

Bassam A. Alqaralleh Faculty of IT, Al-Hussien Bin Talal University, Jordan.

Srinivas Aluru Department of Electrical and Computer Engineering, Iowa State University, Ames, IA, USA; and Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Mumbai, India.

Mohamed Radhouene Aniba Institute of Genetics and Molecular and Cellular Biology, Illkirch, France.

Pavlos Antoniou Department of Computer Science, King’s College, London, UK.

Wassim Ayadi Unit of Technologies of Information and Communication (UTIC) and University of Tunis-El Manar, Tunisia.

Enrique Blanco Department of Genetics, Institute of Biomedicine of the University of Barcelona, Spain.

Guillaume Blin IGM, University Paris-Est, Champs-sur-Marne, Marne-la-Vallée, France.

Dragan Bosnacki Eindhoven University of Technology, The Netherlands.

Jérémie Bourdon LINA, University of Nantes and INRIA Rennes-Bretagne-Atlantique, France.

Carlo Cattani Department of Mathematics, University of Salerno, Italy.

Elsa Chacko Department of Chemistry and Biomolecular Sciences and ARC Centre of Excellence in Bioinformatics, Macquarie University, Sydney, Australia.

Raymond H. F. Chan Department of Mathematics, The Chinese University of Hong Kong, Shatin, Hong Kong, China.

Luonan Chen Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.

Hsin-Hung Chou Department of Information Management, Chang Jung Christian University, Tainan, Taiwan.

Manolis Christodoulakis Department of Electrical and Computer Engineering, University of Cyprus, Nicosia, Cyprus; and Department of Computer Science, King’s College London, London, UK.

Adrian Cootes Macquarie University, Sydney, Australia.

Maxime Crochemore IGM, University Paris-Est, Champs-sur-Marne, Marne-la-Vallée, France.

Bhaskar DasGupta Department of Computer Science, University of Illinois at Chicago, USA.

Amitava Datta School of Computer Science and Software Engineering, The University of Western Australia, Perth, Australia.

Erik P. de Vink Eindhoven University of Technology, The Netherlands.

Wei Du College of Computer Science and Technology, Jilin University, Changchun, China.

Mohamed Elati Institute of Systems and Synthetic Biology, Evry University - Genopole, Evry, France.

Mourad Elloumi Unit of Technologies of Information and Communication (UTIC) and University of Tunis-El Manar, Tunisia.

Chiara Epifanio Department of Mathematics and Applications, University of Palermo, Italy.

Patricia A. Evans Faculty of Computer Science, University of New Brunswick, Fredericton, Canada.

Damien Eveillard LINA, University of Nantes and INRIA Rennes-Bretagne-Atlantique, France.

Tarek El Falah Unit of Technologies of Information and Communication (UTIC) and University of Tunis-El Manar, Tunisia.

Guillaume Fertin LINA UMR CNRS 6241, University of Nantes, France.

Alessandra Gabriele Department of Mathematics and Applications, University of Palermo, Italy.

Jennifer Gamble Vascular Biology Laboratory, Centenary Institute, Sydney, Australia.

Xiangchao Gan The Wellcome Trust Center for Human Genetics, University of Oxford, UK.

Raffaele Giancarlo Department of Mathematics and Applications, University of Palermo, Italy.

Mathieu Giraud LIFL, University of Lille 1 and INRIA Lille - Nord Europe, Villeneuve d’Ascq, France.

Adrien Goëffon LERIA, University of Angers, France.

Jin-Kao Hao LERIA, University of Angers, France.

Masud Hasan Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology (BUET), Dhaka, Bangladesh.

Peter A. J. Hilbers Eindhoven University of Technology, The Netherlands.

Jan Holub Department of Theoretical Computer Science, Faculty of Information Technology, Czech Technical University in Prague, Czech Republic.

Sun-Yuan Hsieh Department of Computer Science and Information Engineering, Institute of Medical Informatics, Institute of Manufacturing Information and Systems, National Cheng Kung University, Tainan, Taiwan.

Chao-Wen Huang Department of Computer Science and Information Engineering, National Cheng Kung University Tainan, Taiwan.

Costas S. Iliopoulos Department of Computer Science, King’s College London, London, UK & Digital Ecosystems & Business Intelligence Institute, Curtin University, Perth, Australia.

Ming-Yang Kao Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL, USA

Radha Krishna Murthy Karuturi Computational and Systems Biology, Genome Institute of Singapore.

Varun Khanna Department of Chemistry and Biomolecular Sciences, and ARC Centre of Excellence in Bioinformatics, Macquarie University Sydney, Australia.

Gaurav Kumar Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, Australia

Vamsi Kundeti Department of Computer Science and Engineering, University of Connecticut, Storrs, USA.

Thierry Lecroq LITIS, University of Rouen, France.

Yanchun Liang College of Computer Science and Technology, Jilin University, Changchun, China.

Jana Sperschneider School of Computer Science and Software Engineering, The University of Western Australia, Perth, Australia.

Alan Wee-Chung Liew School of Information and Communication Technology, Griffith University, Australia.

Christos Makris Computer Engineering and Informatics Department, University of Patras, Rio, Greece.

Ion Mandoiu Computer Science & Engineering Department, University of Connecticut, Storrs, CT, USA.

Ronny S. Mans Eindhoven University of Technology, The Netherlands.

Ahmed Mokaddem Unit of Technologies of Information and Communication (UTIC) and University of Tunis-El Manar, Tunisia.

Giulio Pavesi Department of Biomolecular Sciences and Biotechnology, University of Milan, Italy.

Pierre Peterlongo INRIA Rennes Bretagne Atlantique, Campus de Beaulieu, Rennes, France.

Nadia Pisanti Dipartimento di Informatica, University of Pisa, Italy.

Yu-Qing Qiu Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.

Mohammed S. Rahman Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology (BUET), Dhaka, Bangladesh.

Sanguthevar Rajasekaran Department of Computer Science and Engineering, University of Connecticut, Storrs, USA.

Shoba Ranganathan Department of Chemistry and Biomolecular Sciences, and ARC Centre of Excellence in Bioinformatics, Macquarie University Sydney, Australia and Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore.

Jean-Michel Richer LERIA, University of Angers, France.

Eric Rivals LIRMM, University Montpellier 2, France.

Céline Rouveirol LIPN, UMR CNRS, Institute Galilée, University Paris-Nord, France.

Irena Rusu LINA UMR CNRS 6241, University of Nantes, France.

Leena Salmela Department of Computer Science, University of Helsinki, Finland.

Martin Schiller School of Life Sciences, University of Nevada Las Vegas, USA.

Marinella Sciortino Department of Mathematics and Applications, University of Palermo, Italy.

Eduardo Sontag Department of Mathematics, Rutgers, The State University of New Jersey, Piscataway, NJ, USA.

Jana Sperschneider School of Computer Science and Software Engineering, The University of Western Australia, Perth, Australia.

Alexandros Stamatakis The Exelixis Lab, Department of Computer Science, Technische Universität München, Germany.

Jorma Tarhio Department of Computer Science and Engineering, Aalto University, Espoo, Finland.

Evangelos Theodoridis Computer Engineering and Informatics Department, University of Patras, Rio, Greece.

Julie Thompson Institute of Genetics and Molecular and Cellular Biology, Illkirch, France.

Mathew Vadas Vascular Biology Laboratory, Centenary Institute, Sydney, Australia.

Paola Vera-Licona Institut Curie and INSERM, Paris, France.

Stéphane Vialette IGM, University Paris-Est, Champs-sur-Marne, Marne-la-Vallée, France.

Chen Wang CSIRO ICT Centre, Australia.

Roger W. Wang Department of Mathematics, The Chinese University of Hong Kong, Shatin, Hong Kong, China.

Shuqin Wang College of Computer Science and Technology, Jilin University, Changchun, China.

Yan Wang College of Computer Science and Technology, Jilin University, Changchun, China.

H. Todd Wareham Department of Computer Science, Memorial University of Newfoundland, St. John’s, Canada.

Jeff C. F. Wong Department of Mathematics, The Chinese University of Hong Kong, Shatin, Hong Kong, China.

Ling-Yun Wu Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.

Xiao Yang Department of Electrical and Computer Engineering, Bioinformatics and Computational Biology program, Iowa State University, Ames, IA, USA.

Paul D. Yoo School of information Technologies, The University of Sydney, Australia.

Federico Zambelli Department of Biomolecular Sciences and Biotechnology, University of Milan, Italy.

Chen Zhang College of Computer Science and Technology, Jilin University, Changchun, China.

Shihua Zhang Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.

Xiang-Sun Zhang Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.

Xing-Ming Zhao Institute of Systems Biology, Shanghai University, China.

Bing Bing Zhou School of information Technologies, The University of Sydney, Australia.

Chunguang Zhou College of Computer Science and Technology, Jilin University, Changchun, China.

You Zhou College of Computer Science and Technology, Jilin University, Changchun, China.

Albert Y. Zomaya School of Information Technologies, The University of Sydney, Australia.

Wiley Series on

Bioinformatics: Computational Techniques and Engineering

Bioinformatics and computational biology involve the comprehensive application of mathematics, statistics, and computer science to the understanding of living systems. Research and development in these areas require cooperation among specialists from the fields of biology, computer science, mathematics, statistics, physics, and related sciences. The objective of this book series is to provide timely treatments of the different aspects of bioinformatics spanning theory, new and established techniques, technologies and tools, and application domains. This series emphasizes algorithmic, mathematical, statistical, and computational methods that are central in bioinformatics and computational biology.

Series Editors: Professor Yi Pan and Professor Albert Y. Zomayapan@cs.gsu.edu and albert.zomaya@sydney.edu.au

Knowledge Discovery in Bioinformatics: Techniques, Methods, and ApplicationsXiaohua Hu and Yi Pan

Grid Computing for Bioinformatics and Computational BiologyEdited by El-Ghazali Talbi and Albert Y. Zomaya

Bioinformatics Algorithms: Techniques and ApplicationsIon Mandiou and Alexander Zelikovsky

Analysis of Biological NetworksEdited by Björn H. Junker and Falk Schreiber

Computational Intelligence and Pattern Analysis in Biological InformaticsEdited by Ujjwal Maulik, Sanghamitra Bandyopadhyay, and Jason T. L. Wang

Mathematics of Bioinformatics: Theory, Practice, and ApplicationsMatthew He and Sergey Petoukhov

Algorithms in Computational Molecular Biology: Techniques, Approaches and ApplicationsEdited by Mourad Elloumi and Albert Y. Zomaya

PART I

STRINGS PROCESSING AND APPLICATION TO BIOLOGICAL SEQUENCES

CHAPTER 1

STRING DATA STRUCTURES FOR COMPUTATIONAL MOLECULAR BIOLOGY

Christos Makris and Evangelos Theodoridis

1.1 INTRODUCTION

The topic of the chapter is string data structures with applications in the field of computational molecular biology. Let Σ be a finite alphabet consisting of a set of characters (or symbols). The cardinality of the alphabet denoted by |Σ| expresses the number of distinct characters in the alphabet. A string or word is an ordered list of zero or more characters drawn from the alphabet. A word or string w of length n is represented by , where and |w| denotes the length of w. The empty word is the empty sequence (of zero length) and is denoted by ε. A list of characters of w, appearing in consecutive positions, is called a substring of w, denoted by w[i···j], where i and j are the starting and ending positions, respectively. If the substring starts at position 1, then it is called a prefix, whereas if it ends at position n, then it is called a suffix of w. However, an ordered list of characters of w that are not necessarily consecutive is called a subsequence of w.

Strings and subsequences appear in a plethora of computational molecular biology problems because the basic types of DNA, RNA, and protein molecules can be represented as strings—pieces of DNA as strings over the alphabet {,,,}(representing the four bases adenine, cytosine, guanine, and thymine, respectively), pieces of RNA as strings over the alphabet {A,C,G,U} (with uracil replacing thymine), and proteins as strings over an alphabet of 20, corresponding to the 20 amino acid residues.

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!