Structural Bioinformatics was the first major effort to show the application of the principles and basic knowledge of the larger field of bioinformatics to questions focusing on macromolecular structure, such as the prediction of protein structure and how proteins carry out cellular functions, and how the application of bioinformatics to these life science issues can improve healthcare by accelerating drug discovery and development. Designed primarily as a reference, the first edition nevertheless saw widespread use as a textbook in graduate and undergraduate university courses dealing with the theories and associated algorithms, resources, and tools used in the analysis, prediction, and theoretical underpinnings of DNA, RNA, and proteins. This new edition contains not only thorough updates of the advances in structural bioinformatics since publication of the first edition, but also features eleven new chapters dealing with frontier areas of high scientific impact, including: sampling and search techniques; use of mass spectrometry; genome functional annotation; and much more. Offering detailed coverage for practitioners while remaining accessible to the novice, Structural Bioinformatics, Second Edition is a valuable resource and an excellent textbook for a range of readers in the bioinformatics and advanced biology fields.
Praise for the previous edition:
"This book is a gold mine of fundamental and practical information in an area not previously well represented in book form." --Biochemistry and Molecular Education
"... destined to become a classic reference work for workers at all levels in structural bioinformatics...recommended with great enthusiasm for educators, researchers, and graduate students." --BAMBED
"...a useful and timely summary of a rapidly expanding field." --Nature Structural Biology
"...a terrific job in this timely creation of a compilation of articles that appropriately addresses this issue." --Briefings in Bioinformatics
Page count: 2250
Year of publication: 2011
CONTENTS
Foreword
Preface
Acknowledgments
Contributors
Section I DATA COLLECTION, ANALYSIS, AND VISUALIZATION
1 DEFINING BIOINFORMATICS AND STRUCTURAL BIOINFORMATICS
WHAT IS BIOINFORMATICS?
TECHNICAL CHALLENGES WITHIN STRUCTURAL BIOINFORMATICS
INTEGRATING STRUCTURAL DATA WITH OTHER DATA SOURCES
REFERENCES
2 FUNDAMENTALS OF PROTEIN STRUCTURE
THE IMPORTANCE OF PROTEIN STRUCTURE
THE PRIMARY STRUCTURE OF PROTEINS: THE AMINO ACID SEQUENCE
THE SECONDARY STRUCTURE OF PROTEINS: THE LOCAL THREE-DIMENSIONAL STRUCTURE
THE TERTIARY STRUCTURE OF PROTEINS: THE GLOBAL THREE-DIMENSIONAL STRUCTURE
THE QUATERNARY STRUCTURE OF PROTEINS: ASSOCIATIONS OF MULTIPLE POLYPEPTIDE CHAINS
CONCLUSION
MORE INFORMATION ON THE INTERNET
REFERENCES
3 FUNDAMENTALS OF DNA AND RNA STRUCTURE
INTRODUCTION
CHEMICAL STRUCTURE OF NUCLEIC ACIDS
BASE-PAIR GEOMETRY
CONFORMATION OF THE SUGAR PHOSPHATE BACKBONE
STRUCTURES OF NUCLEIC ACIDS
CONCLUSION
ACKNOWLEDGMENTS
REFERENCES
4 COMPUTATIONAL ASPECTS OF HIGH-THROUGHPUT CRYSTALLOGRAPHIC MACROMOLECULAR STRUCTURE DETERMINATION
INTRODUCTION
HIGH-THROUGHPUT STRUCTURE DETERMINATION
DATA ANALYSIS
HEAVY ATOM LOCATION AND COMPUTATION OF EXPERIMENTAL PHASES
DENSITY MODIFICATION
MOLECULAR REPLACEMENT
REFINEMENT
VALIDATION
CHALLENGES TO AUTOMATION
CONCLUSIONS
REFERENCES
5 MACROMOLECULAR STRUCTURE DETERMINATION BY NMR SPECTROSCOPY
INTRODUCTION TO PROTEIN STRUCTURE DETERMINATION BY NMR
PREPARATION OF PROTEIN SAMPLES FOR NMR
PROTOCOL FOR PROTEIN STRUCTURE DETERMINATION BY NMR
PROBABILISTIC APPROACHES AND AUTOMATION
DATABASES FOR BIOMOLECULAR NMR
ACKNOWLEDGMENTS
REFERENCES
6 ELECTRON MICROSCOPY IN THE CONTEXT OF STRUCTURAL SYSTEMS BIOLOGY
INTRODUCTION
ELECTRON OPTICS AND IMAGE FORMATION
THREE-DIMENSIONAL RECONSTRUCTION
COMBINATION WITH OTHER APPROACHES: HYBRID METHODS
FUTURE DIRECTIONS
ACKNOWLEDGMENTS
REFERENCES
7 STUDY OF PROTEIN THREE-DIMENSIONAL STRUCTURE AND DYNAMICS USING PEPTIDE AMIDE HYDROGEN/DEUTERIUM EXCHANGE MASS SPECTROMETRY (DXMS) AND CHEMICAL CROSS-LINKING WITH MASS SPECTROMETRY TO CONSTRAIN MOLECULAR MODELING
INTRODUCTION
OVERVIEW OF DXMS METHODOLOGY
EXAMPLES OF APPLICATIONS OF DXMS
DXMS ANALYSIS: CONCLUDING REMARKS
HYBRID BIOCHEMICAL/BIOINFORMATICS APPROACH TO LOW-RESOLUTION STRUCTURE DETERMINATION USING CHEMICAL CROSS-LINKERS AND MASS SPECTROMETRY
RECENT APPLICATIONS
CONCLUSION WITH RESPECT TO CHEMICAL CROSS-LINKAGE METHODS
ACKNOWLEDGMENTS
REFERENCES
8 SEARCH AND SAMPLING IN STRUCTURAL BIOINFORMATICS
INTRODUCTION
SAMPLING STRUCTURAL SPACE
SEARCH METHODS
DATA ANALYSIS AND REDUCTION
CONCLUDING REMARKS
ACKNOWLEDGMENTS
REFERENCES (Including Remarks on Recommended Further Reading)
9 MOLECULAR VISUALIZATION
INTRODUCTION
THE PROCESS OF MOLECULAR VISUALIZATION
MOLECULAR MODELS
MOLECULAR VISUALIZATION PROGRAMS
REFERENCES
Section II DATA REPRESENTATION AND DATABASES
10 THE PDB FORMAT, mmCIF FORMATS, AND OTHER DATA FORMATS
INTRODUCTION
THE PDB FORMAT
mmCIF: A DICTIONARY-BASED APPROACH TO DATA DESCRIPTION
THE PDB EXCHANGE AND OTHER DATA DICTIONARIES
SUPPORTING OTHER FORMATS
SUPPORTING APPLICATION PROGRAM INTERFACES
CONCLUSION
ACKNOWLEDGMENTS
REFERENCES
11 THE WORLDWIDE PROTEIN DATA BANK
INTRODUCTION
DATA ACQUISITION AND PROCESSING
DATA ACCESS
RCSB PDB
PDBj
PDBe
BMRB
FUTURE
ACKNOWLEDGMENTS
REFERENCES
12 THE NUCLEIC ACID DATABASE
INTRODUCTION
DATA PROCESSING AND VALIDATION
THE DATABASE
DISTRIBUTION OF INFORMATION
APPLICATIONS OF THE NDB
ACKNOWLEDGMENT
REFERENCES
13 OTHER STRUCTURE-BASED DATABASES
INTRODUCTION
THE ADDED VALUE PHILOSOPHY
OTHER PRIMARY INFORMATION RESOURCES
SECONDARY RESOURCES
STRUCTURAL DATABASES OF THE FUTURE
REFERENCES
Section III DATA INTEGRITY AND COMPARATIVE FEATURES
14 STRUCTURAL QUALITY ASSURANCE
INTRODUCTION
STRUCTURES AS MODELS
AIMS
ERROR ESTIMATION AND PRECISION
ERROR ESTIMATES IN X-RAY CRYSTALLOGRAPHY
ERROR ESTIMATES IN NMR SPECTROSCOPY
ERRORS IN DEPOSITED STRUCTURES
STEREOCHEMICAL PARAMETERS
SOFTWARE FOR QUALITY CHECKS
PROCHECK
WHAT_CHECK
QUALITY INFORMATION ON THE WEB
PDBREPORT–WHAT_CHECK RESULTS
CONCLUSIONS
ACKNOWLEDGMENTS
REFERENCES
15 THE IMPACT OF LOCAL ACCURACY IN PROTEIN AND RNA STRUCTURES: VALIDATION AS AN ACTIVE TOOL
INTRODUCTION
METHODOLOGY OF ALL-ATOM CONTACT ANALYSIS
COMPLEMENTARY RELATIONSHIP WITH MORE TRADITIONAL CRITERIA
USING MOLPROBITY AND RELATED FACILITIES
RNA: VALIDATION, STRUCTURE IMPROVEMENT, AND CONFORMER STRINGS
USING LOCAL ACCURACY IN BIOINFORMATIC ANALYSES
RELEVANT WEB SITES
REFERENCES
16 STRUCTURE COMPARISON AND ALIGNMENT
INTRODUCTION
GENERAL APPROACH TO STRUCTURE COMPARISON AND ALIGNMENT
COMPARISON ALGORITHM AND OPTIMIZATION
HOW WELL ARE WE DOING?
SAMPLE RESULTS FROM STRUCTURE COMPARISON AND ALIGNMENT
MULTIPLE STRUCTURE ALIGNMENT
FLEXIBLE STRUCTURE ALIGNMENT
MAPPING PROTEIN FOLD SPACE
THE IMPACT OF STRUCTURAL GENOMICS
THE FUTURE
ACKNOWLEDGMENTS
REFERENCES
17 PROTEIN STRUCTURE EVOLUTION AND THE SCOP DATABASE
INTRODUCTION
THE EVOLUTION OF PROTEINS
THE EVOLUTION OF FOLD
THE EVOLUTION OF ENZYMATIC CATALYSIS
THE COMPARISON OF STRUCTURES
SCOP HIERARCHY
CLASSES
FOLDS
SUPERFAMILIES
FAMILIES
ORGANIZATION AND CAPABILITIES OF THE SCOP RESOURCE
BROWSING THROUGH THE SCOP HIERARCHY
LINKING TO OTHER STRUCTURE AND SEQUENCE DATABASES
SCOP REFINEMENTS TO ACCOMMODATE STRUCTURAL GENOMICS
NEW FEATURES IN SCOP
INTEGRATION WITH OTHER DATABASES
RECLASSIFICATION
SCOP USAGE
SCOP FROM A USER’S PERSPECTIVE
REFERENCES
18 THE CATH DOMAIN STRUCTURE DATABASE
INTRODUCTION
HISTORICAL DEVELOPMENT
CURRENT METHODOLOGIES FOR IDENTIFYING STRUCTURAL SIMILARITIES AND EVOLUTIONARY RELATIONSHIPS IN CATH
CLASSIFYING CLOSE HOMOLOGUES (CHOPCLOSE)
IDENTIFICATION OF DOMAIN BOUNDARIES
METHODS TO DETECT SEQUENCE AND STRUCTURAL RELATIVES
STRUCTURE-BASED METHODS FOR IDENTIFYING STRUCTURAL HOMOLOGUES AND RELATED FOLDS (SSAP AND CATHEDRAL)
SSAP—SEQUENTIAL STRUCTURE ALIGNMENT PROGRAM
CATHEDRAL—CATH’S EXISTING DOMAIN RECOGNITION ALGORITHM
METHODS FOR GENERATING MULTIPLE STRUCTURE ALIGNMENTS (CORA) AND PROTOCOLS FOR USING 3D TEMPLATES USED TO IDENTIFY DISTANT STRUCTURAL RELATIONSHIPS
SEQUENCE, STRUCTURAL, AND FUNCTIONAL VALIDATION OF HOMOLOGY
THE DICTIONARY OF HOMOLOGOUS SUPERFAMILIES (DHS)
THE GENE3D RESOURCE
THE CATH WEB SITE AND SERVER
THE CATHEDRAL SERVER
IS FOLD CLASSIFICATION A LEGITIMATE REPRESENTATION OF DOMAIN STRUCTURE SPACE?
POPULATION OF SUPERFAMILIES AND FAMILIES WITHIN FOLDS
FOLD CLASSIFICATION IN CATH
ACKNOWLEDGMENTS
REFERENCES
Section IV STRUCTURAL AND FUNCTIONAL ASSIGNMENT
19 SECONDARY STRUCTURE ASSIGNMENT
SECONDARY STRUCTURE CONCEPTS
ASSIGNMENT METHODS
SECONDARY STRUCTURE STATISTICS AND COMPARISON
APPLICATIONS OF SECONDARY STRUCTURE
CONCLUSION
ABBREVIATIONS
ACKNOWLEDGMENTS
REFERENCES
20 IDENTIFYING STRUCTURAL DOMAINS IN PROTEINS
INTRODUCTION
DEFINITIONS OF STRUCTURAL DOMAINS
ALGORITHMS FOR IDENTIFYING STRUCTURAL DOMAINS: INSIGHT INTO HISTORY AND METHODOLOGY
ALGORITHMS FOR IDENTIFYING STRUCTURAL DOMAINS: IN-DEPTH
DOMAIN ASSIGNMENTS: EVALUATING AUTOMATIC METHODS
DOMAIN PREDICTION BASED ON SEQUENCE INFORMATION
CONCLUSIONS AND PERSPECTIVES
WEB RESOURCES
REFERENCES
21 INFERRING PROTEIN FUNCTION FROM STRUCTURE
INTRODUCTION
WHAT INFORMATION CAN BE OBTAINED FROM THREE-DIMENSIONAL PROTEIN STRUCTURES?
INFERRING FUNCTION FROM STRUCTURE
STRUCTURAL GENOMICS: HIGH-THROUGHPUT FUNCTION PREDICTION
CONCLUSIONS
REFERENCES
22 STRUCTURAL ANNOTATION OF GENOMES
INTRODUCTION
AVAILABILITY OF COMPLETED GENOMES
METHODOLOGIES FOR IDENTIFYING STRUCTURAL PROTEIN DOMAINS IN GENOMES
HOW WELL ARE GENOMES COVERED BY STRUCTURAL DOMAIN ANNOTATION?
CAN WE DETERMINE ALL THE STRUCTURES PRESENT IN THE GENOMES?—STRUCTURAL ANNOTATION OF GENOMES AND STRUCTURAL GENOMICS
WHAT CAN STRUCTURAL GENOME ANNOTATION TELL US ABOUT EVOLUTION?
STRUCTURAL GENOME ANNOTATION RESOURCES
SUPERFAMILY
3D GENOMICS
SUMMARY
REFERENCES
23 EVOLUTION STUDIED USING PROTEIN STRUCTURE
STRUCTURES AS EVOLUTIONARY UNITS
PHYLOGENY BY PROTEIN DOMAIN CONTENT
THE LAST UNIVERSAL COMMON ANCESTOR (LUCA)
ANCIENT GEOCHEMICAL ENVIRONMENT REFLECTED BY THE MODERN STRUCTURE REPERTOIRE
THE EVOLUTIONARY HISTORY OF PROTEIN DOMAINS
FILLING IN FOLD SPACE: CURRENT LIMITATIONS
CONCLUSION
REFERENCES
Section V MACROMOLECULAR INTERACTIONS
24 ELECTROSTATIC INTERACTIONS
INTRODUCTION
OVERVIEW OF FUNCTIONAL ROLES OF ELECTROSTATICS
BRIEF HISTORY
THE NEED FOR MORE EFFICIENT AND SCALABLE ELECTROSTATICS METHODS
POISSON–BOLTZMANN THEORY
NUMERICAL SOLUTION OF THE POISSON–BOLTZMANN EQUATION
APPLICATIONS
ACKNOWLEDGMENTS
REFERENCES
25 PREDICTION OF PROTEIN–NUCLEIC ACID INTERACTIONS
INTRODUCTION
MOTIVATION
POTENTIAL FUNCTIONS FOR PROTEIN–NUCLEIC ACID INTERACTIONS
FLEXIBILITY IN PROTEIN/NUCLEIC ACID COMPLEXES
APPLICATIONS
FUTURE WORK AND CRITICAL CHALLENGES
REFERENCES
26 PREDICTION OF PROTEIN–PROTEIN INTERACTIONS FROM EVOLUTIONARY INFORMATION
INTRODUCTION
PREDICTION OF INTERACTING REGIONS
PREDICTION OF INTERACTION PARTNERS
FUTURE TRENDS
REFERENCES
27 DOCKING METHODS, LIGAND DESIGN, AND VALIDATING DATA SETS IN THE STRUCTURAL GENOMICS ERA
INTRODUCTION
DOCKING AND SCORING
DRUG DESIGN IN THE STRUCTURAL PROTEOMICS ERA
SUMMARY
REFERENCES
Section VI STRUCTURE PREDICTION
28 CASP AND OTHER COMMUNITY-WIDE ASSESSMENTS TO ADVANCE THE FIELD OF STRUCTURE PREDICTION
A MEASURE FOR SUCCESS
COMMUNITY BENCHMARK HISTORY AND FINDINGS
OVERALL PROGRESS
CASP7
WHERE DO WE GO FROM HERE?
ACKNOWLEDGMENT
WEB SITES
REFERENCES
29 PREDICTION OF PROTEIN STRUCTURE IN 1D: SECONDARY STRUCTURE, MEMBRANE REGIONS, AND SOLVENT ACCESSIBILITY
INTRODUCTION
METHODS
PROGRAMS AND PUBLIC SERVERS
PRACTICAL ASPECTS
EMERGING AND FUTURE DEVELOPMENTS
FURTHER READING
ACKNOWLEDGMENTS
REFERENCES
30 HOMOLOGY MODELING
INTRODUCTION
STEP 1—TEMPLATE RECOGNITION AND INITIAL ALIGNMENT
STEP 2—ALIGNMENT CORRECTION
STEP 3—BACKBONE GENERATION
STEP 4—LOOP MODELING
STEP 5—SIDE CHAIN MODELING
STEP 6—MODEL OPTIMIZATION
STEP 7—MODEL VALIDATION
STEP 8—ITERATION
ACKNOWLEDGMENTS
REFERENCES
31 FOLD RECOGNITION METHODS
INTRODUCTION
THEORETICAL BACKGROUND FOR FOLD RECOGNITION
PROTEINS AS SEEN BY A BIOLOGIST
PROTEINS AS SEEN BY A PHYSICIST
SUMMARY
REFERENCES
32 DE NOVO PROTEIN STRUCTURE PREDICTION: METHODS AND APPLICATION
INTRODUCTION
REDUCED COMPLEXITY MODELS
SCORING FUNCTIONS FOR REDUCED COMPLEXITY MODELS
ROSETTA DE NOVO STRUCTURE PREDICTION
HIGH-RESOLUTION STRUCTURE PREDICTION
CASP: EVALUATION OF STRUCTURE PREDICTIONS
BIOLOGICAL APPLICATIONS OF STRUCTURE PREDICTION
FUTURE DIRECTIONS
REFERENCES
33 RNA STRUCTURAL BIOINFORMATICS
INTRODUCTION
METHODS FOR PREDICTING SECONDARY STRUCTURES
THREE-DIMENSIONAL MODELING METHODOLOGY
CONCLUSION
WEB RESOURCES
SUGGESTED READINGS
REFERENCES
Section VII THERAPEUTIC DISCOVERY
34 STRUCTURAL BIOINFORMATICS IN DRUG DISCOVERY
HISTORIC DEVELOPMENT OF DRUG DISCOVERY
MODERN DRUG DISCOVERY
GENERATING PROTEIN STRUCTURES
DRUG TARGETS
LEAD IDENTIFICATION
LEAD OPTIMIZATION
STRUCTURAL BIOINFORMATICS DATABASES
TOWARD PERSONALIZED MEDICINE
CONCLUSION AND FUTURE DIRECTIONS
ACKNOWLEDGMENTS
REFERENCES
FURTHER READING
35 B-CELL EPITOPE PREDICTION
INTRODUCTION
THE PROBLEM OF B-CELL EPITOPE PREDICTION
ANTIBODY STRUCTURE AND FUNCTION
EXPERIMENTAL METHODS USED FOR B-CELL EPITOPE IDENTIFICATION
HISTORY OF ATTEMPTS AT B-CELL EPITOPE PREDICTION
BIOINFORMATICS METHODS FOR B-CELL EPITOPE PREDICTION
APPLICATIONS
CONCLUSION
ABBREVIATIONS
ACKNOWLEDGMENT
REFERENCES
Section VIII FUTURE CHALLENGES
36 METHODS TO CLASSIFY AND PREDICT THE STRUCTURE OF MEMBRANE PROTEINS
THE BIOLOGICAL MEMBRANE
THE FOLDING PROCESS OF MEMBRANE PROTEINS
WHY IS IT DIFFICULT TO SOLVE MEMBRANE PROTEIN 3D STRUCTURES?
STRUCTURAL GENOMICS AND MEMBRANE PROTEINS
COMPUTATIONAL METHODS FOR THE IDENTIFICATION OF MEMBRANE PROTEINS AND THE PREDICTION OF THEIR STRUCTURES
WEB-AVAILABLE DATA RESOURCES FOR MEMBRANE PROTEINS
CLASSIFICATION OF MEMBRANE PROTEINS
CONCLUSIONS
REFERENCES
37 PROTEIN MOTION: SIMULATION
INTRODUCTION
PROTEIN MOTION TIMESCALES, SIZE, AND SIMULATION
COARSE-GRAINED METHODS
CLASSICAL MOLECULAR MECHANICS
QM–MM METHODS
CONCLUSION
ACKNOWLEDGMENTS
REFERENCES
38 THE SIGNIFICANCE AND IMPACTS OF PROTEIN DISORDER AND CONFORMATIONAL VARIANTS
INTRODUCTION
PROTEIN DISORDER: UNDERSTANDING THE REALM OF “INVISIBLE”
PROTEIN CONFORMATIONAL VARIANTS AND ENSEMBLES
FUTURE DIRECTIONS
REFERENCES
39 PROTEIN DESIGNABILITY AND ENGINEERING
INTRODUCTION
PROTEIN STRUCTURAL UNIVERSE
DETERMINANTS OF PROTEIN DOMAIN EVOLUTION
MECHANISMS OF PROTEIN DOMAIN EVOLUTION
PROTEIN ALPHABET
LESSONS FOR ENGINEERING STABLE PROTEINS
PROTEIN ENGINEERING: US VERSUS NATURE
CHALLENGES IN PROTEIN DESIGN
CONCLUSIONS
REFERENCES
40 STRUCTURAL GENOMICS OF PROTEIN SUPERFAMILIES
INTRODUCTION
NYSGXRC
BIOMEDICAL THEME TARGETS: BACKGROUND AND MOTIVATION
BIOMEDICAL THEME TARGETS: SELECTION AND PROGRESS
BIOMEDICAL THEME TARGETS: SELECTED EXAMPLES
PROTEIN TYROSINE PHOSPHATASE (PTPs)
INSULINOMA-ASSOCIATED PROTEIN 2 (IA-2)
SMALL C-TERMINAL DOMAIN PHOSPHATASE 3
CHRONOPHIN
OTHER PHOSPHATASES
BIOMEDICAL THEME TARGETS: POTENTIAL IMPACT ON DRUG DISCOVERY
FRAGMENT CONDENSATION LEAD DISCOVERY STRATEGY APPLIED TO PTP1B
VIRTUAL SCREENING STRATEGY APPLIED TO PP2Cα
BIOMEDICAL THEME TARGETS: CONCLUSION
COMMUNITY-NOMINATED TARGETS: MOTIVATION
COMMUNITY-NOMINATED TARGETS: SELECTION AND PROGRESS
COMMUNITY-NOMINATED TARGETS: FUNCTIONAL CHARACTERIZATION
COMMUNITY-NOMINATED TARGETS: CONCLUSION
OVERALL CONCLUSIONS
ACKNOWLEDGMENTS
REFERENCES
INDEX
Copyright © 2009 by John Wiley & Sons, Inc. All rights reserved
Published by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
ISBN 978-0-470-18105-8
FOREWORD
The quality of the coverage and the timeliness of the first edition of Structural Bioinformatics led to its wide usage. In turn, the collection has been adopted by the community as a defining articulation for the utility of bioinformatics and structural biology applied to address a range of functional studies in 21st Century Biology. I personally found the book to be read by students, postdoctoral fellows and more senior researchers around the world. The success reflected the excitement in the growing impact of both domains and also helped to accelerate that impact. As pointed out in the Introduction, bioinformatics is now a mainstream activity within the biological sciences; similarly, the implementation of the structural genomics initiative worldwide as a logical and necessary follow up to the Human Genome Project represents the consensus that structure is indeed important for understanding the mechanisms of molecular function. The increased impact and rapid progress during the 6 years, since the first edition was published, demonstrate the extraordinary significance of the interface between structural biology and informatics. Such significant advances indicate the clear need for an update, which has now been provided through the editorial efforts of Gu and Bourne and the writing team of leading researchers who bring the reader to the edge of the frontier.
The second edition, simultaneously a textbook and an expert monograph, contains a balanced set of contributions, which include updates and advances over the past 6 years and new innovative domains made possible by the sustained application of structural bioinformatics, which were undoubtedly catalyzed by the first edition. Just as was notable about the first edition, this new comprehensive collection fully captures the spirit of excitement at the “bleeding edge” of biodiscovery. The frontier between computing and biology itself reflects the decades of extraordinary progress and revolutionary advances in both domains. During the past 6 years, the data provided by the completion of the full human and numerous model genomes has accelerated the importance of computing for biology, and brought new funding opportunities and research training needs. At the same time, the deeper insight from complete genome sequencing has been the need to look beyond individual genes to systems and to look across multiple scales of biology, ultimately to establish an integrative view of biology based on experimentation and computation. With funding from within mainstream science programs and the increased recognition by the experimental community, the combined use of information technology and quantitative approaches is central to building an integrative view of biology. The superb collection of articles in this second edition speaks directly and powerfully for the role of structural bioinformatics in this effort for both the basic and the applied life sciences.
Additional academic training programs that include structural bioinformatics have been introduced consistently over the past 6 years. Yet, faculty in top-flight research institutions who do research and teach in first-class bioinformatics programs remain overwhelmed by the demands; indeed, the challenges of bioinformatics education have been a common theme at the major professional meetings in the field. Thus, textbooks of the highest quality and clarity are essential today, as we all struggle to determine the right curriculum and the right content to train what will be the first generation of students who are truly bioinformaticians. Today, every young aspiring biologist wants to learn about bioinformatics, as do those training in computer science and other quantitative sciences. Understanding the underlying assumptions and intricacies of any bioinformatics algorithm is necessary for proper usage and interpretation of the results obtained with the tools. To teach the large and growing numbers of young scientists who wish to utilize or even contribute to bioinformatics requires authoritative treatments that provide the basis to pull in a new generation of scientists. Such works must also use the best treatments possible to reach a much larger audience, including mature scientists who wish to retrain themselves, and need to set a standard for training everywhere in the world. This collection admirably meets those goals and is a must-read for those entering the field and for all of us committed to understanding the interplay of structure and function. By assembling the best thinkers to address systematically all of the challenges at the next stage of the genome effort, Gu and Bourne have created a book that will serve to educate the next generation who will be the future young investigators who will create the tools required to interpret the ever advancing frontier of biology.
At the same time, while the rapid pace of research progress in structural bioinformatics has driven the need for a second edition, the prehistory of modern structural bioinformatics, as is well described in this book, has been retained to remind the readers of some fundamental challenges that should not be neglected and forgotten. This update is not an extensive replicate with minor tweaks of the first edition; instead, the role of historical context and origins of structural bioinformatics as they contribute to present advances are discussed, such as the biennial Critical Assessment of Structure Prediction (CASP) and the initial “exponentiation” of progress in biology resulting from the application of advanced computing technology to structural biology. For more historical content, I refer any reader to the foreword for the first edition of Structural Bioinformatics or any of my status reports (over the past decade) on computational biology, readily accessible via the internet (or PubMed) in today’s world of “e-knowledge.”
What is most notable and important for the potential readership of this edition is that the fields of computational biology and bioinformatics have gone from uncertainty and neglect to the buzz words on everyone’s lips in less than 10 years. While newly created, contemporary bioinformatics and computational biology training programs are in the process of being incorporated into every biological science domain, and those working at the interface will have increased options even within core disciplines. The reason is obvious: we are already living in the future as biologists; we feel fully the impact of having completely sequenced genomes; we are a part of the transition to high throughput and high information content biological research, and to asking global or systemic questions as the norm, rather than using individual macromolecule-specific probes. Early in this century, the extraordinary, early, and even unanticipated successes of the genome project enabled computer search and modeling techniques to open up new vistas in biology.
The continuing availability of complete genomes coupled with high throughput experimental biological methods, structure determination, and annotated databases will definitely advance our current understanding of protein structures as they relate to biological function, processes, and evolution—a basic research curiosity that has captured center stage in the community. The challenge is to decode the rich information content implicit in genomes and apply the resulting knowledge in service to society. Obtaining a better understanding of biological processes will be achieved through the integration of a variety of methods including the use of structural information, and should be extended in the future to provide improved health care delivery. At the same time, we are only learning how best to exploit computer and information technology to understand biological mechanisms; the educational content of Structural Bioinformatics provides a perspective on our progress and educational content to enable the full potential of systematic computational analysis.
Those of us concerned with macromolecular structure, and protein science in particular, have long spoken the mantra: form follows function, a given function requires a specific structure, or, conversely, structure in turn can be seen to determine function. That is, if we know the structure, we can infer many aspects of biochemical and sometimes, even cellular function which can subsequently be experimentally tested. Indeed, given that we now “know” many gene sequences gained implicitly from sequenced genomes, such resources provide the basis for more refined algorithms that either leverage structural information or improve our understanding of protein structures to model them explicitly and more accurately. Subsequently, these improvements facilitate our ability to predict functionality and greatly reduce the search space for experimental efforts by providing a guided focus to test only the most likely functions. Furthermore, computational modeling of these molecules, for both static and dynamic processes, can provide a detailed description of biological processes at the atomic level, an alternative to traditional biological cartoons which have been the descriptive ways in which we biologists think.
The field of structural bioinformatics, to connect the abstract to the practical, continues to push boundaries beyond what was previously thought impossible. The basis for all structural bioinformatics, the central community database for structural biology, is the Protein Data Bank (PDB) and the macromolecule structures it contains. The growth of this resource has been accelerated by structural genomics initiatives, which continue to sustain and enhance the discovery of novel structures and folds paving the way for new opportunities of discovery and insight through structural bioinformatics. Some examples include a more accurate genome annotation and the basis for clarifying evolutionary related questions. The book addresses key points for cutting edge research such as these, beginning with definitions, and conveys the current scope of research and knowledge of protein structures.
Central to this wonderful collection of insightful articles are advances that have been made in building the infrastructure for an integrative approach to understanding biosystems through the power of understanding protein structures and the implications for the mechanics of function. A survey of current resources is provided to highlight the foundation where future developments are needed to integrate experimental data better and provide the basis for abstraction and generalization. Overall, the individual chapters outline the suite of major basic life science questions such as the status of efforts to predict protein structure and how proteins carry out cellular functions, and also the applied life science questions such as how structural bioinformatics can improve health care through accelerating drug discovery. Dictated by the process of uncovering the mechanisms through which macromolecules act, this journey of discovery into the regulation of life’s processes will keep biologists entertained for centuries to come. The second edition is a great guidebook, even more informative than the earlier collection, and represents the basis for this journey. I highly recommend it to all members of our community.
John C. Wooley
Associate Vice Chancellor, Research
University of California, San Diego
La Jolla, CA
PREFACE
Six years have elapsed since the first edition of this book was published. The field of structural bioinformatics has sustained a high level of excitement in that time, leading to innovative developments and considerable progress throughout the topics covered in the first edition and in the extension to many new domains. Through the efforts of the authors of this new edition, we have tried to capture these developments and to provide an accurate, detailed view of the current field. One way of picturing the advances or defining the “structural” change is relatively straightforward; namely, the number of experimental macromolecular structures has doubled since the first edition of this book was published. The Protein Structure Initiative has also led to an increase in the number of novel structures and folds. Overall, the continued growth in experimental structures has created an even richer data source for much of the work described herein. But numbers do not tell the whole story. The complexity of structures, the methods used, the ways structure is represented, our ability to model structures, our understanding of proteomes and their structural coverage, and so on, have also changed.
Describing the advances in “bioinformatics” per se is more difficult. Change in this case reflects both scientific advances and an increase in recognition within the biological sciences for the importance of computational methods. Due in part to the explosion in high-throughput experimental methods, bioinformatics is certainly more mainstream than it was 6 years ago and most experimental (i.e., non-computational) life scientists would acknowledge the role bioinformatics now plays in furthering our understanding of living systems. Some years from now, whether or not bioinformatics will exist as a separate entity, rather than as a core effort in every biological science department, is a subject for debate. What is important here is that there is an active effort to apply computational methods to a rapidly growing corpus of macromolecular structure data. Our primary goal is to provide a comprehensive description of what this field has accomplished to date and to make the reader aware of what we have gained and could gain in the future toward our understanding of living systems through the study of macromolecular structure and the continued, rigorous application of bioinformatics. As such, this edition should provide a fully current, useful reference to those already in the field, and a suitable text for those educating others. The first edition already encouraged new scholars to enter the field and we believe the case for engaging in structural bioinformatics is stronger than ever.
To meet this goal, the second edition includes not only updated chapters, but also new chapters covering mass spectrometry, genome annotation, immunology, protein dynamics and disorder, membrane proteins, protein design capabilities, and evolutionary biology as they relate to macromolecular structure.
Macromolecular structure is often underappreciated and bypassed in practice during the current era of high-throughput biology, especially since researchers can jump directly from genomic sequence to phenotype and conduct biochemical studies on large-scale protein–protein interactions that are often involved in pathways associated with disease states. While a great deal can be learned from such studies, ultimately the devil is in the details, which do not arise from traditional functional studies; the field of molecular biophysics, now termed structural biology, came into existence to obtain and use structures to provide those details. We believe structure will play an increasingly important role as genome studies seek to explore deeper into the mechanisms of life. As such, computational approaches that analyze structure are essential and we hope this book will be there to guide you.
We begin by describing the scope of this book and the history of the field (Chapter 1). The remainder of the introductory Section I is devoted to the understanding of the data itself, namely protein, DNA, and RNA structure (Chapters 2 and 3). Understanding the nuances (scope, accuracy, completeness, etc.) of structural data is prerequisite to any effective use of that data. Effective data use in turn requires an understanding of the experimental methods that produce the data. The most popular methods for deriving macromolecular structure data are, in order, X-ray crystallography (Chapter 4), NMR spectroscopy (Chapter 5), and electron microscopy (Chapter 6). Constructing structural models of molecules can also be guided by hydrogen–deuterium exchange and cross-linking experiments coupled with mass spectrometry (Chapter 7). The data from these methods are most often a set of Cartesian coordinates representing the positions of the atoms in these structures, which are well suited for analysis by computer, but alternative representations of this information-rich content are sometimes needed to conduct wide-scale bioinformatics analysis (Chapter 8). Structural biology and structural bioinformatics are also inherently visual sciences: the tabular output of atomic coordinates can be useful as input for computation, but not for human insight. The visualization of structure has evolved along with the science and many useful tools, mostly free, are available (Chapter 9).
In the early days of structural biology (up to the late 1970s), those in the field could name all the structures that had been solved, some of which had Nobel prizes attached to them. As the field grew this was no longer possible, and databases of structure data began to appear. Consistent use of structural data contained within these databases (and indeed the construction of the databases themselves) requires consistent data representation and Section II is devoted to this topic. Chapter 10 introduces the common data representations used by today’s software. The field is very fortunate to have scientists who recognize the importance of having a single source of primary data, the worldwide PDB (wwPDB—Chapter 11), from which a variety of secondary resources are derived. Examples of such resources are provided in Chapters 12 and 13.
As the number of structures has increased, much can be learnt from comparative analysis (Section III), where similarities and differences provide new insights. Chapters 14 and 15 describe structure validation, which is important in understanding the accuracy of the data you are dealing with before a 3D comparison and alignment of structures can be made (Chapter 16). When structure comparisons are made and similarities found, reductionism can be applied to make sense of the vast amount of data. Such reductionism leads to classification in various ways, such as by fold, domain, family, and superfamily (Chapters 17 and 18).
The more we know from comparing structures the more we can learn about structure and functional assignment (Section IV). Secondary structure assignment can now be made consistently and reliably for the majority of structures (Chapter 19). Proteins exist as one or more domains or compact structural and functional units. Hence, automated assignment of domains is important (Chapter 20). Through the structural genomics projects and the NIH Protein Structure Initiative, structure determination is moving from a functional to a genomic initiative. That is, structures were traditionally determined in an effort to elucidate further details about a known function, and these structural efforts were established based on very extensive prior biological, biochemical and often genetic research, and were done in parallel with continuing biological research on functional properties. In contrast, high-resolution structures with no elucidated or known function are being determined at an accelerated pace, thus making functional assignment critical (Chapter 21). The use of structural information to identify distantly related proteins also serves in annotating genomes (Chapter 22) and clarifying evolutionary relationships (Chapter 23).
Proteins do not act in isolation: most function through complex protein–protein, protein–ligand, and protein–solvent interactions and are often part of larger macromolecular assemblies. Section V describes these interactions, beginning with an introduction to electrostatic forces that have a fundamental impact on recognition between molecules (Chapter 24). The majority of these interactions are not captured in an experimental structure of the complex; instead, the structure is available only in its apo form, which carries a signature that can be teased out to predict the interaction. Understanding these signatures when found in protein–DNA and protein–RNA interactions (Chapter 25) and in protein–protein interactions (Chapter 26) aids, for example, in the identification of new transcription sites and reconstruction of protein signaling networks. After the sites of interactions are identified, docking of the molecules is simplified, which is important in drug design (Chapter 27).
While the number of structures is increasing rapidly, the number of protein sequences is increasing much more rapidly; thus, the idea of predicting protein structure from its sequence remains an “obsessive” goal (Section VI). Spurred by an unusual biennial competition known as CASP, the Critical Assessment of Protein Structure Prediction (Chapter 28), progress is being made in subcategories of structure prediction efforts within CASP and in the field in general. Structure prediction categories include homology modeling (Chapter 30), fold recognition (Chapter 31), and ab initio structure prediction (Chapter 32). Other forms of prediction include secondary structure and membrane components for proteins (Chapter 29). Advances in understanding and predicting RNA structures have also been made and are discussed in Chapter 33.
Structural bioinformatics is playing an increasingly important role in the development of new pharmaceuticals (Section VII). The identification of drug targets, understanding the action of drug binding, and the design of promising leads all involve structural bioinformatics (Chapter 34). In addition to the development of small molecule and peptide-based drugs, contributions are also being made in identifying antigen recognition sites that aid in antibody-based therapeutics (Chapter 35).
Finally, Section VIII identifies challenges at the frontiers of structural bioinformatics. Membrane-associated proteins, whose structures are difficult to characterize in vitro and thus are underrepresented experimentally, are one example (Chapter 36). Proteins are not static under physiological conditions, yet their dynamics (Chapter 37) and the impact of disorder and conformational variants (Chapter 38), while important to protein function, are still poorly understood. As our understanding of protein structure improves, so do our design rules and our capacity for engineering new proteins whose functions improve upon nature or provide the potential for novel processes (Chapter 39). The best way to push back these frontiers is with more structures and enhanced generalizations about their roles; structural genomics is doing just that (Chapter 40) and is thus a fitting place to end our tour of structural bioinformatics.
The words that follow are written by many of the leaders in the field and we thank them for their time and energy in sharing what motivates them to unravel the mysteries of nature, which are so beautifully displayed before us in an ever increasing number of macromolecular structures.
Jenny Gu
Philip E. Bourne
Color files of all figures from this book are available for download from the following web address: ftp://ftp.wiley.com/public/sci_tech_med/structural_bioinformatics
ACKNOWLEDGMENTS
“Science does not know its debt to imagination.”
Ralph Waldo Emerson
We are grateful to the University of Texas Medical Branch, the University of Münster, and the University of California at San Diego for institutional support in our research and educational endeavors, including this book. Likewise, we thank the Jeanne Kempner Foundation and the US funding agencies, the National Science Foundation and the National Institutes of Health, for their continual support in the advancement of science. This book could not have been completed without the help of all the contributors who have dedicated their expertise, time, and efforts to this extensive update and indispensable resource for the community; we are greatly indebted to them. Additionally, we would like to thank Wolfgang Bluhm, Kristine Briedis, Naomi Cotton, Lynn Fink, Apostol Gramada, Maria Kontoyianni, Kannan Natarajan, Julia Ponomarenko, Peter Rose, Wayne Townsend-Merino, Ruben Valas, Stella Veretnik, Lei Xie, Song Yang, and Zhanyang Zhu for their assistance and support in reviewing the materials for this book. The successful publication of this edition could not have been achieved without the help and patience of Andrea Baier, Tiffany Williams, and Thomas Moore at Wiley-Blackwell.
On a more personal note, JG would like to thank her family, friends, colleagues and mentors for their continued support in taking on new challenging endeavors. PEB would like to sincerely thank his family, Roma, Melanie, and Scott, for their continued understanding of the role science plays in his life.
CONTRIBUTORS
Paul D. Adams, Physical Biosciences Division, Lawrence Berkeley Laboratory, Berkeley, CA and Department of Bioengineering, University of California, Berkeley, CA
Steven C. Almo, Albert Einstein College of Medicine, Bronx, NY
Russ B. Altman, Departments of Bioengineering & Genetics, Stanford University, Stanford, CA
Claus A. Andersen, Siena Biotech Spa., Siena, Italy
Arash Bahrami, Graduate Program in Biophysics and National Magnetic Resonance Facility at Madison, Biochemistry Department, University of Wisconsin-Madison, Madison, WI
Nathan A. Baker, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology, Washington University, St. Louis, MO
Gail J. Bartlett, School of Chemistry, University of Bristol, Cantock’s Close, Clifton, Bristol, UK
Helen M. Berman, RCSB Protein Data Bank, Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ
Jeffrey B. Bonanno, Albert Einstein College of Medicine, Bronx, NY
Richard Bonneau, Department of Biology, New York University, New York, NY; and Department of Computer Science, Courant Institute, New York University, New York, NY
