Structural Bioinformatics E-Book

Description

Structural Bioinformatics was the first major effort to show the application of the principles and basic knowledge of the larger field of bioinformatics to questions focusing on macromolecular structure, such as the prediction of protein structure and how proteins carry out cellular functions, and how the application of bioinformatics to these life science issues can improve healthcare by accelerating drug discovery and development. Designed primarily as a reference, the first edition nevertheless saw widespread use as a textbook in graduate and undergraduate university courses dealing with the theories and associated algorithms, resources, and tools used in the analysis, prediction, and theoretical underpinnings of DNA, RNA, and proteins.

This new edition contains not only thorough updates of the advances in structural bioinformatics since publication of the first edition, but also features eleven new chapters dealing with frontier areas of high scientific impact, including: sampling and search techniques; use of mass spectrometry; genome functional annotation; and much more. Offering detailed coverage for practitioners while remaining accessible to the novice, Structural Bioinformatics, Second Edition is a valuable resource and an excellent textbook for a range of readers in the bioinformatics and advanced biology fields.

Praise for the previous edition:

"This book is a gold mine of fundamental and practical information in an area not previously well represented in book form." --Biochemistry and Molecular Education

"... destined to become a classic reference work for workers at all levels in structural bioinformatics...recommended with great enthusiasm for educators, researchers, and graduate students." --BAMBED

"...a useful and timely summary of a rapidly expanding field." --Nature Structural Biology

"...a terrific job in this timely creation of a compilation of articles that appropriately addresses this issue." --Briefings in Bioinformatics


Page count: 2250

Year of publication: 2011


CONTENTS

Foreword

Preface

Acknowledgments

Contributors

Section I DATA COLLECTION, ANALYSIS, AND VISUALIZATION

1 DEFINING BIOINFORMATICS AND STRUCTURAL BIOINFORMATICS

WHAT IS BIOINFORMATICS?

TECHNICAL CHALLENGES WITHIN STRUCTURAL BIOINFORMATICS

INTEGRATING STRUCTURAL DATA WITH OTHER DATA SOURCES

REFERENCES

2 FUNDAMENTALS OF PROTEIN STRUCTURE

THE IMPORTANCE OF PROTEIN STRUCTURE

THE PRIMARY STRUCTURE OF PROTEINS: THE AMINO ACID SEQUENCE

THE SECONDARY STRUCTURE OF PROTEINS: THE LOCAL THREE-DIMENSIONAL STRUCTURE

THE TERTIARY STRUCTURE OF PROTEINS: THE GLOBAL THREE-DIMENSIONAL STRUCTURE

THE QUATERNARY STRUCTURE OF PROTEINS: ASSOCIATIONS OF MULTIPLE POLYPEPTIDE CHAINS

CONCLUSION

MORE INFORMATION ON THE INTERNET

REFERENCES

3 FUNDAMENTALS OF DNA AND RNA STRUCTURE

INTRODUCTION

CHEMICAL STRUCTURE OF NUCLEIC ACIDS

BASE-PAIR GEOMETRY

CONFORMATION OF THE SUGAR PHOSPHATE BACKBONE

STRUCTURES OF NUCLEIC ACIDS

CONCLUSION

ACKNOWLEDGMENTS

REFERENCES

4 COMPUTATIONAL ASPECTS OF HIGH-THROUGHPUT CRYSTALLOGRAPHIC MACROMOLECULAR STRUCTURE DETERMINATION

INTRODUCTION

HIGH-THROUGHPUT STRUCTURE DETERMINATION

DATA ANALYSIS

HEAVY ATOM LOCATION AND COMPUTATION OF EXPERIMENTAL PHASES

DENSITY MODIFICATION

MOLECULAR REPLACEMENT

REFINEMENT

VALIDATION

CHALLENGES TO AUTOMATION

CONCLUSIONS

REFERENCES

5 MACROMOLECULAR STRUCTURE DETERMINATION BY NMR SPECTROSCOPY

INTRODUCTION TO PROTEIN STRUCTURE DETERMINATION BY NMR

PREPARATION OF PROTEIN SAMPLES FOR NMR

PROTOCOL FOR PROTEIN STRUCTURE DETERMINATION BY NMR

PROBABILISTIC APPROACHES AND AUTOMATION

DATABASES FOR BIOMOLECULAR NMR

ACKNOWLEDGMENTS

REFERENCES

6 ELECTRON MICROSCOPY IN THE CONTEXT OF STRUCTURAL SYSTEMS BIOLOGY

INTRODUCTION

ELECTRON OPTICS AND IMAGE FORMATION

THREE-DIMENSIONAL RECONSTRUCTION

COMBINATION WITH OTHER APPROACHES: HYBRID METHODS

FUTURE DIRECTIONS

ACKNOWLEDGMENTS

REFERENCES

7 STUDY OF PROTEIN THREE-DIMENSIONAL STRUCTURE AND DYNAMICS USING PEPTIDE AMIDE HYDROGEN/DEUTERIUM EXCHANGE MASS SPECTROMETRY (DXMS) AND CHEMICAL CROSS-LINKING WITH MASS SPECTROMETRY TO CONSTRAIN MOLECULAR MODELING

INTRODUCTION

OVERVIEW OF DXMS METHODOLOGY

EXAMPLES OF APPLICATIONS OF DXMS

DXMS ANALYSIS: CONCLUDING REMARKS

HYBRID BIOCHEMICAL/BIOINFORMATICS APPROACH TO LOW-RESOLUTION STRUCTURE DETERMINATION USING CHEMICAL CROSS-LINKERS AND MASS SPECTROMETRY

RECENT APPLICATIONS

CONCLUSION WITH RESPECT TO CHEMICAL CROSS-LINKAGE METHODS

ACKNOWLEDGMENTS

REFERENCES

8 SEARCH AND SAMPLING IN STRUCTURAL BIOINFORMATICS

INTRODUCTION

SAMPLING STRUCTURAL SPACE

SEARCH METHODS

DATA ANALYSIS AND REDUCTION

CONCLUDING REMARKS

ACKNOWLEDGMENTS

REFERENCES (Including Remarks on Recommended Further Reading)

9 MOLECULAR VISUALIZATION

INTRODUCTION

THE PROCESS OF MOLECULAR VISUALIZATION

MOLECULAR MODELS

MOLECULAR VISUALIZATION PROGRAMS

REFERENCES

Section II DATA REPRESENTATION AND DATABASES

10 THE PDB FORMAT, mmCIF FORMATS, AND OTHER DATA FORMATS

INTRODUCTION

THE PDB FORMAT

mmCIF: A DICTIONARY-BASED APPROACH TO DATA DESCRIPTION

THE PDB EXCHANGE AND OTHER DATA DICTIONARIES

SUPPORTING OTHER FORMATS

SUPPORTING APPLICATION PROGRAM INTERFACES

CONCLUSION

ACKNOWLEDGMENTS

REFERENCES

11 THE WORLDWIDE PROTEIN DATA BANK

INTRODUCTION

DATA ACQUISITION AND PROCESSING

DATA ACCESS

RCSB PDB

PDBj

PDBe

BMRB

FUTURE

ACKNOWLEDGMENTS

REFERENCES

12 THE NUCLEIC ACID DATABASE

INTRODUCTION

DATA PROCESSING AND VALIDATION

THE DATABASE

DISTRIBUTION OF INFORMATION

APPLICATIONS OF THE NDB

ACKNOWLEDGMENT

REFERENCES

13 OTHER STRUCTURE-BASED DATABASES

INTRODUCTION

THE ADDED VALUE PHILOSOPHY

OTHER PRIMARY INFORMATION RESOURCES

SECONDARY RESOURCES

STRUCTURAL DATABASES OF THE FUTURE

REFERENCES

Section III DATA INTEGRITY AND COMPARATIVE FEATURES

14 STRUCTURAL QUALITY ASSURANCE

INTRODUCTION

STRUCTURES AS MODELS

AIMS

ERROR ESTIMATION AND PRECISION

ERROR ESTIMATES IN X-RAY CRYSTALLOGRAPHY

ERROR ESTIMATES IN NMR SPECTROSCOPY

ERRORS IN DEPOSITED STRUCTURES

STEREOCHEMICAL PARAMETERS

SOFTWARE FOR QUALITY CHECKS

PROCHECK

WHAT_CHECK

QUALITY INFORMATION ON THE WEB

PDBREPORT–WHAT_CHECK RESULTS

CONCLUSIONS

ACKNOWLEDGMENTS

REFERENCES

15 THE IMPACT OF LOCAL ACCURACY IN PROTEIN AND RNA STRUCTURES: VALIDATION AS AN ACTIVE TOOL

INTRODUCTION

METHODOLOGY OF ALL-ATOM CONTACT ANALYSIS

COMPLEMENTARY RELATIONSHIP WITH MORE TRADITIONAL CRITERIA

USING MOLPROBITY AND RELATED FACILITIES

RNA: VALIDATION, STRUCTURE IMPROVEMENT, AND CONFORMER STRINGS

USING LOCAL ACCURACY IN BIOINFORMATIC ANALYSES

RELEVANT WEB SITES

REFERENCES

16 STRUCTURE COMPARISON AND ALIGNMENT

INTRODUCTION

GENERAL APPROACH TO STRUCTURE COMPARISON AND ALIGNMENT

COMPARISON ALGORITHM AND OPTIMIZATION

HOW WELL ARE WE DOING?

SAMPLE RESULTS FROM STRUCTURE COMPARISON AND ALIGNMENT

MULTIPLE STRUCTURE ALIGNMENT

FLEXIBLE STRUCTURE ALIGNMENT

MAPPING PROTEIN FOLD SPACE

THE IMPACT OF STRUCTURAL GENOMICS

THE FUTURE

ACKNOWLEDGMENTS

REFERENCES

17 PROTEIN STRUCTURE EVOLUTION AND THE SCOP DATABASE

INTRODUCTION

THE EVOLUTION OF PROTEINS

THE EVOLUTION OF FOLD

THE EVOLUTION OF ENZYMATIC CATALYSIS

THE COMPARISON OF STRUCTURES

SCOP HIERARCHY

CLASSES

FOLDS

SUPERFAMILIES

FAMILIES

ORGANIZATION AND CAPABILITIES OF THE SCOP RESOURCE

BROWSING THROUGH THE SCOP HIERARCHY

LINKING TO OTHER STRUCTURE AND SEQUENCE DATABASES

SCOP REFINEMENTS TO ACCOMMODATE STRUCTURAL GENOMICS

NEW FEATURES IN SCOP

INTEGRATION WITH OTHER DATABASES

RECLASSIFICATION

SCOP USAGE

SCOP FROM A USER’S PERSPECTIVE

REFERENCES

18 THE CATH DOMAIN STRUCTURE DATABASE

INTRODUCTION

HISTORICAL DEVELOPMENT

CURRENT METHODOLOGIES FOR IDENTIFYING STRUCTURAL SIMILARITIES AND EVOLUTIONARY RELATIONSHIPS IN CATH

CLASSIFYING CLOSE HOMOLOGUES (CHOPCLOSE)

IDENTIFICATION OF DOMAIN BOUNDARIES

METHODS TO DETECT SEQUENCE AND STRUCTURAL RELATIVES

STRUCTURE-BASED METHODS FOR IDENTIFYING STRUCTURAL HOMOLOGUES AND RELATED FOLDS (SSAP AND CATHEDRAL)

SSAP—SEQUENTIAL STRUCTURE ALIGNMENT PROGRAM

CATHEDRAL: CATH’S EXISTING DOMAIN RECOGNITION ALGORITHM

METHODS FOR GENERATING MULTIPLE STRUCTURE ALIGNMENTS (CORA) AND PROTOCOLS FOR USING 3D TEMPLATES USED TO IDENTIFY DISTANT STRUCTURAL RELATIONSHIPS

SEQUENCE, STRUCTURAL, AND FUNCTIONAL VALIDATION OF HOMOLOGY

THE DICTIONARY OF HOMOLOGOUS SUPERFAMILIES (DHS)

THE GENE3D RESOURCE

THE CATH WEB SITE AND SERVER

THE CATHEDRAL SERVER

IS FOLD CLASSIFICATION A LEGITIMATE REPRESENTATION OF DOMAIN STRUCTURE SPACE?

POPULATION OF SUPERFAMILIES AND FAMILIES WITHIN FOLDS

FOLD CLASSIFICATION IN CATH

ACKNOWLEDGMENTS

REFERENCES

Section IV STRUCTURAL AND FUNCTIONAL ASSIGNMENT

19 SECONDARY STRUCTURE ASSIGNMENT

SECONDARY STRUCTURE CONCEPTS

ASSIGNMENT METHODS

SECONDARY STRUCTURE STATISTICS AND COMPARISON

APPLICATIONS OF SECONDARY STRUCTURE

CONCLUSION

ABBREVIATIONS

ACKNOWLEDGMENTS

REFERENCES

20 IDENTIFYING STRUCTURAL DOMAINS IN PROTEINS

INTRODUCTION

DEFINITIONS OF STRUCTURAL DOMAINS

ALGORITHMS FOR IDENTIFYING STRUCTURAL DOMAINS: INSIGHT INTO HISTORY AND METHODOLOGY

ALGORITHMS FOR IDENTIFYING STRUCTURAL DOMAINS: IN-DEPTH

DOMAIN ASSIGNMENTS: EVALUATING AUTOMATIC METHODS

DOMAIN PREDICTION BASED ON SEQUENCE INFORMATION

CONCLUSIONS AND PERSPECTIVES

WEB RESOURCES

REFERENCES

21 INFERRING PROTEIN FUNCTION FROM STRUCTURE

INTRODUCTION

WHAT INFORMATION CAN BE OBTAINED FROM THREE-DIMENSIONAL PROTEIN STRUCTURES?

INFERRING FUNCTION FROM STRUCTURE

STRUCTURAL GENOMICS: HIGH-THROUGHPUT FUNCTION PREDICTION

CONCLUSIONS

REFERENCES

22 STRUCTURAL ANNOTATION OF GENOMES

INTRODUCTION

AVAILABILITY OF COMPLETED GENOMES

METHODOLOGIES FOR IDENTIFYING STRUCTURAL PROTEIN DOMAINS IN GENOMES

HOW WELL ARE GENOMES COVERED BY STRUCTURAL DOMAIN ANNOTATION?

CAN WE DETERMINE ALL THE STRUCTURES PRESENT IN THE GENOMES?—STRUCTURAL ANNOTATION OF GENOMES AND STRUCTURAL GENOMICS

WHAT CAN STRUCTURAL GENOME ANNOTATION TELL US ABOUT EVOLUTION?

STRUCTURAL GENOME ANNOTATION RESOURCES

SUPERFAMILY

3D GENOMICS

SUMMARY

REFERENCES

23 EVOLUTION STUDIED USING PROTEIN STRUCTURE

STRUCTURES AS EVOLUTIONARY UNITS

PHYLOGENY BY PROTEIN DOMAIN CONTENT

THE LAST UNIVERSAL COMMON ANCESTOR (LUCA)

ANCIENT GEOCHEMICAL ENVIRONMENT REFLECTED BY THE MODERN STRUCTURE REPERTOIRE

THE EVOLUTIONARY HISTORY OF PROTEIN DOMAINS

FILLING IN FOLD SPACE: CURRENT LIMITATIONS

CONCLUSION

REFERENCES

Section V MACROMOLECULAR INTERACTIONS

24 ELECTROSTATIC INTERACTIONS

INTRODUCTION

OVERVIEW OF FUNCTIONAL ROLES OF ELECTROSTATICS

BRIEF HISTORY

THE NEED FOR MORE EFFICIENT AND SCALABLE ELECTROSTATICS METHODS

POISSON–BOLTZMANN THEORY

NUMERICAL SOLUTION OF THE POISSON–BOLTZMANN EQUATION

APPLICATIONS

ACKNOWLEDGMENTS

REFERENCES

25 PREDICTION OF PROTEIN–NUCLEIC ACID INTERACTIONS

INTRODUCTION

MOTIVATION

POTENTIAL FUNCTIONS FOR PROTEIN–NUCLEIC ACID INTERACTIONS

FLEXIBILITY IN PROTEIN/NUCLEIC ACID COMPLEXES

APPLICATIONS

FUTURE WORK AND CRITICAL CHALLENGES

REFERENCES

26 PREDICTION OF PROTEIN–PROTEIN INTERACTIONS FROM EVOLUTIONARY INFORMATION

INTRODUCTION

PREDICTION OF INTERACTING REGIONS

PREDICTION OF INTERACTION PARTNERS

FUTURE TRENDS

REFERENCES

27 DOCKING METHODS, LIGAND DESIGN, AND VALIDATING DATA SETS IN THE STRUCTURAL GENOMICS ERA

INTRODUCTION

DOCKING AND SCORING

DRUG DESIGN IN THE STRUCTURAL PROTEOMICS ERA

SUMMARY

REFERENCES

Section VI STRUCTURE PREDICTION

28 CASP AND OTHER COMMUNITY-WIDE ASSESSMENTS TO ADVANCE THE FIELD OF STRUCTURE PREDICTION

A MEASURE FOR SUCCESS

COMMUNITY BENCHMARK HISTORY AND FINDINGS

OVERALL PROGRESS

CASP7

WHERE DO WE GO FROM HERE?

ACKNOWLEDGMENT

WEB SITES

REFERENCES

29 PREDICTION OF PROTEIN STRUCTURE IN 1D: SECONDARY STRUCTURE, MEMBRANE REGIONS, AND SOLVENT ACCESSIBILITY

INTRODUCTION

METHODS

PROGRAMS AND PUBLIC SERVERS

PRACTICAL ASPECTS

EMERGING AND FUTURE DEVELOPMENTS

FURTHER READING

ACKNOWLEDGMENTS

REFERENCES

30 HOMOLOGY MODELING

INTRODUCTION

STEP 1—TEMPLATE RECOGNITION AND INITIAL ALIGNMENT

STEP 2—ALIGNMENT CORRECTION

STEP 3—BACKBONE GENERATION

STEP 4—LOOP MODELING

STEP 5—SIDE CHAIN MODELING

STEP 6—MODEL OPTIMIZATION

STEP 7—MODEL VALIDATION

STEP 8—ITERATION

ACKNOWLEDGMENTS

REFERENCES

31 FOLD RECOGNITION METHODS

INTRODUCTION

THEORETICAL BACKGROUND FOR FOLD RECOGNITION

PROTEINS AS SEEN BY A BIOLOGIST

PROTEINS AS SEEN BY A PHYSICIST

SUMMARY

REFERENCES

32 DE NOVO PROTEIN STRUCTURE PREDICTION: METHODS AND APPLICATION

INTRODUCTION

REDUCED COMPLEXITY MODELS

SCORING FUNCTIONS FOR REDUCED COMPLEXITY MODELS

ROSETTA DE NOVO STRUCTURE PREDICTION

HIGH-RESOLUTION STRUCTURE PREDICTION

CASP: EVALUATION OF STRUCTURE PREDICTIONS

BIOLOGICAL APPLICATIONS OF STRUCTURE PREDICTION

FUTURE DIRECTIONS

REFERENCES

33 RNA STRUCTURAL BIOINFORMATICS

INTRODUCTION

METHODS FOR PREDICTING SECONDARY STRUCTURES

THREE-DIMENSIONAL MODELING METHODOLOGY

CONCLUSION

WEB RESOURCES

SUGGESTED READINGS

REFERENCES

Section VII THERAPEUTIC DISCOVERY

34 STRUCTURAL BIOINFORMATICS IN DRUG DISCOVERY

HISTORIC DEVELOPMENT OF DRUG DISCOVERY

MODERN DRUG DISCOVERY

GENERATING PROTEIN STRUCTURES

DRUG TARGETS

LEAD IDENTIFICATION

LEAD OPTIMIZATION

STRUCTURAL BIOINFORMATICS DATABASES

TOWARD PERSONALIZED MEDICINE

CONCLUSION AND FUTURE DIRECTIONS

ACKNOWLEDGMENTS

REFERENCES

FURTHER READING

35 B-CELL EPITOPE PREDICTION

INTRODUCTION

THE PROBLEM OF B-CELL EPITOPE PREDICTION

ANTIBODY STRUCTURE AND FUNCTION

EXPERIMENTAL METHODS USED FOR B-CELL EPITOPE IDENTIFICATION

HISTORY OF ATTEMPTS AT B-CELL EPITOPE PREDICTION

BIOINFORMATICS METHODS FOR B-CELL EPITOPE PREDICTION

APPLICATIONS

CONCLUSION

ABBREVIATIONS

ACKNOWLEDGMENT

REFERENCES

Section VIII FUTURE CHALLENGES

36 METHODS TO CLASSIFY AND PREDICT THE STRUCTURE OF MEMBRANE PROTEINS

THE BIOLOGICAL MEMBRANE

THE FOLDING PROCESS OF MEMBRANE PROTEINS

WHY IS IT DIFFICULT TO SOLVE MEMBRANE PROTEIN 3D STRUCTURES?

STRUCTURAL GENOMICS AND MEMBRANE PROTEINS

COMPUTATIONAL METHODS FOR THE IDENTIFICATION OF MEMBRANE PROTEINS AND THE PREDICTION OF THEIR STRUCTURES

WEB-AVAILABLE DATA RESOURCES FOR MEMBRANE PROTEINS

CLASSIFICATION OF MEMBRANE PROTEINS

CONCLUSIONS

REFERENCES

37 PROTEIN MOTION: SIMULATION

INTRODUCTION

PROTEIN MOTION TIMESCALES, SIZE, AND SIMULATION

COARSE-GRAINED METHODS

CLASSICAL MOLECULAR MECHANICS

QM–MM METHODS

CONCLUSION

ACKNOWLEDGMENTS

REFERENCES

38 THE SIGNIFICANCE AND IMPACTS OF PROTEIN DISORDER AND CONFORMATIONAL VARIANTS

INTRODUCTION

PROTEIN DISORDER: UNDERSTANDING THE REALM OF “INVISIBLE”

PROTEIN CONFORMATIONAL VARIANTS AND ENSEMBLES

FUTURE DIRECTIONS

REFERENCES

39 PROTEIN DESIGNABILITY AND ENGINEERING

INTRODUCTION

PROTEIN STRUCTURAL UNIVERSE

DETERMINANTS OF PROTEIN DOMAIN EVOLUTION

MECHANISMS OF PROTEIN DOMAIN EVOLUTION

PROTEIN ALPHABET

LESSONS FOR ENGINEERING STABLE PROTEINS

PROTEIN ENGINEERING: US VERSUS NATURE

CHALLENGES IN PROTEIN DESIGN

CONCLUSIONS

REFERENCES

40 STRUCTURAL GENOMICS OF PROTEIN SUPERFAMILIES

INTRODUCTION

NYSGXRC

BIOMEDICAL THEME TARGETS: BACKGROUND AND MOTIVATION

BIOMEDICAL THEME TARGETS: SELECTION AND PROGRESS

BIOMEDICAL THEME TARGETS: SELECTED EXAMPLES

PROTEIN TYROSINE PHOSPHATASES (PTPs)

INSULINOMA-ASSOCIATED PROTEIN 2 (IA-2)

SMALL C-TERMINAL DOMAIN PHOSPHATASE 3

CHRONOPHIN

OTHER PHOSPHATASES

BIOMEDICAL THEME TARGETS: POTENTIAL IMPACT ON DRUG DISCOVERY

FRAGMENT CONDENSATION LEAD DISCOVERY STRATEGY APPLIED TO PTP1B

VIRTUAL SCREENING STRATEGY APPLIED TO PP2Cα

BIOMEDICAL THEME TARGETS: CONCLUSION

COMMUNITY-NOMINATED TARGETS: MOTIVATION

COMMUNITY-NOMINATED TARGETS: SELECTION AND PROGRESS

COMMUNITY-NOMINATED TARGETS: FUNCTIONAL CHARACTERIZATION

COMMUNITY-NOMINATED TARGETS: CONCLUSION

OVERALL CONCLUSIONS

ACKNOWLEDGMENTS

REFERENCES

INDEX

Copyright © 2009 by John Wiley & Sons, Inc. All rights reserved

Published by John Wiley & Sons, Inc., Hoboken, New Jersey

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

ISBN 978-0-470-18105-8

FOREWORD

The quality of the coverage and the timeliness of the first edition of Structural Bioinformatics led to its wide usage. In turn, the collection has been adopted by the community as a defining articulation for the utility of bioinformatics and structural biology applied to address a range of functional studies in 21st Century Biology. I personally found the book to be read by students, postdoctoral fellows and more senior researchers around the world. The success reflected the excitement in the growing impact of both domains and also helped to accelerate that impact. As pointed out in the Introduction, bioinformatics is now a mainstream activity within the biological sciences; similarly, the implementation of the structural genomics initiative worldwide as a logical and necessary follow up to the Human Genome Project represents the consensus that structure is indeed important for understanding the mechanisms of molecular function. The increased impact and rapid progress during the 6 years, since the first edition was published, demonstrate the extraordinary significance of the interface between structural biology and informatics. Such significant advances indicate the clear need for an update, which has now been provided through the editorial efforts of Gu and Bourne and the writing team of leading researchers who bring the reader to the edge of the frontier.

The second edition, simultaneously a textbook and an expert monograph, contains a balanced set of contributions, which include updates and advances over the past 6 years and new innovative domains made possible by the sustained application of structural bioinformatics, which were undoubtedly catalyzed by the first edition. Just as was notable about the first edition, this new comprehensive collection fully captures the spirit of excitement at the “bleeding edge” of biodiscovery. The frontier between computing and biology itself reflects the decades of extraordinary progress and revolutionary advances in both domains. During the past 6 years, the data provided by the completion of the full human and numerous model genomes has accelerated the importance of computing for biology, and brought new funding opportunities and research training needs. At the same time, the deeper insight from complete genome sequencing has been the need to look beyond individual genes to systems and to look across multiple scales of biology, ultimately to establish an integrative view of biology based on experimentation and computation. With funding from within mainstream science programs and the increased recognition by the experimental community, the combined use of information technology and quantitative approaches is central to building an integrative view of biology. The superb collection of articles in this second edition speaks directly and powerfully for the role of structural bioinformatics in this effort for both the basic and the applied life sciences.

Additional academic training programs that include structural bioinformatics have been introduced consistently over the past 6 years. Yet, faculty in top-flight research institutions who do research and teach in first-class bioinformatics programs remain overwhelmed by the demands; indeed, the challenges of bioinformatics education have been a common theme at the major professional meetings in the field. Thus, textbooks of the highest quality and clarity are essential today, as we all struggle to determine the right curriculum and the right content to train what will be the first generation of students who are truly bioinformaticians. Today, every young aspiring biologist wants to learn about bioinformatics, as do those training in computer science and other quantitative sciences. Understanding the underlying assumptions and intricacies of any bioinformatics algorithm is necessary for proper usage and interpretation of the results obtained with the tools. To teach the large and growing numbers of young scientists who wish to utilize or even contribute to bioinformatics requires authoritative treatments that provide the basis to pull in a new generation of scientists. Such works must also use the best treatments possible to reach a much larger audience, including mature scientists who wish to retrain themselves, and need to set a standard for training everywhere in the world. This collection admirably meets those goals and is a must read for those entering the field and for all of us committed to understanding the interplay of structure and function. By assembling the best thinkers to address systematically all of the challenges at the next stage of the genome effort, Gu and Bourne have created a book that will serve to educate the next generation of young investigators, who will create the tools required to interpret the ever advancing frontier of biology.

At the same time, while the rapid pace of research progress in structural bioinformatics has driven the need for a second edition, the prehistory of modern structural bioinformatics, as is well described in this book, has been retained to remind the readers of some fundamental challenges that should not be neglected and forgotten. This update is not an extensive replicate with minor tweaks of the first edition; instead, the role of historical context and origins of structural bioinformatics as they contribute to present advances are discussed, such as the biennial Critical Assessment of Structure Prediction (CASP) and the initial “exponentiation” of progress in biology resulting from the application of advanced computing technology to structural biology. For more historical content, I refer any reader to the foreword to the first edition of Structural Bioinformatics or any of my status reports (over the past decade) on computational biology, readily accessible via the internet (or PubMed) in today’s world of “e-knowledge.”

What is most notable and important for the potential readership of this edition is that the fields of computational biology and bioinformatics have gone from uncertainty and neglect to the buzz words on everyone’s lips in less than 10 years. While newly created, contemporary bioinformatics and computational biology training programs are in the process of being incorporated into every biological science domain, and those working at the interface will have increased options even within core disciplines. The reason is obvious: we are already living in the future as biologists; we feel fully the impact of having completely sequenced genomes; we are a part of the transition to high throughput and high information content biological research, and to asking global or systemic questions as the norm, rather than using individual macromolecule-specific probes. Early in this century, the extraordinary, early, and even unanticipated successes of the genome project enabled computer search and modeling techniques to open up new vistas in biology.

The continuing availability of complete genomes coupled with high throughput experimental biological methods, structure determination, and annotated databases will definitely advance our current understanding of protein structures as they relate to biological function, processes, and evolution, a basic research curiosity that has captured center stage in the community. The challenge is to decode the rich information content implicit in genomes and apply the resulting knowledge in service to society. Obtaining a better understanding of biological processes will be achieved through the integration of a variety of methods including the use of structural information, and should be extended in the future to provide improved health care delivery. At the same time, we are only learning how best to exploit computer and information technology to understand biological mechanisms; Structural Bioinformatics provides a perspective on our progress and the educational content to enable the full potential of systematic computational analysis.

Those of us concerned with macromolecular structure, and protein science in particular, have long spoken the mantra: form follows function, a given function requires a specific structure, or, conversely, structure in turn can be seen to determine function. That is, if we know the structure, we can infer many aspects of biochemical and sometimes, even cellular function which can subsequently be experimentally tested. Indeed, given that we now “know” many gene sequences gained implicitly from sequenced genomes, such resources provide the basis for more refined algorithms that either leverage structural information or improve our understanding of protein structures to model them explicitly and more accurately. Subsequently, these improvements facilitate our ability to predict functionality and greatly reduce the search space for experimental efforts by providing a guided focus to test only the most likely functions. Furthermore, computational modeling of these molecules, for both static and dynamic processes, can provide a detailed description of biological processes at the atomic level, an alternative to traditional biological cartoons which have been the descriptive ways in which we biologists think.

The field of structural bioinformatics, to connect the abstract to the practical, continues to push boundaries beyond what was previously thought impossible. The basis for all structural bioinformatics, the central community database for structural biology, is the Protein Data Bank (PDB) and the macromolecule structures it contains. The growth of this resource has been accelerated by structural genomics initiatives, which continue to sustain and enhance the discovery of novel structures and folds paving the way for new opportunities of discovery and insight through structural bioinformatics. Some examples include a more accurate genome annotation and the basis for clarifying evolutionary related questions. The book addresses key points for cutting edge research such as these, beginning with definitions, and conveys the current scope of research and knowledge of protein structures.

Central to this wonderful collection of insightful articles are advances that have been made in building the infrastructure for an integrative approach to understanding biosystems through the power of understanding protein structures and the implications for the mechanics of function. A survey of current resources is provided to highlight the foundation where future developments are needed to integrate experimental data better and provide the basis for abstraction and generalization. Overall, the individual chapters outline the suite of major basic life science questions such as the status of efforts to predict protein structure and how proteins carry out cellular functions, and also the applied life science questions such as how structural bioinformatics can improve health care through accelerating drug discovery. Dictated by the process of uncovering the mechanisms through which macromolecules act, this journey of discovery into the regulation of life’s processes will keep biologists entertained for centuries to come. The second edition is a great guidebook, even more informative than the earlier collection, and represents the basis for this journey. I highly recommend it to all members of our community.

John C. Wooley
Associate Vice Chancellor, Research
University of California, San Diego
La Jolla, CA

PREFACE

Six years have elapsed since the first edition of this book was published. The field of structural bioinformatics has sustained a high level of excitement in that time, leading to innovative developments and considerable progress throughout the topics covered in the first edition and in the extension to many new domains. Through the efforts of the authors of this new edition, we have tried to capture these developments and to provide an accurate, detailed view of the current field. One way of picturing the advances or defining the “structural” change is relatively straightforward; namely, the number of experimental macromolecular structures has doubled since the first edition of this book was published. The Protein Structure Initiative has also led to an increase in the number of novel structures and folds. Overall, the continued growth in experimental structures has created an even richer data source for much of the work described herein. But numbers do not tell the whole story. The complexity of structures, the methods used, the ways structure is represented, our ability to model structures, our understanding of proteomes and their structural coverage, and so on, have also changed.

Describing the advances in “bioinformatics” per se is more difficult. Change in this case reflects both scientific advances and an increase in recognition within the biological sciences for the importance of computational methods. Due in part to the explosion in high-throughput experimental methods, bioinformatics is certainly more mainstream than it was 6 years ago and most experimental (i.e., non-computational) life scientists would acknowledge the role bioinformatics now plays in furthering our understanding of living systems. Some years from now, whether or not bioinformatics will exist as a separate entity, rather than as a core effort in every biological science department, is a subject for debate. What is important here is that there is an active effort to apply computational methods to a rapidly growing corpus of macromolecular structure data. Our primary goal is to provide a comprehensive description of what this field has accomplished to date and to make the reader aware of what we have gained and could gain in the future toward our understanding of living systems through the study of macromolecular structure and the continued, rigorous application of bioinformatics. As such, this edition should provide a fully current, useful reference to those already in the field, and a suitable text for those educating others. The first edition already encouraged new scholars to enter the field and we believe the case for engaging in structural bioinformatics is stronger than ever.

To meet this goal, the second edition includes not only updated chapters, but also new chapters covering mass spectrometry, genome annotation, immunology, protein dynamics and disorder, membrane proteins, protein design capabilities, and evolutionary biology as they relate to macromolecular structure.

Macromolecular structure is often underappreciated and bypassed in practice during the current era of high-throughput biology, especially since researchers can jump directly from genomic sequence to phenotype and conduct biochemical studies on large-scale protein–protein interactions that are often involved in pathways associated with disease states. While a great deal can be learned from such studies, ultimately the devil is in the details, which do not arise from traditional functional studies; the field of molecular biophysics, now termed structural biology, came into existence to obtain and use structures to provide those details. We believe structure will play an increasingly important role as genome studies seek to explore deeper into the mechanisms of life. As such, computational approaches that analyze structure are essential and we hope this book will be there to guide you.

We begin by describing the scope of this book and the history of the field (Chapter 1). The remainder of the introductory Section I is devoted to the understanding of the data itself, namely protein structure and DNA and RNA structure (Chapters 2 and 3, respectively). Understanding the nuances (scope, accuracy, completeness, etc.) of structural data is a prerequisite to any effective use of that data. Effective data use in turn requires an understanding of the experimental methods that produce the data. The most popular methods for deriving macromolecular structure data are, in order, X-ray crystallography (Chapter 4), NMR spectroscopy (Chapter 5), and electron microscopy (Chapter 6). Constructing structural models of molecules can also be guided with hydrogen–deuterium and cross-linking experiments coupled with mass spectrometry (Chapter 7). The raw data from these methods are most often a set of Cartesian coordinates representing the positions of the atoms in these structures, which are well suited for analysis by computer, but alternative representations of this information-rich content are sometimes needed to conduct wide-scale bioinformatics analysis (Chapter 8). That is, structural biology and structural bioinformatics are inherently visual sciences: the tabular output of atomic coordinates can be useful as input for computation, but not for human insight. The visualization of structure has evolved along with the science and many useful tools, mostly free, are available (Chapter 9).

In the early days of structural biology (up to the late 1970s), those in the field could name all the structures that had been solved, some of which had Nobel prizes attached to them. As the field grew this was no longer possible, and databases of structure data began to appear. Consistent use of structural data contained within these databases (and indeed the construction of the databases themselves) requires consistent data representation and Section II is devoted to this topic. Chapter 10 introduces the common data representations used by today’s software. The field is very fortunate to have scientists who recognize the importance of having a single source of primary data, the worldwide PDB (wwPDB—Chapter 11), from which a variety of secondary resources are derived. Examples of such resources are provided in Chapters 12 and 13.

As the number of structures has increased, much can be learnt from comparative analysis (Section III), where similarities and differences provide new insights. Chapters 14 and 15 describe structure validation, which is important in understanding the accuracy of the data you are dealing with before a 3D comparison and alignment of structures can be made (Chapter 16). When structure comparisons are made and similarities found, reductionism can be applied to make sense of the vast amount of data. Such reductionism leads to classification in various ways, such as by fold, domain, family, and superfamily (Chapters 17 and 18).

The more we know from comparing structures the more we can learn about structure and functional assignment (Section IV). Secondary structure assignment can now be made consistently and reliably for the majority of structures (Chapter 19). Proteins exist as one or more domains or compact structural and functional units. Hence, automated assignment of domains is important (Chapter 20). Through the structural genomics projects and the NIH Protein Structure Initiative, structure determination is moving from a functional to a genomic initiative. That is, structures were traditionally determined in an effort to elucidate further details about a known function, and these structural efforts were established based on very extensive prior biological, biochemical and often genetic research, and were done in parallel with continuing biological research on functional properties. In contrast, high-resolution structures with no elucidated or known function are being determined at an accelerated pace, thus making functional assignment critical (Chapter 21). The use of structural information to identify distantly related proteins also serves in annotating genomes (Chapter 22) and clarifying evolutionary relationships (Chapter 23).

Proteins do not act in isolation; that is, most proteins do not function by themselves but act as the result of complex protein–protein, protein–ligand and protein–solvent interactions and are often part of larger macromolecular assemblies. Section V describes these interactions, beginning with an introduction to electrostatic forces that have a fundamental impact on recognition between molecules (Chapter 24). The majority of these interactions are not captured in the experimental structure of a complex; instead, the structure is available as an apo form carrying a signature that can be teased out to predict the interaction. Understanding these signatures when found in protein–DNA and protein–RNA interactions (Chapter 25) and in protein–protein interactions (Chapter 26) aids, for example, in the identification of new transcription sites and reconstruction of protein signaling networks. After the sites of interactions are identified, docking of the molecules is simplified, which is important in drug design (Chapter 27).

While the number of structures is increasing rapidly, the number of protein sequences is increasing much more rapidly; thus, the idea of predicting protein structure from its sequence remains an “obsessive” goal (Section VI). Spurred by an unusual biennial competition referred to as CASP, the Critical Assessment of Protein Structure Prediction (Chapter 28), progress is being made in subcategories of structure prediction efforts within CASP and the field in general. Structure prediction categories include homology modeling (Chapter 30), fold recognition (Chapter 31) and ab initio structure prediction (Chapter 32). Other forms of prediction include secondary structure and membrane components for proteins (Chapter 29). Advances in understanding and predicting RNA structures have also been made and are discussed in Chapter 33.

Structural bioinformatics is playing an increasingly important role in the development of new pharmaceuticals (Section VII). The identification of drug targets, understanding the action of drug binding, and the design of promising leads all involve structural bioinformatics (Chapter 34). In addition to the development of small molecule and peptide-based drugs, contributions are also being made in identifying antigen recognition sites that aid in antibody-based therapeutics (Chapter 35).

Finally, Section VIII identifies challenges at the frontiers of structural bioinformatics. Membrane-associated proteins, whose structures are difficult to characterize in vitro and thus are underrepresented experimentally, are one example (Chapter 36). Proteins are not static under physiological conditions, yet their dynamics (Chapter 37) and the impact of disorder and conformational variants (Chapter 38), while important to protein function, are still poorly understood. As our understanding of protein structure improves, so do our design rules and capacity for engineering new proteins with functions that improve upon nature or provide the potential for novel processes (Chapter 39). The best way to push back these frontiers is with more structures and enhanced generalizations about their roles; structural genomics is doing just that (Chapter 40) and is thus a fitting place to end our tour of structural bioinformatics.

The words that follow are written by many of the leaders in the field and we thank them for their time and energy in sharing what motivates them to unravel the mysteries of nature, which are so beautifully displayed before us in an ever increasing number of macromolecular structures.

Jenny Gu

Philip E. Bourne

Color files of all figures from this book are available for download from the following web address: ftp://ftp.wiley.com/public/sci_tech_med/structural_bioinformatics

ACKNOWLEDGMENTS

“Science does not know its debt to imagination.”

Ralph Waldo Emerson

We are grateful to the University of Texas Medical Branch, the University of Münster, and the University of California at San Diego for institutional support in our research and educational endeavors, including this book. Likewise, we thank the Jeanne Kempner Foundation and the US funding agencies, the National Science Foundation, and the National Institutes of Health for their continual support in the advancement of science. This book could not have been completed without the help of all the contributors who have dedicated their expertise, time, and efforts to this extensive update and indispensable resource for the community; we are greatly indebted to them. Additionally, we would like to thank Wolfgang Bluhm, Kristine Briedis, Naomi Cotton, Lynn Fink, Apostol Gramada, Maria Kontoyianni, Kannan Natarajan, Julia Ponomarenko, Peter Rose, Wayne Townsend-Merino, Ruben Valas, Stella Veretnik, Lei Xie, Song Yang, and Zhanyang Zhu for their assistance and support in reviewing the materials for this book. The successful publication of this edition could not have been achieved without the help and patience of Andrea Baier, Tiffany Williams, and Thomas Moore at Wiley-Blackwell.

On a more personal note, JG would like to thank her family, friends, colleagues and mentors for their continued support in taking on new challenging endeavors. PEB would like to sincerely thank his family, Roma, Melanie, and Scott, for their continued understanding of the role science plays in his life.

CONTRIBUTORS

Paul D. Adams, Physical Biosciences Division, Lawrence Berkeley Laboratory, Berkeley, CA and Department of Bioengineering, University of California, Berkeley, CA

Steven C. Almo, Albert Einstein College of Medicine, Bronx, NY

Russ B. Altman, Departments of Bioengineering & Genetics, Stanford University, Stanford, CA

Claus A. Andersen, Siena Biotech Spa., Siena, Italy

Arash Bahrami, Graduate Program in Biophysics and National Magnetic Resonance Facility at Madison, Biochemistry Department, University of Wisconsin-Madison, Madison, WI

Nathan A. Baker, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology, Washington University, St. Louis, MO

Gail J. Bartlett, School of Chemistry, University of Bristol, Cantock’s Close, Clifton, Bristol, UK

Helen M. Berman, RCSB Protein Data Bank, Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ

Jeffrey B. Bonanno, Albert Einstein College of Medicine, Bronx, NY

Richard Bonneau, Department of Biology, New York University, New York, NY; and Department of Computer Science, Courant Institute, New York University, New York, NY
