This handbook and ready reference presents a combination of statistical, information-theoretic, and data analysis methods to meet the challenge of designing empirical models involving molecular descriptors within bioinformatics. The topics range from investigating information processing in chemical and biological networks to studying statistical and information-theoretic techniques for analyzing chemical structures to employing data analysis and machine learning techniques for QSAR/QSPR.
The high-profile international author and editor team ensures excellent coverage of the topic, making this a must-have for everyone working in chemoinformatics and structure-oriented drug design.
Number of pages: 775
Year of publication: 2012
Contents
Cover
Titles of the series “Quantitative and Network Biology”
Volume 1
Related Titles
Title Page
Copyright
Preface
List of Contributors
Chapter 1: Current Modeling Methods Used in QSAR/QSPR
1.1 Introduction
1.2 Modeling Methods
1.3 Software for QSAR Development
1.4 Conclusion
References
Chapter 2: Developing Best Practices for Descriptor-Based Property Prediction: Appropriate Matching of Datasets, Descriptors, Methods, and Expectations
2.1 Introduction
2.2 Leveraging Experimental Data and Understanding their Limitations
2.3 Descriptors: The Lexicon of QSARs
2.4 Machine Learning Methods: The Grammar of QSARs
2.5 Defining Modeling Strategies: Putting It All Together
2.6 Conclusions
References
Chapter 3: Mold2 Molecular Descriptors for QSAR
3.1 Background
3.2 Mold2 Molecular Descriptors
3.3 QSAR Using Mold2 Descriptors
3.4 Conclusion Remarks
References
Chapter 4: Multivariate Analysis of Molecular Descriptors
4.1 Introduction
4.2 2D Matrix-Based Descriptors
4.3 Graph-Theoretical Matrices
4.4 Multivariate Similarity Analysis of Chemical Spaces
4.5 Analysis of Chemical Information of Descriptors from Graph-Theoretical Matrices
4.6 Conclusions
References
Chapter 5: Partial-Order Ranking and Linear Modeling: Their Use in Predictive QSAR/QSPR Studies
5.1 Introduction
5.2 Linear QSAR Methodology, ERM, RM and GA
5.3 Principles of Ranking Methods
5.4 Selection of the Molecular Descriptors for Ranking
5.5 QSAR Based on Hasse Diagrams
5.6 Discussion
5.7 Conclusions
References
Chapter 6: Graph-Theoretical Descriptors for Branched Polymers
6.1 Introduction
6.2 Algebraic Graph Theory
6.3 Ideal Chain Models
6.4 Graph-Theoretical Approach to Chain Dynamics and Statistics
6.5 Applications
6.6 Final Remarks
References
Chapter 7: Structural-Similarity-Based Approaches for the Development of Clustering and QSPR/QSAR Models in Chemical Databases
7.1 Chemical Structural Similarity
7.2 Clustering Models Based on Structural Similarity
7.3 QSPR/QSAR Models Based on Structural Similarity
References
Chapter 8: Statistical Methods for Predicting Compound Recovery Rates for Ligand-Based Virtual Screening and Assessing the Probability of Activity
8.1 Introduction
8.2 Theory
8.3 Alternative Approaches to the Prediction of Compound Recall
8.4 Conclusions
References
Chapter 9: Molecular Descriptors and the Electronic Structure
9.1 Introduction
9.2 The Structure of Molecules
9.3 The Electronic Structure
9.4 Dividing Molecules in Atoms and Bonds
9.5 Structure and Dynamics
9.6 Structure and Properties
9.7 Modeling of Physicochemical Properties of the Isomers of Hexane
9.8 Modeling of the Proton Affinity
9.9 Molecular Surface Properties
9.10 Conclusions
References
Chapter 10: New Types of Descriptors and Models in QSAR/QSPR
10.1 Introduction
10.2 Local Properties
10.3 Descriptors Derived from Local Properties
10.4 MEP as Descriptor for Hydrogen-Bonding Strengths
10.5 ParaSurf (Politzer–Murray) Descriptors
10.6 4D: Conformational-Ensemble-based Descriptors
10.7 Proper Validation/Generation of QSA(P)R Models
10.8 Conclusions
Acknowledgments
References
Chapter 11: Consensus Models of Activity Landscapes
11.1 Introduction
11.2 Characterization of the Activity Landscape
11.3 Consensus Models of Activity Landscape
11.4 Conclusions and Future Perspectives
Acknowledgments
References
Chapter 12: Reverse Engineering Chemical Reaction Networks from Time Series Data
12.1 Introduction
12.2 Problem Definition
12.3 Reconstruction of Elementary Reaction Networks from Data by Network Search
12.4 Formulation of the Objective Function for Network Search
12.5 Differential Evolution for Searching the Space of Reaction Networks
12.6 Network Identification Case Studies
12.7 Conclusions
Acknowledgment
References
Chapter 13: Reduction of Dimensionality, Order, and Classification in Spaces of Theoretical Descriptions of Molecules: An Approach Based on Metrics, Pattern Recognition Techniques, and Graph Theoretic Considerations
13.1 Introduction
13.2 Theory
13.3 Methods and Computational Strategy
13.4 Results and Discussion
13.5 Conclusions
References
Chapter 14: The Analysis of Organic Reaction Pathways by Brownian Processing
14.1 Introduction
14.2 Electronic Messages, Information, and Energy
14.3 Molecular Messages, Conversions, and State Space Representations
14.4 Closing
Acknowledgments
References
Chapter 15: Generation of Chemical Transformations: Reaction Pathways Prediction and Synthesis Design
15.1 Introduction
15.2 The Graph Transformation Rules for Generation of Chemical Reactions
15.3 Combinatorial Complexity Problem: Strategies for the Directed Reaction Generation
15.4 Conclusion
References
Index
Titles of the series “Quantitative and Network Biology”
Volume 1
Dehmer, M., Emmert-Streib, F., Graber, A., Salvador, A. (eds.) Applied Statistics for Network Biology: Methods in Systems Biology. 2011. ISBN: 978-3-527-32750-8
Related Titles
Wang, B. Drug Design of Zinc-Enzyme Inhibitors: Functional, Structural, and Disease Applications. 2009. ISBN: 978-0-470-27500-9
Todeschini, R., Consonni, V. Molecular Descriptors for Chemoinformatics. Volume I: Alphabetical Listing / Volume II: Appendices, References. 2009. ISBN: 978-3-527-31852-0
Hinchliffe, A. Molecular Modelling for Beginners. 2009. ISBN: 978-0-470-51314-9
Schneider, G., Baringhaus, K.-H. Molecular Design: Concepts and Applications. 2008. ISBN: 978-3-527-31432-4
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty can be created or extended by sales representatives or written sales materials. The Advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Card No.: applied for
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.d-nb.de.
© 2012 Wiley-VCH Verlag & Co. KGaA, Boschstr. 12, 69469 Weinheim, Germany
Wiley-Blackwell is an imprint of John Wiley & Sons, formed by the merger of Wiley's global Scientific, Technical, and Medical business with Blackwell Publishing.
All rights reserved (including those of translation into other languages). No part of this book may be reproduced in any form – by photoprinting, microfilm, or any other means – nor transmitted or translated into a machine language without written permission from the publishers. Registered names, trademarks, etc. used in this book, even when not specifically marked as such, are not to be considered unprotected by law.
Print ISBN: 978-3-527-32434-7
ePDF ISBN: 978-3-527-64502-2
oBook ISBN: 978-3-527-64512-1
ePub ISBN: 978-3-527-64501-5
mobi ISBN: 978-3-527-64503-9
List of Contributors
Dimitris K. Agrafiotis Johnson & Johnson Pharmaceutical Research & Development, LLC Welsh & McKean Roads Spring House, PA 19477 USA
Jürgen Bajorath Rheinische Friedrich-Wilhelms-Universität B-IT (Bonn-Aachen International Center for Information Technology) Department of Life Science Informatics Dahlmannstr. 2 53113 Bonn Germany
Curt M. Breneman Rensselaer Polytechnic Institute Department of Chemistry and Chemical Biology 110 8th Street Troy, NY 12180 USA
Eduardo A. Castro Universidad de Buenos Aires Facultad de Farmacia y Bioquímica PRALIB (UBA-CONICET) Junín 956 C1113AAD Buenos Aires Argentina
Yiyu Cheng Zhejiang University College of Pharmaceutical Sciences 388 Yuhangtang Road Hangzhou, Zhejiang 310058 China
Timothy Clark Friedrich-Alexander-Universität Erlangen – Nürnberg Computer-Chemie-Centrum Nägelsbachstrasse 25 91052 Erlangen Germany
and
University of Portsmouth Center for Molecular Design Mercantile House Portsmouth PO1 2EG UK
Viviana Consonni University of Milano – Bicocca Milano Chemometrics & QSAR Research Group P.za della Scienza 1 20126 Milano Italy
Hong Fang ICF International at FDA's National Center for Toxicological Research 3900 NCTR Road Jefferson, AR 72079 USA
Grzegorz Fic Rzeszow University of Technology Faculty of Chemistry Department of Physical Chemistry and Computer Chemistry Al. Powstancow Warszawy 6 35-959 Rzeszow Poland
Gonzalo Cerruela García University of Córdoba Department of Computing and Numerical Analysis Campus de Rabanales Albert Einstein Building 14071 Córdoba Spain
Weigong Ge U.S. Food and Drug Administration National Center for Toxicological Research Center for Bioinformatics Division of Systems Biology 3900 NCTR Road Jefferson, AR 72079 USA
Miguel Ángel Gómez-Nieto University of Córdoba Department of Computing and Numerical Analysis Campus de Rabanales Albert Einstein Building 14071 Córdoba Spain
Daniel J. Graham Department of Chemistry Loyola University Chicago 6525 North Sheridan Road Chicago, IL 60631 USA
Huixiao Hong Center for Bioinformatics Division of Systems Biology National Center for Toxicological Research U.S. Food and Drug Administration 3900 NCTR Road, Building 5, Room 5C-109A Jefferson, AR 72079 USA
Bögel Horst Martin-Luther-University Department of Chemistry Kurt-Mothes-Str. 2 06122 Halle Germany
Tao-Wei Huang Rensselaer Polytechnic Institute Department of Chemistry and Chemical Biology 110 8th Street Troy, NY 12180 USA
Christian Kramer Novartis Pharma AG Novartis Institutes for BioMedical Research Forum 1 Novartis Campus 4056 Basel Switzerland
Michael Krein Rensselaer Polytechnic Institute Department of Chemistry and Chemical Biology 110 8th Street Troy, NY 12180 USA
Fabian López-Vallejo Computational Chemistry Torrey Pines Institute for Molecular Studies 11350 SW Village Parkway Port St. Lucie, FL 34987 USA
Irene Luque Ruiz University of Córdoba Department of Computing and Numerical Analysis Campus de Rabanales Albert Einstein Building 14071 Córdoba Spain
George Maroulis University of Patras Department of Chemistry 26500 Patras Greece
José L. Medina-Franco Computational Chemistry Torrey Pines Institute for Molecular Studies 11350 SW Village Parkway Port St. Lucie, FL 34987 USA
Andrew G. Mercader Instituto de Investigaciones Fisicoquímicas Teóricas y Aplicadas (INIFTA, UNLP, CCT La Plata-CONICET) Diag. 113 y 64, Sucursal 4, C.C. 16 1900 La Plata Argentina
and
Universidad de Buenos Aires Facultad de Farmacia y Bioquímica PRALIB (UBA-CONICET) Junín 956 C1113AAD Buenos Aires Argentina
Lisa Morkowchuk Rensselaer Polytechnic Institute Department of Chemistry and Chemical Biology 110 8th Street Troy, NY 12180 USA
Koh-Hei Nitta Kanazawa University Institute of Science and Engineering Division of Natural System Kanazawa 920-1192 Japan
Grażyna Nowak Rzeszow University of Technology Faculty of Chemistry Department of Physical Chemistry and Computer Chemistry Al. Powstancow Warszawy 6 35-959 Rzeszow Poland
Roger Perkins U.S. Food and Drug Administration National Center for Toxicological Research Center for Bioinformatics Division of Systems Biology 3900 NCTR Road Jefferson, AR 72079 USA
Feng Qian ICF International at FDA's National Center for Toxicological Research 3900 NCTR Road Jefferson, AR 72079 USA
Dominic P. Searson Newcastle University School of Chemical Engineering and Advanced Materials Newcastle upon Tyne NE1 7RU UK
Leming Shi U.S. Food and Drug Administration National Center for Toxicological Research Center for Bioinformatics Division of Systems Biology 3900 NCTR Road Jefferson, AR 72079 USA
Svetoslav Slavov U.S. Food and Drug Administration National Center for Toxicological Research Center for Bioinformatics Division of Systems Biology 3900 NCTR Road Jefferson, AR 72079 USA
Zhenqiang Su ICF International at FDA's National Center for Toxicological Research 3900 NCTR Road Jefferson, AR 72079 USA
Roberto Todeschini University of Milano – Bicocca Milano Chemometrics & QSAR Research Group P.za della Scienza 1 20126 Milano Italy
Weida Tong U.S. Food and Drug Administration National Center for Toxicological Research Center for Bioinformatics Division of Systems Biology 3900 NCTR Road Jefferson, AR 72079 USA
Martin Vogt Rheinische Friedrich-Wilhelms-Universität B-IT (Bonn-Aachen International Center for Information Technology) Department of Life Science Informatics Dahlmannstr. 2 53113 Bonn Germany
Yap Chun Wei National University of Singapore Faculty of Science Department of Pharmacy 18 Science Drive 4 Singapore 117543 Singapore
Mark J. Willis Newcastle University School of Chemical Engineering and Advanced Materials Newcastle upon Tyne NE1 7RU UK
Allen Wright Newcastle University School of Chemical Engineering and Advanced Materials Newcastle upon Tyne NE1 7RU UK
Liew Chin Yee National University of Singapore Faculty of Science Department of Pharmacy 18 Science Drive 4 Singapore 117543 Singapore
Austin B. Yongye Computational Chemistry Torrey Pines Institute for Molecular Studies 11350 SW Village Parkway Port St. Lucie, FL 34987 USA
Preface
Molecular descriptors have been applied extensively in, for example, bioinformatics, network biology, structure-oriented drug design, medicinal chemistry, chemometrics, chemical graph theory, and mathematical chemistry. Their positive impact on quantitative structure–activity relationship/quantitative structure–property relationship (QSAR/QSPR) studies has also been demonstrated, and important subgroups of descriptors such as topological indices have been explored. The book Statistical Modeling of Molecular Descriptors in QSAR/QSPR presents theoretical and practical results on the statistical analysis and modeling of molecular descriptors. QSAR and QSPR form an intriguing and important field in which to apply the results discussed in this book. In particular, the contributors emphasize statistical methods for modeling data generated with molecular descriptors. In this sense, the major goal of the book is to advocate and promote a combination of statistical, information-theoretic, and data analysis techniques to meet the challenge of designing empirical models by using molecular descriptors. Importantly, some of the contributions demonstrate that molecular descriptors can successfully predict physicochemical or even toxic properties of chemicals. Mathematical properties of molecular and topological descriptors are also investigated.
We would like to sketch briefly the idea behind the book cover. It was inspired by a short NASA report from April 1995 and tries to convey the complexity of QSAR/QSPR in a multivariate setting. The authors of this report, D.A. Noever, R.J. Cronise, and R.A. Relwani, exposed spiders to substances of different toxicity and claimed that the changes in the spider webs reflect the degree of toxicity. For caffeine, the molecule shown on the book cover, the spiders produced only unstructured webs instead of the rather symmetrical, radial webs shown in the background of the cover.
From a statistical point of view, it is regrettable that the report gives no estimates of reproducibility and that apparently no further literature on the subject exists, although the original report has been cited frequently. From a QSAR point of view, one may doubt that the toxic effect on spiders can easily be translated to explain toxic effects on other animals or even humans. Furthermore, the effect is not really surprising considering the well-known effects of drugs and ethanol on humans. When speculating, one may be tempted to look for relationships between the networks describing chemical structures and the networks of distorted spider webs.
A different approach is the crucial idea on which the book and its contributions are based: starting from a molecular structure, a set of descriptors is calculated, for example, information-theoretic indices based on Shannon's entropy, as indicated by the cover figure. A set of chemical structures can thereby be represented by a matrix in which each row corresponds to a structure and each column to a descriptor. Multivariate data analysis methods can then be applied to such data to generate empirical models that relate a property of the substances to the molecular descriptors derived from the chemical structures. Essential for such empirical models is a careful and cautious evaluation of their performance; otherwise one might quickly run into speculation and circular reasoning. In this context, we hope that the book helps to avoid these pitfalls and stimulates a deeper understanding of the problems mentioned.
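As a minimal sketch of this idea, the following Python snippet builds a small descriptor matrix from hydrogen-suppressed molecular graphs. The choice of descriptors (atom count, bond count, and the Shannon entropy of the vertex-degree partition), the adjacency-list representation, and all function names are assumptions made for illustration; they are not necessarily the indices shown on the cover or used by the contributors.

```python
import math
from collections import Counter


def degree_entropy(adjacency):
    """Shannon entropy (in bits) of the partition of atoms by vertex degree."""
    degrees = [len(neighbors) for neighbors in adjacency.values()]
    n = len(degrees)
    counts = Counter(degrees)
    # H = sum_i p_i * log2(1 / p_i), where p_i is the fraction of atoms with degree i
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def descriptor_row(adjacency):
    """One row of the descriptor matrix: [atoms, bonds, degree entropy]."""
    n_atoms = len(adjacency)
    n_bonds = sum(len(neighbors) for neighbors in adjacency.values()) // 2
    return [n_atoms, n_bonds, degree_entropy(adjacency)]


# Hydrogen-suppressed carbon skeletons as adjacency lists (illustrative input)
structures = {
    "n-butane": {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]},
    "isobutane": {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]},
    "cyclobutane": {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]},
}

# Each row corresponds to one structure, each column to one descriptor
descriptor_matrix = [descriptor_row(graph) for graph in structures.values()]

for name, row in zip(structures, descriptor_matrix):
    print(name, row)
```

The three isomeric skeletons share the same atom count but differ in the entropy column, which is the kind of structural information a multivariate model can then relate to a measured property.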
The topics tackled in this book range from modeling molecular descriptors and studying statistical and information-theoretic techniques to multivariate data analysis and machine learning techniques for QSAR and QSPR. The book is intended for researchers and for graduate and advanced undergraduate students in interdisciplinary fields such as biostatistics, bioinformatics, chemistry, chemometrics, mathematical chemistry, molecular medicine, medical informatics, network biology, and systems biology. Each chapter is presented comprehensively and is accessible not only to researchers in the field but also to advanced undergraduate and graduate students.
Many colleagues, whether consciously or unconsciously, have provided us with input, help, and support before and during the preparation of this book. In particular, we would like to thank Maria and Gheorghe Duca, Frank Emmert-Streib, Boris Furtula, Ivan Gutman, Armin Graber, Martin Grabner, D. D. Lozovanu, Alexei Levitchi, Alexander Mehler, Abbe Mowshowitz, Arcady Mushegian, Andrei Perjan, Ricardo de Matos Simoes, Fred Sobik, and Dongxiao Zhu, and we apologize to all whose names have been mistakenly omitted. Matthias Dehmer thanks his wife Jana. We would also like to thank our editors Andreas Sendtko and Gregor Cicchetti from Wiley-VCH, who have always been available and helpful, and we are grateful to Frank Emmert-Streib for fruitful discussions. Last but not least, Matthias Dehmer and Kurt Varmuza thank the Austrian Science Funds for supporting this work (project P22029-N13).
Finally, we hope this book helps to spread the enthusiasm and joy we have for this field and inspires readers in their own practical or theoretical research problems.
Hall/Tyrol, Vienna, and Richmond, January 2012
Matthias Dehmer
Kurt Varmuza
Danail Bonchev
Chapter 1
Current Modeling Methods Used in QSAR/QSPR
Liew Chin Yee and Yap Chun Wei
1.1 Introduction
A drug company has to ensure the quality, safety, and efficacy of a marketed drug by subjecting the drug to a variety of tests [1]. Therefore, drug development is a time-consuming and expensive process. From the initial stage of target discovery, development takes an average of 12 years [2] and was estimated to cost USD868 million per marketed drug [3]. This high cost and lengthy process are due to the high risk of drug development failure. It was estimated that only 11% of the drugs that completed the developmental stage were approved by the US or European regulators [4]. In 2000, it was found that 10% of attrition during drug development was attributable to poor pharmacokinetics and bioavailability, while in the clinical stage, 30% of attrition was due to lack of efficacy and another 30% was caused by toxicity or clinical safety issues [4]. Thus, it would be useful to predict these failures prior to the clinical stage in order to reduce drug development costs. It was claimed that savings of USD100 million in development costs per drug could be attained with a 10% improvement in prediction [5]. Therefore, various methods, such as in vitro, in vivo, or in silico methods, are being used early in drug development to filter out potential failures. An example of an in silico method is the quantitative structure–activity relationship (QSAR) model, which can be used to understand drug action, design new compounds, and screen chemical libraries [6–9]. Recently, the European chemicals legislation REACH (Registration, Evaluation and Authorisation of Chemicals) suggested the use of in silico methods as a reliable means of toxicological risk assessment [10, 11].
QSARs, or quantitative structure–property relationships (QSPRs), are mathematical models that attempt to relate the structure-derived features of a compound to its biological or physicochemical activity. Similarly, the terms quantitative structure–toxicity relationship (QSTR) and quantitative structure–pharmacokinetic relationship (QSPkR) are used when the modeling is applied to toxicological or pharmacokinetic systems. QSAR (and likewise QSPR, QSTR, and QSPkR) works on the assumption that structurally similar compounds have similar activities. Therefore, these methods have predictive and diagnostic abilities. They can be used to predict the biological activity (e.g., IC50) or class (e.g., inhibitors versus noninhibitors) of compounds before the actual biological testing. They can also be used to analyze the structural characteristics that give rise to the properties of interest.
As illustrated in Figure 1.1, developing a QSAR model starts with the collection of data for the property of interest while taking into consideration the quality of the data. It is necessary to exclude low-quality data as they will lower the quality of the model. Following that, the collected molecules are represented through features, namely molecular descriptors, which encode important information about the molecules. There are many types of molecular descriptors, but not all will be useful for a particular modeling task. Thus, uninformative or redundant molecular descriptors should be removed before the modeling process. Subsequently, for tuning and validation of the QSAR model, the full data set is divided into a training set and a testing set prior to learning.
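The descriptor-filtering and data-splitting steps can be sketched as follows. This is only an illustration under stated assumptions: "uninformative" is taken to mean (near-)constant, "redundant" is taken to mean a high absolute Pearson correlation with an already retained descriptor, the thresholds and the random split are arbitrary choices, and the small data matrix is made up.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_descriptors(X, min_variance=1e-8, max_abs_corr=0.95):
    """Return the column indices kept after removing (near-)constant
    descriptors and descriptors highly correlated with a kept one."""
    columns = list(zip(*X))
    kept = []
    for j, col in enumerate(columns):
        if statistics.pvariance(col) <= min_variance:
            continue  # uninformative: the same value for every compound
        if any(abs(pearson(col, columns[k])) > max_abs_corr for k in kept):
            continue  # redundant: duplicates information already retained
        kept.append(j)
    return kept


def train_test_split(n_samples, test_fraction=0.25, seed=0):
    """Randomly split sample indices into training and testing indices."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_samples * test_fraction))
    return sorted(indices[n_test:]), sorted(indices[:n_test])


# Made-up data: rows are compounds, columns are descriptors.
# Column 1 is constant; column 2 equals 2 * column 0 + 1.
X = [
    [1.0, 5.0, 3.0, 0.2],
    [2.0, 5.0, 5.0, 0.9],
    [3.0, 5.0, 7.0, 0.1],
    [4.0, 5.0, 9.0, 0.8],
    [5.0, 5.0, 11.0, 0.3],
    [6.0, 5.0, 13.0, 0.7],
]

kept = select_descriptors(X)
train_idx, test_idx = train_test_split(len(X))
X_reduced = [[row[j] for j in kept] for row in X]
print("kept descriptor columns:", kept)
print("training rows:", train_idx, "testing rows:", test_idx)
```

Splitting before any model tuning keeps the testing set untouched, so it can later serve as an independent check of the finalized model.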
Figure 1.1 General workflow of developing a QSAR model.
During the learning process, various modeling methods such as multiple linear regression, logistic regression, and machine learning methods are used to build models that describe the empirical relationship between the structure and the property of interest. The optimal model is obtained by searching for the optimal modeling parameters and feature subset simultaneously. The finalized model built with the optimal parameters then undergoes validation with a testing set to ensure that the model is appropriate and useful.
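The simultaneous search over modeling parameters and feature subsets, followed by validation on a testing set, might look like the sketch below. It uses k-nearest-neighbor regression as a stand-in modeling method, an exhaustive search scored by leave-one-out cross-validation on the training set, and synthetic data in which only the first descriptor carries signal. All of these are assumptions chosen to keep the example small and do not reflect the authors' own procedure.

```python
import itertools
import math


def knn_predict(train_X, train_y, query, k, features):
    """Predict by averaging the property values of the k nearest training compounds."""
    neighbors = sorted(
        (math.dist([row[f] for f in features], [query[f] for f in features]), y)
        for row, y in zip(train_X, train_y)
    )
    return sum(y for _, y in neighbors[:k]) / k


def loo_rmse(X, y, k, features):
    """Leave-one-out cross-validated RMSE on the training set."""
    squared_errors = []
    for i in range(len(X)):
        rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        prediction = knn_predict(rest_X, rest_y, X[i], k, features)
        squared_errors.append((prediction - y[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def search(X, y, k_values):
    """Search k and the feature subset together; ties go to fewer features."""
    n_features = len(X[0])
    candidates = [
        (k, features)
        for k in k_values
        for r in range(1, n_features + 1)
        for features in itertools.combinations(range(n_features), r)
    ]
    return min(candidates, key=lambda c: (loo_rmse(X, y, c[0], c[1]), len(c[1])))


# Synthetic data: the property depends on descriptor 0 only; descriptor 1 is noise.
train_X = [[1.0, 0.9], [2.0, 0.1], [3.0, 0.8], [4.0, 0.2],
           [5.0, 0.7], [6.0, 0.3], [7.0, 0.6], [8.0, 0.4]]
train_y = [2.0 * row[0] for row in train_X]
test_X = [[2.4, 0.5], [6.8, 0.5]]
test_y = [2.0 * row[0] for row in test_X]

best_k, best_features = search(train_X, train_y, k_values=(1, 2, 3))

# Validate the finalized model on the held-out testing set
test_errors = [
    (knn_predict(train_X, train_y, q, best_k, best_features) - t) ** 2
    for q, t in zip(test_X, test_y)
]
test_rmse = math.sqrt(sum(test_errors) / len(test_errors))
print("best k:", best_k, "best features:", best_features, "test RMSE:", test_rmse)
```

The testing set is used only once, after the search has finished, so the reported error is not biased by the parameter tuning.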
This chapter gives an introduction to the algorithms of the various modeling methods that have been commonly used in constructing QSAR models. We have used most of these methods in developing QSAR models for various pharmacodynamic, pharmacokinetic, and toxicological properties [12–16]. Even though our research has found that models developed using more complex modeling methods, such as the newer machine learning methods, frequently outperform those developed using traditional statistical methods, it is essential to have a good foundation in all these methods. This is because no method is useful for all QSAR problems, and the principle of parsimony states that we should use the simplest method that provides the desired performance level. This prevents overfitting of the data, which can lead to a loss in generalizability. Data collection, data processing, computation and selection of features, and model validation have been thoroughly reviewed elsewhere [17–22], so they are not described here. Software available for QSAR development is also discussed.
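The principle of parsimony can be expressed as a simple selection rule: among the candidate models that reach the desired performance level, choose the least complex one. The sketch below assumes that each candidate has already been assigned a complexity rank and a cross-validated error; the names, ranks, and error values are hypothetical and serve only to show the rule.

```python
def most_parsimonious(candidates, max_error):
    """Return the simplest candidate whose cross-validated error is acceptable,
    or None if no candidate reaches the desired performance level."""
    acceptable = [c for c in candidates if c["cv_error"] <= max_error]
    if not acceptable:
        return None
    # Prefer lower complexity; among equally complex models prefer lower error
    return min(acceptable, key=lambda c: (c["complexity"], c["cv_error"]))


# Hypothetical candidates (illustrative numbers, not results from the chapter)
candidates = [
    {"name": "simple model", "complexity": 1, "cv_error": 0.62},
    {"name": "intermediate model", "complexity": 2, "cv_error": 0.48},
    {"name": "complex model", "complexity": 3, "cv_error": 0.45},
]

choice = most_parsimonious(candidates, max_error=0.50)
print(choice["name"])
```

Here the most complex candidate has the lowest error, but the intermediate one already meets the target and is therefore selected.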