A guide to applying the power of modern simulation tools to better drug design
Biomolecular Simulations in Structure-based Drug Discovery offers an up-to-date and comprehensive review of modern simulation tools and their applications in real-life drug discovery, for better and quicker results in structure-based drug design. The authors describe common tools used in the biomolecular simulation of drugs and their targets and offer an analysis of the accuracy of the predictions. They also show how to integrate modeling with other experimental data.
Filled with numerous case studies from different therapeutic fields, the book helps professionals to quickly adopt these new methods for their current projects. Experts from the pharmaceutical industry and academic institutions present real-life examples for important target classes such as GPCRs, ion channels and amyloids as well as for common challenges in structure-based drug discovery. Biomolecular Simulations in Structure-based Drug Discovery is an important resource that:
-Contains a review of the current generation of biomolecular simulation tools, which have the robustness and speed to be used as routine tools by non-specialists
-Includes information on novel methods and strategies for the modeling of drug-target interactions within the framework of real-life drug discovery and development
-Offers numerous illustrative case studies from a wide range of therapeutic fields
-Presents an application-oriented reference that is ideal for those working in the field
Written for medicinal chemists, professionals in the pharmaceutical industry, and pharmaceutical chemists, Biomolecular Simulations in Structure-based Drug Discovery is a comprehensive resource on modern simulation tools that complement, and have the potential to replace, laboratory assays for better results in drug design.
Page count: 754
Year of publication: 2019
Cover
Foreword
Part I Principles
1 Predictive Power of Biomolecular Simulations
1.1 Design of Biomolecular Simulations
1.2 Collective Variables and Trajectory Clustering
1.3 Accuracy of Biomolecular Simulations
1.4 Sampling
1.5 Binding Free Energy
1.6 Convergence of Free Energy Estimates
1.7 Future Outlook
References
2 Molecular Dynamics–Based Approaches Describing Protein Binding
2.1 Introduction
2.2 Protein–Protein Binding
2.3 Protein–Peptide Binding
2.4 Protein–Ligand Binding
2.5 Future Directions
2.6 Grand Challenges
References
Part II Advanced Algorithms
3 Modeling Ligand–Target Binding with Enhanced Sampling Simulations
3.1 Introduction
3.2 The Limits of Molecular Dynamics
3.3 Tempering Methods
3.4 Multiple Replica Methods
3.5 Endpoint Methods
3.6 Collective Variable‐Based Methods
3.7 Binding Kinetics
3.8 Conclusions
References
4 Markov State Models in Drug Design
4.1 Introduction
4.2 Markov State Models
4.3 Microstates
4.4 Long‐Lived Conformations
4.5 Transition Paths
4.6 Outlook
Acknowledgments
References
5 Monte Carlo Techniques for Drug Design: The Success Case of PELE
5.1 Introduction
5.2 The PELE Method
5.3 Examples of PELE's Applications
Acknowledgments
References
6 Understanding the Structure and Dynamics of Peptides and Proteins Through the Lens of Network Science
6.1 Insight into the Rise of Network Science
6.2 Networks of Protein Structures: Topological Features and Applications
6.3 Networks of Protein Dynamics: Merging Molecular Simulation Methods and Network Theory
6.4 Coarse‐Graining and Elastic Network Models: Understanding Protein Dynamics with Networks
6.5 Network Modularization to Understand Protein Structure and Function
6.6 Laboratory Contributions in the Field of Network Science
6.7 Conclusions and Perspectives
Acknowledgments
References
Part III Applications and Success Stories
7 From Computers to Bedside: Computational Chemistry Contributing to FDA Approval
7.1 Introduction
7.2 Rationalizing the Drug Discovery Process: Early Days
7.3 Use of Computer‐Aided Methods in the Drug Discovery Process
7.4 Future Outlook
References
8 Application of Biomolecular Simulations to G Protein–Coupled Receptors (GPCRs)
8.1 Introduction
8.2 MD Simulations for Studying the Conformational Plasticity of GPCRs
8.3 Application of MD Simulations to GPCR Drug Design: Why Should We Use MD?
8.4 Evolution of MD Timescales
8.5 Sharing MD Data via a Public Database
8.6 Conclusions and Perspectives
Acknowledgments
References
9 Molecular Dynamics Applications to GPCR Ligand Design
9.1 Introduction
9.2 The Role of Water in GPCR Structure‐Based Ligand Design
9.3 Ligand‐Binding Free Energy
9.4 Ligand‐Binding Kinetics
9.5 Conclusion
References
10 Ion Channel Simulations
10.1 Introduction
10.2 Overview of Computational Methods Applied to Study Ion Channels
10.3 Properties of Ion Channels Studied by Computational Modeling
10.4 Free Energy Methods Applied to Channels Bearing Hydrophobic Gates
10.5 Conclusion
Acknowledgments
References
11 Understanding Allostery to Design New Drugs
11.1 Introduction
11.2 Protein Allostery: Basic Concepts and Theoretical Framework
11.3 Exploiting Allostery in Drug Discovery and Design
11.4 Chaperones
11.5 Kinases
11.6 GPCRs
11.7 Conclusions
References
12 Structure and Stability of Amyloid Protofibrils of Polyglutamine and Polyasparagine from Molecular Dynamics Simulations
12.1 Introduction
12.2 Polyglutamine Protofibrils and Aggregates [49]
12.3 Amyloid Models of Asparagine (N) and Glutamine(Q)
12.4 Summary
Acknowledgments
References
13 Using Biomolecular Simulations to Target Cdc34 in Cancer
13.1 Background
13.2 Families of E2 Enzymes
13.3 Cdc34 Protein Sequence and Structure
13.4 Cdc34 Heterogeneous Conformational Ensemble in Solution
13.5 Long‐Range Communication in Family 3 Enzymes: A Structural Path from the Ub‐Binding Site to the E3 Recognition Site
13.6 Cdc34 Modulation by Phosphorylation: From Phenotype to Structure
13.7 The Dual Role of the Acidic Loop of Cdc34: Regulator of Activity and Interface for E3 Binding
13.8 Different Strategies to Target Cdc34 with Small Molecules
13.9 Conclusions and Perspectives
Acknowledgments
References
Index
End User License Agreement
Chapter 5
Table 5.1 Main MC software used in drug development modeling.
Chapter 6
Table 6.1 Formulae of centrality indices: degree Cdeg, betweenness Cspb, close...
Table 6.2 Software for analysis and visualization of networks.
Chapter 12
Table 12.1 Average crystal parameters for the ZD systems as shown in Figure 1...
Table 12.2 List of the initial structures and their characteristics.
Chapter 1
Figure 1.1 Gartner hype cycle of inventions.
Figure 1.2 Schematic illustration of designs of biomolecular simulatio...
Figure 1.3 Alternative representations of free energy relationships (s...
Figure 1.4 Improvement of force fields over time. Each force field was...
Figure 1.5 Schematic representation of funnel techniques and distance ...
Figure 1.6 Demonstration of block error analysis on sampling of a mode...
Chapter 3
Figure 3.1 Schematic representation of a replica‐exchange algorithm, w...
Figure 3.2 The effects of MetaD on a monodimensional toy model. In cla...
Figure 3.3 The importance of choosing the correct MetaD CVs on a bidim...
Chapter 4
Figure 4.1 Dominant eigenspace of the propagator. (a) Energy function;...
Figure 4.2 Construction of an MSM. (a) Example trajectory; (b) count ma...
Figure 4.3 Implied timescale test for the one‐dimensional diffusion pro...
Figure 4.4 Workflow of MSM analyses for drug design.
Figure 4.5 MSMs of Cyclosporin A (a) hierarchy of the free energy in wa...
Chapter 5
Figure 5.1 Process flow in a PELE simulation: Each simulation step buil...
Figure 5.2 Example of active site search and binding on the aspirin–pho...
Chapter 6
Figure 6.1 Network built from the Cα atoms of the µ‐opioid recep...
Figure 6.2 Illustration of the centrality degree (a), the betweenness c...
Figure 6.3 View of the 10 most central residues, as measured by the clo...
Figure 6.4 All‐atom representation involving 4728 atoms (a) and coarse‐...
Figure 6.5 Two‐bead, PRIMO/PRIMONA [174], MARTINI [167] models, and ext...
Figure 6.6 Various levels of coarse‐graining of the µOR structure, from...
Figure 6.7 Influence of the cutoff value Rc on the connectivity of el...
Figure 6.8 Views of the 6.0 Å cutoff (a), the VORO (b), and the VLDP (c...
Figure 6.9 Modularization of a network with 25 nodes represented by blu...
Figure 6.10 Modular structures of µOR as generated from AA MD data with...
Figure 6.11 Molecular structure of endothiapepsin ligands BW624 and CP‐...
Figure 6.12 Superimposition of the atomic (black sticks) and CPs (red s...
Figure 6.13 Design of the N/1 and N/5 CG ENMs of µOR. The parameterizat...
Figure 6.14 (a) Coarse‐grained structure of µOR at a resolution of one ...
Figure 6.15 Comparisons between the AA (red) and CG (black) MD simulati...
Figure 6.16 (a) Design of a multigrained (MG) model of µOR. The applica...
Figure 6.17 Example of an XG model. Two CG resolutions are combined. Th...
Chapter 7
Figure 7.1 Overlap of the lead compound with the C‐terminal carboxylic...
Figure 7.2 Overlap of tirofiban with the RGD region of peptide inhibito...
Figure 7.3 Zolmitriptan overlaid on part of the pharmacophore model and...
Figure 7.4 General structure of 6‐,7‐ or 8‐monosubstituted 1‐ethyl‐1,4‐...
Figure 7.5 (a) Co‐crystal structure of PHA‐665752 bound to the kinase d...
Figure 7.6 Rilpivirine in the NNRTI‐binding pocket (modeled structure)....
Figure 7.7 Model of BILN‐2061 (cyan) bound to full length NS3/4A (prote...
Figure 7.8 3D model of the enzyme and example of docking of CGP38560. P...
Figure 7.9 Homology model of a substituted 2‐anilinopyrimidine compound...
Figure 7.10 Crystal structure of MK‐927 (which combines a lipophilic (p...
Figure 7.11 FEP thermodynamic cycle.
Chapter 8
Figure 8.1 Number of publications per year indexed at Thomson Reuters'...
Figure 8.2 Representation of an atomistic MD simulation including a G...
Figure 8.3 Simplified energy landscape of a GPCR in the absence of liga...
Figure 8.4 Schema of the characteristic timescales for GPCR motions (ti...
Figure 8.5 Evolution of the timescales accessible to atomistic MD simul...
Chapter 9
Figure 9.1 Binding site druggability analysis using GRID [19] to defin...
Figure 9.2 Thermodynamic cycle for relative binding free energy calcula...
Figure 9.3 Ligand–receptor binding energy profile.
Figure 9.4 (a) Time lapse of a supervised molecular dynamics (SuMD) sim...
Figure 9.5 (a) At the top, the 2D structure of one of CRF1R NAM studied...
Chapter 10
Figure 10.1 (a) Cumulative ion translocation over 1 µs at different TM...
Figure 10.2 Five Na+‐binding sites, labeled S0–S4, identified in th...
Figure 10.3 KcsA channel: (a) Transmembrane helices shown as ribbons an...
Figure 10.4 Free energy profile (potential of mean force) for the singl...
Figure 10.5 Translocation pathway following knock‐on mechanism (upper...
Figure 10.6 Free energy as a function of the outermost K+ ion where...
Figure 10.7 Cartoon representation of hVDAC1 from the first NMR model (...
Figure 10.8 Conductance (G) of hVDAC1 calculated over last 60 ns in thr...
Figure 10.9 (a) Position‐dependent diffusion coefficient profile for a ...
Figure 10.10 Structure of the TmCorA protein. (a) Bird's eye view from ...
Figure 10.11 Free energy profiles for a hexahydrated magnesium ion pass...
Figure 10.12 Structure of the 5‐HT3R protein. (a) Crystal structure (PD...
Figure 10.13 Free energy profiles for different species crossing the po...
Figure 10.14 Water within the glycine receptor α1 ClyR. (a) From left t...
Figure 10.15 Free energy profiles for a permeating ion through the Orai...
Chapter 11
Figure 11.1 Thermodynamic cycle representing the relationship between ...
Chapter 12
Figure 12.1 Shown here are the different protofibril models. APS a...
Figure 12.2 Initial polyQ crystal structure on the (a) and (b) ...
Figure 12.3 Time evolution of the Q40 fibril models. strands are sh...
Figure 12.4 Time evolution of monomeric fibril models of various length...
Figure 12.5 Shown here are: (a) the eight classes of steric zippers, fo...
Figure 12.6 For each model labeled in the figure, sample initial co...
Figure 12.7 For each model labeled in the figure, sample initial co...
Chapter 13
Figure 13.1 A schematic view of the ubiquitination cascade. The differ...
Figure 13.2 3D architecture of the family 3 of E2 enzymes. The structur...
Figure 13.3 Cdc34 3D architecture. The structure of the human Cdc34 (PD...
Figure 13.4 Conformational changes of Cdc34 acidic loop. The conformati...
Figure 13.5 A schematic view of our computational workflow.
Sippl, W., Jung, M. (Eds.)
Epigenetic Drug Discovery
2018
ISBN: 978‐3‐527‐34314‐0
Vol. 74
Giordanetto, F. (Ed.)
Early Drug Development
2018
ISBN: 978‐3‐527‐34149‐8
Vol. 73
Handler, N., Buschmann, H. (Eds.)
Drug Selectivity
2017
ISBN: 978‐3‐527‐33538‐1
Vol. 72
Vaughan, T., Osbourn, J., Jalla, B. (Eds.)
Protein Therapeutics
2017
ISBN: 978‐3‐527‐34086‐6
Vol. 71
Ecker, G. F., Clausen, R. P., and Sitte, H. H. (Eds.)
Transporters as Drug Targets
2017
ISBN: 978‐3‐527‐33384‐4
Vol. 70
Martic‐Kehl, M. I., Schubiger, P.A. (Eds.)
Animal Models for Human Cancer
Discovery and Development of Novel Therapeutics
2017
ISBN: 978‐3‐527‐33997‐6
Vol. 69
Holenz, Jörg (Ed.)
Lead Generation
Methods and Strategies
2016
ISBN: 978‐3‐527‐33329‐5
Vol. 68
Erlanson, Daniel A. / Jahnke, Wolfgang (Eds.)
Fragment‐based Drug Discovery
Lessons and Outlook
2015
ISBN: 978‐3‐527‐33775‐0
Vol. 67
Urbán, László / Patel, Vinod F. / Vaz, Roy J. (Eds.)
Antitargets and Drug Safety
2015
ISBN: 978‐3‐527‐33511‐4
Vol. 66
Keserü, György M. / Swinney, David C. (Eds.)
Kinetics and Thermodynamics of Drug Binding
2015
ISBN: 978‐3‐527‐33582‐4
Vol. 65
Pfannkuch, Friedlieb / Suter‐Dick, Laura (Eds.)
Predictive Toxicology
From Vision to Reality
2014
ISBN: 978‐3‐527‐33608‐1
Vol. 64
Edited by
Francesco L. Gervasio and Vojtech Spiwok
Series Editors
Prof. Dr. Raimund Mannhold
Rosenweg 7
40489 Düsseldorf
Germany
Dr. Helmut Buschmann
Sperberweg 15
52076 Aachen
Germany
Dr. Jörg Holenz
GSK
R&D Neurosciences TAU
1250 S. Collegeville Road, PA
United States
Volume Editors
Francesco L. Gervasio
University College London
Chair of Biomolecular Modelling
20 Gordon Street
WC1H 0AJ London
United Kingdom
Vojtech Spiwok
Univ. of Chemistry and Technology
Dept. of Biochemistry and Microbiology
Technická 3
166 28 Prague 6
Czech Republic
All books published by Wiley‐VCH are carefully produced. Nevertheless, authors, editors, and publisher do not warrant the information contained in these books, including this book, to be free of errors. Readers are advised to keep in mind that statements, data, illustrations, procedural details or other items may inadvertently be inaccurate.
Library of Congress Card No.:
applied for
British Library Cataloguing‐in‐Publication Data A catalogue record for this book is available from the British Library.
Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at <http://dnb.d‐nb.de>.
© 2019 Wiley‐VCH Verlag GmbH & Co. KGaA, Boschstr. 12, 69469 Weinheim, Germany
All rights reserved (including those of translation into other languages). No part of this book may be reproduced in any form – by photoprinting, microfilm, or any other means – nor transmitted or translated into a machine language without written permission from the publishers. Registered names, trademarks, etc. used in this book, even when not specifically marked as such, are not to be considered unprotected by law.
Print ISBN: 978‐3‐527‐34265‐5
ePDF ISBN: 978‐3‐527‐80684‐3
ePub ISBN: 978‐3‐527‐80685‐0
oBook ISBN: 978‐3‐527‐80683‐6
Cover Design SCHULZ Grafik‐Design, Fußgönheim, Germany
Computational chemistry tools, from quantum chemistry techniques to molecular modeling, have greatly contributed to a number of fields, ranging from geophysics and materials chemistry to structural biology and drug design. Dangerous, expensive, and laborious experiments can often be replaced "in silico" by accurate calculations. In drug discovery, a number of techniques at various levels of accuracy and computational cost are in use. Methods at the more accurate end of the spectrum, such as fully atomistic molecular simulations, have been shown to reliably predict a number of properties of interest, such as the binding pose or the binding free energy. However, they are computationally expensive, which has so far hampered the systematic application of simulation-based methods in drug discovery, while inexpensive heuristic molecular modeling methods, such as protein–ligand docking, are used routinely.
However, things are changing rapidly, and the potential of atomistic biomolecular simulations in academic and industrial drug discovery is becoming increasingly clear. The question is whether we can expect an evolution or a revolution in this field. There are examples from other areas of the life sciences where a revolution took, or is taking, place. For example, sequencing of the human genome took a decade and was funded by the governments of several countries. Today, sequencing of eukaryotic genomes has become routine, and a million-genome project is under way owing to highly efficient and inexpensive parallel sequencing technology. Similarly, genetic manipulations are becoming significantly easier and more efficient owing to CRISPR/Cas technology. At the same time, the deep learning revolution is having a profound impact on many fields. The open question is whether we can expect such a revolution in biomolecular simulations, driven by new groundbreaking technology and convergence with machine learning techniques, or a stepwise evolution driven by the availability of new hardware and of grid and cloud resources, as well as by advances in force-field accuracy, enhanced sampling techniques, and other achievements.
The aim of this book is to report on the current state and promising future directions for biomolecular simulations in drug discovery. Although we personally believe that there is true potential for a simulation‐based revolution in drug discovery, we will let the readers draw their own conclusions.
In the first part of the book, called Principles, we give an overview of biomolecular simulation techniques with a focus on modeling protein–ligand interactions. When applying any molecular modeling method, we have to ask how accurate the method is in comparison with experiment. There are three major factors influencing the overall accuracy of biomolecular simulations. First, the method itself is approximate. Second, we use a simplified structure–energy relationship (such as a molecular mechanics force field), which is also approximate, especially for new classes of molecules. Third, the simulated system is an image of a single molecule or a few molecules observed for a short time, in contrast to an experiment, which typically provides observations averaged over a vast number of molecules and over a significantly longer time. In other words, sampling of states in the simulation may be incomplete compared to sampling in the experiment. These issues are discussed in Chapter 1. Chapter 2 focuses on the "sampling problem" in contexts relevant to drug discovery, namely, in the modeling of protein–protein, protein–peptide, and protein–ligand interactions.
The second part of the book is called Advanced Algorithms. It presents algorithms used to solve the problems introduced in the first part, especially the sampling problem. It is possible to artificially force the system to sample more states than in a conventional molecular simulation. The dynamics in such simulations is biased, but statistically meaningful long-timescale behavior and free energies can still be derived from them. These techniques, referred to as enhanced sampling techniques, are presented in Chapter 3. They include sampling enhancement obtained by raising the temperature (tempering methods), methods employing artificial potentials or forces acting on selected degrees of freedom, combined approaches, and other methods.
The traditional approach to evaluating protein–ligand interactions in drug discovery is based on thermodynamics, i.e. the measurement or prediction of Ki, IC50, binding ΔG, or similar parameters. However, it has recently become clear that the kinetics of protein–ligand binding and unbinding is highly important, often more so than the thermodynamics. Markov state models, presented in Chapter 4, provide an elegant way to describe both the thermodynamics and the kinetics of the studied process from various types of molecular simulations.
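The thermodynamic quantities just mentioned are connected by simple relations; for instance, a dissociation constant maps onto a standard binding free energy via ΔG° = RT ln KD (with KD taken relative to the 1 M standard state). A minimal Python sketch of this conversion (the function name and the 1 nM example are illustrative, not taken from any chapter):

```python
import math

def binding_free_energy(kd, temperature=298.15):
    """Standard binding free energy in kJ/mol from a dissociation
    constant kd given in mol/l (1 M standard state)."""
    R = 8.314462618e-3  # gas constant in kJ/(mol*K)
    return R * temperature * math.log(kd)

# A 1 nM inhibitor binds with roughly -51 kJ/mol at room temperature.
dg = binding_free_energy(1e-9)
```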
Other solutions to the sampling problem are based on a simplified representation of the studied system or of its dynamics. These approaches are covered in Chapters 5 and 6. Chapter 5 presents an alternative sampling approach based on a Monte Carlo method: PELE. The dynamics of the system is simplified to harmonic vibrations of the protein and translations and rotations of the ligand. At each step, this is used to propose a new state of the system, which is either accepted or rejected in the spirit of the Monte Carlo method. The algorithm is highly efficient in exploring ligand and target dynamics, as demonstrated by a number of ligand design applications. Chapter 6 presents an overview of network models. The structure of a protein can be represented as a network of interactions. This approach makes it possible to simplify (coarse-grain) the studied system, study it in terms of normal modes, and combine these coarse-grained models with fine-grained models.
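The accept-or-reject step at the heart of such Monte Carlo schemes is typically the Metropolis criterion. As a rough Python sketch (a generic illustration, not PELE's actual implementation):

```python
import math
import random

def metropolis_accept(delta_e, kT=2.479, rng=random):
    """Metropolis criterion: accept a proposed move with probability
    min(1, exp(-delta_e / kT)); delta_e and kT in the same units
    (kT = 2.479 kJ/mol corresponds to roughly 298 K)."""
    if delta_e <= 0.0:
        return True  # moves downhill in energy are always accepted
    return rng.random() < math.exp(-delta_e / kT)

# Repeated over many proposals, this rule samples states according to
# the Boltzmann distribution.
```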
The third part of the book is called Applications and Success Stories. Chapter 7 provides an overview of the applications of molecular modeling methods in drug discovery. It presents various molecular modeling methods, including quantitative structure–activity relationship (QSAR) and ligand‐based models, pharmacophore modeling, protein–ligand docking, biomolecular simulations, and quantum chemistry methods. Each technique is presented together with its practical impact in drug development and with examples of approved drugs.
Chapters 8 and 9 focus on the largest group of drug targets, G protein–coupled receptors (GPCRs), one from an academic and one from an industrial perspective. The issues covered by these chapters include the sampling problem, the role of the membrane and of water, free energy predictions, ligand-binding kinetics, and others. Simulation of GPCRs is challenging partly because of their membrane environment. Another important group of membrane-bound targets, ion channels, is covered in Chapter 10. Special topics related to ion channels, such as the modeling of ion selectivity and ion conductance, are described in this chapter.
Allostery is a very important topic in the study of protein–ligand interactions because many ligands bind to sites other than those expected and/or exert an effect on sites other than the one they bind. Allostery, its thermodynamics, ways of modeling it, and applications to various drug targets are described in Chapter 11.
The last two chapters focus on specific topics of current relevance in drug discovery. Chapter 12 shows how protein misfolding and aggregation can be addressed by biomolecular simulations, illustrated on polyglutamine and polyasparagine protofibrils, from simulations to thermodynamic models of aggregate formation. Chapter 13 targets the cell cycle and the role of ubiquitin-mediated proteolysis. Using the example of Cdc34, it illustrates how biomolecular simulations can be integrated with structural biology and other methods to elucidate the structure and dynamics of a drug target.
This book was realized thanks to the invitation from Prof. Gerd Folkers and thanks to support by him and other series editors. We gratefully acknowledge their support and patience. We also thank Dr. Frank Weinreich, Dr. Stefanie Volk, and Dr. Sujisha Karunakaran from Wiley‐VCH for their support and pleasant collaboration on this volume.
We believe that the book can add more dynamics to drug design and more drug design to biomolecular simulations.
Prague and London, July 2018
Francesco L. Gervasio
Vojtěch Spiwok
Vojtěch Spiwok
University of Chemistry and Technology, Prague, Department of Biochemistry and Microbiology, Technická 3, 166 28 Prague 6, Czech Republic
Biomolecular simulations are becoming routine in structure‐based drug design and related fields. This chapter briefly presents the history of molecular simulations, basic principles and approximations, and the most common designs of computational experiments. I also discuss statistical analysis of simulation results together with possible limits of accuracy.
The history of computational modeling of molecular structure and dynamics goes back to 1953, to the work of Rosenbluth and coworkers [1]. It introduced Markov chain Monte Carlo as a method to study a simplified model of a fluid. The atoms of the studied system were rigid disks and the system was two-dimensional (2D) rather than three-dimensional (3D), so the analogy with real molecular systems was not perfect. The first molecular dynamics simulation (i.e. modeling of motions) on the same type of system was carried out by Alder and Wainwright in 1957 [2] using perfectly elastic collisions between 2D particles. The first molecular simulation with specific atom types was done by Rahman in 1964 [3]. Rahman used a CDC 3600 computer to simulate the dynamics of 864 argon atoms modeled with a Lennard-Jones potential. The first simulation of liquid water was published by Rahman and Stillinger in 1971 [4].
Another big milestone was the first biomolecular simulation. McCammon, Gelin, and 2013 Nobel Prize winner Karplus simulated 9.2 ps of the life of the bovine pancreatic trypsin inhibitor (BPTI, also known as aprotinin) in vacuum [5]. The simulation was performed during the CECAM (Centre Européen de Calcul Atomique et Moléculaire) workshop "Models of Protein Dynamics" in Orsay, France, on CECAM computer facilities [6]. It was one of the first works showing proteins as dynamic species with fluid-like internal motions, even in the native state.
Biomolecular simulations have undergone huge progress in terms of accuracy, size of simulated systems, and simulated timescales since those pioneering days. However, the question arises whether this progress is enough for their practical application in drug discovery, protein engineering, and related applied fields. To address this issue, let me present the concept of the hype cycle [7] developed by Gartner Inc. and depicted in Figure 1.1. According to this concept, every new invention starts with a Technology Trigger. The visibility of the invention grows until it reaches the Peak of Inflated Expectations. At this point, failures of the invention start to dominate over its benefits, and the invention falls into the Trough of Disillusionment. From this phase, a new and slower progress starts along the Slope of Enlightenment toward the Plateau of Productivity. Biomolecular simulation has passed the Technology Trigger and the Peak of Inflated Expectations, as many expected that it would become a routine and inexpensive alternative to experimental testing of compounds for biological activity. Now, in my opinion, biomolecular simulations are located on the Slope of Enlightenment, with slow but steady progress toward the Plateau of Productivity.
Figure 1.1 Gartner hype cycle of inventions.
Biomolecular simulations can follow different designs. I use the term design to describe the setup of the simulation procedure chosen to answer the research hypothesis. There are three major designs of molecular simulation. The first design starts from a predicted structure of the molecular system we want to evaluate, for example, a protein–ligand complex predicted by simple protein–ligand docking. I refer to this as the evaluative design (Figure 1.2). The research hypothesis is: does the predicted structure represent the real structure? The basic assumption behind this design is that an accurately predicted structure of the system, for example, an accurately modeled structure of the complex, is lower in free energy than an inaccurately predicted one. The system therefore tends to be stable in a simulation starting from an accurately modeled structure and unstable in a simulation starting from an inaccurate one. The evaluative design can be represented by the study of Cavalli et al. [8]. This study was published in 2004, and the simulated times are therefore significantly shorter (typically 2.5 ns) than those available today. Nevertheless, simulations of the same length can be used today with much higher throughput in terms of the number of tested compounds or binding poses; the study is therefore still highly relevant. Docking of propidium into human acetylcholinesterase (an Alzheimer's disease target) by the program Dock resulted in the prediction of 36 possible binding poses (clusters of docked binding poses). Six of them were then subjected to a 2.5-ns simulation each. The evolution of these systems was analyzed in terms of root-mean-square deviation (RMSD). Binding poses with high stability in the simulations were similar to experimentally determined binding poses for a homologous enzyme.
Figure 1.2 Schematic illustration of designs of biomolecular simulations. Horizontal dimensions correspond to coordinates of the system, and contours correspond to the free energy.
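The RMSD measure used in such stability analyses is straightforward to compute. A minimal NumPy sketch, assuming the two structures have already been superimposed (no rotational fitting is performed here):

```python
import numpy as np

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two (N, 3) coordinate arrays
    of the same atoms, in the same units as the input coordinates."""
    diff = np.asarray(coords_a, dtype=float) - np.asarray(coords_b, dtype=float)
    # mean over atoms of the squared per-atom displacement, then sqrt
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

A pose whose RMSD from its starting coordinates stays low over the trajectory is considered stable in the evaluative design.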
The second design is referred to as the refinement design (Figure 1.2). It rests on an assumption similar to that of the evaluative design, i.e. that molecular simulations tend to evolve from high-free-energy states to low-free-energy states. In the refinement design, it is hoped that the dynamics can drive the system from the predicted structure, even an incorrectly predicted one, to the global free energy minimum, the correct structure, or at least close to it. Naturally, shorter simulation times are sufficient to demonstrate the correctness or incorrectness of a model in the evaluative design, whereas longer simulation times are necessary to drive the system from an incorrect to the correct state in the refinement design. In the previous paragraph, I used the study of Cavalli et al. from 2004 [8] as an example of the evaluative design. I can illustrate the refinement design with work published by the same author 11 years later [9]. They used unbiased simulation to predict the binding pose of the picomolar inhibitor 4′-deaza-1′-aza-2′-deoxy-1′-(9-methylene)-immucillin-H in human purine nucleoside phosphorylase. They carried out 14 simulations (500 ns each) of a system containing the trimeric enzyme, 9 ligand molecules (to increase the ligand concentration) placed outside the protein, solvent, and ions. Of these simulations, 11 evolved toward binding in good agreement with the experimentally determined structure of the complex. The RMSD from the experimentally determined structure dropped during these simulations from approximately 6 to 0.2–0.3 nm.
The last design introduced here is referred to as the equilibrium design (Figure 1.2). In this design, we hope that the simulation is sufficiently long (or the sampling sufficiently enhanced) to explore all relevant free energy minima and to sample them according to their distribution in the real system. Naturally, the equilibrium design requires the longest simulation times, or the strongest sampling enhancement, of the three simulation designs. As an example I can present the study by D.E. Shaw Research [10]. The authors simulated systems containing the FK506 binding protein (FKBP) with one of six fragment ligands, water, and ions. They carried out 10‐µs simulations for each ligand. The dissociation constant of a complex can be calculated from its binding kinetics as KD = koff/kon. Weak binding (high KD) together with reasonably fast binding kinetics therefore implies that unbinding is also sufficiently fast. For this reason, microsecond timescales were enough to observe multiple binding and unbinding events for millimolar ligands. The fragments identified by these simulations as relatively strong binders can be selected and combined into larger compounds with higher affinity in the manner of fragment‐based drug design [11]. Fragment‐based drug design and molecular dynamics simulation seem to be a good combination. Fragment‐based design requires testing only a small number of weak ligands, which is advantageous because biomolecular simulations are computationally expensive. Reciprocally, weak binding makes it possible to use molecular dynamics simulations on accessible timescales. Moreover, unlike some experimental methods of fragment‐based drug design, molecular simulations provide binding pose predictions that can be used to combine fragments.
The three designs described are not without pitfalls. Most of these pitfalls are caused by the limitations of simulated timescales. It is often difficult or impossible to simulate timescales long enough to destabilize the structure in the evaluative design, reach the global free energy minimum in the refinement design, or obtain the equilibrium distribution in the equilibrium design. This problem can be addressed by the enhanced sampling techniques discussed later in this chapter.
The main problem of the evaluative design is that many correct structures of proteins or protein–ligand complexes are relatively flexible. It is therefore difficult to decide whether high flexibility (in terms of RMSD or ligand displacement) indicates a wrong model or not.
This is not the only problem of biomolecular simulation designs. Figure 1.2 shows three minima A, B, and C. Even an incorrect model A may be separated by a large energy barrier from the structure B and from the correct structure C. This can make A stable in the timescales of an evaluative simulation. Similarly, when a refinement simulation evolves from structure A to structure B and stays there, it is not guaranteed that B is the correct structure. Finally, even if a perfect equilibrium sampling is reached between A and B, the unexplored structure C can still exist.
When the system is fully sampled and the equilibrium distribution of states is achieved in the equilibrium design, it is possible to calculate a free energy profile of the studied system. For this it is necessary to classify states along the trajectory. In the other designs, it is necessary to monitor the progress of the simulation. These analyses often employ the concept of collective variables (CVs). A CV is a parameter that can be calculated from the atomic coordinates of the studied system. It can be calculated in every simulation snapshot, so it can be viewed as a function of time (i.e. s(t)). It has to be chosen so that its value changes with the progress of the simulated process. Finally, CVs should be relevant to the experiment. There are simple CVs such as distances between atoms or geometrical centers, or 3‐point (valence) and 4‐point (torsion) angles. The RMSD from a reference structure, often used to monitor stability during a simulation, is also an example of a CV. Other, more sophisticated CVs include those specifically developed for studying intermolecular interactions [12] and protein folding [13], principal component analysis (PCA) and related methods [14, 15], machine‐learning‐based CVs [16–18], and others.
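The notion of a CV as a function s(t) computed from atomic coordinates can be sketched in a few lines. The following is a minimal illustration, not tied to any particular package, of a distance CV and an unfitted RMSD computed with NumPy from a toy trajectory array; in real analyses, tools such as MDAnalysis or MDTraj would handle topology and least-squares fitting.

```python
import numpy as np

def distance_cv(traj, i, j):
    """Distance between atoms i and j in every snapshot: the CV as s(t)."""
    return np.linalg.norm(traj[:, i, :] - traj[:, j, :], axis=1)

def rmsd_cv(traj, ref):
    """RMSD from a reference structure for every snapshot (no fitting, for brevity)."""
    return np.sqrt(((traj - ref) ** 2).sum(axis=(1, 2)) / traj.shape[1])

# toy trajectory: 3 snapshots, 2 atoms, 3D coordinates (nm)
traj = np.array([
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]],
])
d = distance_cv(traj, 0, 1)   # interatomic distance in each snapshot
r = rmsd_cv(traj, traj[0])    # drift away from the first snapshot
```

In a real study, the trajectory array would come from a trajectory reader, and the RMSD would be computed after removing translation and rotation.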
Once values of some CV (or CVs) are calculated for all snapshots along the trajectory, it is possible to calculate one‐dimensional (1D), 2D, or multidimensional histograms. These histograms can be expressed in energy units as an estimated free energy surface:

F(s) = −kT ln P(s)    (1.1)
where F is a (relative) free energy surface, s is a multidimensional vector of CVs, P is its probability distribution (histogram), k is the Boltzmann constant, and T is temperature. Calculation of an accurate free energy surface requires complete sampling of all relevant states of the simulated system. Its accuracy is addressed later.
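The conversion of a CV histogram into a relative free energy profile can be sketched as follows. The two-state CV samples are synthetic (a hypothetical bound-like and unbound-like state), and kT is taken in kJ/mol at 300 K; the point is only the histogram-to-energy transformation itself.

```python
import numpy as np

kB = 0.008314  # Boltzmann constant, kJ/(mol*K)
T = 300.0

# synthetic CV samples, e.g. a protein-ligand distance drawn from two states
rng = np.random.default_rng(1)
s = np.concatenate([rng.normal(0.5, 0.05, 8000),   # bound-like state, more populated
                    rng.normal(1.5, 0.05, 2000)])  # unbound-like state

P, edges = np.histogram(s, bins=60, density=True)  # probability distribution of the CV
centers = 0.5 * (edges[:-1] + edges[1:])
mask = P > 0                                       # avoid log(0) in empty bins
F = -kB * T * np.log(P[mask])
F -= F.min()  # relative free energy: the deepest minimum is set to zero
```

With a 4 : 1 population ratio, the free energy difference between the two minima is expected near kT ln 4, i.e. roughly 3.5 kJ/mol at 300 K.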
A discontinuous alternative to CVs is trajectory clustering. Cluster analysis of simulation coordinates (usually preprocessed by fitting to a reference structure to remove translational and rotational motions) makes it possible to assign each simulation snapshot to a certain cluster. Similar to CVs, it is possible to estimate the free energy surface as

Fi = −kT ln Pi    (1.2)
where Fi and Pi are the free energy and probability, respectively, of the ith cluster. Several clustering algorithms, both general and tailored for molecular simulations, have been tested in the analysis of molecular simulations. Several packages and tools have been developed for trajectory clustering, namely, the gmx cluster tool from the Gromacs package [19], Gromos tools [20], CPPTRAJ from the Amber package [21], and the stand‐alone packages Bio3D (for R) [22], MDAnalysis (for Python) [23], and MDTraj (for Python) [24]. Many of these tools make it possible to analyze trajectories in terms of both clusters and CVs. Popular algorithms for trajectory clustering are the nonhierarchical K‐means [25] and K‐medoids [26] algorithms and the Gromos algorithm by Daura and coworkers [27]. Hierarchical methods can be used for a tree‐based representation of free energy surfaces [28], but they are often used together with nonhierarchical methods to reduce the number of clusters.
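As an illustration, the cutoff-based algorithm of Daura and coworkers can be sketched in pure NumPy: repeatedly take the snapshot with the most neighbors within a cutoff as a cluster center, assign its neighbors to that cluster, remove them, and repeat. Here a 1D toy CV stands in for structures and absolute differences stand in for a pairwise RMSD matrix; a real application would use fitted RMSD between snapshots.

```python
import numpy as np

def gromos_cluster(dist, cutoff):
    """Daura et al. clustering on a precomputed pairwise distance matrix."""
    n = dist.shape[0]
    unassigned = np.ones(n, dtype=bool)
    clusters = []
    while unassigned.any():
        # neighbor relation restricted to still-unassigned snapshots
        neigh = (dist < cutoff) & unassigned[None, :] & unassigned[:, None]
        counts = neigh.sum(axis=1)
        center = counts.argmax()            # snapshot with the most neighbors
        members = np.where(neigh[center])[0]
        clusters.append((center, members))
        unassigned[members] = False
    return clusters

# toy "trajectory": 1D CV values forming two dense groups and one outlier
x = np.array([0.0, 0.1, 0.05, 2.0, 2.1, 5.0])
dist = np.abs(x[:, None] - x[None, :])      # stand-in for a pairwise RMSD matrix
clusters = gromos_cluster(dist, cutoff=0.5)
pops = [len(m) for _, m in clusters]        # cluster populations -> the Pi values
```

The resulting populations can be converted into relative cluster free energies exactly as in the histogram case.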
A key question in the application of nonhierarchical clustering methods, such as the K‐means or K‐medoids algorithm, is the choice of the value of K, the number of clusters. This question is general and not related only to the analysis of molecular dynamics trajectories. Interestingly, one solution to this problem, “clustering by fast search and find of density peaks,” was developed by molecular scientists, namely by Laio and Rodriguez, and became widely used in nonmolecular sciences [29]. This method automatically chooses a suitable number of clusters on the basis of the density of points.
The result of a CV‐based analysis of a molecular trajectory is a one‐, two‐, or multidimensional probability distribution or a free energy surface. The result of cluster analysis is a list of clusters with representative structures or centroids and with corresponding probabilities or free energies. Alternatively, it is possible to represent clusters in graph‐based or tree‐based representations. The graph‐based representation [30] shows free energy minima as graph nodes. Connection of two nodes by an edge usually indicates that a transition between these nodes is kinetically favorable. The tree‐based representation [28] shows free energy minima as nodes and transitions as branches. Finally, the Markov chain model is another elegant way to represent a free energy surface. This approach is presented in Chapter 4. Different representations of free energy relationships in molecular systems are depicted in Figure 1.3.
Figure 1.3 Alternative representations of free energy relationships (schematic views).
The predictive power of molecular simulations depends on their accuracy. Their accuracy is influenced by the accuracy of the simulation methods, by the molecular mechanics (MM) potentials (also referred to as force fields: mathematical models used to calculate potential energy and forces from atomic coordinates), and by the completeness of sampling of all relevant states of the studied system. The accuracy of simulation methods has been assured by the development of sophisticated thermostats, barostats, and electrostatics models in the past decades, and their application avoids most simulation artifacts. One of the few remaining important method‐related artifacts in biomolecular simulations is self‐interaction across the periodic boundary, which arises because many researchers tend to minimize the size of the simulated system to increase simulation speed.
The second ingredient in biomolecular simulations is the MM force field. Exciting quantum mechanical (QM) or mixed QM/MM simulations are not discussed here. Force fields have been the subject of intensive development focused on their accuracy. Evaluation of the accuracy of molecular simulations is not trivial. For example, force field accuracy can be tested simply by comparing energies calculated by the force field with those from an accurate reference method, for example, some quantum chemistry method. However, this evaluation approach is tricky. Individual bonded and nonbonded force field terms differ significantly in their magnitudes. For example, a small change in a bond angle can be associated with a large energy change. In contrast, formation of non‐covalent interactions is usually associated with much smaller energy changes. These terms can contribute differently to the overall accuracy of predictions made by molecular simulations. As a result, a force field that seems inaccurate by a comparison of energies may in fact be quite accurate in practical applications, and vice versa.
The progress in the accuracy of MM potentials can be illustrated by Figure 1.4 from the work of Lindorff‐Larsen et al. [31]. These authors systematically tested MM potentials for proteins developed from 1998 to 2011. The potentials were tested by very long simulations of folded proteins and of the protein folding process. Each potential was given a score from 0 to 6 depending on the agreement of the simulations with experimental data (0 for the best agreement). Figure 1.4 shows steady progress in accuracy, with no major accuracy issues in two force fields published in 2010 and 2011. This progress fits well into the picture of the hype cycle, with a slow but steady and systematic improvement in the field in the Slope of Enlightenment.
Figure 1.4 Improvement of force fields over time. Each force field was evaluated in three simulation tasks and awarded 0–2 points per task depending on the agreement with experimental data. Low scores indicate good agreement with experiments.
Source: Taken from Lindorff‐Larsen et al. [31], Creative Commons Attribution License.
One problematic feature of most MM force fields is the absence of polarizability. Conventional force fields model atoms as charged points. In reality, the charge distribution changes dynamically in response to the environment. To address this, a polarizable version of CHARMM [32] and the dedicated polarizable AMOEBA force field [33] were developed.
The main developers of protein force fields also develop compatible general force fields for ligands, either under the same title (such as OPLS3 [34]) or under an alternative name (the General Amber Force Field, or GAFF [35], for the Amber force field series, and the CHARMM General Force Field, or CGenFF [36], for the CHARMM force field series). Some force field developers also provide online tools for generating force field parameters for an uploaded compound in mol2 or pdb format, such as CGenFF web [36] and SwissParam [37] for CHARMM or LigParGen [38] for OPLS‐AA. A web‐based graphical user interface for CHARMM, known as CHARMM‐GUI [39], also provides this functionality, in addition to other features such as membrane setup for membrane protein simulations.
When comparing protein and general‐molecule force fields, the situation is not so bright for general molecules. General druglike molecules are much more diverse than the 20 amino acid residues. Therefore, at least the early force fields for general small molecules contained utterly erroneous terms, for example, wrong hybridization types. The evolution of general force fields corrected most of these errors; nevertheless, development of force fields applicable to all druglike molecules is challenging, and these force fields are still inaccurate for many classes of compounds.
Systematic evaluation of force fields by comparison of energies calculated by force fields and by quantum chemistry methods for optimized structures [40] revealed that the most problematic molecules are flexible multitorsion molecules or molecules with unusual conjugation of double bonds; however, the relationship between structure and force field inaccuracy is not clear.
Modeling of interactions between a protein and a ligand can also be affected by inaccuracies or incompleteness of the ligand force field. Widely discussed in this context is the halogen bond C–X⋯A, where X is a halogen (usually other than fluorine) and A is a conventional hydrogen bond acceptor, typically oxygen [41]. It has been shown that this type of bond is common in the recognition of druglike molecules [42]. The classical D–H⋯A hydrogen bond is modeled by most force fields as a combination of electrostatic attraction and van der Waals repulsion between H and A. Since halogens in organic molecules, as well as hydrogen bond acceptors, are partially negatively charged, the interaction between these two groups would appear to be repulsive. The origin of the halogen bond lies in an unusual electron distribution, referred to as the sigma hole, in halogens bound in organic molecules. This phenomenon is usually not modeled by conventional force fields. A new atom type for halogen bond donor atoms has been introduced into the ligand version of the optimized potentials for liquid simulations (OPLS) force field, and this force field was successfully applied in the computational prediction of binding free energies of HIV reverse transcriptase inhibitors [42].
It is possible to improve the accuracy of an individual modeled molecule instead of trying to improve the force field as a whole. Several approaches and tools have been developed for this purpose. For example, it is possible to improve CHARMM force fields using the Force Field Toolkit (ffTK) [43], which is a plugin for a popular visual molecular dynamics (VMD) viewer [44]. Another effort to improve accuracy of simulation of protein–ligand complexes is a repository of ligand parameters. At the website www.ligandbook.org it is possible to find parameters of approximately 3000 molecules in different force fields and for different program packages [45].
The necessity of femtosecond integration steps, together with the fact that each atom in a condensed biomolecular system interacts with approximately 5000 other atoms (assuming a 2‐nm interaction cutoff), makes biomolecular simulations extremely computationally expensive. The history of biomolecular simulations is tightly connected with the availability of computer power. The 1980s were characterized by the introduction of personal computers and a boom in academic supercomputers. The 1990s were characterized by parallelization, i.e. the joining of inexpensive computers into larger clusters. Other ideas, such as distributed computing projects using the computer power of volunteers' PCs [46], the use of GPUs [47], and special‐purpose computers [48], were introduced later. As a result of the progress in computer power, the first biomolecular simulations studied picosecond timescales, nanosecond simulations became available in the early 1990s, the first microsecond simulations were carried out in the late 1990s, and the millisecond milestone was reached around 2010. However, it must be kept in mind that these timescales were typically reached for small molecular systems on cutting‐edge hardware and, at the time of their publication, were far from routine.
Sampling of a biomolecular system can be compared to the situation in which a department store manager wants to evaluate the “affinity” of customers to different parts of the department store. It is possible to choose a certain customer and follow his or her route through the department store. It is then possible to calculate the probability for each department as the time spent in that department divided by the total time. It is also possible to use Eq. 1.1 to express this probability as a free energy (temperature is discussed later). However, this approach, equivalent to a classical molecular dynamics simulation, is inefficient because the customer may stay for a long time in one department, and it can take a very long time to sample all departments.
An alternative to running very long simulations in the molecular world is the application of enhanced sampling techniques. These techniques were designed to provide information equivalent to that from conventional (unenhanced) simulations several orders of magnitude longer. One group of enhanced sampling techniques uses a bias force or bias potential to accelerate the studied process. Other methods use elevated temperature or other principles. Several hybrid sampling enhancement methods combining multiple principles have also been developed.
Simulations using a bias potential or a bias force, further referred to as biased simulations, include the umbrella sampling method [49], metadynamics [50], steered molecular dynamics [51], local elevation [52], local elevation umbrella sampling [53], adaptively biased molecular dynamics [54], variationally enhanced sampling [55], flying Gaussian method [56], and others. These methods can be divided into two groups depending on whether the bias potential or force is static or dynamic.
The method known as umbrella sampling uses a static bias potential. In the department store analogy, it can be represented by organizing sales in some unattractive departments and raising prices in attractive ones. This makes sampling much more efficient. Provided that the effect of the sales and price increases can be quantified, it is possible to calculate the equilibrium probabilities (the probabilities under regular prices) from the sampling and from the price modifications.
Umbrella sampling, introduced by Torrie and Valleau in 1977 [49], originally in connection with the Monte Carlo method, represents the methods with a static bias potential (some scientists use the term umbrella sampling as a synonym for any simulation with a static bias potential). In the most common design, it is used to enhance sampling along certain CVs (e.g. the protein–ligand distance) to predict the corresponding free energy surface. Umbrella sampling is done by running a series of simulations, each with a bias potential k(s − Si)2/2, where k determines the strength of the bias potential, s is the CV, and Si (for the ith simulation) ranges between the initial state S0 and the final state SN of the simulated process (e.g. the bound and unbound states) and is usually uniformly distributed. This potential forces the ligand to sample all states along the binding pathway. The free energy surface can be calculated, for example, by the weighted histogram analysis method (WHAM) [57] or by the reweighting formula [58–60]. These methods are explained later; briefly, they make it possible to calculate the unbiased sampling from the knowledge of the biased sampling and the bias potential. An example of umbrella sampling in drug discovery is the study of Bennion et al. [61]. They simulated permeation of drug molecules through a membrane, using a coordinate perpendicular to the membrane as the CV. This CV ranged from 0 to 10 nm in 0.1‐nm windows (i.e. 100 simulations). They correctly ranked the tested compounds as impermeable or as having low, medium, or high permeability, in good quantitative agreement with parallel artificial membrane permeability assays (PAMPA).
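The window setup described above can be sketched as follows; the force constant is an illustrative value, and the window spacing mirrors the membrane-permeation setup quoted in the text (0 to 10 nm in 0.1-nm steps, i.e. 101 window centers, roughly the 100 windows quoted).

```python
import numpy as np

def umbrella_bias(s, s_i, k):
    """Harmonic umbrella bias k*(s - S_i)^2 / 2 for a window centered at S_i."""
    return 0.5 * k * (s - s_i) ** 2

# window centers along the CV (a coordinate perpendicular to the membrane, nm)
centers = np.arange(0.0, 10.0 + 1e-9, 0.1)

k = 1000.0  # force constant, kJ/(mol*nm^2) -- an illustrative value

# bias felt by a ligand at s = 5.0 nm in each window: zero in the window
# centered at 5.0 nm, growing quadratically for windows farther away
bias = umbrella_bias(5.0, centers, k)
```

Each window would then be simulated independently, and the biased histograms combined by WHAM or by reweighting to recover the unbiased free energy profile.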
Biased simulation with a time‐dependent bias potential can be represented by the metadynamics method [50]. In the department store example, it is possible to carry out metadynamics using a device that, at regular intervals, releases some stinky compound. Such a device must be installed onto a customer's shopping basket. If the customer stays for a long time in some department, the device causes the stinky compound to accumulate there. This forces the customer to escape the department and to visit other departments. This makes sampling much more efficient. The free energy surface can be estimated from the amount of the stinky compound, i.e. deep minima require a high amount of the stinky compound.
In the molecular world, application of metadynamics starts with the choice of CVs, typically two. The system is then simulated by conventional simulation for 1 or 2 ps. Then, the values of the CVs are calculated and recorded as S1. From this point, a bias potential in the form of a Gaussian hill centered at S1 is added to the simulated system. The system evolves for another 1 or 2 ps, then another hill is added at S2, and so forth. The bias potential accumulates in a free energy minimum until the minimum is flooded and the simulation can escape it. This allows for complete sampling of the free energy surface. The free energy surface can be estimated as the negative value of the bias potential [50, 62, 63], because the deeper the free energy minimum, the more hills are needed to flood it.
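A toy one-dimensional metadynamics run on a model double-well surface illustrates the hill-deposition mechanism. All choices here are illustrative, not production settings: the double-well potential stands in for a free energy surface along one CV, the dynamics is a simple overdamped Langevin scheme, and the hill height, width, and deposition stride are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
kT = 1.0                      # energy unit
w, sigma = 0.2, 0.15          # hill height and width
hc = []                       # centers of deposited hills

def dU(x):                    # gradient of the model double-well potential (x^2 - 1)^2
    return 4.0 * x * (x ** 2 - 1.0)

def bias(x):                  # sum of deposited Gaussian hills at CV value x
    if not hc:
        return 0.0
    c = np.asarray(hc)
    return float(np.sum(w * np.exp(-(x - c) ** 2 / (2 * sigma ** 2))))

def dbias(x):                 # gradient of the accumulated bias
    if not hc:
        return 0.0
    c = np.asarray(hc)
    return float(np.sum(w * (-(x - c) / sigma ** 2)
                        * np.exp(-(x - c) ** 2 / (2 * sigma ** 2))))

x, dt = -1.0, 5e-3            # start in the left well; overdamped Langevin step
for step in range(15000):
    x += (-dU(x) - dbias(x)) * dt + np.sqrt(2 * kT * dt) * rng.normal()
    if step % 100 == 0:       # deposit a hill every 100 steps
        hc.append(x)

# free energy estimate: F(s) ~ -V_bias(s), up to an additive constant
grid = np.linspace(-1.5, 1.5, 61)
F_est = -np.array([bias(g) for g in grid])
F_est -= F_est.min()
```

As the hills accumulate, both wells are visited and the bias approximately flattens the underlying surface, which is exactly why its negative serves as the free energy estimate.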
The accuracy of metadynamics (and other biased simulations) is critically dependent on the choice of CVs. Ideally, the CVs should account for all slow degrees of freedom in the simulated system. The existence of some slow degree of freedom not addressed by the CVs may cause a significant drop in accuracy. Imagine a simulation of a protein–ligand interaction. Naturally, one of the CVs can be the protein–ligand distance, used to accelerate binding and unbinding. The second CV should address other slow motions. Imagine that the entrance to the binding site is occasionally blocked by some amino acid side chain. If the site is blocked, the ligand cannot move into or out of the binding site. This leads to a huge overestimation or underestimation of the predicted binding free energy.
An ideal solution to this problem would be a second CV that fully addresses side chain motions. It is difficult to design such CVs because of the complexity of the molecular system; there could be multiple problematic side chains or other degrees of freedom. Instead, most researchers rely on sampling. Simulations on timescales of hundreds of nanoseconds or microseconds are usually not long enough to capture binding and unbinding events, but they are often sufficient to sample such problematic degrees of freedom once binding and unbinding are enhanced.
However, in classical metadynamics, this may cause the problem of hysteresis in the predicted free energy surface due to alternating overestimation of the bound and unbound states. This problem can be addressed by well‐tempered metadynamics [64], which is metadynamics with variable hill heights. The hill height set by the user is scaled by exp(−Vbias(s)/kΔT), where ΔT is the difference between the fictitious sampling temperature and the temperature of the simulation. Classical metadynamics corresponds to ΔT = ∞ and unbiased simulation to ΔT = 0. Flooding of the free energy surface in well‐tempered metadynamics slows down until convergence. The free energy can be calculated as the negative value of the bias potential scaled by (T + ΔT)/ΔT. The fact that the biasing slows down reduces the hysteresis and increases the accuracy. For this reason, well‐tempered metadynamics has replaced classical metadynamics in the past decade. Well‐tempered metadynamics, together with the funnel method (described later), was used to simulate binding and unbinding and to accurately predict binding free energies for ligands of GPCRs, including the cannabinoid CB1 [65], β2 adrenergic [66], chemokine CXCR3 [67], and vasopressin [68] receptors.
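The well-tempered scaling can be made concrete in a few lines. The numbers below (ΔT giving a bias factor of 10, an initial hill of 1.2 kJ/mol) are illustrative choices, not recommended settings.

```python
import math

kB = 0.008314          # Boltzmann constant, kJ/(mol*K)
T, dT = 300.0, 2700.0  # simulation temperature and Delta-T; bias factor (T + dT)/T = 10
w0 = 1.2               # user-set initial hill height, kJ/mol

def hill_height(v_bias):
    """Well-tempered scaling: hills shrink as bias accumulates at the current
    CV value. The limit dT -> infinity recovers classical metadynamics
    (no scaling); the limit dT -> 0 corresponds to an unbiased simulation."""
    return w0 * math.exp(-v_bias / (kB * dT))

def free_energy(v_bias):
    """Free energy from the converged bias, scaled by (T + dT)/dT."""
    return -(T + dT) / dT * v_bias
```

The first hill at a fresh CV value has the full height w0; once several kJ/mol of bias have accumulated there, subsequent hills become markedly smaller, which is what damps the hysteresis.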
In the previous paragraph I assumed that a single CV cannot address all slow degrees of freedom. However, it is possible to address many slow degrees of freedom by multiple CVs. It has been shown that metadynamics with more than two or three CVs is not efficient [69]. A special variant called bias exchange metadynamics [70] was developed to run metadynamics with multiple CVs. The system is simulated in multiple (N) replicas (usually one per CPU), where N is the number of CVs. Metadynamics biases a single CV in each replica (or some replicas can be unbiased). Occasionally (every few picoseconds), coordinates are exchanged on the basis of an exchange criterion calculated from the potential energies and bias potentials of the replicas. This makes it possible to predict a one‐dimensional free energy surface for each CV. Calculation of a multidimensional free energy surface requires a special reweighting procedure [71]. Bias exchange metadynamics has been applied in predicting the binding mode of the compound SSR128129E to the fibroblast growth factor receptor [72].
Sampling can also be enhanced by elevated temperature. In the department store example, an analogy to temperature is the music played in the store. It has been shown experimentally that the tempo of music in a supermarket influences the pace of shoppers [73]. It is therefore possible to enhance sampling by playing fast‐paced music. However, by doing so we would obtain a free energy surface different from that with the normal music played in the department store. For example, fast‐moving customers would prefer easy‐to‐find departments and shelves and would ignore difficult‐to‐find ones.
Similarly, in a high‐temperature molecular simulation, we would obtain a free energy surface different from that at the normal temperature. Such a free energy surface is usually not interesting. For example, the “native” structure of a protein at a temperature higher than its melting temperature is the unfolded structure. There is a method that makes it possible to use elevated temperature to enhance sampling and at the same time to obtain normal‐temperature free energy surfaces. This method is known as parallel tempering and belongs to the family of replica exchange methods. In the department store analogy, it would be necessary to distribute radios with headphones to multiple customers. The customers would listen to music differing in tempo. At periodic intervals, their music would be exchanged on the basis of special criteria. The normal‐tempo free energy surface would be obtained by analyzing the trajectories of only those customers who listen to the normal‐tempo music.
In a molecular system, parallel tempering is run by simulating multiple replicas of the system at different temperatures. These temperatures are chosen so that the lowest is slightly lower than the normal temperature and the highest is high enough to significantly enhance sampling. Replica exchange attempts are usually evaluated every 1 or 2 ps. The potential energy of the ith replica is compared with the potential energy of the (i + 1)th replica. If the potential energy of the colder replica is higher, the coordinates of the replicas are swapped. If not, the Metropolis criterion is calculated as exp((Ei − Ei+1)(1/kTi − 1/kTi+1)). If a random number (with a uniform distribution from 0 to 1) is lower than the Metropolis criterion, the coordinates of the replicas are also swapped. If the simulated system adopts an unfavorable (high‐energy) structure, it tends to be exchanged into higher‐temperature replicas and to climb the temperature ladder. There it can adopt a favorable low‐energy structure. Once this happens, it tends to descend the temperature ladder. Structures sampled at the temperature of interest can be analyzed by Eq. 1.1 to obtain the corresponding free energy surface.
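The exchange step for a pair of neighboring replicas can be sketched as follows, with the colder replica indexed i. This is a minimal illustration, not any package's API; the injectable rng argument exists only so the probabilistic branch can be tested deterministically.

```python
import math
import random

def attempt_swap(E_i, E_j, T_i, T_j, kB=0.008314, rng=random.random):
    """Metropolis criterion for exchanging the coordinates of two replicas
    at temperatures T_i < T_j with current potential energies E_i and E_j
    (kJ/mol and K; kB in kJ/(mol*K))."""
    delta = (E_i - E_j) * (1.0 / (kB * T_i) - 1.0 / (kB * T_j))
    if delta >= 0.0:       # the colder replica has the higher energy: always swap
        return True
    # otherwise accept with probability exp(delta) < 1
    return rng() < math.exp(delta)
```

In a full implementation this test would be applied to alternating neighbor pairs along the temperature ladder at every exchange interval.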
Parallel tempering is a very powerful method for simulating the folding of mini‐proteins. It is particularly suitable for small systems because large systems require a huge number of replicas to reach reasonable exchange rates (with a low number of exchanges, the method would behave as a series of independent unbiased simulations). I see the highest potential of parallel tempering in drug design in combination with other sampling enhancement methods. Parallel tempering in combination with metadynamics [74] has been applied to compare the wild‐type and oncogenic mutants of the epidermal growth factor receptor [75].
An interesting multiple‐replica method that enhances sampling by cloning and merging replicas is WExplore [76]. This method simulates the system in a constant number of replicas. When two or more replicas sample similar states, they are merged. If a single replica samples some distant state, it is cloned. The free energy surface can be obtained from the sampling and from the cloning and merging history. This method was successfully applied in modeling the interaction between 1‐(1‐propanoylpiperidin‐4‐yl)‐3‐[4‐(trifluoromethoxy)phenyl]urea (TPPU) and its enzyme target, epoxide hydrolase [77].
So far I have presented methods that can be used for general prediction of free energy relationships. Here I present issues specific to the modeling of protein–ligand interactions, together with molecular simulation methods especially suited for modeling molecular recognition. An important issue in such simulations is the fact that the entrance to the binding site usually represents only a small part of the overall protein surface. It may seem like a good idea to enhance a simulation of protein–ligand binding by biasing the distance between the binding site and the ligand. However, the ligand may then choose a wrong entrance to the binding site. This significantly slows down the simulation and makes the predicted free energy surface inaccurate. Two major approaches have been developed to address this problem. The first is the abovementioned application of a funnel (Figure 1.5
