Computational Toxicology - E-Book

Description

A key resource for toxicologists across a broad spectrum of fields, this book offers a comprehensive analysis of molecular modelling approaches and strategies applied to risk assessment for pharmaceutical and environmental chemicals.

* Provides a perspective on what is currently achievable with computational toxicology and a view to future developments
* Helps readers overcome questions of data sources, curation, and treatment, and of how to model and interpret critical endpoints that support 21st-century hazard assessment
* Assembles cutting-edge concepts and leading authors into a unique and powerful single-source reference
* Includes in-depth looks at QSAR models, physicochemical drug properties, structure-based drug targeting, chemical mixture assessments, and environmental modeling
* Features coverage of consumer product safety assessment and chemical defense, along with chapters on open source toxicology and big data

Pages: 713

Year of publication: 2018

Table of Contents

Cover

Title Page

Copyright

Dedication

List of Contributors

Preface

Acknowledgments

Part I: Computational Methods

Chapter 1: Accessible Machine Learning Approaches for Toxicology

1.1 Introduction

1.2 Bayesian Models

1.3 Deep Learning Models

1.4 Comparison of Different Machine Learning Methods

1.5 Future Work

Acknowledgments

References

Chapter 2: Quantum Mechanics Approaches in Computational Toxicology

2.1 Translating Computational Chemistry to Predictive Toxicology

2.2 Levels of Theory in Quantum Mechanical Calculations

2.3 Representing Molecular Orbitals

2.4 Hybrid Quantum and Molecular Mechanical Calculations

2.5 Representing System Dynamics

2.6 Developing QM Descriptors

2.7 Rational Design of Safer Chemicals

References

Part II: Applying Computers to Toxicology Assessment: Pharmaceutical, Industrial and Clinical

Chapter 3: Computational Approaches for Predicting hERG Activity

3.1 Introduction

3.2 Computational Approaches

3.3 Ligand-Based Approaches

3.4 Structure-Based Approaches

3.5 Applications to Predict hERG Blockage

3.6 Other Computational Approaches Related to hERG Liability

3.7 Final Remarks

References

Chapter 4: Computational Toxicology for Traditional Chinese Medicine

4.1 Background, Current Status, and Challenges

4.2 Case Study: Large-Scale Prediction on Involvement of Organic Anion Transporter 1 in Traditional Chinese Medicine-Drug Interactions

4.3 Conclusion

Acknowledgment

References

Chapter 5: Pharmacophore Models for Toxicology Prediction

5.1 Introduction

5.2 Antitarget Screening

5.3 Prediction of Liver Toxicity

5.4 Prediction of Cardiovascular Toxicity

5.5 Prediction of Central Nervous System (CNS) Toxicity

5.6 Prediction of Endocrine Disruption

5.7 Prediction of ADME

5.8 General Remarks on the Limits and Future Perspectives for Employing Pharmacophore Models in Toxicological Studies

References

Chapter 6: Transporters in Hepatotoxicity

6.1 Introduction

6.2 Basolateral Transporters

6.3 Canalicular Transporters

6.4 Data Sources for Transporters in Hepatotoxicity

6.5 In Silico Transporters Models

6.6 Ligand-Based Approaches

6.7 OATP1B1 and OATP1B3

6.8 NTCP

6.9 OCT1

6.10 OCT2

6.11 MRP1, MRP3, and MRP4

6.12 BSEP

6.13 MRP2

6.14 MDR1/P-gp

6.15 MDR3

6.16 BCRP

6.17 MATE1

6.18 ASBT

6.19 Structure-Based Approaches

6.20 Complex Models Incorporating Transporter Information

6.21 In Vitro Models

6.22 Multiscale Models

6.23 Outlook

Acknowledgments

References

Chapter 7: Cheminformatics in a Clinical Setting

7.1 Introduction

7.2 Similarity Analysis Applied to Drug of Abuse/Toxicology Immunoassays

7.3 Similarity Analysis Applied to Therapeutic Drug Monitoring Immunoassays

7.4 Similarity Analysis Applied to Steroid Hormone Immunoassays

7.5 Cheminformatics Applied to “Designer Drugs”

7.6 Relevance to Antibody-Ligand Interactions

7.7 Conclusions and Future Directions

Acknowledgment

References

Part III: Applying Computers to Toxicology Assessment: Environmental and Regulatory Perspectives

Chapter 8: Computational Tools for ADMET Profiling

8.1 Introduction

8.2 Cheminformatics Approaches for ADMET Profiling

8.3 Unsolved Challenges in Structure Based Profiling

8.4 Perspectives

8.5 Conclusions

Acknowledgments

Disclaimer

References

Chapter 9: Computational Toxicology and Reach

9.1 A Theoretical and Historical Introduction to the Evolution Toward Predictive Models

9.2 Reach and the Other Legislations

9.3 Annex XI of Reach for QSAR Models

9.4 The ECHA Guidelines and the Use of QSAR Models within ECHA

9.5 Conclusions

References

Chapter 10: Computational Approaches to Predicting Dermal Absorption of Complex Topical Mixtures

10.1 Introduction

10.2 Principles of Dermal Absorption

10.3 Dermal Mixtures

10.4 Model Systems

10.5 Local Skin Versus Systemic Endpoints

10.6 QSAR Approaches to Model Dermal Absorption

10.7 Pharmacokinetic Models

10.8 Conclusions

References

Part IV: New Technologies for Toxicology, Future Perspectives

Chapter 11: Big Data in Computational Toxicology: Challenges and Opportunities

11.1 Big Data Scenario of Computational Toxicology

11.2 Fast-Growing Chemical Toxicity Data

11.3 The Use of Big Data Approaches in Modern Computational Toxicology

11.4 Challenges of Big Data Research in Computational Toxicology and Relevant Forecasts

References

Chapter 12: HLA-Mediated Adverse Drug Reactions: Challenges and Opportunities for Predictive Molecular Modeling

12.1 Introduction

12.2 Human Leukocyte Antigens

12.3 Structure-Based Molecular Docking to Study HLA-Mediated ADRs

12.4 Perspectives

References

Chapter 13: Open Science Data Repository for Toxicology

13.1 Introduction

13.2 Open Science Data Repository

13.3 Benefits of OSDR

13.4 Technical Details

13.5 Future Work

References

Chapter 14: Developing Next Generation Tools for Computational Toxicology

14.1 Introduction

14.2 Developing Apps for Chemistry

14.3 Green Chemistry

14.4 Polypharma and Assay Central

14.5 Conclusion

Acknowledgments

References

Index

End User License Agreement

List of Illustrations

Chapter 1: Accessible Machine Learning Approaches for Toxicology

Figure 1.1 Summary of machine learning models generated for Mycobacterium tuberculosis in vitro data. This approach has also been applied to ADME/Tox datasets.

Figure 1.2 Example of Bayesian models implemented in MMDS.

Figure 1.3 (a) A two-layer neural network (one hidden layer of four neurons (or units) and one output layer with two neurons), and three inputs. (b) A three-layer neural network with three inputs, two hidden layers of four neurons each and one output layer. In both cases, there are connections (synapses) between neurons across layers, but not within a layer. Source: Adapted from http://cs231n.github.io/neural-networks-1/.

Figure 1.4 Typical frequency of fingerprint occurrence across the 1024 bins for compounds in a dataset.

Chapter 2: Quantum Mechanics Approaches in Computational Toxicology

Figure 2.1 Total mean absolute errors (MAEs) recorded for a selection of popular hybrid density functionals (B3LYP–M06HF), double hybrid functionals (B2PLYP–DSD-BLYP), and two ab initio methods (HF and MP2), reflecting basic physicochemical properties, reaction energetics, and noncovalent interactions from the GMTKN30 database. Results were taken from a study by Goerigk and Grimme [3].

Figure 2.2 Mean absolute errors (MAEs) recorded for a selection of five semiempirical methods (AM1, PM6, and OM1-3) and two DFT methods (B3LYP and PBE) from the reduced (HCNO elements only) GMTKN24 database [4]. Performance across relevant subsets of the GMTKN24 database is provided in pattern fill next to the total MAEs.

Figure 2.3 Electrostatic potentials of two structurally different inhibitors of acetylcholinesterase, computed using UCSF Chimera v1.11.2. Bound structures were obtained using flexible docking in Autodock Vina.

Figure 2.4 (a) Fukui indices, f⁺, computed for four α,β-unsaturated carbonyls using Hirshfeld charges at the mPW1PW91/MIDIX+ level of theory. Maxima in the Fukui function are labeled with a black dot and the corresponding value; black circles mark the next highest values. Free energies of activation were calculated with the PM6 semiempirical method in the gas phase. Sensitization potential categories were derived from LLNA EC3% values [31]. (b) The LUMO of compound 4.

Figure 2.5 Two-electron oxidation of hydroquinone (HQ) and t-butyl hydroquinone (tBHQ) to quinones, calculated at the M06-HF/6-31+G(d) level of theory in the gas phase. The solid black line represents the energy difference between the HQ and tBHQ pathways, each recorded relative to the enthalpies of the fully reduced HQ and tBHQ, respectively. The gray line represents the difference between the HQ pathway and the energetically less favorable tBHQ pathway. Each species considered in the oxidation process is shown below the graph, with t-butyl substituents omitted for clarity. The species resulting from superoxide radical anion generation, the phenoxy radical, and the quinone are about 2.5 and 2.7 kcal/mol lower in energy in the (major) tBHQ pathway than in the HQ pathway.

Figure 2.6 Linear models for free energies of activation (a) and free energies of reaction (b) for nucleophilic substitutions of halides, epoxides, and tosylates; $\Delta G^{\ddagger}$ and $\Delta G_{\mathrm{rxn}}$ values were computed in aqueous solution at the M06-2X/6-311+G(d,p) level of theory:

$$\Delta G^{\ddagger} = 1706.38\,s_{\alpha} - 27.26\,EE - 243.69\,S - 1.76\,\mathrm{SASA}_{\alpha} + 35.72\,(S \times \mathrm{SASA}_{\alpha}) - 4.02$$

$N = 15$; $R^2 = 0.98$; = 0.97; RMS = 0.96.

$$\Delta G_{\mathrm{rxn}} = 801.01\,s_{\alpha} - 4.12\,\mu + 8.90\,\mathrm{SASA}_{\alpha} + 2.04\,(\mu \times \mathrm{SASA}_{\alpha}) - 70.04$$

$N = 15$; $R^2 = 0.95$; = 0.93; RMS = 0.35.

$s_{\alpha}$ = local softness on the α carbon; $EE$ = electrostatic solvation energy; $S$ = global softness; $\mathrm{SASA}_{\alpha}$ = solvent-accessible surface area on the α carbon; $\mu$ = chemical potential [41].

Figure 2.7 (a) Active site of MIF (1CA7) with bound HPP in the keto form from QM/MM/MC simulations. (b) Truncated MIF–HPP complex with about 680 water molecules; the ligand is marked in black.

Figure 2.8 (a) Computed 2D free energy map for the H2 proton transfer (see Figure 2.7). The white dashed line follows the minimum free energy path. (b) Snapshots of the transition state (TS) and the enolate intermediate illustrating relevant electrostatic interactions. The resolution based on a single FEP window is 0.025 Å.

Figure 2.9 Scatter plots of the octanol-water partition and distribution coefficients (log P and log D) versus HOMO-LUMO gap (Δϵ) calculated at the mPW1PW91/MIDIX+ level of theory. The 500+ compounds represented are colored by category of concern for acute aquatic toxicity (red = high concern; orange = medium concern; yellow = low concern; green = no concern) based on a 96-h toxicity assay of the fathead minnow [14]. The highlighted upper-left quadrant marks the “safer chemical space” (log P/D < 1.7; Δϵ > 6 eV), which should be targeted in designing new molecules.

Figure 2.10 Benzyloxazole molecule used in the FEP study by Cole et al. [54] and Bollini et al. [55]. The R group was iteratively modified to optimize binding affinity toward HIV-RT.

Chapter 3: Computational Approaches for Predicting hERG Activity

Figure 3.1 Structural representation of the hERG channel generated through homology modeling. This model was generated using the open conformation of the hERG channel (UniProt accession number: Q12809) and the KvAP crystal structure (PDB code: 1ORQ) of Mus musculus [110] as the template, following a protocol similar to that reported by Farid et al. [102]. (a) Tetrameric representation of the hERG channel. (b) Dimeric representation of the S5 and S6 segments. The residues usually involved in drug interaction are represented by sticks. Each black sphere represents a potassium ion. (See color plate section for the color representation of this figure.)

Figure 3.2 Outcome interpretation from the Pred-hERG web app: binary prediction, multiclass prediction, and predicted probability maps (PPM) extracted from binary models using Morgan fingerprints with 2048 bits. In the PPMs, green atoms or fragments contribute toward hERG blockage, pink atoms or fragments contribute to a decrease in hERG blockage, and gray indicates no contribution. Gray isolines delimit the boundary between the positive (green) and negative (pink) contributions. (See color plate section for the color representation of this figure.)

Figure 3.3 Comparison of structural alerts and the Pred-hERG QSAR models for prediction of hERG blockage. (a) Tertiary amines. (b) Aryl chloride. PP, predicted probability. (See color plate section for the color representation of this figure.)

Chapter 4: Computational Toxicology for Traditional Chinese Medicine

Figure 4.1 An OAT1 inhibitor pharmacophore model that consists of a negative ionizable feature (F1, red), one hydrophobe (F2, yellow), and a third feature that can be an aromatic center or a hydrophobic centroid (F3, yellow). In addition, six excluded volumes, shown as gray spheres, were present in this model. A potent OAT1 inhibitor, bumetanide (IC50 = 6 μM), is displayed with the model, and its atoms are colored by atom type (carbon, gray; nitrogen, blue; oxygen, red; sulfur, yellow).

Figure 4.2 The distribution of predicted TCM compounds with OAT1 inhibitory activity in medicinal TCMs. The black bars represent TCMs with three or more compounds mapped to the pharmacophore model and the names of these TCMs are listed in the figure.

Figure 4.3 TCM compounds mapping to the OAT1 inhibitor pharmacophore. The pharmacophore consists of a negative ionizable feature (red) and two hydrophobic features (yellow). For clarity, the excluded volumes are not shown here. (a) rhein; (b) aristolochic acid I; (c) salvianolic acid A; (d) lithospermic acid; (e) rosmarinic acid; (f) ferulic acid; (g) sinapinic acid; and (h) isoferulic acid.

Chapter 5: Pharmacophore Models for Toxicology Prediction

Figure 5.1 Visualization of exemplary pharmacophore models with selected chemical features in different modeling programs. Interactions of equilin cocrystallized with 17β-hydroxysteroid dehydrogenase 1 (PDB entry 1equ [5]) are shown. *, LigandScout; #, Discovery Studio; §, Molecular Operating Environment (MOE) [6]. The functionalities are abbreviated as H, hydrophobic; HBD, hydrogen bond donor; HBA, hydrogen bond acceptor; Xvol, exclusion volume.

Figure 5.2 PPARγ with bound tetrabromobisphenol A (PDB entry 3osw). The pharmacophore model derived from this endocrine disruptor–receptor complex can be refined and used for virtual screening for other PPARγ ligands from environmental chemicals. Protein–ligand interactions are color coded according to Figure 5.1.

Figure 5.3 Pharmacophore models for hERG blockers used in parallel. Each model covers a different fraction of active compounds but is restrictive enough not to retrieve a large number of inactive hits. Together, the hit lists cover the vast majority of active compounds and contain fewer false-positive hits than one very general model designed to cover most active compounds at once.

Figure 5.4 Protein–ligand interactions determined by X-ray crystallography of exemplary GPCRs. (a) β1-Adrenoceptor in complex with cyanopindolol (PDB entry 4bvn [51]); (b) dopamine D3 receptor in complex with eticlopride (PDB entry 3pbl [52]); (c) histamine H1 receptor in complex with doxepin (PDB entry 3rze [53]); (d) the protein–ligand interactions of all models superimposed onto each other; (e) all example models share two hydrophobic (yellow) features and a positively ionizable nitrogen (blue star). Figure inspired by Klabunde et al. [47].

Figure 5.5 11β-HSDs catalyze the interconversion of the active glucocorticoid cortisol and its inactive metabolite cortisone [6].

Figure 5.6 Interconversion of sex hormones and their metabolites catalyzed by 17β-HSDs [6].

Figure 5.7 Structure–activity relationship rationalization of benzophenone-class chemical UV filters inhibiting 17β-HSD3. The three hydrogen bond acceptors (red) and two aromatic rings (blue) are essential for bioactivity. Etherification of one of the hydroxyl groups inactivates the compound (arrows). *, Residual enzyme activity was measured at a compound concentration of 20 μM [6].

Chapter 6: Transporters in Hepatotoxicity

Figure 6.1 Transporters located in the hepatocyte. The medium gray symbols represent the canalicular transporters and the dark gray ones the basolateral transporters. Circles represent uptake transporters and ellipses represent efflux transporters. The arrows indicate the direction of transport.

Figure 6.2 The cycle of bilirubin in the liver. Bilirubin is taken up from sinusoidal blood by OATP1B1 and OATP1B3. It is metabolized by UGT1A1 into mono- and diglucuronidated products that are exported into bile primarily by MRP2 and to a smaller extent (smaller arrow) by BCRP. A portion of the glucuronidated or unglucuronidated bilirubin is effluxed into sinusoidal blood by MRP4, and the cycle is repeated.

Chapter 7: Cheminformatics in a Clinical Setting

Figure 7.1 Illustration of structural similarity. Using phencyclidine (PCP) as the target compound, 2D similarity to five different compounds was calculated using MDL public keys and the Tanimoto coefficient; three of these (dextromethorphan, chlorpromazine and tramadol) have been reported to cross-react with at least some marketed PCP immunoassays, and the other two (ketamine and ibuprofen) have not been reported to cross-react with PCP screening assays. PCP has the highest similarity (in descending order) to dextromethorphan, chlorpromazine, and tramadol. PCP has low structural similarity to ketamine (despite having similar pharmacological properties to PCP) and essentially no structural similarity to ibuprofen.

Figure 7.2 Similarity of drugs and drug metabolites relative to the target compounds for four broadly specific DOA/Tox assays. Cross-reactivity data for four DOA/Tox assays were sorted into six categories. The similarity (using MDL public keys and the Tanimoto coefficient) of each tested compound to the target compound of the DOA/Tox assay was plotted. (a) Amphetamine assays (using d-amphetamine as the target). (b) Barbiturate assays (using secobarbital as the target compound). (c) Benzodiazepine assays (using diazepam as the target compound). (d) TCA assays (using desipramine as the target compound).

Figure 7.3 Similarity of drugs and drug metabolites relative to the target compounds for four DOA/Tox assays. Cross-reactivity data for four DOA/Tox assays were sorted into six categories. The similarity (using MDL public keys and the Tanimoto coefficient) of each tested compound to the target compound of the DOA/Tox assay was plotted. (a) Cannabinoid assays (using 9-carboxy-11-nor-Δ⁹-tetrahydrocannabinol as the target compound). (b) Cocaine metabolite (benzoylecgonine) assays. (c) Opiate assays (using morphine as the target compound). (d) Phencyclidine assays.

Figure 7.4 Similarity of drugs and drug metabolites relative to the target compounds for four TDM assays. Cross-reactivity data for four TDM assays were sorted into three categories. The similarity (using MDL public keys and the Tanimoto coefficient) of each tested compound to the target compound of each assay was plotted. (a) Cyclosporine assays. (b) Lamotrigine assays. (c) Theophylline assays. (d) Valproic acid assays.

Figure 7.5 Cortisol immunoassay cross-reactivity and similarity predictions. (a) The plot shows the cortisol reference range for adults (highlighted in yellow) in comparison to the predicted apparent cortisol concentrations produced on the Roche Elecsys Cortisol assay by 6-methylprednisolone, prednisolone, 21-deoxycortisol (healthy controls and patients with 21-hydroxylase deficiency), and 11-deoxycortisol (healthy controls, patients with 11β-hydroxylase deficiency, and following metyrapone challenge). Table 1 contains the concentration ranges and percentage of cross-reactivity values from which the estimated apparent cortisol concentrations are derived. (b) Two-dimensional similarity of compounds to cortisol is shown, sorted by degree of cross-reactivity in the Roche Cortisol assay (horizontal line in each column indicates average similarity within that group). Similarity values vary from 0 to 1, with 1 being maximally similar. The compounds are subdivided into categories of strong cross-reactivity (5% or greater, black circles), weak cross-reactivity (0.5-4.9%, red squares), very weak cross-reactivity (0.05-0.49%, blue triangles), and no cross-reactivity (<0.05%, green diamonds) to the Roche Cortisol assay.

Figure 7.6 Testosterone immunoassay cross-reactivity and similarity predictions. (a) The plot shows the testosterone reference range for males and females (highlighted in yellow) in comparison to the predicted apparent testosterone concentrations produced on the Roche Elecsys Testosterone II assay by methyltestosterone, norethindrone, nandrolone, and androstenedione (healthy controls and patients with 21-hydroxylase deficiency). Table 5 contains the concentration ranges and percent cross-reactivity values from which the estimated apparent testosterone concentrations are derived. (b) Two-dimensional similarity of compounds to testosterone is shown, sorted by degree of cross-reactivity in the Roche Testosterone II assay (horizontal line in each column indicates average similarity within that group). Similarity values vary from 0 to 1, with 1 being maximally similar. The compounds are subdivided into categories of strong cross-reactivity (5% or greater, black circles), weak cross-reactivity (0.5-4.9%, red squares), very weak cross-reactivity (0.05-0.49%, blue triangles), and no cross-reactivity (<0.05%, green diamonds) to the Roche Testosterone II assay.

Figure 7.7 Representative chemical structures of amphetamine-like drugs. Abbreviations: MDMA, 3,4-methylenedioxy-N-methamphetamine; MDPV, 3,4-methylenedioxypyrovalerone; MDPBP, 3′,4′-methylenedioxy-α-pyrrolidinobutiophenone.

Figure 7.8 Representative chemical structures of cannabinoids.

Figure 7.9 Similarity comparisons of cannabinoids. Test compounds are divided into broad categories (JWH series, AM/UR/RCS/XLR series, other synthetic cannabinoids not possessing the chemical backbone of THC, cannabinoids sharing the chemical backbone of THC, endogenous eicosanoid cannabinoids, and non-cannabinoids). (a) 2D similarity of the N-pentanoic acid metabolite of JWH-018 to 168 other compounds. (b) 2D similarity of the N-butanoic acid metabolite of JWH-073 to 168 other compounds. (c) 2D similarity of the N-4-hydroxy metabolite of JWH-250 to 168 other compounds. (d) 2D similarity of 9-carboxy-THC to 168 other compounds.

Chapter 8: Computational Tools for ADMET Profiling

Figure 8.1 Predictive QSAR modeling workflow [38].

Figure 8.2 Chemical and biological similarities do not correlate. Each point represents a pair of compounds characterized by pairwise chemical similarity (y-axis) versus biological similarity (x-axis). Examples of a priori outliers that should be flagged and analyzed separately are shown.

Figure 8.3 Strategies for utilizing diverse data streams for predicting higher-order biological effects.

Figure 8.4 Two-step hierarchical k-nearest neighbor (kNN) QSAR workflow to develop an enhanced rat acute toxicity (LD50) model by using cytotoxicity data (IC50) as biological profile descriptors.

Figure 8.5 Main window of the CBRA program divided into three parts: (a) selection of input files, (b) colored radial plot, (c) molecular structure viewer, textual information, and options.

Figure 8.6 Graphical representation of a local neighborhood for Benoxacor [65].

Chapter 9: Computational Toxicology and Reach

Figure 9.1 The first page of the output of the CAESAR model for the target compound.

Figure 9.2 The page with the confidence interval.

Figure 9.3 The list of the six most similar compounds.

Figure 9.4 The applicability domain index and its components.

Figure 9.5 The chart with the MlogP and logBCF plots.

Figure 9.6 The panel to access information about the models.

Figure 9.7 Estimations provided by the four individual models plus the consensus for a chemical used as example (ethyl 2-bromobutanoate).

Figure 9.8 The six most similar compounds to the target present in the original dataset and their respective observed and predicted values.

Figure 9.9 The SA8 and the three most similar compounds of the training set with the same SA.

Figure 9.10 The SA SM93 and the three most similar compounds of the training set with the same SA.

Chapter 10: Computational Approaches to Predicting Dermal Absorption of Complex Topical Mixtures

Figure 10.1 Light micrograph of normal human skin. SC, stratum corneum; E, epidermis; D, dermis. Scale bar = 50 µm.

Figure 10.2 Relationship between anatomical regions in skin where chemical mixture modulation of absorption can occur, physicochemical processes involved, and type of modeling employed.

Figure 10.3 Relationship between a QSAR model without and with a mixture factor component included. Note that the overall slope of the QSAR is dependent upon the penetrant's chemical properties, while mixture component effects result in columns along the penetrant property unless mixture components based on the mixture properties are also included.

Figure 10.4 QSPR model fit of porcine skin diffusion cell data using diffusivity (D) and partition (K) coefficients estimated by a random process dermatokinetic model [59].

Figure 10.5 QSPR model fit of permeability coefficients obtained from a dermatokinetic model of the in situ perfused porcine skin flap [60].

Figure 10.6 Compartmental pharmacokinetic model linking skin absorption determined in an in vitro model to a systemic model to predict plasma concentration time profiles in vivo.

Figure 10.7 A comparison of model fits with (Model 2006) and without (Model 1005) incorporation of a random process to account for transient changes in diffusivity [59].

Chapter 11: Big Data in Computational Toxicology: Challenges and Opportunities

Figure 11.1 The “four V's” of big data can be used to describe the properties of these fast-growing chemical toxicity data.

Figure 11.2 Increase in the number of compounds and bioassays recorded in PubChem between September 2008 and September 2015.

Chapter 12: HLA-Mediated Adverse Drug Reactions: Challenges and Opportunities for Predictive Molecular Modeling

Figure 12.1 Structure of HLA-variants Class I and Class II.

Figure 12.2 HLA-drug binding mechanisms for T-cell activation, adapted from Illing et al. [48]: (a) altered repertoire (non-covalent); (b) p. i. complex (non-covalent); (c) hapten complex (covalent). The non-covalent T-cell interaction is not shown.

Figure 12.3 Alternative p. i. complex CD8+ T-cell signaling pathway for carbamazepine binding with HLA-B*15:02.

Figure 12.4 Schematic for using molecular docking to perform virtual screening at HLA-B*57:01 variant.

Figure 12.5 (a) Self-docking of abacavir alignment, (b) binding modes of abacavir, (c), (d) docking results.

Figure 12.6 Chemical structures of the 14 drugs used to construct a virtual screening molecular docking model.

Figure 12.7 Heat map of DS for the full set. Green spaces represent the most favorable docking scores (DS < −7 kcal/mol), while spaces that transition from orange to red represent drugs that have unfavorable interactions with HLA-B*57:01 (DS > −7 kcal/mol). White spaces indicate that GLIDE was unable to identify a best binding mode between the drug and HLA-B*57:01.

Figure 12.8 Heat map of eM scores for the full test set. Green spaces represent the most favorable docking scores (eM < −50 kcal/mol), while spaces that transition from yellow to red represent drugs that have unfavorable interactions with HLA-B*57:01 (eM > −50 kcal/mol). White spaces indicate that GLIDE was unable to identify a best binding mode between the drug and HLA-B*57:01.

Figure 12.9 Full docking summary combining SP and XP results. Green represents compounds that passed both DS (DS < −7 kcal/mol) and eM (eM < −50 kcal/mol) thresholds for SP and XP scoring functions; yellow represents compounds that passed the thresholds for XP but failed using SP; orange represents compounds that failed the XP thresholds but passed SP; and red represents the compounds that failed the thresholds for both XP and SP scoring functions.

Figure 12.10 Population distribution by percentage for three select HLA-variants: HLA-A*31:01, HLA-B*15:02, and HLA-B*57:01. The ethnicities studied were African-American (n = 251), Caucasian (n = 265), Hispanic (n = 234), North American Natives (n = 187), and Asians (n = 358) from the United States. Please refer to Cao et al. 2001 [73] for further details.

Chapter 13: Open Science Data Repository for Toxicology

Figure 13.1 (1) Examples of the OSDR prototype to date, showing that bidirectional integration with various cloud drives allows seamless data transfers between cloud storage and OSDR; (2) the web user interface also allows intuitive data deposition using drag and drop; (3) a concise filter system provides a convenient way of navigating information stored or indexed in OSDR; (4) hierarchical presentation of information allows one to arrange the data based on organization or research structure; (5) standard CMS (content management system) operations are supported; (6) various view modes allow complex information to be represented in a visual and concise manner; (7) a user interface based on modern web frameworks provides an excellent user experience.

Figure 13.2 Examples of the OSDR prototype to date showing OSDR tabular data entry. Mapping columns from a CSV file (1) to their semantic meaning (2) allows entries to be resolved in real time into a set of public database identifiers (3; ChemSpider, ChEMBL, PubChem), a chemical structure to be created from the provided information (4), and a conversion confidence value to be calculated based on a set of mappings (e.g., chemical name, InChI, SMILES).

Figure 13.4 Examples of the OSDR prototype to date. Built-in preview mode showing different file types (1, Word; 2, Excel; 3, PowerPoint; 4, PDF).

Figure 13.5 OSDR microservice overview.

Figure 13.3 Examples of the OSDR prototype to date. (a) Document browse mode with thumbnail previews. (b) Document view mode with a larger preview and other information arranged into infoboxes.

Figure 13.6 Logical architecture of OSDR with cheminformatics and machine learning modules.

Figure 13.7 OSDR microservice-oriented architecture.

Figure 13.8 OSDR development workflow.

Chapter 14: Developing Next Generation Tools for Computational Toxicology

Figure 14.1 Screenshots of the Mobile Molecular DataSheet.

Figure 14.2 The Green Solvents app. (a) Molecule overview. (b) Molecule details listing scores (good = 1, bad = 10) for safety, health, flammability, environment, waste, reactivity, and lifecycle criteria. (See color plate section for the color representation of this figure.)

Figure 14.3 Examples of green reactions from the Green Lab Notebook app.

Figure 14.4 Example of preliminary work. (a) Highlighting molecules using Bayesian models for various ADME/Tox properties. (b) Clustering molecules using fingerprint descriptors. (See color plate section for the color representation of this figure.)

Figure 14.5 Visualization of data cut-offs, ROC plots, and active and inactive molecules for hERG Ki data from ChEMBL.

Figure 14.6 Preliminary work using open datasets and computed models for (a) EPA Tox21 data used to make predictions that are visualized in the PolyPharma mobile app. (b) Novel visualization and prediction methods in PolyPharma showing atom highlighting for each model and clustering. http://itunes.apple.com/app/polypharma/id1025327772. (See color plate section for the color representation of this figure.)

Figure 14.7 Assay Central schematic.

List of Tables

Chapter 1: Accessible Machine Learning Approaches for Toxicology

Table 1.1 Comparison of machine learning methods using FCFP6 1024-bit descriptors on ADME/Tox properties using fivefold cross-validation ROC values

Table 1.2 Comparison of machine learning methods using FCFP6 1024-bit descriptors on ADME/Tox properties using fivefold cross-validation F1 values at p = 0.5

Chapter 2: Quantum Mechanics Approaches in Computational Toxicology

Table 2.1 Determining selected physicochemical properties for a clinical prodrug (CAS 623152-11-4) from an MC simulation versus a simple geometry optimization using the AM1 semiempirical method

Table 2.2 Examples of global electronic parameters calculated from frontier molecular orbitals

Table 2.3 Global electronic parameters for 3-penten-2-one, propargyl acrylate, allyl acrylate, and methyl acrylate derived from HOMO and LUMO energies at the mPW1PW91/MIDIX+ level of theory

Table 2.4 Local electronic parameters derived from density functional theory

Table 2.5 Hydrogen bonding (HB) as an inverse metric of human skin permeability

Table 2.6 Reaction barriers correlated to GSH-binding rate constants for methylacrolein, 3-penten-2-one, allyl acrylate, and ethyl crotonate

Table 2.7 Reaction energetics as a predictor of mutagenicity potentials for 2,2-difluorooxirane, 2,2-dichlorooxirane, 2,3-dichlorooxirane, and 2,2,3-trichlorooxirane

Table 2.8 Reaction energetics computed for three benzyl halide skin sensitizers at the M06-2X/6-311+G(d,p) level in gas phase and in aqueous solution

Chapter 3: Computational Approaches for Predicting hERG Activity

Table 3.1 QSAR studies for predicting hERG blockage, published during the period 2014-2016

Chapter 4: Computational Toxicology for Traditional Chinese Medicine

Table 4.1 Representative molecules used for OAT1 inhibitor pharmacophore model generation and validation

Table 4.2 Example TCM compounds with experimental information about interactions with OAT1.

Table 4.3 Structurally similar TCM compounds without experimental validation

Chapter 5: Pharmacophore Models for Toxicology Prediction

Table 5.1 Experimentally validated pharmacophore models for 17β-HSD inhibitors

Chapter 6: Transporters in Hepatotoxicity

Table 6.1 Summary of the best-performing models for transporters

Chapter 9: Computational Toxicology and REACH

Table 9.1 The main differences between old models and models for regulatory purposes

Chapter 10: Computational Approaches to Predicting Dermal Absorption of Complex Topical Mixtures

Table 10.1 Experimental variables that should be controlled or documented when conducting dermal absorption studies

Chapter 11: Big Data in Computational Toxicology: Challenges and Opportunities

Table 11.1 Publicly available toxicity data resources (as of 10/22/2016)

Table 11.2 Twenty human toxicants with their relevant PubChem bioassay responses

Chapter 12: HLA-Mediated Adverse Drug Reactions: Challenges and Opportunities for Predictive Molecular Modeling

Table 12.1 List of drug-HLA associations with their reported odds ratios

Chapter 14: Developing Next Generation Tools for Computational Toxicology

Table 14.1 Mobile apps for chemistry developed by Molecular Materials Informatics, Inc.

Wiley Series on Technologies for the Pharmaceutical Industry Sean Ekins, Series Editor

 

Computational Toxicology: Risk Assessment for Pharmaceutical and Environmental Chemicals

Edited by Sean Ekins

 

Pharmaceutical Applications of Raman Spectroscopy

Edited by Slobodan Šašić

 

Pathway Analysis for Drug Discovery: Computational Infrastructure and Applications

Edited by Anton Yuryev

 

Drug Efficacy, Safety, and Biologics Discovery: Emerging Technologies and Tools

Edited by Sean Ekins and Jinghai J. Xu

 

The Engines of Hippocrates: From the Dawn of Medicine to Medical and Pharmaceutical Informatics

Barry Robson and O.K. Baek

 

Pharmaceutical Data Mining: Applications for Drug Discovery

Edited by Konstantin V. Balakin

 

The Agile Approach to Adaptive Research: Optimizing Efficiency in Clinical Development

Michael J. Rosenberg

 

Pharmaceutical and Biomedical Project Management in a Changing Global Environment

Edited by Scott D. Babler

 

Systems Biology in Drug Discovery and Development

Edited by Daniel L. Young and Seth Michelson

 

Collaborative Computational Technologies for Biomedical Research

Edited by Sean Ekins, Maggie A.Z. Hupcey and Antony J. Williams

 

Predictive Approaches in Drug Discovery and Development: Biomarkers and In Vitro/In Vivo Correlations

Edited by J. Andrew Williams, Richard Lalonde, Jeffrey Koup and David D. Christ

 

Collaborative Innovation in Drug Discovery: Strategies for Public and Private Partnerships

Edited by Rathnam Chaguturu

 

Computational Toxicology: Risk Assessment for Chemicals

Edited by Sean Ekins

Computational Toxicology

Risk Assessment for Chemicals

 

Edited by

Sean Ekins

Collaborations Pharmaceuticals, Inc., Raleigh, USA

 

 

 

 

 

This edition first published 2018

© 2018 John Wiley & Sons, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Sean Ekins to be identified as the Editor in this work has been asserted in accordance with law.

Registered Office

John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office

111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty

In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging-in-Publication Data:

Names: Ekins, Sean, editor.

Title: Computational toxicology : risk assessment for chemicals / edited by Sean Ekins.

Description: First edition. | Hoboken, NJ : John Wiley & Sons, 2018. | Series: Wiley series on technologies for the pharmaceutical industry | Includes bibliographical references and index. |

Identifiers: LCCN 2017037714 (print) | LCCN 2017046458 (ebook) | ISBN 9781119282570 (pdf) | ISBN 9781119282587 (epub) | ISBN 9781119282563 (cloth)

Subjects: LCSH: Toxicology-Mathematical models. | Toxicology-Computer simulation. | QSAR (Biochemistry)

Classification: LCC RA1199.4.M37 (ebook) | LCC RA1199.4.M37 C66 2018 (print) | DDC 615.90285-dc23

LC record available at https://lccn.loc.gov/2017037714

Cover Design: Wiley

Cover Images: (Front cover) Courtesy of Daniela Schuster;

(Author photo) Courtesy of Sean Ekins

I should have no objection to go over the same life from its beginning to the end: requesting only the advantage authors have, of correcting in a second edition the faults of the first.

Benjamin Franklin

To my family and collaborators.

List of Contributors

Ni Ai

Pharmaceutical Informatics Institute

College of Pharmaceutical Sciences

Zhejiang University

Hangzhou

Zhejiang, PR

China

 

Vinicius M. Alves

LabMol – Laboratory for Molecular Modeling and Design, Faculty of Pharmacy

Federal University of Goias

Goiania, GO

Brazil

 

Carolina Horta Andrade

LabMol – Laboratory for Molecular Modeling and Design, Faculty of Pharmacy

Federal University of Goias

Goiania, GO

Brazil

 

Rodolpho C. Braga

LabMol – Laboratory for Molecular Modeling and Design, Faculty of Pharmacy

Federal University of Goias

Goiania, GO

Brazil

 

Jason Chittenden

Center for Chemical Toxicology Research and Pharmacokinetics Biomathematics Program

North Carolina State University

Raleigh, NC

USA

 

Alex M. Clark

Molecular Materials Informatics, Inc.

Montreal, Quebec

Canada

 

Daniela Digles

Department of Pharmaceutical Chemistry

University of Vienna

Wien

Austria

 

George van Den Driessche

Department of Chemistry

Bioinformatics Research Center

North Carolina State University

Raleigh, NC

USA

 

Gerhard F. Ecker

Department of Pharmaceutical Chemistry

University of Vienna

Wien

Austria

 

Sean Ekins

Collaborations Pharmaceuticals, Inc.

Raleigh, NC

USA

 

Emilio Benfenati

IRCCS – Istituto di Ricerche Farmacologiche “Mario Negri”

Laboratory of Environmental Chemistry and Toxicology

Milan

Italy

 

Xiaohui Fan

Pharmaceutical Informatics Institute

College of Pharmaceutical Sciences

Zhejiang University

Hangzhou

Zhejiang, PR

China

 

Denis Fourches

Department of Chemistry

Bioinformatics Research Center

North Carolina State University

Raleigh, NC

USA

 

Joel S. Freundlich

Department of Pharmacology & Physiology

New Jersey Medical School

Rutgers University

Newark, NJ

USA

and

Division of Infectious Disease

Department of Medicine and the Ruy V. Lourenço Center for the Study of Emerging and Re-emerging Pathogens

New Jersey Medical School, Rutgers University

Newark, NJ

USA

 

Chris Grulke

National Center for Computational Toxicology, Office of Research and Development

U.S. Environmental Protection Agency

Research Triangle Park

Durham, NC

USA

 

Sankalp Jain

Department of Pharmaceutical Chemistry

University of Vienna

Wien

Austria

 

Alexandru Korotcov

Gaithersburg, MD

USA

 

Jakub Kostal

Chemistry Department

The George Washington University

Washington DC

USA

 

Eleni Kotsampasakou

Department of Pharmaceutical Chemistry

University of Vienna

Wien

Austria

 

Matthew D. Krasowski

Department of Pathology

University of Iowa Hospitals and Clinics

Iowa City, IA

USA

 

Mary A. Lingerfelt

Collaborations Pharmaceuticals, Inc.

Raleigh, NC

USA

 

Anna Lombardo

IRCCS – Istituto di Ricerche Farmacologiche “Mario Negri”

Laboratory of Environmental Chemistry and Toxicology

Milan

Italy

 

Grace Patlewicz

National Center for Computational Toxicology, Office of Research and Development

U.S. Environmental Protection Agency

Research Triangle Park

Durham, NC

USA

 

Alexander L. Perryman

Department of Pharmacology & Physiology

New Jersey Medical School

Rutgers University

Newark, NJ

USA

 

Ann Richard

National Center for Computational Toxicology, Office of Research and Development

U.S. Environmental Protection Agency

Research Triangle Park

Durham, NC

USA

 

Jim E. Riviere

Center for Chemical Toxicology Research and Pharmacokinetics Biomathematics Program

North Carolina State University

Raleigh, NC

USA

 

Alessandra Roncaglioni

IRCCS – Istituto di Ricerche Farmacologiche “Mario Negri”

Laboratory of Environmental Chemistry and Toxicology

Milan

Italy

 

Daniela Schuster

Institute of Pharmacy/Pharmaceutical Chemistry

University of Innsbruck

Innsbruck

Austria

 

Imran Shah

National Center for Computational Toxicology, Office of Research and Development

U.S. Environmental Protection Agency

Research Triangle Park

Durham, NC

USA

 

Valery Tkachenko

Rockville, MD

USA

 

Alexander Tropsha

UNC Eshelman School of Pharmacy

University of North Carolina at Chapel Hill

Chapel Hill, NC

USA

 

John Wambaugh

National Center for Computational Toxicology, Office of Research and Development

U.S. Environmental Protection Agency

Research Triangle Park

Durham, NC

USA

 

Antony J. Williams

National Center for Computational Toxicology, Office of Research and Development

U.S. Environmental Protection Agency

Research Triangle Park

Durham, NC

USA

 

Richard Zakharov

Rockville, MD

USA

 

Linlin Zhao

Center for Computational and Integrative Biology

Rutgers University

Camden, NJ

USA

 

Hao Zhu

Center for Computational and Integrative Biology

Rutgers University

Camden, NJ

USA

and

Department of Chemistry

Rutgers University

Camden, NJ

USA

 

Kimberley M. Zorn

Collaborations Pharmaceuticals, Inc.

Raleigh, NC

USA

Preface

Since the publication of Computational Toxicology: Risk Assessment for Pharmaceutical and Environmental Chemicals in 2007, a lot has happened, both in the career of the editor and in science in general. For one, my focus has expanded towards many computational applications in drug discovery rather than focusing solely on ADME/Tox. I have also garnered new collaborators, some of whom have very graciously agreed to contribute to this volume. Science is changing, and publishing may be adjusting slowly too: this book will likely be read as much on mobile devices or computers as in physical hard copy. Computational toxicology has also evolved in the past decade with the dramatic increase in public data availability. There have been a number of new collaborative toxicology projects in Europe (e.g., e-Tox and OpenTox), and we have also seen growth in open computational tools and model sharing (QSAR Toolbox, Chembench, CDD, Bioclipse, etc.). Groups like the EPA have developed and expanded ToxCast, which represents a valuable resource for toxicology modeling. We are therefore now in the age of truly Big Data compared with a decade ago, and there have been several efforts to combine different types of data for toxicology. To round this off, the growth in nanotechnology has seen the emergence of computational nanotoxicology, which would not have been predicted in my earlier book.

This book is therefore aimed at the next generation of computational toxicology scientists, comprehensively discussing the state of the art of currently available molecular modeling tools and their role in testing strategies for different types of toxicity. The overall role of these computational approaches in addressing environmental and occupational toxicity is also covered. The chapters before you aim to describe topics in an accessible manner, especially for those who are not experts in the field. My goal with this book was to avoid covering too much of the same ground as the earlier book, because much of what we published then is still generally valid, and instead to focus on newer topics. I hope this book also serves to introduce some of the younger scientists from around the world who will likely drive this next generation of computational toxicology for many years to come. Finally, I hope this book inspires scientists to pursue computational toxicology so that it continues to expand across different industries, from pharmaceuticals to consumer products, and so that its importance continues to increase, as it has over the past decade.

November 12, 2017

Sean Ekins

Fuquay-Varina, NC, USA

Acknowledgments

I am extremely grateful to Jonathan Rose and colleagues at Wiley for their assistance and considerable patience. My proposal reviewers are gratefully acknowledged for their many suggestions, which helped shape this book.

I would like to acknowledge my many collaborators over the years whose work in some cases has been mentioned here. In particular, Dr Joel S. Freundlich, Dr Antony J. Williams, Dr Alex M. Clark, Dr Matthew D. Krasowski, Dr Carolina H. Andrade, and many others. I am also grateful for the support of SC Johnson who have kept me challenged and engaged with new applications for computational toxicology over the years. I would also like to acknowledge Dr Daniela Schuster for the kind use of her graphic for the book cover.

This book would not have been possible without the support of Dr Maggie A.Z. Hupcey and my family, who have tolerated late nights and frequent disappearances to the library to write over the holidays.

Part I: Computational Methods

Chapter 1: Accessible Machine Learning Approaches for Toxicology

Sean Ekins¹, Alex M. Clark², Alexander L. Perryman³, Joel S. Freundlich³,⁴, Alexandru Korotcov⁵ and Valery Tkachenko⁶

¹Collaborations Pharmaceuticals, Inc., Raleigh, NC, USA

²Molecular Materials Informatics, Inc., Montreal, Quebec, Canada

³Department of Pharmacology & Physiology, New Jersey Medical School, Rutgers University, Newark, NJ, USA

⁴Division of Infectious Disease, Department of Medicine and the Ruy V. Lourenço Center for the Study of Emerging and Re-emerging Pathogens, New Jersey Medical School, Rutgers University, Newark, NJ, USA

⁵Gaithersburg, MD, USA

⁶Rockville, MD, USA


1.1 Introduction

Computational approaches have in recent years played an increasingly important role in the drug discovery process within large pharmaceutical firms. Virtual screening of compounds using ligand-based and structure-based methods to predict potency enables more efficient use of high throughput screening (HTS) resources, by enriching the set of compounds physically screened with those more likely to yield hits [1–4]. Computing absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) properties with statistical techniques greatly reduces the number of expensive assays that must be performed, making it practical to consider these factors very early in the discovery process and so minimize late-stage failures of potent lead compounds that are not drug-like [5–11]. Large pharma have successfully integrated these in silico methods into operational practice, validated them, and realized their benefits because these firms have (i) expensive commercial software to build models, (ii) large, diverse proprietary datasets based on consistent experimental protocols to train and test the models, and (iii) staff with extensive computational and medicinal chemistry expertise to run the models and interpret the results. Drug discovery efforts centered in universities, foundations, government laboratories, and small biotechnology companies, however, generally lack these three critical resources and, as a result, have yet to exploit the full benefits of in silico methods. For close to a decade, we have used machine learning approaches and evaluated how these limitations could be circumvented so that others can benefit from current and emerging best industry practices.

The current practice in pharma is to integrate in silico predictions into a combined workflow together with in vitro assays to find “hits” that can then be reconfirmed and optimized [12]. The incremental cost of a virtual screen is minimal, and the savings compared with a physical screen are magnified if the compound would also need to be synthesized rather than purchased from a vendor. For example, if the blind hit rate against a library is 1% and an in silico model can prefilter the library to give an experimental hit rate of 2%, then the same number of hits can be found by physically testing half as many compounds, freeing significant resources to focus on other promising regions of chemical property space [13]. Our past pharmaceutical collaborations [14, 15] have suggested that computational approaches are critical to making drug discovery more efficient.
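The hit-rate arithmetic in the preceding paragraph can be made concrete with a short calculation. This is a minimal sketch: the target of 20 confirmed hits and the function names are hypothetical, chosen only for illustration, and the calculation simply divides the desired number of hits by the expected hit rate.

```python
import math


def compounds_to_screen(target_hits: int, hit_rate: float) -> int:
    """Expected number of compounds to test physically to find `target_hits`
    when a fraction `hit_rate` (0 < hit_rate <= 1) of tested compounds are hits."""
    if not 0 < hit_rate <= 1:
        raise ValueError("hit_rate must be in (0, 1]")
    return math.ceil(target_hits / hit_rate)


def enrichment_factor(model_hit_rate: float, baseline_hit_rate: float) -> float:
    """How many times more hits per tested compound the prefiltered set yields."""
    return model_hit_rate / baseline_hit_rate


# Hypothetical goal: 20 confirmed hits.
blind = compounds_to_screen(20, 0.01)       # blind screen at a 1% hit rate
filtered = compounds_to_screen(20, 0.02)    # model-prefiltered screen at 2%
print(blind, filtered, enrichment_factor(0.02, 0.01))  # prints: 2000 1000 2.0
```

Doubling the hit rate halves the number of compounds that must be bought or synthesized for the same yield of hits; the ratio of the two hit rates (the enrichment factor) is one common way to express such improvements over random screening.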

The relatively high cost of in vivo and in vitro screening of ADME and toxicity properties of molecules has motivated our efforts to develop in silico methods to filter and select a subset of compounds for testing. By relying on very large, internally consistent datasets, large pharma has succeeded in developing highly predictive proprietary models [5–8]. At Pfizer (and probably other companies), for example, many of these models (e.g., those that predict the volume of distribution, aqueous kinetic solubility, acid dissociation constant, and distribution coefficient) [5–8, 16] are believed (according to discussions with scientists) to be so accurate that they have essentially put the corresponding experimental assays out of business. In most other cases, large pharma perform experimental assays for a small fraction of compounds of interest to augment or validate their computational models. Efforts by smaller pharma and academia have not been as successful, largely because they have, by necessity, drawn upon much smaller datasets and, in a few cases, tried to combine them [11, 17–22]. However, this is changing rapidly, and public datasets for ADME/Tox properties are becoming available in PubChem, ChEMBL, Collaborative Drug Discovery (CDD), and elsewhere. For example, the CDD public database has more than 100 public datasets that can be used to generate community-based models, including extensive neglected infectious disease structure–activity relationship (SAR) datasets (malaria, tuberculosis, Chagas disease, etc.) and ADMEdata.com datasets that are broadly applicable to many projects. Our recent efforts with CDD have led to a platform that enables drug discovery projects to benefit from open source machine learning algorithms and descriptors in a secure environment, which allows models to be shared with collaborators or made accessible to the community.

In pharmaceutical research and development, and specifically in cheminformatics, there are many machine learning methods, such as support vector machines (SVM), k-nearest neighbors, naïve Bayesian, and decision trees [23], which have seen increasing use as our datasets have grown to become “big data” [24–27]. These methods [23] can be used for binary classification, multiclass classification, or continuous data. In more recent years, the biological data amassed from HTS and high content screens has called for tools that can account for some of the issues with this bigger data [26]. Many of the resulting machine learning models can also be implemented on a mobile phone [28, 29].
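To illustrate one of the methods named above, here is a minimal k-nearest-neighbors classifier that compares molecules by Tanimoto similarity between fingerprint bit sets. It is a sketch under stated assumptions: representing each fingerprint as a Python set of "on" bit indices, the choice of k = 3, and the toy data are all hypothetical and are not details of the models discussed in this chapter.

```python
from collections import Counter


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def knn_predict(query: set, training: list, k: int = 3):
    """Predict a class label by majority vote of the k most similar training compounds.

    `training` is a list of (fingerprint_bits, label) pairs.
    """
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy fingerprints (bit indices are arbitrary placeholders, not real FCFP6 bits).
train = [
    ({1, 2, 3}, "active"),
    ({1, 2, 4}, "active"),
    ({1, 2, 5}, "active"),
    ({7, 8, 9}, "inactive"),
    ({7, 8, 10}, "inactive"),
]
print(knn_predict({1, 2, 3, 4}, train))   # nearest neighbors are actives
print(knn_predict({7, 8, 9, 10}, train))  # two of three nearest are inactives
```

An odd k avoids voting ties for binary labels; for continuous endpoints the same neighbor search could instead average the neighbors' measured values.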

1.2 Bayesian Models

Our machine learning experience over a decade [14, 30–46] has focused on Bayesian approaches (Figure 1.1). Bayesian models classify data as active or inactive on the basis of user-defined thresholds using a simple probabilistic classification model based on Bayes' theorem. We initially used the Bayesian modeling software within Pipeline Pilot and Discovery Studio (BIOVIA) with many ADME/Tox and drug discovery datasets. Most of these models have used functional class fingerprints of maximum diameter 6 (FCFP6) and several other simple descriptors [47, 48]. The models were internally validated through the generation of receiver operating characteristic (ROC) plots. We have also compared single- and dual-event Bayesian models utilizing published screening data [49, 50]. As an example, the single-event models use only whole-cell antitubercular activity, either at a single compound concentration or as a dose–response IC50 or IC90 (the concentration of compound inhibiting 50% or 90% of growth, respectively), while the dual-event models also use a selectivity index (SI = CC50/IC90, where CC50 is the compound concentration that is cytotoxic and inhibits 50% of the growth of Vero cells). While single-event models [13, 51, 52] are widely published, dual-event models [53] attempt to predict active compounds with acceptable relative activity against the pathogen (in this case, Mycobacterium tuberculosis, Mtb) versus the model mammalian cell line (e.g., Vero cells). Our models identified 4–10 times more active compounds than random screening did, and the models also had relatively high hit rates for Mtb, for example, 14% [54], 71% (Figure 1.1) [53], or intermediate [55]. Recent machine learning work on Chagas disease has identified in vivo active compounds [56], one of which is an approved antimalarial in Europe.
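The text does not spell out the scoring used by the commercial Bayesian module, so the sketch below assumes one widely described formulation: a naive Bayesian score that sums, over the fingerprint features present in a molecule, the Laplacian-corrected weight log((A_F + 1)/(T_F·P + 1)), where T_F is the number of training compounds containing feature F, A_F is the number of those that are active, and P is the overall fraction of actives. It also shows how a dual-event label combining IC90 with SI = CC50/IC90 could be derived before training. The cutoff values, feature names, and class and function names are hypothetical.

```python
import math
from collections import defaultdict


def dual_event_label(ic90_um: float, cc50_um: float,
                     ic90_cutoff: float = 10.0, si_cutoff: float = 10.0) -> bool:
    """Label a compound active only if it is both potent and selective.

    SI = CC50 / IC90; both cutoffs here are hypothetical placeholders.
    """
    selectivity_index = cc50_um / ic90_um
    return ic90_um <= ic90_cutoff and selectivity_index >= si_cutoff


class LaplacianNaiveBayes:
    """Naive Bayesian scorer over fingerprint features with a Laplacian correction."""

    def fit(self, feature_sets, labels):
        n = len(labels)
        if n == 0:
            raise ValueError("need at least one training compound")
        base_rate = sum(labels) / n          # overall fraction of actives (P)
        with_feature = defaultdict(int)      # T_F: compounds containing feature F
        active_with_feature = defaultdict(int)  # A_F: actives containing feature F
        for features, is_active in zip(feature_sets, labels):
            for f in set(features):
                with_feature[f] += 1
                if is_active:
                    active_with_feature[f] += 1
        # Positive weight: feature enriched in actives; negative: depleted.
        self.weights = {
            f: math.log((active_with_feature[f] + 1) / (with_feature[f] * base_rate + 1))
            for f in with_feature
        }
        return self

    def score(self, features):
        # Features never seen in training contribute nothing to the score.
        return sum(self.weights.get(f, 0.0) for f in set(features))


# Hypothetical IC90/CC50 values (µM) turned into dual-event labels.
labels = [dual_event_label(1.0, 50.0), dual_event_label(2.0, 100.0),
          dual_event_label(1.0, 5.0), dual_event_label(50.0, 1000.0)]
model = LaplacianNaiveBayes().fit([{"a", "c"}, {"a"}, {"b", "c"}, {"b"}], labels)
print(labels, model.score({"a"}), model.score({"b"}))
```

A user-defined threshold on the summed score then turns it into an active/inactive call; features shared equally by actives and inactives (here "c") receive a weight of zero.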
Most recently, we have been actively constructing Bayesian models for ADME properties such as aqueous solubility, mouse liver microsomal stability [57], and Caco-2 cell permeability [30], which complement our earlier ADME/Tox machine learning work [13, 52, 58–64]. We have also summarized the application of these methods to toxicology datasets [58] and transporters [34, 59, 62, 63, 65–67]. This has led to models with generally good to acceptable ROC scores (> 0.7) [30]. Open source implementations of the ECFP6/FCFP6 fingerprints [28] and of the Bayesian model-building module [25, 30] have also enabled their use in new software (see later). We are keen to explore machine learning algorithms and make them accessible for seeding drug discovery projects, as we have demonstrated.
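ROC scores such as the > 0.7 figure quoted above are usually reported as the area under the ROC curve (AUC). As a minimal sketch, assuming binary active/inactive labels and real-valued model scores (for example, Bayesian scores), the AUC can be computed as the probability that a randomly chosen active outscores a randomly chosen inactive, with ties counted as half. The function name and toy scores are hypothetical, and the pairwise loop favors clarity over speed.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pairwise formulation.

    Returns the probability that a randomly chosen active scores higher than a
    randomly chosen inactive; ties count as half a win.
    """
    actives = [s for s, y in zip(scores, labels) if y]
    inactives = [s for s, y in zip(scores, labels) if not y]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


# Hypothetical model scores for four compounds (1 = active, 0 = inactive).
auc = roc_auc([0.9, 0.2, 0.5, 0.1], [1, 1, 0, 0])
print(auc, auc > 0.7)  # 0.75 True
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of actives from inactives.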

Figure 1.1 Summary of machine learning models generated for Mycobacterium tuberculosis in vitro data. This approach has also been applied to ADME/Tox datasets.

1.2.1 CDD Models

ADME properties have been modeled by us with collaborators [30] and by others using an array of machine learning algorithms, such as SVMs [68], Bayesian modeling [69], Gaussian processes [70], and others [71]. A major challenge remains the ability to share such models. CDD has developed and marketed a robust, innovative commercial software platform that enables scientists to archive, mine, and (optionally) share SAR, ADME/Tox, and other types of preclinical research data [72]. CDD hosts the software and customers' data vaults on its secure servers. In a proof-of-concept study, CDD collaborated with computational chemists at Pfizer. This study demonstrated that models constructed with open descriptors and keys (Chemistry Development Kit (CDK) + SMARTS) using open software (C5.0; once built, the models can be made open) performed essentially identically to models built with expensive proprietary descriptors and software (MOE2D + SMARTS + RuleQuest's Cubist) across all metrics of performance when evaluated on multiple Pfizer-proprietary ADME datasets: human liver microsomal (HLM) stability, RRCK passive permeability, P-gp efflux, and aqueous solubility [14]. Pfizer's HLM dataset, for example, contained more than 230,000 compounds and covered a diverse range of chemistry as well as many therapeutic areas. The HLM dataset was split into a training set (80%) and a test set (20%) using the venetian blind splitting method; in addition, a newly screened set of 2310 compounds was evaluated as a blind dataset. All the key metrics of model performance (e.g., R², root-mean-square error (RMSE), kappa, sensitivity, specificity, and positive predictive value (PPV)) were nearly identical for the open source approach and the proprietary software (e.g., PPV of 0.80 vs 0.82). The open source approach was even slightly faster (0.2 vs 0.3 s/compound). All the datasets studied yielded the same conclusion: models built with open descriptors and algorithms are as predictive as those built with commercial tools [14].
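The paragraph above names venetian blind splitting and several classification metrics without defining them. The sketch below assumes one common reading of venetian blind splitting (sort compounds by a key, such as the measured response, and assign every fifth one to the test set, giving an 80%/20% split); whether the Pfizer study sorted by response or by another key is not stated here. The metric formulas are the standard confusion-matrix definitions; the function names and toy data are hypothetical, and the sketch does not guard against empty classes.

```python
def venetian_blind_split(records, key, n_blinds=5):
    """Sort records by `key` and send every n-th one to the test set.

    With n_blinds=5 this gives an 80%/20% training/test split that spans
    the whole range of the sorted property.
    """
    ordered = sorted(records, key=key)
    train, test = [], []
    for i, record in enumerate(ordered):
        (test if i % n_blinds == n_blinds - 1 else train).append(record)
    return train, test


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, and Cohen's kappa for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    n = tp + tn + fp + fn
    observed = (tp + tn) / n
    # Agreement expected by chance from the predicted and true class totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "kappa": (observed - expected) / (1 - expected),
    }


train, test = venetian_blind_split(list(range(10)), key=lambda x: x)
print(len(train), len(test))  # 8 2

# Toy predictions: 40 of 50 actives and 40 of 50 inactives classified correctly.
y_true = [1] * 50 + [0] * 50
y_pred = [1] * 40 + [0] * 10 + [1] * 10 + [0] * 40
print(binary_metrics(y_true, y_pred))
```

Kappa corrects raw accuracy for chance agreement, which matters when actives and inactives are unbalanced; PPV is the fraction of predicted actives that are truly active.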

This result is an important prerequisite for the goal of creating a machine learning model exchange platform that can be deployed without requiring licenses for other software or algorithms, which would otherwise make it too expensive to achieve widespread adoption [73, 74]. This preliminary study did not directly address whether the descriptors mask the underlying data well enough that structure identities cannot be reverse-engineered, but others have begun to assess this question for an array of molecular descriptor types [75]. In addition, open source descriptors and models could be used in any other software (GPL license).

Compared with the large datasets available in pharma, few datasets are freely available. Jean-Claude Bradley, Andrew Lang, and Antony Williams have, however, provided the community with a curated dataset of melting points compiled from several open data sources, which was then used for modeling. A training set of 2205 compounds and a test set of 500 compounds with doubly validated melting points were used with 132 open CDK [76] descriptors and the randomForest package (v4.5-34) in R. The resulting random forest model had an RMSE of 40.9 °C and an R² value of 0.82 when used to predict the test set. We then compared these results with what could be obtained in the commercial SAS JMP (v8.0.1, SAS, Cary, NC) and Discovery Studio (v2.5.5, San Diego, CA). A neural network model in SAS JMP had an RMSE of 48.5 °C and an R² value of 0.75. In comparison, a backpropagation neural network model in Discovery Studio had an RMSE of 40.8 °C and an R² value of 0.83 for the same test set. These melting point models are all superior to 17 models identified in 10 papers published between 2003 and 2011 using commercial and other tools [77]. The results also suggested that open descriptors and algorithms can produce models comparable to those generated with commercial tools.
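RMSE and R² are the two statistics used above to compare the melting point models. The minimal sketch below computes both for hypothetical measured and predicted values; the data and function names are illustrative only. Note that R² is sometimes reported as the squared Pearson correlation rather than 1 − SS_res/SS_tot, and which convention each tool used is not stated here.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the property (e.g., °C)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured vs predicted melting points (°C) for four compounds.
measured = [80.0, 120.0, 150.0, 210.0]
predicted = [95.0, 110.0, 160.0, 190.0]
print(round(rmse(measured, predicted), 1), round(r_squared(measured, predicted), 3))
```

Because RMSE is expressed in degrees Celsius, it can be compared directly across models evaluated on the same test set, whereas R² also depends on the spread of the measured values.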