Applied Chemoinformatics - E-Book

Description

Edited by world-famous pioneers in chemoinformatics, this book takes a clearly structured, applications-oriented approach to the topic, providing up-to-date and focused information on the wide range of applications in this exciting field.
The authors explain methods and software tools so that the reader learns not only the basics but also how to use the different software packages available. Experts describe applications in fields as different as structure-spectra correlations, virtual screening, prediction of active sites, library design, the prediction of the properties of chemicals, the development of new cosmetics products, quality control in food, the design of new materials with improved properties, toxicity modeling, assessment of the risk of chemicals, and the control of chemical processes.
The book is aimed at advanced students and lecturers, as well as at scientists who want to learn how chemoinformatics could assist them in solving their daily scientific tasks.
Together with the corresponding textbook Chemoinformatics - Basic Concepts and Methods (ISBN 9783527331093) on the fundamentals of chemoinformatics, readers will have a comprehensive overview of the field.

Page count: 1151

Publication year: 2018

Table of Contents

Title Page

Copyright

Dedication

Foreword

List of Contributors

Chapter 1: Introduction

1.1 The Rationale for the Books

1.2 Development of the Field

1.3 The Basis of Chemoinformatics and the Diversity of Applications

Reference

Chapter 2: QSAR/QSPR

2.1 Introduction

2.2 Data Handling and Curation

2.3 Molecular Descriptors

2.4 Methods for Data Analysis

2.5 Classification Methods

2.6 Methods for Data Modeling

2.7 Summary on Data Analysis Methods

2.8 Model Validation

2.9 Regulatory Use of QSARs

Selected Reading

Reference

Chapter 3: Prediction of Physicochemical Properties of Compounds

3.1 Introduction

3.2 Overview of Modeling Approaches to Predict Physicochemical Properties

3.3 Methods for the Prediction of Individual Properties

3.4 Limitations of Statistical Methods

3.5 Outlook and Perspectives

Selected Reading

References

Chapter 4: Chemical Reactions

Chapter 4.1: Chemical Reactions – An Introduction

References

Chapter 4.2: Reaction Prediction and Synthesis Design

4.2.1 Introduction

4.2.2 Reaction Prediction

4.2.3 Synthesis Design

4.2.4 Conclusion

References

Chapter 4.3: Explorations into Biochemical Pathways

4.3.1 Introduction

4.3.2 The BioPath.Database

4.3.3 BioPath.Explore

4.3.4 Search Results

4.3.5 Exploitation of the Information in BioPath.Database

4.3.6 Summary

Selected Reading

References

Chapter 5: Structure–Spectrum Correlations and Computer-Assisted Structure Elucidation

5.1 Introduction

5.2 Molecular Descriptors

5.3 Infrared Spectra

5.4 NMR Spectra

5.5 Mass Spectra

5.6 Computer-Aided Structure Elucidation (CASE)

Selected Reading

Acknowledgement

References

Chapter 6.1: Drug Discovery: An Overview

6.1.1 Introduction

6.1.2 Definitions of Some Terms Used in Drug Design

6.1.3 The Drug Discovery Process

6.1.4 Bio- and Chemoinformatics Tools for Drug Design

6.1.5 Structure-based and Ligand-Based Drug Design

6.1.6 Target Identification and Validation

6.1.7 Lead Finding

6.1.8 Lead Optimization

6.1.9 Preclinical and Clinical Trials

6.1.10 Outlook: Future Perspectives

Selected Reading

References

Chapter 6.2: Bridging Information on Drugs, Targets, and Diseases

6.2.1 Introduction

6.2.2 Existing Data Sources

6.2.3 Drug Discovery Use Cases in Computational Life Sciences

6.2.4 Discussion and Outlook

Selected Reading

References

Chapter 6.3: Chemoinformatics in Natural Product Research

6.3.1 Introduction

6.3.2 Potential and Challenges

6.3.3 Access to Software and Data

6.3.4 In Silico Driven Pharmacognosy-Hyphenated Strategies

6.3.5 Opportunities

6.3.6 Miscellaneous Applications

6.3.7 Limits

6.3.8 Conclusion and Outlook

Selected Reading

References

Chapter 6.4: Chemoinformatics of Chinese Herbal Medicines

6.4.1 Introduction

6.4.2 Type 2 Diabetes: The Western Approach

6.4.3 Type 2 Diabetes: The Chinese Herbal Medicines Approach

6.4.4 Building a Bridge

6.4.5 Screening Approach

Selected Reading

References

Chapter 6.5: PubChem

6.5.1 Introduction

6.5.2 Objectives

6.5.3 Architecture

6.5.4 Data Sources

6.5.5 Submission Processing and Structure Representation

6.5.6 Data Augmentation

6.5.7 Preparation for Database Storage

6.5.8 Query Data Preparation and Structure Searching

6.5.9 Structure Query Input

6.5.10 Query Processing

6.5.11 Getting Started with PubChem

6.5.12 Web Services

6.5.13 Conclusion

References

Chapter 6.6: Pharmacophore Perception and Applications

6.6.1 Introduction

6.6.2 Historical Development of the Modern Pharmacophore Concept

6.6.3 Representation of Pharmacophores

6.6.4 Pharmacophore Modeling

6.6.5 Application of Pharmacophores in Drug Design

6.6.6 Software for Computer-Aided Pharmacophore Modeling and Screening

6.6.7 Summary

Selected Reading

References

Chapter 6.7: Prediction, Analysis, and Comparison of Active Sites

6.7.1 Introduction

6.7.2 Active Site Prediction Algorithms

6.7.3 Target Prioritization: Druggability Prediction

6.7.4 Search for Sequentially Homologous Pockets

6.7.5 Target Comparison: Virtual Active Site Screening

6.7.6 Summary and Outlook

Selected Reading

References

Chapter 6.8: Structure-Based Virtual Screening

6.8.1 Introduction

6.8.2 Docking Algorithms

6.8.3 Scoring

6.8.4 Structure-Based Virtual Screening Workflow

6.8.5 Protein-Based Pharmacophoric Filters

6.8.6 Validation

6.8.7 Summary and Outlook

Selected Reading

References

Chapter 6.9: Prediction of ADME Properties

6.9.1 Introduction

6.9.2 General Consideration on SPR/QSPR Models

6.9.3 Estimation of Aqueous Solubility (log S)

6.9.4 Estimation of Blood–Brain Barrier Permeability (log BB)

6.9.5 Estimation of Human Intestinal Absorption (HIA)

6.9.6 Other ADME Properties

6.9.7 Summary

Selected Reading

References

Chapter 6.10: Prediction of Xenobiotic Metabolism

6.10.1 Introduction: The Importance of Xenobiotic Biotransformation in the Life Sciences

6.10.2 Biotransformation Types

6.10.3 Brief Review of Methods

6.10.4 User Needs: Scientists Use Metabolism Information in Different Ways

6.10.5 Case Studies

Selected Reading

References

Chapter 6.11: Chemoinformatics at the CADD Group of the National Cancer Institute

6.11.1 Introduction and History

6.11.2 Chemical Information Services

6.11.3 Tools and Software

6.11.4 Synthesis and Activity Predictions

6.11.5 Downloadable Datasets

References

Chapter 6.12: Uncommon Data Sources for QSAR Modeling

6.12.1 Introduction

6.12.2 Observational Metadata and QSAR Modeling

6.12.3 Pharmacovigilance and QSAR

6.12.4 Conclusions

Selected Reading

References

Chapter 6.13: Future Perspectives of Computational Drug Design

6.13.1 Where Do the Medicines of the Future Come from?

6.13.2 Integrating Design, Synthesis, and Testing

6.13.3 Toward Precision Medicine

6.13.4 Learning from Nature: From Complex Templates to Simple Designs

6.13.5 Conclusions

Selected Reading

References

Chapter 7: Computational Approaches in Agricultural Research

7.1 Introduction

7.2 Research Strategies

7.3 Estimation of Adverse Effects

7.4 Conclusion

Selected Reading

References

Chapter 8: Chemoinformatics in Modern Regulatory Science

8.1 Introduction

8.2 Data Gap Filling Methods in Risk Assessment

8.4 New Approach Descriptors

8.5 Chemical Space Analysis

8.6 Summary

Selected Reading

References

Chapter 9: Chemometrics in Analytical Chemistry

9.1 Introduction

9.2 Sources of Data: Data Preprocessing

9.3 Data Analysis Methods

9.4 Validation

9.5 Applications

9.6 Outlook and Prospects

Selected Reading

References

Chapter 10: Chemoinformatics in Food Science

10.1 Introduction

10.2 Scope of Chemoinformatics in Food Chemistry

10.3 Molecular Databases of Food Chemicals

10.4 Chemical Space of Food Chemicals

10.5 Structure–Property Relationships

10.6 Computational Screening and Data Mining of Food Chemicals Libraries

10.7 Conclusion

Selected Reading

References

Chapter 11: Computational Approaches to Cosmetics Products Discovery

11.1 Introduction: Cosmetics Demands on Computational Approaches

11.2 Case I: The Multifunctional Role of Ectoine as a Natural Cell Protectant (Product: Ectoine, “Cell Protection Factor”, and Moisturizer)

11.3 Case II: A Smart Cyclopeptide Mimics the RGD Containing Cell Adhesion Proteins at the Right Site (Product: Cyclopeptide-5: Antiaging)

11.4 Conclusions: Cases I and II

References

Chapter 12: Applications in Materials Science

12.1 Introduction

12.2 Why Materials Are Harder to Model than Molecules

12.3 Why Are Chemoinformatics Methods Important Now?

12.4 How Do You Describe Materials Mathematically?

12.5 How Well do Chemoinformatics Methods Work on Materials?

12.6 What Are the Pitfalls when Modeling Materials?

12.7 How Do You Make Good Models and Avoid the Pitfalls?

12.8 Materials Examples

12.9 Biomaterials Examples

12.10 Perspectives

Selected Reading

References

Chapter 13: Process Control and Soft Sensors

13.1 Introduction

13.2 Roles of Soft Sensors

13.3 Problems with Soft Sensors

13.4 Adaptive Soft Sensors

13.5 Database Monitoring for Soft Sensors

13.6 Efficient Process Control Using Soft Sensors

13.7 Conclusions

Selected Readings

References

Chapter 14: Future Directions

14.1 Well-Established Fields of Application

14.2 Emerging Fields of Application

14.3 Renaissance of Some Fields

14.4 Combined Use of Chemoinformatics Methods

14.5 Impact on Chemical Research

Index

End User License Agreement

List of Illustrations

Chapter 1: Introduction

Figure 1.1 Fundamental questions of a chemist and the chemoinformatics methods that can be used in providing support for solving these tasks.

Chapter 2: QSAR/QSPR

Figure 2.1 The general QSAR/QSPR procedure.

Figure 2.2 Flow chart showing the general steps for generating QSAR models.

Figure 2.3 Example of a simple decision tree.

Figure 2.4 ANN, example of a two-layered ANN (there are only two layers of weights!).

Figure 2.5 General modeling/validation workflow.

Figure 2.6 PCA score projection of Iris dataset objects on the plane defined by principal components PC1 and PC2; (○) Iris setosa, (□) Iris versicolor, (◊) Iris virginica.

Figure 2.7 Example of the distribution of objects selected by the CADEX method. A PCA score projection of the Iris dataset objects on the plane defined by PC1 and PC2; open and filled symbols represent training and test sets, respectively.

Figure 2.8 Schematic representation of the chemical space for a set of compounds described by the first three principal components (PC1-3). The test set molecules (grey balls) are located within the applicability domain covered by the molecules of the training set (black balls).

Figure 2.9 Modeling/validation workflow. The scope of the external validation is shown in gray color; the scope of the inner validation over the validation set is shown in cross-hatching. The internal validation methods are shown with a bold frame.

Figure 2.10 Two possible ways to perform cross-validation.

Figure 2.11 Example ROC curve (bold line) for a binary classification problem. The straight line is obtained by guessing the class membership.

Chapter 3: Prediction of Physicochemical Properties of Compounds

Figure 3.1 Functional groups for alkanes according to Benson's [7] notation.

Figure 3.2 RMSE of different models as a function of the number of non-hydrogen atoms (NHA) in molecules.

Figure 3.3 The model prediction errors for five sets of compounds are shown as a function of temperature. The model errors for the PATENTS sets match the experimental accuracy of the data in this set.

Figure 3.4 Microspecies and constants using the example of cetirizine. The microspecies are represented as triplets, where the first position refers to the hydroxyl group of the carboxylic acid group, the second one refers to the middle nitrogen atom, and the third position refers to the nitrogen atom farthest away from the carboxylic group; for example, •○• represents the zwitterionic form with one proton bound to the middle nitrogen, the dominant neutral form of cetirizine. (a) Cetirizine. Protonation sites (OH, N, N) in bold face. (b) Protonation scheme. Cetirizine has n = 3 protonation sites, and thus 2³ = 8 microstates and 3 × 2² = 12 microequilibria. The 3 + 1 = 4 macrostates are shown below, with h = number of bound hydrogens. (c) Distribution of microspecies as a function of pH. The microspecies •○○ and ○○• are very close to the baseline. (d) Microconstants. All values are experimentally determined.

Figure 3.5 Reaction equation for the ionization of aliphatic carboxylic acids with the physicochemical effects indicated: α_O is effective polarizability, Q_σ is inductive effect on the ionizable atom, A_2D is steric hindrance at the ionization site, and χ_π is electronegativity at the π-carbon atom.

Chapter 4: Chemical Reactions

Figure 4.1 Different types of problems encountered in dealing with chemical reactions.

Figure 4.2 Intermediate in the synthesis of maitotoxin, one of the most complex targets for total synthesis. The light grey parts of the molecule show the regions that have been synthesized [6]. Much of the hard work has been completed. However, joining these fragments together will require substantial work, as will completing the remaining fragments of the molecule.

Figure 4.3 Eribulin/Halaven. The most complex molecule synthesized and sold.

Figure 4.4 Acid-catalyzed rearrangement to dolabriferol.

Figure 4.5 An overview of the methods in WODCA.

Figure 4.6 Biochemical Pathways wall chart (https://www.roche.com/pathways [3] – accessed January 2018).

Figure 4.7 Details on a reaction on the Biochemical Pathways wall chart.

Figure 4.8 Positions indicating the occurrence of L-glutamate on the wall chart.

Figure 4.9 Different views on a biochemical reaction.

Figure 4.10 Details on the query molecule chorismate.

Figure 4.11 Results of a search for monooxygenases in the “Reaction” field.

Figure 4.12 Results for searching for enzyme EC “1.13.12.4.”

Figure 4.13 Shortest pathway from farnesyl-diphosphate to artemisinin.

Figure 4.14 Section of the full SOM of the 135 reactions that shows the distribution of reactions of the subsubclasses of EC 3.1.c.d.

Figure 4.15 SOM of reactions from all classes of enzymes from EC 1.b.c.d. to EC 6.b.c.d.

Figure 4.16 Effect of an enzyme on the energies of the substrate, the transition state, the intermediate, and the product of a reaction. The substantial lowering of the energies of the transition states and the intermediate is clearly distinguished.

Figure 4.17 The intermediate of the conversion of AMP to IMP obtained by addition of water to AMP. The structure of the inhibitor of AMP deaminase, carbocyclic coformycin.

Figure 4.18 Superimpositions of the 3D structure of the inhibitor carbocyclic coformycin onto the substrate AMP, the intermediate, and the product IMP of the deamination of AMP.

Figure 4.19 Outline of the method for uncovering metabolic pathways relevant to phenotypic traits of microbial genomes.

Figure 4.20 Outline of the chemoinformatics analysis of the reverse pathway engineering approach.

Figure 4.21 Three pathways to 3-methylbutanoic acid, generated by the RPE approach. The sequences (a) and (b) are known; sequence (c) is novel.

Figure 4.22 Suggested reaction from the sequence in Figure 4.21c and the reference reaction from BioPath.Database.

Figure 4.23 Tree of the L-lactate oxidase (LOX) homologs from lactic acid bacteria (LAB).

Chapter 5: Structure–Spectrum Correlations and Computer-Assisted Structure Elucidation

Figure 5.1 The HOSE code for a selected carbon atom describes its structural environment in hierarchical order by walking through the molecule in spheres. Only the non-hydrogen atoms are explicitly considered.

Figure 5.2 RDF code for a phosphonic ester using atomic numbers as atom property.

Figure 5.3 Training of a CPG NN to learn relationships between structures and IR spectra and example of a simulated spectrum.

Figure 5.4 Scheme of a counterpropagation network for the derivation of 3D structures.

Figure 5.5 Radial distribution function for proton H-6 using partial atomic charge as the atomic property and indications of the distances contributing to each peak.

Figure 5.6 Experimental ¹H NMR spectrum (below) of the structure in the upper right corner compared to the full spectrum predicted by SPINUS (above) for the same structure. (*) In the experimental spectrum, the signal at 7.26 ppm is from the solvent (CDCl₃), the signal at 5.98 ppm is from the exchangeable NH proton, and the peaks at 2.85–2.95 ppm are from residues of DMF.

Figure 5.7 Screen capture of Mnova NMR commercial software (Mestrelab Research, S.L, www.mestrelab.com). In this example, the selected active page contains one experimental ¹H NMR spectrum, fully processed and analyzed, stacked together with its ¹H NMR predicted counterpart. Automatic assignment of the signals (possible with ¹H, ¹³C, HSQC, and COSY experiments) is also depicted; atom labels are color-coded depending upon the quality of the assignment as derived from a fuzzy logic expert system.

Figure 5.8 Screenshot of Mass Frontier software (HighChem, Ltd., www.highchem.com) showing predicted fragmentation mechanisms for a user provided compound.

Figure 5.9 Screenshot of ACD/Structure Elucidator Suite, Version 2016.1, Advanced Chemistry Development, Inc., Toronto, ON, Canada, www.acdlabs.com.

Chapter 6.1: Drug Discovery: An Overview

Figure 6.1.1 The drug discovery process.

Figure 6.1.2 Distribution of drug targets.

Figure 6.1.3 Kohonen map (10x7) obtained from a dataset of 112 dopamine agonists and 60 benzodiazepine agonists.

Figure 6.1.4 Kohonen map (40x30) of a dataset consisting of the dopamine and benzodiazepine agonists of Figure 6.1.3 and 8,223 compounds of a chemical supplier catalog.

Figure 6.1.5 Structures that were mapped into the neuron at position 5,8 of the Kohonen map of Figure 6.1.4.

Figure 6.1.6 Synthesis and high-throughput screening results of a library of hydantoins.

Figure 6.1.7 SOMs of a library of 5,513 hydantoins obtained through six different structure representations. ESP, electrostatic potential; HBP, hydrogen-bonding potential; HYP, hydrophobicity potential. Neurons that obtained a hit in dark gray; neurons with only non-hits in light gray.

Figure 6.1.8 Development of a filter for hits in the hydantoin library. (a) SOM of the training set, (b) classification map obtained from the neurons with hits and their first-sphere neighbor neurons, and (c) SOM of the test set.

Figure 6.1.9 Flexible superimposition of the 3D structure of three muscle relaxants: chlorpromazine, tolperisone, and tizanidine.

Figure 6.1.10 Different strategies to design a ligand in target-based drug discovery: docking (left), building (center), and linking (right). D = H-bond donor, A = H-bond acceptor, H1, H2 = hydrophobic regions of the protein.

Figure 6.1.11 Thermodynamics of ligand binding.

Figure 6.1.12 Factors affecting lead identification and optimization.

Figure 6.1.13 Compounds showing baseline toxicity (narcosis).

Figure 6.1.14 Compounds having a variety of toxic modes of action (MOA).

Figure 6.1.15 Architecture of a counterpropagation neural network for classifying phenols into four different MOAs.

Figure 6.1.16 Distribution of phenols in the four output layers of the counterpropagation network.

Figure 6.1.17 Screen of ChemoTyper: on the left are three chemotypes of the thalidomide skeleton with chemotypes differentiated by sigma charge (light gray) and total charge (dark gray); on the right-hand side is part of a dataset that indicates hits for the two different chemotypes.

Chapter 6.3: Chemoinformatics in Natural Product Research

Figure 6.3.1 Selection of appropriate modeling tools depending on the aim of the study.

Figure 6.3.2 Depending on the selected methods, theoretical validation experiments are necessary to select the best performing models for making predictions.

Figure 6.3.3 Virtual screening workflow including additional filtering and selection criteria.

Figure 6.3.4 Implementation of chemoinformatics in natural product research.

Figure 6.3.5 Virtual screening approach for the identification of novel active constituents for the target of interest.

Figure 6.3.6 Perlatolic acid fitted into a pharmacophore model for mPGES-1 inhibitors. Chemical features are color-coded: hydrophobic, gray; negatively ionizable group, dark blue; aromatic ring (brown and blue plane). A steric restriction (cyan shape) is depicted as light gray cloud.

Figure 6.3.7 Target fishing approach for the identification of macromolecular targets for a specific compound.

Figure 6.3.8 Computational methods as tools to provide insight into the molecular ligand binding interaction.

Figure 6.3.9 Computational methods for the selection of plant material as promising starting point for experimental investigations.

Chapter 6.4: Chemoinformatics of Chinese Herbal Medicines

Figure 6.4.1 Creating a herbal prescription database, which contains data regarding herbs and their chemical structures of active constituents against “Xiaoke.”

Figure 6.4.2 Creating an anti-T2D compound database (ADB) from the scientific literature, which contains the chemical structures of anti-T2D agents, the mechanisms, and targets.

Figure 6.4.3 HCMN. The network demonstrates the relations among herbs, chemotypes, and mechanisms of actions.

Figure 6.4.4 Radix Rehmanniae (H02) is one of the top herbs for treating T2D; its main active constituents have cinnamyl-acid-like common fragments/chemotypes, which are associated with mechanism group M06, particularly for ALR2 target. Library for virtual screening. The compounds selected for the virtual library are based on cinnamyl-acid-like scaffolds/mimics.

Figure 6.4.5 Discovering new anti-T2D agents. The virtual hits are confirmed by chemistry and bioassays.

Figure 6.4.6 Cinnamyl-acid-like compounds as anti-T2D agents (ALR2 inhibitors).

Figure 6.4.7 Epalrestat.

Figure 6.4.8 Anti-T2D agents derived from Dihuang and Huangqi. Cinnamyl-acid-like chemotypes are in bold. Glycosides are colored in light grey.

Chapter 6.6: Pharmacophore Perception and Applications

Figure 6.6.1 p-Aminobenzoic acid (PABA) and p-aminobenzenesulfonamide are isosteres and show similarities regarding interatomic distances that are critical for binding to the dihydrofolate reductase enzyme surface [2]. Binding of the sulfonamide instead of PABA thus inhibits the biosynthesis of tetrahydrofolic acid.

Figure 6.6.2 Analogy between estradiol and trans-diethylstilbestrol [2].

Figure 6.6.3 Interaction capabilities of natural (R)-(−)-adrenaline (a) and its stereoisomer (S)-(+)-adrenaline (b) [2].

Figure 6.6.4 Hydrogen bonding geometry: the involved N, H, and O atoms are nearly linearly aligned. The N–O distance is typically between 2.8 and 3.2 Å. The N–H–O angle is >150° and the CO–H angle between 100° and 180°.

Figure 6.6.5 On the formation of lipophilic contacts (hydrophobic interactions), water molecules covering lipophilic areas of the binding pocket are forced to move to the outside of the ligand–receptor complex. This increases the entropy of the system due to a gain in mobility of the water molecules. The resulting contribution to the binding affinity is typically between −100 and −200 J/mol per Å² of the lipophilic contact surface.

Figure 6.6.6 Steric configurations of π–π and cation–π interactions [25].

Figure 6.6.7 Thermolysin in complex with the hydroxamic acid inhibitor N-[(2S)-2-benzyl-3-(hydroxyamino)-3-oxopropanoyl]-L-alanyl-N-(4-nitrophenyl)glycinamide (BAN, PDB-code: 5TLN). The Zn²⁺ ion is penta-coordinated with the characteristic amino acids Glu166, His142, and His146 of thermolysin, and the hydroxyl- and carbonyl-oxygen of the hydroxamic acid moiety [27].

Figure 6.6.8 Receptor-based pharmacophore generated by LigandScout for the CDK2/inhibitor complex 1KE9. Gray spheres represent exclusion volumes that model the shape of the receptor surface. Yellow spheres represent hydrophobic, green arrows hydrogen bond donor, and red arrows hydrogen bond acceptor features. The blue spherical star represents a positive ionizable group in an ionic interaction.

Figure 6.6.9 Ligand-based pharmacophore modeling workflow starting from a set of known actives.

Figure 6.6.10 (a) Selection of n molecules from a database with N entries. (b) ROC curves for an ideal, an overlapping and a random distribution of actives and decoys.

Figure 6.6.11 (a) 3D pharmacophore derived from the TLR2 binding site. (b) Binding mode of the TLR2 antagonist discovered by pharmacophore-based virtual screening.

Figure 6.6.12 3D pharmacophore and dynophore of kaempferol bound to SULT1E1. (a) Static view of kaempferol bound to SULT1E1 with depicted 3D pharmacophore. (b) Kaempferol is represented with the resulting dynophore shown as spatial point clouds.

Chapter 6.7: Prediction, Analysis, and Comparison of Active Sites

Figure 6.7.1 (a) Protein structure (gray) with unbound potential ligand (blue) in surface representation. (b) Prediction of potential binding sites (yellow, red, blue). (c) Protein–ligand complex structure, where ligand binds to the yellow binding site.

Figure 6.7.2 Exemplary illustration of pocket detection methods vertically grouped into grid-based and grid-free approaches and horizontally separated into geometry- and energy-based methods. Republished with permission of Future Science Group, from Future Med. Chem. (2014) 6(3), 319–31; permission conveyed through Copyright Clearance Center, Inc.

Figure 6.7.3 DoGSiteScorer-based model building and druggability prediction: First, pockets were predicted for all structures of the DD dataset. Descriptors were calculated and discriminative features were selected, based on which a SVM model was trained. Finally, this model can be used for druggability predictions of novel target structures.

Figure 6.7.4 (a) Binding site of monomeric cyclin-dependent kinase 2 (PDB code 1AQ1) [74]. (b) Four identical subunits of HIV-1 protease forming two symmetrical binding sites (PDB code 5KR2) [75].

Figure 6.7.5 Depiction of the structural triangle descriptor used in TrixP. (A) Example of the TrixP descriptor with two hydrogen-bond donors (blue) as well as an apolar point (yellow) as triangle corners. (B) Schematic superposition of two different binding sites based on a matching descriptor with a hydrogen-bond donor (blue), a hydrogen-bond acceptor (red), and an apolar point (yellow) as triangle corners. (a,b) show identical descriptors in two different binding sites and (c) shows the respective superposition of the binding sites based on the matching descriptors.

Chapter 6.8: Structure-Based Virtual Screening

Figure 6.8.1 Interactions in a protein–ligand complex (PDB code: 1SQN). The major energetic contributions result from hydrogen bonds and hydrophobic effect. The protein surface is colored according to hydrophobicity (dark gray, hydrophilic atoms; white, hydrophobic atoms). The dominating interaction in the complex of norethindrone with the progesterone receptor is the hydrophobic effect, which is caused by the burial of the four aliphatic rings of the ligand in a deep hydrophobic pocket of the protein. The two hydrogen bonds (left side) contribute less to the overall binding affinity; they rather assist in orientating the ligand in the active site.

Figure 6.8.2 Best practice SBVS workflow.

Figure 6.8.3 Score histograms (a) allow an intuitive assessment of a docking program's sensitivity and specificity for a defined cutoff. Enrichment plots (b) and ROC curves (c) are used to assess a docking program's quality. The example shows two hypothetical docking runs for a library of 10,000 compounds containing 100 active ligands.

Chapter 6.9: Prediction of ADME Properties

Figure 6.9.1 Performance of classification models on blood–brain barrier permeability

Chapter 6.10: Prediction of Xenobiotic Metabolism

Figure 6.10.1 Generalized scheme of the consequences of metabolic biotransformation.

Figure 6.10.2 (a) Simplified catalytic cycle for monooxygenation effecting the overall conversion: . (b) The CYP iron-heme prosthetic group – the enzyme's catalytic center.

Figure 6.10.3 Indomethacin and a number of its amide derivatives (from Ref. [13]) with half-life values in minutes (t1/2) indicated for rat and human liver microsomes. Preferred sites of metabolism for each analog as estimated by the MetaSite program are indicated by a grey circle. (Adapted from Marchant et al. 2016 [13].)

Figure 6.10.4 The structure of quetiapine (6). Areas of metabolic liability are indicated by grey colored spheres: O-dealkylation, N-dealkylation, sulfur oxidation, and carboaromatic hydroxylation.

Figure 6.10.5 In vivo (pig) and in vitro (human and porcine liver microsomes) metabolism of 25B-NBOMe.

Figure 6.10.6 Top five observed sites of metabolism as predicted by Meteor Nexus. The annotated sites of metabolism (SoMs) are indicated in Table 6.10.4.

Figure 6.10.7 Putative pathways of toxification suggested by a Meteor Nexus analysis of 25B-NBOMe. Potentially adduct-forming intermediates are depicted in light grey (18, 20, 21, 23).

Chapter 6.11: Chemoinformatics at the CADD Group of the National Cancer Institute

Figure 6.11.1 Web form of the Enhanced NCI Database Browser.

Figure 6.11.2 Possible workflows of CIR queries.

Figure 6.11.3 CSLS results page (excerpt) for search with query string “740.”

Figure 6.11.4 Web form of the GIF/PNG Creator web service.

Chapter 6.12: Uncommon Data Sources for QSAR Modeling

Figure 6.12.1 The growth of publications on QSAR modeling correlates with the accumulation of experimental data. The chart is generated by Google Ngram Viewer (http://books.google.com/ngrams); Y-axis: percentage among all books in the Google Ngram.

Figure 6.12.2 Schematic workflow showing the use of multiple data sources for developing, interpreting, and validating QSAR models that classify drugs as SJS-active or inactive. VigiBase provided 364 drugs whose chemical structures were used as variables for QSAR modeling. QSAR models provided structural alerts for interpretation and predicted potential SJS actives and inactives in DrugBank. Finally, the predicted actives and inactives were evaluated for evidence of SJS activity or lack thereof in VigiBase, ChemoText, and Micromedex (see text for additional discussion).

Chapter 6.13: Future Perspectives of Computational Drug Design

Figure 6.13.1 Schematic of an integrated design, synthesis, and screening platform illustrating the fully automated process with a feedback loop for adaptive compound optimization. An adaptive quantitative structure–activity relationship model guides the compound design and selection process. The diagram on the right illustrates an on-chip microreactor platform as a prototype of fully integrated future design–synthesize–test instruments for drug discovery. The example depicts a module for reductive amination.

Figure 6.13.2 Examples of computationally de novo designed and chemically synthesized bioactive compounds, taken from recent publications.

Figure 6.13.3 Ligand design in computed fitness landscapes. The process starts with virtual compound enumeration (upper left) and visualization of the populated chemical space (upper right). Then, computed target activities are highlighted and suitable compounds identified, shown here for ligand selectivity for human sigma-1 and D4 receptors (lower right). In the landscapes, the coloring indicates regions of chemical space with a high (dark gray) and low (light gray) probability of finding ligands of the respective receptor.

Figure 6.13.4 From a known drug (Fasudil 4) to a de novo generated mimetic agent by computational fragment assembly. The cartoon illustrates the complex between the computer-generated ligand 5 and its macromolecular target, death-associated protein kinase 3 (PDB-ID: 5a6n). Essential hydrogen bridges are shown as dashed lines.

Figure 6.13.5 From a complex natural product template to a synthetically easily accessible mimetic by computer-assisted de novo design. Morphing of the anticancer natural product (−)-englerin A into an isofunctional compound with a different scaffold enabled the discovery of a novel class of potent and selective inhibitors of transient receptor potential (TRP) M8 calcium channels.

Chapter 7: Computational Approaches in Agricultural Research

Figure 7.1 Chemical structures and superimposed X-ray coordinates of 1,2-diphenylethane (dark, CSD-code DIBENZ04) and benzyloxybenzene (bright, CSD-code MUYDOZ) indicating the different orientations of one phenyl ring induced by substitution of methylene with an ether function.

Figure 7.2 Superposition of a Protox inhibitor of the pyridinedione type on a calculated protoporphyrinogen-like template (cyan). For clarity, corresponding ring systems are indicated and hydrogen atoms are omitted. Atoms are color coded as follows: carbon gray, nitrogen blue, oxygen red, sulfur yellow, and chlorine green.

Figure 7.3 Common interaction pattern of potent Protox inhibitors of the uracil (left) and pyridine type. Each molecule comprises two ring systems and electron-rich functions on both sides of the linked rings (blue and red colored).

Figure 7.4 Pharmacophore model of 318 Protox inhibitors (color code as indicated at Figure 7.2).

Figure 7.5 Graph indicates the correlation of experimental and predicted IC50 values yielded by a “leave-one-out” cross-validation (q² = 0.95) for the pharmacophore model shown in Figure 7.4.

Figure 7.6 Contour map derived from a 3D-QSAR study. Clouds indicate favorable space to be occupied by potent Protox inhibitors. While the highly active imidazolinone derivative (a) fits almost perfectly, the ethylcarboxylate residue of the weaker ligand protrudes from the preferred region (b).

Figure 7.7 Pseudoreceptor model for insecticidal ryanodine derivatives constructed with the program PrGen [4]. The binding site model is composed of seven amino acid residues and contains the structure of ryanodine [5]. Hydrogen bond interactions are indicated with dashed lines.

Figure 7.8 Protocol of a classical docking and scoring procedure. The binding site cavity is characterized via, for example, hydrophobic (circles), hydrogen bond donor (lines), and hydrogen bond acceptor properties (circle segment). Each compound of a database (or real library) is flexibly docked into the binding site, and the free binding energy [kJ/mol] for each of the derived poses is estimated by a mathematical scoring function.

Figure 7.9 X-ray crystallographically determined binding site of Protox [15] including the co-crystallized inhibitor INH (structural formula see Figure 7.10) and a part of the cofactor FAD. Highlighted is Arg98 at the entrance of the binding site cavity interacting with INH and almost all solutions of the FlexX approach via electrostatic and hydrogen bond interactions. Two docking poses representing a cluster of yielded solutions are indicated: one at the outside and one inside the binding site cavity (orange-colored carbon atoms).

Figure 7.10 Structural formula of INH and comparison of the poses derived from FlexX docking (single colored) and crystallization experiment (thick). Indicated is the crucial Arg98 that stabilizes all poses with the exception of the blue-colored solution, which interacts with the acid group (red-colored oxygen atoms) to the opposite side (i.e., Asn67).

Figure 7.11 Comparison of two docking solutions for BASF's uracil derivative UBTZ with the bound INH. UBTZ interacts with Arg98 over the carbonyl oxygen of uracil and a fluorine of the benzothiazole ring (a) or the nitrogen atom of the benzothiazole ring (b).

Figure 7.12 Docking solution for protoporphyrin IX in the Protox binding site. One propionic acid is close to Arg98, but does not form an explicit hydrogen bond. Asterisks indicate the proposed reaction centers C20 of protoporphyrin IX and N5 of FAD (see text for details).

Chapter 8: Chemoinformatics in Modern Regulatory Science

Figure 8.1 Parallel advances in science and computational technology over time.

Figure 8.2 The decision framework for the Threshold of Toxicological Concern.

Figure 8.3 Read-across tree, hierarchy of logical relationships.

Figure 8.4 Chemistry-aware 3-tiered architecture. (a) Typical chemistry-aware RDBS, (b) new technology.

Figure 8.5 Top-level diagram of the data model for the chemistry-centered toxicity database.

Figure 8.6 Typical coverage plot of ToxPrints against datasets. The solid and open circles represent the PAFA and Tox21 datasets respectively.

Figure 8.7 Histogram of structural hits matching with the chemotypes.

Figure 8.8 Hammett constant and the substitution effect on pKa values.

Figure 8.9 Effect of substituents on the charges at the ring carbon atoms (ipso position) and on the pKa values of substituted benzoic acids.

Figure 8.10 Histogram of reaction rules for different chemical inventories.

Figure 8.11 Linear fragments using graph theory and a depth-first search algorithm.

Figure 8.12 Effect of path length and annotation scheme on the number of unique linear paths generated from a set of 4400 compounds from the PAFA database. Annotation options are atom identity (AI), number of heavy-atom connections (nC), number of connected hydrogen atoms (nH), and atom partial charge (PC).

Figure 8.13 Comparison of the number of unique linear paths generated from 4400 compounds from two different datasets: PAFA and Tox21. Annotation scheme used was (AI, nC, nH, PC).

Figure 8.14 Chemical space comparison of four inventories by principal component analysis using ToxPrint chemotypes (a) and physicochemical properties (b).

Chapter 9: Chemometrics in Analytical Chemistry

Figure 9.1 Comparison of raw data with scaled data (a) and the effect of scaling on the calibration line (b).

Figure 9.2 Combination of SRD with ANOVA decomposes the effects of factors in an easily perceivable way. Data preprocessing methods: scl, range scaling; nor, normalization to unit length; rnk, rank transformation; std, standardization (autoscaling); type of tissue: digestive gland, circles; gills, boxes; haemolymph, rhombuses; comet assay evaluation methods: tail intensity (a), tail length (b), olive tail moment (c).

Figure 9.3 Matrix decomposition in principal component analysis. The loading matrix P′ contains the coefficients of the original variables in the principal components, while the score matrix T contains the principal component scores (values) of the samples. E is an error matrix in cases of a < m. (If a = m, the E matrix is empty, i.e., it contains zeros only.)

Figure 9.4 Thirty chromatographic columns are grouped according to their various polarity metrics with hierarchical cluster analysis using Ward's method (as the linkage rule) and the Euclidean distance. Arbitrary horizontal lines at about 20 or 10 distance units define two or three clusters, respectively.

Figure 9.5 Self-organizing map of a dataset of Italian olive oils and its comparison with the map of Italy and the regions of origin of the olive oil samples. (Copyright 1994, with permission from Elsevier.)

Figure 9.6 LDA plots with confidence ellipsoids and separating lines. Note that the line separating two groups goes, by definition, through the intersections of the two ellipsoids.

Figure 9.7 Transition between the three periods as defined by the coins' metal content.

Figure 9.8 A simple example of a classification tree. Each junction corresponds to a binary decision based on the value of a variable, while each leaf is a possible outcome of the classification. (Note that more than one leaf may classify a sample into a given group.)

Figure 9.9 Regularization parameter space of a support vector machine classification model. Color coding corresponds to the classification performance (the higher the better). Many combinations may produce the same model goodness.

Figure 9.10 An example of n-class ROC curves. Grey curves correspond to individual classes (1–3), while black is the average curve calculated with the Hanley formula. Dashed lines indicate ±1 standard deviation from the average.

Figure 9.11 Schematic representation of PLS regression.

Figure 9.12 Canonical correlation analysis distinguishes five classes (five plant variants b) from real NIR spectra (a). However, a randomization test clearly shows that five arbitrary classes (d) can be found also for random vectors (c). In the present case, canonical modeling could not pass the randomization test, and a serious decrease in the number of included variables (wavelengths) is necessary.

Figure 9.13 The scheme of repeated double cross-validation.

Chapter 10: Chemoinformatics in Food Science

Figure 10.1 Common structure representations of chemical structures to analyze chemical space. The representation of lipoic acid is used as an example.

Figure 10.2 Generative topographic mapping (GTM) visualization of the chemical space of 1477 generally recognized as safe (GRAS) compounds (blue), 2133 Everything Added to Food in the United States (EAFUS) compounds (green), 1798 approved drugs from DrugBank (red), and 549 compounds tested as DNMT1 inhibitors (black). Molecules are represented using MACCS keys fingerprints (166 bits). The figure was generated using compound databases prepared by Mariana González-Medina.

Figure 10.3 Schematic fingerprint-based representation of flavor descriptors. The figure shows five representative flavor descriptors. The descriptors that are commonly obtained from sensory analysis can be encoded in a binary fingerprint. Molecules contained in foods, such as humulene and lactose found in beer and milk, respectively, could be related to the flavor descriptors.

Figure 10.4 Example of a typical flavor cliff: pair of compounds with high structure similarity but very different flavor.

Figure 10.5 Chemical structures of food-related chemicals identified from computational-driven approaches mentioned in Section 10.6.

Figure 10.6 Chemical structures of compounds associated with inhibition of histone deacetylases and discussed in Section 10.6.2.1.

Figure 10.7 Box plots of six physicochemical properties of food-related chemical databases (GRAS and EAFUS), approved drugs (DrugBank), and inhibitors of DNA methyltransferases (DNMTs).

Figure 10.8 Top five most frequent molecular scaffolds identified in 1477 generally recognized as safe (GRAS) compounds and 2133 Everything Added to Food in the United States (EAFUS) chemicals. For reference, the most populated scaffolds in 1798 approved drugs (from DrugBank), and 549 compounds tested as inhibitors of DNA methyltransferase 1 are shown. For each scaffold, the frequency (and percentage) is indicated above the structure diagram.

Chapter 11: Computational Approaches to Cosmetics Products Discovery

Figure 11.1 Molecular structure of ectoine with the two mesomeric forms (a) and its hydrophilic surface colored according to the corresponding atomic partial charges (b).

Figure 11.2 Stick representation of atoms in the ectoine–water sphere. The gray-colored area of the ectoine–water cluster is presented at a higher resolution to illustrate the molecular composition of the cluster. The small picture corresponds to Figure 11.3, B1.

Figure 11.3 Molecular dynamics simulation of different models containing (A) water, (B) water and ectoine, and (C) water and glycerol. The pictures are taken at the beginning of the simulation (t = 0, A1, B1, C1) and after 200 (A2), 1000 (B2) and 500 ps (C2) at a constant temperature of 370 K. Water clusters around ectoine molecules remain stable for a long period of time, whereas the cluster of water and glycerol breaks down and water molecules diffuse out of the spheres. The pictures represent the number of water molecules counted during the dynamic simulation as shown in Table 11.1. The solutes are colored in green.

Figure 11.4 Cyclopeptide-5 consists of five amino acids. RGD is the one-letter code for the sequence arginine (R), glycine (G), and aspartate (D). d-Phe represents d-phenylalanine, and ACHA is the short name for aminocyclohexane carboxylic acid.

Figure 11.5 Cartoon of the crystal structure of the extracellular part (ectodomain) of αvβ3 integrin in complex with (a) fibronectin, with the RGD loop in the cyan circle, and (b) cyclopeptide-5 as a mimic of the RGD loop in fibronectin.

Figure 11.6 The canonical map of cell adhesion and ECM remodeling. Effects of cyclopeptide-5 treatment (0.5 μM) on gene expression in skin cells are visualized on the map as thermometer-like figures. Upward thermometers with red color indicate up-regulation and downward (blue) ones indicate down-regulation.

Chapter 12: Applications in Materials Science

Figure 12.1 Structural diversity of the nanoworld: zero-dimensional (point), one-dimensional (linear), fractal, two-dimensional, and three-dimensional nanoparticle fragments.

Figure 12.2 Measured and predicted HeLa cell uptake (×10⁻¹¹ g/cell) of nanoparticles with dual ligands on their surfaces. (a) The linear model and (b) the nonlinear model. The training set is denoted by circles, and the test set by triangles.

Figure 12.3 Plots of measured versus predicted Tg values for a dataset consisting of 275 random copolymers in experiment 1, where the training error tolerance (TET) is set at 60 K, and experiment 2, where TET is 30 K.

Figure 12.4 Predicted turnover numbers for 60,000 virtual cross-coupling reactions are plotted versus the first two PCs calculated for all the reaction descriptors. The first PC is correlated mainly with the Pd loading and the electronic descriptors of the organic residue on the alkene, while the second represents the ligand's electronic descriptors.

Figure 12.5 Predicted (QSPR) versus actual (derived from Monte Carlo simulations) methane storage capacities during cross-validation of the 10,000 MOFs in the training set at 35 bar (a) and 100 bar (b).

Figure 12.6 Predicted (QSPR) versus actual (derived from Monte Carlo simulations) methane storage capacities for the 127,953 MOFs in the test set using the SVM models at 35 bar (a) and 100 bar (b).

Figure 12.7 Spotting of mixed comonomers onto a slide. After polymerization by UV light, the tiny polymer spots are sterilized and exposed to cells or bacteria in culture, and the degree of attachment and growth is recorded after specific time intervals.

Figure 12.8 The predicted and measured adhesion of embryoid bodies to polymers in a library of acrylates. The performance of the model in predicting (a) the training set and (b) the test set.

Figure 12.9 Attachment of pathogenic P. aeruginosa to the polyacrylate library.

Figure 12.10 The performance of the three QSPR models for P. aeruginosa (a), S. aureus (b), and UPEC (c). Predictions of attachment to polymers in the training set are shown as black circles, while test set predictions are shown as gray triangles. The attachments are on a log scale.

Figure 12.11 Performance of one of the cell division symmetry markers H2AFZ found by sparse chemoinformatics feature selection. Panels show cell nuclei labeled with DAPI (4′,6-diamidino-2-phenylindole, a fluorescent stain that binds strongly to the cell nucleus) in cells dividing symmetrically (a) and asymmetrically (b). Panels (c) and (d) show the same cells labeled with antibody to H2AFZ expression. In the asymmetric cell division case, only one cell is visible (the stem cell).

Chapter 13: Process Control and Soft Sensors

Figure 13.1 Basic concept of a soft sensor.

Figure 13.2 Detection of abnormal values of an analyzer using a soft sensor.

Figure 13.3 Flow of soft sensor analysis and problems involved at each stage.

Figure 13.4 Basic concepts of the degradation of a linear soft sensor model [10].

Figure 13.5 Case study for the EOSVR method.

Figure 13.6 Comparison of OSVR and EOSVR: time plots of the denitration outlet NH3.

Figure 13.7 Case study for database management: a distillation column.

Figure 13.8 Comparison of a process without database management (a) and with database management (b): time plots of measured and predicted y (MW).

Figure 13.9 Basic concepts of an inverse analysis of a soft sensor model.

List of Tables

Chapter 2: QSAR/QSPR

Table 2.1 Classification of molecular descriptors

Table 2.2 Commonly used data analysis methods in cheminformatics

Table 2.3 Cross-validation types depending on the value of k

Table 2.4 Critical values for the Z-test

Chapter 3: Prediction of Physicochemical Properties of Compounds

Table 3.1 Experimental mean molecular polarizabilities and values calculated by Eq. (3.13)

Table 3.2 Group contributions to Cp°, S°, and ΔHf° for ideal gases at 25 °C, 1 atm, for alkanes

Table 3.3 Allen's scheme: Substructures, notations, and contributions to heats of formation and heats of atomization (values in kJ/mol)

Chapter 4.3: Explorations into Biochemical Pathways

Table 4.1 The first 30 instances (from 65) of molecule names that contain the substring “glutamate”.

Table 4.2 Number of occurrences in the BioPath.Database

Table 4.3 Metabolites most frequently found as substrates or products in the database

Chapter 6.1: Drug Discovery: An Overview

Table 6.1.1 Chemoinformatics and bioinformatics methods used at various stages of the drug discovery process

Table 6.1.2 Drug targets and mechanisms of drug action

Chapter 6.2: Bridging Information on Drugs, Targets, and Diseases

Table 6.2.1 Selected key data resources focusing on the three pharma R&D relevant entities: targets, compounds, and diseases

Chapter 6.3: Chemoinformatics in Natural Product Research

Table 6.3.1 Selected chemoinformatics tools.

Table 6.3.2 Natural products databases.

Chapter 6.6: Pharmacophore Perception and Applications

Table 6.6.1 Classification of the abstraction levels of chemical features

Chapter 6.7: Prediction, Analysis, and Comparison of Active Sites

Table 6.7.1 Nine pocket detection methods together with the category they belong to, the datasets they have been evaluated on, and the respective prediction accuracies. Besides the general ability to find a location that somehow overlaps with the co-crystallized ligand, accuracies with respect to more precise correctness criteria are added, if available in the respective publications

Table 6.7.2 Pocket prediction success rates for 48 bound and 48 unbound proteins, sorted by publication year. Success rates present the percentage of cases in which the correct active site was calculated by the respective algorithms

Table 6.7.3 Exemplary (non-exhaustive) summary of different evaluation studies of methods for binding site comparison including information about the method category they belong to, the number of structures used for evaluation, and the results as stated in the respective publication

Chapter 6.8: Structure-Based Virtual Screening

Table 6.8.1 Successes in structure-based virtual screening

Chapter 6.9: Prediction of ADME Properties

Table 6.9.1 The prediction performance of solubility models based on calculated descriptors.

Table 6.9.2 Performance of classification models on blood–brain barrier permeability

Table 6.9.3 Performance of QSPR models on log BB

Table 6.9.4 Performance of SPR and QSPR models on human intestinal absorption (HIA)

Table 6.9.5 Performance of SPR and QSPR models for the prediction of various ADME properties

Chapter 6.10: Prediction of Xenobiotic Metabolism

Table 6.10.1 Some common phase I biotransformation types

Table 6.10.2 Some of the important xenobiotic-metabolizing CYPs in humans

Table 6.10.3 Some common phase II biotransformation types

Table 6.10.4 First-generation biotransformation predictions from a Meteor Nexus analysis of 25B-NBOMe

Table 6.10.5 Selected metabolites from a Meteor Nexus analysis of 25B-NBOMe and their corresponding Derek Nexus hepatotoxicity alerts. Alerting substructures are shown in light grey

Chapter 7: Computational Approaches in Agricultural Research

Table 7.1 Comparison of all classifiers for mutagenicity applied from Hansen et al. [28]

Table 7.2 Statistical results for selected QSAR models predicting aquatic toxicity [30]

Table 7.3 Statistical performance of both carcinogenicity models classified by ANN [33]

Chapter 8: Chemoinformatics in Modern Regulatory Science

Table 8.1 Comparison of QSAR and structural rules

Table 8.2 log NOEL distributions and Cramer classes

Table 8.3 Central data elements in a chemistry-centered toxicity database

Table 8.4 Regulatory science-related chemical inventories

Table 8.5 pKa values of benzoic acids

Table 8.6 Metabolic transformation reactions in human liver

Chapter 9: Chemometrics in Analytical Chemistry

Table 9.1 A selection of applications of chemometric methods in contemporary analytical chemistry literature

Chapter 10: Chemoinformatics in Food Science

Table 10.1 Examples of compounds and related databases of interest in food science.

Table 10.2 Examples of virtual screening of food chemical databases

Chapter 11: Computational Approaches to Cosmetics Products Discovery

Table 11.1 Results of the molecular dynamics simulation, with the number of water molecules at 370 K

Table 11.2 Subset of up- and down-regulated genes potentially connected to the effects of cyclopeptide-5 in cosmetic applications. These are presented as log 2 (ratio), where a value >0 represents an up-regulation and <0 a down-regulation, respectively

Chapter 12: Applications in Materials Science

Table 12.1 Common types of materials chemoinformatics studies [1]

Table 12.2 Statistical results of the best multiple linear regression with expectation maximization and Bayesian-regularized artificial neural networks with Gaussian prior models for the cervical cancer cellular (HeLa) uptake of dual-ligand nanoparticles (Neff is the number of effective weights (adjustable parameters) in the model)

Chapter 13: Process Control and Soft Sensors

Table 13.1 List of applications of soft sensors

Table 13.2 Characteristics of TD, MW, and JIT models [10]

Applied Chemoinformatics

Achievements and Future Opportunities

 

Edited by Thomas Engel and Johann Gasteiger

Editors

Dr. Thomas Engel

LMU Munich

Department of Chemistry

Butenandtstraße 5-13

81377 München

Germany

Prof. Dr. Johann Gasteiger

University of Erlangen-Nürnberg

Computer-Chemie-Centrum

Nägelsbachstr. 25

91052 Erlangen

Germany

Cover Design

Dr. Christian R. Wick

University of Erlangen-Nürnberg

Institute for Theoretical Physics I

Nägelsbachstr. 49b (EAM)

91052 Erlangen

Germany

 

All books published by Wiley-VCH are carefully produced. Nevertheless, authors, editors, and publisher do not warrant the information contained in these books, including this book, to be free of errors. Readers are advised to keep in mind that statements, data, illustrations, procedural details or other items may inadvertently be inaccurate.

Library of Congress Card No.: applied for

British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library.

Bibliographic information published by the Deutsche Nationalbibliothek

The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.d-nb.de.

© 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Boschstr. 12, 69469 Weinheim, Germany

All rights reserved (including those of translation into other languages). No part of this book may be reproduced in any form – by photoprinting, microfilm, or any other means – nor transmitted or translated into a machine language without written permission from the publishers. Registered names, trademarks, etc. used in this book, even when not specifically marked as such, are not to be considered unprotected by law.

 

Print ISBN: 978-3-527-34201-3

ePDF ISBN: 978-3-527-80652-2

ePub ISBN: 978-3-527-80654-6

Mobi ISBN: 978-3-527-80655-3

oBook ISBN: 978-3-527-80653-9

Cover Design Grafik-Design Schulz, Fußgönheim, Germany

Thomas Engel

To my family, especially Benedikt.

Johann Gasteiger

To all the friends, colleagues and coworkers that ventured with me into the exciting field of chemoinformatics.

And to my wife Uli for never complaining about my long working hours.

If you want to build a ship, don't drum up people to collect wood and don't assign them tasks and work, but rather teach them to long for the endless immensity of the sea.

Antoine de Saint-Exupéry

Foreword

Chemistry began with magic. Who but a wizard could, with a puff of smoke, turn one thing into another? The alchemists believed that the ability to transform materials was a valuable skill, so valuable in fact that they devised complex descriptions and alchemical symbols, known only to them, to represent their secret methods. Information was encoded and hidden, suffused with allegorical and religious symbolism, slowing progress. Medicinal chemists today may be particularly interested in a legendary stone called a Bezoar, found in the bodies of animals (if you knew which animal to dissect), that had universal curative properties. I'm still looking. However, to plagiarise a recent Nobel prize-winner for literature, the times they were a changin'. Departing from the secretive “alchemist” approach, Berzelius (1779–1848) suggested compounds should be named from the elements which made them up and Archibald Scott-Couper (1831–1892) devised the “connections” between “atoms,” which gave rise to structural diagrams (1858). In 1787, the symbols created by Jean Henri Hassenfratz and Pierre Auguste Adet to complement the Méthode de Nomenclature Chimique were a revolutionary approach to chemical information. A jumbled, confused and incorrect nomenclature was replaced by our modern day designations such as oxygen, hydrogen and sodium chloride. The new chemistry of Lavoisier was becoming systematised. The “Age of Enlightenment” created a new philosophy of science where information, validated by experiment, could be tested by an expanding community of “scientists” (a term coined by William Whewell in 1833), placing data at the core of chemistry.