Frontiers in Computational Chemistry (Volume 7) offers a comprehensive overview of the latest advances in molecular modeling techniques for drug discovery and development. This book focuses on key computational approaches such as rational drug design, adsorption studies, quantum mechanical calculations, and molecular interactions in drug development. It provides insights into lead generation, optimization, and the creation of novel chemical entities targeting various biological mechanisms, including inflammation.
The chapters explore modern computational tools and their applications, particularly in low- and middle-income countries (LMICs). The book is essential for researchers, academics, and professionals in computational chemistry, molecular modeling, and pharmaceutical sciences.
Readership: Students and researchers.
Page count: 464
Year of publication: 2024
This is an agreement between you and Bentham Science Publishers Ltd. Please read this License Agreement carefully before using the book/echapter/ejournal (“Work”). Your use of the Work constitutes your agreement to the terms and conditions set forth in this License Agreement. If you do not agree to these terms and conditions then you should not use the Work.
Bentham Science Publishers agrees to grant you a non-exclusive, non-transferable limited license to use the Work subject to and in accordance with the following terms and conditions. This License Agreement is for non-library, personal use only. For a library / institutional / multi user license in respect of the Work, please contact: [email protected].
Bentham Science Publishers does not guarantee that the information in the Work is error-free, or warrant that it will meet your requirements or that access to the Work will be uninterrupted or error-free. The Work is provided "as is" without warranty of any kind, either express or implied or statutory, including, without limitation, implied warranties of merchantability and fitness for a particular purpose. The entire risk as to the results and performance of the Work is assumed by you. No responsibility is assumed by Bentham Science Publishers, its staff, editors and/or authors for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products instruction, advertisements or ideas contained in the Work.
In no event will Bentham Science Publishers, its staff, editors and/or authors, be liable for any damages, including, without limitation, special, incidental and/or consequential damages and/or damages for lost data and/or profits arising out of (whether directly or indirectly) the use or inability to use the Work. The entire liability of Bentham Science Publishers shall be limited to the amount actually paid by you for the Work.
Bentham Science Publishers Pte. Ltd. 80 Robinson Road #02-00 Singapore 068898 Singapore Email: [email protected]
Computational Chemistry has evolved into a multifaceted discipline, encompassing a wide range of applications from understanding protein-ligand interactions to the development of large nano-carriers for drugs. Frontiers in Computational Chemistry aims to present comprehensive material on the application of computational techniques in biological and chemical processes. This includes computer-aided molecular design, drug discovery and delivery, lead generation and optimization, quantum and molecular mechanics, computer and molecular graphics, as well as the creation of new computational methods and efficient algorithms for simulating a wide range of biophysical and biochemical phenomena, particularly in analyzing biological or chemical activity.
In this volume, we explore five distinct perspectives on the application of simulation methods in drug design and discovery, biosensing, and the elucidation of cellular molecular interactions:
Chapter 1, "In Silico Tools to Leverage Rational Drug Design and Development in LMICs," underscores the significant impact of computational tools on drug discovery and development, especially in low and middle-income countries. This chapter highlights various strategies for drug target selection, optimization of novel drug candidates, and cost-effective drug repurposing.
Chapter 2, "Computational Chemistry in Adsorption Studies: The Cases of Drug Carriers and Biosensors," explores the role of computational methods in designing nanomaterials for drug carriers and biosensors. It provides an overview of adsorption processes, with examples of adsorbent materials (e.g., activated carbon) and the main interactions in adsorbate-adsorbent complex formation, supported by density functional theory.
Chapter 3, "Perspective on the Role of Quantum Mechanical Calculations on Cellular Molecular Interactions," examines how quantum mechanical calculations enhance our understanding of cellular interactions, including metal interactions and hydrogen bonding. The chapter emphasizes the importance of these calculations in studying the Arg-Gly-Asp (RGD) sequence, crucial for cellular binding to the extracellular matrix (ECM). Since cell adhesion to the ECM occurs via integrin-RGD binding, these calculations significantly impact our understanding of cellular adhesion and movement along the ECM.
Chapter 4, "Computational Approaches in Evaluating the 5-HT Subtype Receptor Mechanism of Action for Developing Novel Chemical Entities," focuses on molecular modeling techniques for studying G-protein coupled receptors (GPCRs) and 5-HT receptors related to neurological disorders.
Chapter 5, "Current Trends in Molecular Modeling to Discover New Anti-inflammatory Drugs Targeting mPGES1," highlights the latest advances in computational methods for designing anti-inflammatory drugs targeting mPGES1. Both chapters cover the application of various computational methods, including homology modeling, docking, dynamics, and quantum mechanical/molecular mechanical (QM/MM) approaches for their respective targets.
We hope this volume provides valuable insights and shares advancements in the field of computational chemistry, demonstrating its essential role in the ongoing quest for innovative solutions in drug design and development.
Drug discovery and development is a time-consuming, complex, and expensive process. Even in the best scenario, it usually takes about 15 years, and drug candidates have a high attrition rate. As a result, drug development projects rarely take place in low- and middle-income countries (LMICs). Traditionally, this process consists of four sequential stages: (1) target identification and early drug discovery, (2) preclinical studies, (3) clinical development, and (4) review, approval and monitoring by regulatory agencies.
During the last decades, computational tools have offered interesting opportunities for Research and Development (R&D) in LMICs, since these techniques are affordable, reduce wet lab experiments in the first steps of the drug discovery process, reduce animal testing by aiding experiment design, and also provide key knowledge involving clinical data management as well as statistical analysis.
This book chapter aims to highlight different computational tools to enable early drug discovery and preclinical studies in LMICs for different pathologies, including cancer. Several strategies for drug target selection are discussed: identification, prioritization and validation of therapeutic targets, particularly focusing on high-throughput analysis of different “omics” approaches using publicly available data sets. Next, strategies to identify and optimize novel drug candidates as well as computational tools for cost-effective drug repurposing are presented. In this stage, chemoinformatics is a key emerging technology. It is important to note that additional computational methods can be used to predict possible uses of identified human-aimed drugs for veterinary purposes.
Application of computational tools is also possible for predicting pharmacokinetics and pharmacodynamics as well as drug-drug interactions. Drug safety is a key issue and it has a profound impact on drug discovery success.
Finally, artificial intelligence (AI) has also emerged as a powerful tool for drug design and discovery and is expected to revolutionize drug development for several diseases.
It is important to note that the development of drug discovery projects is feasible in LMICs and in silico tools are expected to potentiate novel therapeutic strategies in different diseases.
A typical drug discovery process is long, expensive and complex. It traditionally consists of four sequential stages: (1) target identification and early drug discovery, (2) preclinical studies, (3) clinical development and (4) review, approval and monitoring by regulatory agencies (Fig. 1).
Fig. (1) The drug discovery process encompasses target identification and validation, early drug discovery, preclinical studies, clinical development, and review, approval, and monitoring by regulatory agencies.

The high costs and lengthy timelines associated with drug development have been well documented in the literature. One of the latest reports estimates a mean value of $1.3 billion in R&D investment required to bring a new therapeutic agent to market, with significant variation by therapeutic area. For example, the cost per drug for nervous system agents is about $765 million, while anticancer and immunomodulating agents can cost about $2.7 billion per drug [1]. Prior research has estimated that preclinical costs account for, on average, 42.9% of total capitalized drug development costs [2]. Additionally, the development time of a typical innovative drug is around 12 to 15 years, with 7 to 9 years typically spent in the early drug discovery and preclinical phases [3].
The complexity of the drug development process has also been well-established. This process requires expertise from various scientific fields, including chemistry, pharmacy, physics, biochemistry, and medicine [4]. Despite these challenges, there remains a compelling necessity to search for new therapeutic agents to address a plethora of unmet medical needs worldwide, including certain types of cancer, rare diseases, neglected tropical diseases, antibiotic resistance, immunological disorders, and neurodegenerative diseases.
Importantly, research has highlighted the imbalance between disease burden and global health research attention, with diseases more prevalent in high-income countries receiving significantly more research focus. This imbalance contributes to widening healthcare access inequalities globally [5, 6]. In line with this, it is important for low- and middle-income countries (LMICs) to pursue research projects addressing their own unmet medical needs that otherwise would not be tackled.
Overall, modern drug discovery has shifted towards more rational, knowledge-driven approaches that leverage computational tools, structural biology, and a deeper understanding of disease mechanisms. These strategies aim to improve the efficiency and success rate of identifying promising therapeutic candidates. These include computational and in silico techniques like virtual screening and molecular modeling, which allow researchers to rapidly evaluate large chemical libraries and identify promising compounds even before physical screening. Fragment-based drug discovery, which starts with smaller molecular fragments, and target-based approaches that design compounds to modulate specific disease-implicated proteins, have also become more prevalent. Additionally, drug repurposing strategies that leverage existing approved drugs have proven to be a rational and efficient path to new indications.
This book chapter aims to highlight different valuable computational tools that can be used to accelerate research and reduce drug development costs, particularly focusing on early drug development stages. It provides an overview of rational drug discovery strategies, including computational techniques, fragment-based design, target-based screening, and drug repurposing. This chapter seeks to provide a guide for researchers, particularly those without extensive computational expertise, to explore the various rational tools and techniques that can be applied to advance their drug discovery projects, with a focus on anticancer drug development.
As mentioned before, the drug discovery process is long and complicated. Before starting a drug discovery project, there is a pre-discovery phase in which a disease or condition with an unmet medical need is identified. This phase entails the study of the underlying mechanisms of the disease. Usually, this part of the process includes a review of the state of the art and gathering detailed information about the molecular basis of the pathology, if available. This can lead to the hypothesis that inhibition or activation of one or more components of this characterized mechanism will change cell behavior and affect disease progression in a particular patient population. The component or components involved become putative molecular targets [7].
As a first step of the drug discovery process, it is important to identify the most suitable molecular target. For a particular disease, there may be several putative targets, but many may be undruggable, while others may be involved in a disease-associated mechanism whose modulation has no direct effect on disease progression. Selecting the right target is extremely challenging. In fact, it has been shown that one of the main reasons for the lack of clinical efficacy of novel drugs is poor target validation and selection [8]. Some key recommendations are defined by the Guidelines on Target Assessment for Innovative Therapeutics (GOT-IT) working group, in a valuable document that gives a robust framework for the process of target selection and prioritization, especially for academic research [9].
Upon target selection, the next step is to identify novel molecules able to modulate its activity in order to interfere with or prevent the progression of the disease. In this early drug discovery phase, several strategies can be used and combined, such as high-throughput screening (HTS) and computer-aided drug design (CADD), among others. After a candidate, known as a hit, is identified, the molecule undergoes systematic structural modifications to enhance potency, improve its physicochemical characteristics, and reduce the likelihood of unwanted effects. After rounds of testing, the candidate drug is identified and the stage of preclinical testing starts. This promising candidate is evaluated in different in vitro and in vivo models to determine preclinical efficacy, safety, tolerability, pharmacodynamics (PD) and pharmacokinetics (PK), as well as possible biomarkers to be used in a putative clinical setting. If the candidate drug is effective and safe, it is eligible to be taken forward for clinical trials. In the clinical development stage, safety and tolerability are tested in Phase I, while in Phase II trials efficacy and dosing are determined. Finally, in Phase III, the efficacy of the drug candidate is evaluated in a larger patient population. Drug candidates that show therapeutic effectiveness, safety, and adequate pharmaceutical quality are reviewed by regulatory agencies and may be approved. Finally, these approved new drugs are subjected to follow-up studies (Phase IV), which can change the labeling of the novel drug or include other observations such as drug-drug interactions [10].
All the aforementioned rational approaches serve as a valuable roadmap for modern drug development projects. However, it is important to note that the discovery of many drugs known and used today was not always based on a pre-existing idea of specific targets and pharmacological mechanisms for a particular disease. Rather, a significant portion of drug development has historically followed a more empirical, serendipitous paradigm.
The rational approaches to drug discovery that have emerged in recent years offer several key advantages but also come with some potential drawbacks that must be carefully considered.
On the positive side, these rational methods have demonstrated improved efficiency and success rates in identifying viable lead compounds. By leveraging computational screening, fragment-based design, and a deeper understanding of disease biology and molecular targets, researchers can narrow down the pool of candidates before starting wet lab testing. This can accelerate the identification of mechanism-based therapeutics that are more likely to have favorable pharmacological profiles and reduced off-target effects. The detailed target information generated through rational approaches is also valuable for regulatory approval processes.
However, some notable disadvantages exist as well. Many of the computational models and algorithms underpinning rational drug design are highly complex, requiring specialized expertise and robust technological infrastructure that may not be readily available, especially in resource-constrained settings. There is also the potential for inherent biases or limitations in the data and assumptions used in these in silico approaches. Additionally, there is a risk of overlooking serendipitous discoveries that could arise from more exploratory, phenotypic screening methods.
These key advantages and disadvantages highlight the importance of considering a balanced, multi-pronged strategy for drug discovery. While the rational approaches mentioned in this chapter can have a tremendously positive impact, they may need to be complemented by other empirical techniques such as manual or visual selection of drug candidates to maximize the chances of successful drug development.
The mining of available biomedical data and information has greatly boosted target discovery in the large-scale “omics” era. Therefore, computational tools in LMICs are affordable strategies that play a vital role in advancing biomedical research and addressing public health priorities in regions with high poverty, inequality, and disease burden indicators.
The process of finding novel drugs and biomarkers for human diseases starts with target discovery. In the biomedical field, the term “target” refers to a wide variety of biological events, including molecular functions, pathways, and phenotypes, that may involve different molecular entities such as genes, proteins, and RNAs. In this regard, data mining refers to a bioinformatics methodology that blends biological principles with computational or statistical techniques, mainly employed for target discovery, selection, and prioritization. In most drug discovery projects nowadays, physical high-throughput screening (HTS) and computer-aided drug design (CADD) are frequently combined to enhance the success of the drug development process [11].
The process of target identification involves studying the mechanism and points of intervention in the disease or condition of interest and verifying that a potential component is important in initiating and/or maintaining the disease. Strategies and approaches encompass a wide range of technologies available to study diseases, such as molecular biology, functional assays, image analysis, and in vivo studies related to functional assessment, among others. Some techniques used in this phase include data mining [12], phenotype screening [13] and epigenetic, genomic, transcriptomic and proteomic methods [13, 14].
After identifying putative targets, prioritization approaches [14] serve as key tools to rank these targets based on their likelihood of being suitable in the context of a specific disease. Several computational approaches are available, including network-based approaches, phenome-wide association studies involving genetic evidence, and machine learning methods, among others.
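As a minimal illustration of the network-based idea, the following Python sketch (assuming the networkx package and a purely hypothetical set of protein-protein interaction edges) ranks candidate genes by degree centrality within a disease-associated interaction network; real prioritization platforms integrate many additional evidence types.

import networkx as nx

# Hypothetical protein-protein interaction edges among candidate disease genes
edges = [("TP53", "MDM2"), ("TP53", "EP300"), ("MDM2", "CDKN2A"),
         ("EGFR", "GRB2"), ("EGFR", "TP53"), ("GRB2", "SOS1")]

g = nx.Graph()
g.add_edges_from(edges)

# Rank candidates by degree centrality as a crude prioritization proxy
centrality = nx.degree_centrality(g)
for gene, score in sorted(centrality.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{gene}\t{score:.2f}")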
Finally, potential targets must then be validated to determine whether they limit disease progression or induction. Establishing a strong link between target and disease increases confidence in the scientific hypothesis and thus in the success and efficacy of later phases of the drug discovery process [15]. The validation of a particular target involves the technical assessment of whether it plays a key role in a disease process and whether its pharmacological modulation could be effective in a defined patient population. Commonly used techniques include expression and enzymatic assays, as well as the development of knockout/knock-in animal models using antisense/siRNA and genomic strategies. Translation between humans and animals can also be an important feature to build confidence in the development of screening assays for lead molecule identification [16].
Regarding target identification, prioritization and validation, several computational methods have been shown to be useful, faster, less biased and more informative, since they are able to integrate and analyze available data systematically. However, it is important to take into consideration that estimating the predictive power of a method is necessary to validate the performance of the model. In fact, target prediction methods can have a profound impact on the success of a drug discovery process [17]. Next, we describe some of these valuable tools for target selection.
Text mining has been widely applied to identify disease-associated entities (genes/proteins) and understand their roles in disease. In fact, one important strategy is to build a network of specific gene interactions that have shown to have a key role in a specific disease. The development of such a network commonly starts with a literature search of text articles stored in PubMed Central (PMC), based on dependency parsing and the support vector machine (SVM) method.
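As a toy illustration of the SVM step only, the sketch below (Python with scikit-learn, using invented example sentences) classifies sentences as asserting or not asserting a gene-disease association from TF-IDF features; a real pipeline would add named-entity recognition and dependency parsing, which are omitted here.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented training sentences, labelled 1 if they assert a gene-disease association
sentences = [
    "BRCA1 mutations increase the risk of breast cancer",
    "TP53 loss drives tumor progression in many carcinomas",
    "The samples were stored at -80 degrees before analysis",
    "Patients were recruited from three clinical centers",
]
labels = [1, 1, 0, 0]

# Bag-of-words/bigram TF-IDF features fed into a linear SVM
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(sentences, labels)

# Predicted label for an unseen sentence (no accuracy claim with so little data)
print(model.predict(["EGFR amplification increases cancer risk"]))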
Databases with permissive licenses include manually curated associations from Genetics Home Reference (GHR) [18] and the UniProt Knowledgebase (UniProtKB) [19], genome-wide association study (GWAS) [20] results from DistiLD, and mutation data from the Catalog of Somatic Mutations in Cancer (COSMIC) [21].
Using text mining, it is possible to assign curated bibliographic sources and comparable quality scores to each stored association, allowing the data to be downloaded in bulk when needed. The information is then made available as a web resource (e.g., http://diseases.jensenlab.org/) aimed at end users interested in individual diseases or genes.
An illustrative example of these approaches is reported by Pospisil et al. [22], where the combination of textual-structural mining of PubMed abstracts, universal gene/protein databases (UniProt, InterPro, NCBI Entrez) and pathway knowledge bases (LSGraph and Ingenuity Pathway Analysis) is used to identify putative enzymes in the extracellular space of different tumor types. In this regard, the LSGraph program, a popular tool for bibliographic mining, is utilized to extract entities from a curated database using keywords and Gene Ontology (GO) terms. These entities are then updated and expanded with relevant functional annotations, and categorized based on cellular locations and biochemical functions within the Ingenuity knowledge base.
Another commonly used mining tool is GeneWays. It is designed to automatically analyze a large volume of full-text research articles in order to predict physical interactions (edges) between candidate disease genes (seed nodes) that are mentioned in the literature [23].
Text mining is a valuable tool for extracting biological entities and knowledge from a vast number of research articles. However, there are still some challenges that need to be addressed. One issue is the variability in terms used to describe biomedical concepts, which can result in incorrect associations between molecular biology and human diseases. Another limitation is the limited access to full-text articles and citation information, as more comprehensive and detailed information is often found in the full text rather than in abstracts. This can lead to an underestimation of the number of entities identified through text mining.
Microarray data mining involves the utilization of bioinformatics techniques to analyze microarray data and identify biological elements and pathways that characterize a specific phenotype, such as a human disease [24, 25]. With the generation of large amounts of microarray data, it has become increasingly important to address the challenges of data quality and standardization related to this technology [26]. Presently, microarray data mining has demonstrated its efficacy in identifying target genes linked to human diseases.
There are two common methods for in depth microarray data analysis, i.e., clustering and classification [27, 28]. Clustering is one of the unsupervised approaches to classify data into groups of genes or samples with similar patterns that are characteristic to the group. K-means clustering is a data mining/machine learning algorithm used to cluster observations into groups of related observations without any prior knowledge of those relationships. It is one of the simplest clustering techniques and it is commonly used in medical imaging and biometrics [29]. On the other hand, a self-organizing map (SOM) is a neural network-based non-hierarchical clustering approach.
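A minimal k-means sketch follows (Python with scikit-learn and NumPy, on a small synthetic expression matrix) to illustrate the unsupervised grouping of genes by expression pattern described above; a real analysis would normalize the data and choose the number of clusters more carefully.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic log-expression matrix: 6 genes (rows) x 4 samples (columns)
# with two artificial expression patterns (high vs. low)
high = rng.normal(8.0, 0.5, size=(3, 4))
low = rng.normal(2.0, 0.5, size=(3, 4))
expression = np.vstack([high, low])
genes = ["G1", "G2", "G3", "G4", "G5", "G6"]

# Group genes into 2 clusters with similar expression profiles
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(expression)
for gene, label in zip(genes, km.labels_):
    print(gene, "-> cluster", label)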
Classification is also known as class prediction or discriminant analysis. Generally, classification is a process of learning from examples: given a set of pre-classified examples, the classifier learns to assign an unseen test case to one of the classes [30]. In a typical supervised analysis, the overall gene expression profiles of tissues or fluids associated with a certain disease are compared with those of normal tissues or fluids (e.g., cancer vs. healthy tissues or fluids), from which a list of target genes or biological pathways that are important in the disease is identified. In this type of analysis, it is important to complement the workflow with supervised classification methods such as linear discriminant analysis, nearest neighbor search and genetic algorithms [31].
Differentially expressed genes are the genes whose expression levels are significantly different between two groups of experiments [32].
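For illustration, the sketch below (Python with NumPy and SciPy, entirely synthetic values) compares each gene's expression between diseased and healthy samples with a two-sample t-test and flags genes below an arbitrary p-value cutoff; real workflows use dedicated packages and multiple-testing correction.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
genes = ["G1", "G2", "G3"]
# Synthetic log-expression values: 5 disease and 5 healthy samples per gene;
# G1 is simulated as up-regulated in disease
disease = np.array([rng.normal(7, 0.5, 5), rng.normal(5, 0.5, 5), rng.normal(5, 0.5, 5)])
healthy = np.array([rng.normal(5, 0.5, 5), rng.normal(5, 0.5, 5), rng.normal(5, 0.5, 5)])

for i, gene in enumerate(genes):
    t, p = stats.ttest_ind(disease[i], healthy[i])
    status = "differentially expressed" if p < 0.01 else "not significant"
    print(f"{gene}: t = {t:.2f}, p = {p:.4f} -> {status}")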
Some computational methods are used to analyze microarray data. One of them is Gene Set Enrichment Analysis (GSEA). GSEA is employed to determine the statistical significance and consistency of differences in gene expression between two biological states. Gene sets are constructed based on prior biological knowledge, including information on biochemical pathways, genes located in the same cytogenetic band, genes sharing a common Gene Ontology category, or any user-defined set. The main aim of GSEA is to identify whether the genes within a set are predominantly found at the top or bottom of a ranked list, indicating a correlation with the phenotypic class distinction [33]. GeneCards (http://www.genecards.org/) is another commonly used tool, in which the retrieved genes are organized by putative function.
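At its core, GSEA computes a running-sum statistic over a ranked gene list. The sketch below (plain Python/NumPy, with an invented ranking and gene set, unweighted variant) steps up when a gene belongs to the set and down otherwise, reporting the maximum deviation as a simplified enrichment score; the published method additionally weights hits by their correlation with the phenotype and estimates significance by permutation.

import numpy as np

# Genes ranked by differential expression (most up-regulated first); invented example
ranked_genes = ["G1", "G4", "G7", "G2", "G9", "G3", "G8", "G5", "G6", "G10"]
gene_set = {"G1", "G4", "G2"}  # hypothetical pathway members

N, Nh = len(ranked_genes), len(gene_set)
hit_step = 1.0 / Nh           # increment when a set member is encountered
miss_step = 1.0 / (N - Nh)    # decrement otherwise

running, scores = 0.0, []
for gene in ranked_genes:
    running += hit_step if gene in gene_set else -miss_step
    scores.append(running)

# Simplified enrichment score: maximum deviation of the running sum from zero
es = max(scores, key=abs)
print("running sum:", np.round(scores, 2))
print("enrichment score:", round(es, 2))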
In spite of its advantages, microarray data mining also presents a number of limitations and challenges for target discovery [34]. First, data mining of a list of target genes is not the end of genomic analysis, and since gene expression levels do not always correlate with protein levels, follow-up experiments are required to validate protein expression levels and protein functions [35]. Second, microarray data exist at a variety of scales depending on the specific technology platform as well as the individual experimental procedures. Therefore, microarray data from different laboratories are not always directly comparable. Third, data availability and integration can be a challenge for microarray data mining. In the post-genomic era, the explosion of gene expression data requires timely data storage and updating of gene databases. Moreover, different data storage formats across databases have posed a great challenge for data mining and analysis [36].
Basic microarray data analysis tasks include the classification, clustering, and identification of differential genes using gene expression profiles exclusively. Nonetheless, the connection of gene expression profiles with external sources can facilitate the emergence of new findings and information.
Currently, driven by the exponential growth of microarray data in recent years, considerable effort has been made to develop microarray databases with timely public accessibility in a manner that facilitates target discovery. Indeed, the identification of functional elements such as transcription-factor binding sites (TFBS) on a whole-genome level is the next challenge for genome sciences and gene-regulation studies.
The Open Proteomics Database (OPD) (http://bioinformatics.icmb.utexas.edu/OPD/) and the EMBL Proteomics Identifications Database (PRIDE) (www.ebi.ac.uk/pride/) have been released to the public, providing access to valuable proteomic datasets.
In order to uncover diagnostic signature patterns, various mining methods, such as Bayesian analysis, rule-based analysis, and similarity scoring, have been proposed for analyzing these proteomic datasets [37].
These databases serve as a valuable resource for researchers working in the field of proteomics, enabling them to access and analyze mass spectrometry data for diverse biological studies.
Chemogenomic data mining is an emerging approach in the field of data mining. This innovative technology focuses on interpreting chemical genomics data and analyzing various phenotypes of interest, including viability, cell morphology, behavior, and gene expression profiles. By utilizing small molecule chemical libraries in conjunction with cell libraries, researchers are able to generate a 2D matrix that represents the chemical library on one axis and the library of different cell types on the other axis [38]. However, the process of chemogenomic data mining presents certain challenges, which have prompted the development of specialized mining tools and methods. These tools aim to systematically profile and analyze the data [39]. To achieve this, several supervised or unsupervised clustering algorithms have been proposed. These algorithms help identify a subset of genes within the entire dataset that possess significant functions.
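As a small illustration of clustering such a compound-by-cell-line matrix, the sketch below (Python with NumPy and SciPy, synthetic viability values and hypothetical compound names) groups compounds by hierarchical clustering of their response profiles.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
compounds = ["cpd_A", "cpd_B", "cpd_C", "cpd_D"]
# Synthetic viability matrix: rows = compounds, columns = 5 cell lines;
# cpd_A/cpd_B and cpd_C/cpd_D are simulated to share response profiles
profile1 = rng.uniform(0.1, 0.3, 5)
profile2 = rng.uniform(0.7, 0.9, 5)
matrix = np.vstack([profile1, profile1 + rng.normal(0, 0.02, 5),
                    profile2, profile2 + rng.normal(0, 0.02, 5)])

# Average-linkage hierarchical clustering on Euclidean distances between profiles
tree = linkage(matrix, method="average")
labels = fcluster(tree, t=2, criterion="maxclust")
for name, label in zip(compounds, labels):
    print(name, "-> cluster", label)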
The identification of potential targets in the field of medicine is a challenging task, mainly due to the intricate nature of human diseases and the diverse range of biological data available. It is widely acknowledged that no single data mining approach can fully comprehend the complex cellular mechanisms and reconstruct biological networks, both now and in the foreseeable future. Therefore, in order to enhance the discovery of valuable targets, it is imperative to integrate and analyze data from various sources and disciplines, while considering the strengths and limitations of each approach. Among the most commonly employed strategies is the combination of text mining with high-throughput data analysis, such as genomic, proteomic, or chemogenomic data [40]. This integration has proven to be instrumental in the identification of disease markers and potential drug targets.
Publicly available curated databases such as Gene Expression Omnibus (GEO), GWAS Central (https://www.gwascentral.org/) and the GWAS Catalog (https://www.ebi.ac.uk/gwas/), ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/), DisGeNET (https://www.disgenet.org/) and genomic datasets are key tools to extract and analyze population-based and evidence-based targets associated with different diseases [41, 42].
As an interesting example, an in silico molecular study of asthma-associated targets in LMICs has been reported. This research leveraged computational techniques to evaluate thousands of asthma-associated molecular targets and patient expression datasets to identify the most relevant therapeutic targets. In this study, the authors used a proprietary tool called Ontosight® Discover (https://ontosight.ai/) (US20200090789A1) to annotate asthma-associated genes and proteins. In addition, they collected and evaluated asthma-related patient datasets through bioinformatics- and machine learning-based approaches to identify the most suitable targets. The disadvantage is that such tools are not free; users must pay for a license. Even though this report used licensed software, it is an important example of target identification using valuable computational tools.
The implementation of several of the aforementioned bioinformatic tools leads to the identification of multiple putative targets, but computer-based target prioritization methods have been less studied so far [43]. However, several useful tools have been described. Of particular interest are platforms that systematically integrate and harmonize different databases comprising different information types to finally prioritize targets within a particular disease. The underlying strategy varies depending on the platform used, as well as the scoring criteria and the visualization interface.
One example is DisGeNET, an open-access platform that integrates databases with text-mined data and then features a score based on the supporting evidence to prioritize gene-disease associations [44]. Additionally, GuiltyTargets presents an association approach that uses a genome-wide protein-protein interaction network annotated with disease-specific differential gene expression and uses positive-unlabeled machine learning for candidate ranking [43]. Of interest, the Open Targets platform is a public-private partnership to establish an informatics platform that associates targets and diseases, providing tools to prioritize these target-disease hypotheses [45]. Other examples of interesting resources for target prioritization are PHAROS [46] and TargetMine [47], among others.
Target validation is a key part of target selection. Once the target is identified, it is important to confirm the causal link between a potential target and a specific disease phenotype. For this purpose, there are many datasets used to establish the association of the target with the disease using a computational approach [48]. The application of advanced molecular techniques has led to an increased amount of genomic and lately proteomic data [49]. Therefore, bioinformatic tools play an increasingly important role in the target validation process, where biomedical knowledge leading to biological functions of putative targets can be mined from different databases.
In silico analysis involves, as part of the analysis, the evaluation of differential expression patterns in groups of patients or between normal and diseased cells or tissues. There are several platforms, mainly derived from microarray data, that gather information about expression levels, polymorphisms, mutations, etc. Alongside this, it is important to address where the expression of the putative molecular target is localized. Large-scale omics data also serve as a key input to analyze this issue.
Some important bioinformatic tools to aid this task include, but are not limited to, the Drug Gene Interaction database (DGIdb) [50], the Therapeutic Target Database [51], and many of the databases cited before, including Gene Expression Omnibus (GEO) [52], Open Targets [53], and TargetMine [54], among others.
Lastly, it is imperative in the validation process to include functional assays to substantiate the target's role in a disease mechanism or biological effect, using relevant in vitro and in vivo models and high-quality patient tissue samples.
Computational tools have become a standard approach for accelerating drug development and discovery in the pharmaceutical industry, particularly in LMICs. In this regard, a plethora of chemoinformatic tools have emerged in the last decades. “Chemoinformatics” refers to the integration of software for chemical information processing; it bridges the fields of chemistry and computer science. Chemoinformatics can be divided into the following areas of analysis:
1- Selection of biological targets and collection of possible ligand datasets: this involves characterization of the compound by searching freely available data sources, extracting molecule information from databases as well as substructure and similarity search. This allows the extraction of substructure fragments or other chemical descriptors [55].
2- Selection and prioritization of chemical characteristics: finding chemical fingerprints that represent chemical characteristics of the compound, which allows comparing different compounds based on shared chemical characteristics.
3- Study and evaluation of model prediction: the chemical fingerprints identified in the previous step can be used in several machine learning models to predict other chemical and physicochemical properties in QSAR/QSPR analysis (further explained below) from the three-dimensional chemical structure [56, 57].
4- Compound optimization and hit identification: the structural features of the identified 3D compounds allow their level of chemical similarity to be determined. The 3D Tanimoto index, a widely used 3D similarity metric, calculates the ratio of shared molecular volumes between two ligands [58] (a fingerprint-based sketch of the Tanimoto calculation is given below). Statistical models can then be trained on such data and used to make inferences by comparison with the training set [59].
Finally, this workflow allows the identification and optimization of potential drug candidates using virtual screening, molecular dynamics simulations, and docking experiments, or a combination of these approaches [59, 60].
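As a rough 2D analogue of the Tanimoto comparison mentioned in point 4, the sketch below (Python with RDKit, arbitrary example SMILES) computes the Tanimoto coefficient between Morgan fingerprints of two small molecules; true 3D shape-based Tanimoto indices require aligned conformers and are not shown.

from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

# Arbitrary example molecules given as SMILES (aspirin and salicylic acid)
mol_a = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
mol_b = Chem.MolFromSmiles("Oc1ccccc1C(=O)O")

# 2048-bit Morgan (ECFP4-like) fingerprints
fp_a = AllChem.GetMorganFingerprintAsBitVect(mol_a, radius=2, nBits=2048)
fp_b = AllChem.GetMorganFingerprintAsBitVect(mol_b, radius=2, nBits=2048)

# Tanimoto coefficient: shared bits divided by the union of set bits
print("Tanimoto similarity:", round(DataStructs.TanimotoSimilarity(fp_a, fp_b), 2))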
Virtual screening involves rapidly screening large chemical libraries to identify compounds with desirable characteristics. On the other hand, molecular dynamics simulations predict the behavior and interactions of molecules in large biological systems, while docking simulations predict the binding affinity of small compounds to their target proteins. These methods are responsible for greatly accelerating the process of new drug discovery and increasing the rate at which promising candidates are found [61] and are further described below.
Despite the precise and rigorous nature of the drug discovery process, these types of projects come with their share of challenges. In fact, the attrition rate is high. Many compounds that start the journey do not make it to the end for various reasons, including failure in efficacy, unexpected side effects, or commercial considerations. Therefore, several alternatives are used to reduce the failure risk, with drug repurposing strategies being the most relevant, particularly in LMICs. Drug repurposing involves exploring new uses for existing drugs, a faster and more cost-effective approach. Repositioning schemes for already existing drugs are interesting since the use of validated, toxicologically safe, and approved pharmaceuticals can increase the success rate of drug development [62-64].
A drug repurposing strategy usually consists of three steps: identification of a candidate drug for a given indication, mechanistic assessment of the drug effect in preclinical models, and further evaluation of efficacy in clinical trials. For the first step, computational approaches are typically used and usually involve systematic analysis of large-scale data.
The hypothesis of identifying a potential drug for a given disease can be based on signature matching. This approach compares a particular drug candidate against another drug (drug-drug similarity), a disease (drug-disease similarity), or a clinical condition using transcriptomic, proteomic, or metabolomic data, chemical structures, or adverse-effect profiles. In this regard, it is important to mention the Connectivity Map (CMap) resource, a popular platform designed for data-driven drug repositioning using a large transcriptomic compendium [65]. Other widely used approaches are molecular docking, pathway mapping, retrospective clinical analysis, etc. [64]. Some of these tools are described below.
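A toy version of signature matching is sketched below (Python with NumPy and SciPy, invented gene signatures): a drug whose expression signature anti-correlates with the disease signature is scored as a possible repurposing candidate, in the spirit of connectivity-map-style approaches.

import numpy as np
from scipy.stats import spearmanr

# Invented log-fold-change signatures over the same five genes
disease_signature = np.array([2.1, 1.5, -0.3, -1.8, 0.9])
drug_signatures = {
    "drug_A": np.array([-2.0, -1.2, 0.2, 1.5, -0.8]),  # reverses the disease signature
    "drug_B": np.array([1.9, 1.4, -0.1, -1.6, 1.0]),   # mimics the disease signature
}

# Strongly negative correlation suggests the drug may counteract the disease pattern
for drug, signature in drug_signatures.items():
    rho, _ = spearmanr(disease_signature, signature)
    print(f"{drug}: Spearman rho = {rho:.2f}")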
Once the target is identified, several bioinformatics tools can provide its predicted structure. Commonly used targets are proteins; in this regard, there are multiscale models that take into consideration structural and thermodynamic features that can be used to model the 3D structure of the protein target [66]. For this purpose, some key tools are listed in Table 1.
Once the target's structure is available, identifying putative binding sites is necessary to properly carry out the drug design phase. For binding site identification and analysis, several tools are available, such as SiteHound [67] and fPocket [68].
After defining the desired drug binding site, virtual screening can be used to discover new ligands on the basis of the biological structure of the target protein [69]. This computer-aided approach significantly reduces the possible number of candidates for in vitro and in vivo testing, providing key opportunities for drug discovery in LMICs.
There are several drug design methods, but there are mainly two approaches widely used: structure-based drug design and ligand-based drug design. In the first approach, experimental structural data provided by crystallography, NMR, etc. enable prediction methods of the protein 3D structure with high precision. Molecular modeling software can be used to analyze the physicochemical properties of the selected drug binding sites on the target protein, analyzing key residues, electrostatic and hydrophobic fields, and hydrogen bonds [70]. On the other hand, ligand-based drug design relies on known ligands that bind to the selected target. Using this molecule, different computational models are used to design novel molecules that interact with the target.
Virtual screening is a relevant component of computer-aided drug design. Its main purpose is to predict which compounds could present pharmacological activity, reducing the number of compounds that actually need to be tested experimentally. Some common methods of virtual screening include molecular docking, pharmacophore modeling, and quantitative structure-activity relationship (QSAR) modeling.
In molecular docking, the software predicts the interaction patterns between the target and small molecules or peptides, taking into consideration spatial shape, energy matching, and molecular conformation. The list of freely accessible tools is extensive and many are web-based resources. Some of them include AutoDock [71], AutoDock Vina [72], RosettaLigand [73], GlamDock [74], and EDock [75]. Many other platforms are thoroughly reviewed by Glaab [76]. Interestingly, in recent years, several open-source platforms have been developed to carry out virtual screening of ultra-large compound libraries, such as VirtualFlow [77].
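For orientation only, the sketch below shows a minimal docking run assuming the AutoDock Vina Python bindings (the vina package); the receptor and ligand PDBQT files, box center, and box size are placeholders that must come from prior structure preparation and binding-site analysis.

from vina import Vina  # Python bindings shipped with AutoDock Vina 1.2+

v = Vina(sf_name="vina")  # default Vina scoring function

# Placeholder input files prepared beforehand (e.g., with AutoDockTools or Meeko)
v.set_receptor("receptor.pdbqt")
v.set_ligand_from_file("ligand.pdbqt")

# Search box centered on the binding site identified earlier (placeholder coordinates)
v.compute_vina_maps(center=[10.0, 12.5, -3.0], box_size=[20.0, 20.0, 20.0])

v.dock(exhaustiveness=8, n_poses=5)
v.write_poses("docked_poses.pdbqt", n_poses=5, overwrite=True)
print(v.energies(n_poses=5))  # predicted binding energies (kcal/mol)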
Pharmacophore modeling is a technique that identifies the essential 3D features of a molecule that are necessary for its biological activity. It involves abstracting the key structural and electronic characteristics of a set of active compounds into an idealized 3D model [78]. Some useful open-access platforms for virtual screening based on pharmacophores are Pharmit [79], DrugOn [80] and ZINCPharmer [81] among others.
Finally, QSAR (Quantitative Structure-Activity Relationship) modeling is used to develop predictive models that correlate chemical structure with biological activity. These QSAR models are then applied to the virtual screening of large chemical databases to identify promising compounds for further experimental testing. Several freely available tools can derive QSAR models for virtual screening of the biological activity of chemical compounds, such as DPubChem [82] and the workflow described by Mansouri et al. [83], among many others [84].
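A bare-bones QSAR sketch follows (Python with RDKit and scikit-learn, on an invented toy dataset): physicochemical descriptors are computed from SMILES and regressed against made-up activity values; a real model would require a much larger dataset, proper validation, and an applicability-domain assessment.

from rdkit import Chem
from rdkit.Chem import Descriptors
from sklearn.ensemble import RandomForestRegressor

# Invented toy dataset: SMILES paired with made-up activity values (e.g., pIC50)
data = [("CCO", 4.2), ("CCCCO", 4.8), ("c1ccccc1O", 5.5),
        ("CC(=O)Oc1ccccc1C(=O)O", 6.1), ("CCN(CC)CC", 4.0), ("c1ccc2ccccc2c1", 5.9)]

def featurize(smiles):
    # A small set of physicochemical descriptors used as features
    mol = Chem.MolFromSmiles(smiles)
    return [Descriptors.MolWt(mol), Descriptors.MolLogP(mol), Descriptors.TPSA(mol),
            Descriptors.NumHDonors(mol), Descriptors.NumHAcceptors(mol)]

X = [featurize(smi) for smi, _ in data]
y = [activity for _, activity in data]

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(model.predict([featurize("CCCO")]))  # predicted activity for an unseen molecule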
There is a wide variety of chemical databases containing small molecules available for virtual screening approaches. Comparative studies have shown that there is substantial overlap among the different chemical collections; nevertheless, each database has unique features that can make it more suitable for a certain project. Already prepared virtual libraries can be used, but users can also generate their own. Some of them are freely available and have already been filtered for “drug-likeness”, while others encompass a wider part of the chemical space. Some of these collections are directly available from vendor ligand catalogs, ensuring that compounds can be purchased right after hit selection. Many databases include novel structures, while other libraries comprise natural products [85] or approved drugs for repurposing projects [86]. It is important to mention that many of these databases need to be prepared and filtered before applying virtual screening methodologies. Some of them are listed in Table 2.
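As a simple example of the drug-likeness filtering mentioned above, the sketch below (Python with RDKit, arbitrary SMILES) keeps only compounds with at most one violation of Lipinski's rule of five before a hypothetical virtual screening run.

from rdkit import Chem
from rdkit.Chem import Descriptors

def lipinski_violations(mol):
    # Count violations of Lipinski's rule-of-five criteria
    return sum([Descriptors.MolWt(mol) > 500,
                Descriptors.MolLogP(mol) > 5,
                Descriptors.NumHDonors(mol) > 5,
                Descriptors.NumHAcceptors(mol) > 10])

# Arbitrary example library given as SMILES
library = ["CC(=O)Oc1ccccc1C(=O)O",                        # aspirin
           "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC",     # C40 alkane: exceeds MW and logP limits
           "CN1CCC[C@H]1c1cccnc1"]                          # nicotine

for smi in library:
    mol = Chem.MolFromSmiles(smi)
    verdict = "keep" if lipinski_violations(mol) <= 1 else "filter out"
    print(smi, "->", verdict)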
Once a hit is identified, computational chemistry and molecular modeling play an important role during the hit-to-lead (H2L) stage by both suggesting putative optimizations and decreasing the number of compounds to be experimentally evaluated.
Typically, H2L involves chemical modifications of the validated hit to optimize its affinity for the target so that it becomes a lead compound. This stage of the drug discovery process can be time-consuming and expensive. Trial-and-error strategies involve cycles of compound synthesis, evaluation, and selection or rejection, with the aim of reaching a suitable affinity while maintaining selectivity. Different computer-aided approaches already described, such as QSAR, molecular docking, and pharmacophore screening, among others, can be used to improve H2L.
Several authors have described some of the most important algorithms to tackle H2L optimization, such as LigBuilder [100], AILDE [101, 102], and ChemoDOTS [102], among others [103].
Diverse methods have been utilized to successfully implement chemoinformatics in drug discovery. Many of them have been described before, but there are many other key tools that can be used for data mining, virtual screening, and structure-activity relationship studies. We have listed some of these tools below:
ChemDraw is a Macintosh and Microsoft Windows program first developed by David A. Evans and Stewart Rubenstein in 1985 and later by the chemoinformatics company CambridgeSoft. It is a molecular editor that, along with Chem3D and ChemFinder, is part of the ChemOffice suite of programs [104].

ChemReader is a fully automated tool that extracts chemical structure information from images in research articles and translates that information into standard chemical formats that can be searched and analyzed [105].

ChemSketch is a molecular modeling program that allows the drawing and modification of chemical structures and structural analysis, including the interpretation of chemical bonds and functional groups.

ChemWindow is a program developed by Bio-Rad Laboratories, Inc. that allows drawing chemical structures, 3D visualization, and database searching.

Chemistry Development Kit (CDK) is Java software for use in bioinformatics and chemoinformatics, available for Windows, Linux, and Macintosh. The program allows 2D molecule generation, 3D geometry generation, and descriptor and fingerprint calculation, and supports various chemical structure formats [106].

ChemmineR is an R-language chemoinformatics package for analyzing data on small, drug-like compounds; it enables similarity searching, clustering, and classification of chemical compounds using a wide range of algorithms [107].

JME molecular editor is a Java applet that allows the creation and modification of chemical compounds and reactions and can display molecules within an HTML page [108].

Molecular Operating Environment (MOE) is a scientific vector language-based software program whose applications include structure- and fragment-based design, pharmacophore modeling, protein and molecular modeling and simulations, in addition to cheminformatics and QSAR.

Open Babel is software used for the interconversion of chemical file formats. It also allows substructure searching as well as fingerprint calculation [108]. It is available for Windows, Linux, and Macintosh.

OpenEye is a drug discovery and design software kit whose areas of application include the generation of chemical structures, docking, shape comparison, cheminformatics, and visualization. OpenEye toolkits are available in multiple programming languages: C++, Java, and Python [109].

Chemaxon provides various chemoinformatics software programs, applications, and services for drawing and visualizing chemical structures, searching and managing chemical databases, clustering chemical compounds, and drug discovery and design [110].

Online Chemical Modeling Environment (OCHEM) is a web-based platform designed to automate and simplify the typical steps required for QSAR modeling. It aims to provide users with a comprehensive tool for data storage, model development, and the publication of chemical information. OCHEM offers features such as estimating the accuracy of predictions, providing applicability-domain assessment, and allowing users to seamlessly integrate predictions with other approaches. Its primary objective is to consolidate a comprehensive range of chemoinformatics tools into a single, accessible, and user-friendly resource. OCHEM is free for web users and is available online at http://www.ochem.eu [111].

Various other tools, such as PowerMV, PaDEL, CDD (Collaborative Drug Discovery), RDKit, 3D-e-chem, MedChem Studio, MedChem Designer, Mol2Mol, Chimera, VMD, ArgusLab, ChemTK, Premier Biosoft and many others, are also widely used for chemoinformatics applications.
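As an example of routine file-format handling with Open Babel, the short sketch below (Python, calling the obabel command-line tool, which must be installed) converts a placeholder SDF library into SMILES and prints the resulting records; the file names are hypothetical.

import subprocess

# Convert a compound library from SDF to SMILES with Open Babel;
# "library.sdf" is a placeholder input file
subprocess.run(["obabel", "library.sdf", "-O", "library.smi"], check=True)

with open("library.smi") as handle:
    for line in handle:
        smiles, *name = line.split()
        print(smiles, " ".join(name))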
The pharmacokinetic (PK) profiles of new chemical entities are key determinants of efficacy and toxicity. These profiles are governed by absorption, distribution, metabolism, and excretion (ADME). Each ADME parameter can be assessed using specific experimental settings. Absorption is evaluated mainly via solubility and membrane permeability, while distribution can be tested using protein binding, tissue binding, and P-glycoprotein binding. Metabolism can be determined using liver microsomes or hepatic clearance, and excretion can be established using renal clearance and urinary excretion rates.
Interestingly, ADME prediction using in silico models has become essential in the drug discovery process. Several IT companies have developed robust predictive platforms for various ADME parameters; however, high licensing fees limit access to this commercial software [112]. Importantly, freely available prediction platforms have been developed, and some of them are listed in Table 3.