This reference provides a comprehensive overview of computational modelling and simulation for theoretical and practical biomedical research. The book explains basic concepts of computational biology and data modelling for learners and early career researchers. Chapters cover these topics:
1. An introduction to computational tools in biomedical research
2. Computational analysis of biological data
3. Algorithm development for computational modelling and simulation
4. The roles and application of protein modelling in biomedical research
5. Dynamics of biomolecular ligand recognition
Key features include a simple, easy-to-understand presentation; detailed explanations of important concepts in computational modeling and simulation; and references.
Readership
Undergraduates and graduates in life sciences, bioinformatics, data science, computer science and biomedical engineering courses. Early career researchers.
You can read this e-book in Legimi apps or in any app that supports the following format:
Page count: 283
Year of publication: 2024
This is an agreement between you and Bentham Science Publishers Ltd. Please read this License Agreement carefully before using the book/echapter/ejournal (“Work”). Your use of the Work constitutes your agreement to the terms and conditions set forth in this License Agreement. If you do not agree to these terms and conditions then you should not use the Work.
Bentham Science Publishers agrees to grant you a non-exclusive, non-transferable limited license to use the Work subject to and in accordance with the following terms and conditions. This License Agreement is for non-library, personal use only. For a library / institutional / multi user license in respect of the Work, please contact: [email protected].
Bentham Science Publishers does not guarantee that the information in the Work is error-free, or warrant that it will meet your requirements or that access to the Work will be uninterrupted or error-free. The Work is provided "as is" without warranty of any kind, either express or implied or statutory, including, without limitation, implied warranties of merchantability and fitness for a particular purpose. The entire risk as to the results and performance of the Work is assumed by you. No responsibility is assumed by Bentham Science Publishers, its staff, editors and/or authors for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products instruction, advertisements or ideas contained in the Work.
In no event will Bentham Science Publishers, its staff, editors and/or authors, be liable for any damages, including, without limitation, special, incidental and/or consequential damages and/or damages for lost data and/or profits arising out of (whether directly or indirectly) the use or inability to use the Work. The entire liability of Bentham Science Publishers shall be limited to the amount actually paid by you for the Work.
Bentham Science Publishers Pte. Ltd. 80 Robinson Road #02-00 Singapore 068898 Singapore Email: [email protected]
The structure and function of biological macromolecules and their ligands are crucial for understanding physiology, pathological physiology, molecular pharmacology, and drug design. The recognition that molecules consist of atoms, and the theory of chemical bonding, developed slowly. It is now clear that understanding the structure and function of biological macromolecules requires experimental and computational approaches that are tightly interconnected. Currently, no high-resolution protein or nucleic acid structure can be obtained without refinement, which, by definition, involves biomolecular simulation.
The Greek philosopher Democritus (~460-370 B.C.E.) first proposed that all matter consists of atoms, the smallest particles that cannot be subdivided into smaller substances. In the 1800s, theories of molecular structure were modeled using equal and unequal spheres. The first physical molecular model, built by August Wilhelm von Hofmann in 1865, showed that the size of the carbon atom differs from that of hydrogen or chlorine atoms, using methane, ethane, and methyl chloride as models. After the introduction of stereochemistry, Jacobus Henricus van’t Hoff built a three-dimensional model of tetrahedral molecules in 1874. Some 20 years later, the electron was discovered by Joseph John Thomson, who was awarded the Nobel Prize in Physics in 1906 “for his theoretical and experimental investigations on the conduction of electricity by gases”; this discovery helped explain the differing sizes of atoms. The physicist Ernest Rutherford, who had received the Nobel Prize in Chemistry in 1908 “for his investigations of the disintegration of the elements and the chemistry of radioactive substances”, later demonstrated the existence of the proton in an article titled “Collision of α particles with light atoms. IV. An anomalous effect in nitrogen”, Philosophical Magazine Series 6, 37 (1919) 581-587. The Nobel Prize in Physics (1933) was awarded to Erwin Schrödinger and Paul Adrien Maurice Dirac “for the discovery of new productive forms of atomic theory”. The key results of Schrödinger and Dirac became the foundation of modern quantum mechanical calculations. Berni J. Alder and Tom E. Wainwright performed the first molecular dynamics (MD) simulations of a hard-sphere system in 1957, and the first MD simulation of a protein was reported in 1977.
When a molecular system is too large to represent all electrons explicitly in the calculation, one must resort to molecular mechanics, and on top of that, thermal averaging by molecular dynamics simulation is typically necessary. Molecular mechanics uses classical mechanics to treat a system of molecules, applying an algebraic expression for the total energy of the system without computing the electron density. It is thus feasible and fast, under the assumption that the Born-Oppenheimer approximation is valid. This approach enabled the simulation of the 58-residue bovine pancreatic trypsin inhibitor, which showed that the internal motion of this protein is fluid-like in nature. An important conclusion was drawn: the dynamic fluctuations of a protein need to be included when modeling biological processes. The fundamentals that enabled protein simulations were recognized by the Nobel Prize in Chemistry (2013), awarded to Martin Karplus, Michael Levitt and Arieh Warshel “for the development of multiscale models for complex chemical systems”. Earlier, in relation to simulation, the Nobel Prize in Chemistry (1998) was awarded to Walter Kohn “for his development of the density-functional theory” and to John A. Pople “for his development of computational methods in quantum chemistry”. Computational simulations that mirror real life have thus become crucial to advances in chemistry today.
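As an illustration, the algebraic expression for the total energy used in molecular mechanics typically takes the following schematic force-field form (the exact functional terms and parameters differ between force fields):

```latex
E_{\text{total}} =
  \sum_{\text{bonds}} k_b\,(r - r_0)^2
+ \sum_{\text{angles}} k_\theta\,(\theta - \theta_0)^2
+ \sum_{\text{dihedrals}} \frac{V_n}{2}\,\bigl[1 + \cos(n\phi - \gamma)\bigr]
+ \sum_{i<j} \left[ \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}} \right]
```

The first three sums describe bonded interactions (bond stretching, angle bending, dihedral torsions), while the last sum collects the non-bonded van der Waals and electrostatic interactions between atom pairs. Because these are simple algebraic terms rather than electronic-structure calculations, they can be evaluated millions of times during a simulation.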
All living things are made of cells built from a few basic chemical elements, i.e., carbon, hydrogen, nitrogen and oxygen, which account for most of the mass of an organism. Hence, the study of chemical processes within the cell elucidates the molecular basis of biological activities, i.e., molecular interactions/recognition, mechanisms, modification and synthesis. Biological macromolecules, assembled from amino acids, lipids, nucleotides and sugars, carry out these life processes. The conformation of these macromolecules determines their biological activities, especially for proteins, which perform a vast array of an organism’s functions. The discovery of X-rays in 1895 by Wilhelm Conrad Röntgen, which earned him the Nobel Prize in Physics in 1901, provided crystallographers with a powerful tool to “see” atoms in crystals. The three-dimensional structures of proteins have since been extensively studied. Max Ferdinand Perutz and John Cowdery Kendrew were awarded the Nobel Prize in Chemistry in 1962 “for their studies on the structures of globular proteins”. In the same year, the Nobel Prize in Physiology or Medicine was awarded to Francis Harry Compton Crick, James Dewey Watson and Maurice Hugh Frederick Wilkins “for their discoveries concerning the molecular structure of nucleic acids and its significance for information transfer in living material”. Their discovery of the double-helix structure of nucleic acids has since been used to postulate the base pairing of nucleic acids. Dorothy Crowfoot Hodgkin received the Nobel Prize in Chemistry in 1964 “for her determinations by X-ray techniques of the structures of important biochemical substances”. These works initiated the development of techniques to improve the structure solving of biological macromolecules. Experimentally solved structures have been deposited in the Protein Data Bank (PDB), established in 1971.
Considering the challenges of the crystallization process, nuclear magnetic resonance (NMR) spectroscopy of biological molecules in solution was developed, and Kurt Wüthrich was awarded the Nobel Prize in Chemistry in 2002 “for his development of nuclear magnetic resonance spectroscopy for determining the three-dimensional structure of biological macromolecules in solution”. With further technological progress, large biological complexes that cannot be captured by X-ray crystallography can now be imaged using cryo-electron microscopy (cryo-EM), pioneered by Jacques Dubochet, Joachim Frank and Richard Henderson, who were awarded the Nobel Prize in Chemistry in 2017 “for developing cryo-electron microscopy for the high-resolution structure determination of biomolecules in solution”. As of May 2023, the experimentally solved biological molecules from these methods have accumulated to more than 200,000 structures in the RCSB PDB, the worldwide repository of three-dimensional biomolecular structures.
Together with the modeling of biological processes, the determination of these biological structures has laid the groundwork for computational simulation of biological systems to close the sequence-structure gap, whereby the behavior of biological macromolecules can be predicted more accurately. The combination of wet-laboratory experiments and dry-laboratory calculations allows structure-property relationships to be hypothesized, thus revolutionizing the design of new medicines and novel medical interventions. In the past, biomedical research was an experimental endeavor that could only be pursued by in vitro and in vivo means. Nowadays, breakthroughs have been made in the processing, analysis and interpretation of biomedical data by combining biology with the disciplines of chemistry, physics, computer science, information technology, mathematics and statistics. Converting biomedical data into meaningful information involves scripting and executing programs. These programs, developed from theoretical calculations together with artificial intelligence, enable the access and management of diverse biological information, including raw sequence data, structures and images. Algorithm development has also allowed relationships among biological data to be quantified, increasing our understanding of biological processes.
The life activities of an individual start from the transcription of DNA into RNA and proceed to protein translation and expression, which leads to complex biological pathways and networks. Hence, any alteration in these processes can lead to various diseases and affect the life of the individual. Additional external factors, such as pathogens, further worsen conditions through multifactorial diseases, challenging their management and treatment. Biomedical research driven by computational approaches has demonstrated its practical value, as unbiased analyses can be structured to narrow down knowledge gaps. As the years progress, so do the techniques, and the advances of the future will dwarf those of the past. Advances in biomolecular structure determination and simulation techniques have significantly improved the quality of life in the past century and will continue to illuminate the processes of life in the centuries to come.
This book first describes the computational tools generally used in biomedical research on sequences and structures. Computational tools for cellular activity and systems biology are also summarized. The overall picture of molecular modeling in biomedical research, illustrated with examples, shows its growing practicality for studying a wide variety of problems.
The following chapter covers the interpretation of big data. Genomic data analyses of expression data, circular RNAs, RNA interference, and microglial brain-derived neurotrophic factor are explained, and publicly available genomic databases are listed. Several diseases are used as examples in the data analysis, specifically RNAs that aid targeted therapies.
The development of computer-based algorithms and protocols has accelerated the closing of the sequence-structure knowledge gap for biomolecules. The theory behind various algorithms for solving the 3D structures of biomolecules is therefore also described in this book, including tertiary structure prediction protocols, protein structure refinement via molecular dynamics simulation, and quality assessment of predicted structures. In addition, algorithms for DNA sequence computing are compared, and algorithms for docking and molecular dynamics simulations are included.
Proteins, ranging from antibodies, enzymes, and hormones to contractile, storage, and transport proteins, are essential parts of an organism and are involved in practically every process within the cell. The chapter on the application of protein modeling provides an overview of protein structure determination and protein modeling. Applications of protein modeling are also reviewed to highlight the importance of structural elucidation for the structure-function relationship of proteins in the biomedical field.
Proteins are not rigid molecules, and their conformational flexibility varies in degree. Conformational flexibility, especially in the binding site, is vital for catalytic activity and for molecular recognition of binding partners. This book therefore also includes a chapter on the dynamics of biomolecules, which further presents the applications of molecular recognition in the drug development process.
Atomistic biomolecular simulation can be very helpful in studying a biomedical system when coupled to its function, and this approach is increasingly being used in other fields as well. Molecular modeling and simulation have altered research tactics by identifying the experiment with the highest probability of success prior to the actual experiment. Although molecular modeling complements experiments and can provide detailed time-resolved information at the molecular level, all techniques have limitations, and these need to be addressed clearly prior to simulation. The computation of total entropies and entropy differences, e.g., in protein-inhibitor binding or protein folding, remains unsolved. Insufficient experimental data against which simulation predictions can be validated is another limitation. The development of a force field is usually limited to a class or type of molecule or environment; the choice of force field for simulating a globular protein in solution differs from that for a membrane protein in a lipid bilayer. In addition, biological events range from femtoseconds to milliseconds and beyond, from atomic fluctuations and side-chain motions to protein folding/unfolding; the search and sampling of conformational space is therefore another limitation in the simulation of biological systems. The challenge of biological simulation lies in balancing force field accuracy, sampling and computational power, and in the ability to model the phenomenon of interest, which depends on the type of process.
This book not only targets researchers in industry and academia but also serves as a guide for graduates and undergraduates who wish to apply computational approaches in their biomedical research. It covers basic to detailed descriptions and applications of computational modeling and simulation in biomedical research and may thus be useful as reference material for learning important research topics.
I would like to thank Professor Dr. Janez Mavri from the Laboratory of Computational Biochemistry and Drug Design, National Institute of Chemistry, Ljubljana, Slovenia, for stimulating discussions on this preface.
The digital revolution has significantly impacted technologies worldwide over the past few decades. Biomedical research is among the fields most affected by advances in computational power and data processing. The human genome sequencing project generated an enormous amount of information that would be challenging to store and interpret without the aid of computer programs. The development of computational algorithms has therefore greatly eased, and reduced the time needed for, the study and analysis of human genome information. This has directly improved our understanding of the complex genome structure, such as the presence of different regulatory regions and non-coding regions that encode RNAs like microRNA or long non-coding RNA (lncRNA). In addition, many computational tools have been developed to improve our understanding of the biomedical field, covering areas from the study of biomolecular structures and interactions, the dynamics of biomolecules, and cellular activity to systems biology. This chapter thus provides a brief introduction to computational tools in these areas and their importance.
Over decades of biomedical research, knowledge in the health sciences has expanded rapidly. This has resulted in massive amounts of information from the various disciplines of biomedical study, i.e., genomics, metabolomics, metagenomics, phenomics, proteomics, and transcriptomics [1]. Hence, computational tools are needed to store and analyze this enormous amount of information (Fig. 1). However, storing and analyzing biological information require different computational tools. Information annotated and stored electronically in a computer system is known as a database [2]. A database can keep records of biological information such as DNA, RNA, and protein sequences [2].
Fig. (1) Applications of computational tools in biomedical research.
Storing research findings in hardcopy is not a good option, as hardcopies are prone to degradation and difficult to distribute among researchers. This can hinder the synchronization of scientific discovery and understanding among different researchers. Proper storage has thus become of utmost importance: it ensures that knowledge can be easily retrieved and shared efficiently across the community [2]. Therefore, digital records have been used to store and retrieve biomedical knowledge [3]. They also allow the latest research findings to be distributed worldwide easily, as long as a computer and an internet connection are available [4]. The genomic sequences of various organisms and information on genes and proteins are nowadays stored in various databases [5-7]. These databases are freely available to the public, so any researcher can retrieve the relevant information for further analysis and use in scientific research.
However, analyzing the huge amount of biological information in a database can be daunting and time consuming. A gene can comprise from a few hundred bases to thousands of kilobases (kb) of nucleotides. Aligning even a single gene to other sequences manually to check their sequence identity is nearly impossible, especially for eukaryotic genes, where alternative splicing is present [8]. Thus, various computational applications have been developed to analyze the information in databases [9]. These applications are available either as downloadable software or as web servers. Downloadable software can be installed and run on a personal computer, whereas web-server applications require the researcher to upload the relevant information, such as a protein sequence or structure, to conduct the analysis. Computational applications have greatly increased the efficiency of analyzing the information in databases. The Basic Local Alignment Search Tool (BLAST) on the NCBI website can align an unknown sequence against all known sequences in the database within a few minutes [10]. The Translate tool in Expasy can translate a nucleotide sequence into a protein sequence and identify the potential reading frame [11]. In terms of structural similarity, PyMOL and UCSF Chimera can identify structural deviations between protein structures [12, 13]. This highlights the need for sophisticated scientific calculations to analyze biomedical knowledge, a need that computational applications can satisfy.
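To make the idea of sequence alignment concrete, here is a minimal sketch of global pairwise alignment by dynamic programming (the Needleman-Wunsch algorithm) with a percent-identity calculation. This is not the heuristic used by BLAST; the scoring values are illustrative:

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of sequences a and b by dynamic programming."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            score[i][j] = max(diag, score[i-1][j] + gap, score[i][j-1] + gap)
    # traceback from the bottom-right corner to recover the alignment
    ai, bi, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and \
           score[i][j] == score[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch):
            ai.append(a[i-1]); bi.append(b[j-1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i-1][j] + gap:
            ai.append(a[i-1]); bi.append('-'); i -= 1
        else:
            ai.append('-'); bi.append(b[j-1]); j -= 1
    return ''.join(reversed(ai)), ''.join(reversed(bi))

def percent_identity(x, y):
    """Percentage of aligned columns where the two residues match."""
    matches = sum(1 for p, q in zip(x, y) if p == q and p != '-')
    return 100.0 * matches / len(x)
```

BLAST trades the guaranteed-optimal alignment of this dynamic programming approach for seed-and-extend heuristics, which is what makes database-scale searches finish in minutes.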
In summary, the use of computational tools has eased the storage, analysis, and annotation of biomedical information [1, 2]. Different computational algorithms have been designed to store and analyze biomedical information; depending on the objective, the analysis can range from comparing sequences or structures to predicting structures, functions, or the effects of changes such as mutations. Technologies such as artificial intelligence (AI) may also help to simplify and accelerate the study and analysis of biomedical phenomena [14]. The widespread biomedical application of computational tools has thus become an invaluable means to understand biology, and may even become an alternative to in vitro experiments should its accuracy improve in the future.
The complex nature of genomes, genes, and proteins, especially in eukaryotes, has necessitated the use of computational tools to decode them. The sophisticated transcriptional control elements, translational regulation, and post-translational modifications of each gene and protein would be difficult to detect and interpret without computational tools. By comparing the sequence and structural information of biomolecules, the evolutionarily conserved regions or functions of an unknown biomolecule can be extrapolated and predicted from known biomolecules. One study used quasi-alignments to compare 16S rRNA sequences from different genomes [15]; the authors reported that this method can detect conserved regions in the aligned sequences with less computational resources and time [15].
Beyond sequence information, computational tools are widely used to analyze and predict the 3D structures of biomolecules. This allows the structural information and the relative coordinates of each atom in the biomolecule to be recorded precisely. As more biomolecular structures are deposited in structural databases, increasingly accurate computational algorithms can be developed [16]. To date, the structures of unknown biomolecules can be predicted using algorithms such as comparative modeling, threading, and machine learning [17]. The MODELLER and Robetta servers are among the computational tools that employ these algorithms to predict protein structure [18, 19]. Recently, protein structures predicted by the neural network-based model AlphaFold have been reported to achieve a median accuracy of 0.96 Å root-mean-square deviation on the Cα backbone [20]. AlphaFold was developed using a deep learning algorithm that incorporates both physical and biological knowledge of protein structure into the prediction.
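The Cα root-mean-square deviation quoted for AlphaFold is computed after optimally superposing the predicted and experimental coordinates. A minimal NumPy sketch of this superposition (the Kabsch algorithm) might look as follows:

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal
    rigid-body superposition (Kabsch algorithm)."""
    P = P - P.mean(axis=0)                   # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                              # 3x3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                       # optimal rotation matrix
    diff = (P @ R.T) - Q                     # rotate P onto Q, take residual
    return np.sqrt((diff ** 2).sum() / len(P))
```

In practice the pairing of predicted and experimental Cα atoms, missing residues, and flexible termini all complicate the comparison, but the core superposition step is exactly this.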
By comparing different structures in the databases, we can also improve our understanding of protein folding. Understanding how a protein folds into its final structure could enable us to predict the structure based solely on the amino acid sequence [21]; this is known as the ab initio prediction approach. One study developed an ab initio prediction tool known as QUARK [22]. The protein sequence is first split into fragments of 1-20 residues whose structures are predicted separately; the final structure is then assembled from these fragments using Monte Carlo simulations. The authors reported that QUARK can predict the fold correctly for about 30% of proteins up to 100 residues in length [22]. Together, these examples demonstrate that computational tools can reduce both the time and cost of protein structure determination compared to experimental means.
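The Monte Carlo assembly used by tools like QUARK rests on the Metropolis criterion: a trial move is accepted with probability min(1, exp(-ΔE/kT)). A toy one-dimensional sketch of that criterion follows; the energy function and step size here are illustrative, not QUARK's actual move set:

```python
import math
import random

def metropolis_step(x, energy, step=0.5, kT=1.0, rng=random):
    """One Metropolis Monte Carlo move: propose a random perturbation
    and accept it with probability min(1, exp(-dE/kT))."""
    x_new = x + rng.uniform(-step, step)
    dE = energy(x_new) - energy(x)
    if dE <= 0 or rng.random() < math.exp(-dE / kT):
        return x_new      # move accepted
    return x              # move rejected; keep the old state

# toy "energy landscape": a harmonic well centred at 0
harmonic = lambda x: 0.5 * x * x

random.seed(0)
x = 5.0                   # start far from the minimum
for _ in range(5000):
    x = metropolis_step(x, harmonic)
```

Repeated application of this rule drives the system toward low-energy conformations while still allowing occasional uphill moves, which is what lets the sampler escape local minima during fragment assembly.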
Computational tools can also be applied to study the interactions between biomolecules or between biomolecules and ligands. The structure of a biomolecule can be docked against a ligand or another biomolecule to predict the complex conformation and interactions using tools such as AutoDock or HADDOCK [23-25]. This can extensively reduce the need for experimental screening and validation.
Computational tools can be used to study the structures and interactions of antibodies with their antigens. For example, the structures of isolated single-chain fragment variable (scFv) antibodies against the murine double minute 2 (MDM2) protein were first predicted using MODELLER [26]. The structure of the MDM2 antigen was predicted using MODELLER, Robetta, Phyre2, (PS)2-V2, QUARK, Psipred and I-TASSER [18, 19, 22, 27-30]. The authors then docked the scFvs against MDM2 using the ClusPro 2.0 server [31, 32], and the interactions between the scFvs and MDM2 were finally analyzed using jsPISA [33]. This example underlines the importance of computational structure prediction and docking tools in biomedical studies, particularly when time is critical.
The determination of the 3D structures of SARS-CoV-2 proteins can also accelerate the design of drugs and vaccines for COVID-19 management. One study reported the screening of potential drugs against the papain-like proteinase (PLpro), 3-chymotrypsin-like protease (3CLpro), RNA-directed RNA polymerase (RdRP), helicase, exonuclease (ExoN), nonstructural uridylate-specific endoribonuclease (NendoU) and 2’-O-methyltransferase (2’-O-MT) proteins of SARS-CoV-2 [34]. The authors conducted high-throughput virtual screening (HTVS) against the active sites of the SARS-CoV-2 proteins using a library of 5903 approved and investigational drugs [34]. About 290 potential inhibitors were found from the HTVS. Among them, the authors confirmed that the protein kinase C isoform inhibitor bisindolylmaleimide IX (BIM IX) inhibited 80% of 3CLpro enzymatic activity at 50 µM [34]. This substantially reduced the time required to screen every single drug in the library against all the SARS-CoV-2 proteins.
Although computational tools can be used to predict the interactions between biomolecules, such predictions have not always been accurate. Previous studies on structure-function relationships have suggested that a structure determined by an in vitro experiment can represent only a subset of the real-life structures of a biomolecule [35, 36]. The structure of a biomolecule is now considered dynamic and can exist in several conformations [35, 36]. Thus, computational prediction using a static, averaged structure cannot directly infer the functional aspects of a biomolecule; rather, the averaged structure represents only a single snapshot of the interactions between biomolecules or ligands [37]. Binding between biomolecules or ligands is now known to be driven by the overall movements and interactions of the atoms in the biomolecules and ligands, otherwise known as the dynamics of the biomolecules [38].
Here, a computational technique known as molecular dynamics (MD) simulation can be used to calculate and simulate the interactions between biomolecules and ligands [39]. The atoms of the biomolecules are assigned forces using force field parameters such as the Amber, CHARMM, or GAFF force fields, while the surrounding environment can be represented by adding solvents or solutes to better mimic realistic interactions [40-42]. Calculations can then be performed to study the dynamic interactions of the atoms in the system.
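Under the hood, an MD engine repeatedly evaluates the force-field forces and propagates positions and velocities with a time-reversible integrator, most commonly velocity Verlet. A minimal one-particle sketch follows (a single harmonic "bond" with illustrative unit parameters, not a production MD setup):

```python
import numpy as np

def velocity_verlet(x, v, force, mass=1.0, dt=0.01, steps=1000):
    """Integrate Newton's equations of motion with the velocity Verlet
    scheme, the workhorse integrator of MD codes."""
    f = force(x)
    traj = [x]
    for _ in range(steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # re-evaluate force at new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        traj.append(x)
    return np.array(traj), v

# toy system: one particle on a harmonic "bond" with force constant k = 1
k = 1.0
force = lambda x: -k * x
traj, v_final = velocity_verlet(x=1.0, v=0.0, force=force)
```

The same loop, applied to millions of atoms with the full force-field energy expression and a femtosecond-scale time step, is what produces an MD trajectory; the key property of velocity Verlet is that the total energy stays bounded over long runs.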
Furthermore, there are different ways of conducting MD simulations, such as the hybrid quantum mechanics/molecular mechanics (QM/MM) method or constant-pH molecular dynamics simulation (CpHMD), to serve different needs. QM/MM combines quantum mechanics and molecular mechanics in one simulation [43, 44]. It is commonly used to study enzymatic reactions, where the charge movement of the reaction is simulated using quantum mechanics while the atoms not involved in the reaction are treated with molecular mechanics [43, 44]. This method has the advantage of reducing the computational resources and time needed to simulate an enzyme reaction.
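In the commonly used additive scheme, the QM/MM total energy is partitioned as follows (notation schematic; subtractive schemes such as ONIOM partition the energy differently):

```latex
E_{\text{QM/MM}} = E_{\text{QM}}(\text{reactive region})
                 + E_{\text{MM}}(\text{environment})
                 + E_{\text{coupling}}(\text{QM} \leftrightarrow \text{MM})
```

The coupling term collects the electrostatic and van der Waals interactions between the small quantum region and the classical environment, which is why the expensive quantum treatment can be confined to the handful of atoms actually involved in bond making and breaking.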
On the other hand, CpHMD simulates interactions and biological systems at different pH values [45]. Residues such as aspartic acid and histidine can exhibit different protonation states at different pH [46]. When the protonation state of a residue changes, it can affect the structure and interactions of the protein. Thus, CpHMD can capture the changes in the interactions of a biological system when the pH of its surroundings changes [46].
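The pH dependence of protonation that CpHMD samples explicitly can be approximated for an isolated titratable site by the Henderson-Hasselbalch relation. A small sketch (the pKa values are textbook model values for free amino acids and shift considerably inside a protein):

```python
def protonated_fraction(ph, pka):
    """Fraction of a titratable site in its protonated form, from the
    Henderson-Hasselbalch relation: f = 1 / (1 + 10**(pH - pKa))."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

# model pKa values for free amino acid side chains (assumed typical
# textbook values; actual pKa values shift inside a protein)
PKA = {"Asp": 3.9, "Glu": 4.1, "His": 6.0}
```

This simple relation explains why histidine, with a model pKa near 6, is the residue whose protonation state is most sensitive around physiological pH, and hence a frequent focus of CpHMD studies.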