Proteins: Structure and Function is a comprehensive introduction to the study of proteins and their importance to modern biochemistry. Each chapter addresses the structure and function of proteins with a definitive theme designed to enhance student understanding. Opening with a brief historical overview of the subject the book moves on to discuss the ‘building blocks’ of proteins and their respective chemical and physical properties. Later chapters explore experimental and computational methods of comparing proteins, methods of protein purification and protein folding and stability.
The latest developments in the field are included and key concepts introduced in a user-friendly way to ensure that students are able to grasp the essentials before moving on to more advanced study and analysis of proteins.
An invaluable resource for students of Biochemistry, Molecular Biology, Medicine and Chemistry providing a modern approach to the subject of Proteins.
Page count: 1118
Year of publication: 2013
Contents
Preface
1 An Introduction to protein structure and function
A brief and very selective historical perspective
The biological diversity of proteins
Proteins and the sequencing of the human and other genomes
Why study proteins?
2 Amino acids: the building blocks of proteins
The 20 amino acids found in proteins
The acid–base properties of amino acids
Stereochemical representations of amino acids
Peptide bonds
The chemical and physical properties of amino acids
Detection, identification and quantification of amino acids and proteins
Stereoisomerism
Non-standard amino acids
Summary
Problems
3 The three-dimensional structure of proteins
Primary structure or sequence
Secondary structure
Tertiary structure
Quaternary structure
The globin family and the role of quaternary structure in modulating activity
Immunoglobulins
Cyclic proteins
Summary
Problems
4 The structure and function of fibrous proteins
The amino acid composition and organization of fibrous proteins
Keratins
Fibroin
Collagen
Summary
Problems
5 The structure and function of membrane proteins
The molecular organization of membranes
Membrane protein topology and function seen through organization of the erythrocyte membrane
Bacteriorhodopsin and the discovery of seven transmembrane helices
The structure of the bacterial reaction centre
Oxygenic photosynthesis
Photosystem I
Membrane proteins based on transmembrane β barrels
Respiratory complexes
Complex III, the ubiquinol-cytochrome c oxidoreductase
Complex IV or cytochrome oxidase
The structure of ATP synthetase
ATPase family
Summary
Problems
6 The diversity of proteins
Prebiotic synthesis and the origins of proteins
Evolutionary divergence of organisms and its relationship to protein structure and function
Protein sequence analysis
Protein databases
Gene fusion and duplication
Secondary structure prediction
Genomics and proteomics
Summary
Problems
7 Enzyme kinetics, structure, function, and catalysis
Enzyme nomenclature
Enzyme co-factors
Chemical kinetics
The transition state and the action of enzymes
The kinetics of enzyme action
Catalytic mechanisms
Enzyme structure
Lysozyme
The serine proteases
Triose phosphate isomerase
Tyrosyl tRNA synthetase
EcoRI restriction endonuclease
Enzyme inhibition and regulation
Irreversible inhibition of enzyme activity
Allosteric regulation
Covalent modification
Isoenzymes or isozymes
Summary
Problems
8 Protein synthesis, processing and turnover
Cell cycle
The structure of Cdk and its role in the cell cycle
Cdk–cyclin complex regulation
DNA replication
Transcription
Eukaryotic transcription factors: variation on a ‘basic’ theme
The spliceosome and its role in transcription
Translation
Transfer RNA (tRNA)
The composition of prokaryotic and eukaryotic ribosomes
A structural basis for protein synthesis
An outline of protein synthesis
Antibiotics provide insight into protein synthesis
Affinity labelling and RNA ‘footprinting’
Structural studies of the ribosome
Post-translational modification of proteins
Protein sorting or targeting
The nuclear pore assembly
Protein turnover
Apoptosis
Summary
Problems
9 Protein expression, purification and characterization
The isolation and characterization of proteins
Recombinant DNA technology and protein expression
Purification of proteins
Centrifugation
Solubility and ‘salting out’ and ‘salting in’
Chromatography
Dialysis and ultrafiltration
Polyacrylamide gel electrophoresis
Mass spectrometry
How to purify a protein?
Summary
Problems
10 Physical methods of determining the three-dimensional structure of proteins
Introduction
The use of electromagnetic radiation
X-ray crystallography
Nuclear magnetic resonance spectroscopy
Cryoelectron microscopy
Neutron diffraction
Optical spectroscopic techniques
Vibrational spectroscopy
Raman spectroscopy
ESR and ENDOR
Summary
Problems
11 Protein folding in vivo and in vitro
Introduction
Factors determining the protein fold
Factors governing protein stability
Folding problem and Levinthal’s paradox
Models of protein folding
Amide exchange and measurement of protein folding
Kinetic barriers to refolding
In vivo protein folding
Membrane protein folding
Protein misfolding and the disease state
Summary
Problems
12 Protein structure and a molecular approach to medicine
Introduction
Sickle cell anaemia
Viruses and their impact on health as seen through structure and function
HIV and AIDS
The influenza virus
p53 and its role in cancer
Emphysema and α1-antitrypsin
Summary
Problems
Epilogue
Glossary
Appendices
Bibliography
References
Index
Copyright © 2005
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England
Telephone (+44) 1243 779777
Email (for orders and customer service enquiries): [email protected]
Visit our Home Page on www.wiley.com
All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T 4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should be addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England, or emailed to [email protected], or faxed to (+44) 1243 770620.
This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the Publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.
Other Wiley Editorial Offices
John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA
Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA
Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany
John Wiley & Sons Australia Ltd, 33 Park Road, Milton, Queensland 4064, Australia
John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809
John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada M9W 1L1
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
ISBN 0-471-49893-9 HB
ISBN 0-471-49894-7 PB
For my parents, Elizabeth and Percy Whitford, to whom I owe everything
Preface
When I first started studying proteins as an undergraduate I encountered for the first time complex areas of biochemistry arising from the pioneering work of Pauling, Sumner, Kendrew, Perutz, Anfinsen, together with other scientific ‘giants’ too numerous to describe at length in this text. The area seemed complete. How wrong I was and how wrong an undergraduate’s perception can be! The last 30 years have seen an explosion in the area of protein biochemistry so that my 1975 edition of Biochemistry by Albert Lehninger remains, perhaps, of historical interest only. The greatest change has occurred through the development of molecular biology where fragments of DNA are manipulated in ways previously unimagined. This has enabled DNA to be sequenced, cloned, manipulated and expressed in many different cells. As a result areas of recombinant DNA technology and protein engineering have evolved rapidly to become specialist disciplines in their own right. Almost any protein whose primary sequence is known can be produced in large quantity via the expression of cloned or synthetic genes in recombinant host cells. Not only is the method allowing scientists to study some proteins for the first time but the increased amount of protein derived from recombinant DNA technology is also allowing the application of new and continually advancing structural techniques. In this area X-ray crystallography has remained at the forefront for over 40 years as a method of determining protein structure but it is now joined by nuclear magnetic resonance (NMR) spectroscopy and more recently by cryoelectron microscopy whilst other methods such as circular dichroism, infrared and Raman spectroscopy, electron spin resonance spectroscopy, mass spectrometry and fluorescence provide more limited, yet often vital and complementary, structural data. 
In many instances these methods have become established techniques only in the last 20 years and are consequently absent in many of those familiar textbooks occupying the shelves of university libraries.
An even greater impact on biochemistry has occurred with the rapid development of cost-effective, powerful, desktop computers with performance equivalent to the previous generation of supercomputers. Many experimental techniques relied on the codevelopment of computer hardware but software has also played a vital role in protein biochemistry. We can now search databases comparing proteins at the level of DNA or amino acid sequences, building up patterns of homology and relationships that provide insight into origin and possible function. In addition we use computers routinely to calculate properties such as isoelectric point, number of hydrophobic residues or secondary structure – something that would have been extraordinarily tedious, time consuming and problematic 20 years ago. Computers have revolutionized all aspects of protein biochemistry and there is little doubt that their influence will continue to increase in the forthcoming decades. The new area of bioinformatics reflects these advances in computing.
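As a small illustration of the routine calculations mentioned above, the following Python sketch counts hydrophobic residues in a one-letter amino acid sequence. The set of residues treated as hydrophobic (A, V, L, I, M, F, W) is an assumption made for this example only, since published scales classify some residues differently, and the function names and the test peptide are hypothetical.

```python
from collections import Counter

# Assumption: residues treated as hydrophobic for this sketch only.
# Published hydrophobicity scales classify some residues differently.
HYDROPHOBIC = frozenset("AVLIMFW")


def count_hydrophobic(sequence: str) -> int:
    """Return the number of hydrophobic residues in a one-letter sequence."""
    counts = Counter(sequence.upper())
    return sum(n for residue, n in counts.items() if residue in HYDROPHOBIC)


def hydrophobic_fraction(sequence: str) -> float:
    """Return the fraction of residues that are hydrophobic (0.0 if empty)."""
    if not sequence:
        return 0.0
    return count_hydrophobic(sequence) / len(sequence)


if __name__ == "__main__":
    peptide = "MKVLAAGF"  # hypothetical peptide, not taken from the book
    print(count_hydrophobic(peptide), hydrophobic_fraction(peptide))
```

Changing the `HYDROPHOBIC` set is all that is needed to try a different classification.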
In my attempt to construct an introductory yet extensive text on proteins I have, of necessity, been circumspect in my description of the subject area. I have often relied on qualitative rather than quantitative descriptions and I have attempted to minimise the introduction of unwieldy equations or formulae. This does not reflect my own interests in physical biochemistry because my research, I hope, was often quantitative. In some cases particularly the chapters on enzymes and physical methods the introduction of equations is unavoidable but also necessary to an initial description of the content of these chapters. I would be failing in my duty as an educator if I omitted some of these equations and I hope students will keep going at these ‘difficult’ points or failing that just omit them entirely on first reading this book. However, in general I wish to introduce students to proteins by describing principles governing their structure and function and to avoid over-complication in this presentation through rigorous and quantitative treatment. This book is firmly intended to be a broad introductory text suitable for undergraduate and postgraduate study, perhaps after an initial exposure to the subject of protein biochemistry, whilst at the same time introducing specialist areas prior to future advanced study. I hope the following chapters will help to direct students to the amazing beauty and complexity of protein systems.
The present text should be suitable for all introductory modes of biochemistry, molecular biology, chemistry, medicine and dentistry. In the UK this generally means the book is suitable for all undergraduates between years 1 and 3 and this book has stemmed from lectures given as parts of biochemistry courses to students of biochemistry, chemistry, medicine and dentistry in all 3 years. Where possible each chapter is structured to increase progressively in complexity. For purely introductory courses as would occur in years 1 or 2 it is sufficient to read only the first parts, or selected sections, of each chapter. More advanced courses may require thorough reading of each chapter together with consultation of the bibliography and secondly the list of references given at the end of the book.
In the last ten years the world wide web (WWW) has transformed information available to students. It provides a new and useful medium with which to deliver lecture notes and an exciting and new teaching resource for all. Consequently within this book URLs direct students to learning resources and a list of important addresses is included in the appendix. In an effort to exploit the power of the internet this book is associated with ‘web-based’ tutorials, problems and content and is accessed from the following URL http://www.wiley.com/go/whitfordproteins. These ‘pages’ are continually updated and point the interested reader towards new areas as they emerge. The Bibliography points interested readers towards further study material suitable for a first introduction to a subject whilst the list of references provides original sources for many areas covered in each of the twelve chapters.
For the problems included at the end of each chapter there are approximately 10 questions that aim to build on the subject matter discussed in the preceding text. Often the questions will increase in difficulty although this is not always the case. In this book I have limited the bibliography to broad reviews or accessible journal papers and I have deliberately restricted the number of ‘high-powered’ (difficult!) articles since I believe this organization is of greater use to students studying these subjects for the first time. To aid the learning process the web edition has multiple-choice questions for use as a formative assessment exercise. I should certainly like to hear of all mistakes or omissions encountered in this text and my hope is that educators and students will let me know via the e-mail address at the end of this section of any required corrections or additions.
Proteins are three-dimensional (3D) objects that are inadequately represented on book pages. Consequently many proteins are best viewed as molecular images using freely available software. Here, real-time manipulation of coordinate files is possible and will prove helpful to understanding aspects of structure and function. The importance of viewing, manipulating and even changing the representation of proteins to comprehending structure and function cannot be overestimated. Experience has suggested that the use of computers in this area can have a dramatic effect on students’ understanding of protein structures. The ability to visualize in 3D conveys so much information – far more than any simple 2D picture in this book could ever hope to portray. Alongside many figures I have written the Protein DataBank files (e.g. PDB: 1HKO) used to produce diagrams. These files can be obtained from databases at several permanent sites based around the world such as http://www.rcsb.org/pdb or one of the many ‘mirrors’ that exist (for example, in the UK this data is found at http://pdb.ccdc.cam.ac.uk). For students with Internet access each PDB file can be retrieved and manipulated independently to produce comparable images to those shown in the text. To explore these macromolecular images with reasonable efficiency does not require the latest ‘all-powerful’ desktop computer. A computer with a Pentium III (or later) based processor, a clock speed of 200 MHz or greater, 32–64 MB RAM, hard disks of 10 GB, a graphics video card with at least 8 MB memory and a connection to the internet is sufficient to view and store a significant number of files together with representative images. Of course things are easier with a computer with a surfeit of memory (>256 MB) and a high ‘clock’ speed (>2 GHz) but it is not obligatory to see ‘on-line’ content or to manipulate molecular images. 
This book was started on a 700 MHz Pentium III based processor equipped with 256 MB RAM and 16 MB graphics card.
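To give a rough idea of what manipulating a coordinate file involves, the Python sketch below reads atomic coordinates from ATOM records. The column positions follow the fixed-width ATOM record layout of the PDB file format. The `Atom` type, the function names and the sample record are hypothetical, and the coordinates are illustrative rather than taken from a real entry.

```python
from typing import Iterable, List, NamedTuple


class Atom(NamedTuple):
    name: str
    residue: str
    chain: str
    residue_number: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM record using the fixed column layout of the PDB format."""
    return Atom(
        name=line[12:16].strip(),          # columns 13-16: atom name
        residue=line[17:20].strip(),       # columns 18-20: residue name
        chain=line[21],                    # column 22: chain identifier
        residue_number=int(line[22:26]),   # columns 23-26: residue number
        x=float(line[30:38]),              # columns 31-38: x (angstroms)
        y=float(line[38:46]),              # columns 39-46: y (angstroms)
        z=float(line[46:54]),              # columns 47-54: z (angstroms)
    )


def read_atoms(lines: Iterable[str]) -> List[Atom]:
    """Return the atoms described by ATOM records, skipping other records."""
    return [parse_atom_line(line) for line in lines if line.startswith("ATOM")]


# Hypothetical record, assembled with explicit field widths so that the
# column alignment does not depend on how this page preserves whitespace.
SAMPLE = "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   {:8.3f}{:8.3f}{:8.3f}".format(
    "ATOM", 1, " N", "", "MET", "A", 1, "", 27.340, 24.430, 2.614
)

if __name__ == "__main__":
    for atom in read_atoms([SAMPLE, "REMARK not a coordinate line"]):
        print(atom)
```

In practice the lines would come from a downloaded PDB file opened with `open(...)`, and dedicated molecular viewers handle many more record types than this sketch does.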
This book will address the structure and function of proteins in 12 subsequent chapters each with a definitive theme. After an initial chapter describing why one would wish to study proteins and a brief historical background the second chapter deals with the ‘building blocks’ of proteins, namely the amino acids together with their respective chemical and physical properties. No attempt is made at any point to describe the metabolism connected with these amino acids and the reader should consult general textbooks for descriptions of the synthesis and degradation of amino acids. This is a major area in its own right and would have lengthened the present book too much. However, I would like to think that students will not avoid these areas because they remain an equally important subject that should be covered at some point within the undergraduate curriculum. Chapter 3 covers the assembly of amino acids into polypeptide chains and levels of organizational structure found within proteins. Almost all detailed knowledge of protein structure and function has arisen through studies of globular proteins but the presence of fibrous proteins with different structures and functional properties necessitated a separate chapter devoted to this area (Chapter 4). Within this class the best understood structures are those belonging to the collagen class of proteins, the keratins and the extended β sheet structures such as silk fibroin. The division between globular proteins and fibrous proteins was made at a time when the only properties one could compare readily were a protein’s amino acid composition and hydrodynamic radius. It is now apparent that other proteins exist with properties intermediate between globular and fibrous proteins that do not lend themselves to simple classification. However, the ‘old’ schemes of identification retain their value and serve to emphasize differences in proteins.
Membrane proteins represent a third group with different composition and properties. Most of these proteins are poorly understood, but there have been spectacular successes from the initial low-resolution structure of bacteriorhodopsin to the highly defined structure of bacterial photosynthetic reaction centres. These advances paved the way towards structural studies of G proteins and G-protein coupled receptors, the respiratory complexes from aerobic bacteria and the structure of ATP synthetases.
Chapter 6 focuses both on experimental and computational methods of comparing proteins where in silico methods have become increasingly important as a vital tool to assist with modern protein biochemistry. Chapter 7 focuses on enzymes and by discussing basic reaction rate theories and kinetics the chapter leads to a discussion of enzyme-catalysed reactions. Enzymes catalyse reactions through a variety of mechanisms including acid–base catalysis, nucleophilic driven chemistry and transition state stabilization. These and other mechanisms are described along with the principles of regulation, active site chemistry and binding.
The involvement of proteins in the cell cycle, transcription, translation, sorting and degradation of proteins is described in Chapter 8. In 50 years we have progressed from elucidating the structure of DNA to uncovering how this information is converted into proteins. The chapter is based around the structure of two macromolecular systems: the ribosome devoted towards accurate and efficient synthesis and the proteasome designed to catalyse specific proteolysis. Chapter 9 deals with the methods of protein purification. Very often, biochemistry textbooks describe techniques without placing the technique in the correct context. As a result, in Chapter 9 I have attempted to describe equipment as well as techniques so that students may obtain a proper impression of this area.
Structural methods determine the topology or fold of proteins. With an elucidation of structure at atomic levels of resolution comes an understanding of biological function. Chapter 10 addresses this area by describing different techniques. X-ray crystallography remains at the forefront of research with new variations of the basic principle allowing faster determination of structure at improved resolution. NMR methods yield structures of comparable resolution to crystallography for small soluble proteins. In ideal situations these methods provide complete structural determination of all heavy atoms but they are complemented by other spectroscopic methods such as absorbance and fluorescence methods, mass spectrometry and infrared spectroscopy. These techniques provide important ancillary information on tertiary structure such as the helical content of the protein, the proportion and environment of aromatic residues within a protein as well as secondary structure content.
Chapter 11 describes protein folding and stability – a subject that has generated intense research interest with the recognition that disease states arise from aberrant folding or stability. The mechanism of protein folding is illustrated by in vitro and in vivo studies. Whilst the broad concepts underlying protein folding were deduced from studies of ‘model’ proteins such as ribonuclease, analysis of cell folding pathways has highlighted specialised proteins, chaperones, with a critical function to the overall process. The GroES–GroEL complex is discussed to highlight the integrated process of synthesis and folding in vivo.
The final chapter builds on the preceding 11 chapters using a restricted set of well-studied proteins (case studies) with significant impact on molecular medicine. These proteins include haemoglobin, viral proteins, p53, prions and α1-antitrypsin. Although still a young subject area this branch of protein science will expand in the next few years and will rely on the techniques, knowledge and principles elucidated in Chapters 1–11. The examples emphasize the impact of protein science and molecular medicine on the quality of human life.
I am indebted to all research students and post-docs who shared my laboratories at the Universities of London and Oxford during the last 15 years in many cases acting as ‘test subjects’ for teaching ideas. I should like to thank Drs Roger Hewson, Richard Newbold and Susan Manyusa whose comments throughout my research and teaching career were always valued. I would also like to thank individuals, too numerous to name, with whom I interacted at King’s College London, Imperial College of Science, Technology and Medicine and the University of Oxford. In this context I should like to thank Dr John Russell, formerly of Imperial College London whose goodwill, humour and fantastic insight into the history of science, the scientific method and ‘day to day’ experimentation prevented absolute despair.
During preparation of this book many individuals read and contributed valuable comments to the manuscript’s content, phrasing and ideas. In particular I wish to thank these unnamed and sometimes unknown individuals who read one or more of the chapters of this book. As is often said by most authors at this point, despite their valuable contributions, all of the remaining errors and deficiencies in the current text are my responsibility. In this context I could easily have spent more months attempting to perfect the current text. I am very aware that this text has deficiencies but I hope these defects will not detract from its value. In addition my wish to try other avenues, other roads not taken, dictates that this manuscript is completed without delay.
Writing and producing a textbook would not be possible without the support of a good publisher. I should like to thank all the staff at John Wiley & Sons, Chichester, UK. This exhaustive list includes particularly Andrew Slade as senior Publishing Editor who helped smooth the bumpy route towards production of this book, Lisa Tickner who first initiated events leading to commissioning this book, Rachel Ballard who supervised day to day business on this book, replacing every form I lost without complaint and monitoring tactfully and gently about possible completion dates, Robert Hambrook who translated my text and diagrams into a beautiful book, and the remainder of the production team of John Wiley and Sons. Together we inched our way towards the painfully slow production of this text, although the pace was entirely attributable to the author.
Lastly I must also thank Susan who tolerated the protracted completion of this book, reading chapters and offering support for this project throughout whilst coping with the arrival of Alexandra and Ethan effortlessly (unlike their father).
David Whitford
April [email protected]
1
An Introduction to protein structure and function
Biochemistry has exploded as a major scientific endeavour over the last one hundred years to rival previously established disciplines such as chemistry and physics. This occurred with the recognition that living systems are based on the familiar elements of organic chemistry (carbon, oxygen, nitrogen and hydrogen) together with the occasional involvement of inorganic chemistry and elements such as iron, copper, sodium, potassium and magnesium. More importantly the laws of physics including those concerning thermodynamics, electricity and quantum physics are applicable to biochemical systems and no ‘vital’ force distinguishes living from non-living systems. As a result the laws of chemistry and physics are successfully applied to biochemistry and ideas from physics and chemistry have found widespread application, frequently revolutionizing our understanding of complex systems such as cells.
This book focuses on one major component of all living systems – the proteins. Proteins are found in all living systems ranging from bacteria and viruses through the unicellular and simple eukaryotes to vertebrates and higher mammals such as humans. Proteins make up over 50 percent of the dry weight of cells and are present in greater amounts than any other biomolecule. Proteins are unique amongst the macromolecules in underpinning every reaction occurring in biological systems. It goes without saying that one should not ignore the other components of living systems since they have indispensable roles, but in this text we will consider only proteins.
With the vast accumulation of knowledge about proteins over the last 50 years it is perhaps surprising to discover that the term protein was introduced nearly 170 years ago. One early description was by Gerhardus Johannes Mulder in 1839 where his studies on the composition of animal substances, chiefly fibrin, albumin and gelatin, showed the presence of carbon, hydrogen, oxygen and nitrogen. In addition he recognized that sulfur and phosphorus were present sometimes in ‘animal substances’ that contained large numbers of atoms. In other words, he established that these ‘substances’ were macromolecules. Mulder communicated his results to Jöns Jakob Berzelius and it is suggested the term protein arose from this interaction where the origin of the word protein has been variously ascribed to derivation from the Latin word primarius or from the Greek god Proteus. The definition of proteins was timely since in 1828 Friedrich Wöhler had shown that heating ammonium cyanate resulted in its isomerization to urea (Figure 1.1). Organic compounds characteristic of living systems, such as urea, could be derived from simple inorganic chemicals. For many historians this marks the beginning of biochemistry and it is appropriate that the discovery of proteins occurred at the same period.
Figure 1.1 The decomposition of ammonium cyanate yields urea
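The conversion of ammonium cyanate into urea is an isomerization, so the two compounds must share the same atomic composition. The Python sketch below checks this by tallying atoms from the groups in each formula, NH4OCN and (NH2)2CO. The helper name and the way the formulas are split into groups are choices made for this illustration.

```python
from collections import Counter


def composition(groups):
    """Sum atom counts over (atoms, multiplier) groups of a formula."""
    total = Counter()
    for atoms, multiplier in groups:
        for element, n in atoms.items():
            total[element] += n * multiplier
    return total


# Ammonium cyanate, NH4+ OCN-: one ammonium ion and one cyanate ion.
ammonium_cyanate = composition([
    ({"N": 1, "H": 4}, 1),
    ({"O": 1, "C": 1, "N": 1}, 1),
])

# Urea, (NH2)2CO: two amino groups attached to one carbonyl group.
urea = composition([
    ({"N": 1, "H": 2}, 2),
    ({"C": 1, "O": 1}, 1),
])

if __name__ == "__main__":
    print(dict(ammonium_cyanate))
    print(dict(urea))
    print("isomers:", ammonium_cyanate == urea)
```

Both tallies come to one carbon, four hydrogens, two nitrogens and one oxygen, which is what makes the reaction a rearrangement rather than a change in composition.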
The development of biochemistry and the study of proteins was assisted by analysis of their composition and structure by Heinrich Hlasiwetz and Josef Habermann around 1873 and the recognition that proteins were made up of smaller units called amino acids. They established that hydrolysis of casein with strong acids or alkali yielded glutamic acid, aspartic acid, leucine, tyrosine and ammonia whilst the hydrolysis of other proteins yielded a different group of products. Importantly their work suggested that the properties of proteins depended uniquely on the constituent parts – a theme that is equally relevant today in modern biochemical study.
Another landmark in the study of proteins occurred in 1902 with Franz Hofmeister establishing the constituent atoms of the peptide bond with the polypeptide backbone derived from the condensation of free amino acids. Five years earlier Eduard Buchner revolutionized views of protein function by demonstrating that yeast cell extracts catalysed fermentation of sugar into ethanol and carbon dioxide. Previously it was believed that only living systems performed this catalytic function. Emil Fischer further studied biological catalysis and proposed that components of yeast, which he called enzymes, combined with sugar to produce an intermediate compound. With the realization that cells were full of enzymes 100 years of research has developed and refined these discoveries. Further landmarks in the study of proteins could include Sumner’s crystallization of the first enzyme (urease) in 1926 and Pauling’s description of the geometry of the peptide bond; however, extensive discussion of these advances and many other important discoveries in protein biochemistry is best left to history of science textbooks.
A brief look at the award of the Nobel Prizes for Chemistry, Physiology and Medicine since 1900 highlighted in Table 1.1 reveals the involvement of many diverse areas of science in protein biochemistry. At first glance it is not obvious why William and Lawrence Bragg’s discovery of the diffraction of X-rays by sodium chloride crystals is relevant, but diffraction by protein crystals is the main route towards biological structure determination. Their discovery was the first step in the development of this technique. Discoveries in chemistry and physics have been implemented rapidly in the study of proteins. By 1958 Max Perutz and John Kendrew had determined the first protein structure and this was soon followed by the larger, multiple subunit, structure of haemoglobin and the first enzyme, lysozyme. This remarkable advance in knowledge extended from initial understanding of the atomic composition of proteins around 1900 to the determination of the three-dimensional structure of proteins in the 1960s and represents a major chapter of modern biochemistry. However, advances have continued with new areas of molecular biology proving equally important to understanding protein structure and function.
Life may be defined as the ordered interaction of proteins and all forms of life from viruses to complex, specialized, mammalian cells are based on proteins made up of the same building blocks or amino acids. Proteins found in simple unicellular organisms such as bacteria are often closely similar in structure and function to those found in human cells, illustrating the evolutionary lineage from simple to complex organisms.
Molecular biology starts with the dramatic elucidation of the structure of the DNA double helix by James Watson, Francis Crick, Rosalind Franklin and Maurice Wilkins in 1953. Today, details of DNA replication, transcription into RNA and the synthesis of proteins (translation) are extensive. This has established an enormous body of knowledge representing a whole new subject area. All cells encode the information content of proteins within genes, or more accurately the order of bases along the DNA strand, yet it is the conversion of this information or expression into proteins that represents the tangible evidence of a living system or life.
Table 1.1 Selected landmarks in the study of protein structure and function from 1900 to 2002 as seen by the award of the Nobel Prize for Chemistry, Physiology or Medicine
| Date | Discoverer and discovery |
| --- | --- |
| 1901 | Wilhelm Conrad Röntgen ‘in recognition of the…discovery of the remarkable rays subsequently named after him’ |
| 1907 | Eduard Buchner ‘cell-free fermentation’ |
| 1914 | Max von Laue ‘for his discovery of the diffraction of X-rays by crystals’ |
| 1915 | William Henry Bragg and William Lawrence Bragg ‘for their services in the analysis of crystal structure by…X-rays’ |
| 1923 | Frederick Grant Banting and John James Richard Macleod ‘for the discovery of insulin’ |
| 1930 | Karl Landsteiner ‘for his discovery of human blood groups’ |
| 1946 | James Batcheller Sumner ‘for his discovery that enzymes can be crystallized’; John Howard Northrop and Wendell Meredith Stanley ‘for their preparation of enzymes and virus proteins in pure form’ |
| 1948 | Arne Wilhelm Kaurin Tiselius ‘for his research on electrophoresis and adsorption analysis, especially for his discoveries concerning the complex nature of the serum proteins’ |
| 1952 | Archer John Porter Martin and Richard Laurence Millington Synge ‘for their invention of partition chromatography’ |
| 1952 | Felix Bloch and Edward Mills Purcell ‘for their development of new methods for nuclear magnetic precision measurements and discoveries in connection therewith’ |
| 1954 | Linus Carl Pauling ‘for his research into the nature of the chemical bond and…to the elucidation of…complex substances’ |
| 1958 | Frederick Sanger ‘for his work on the structure of proteins, especially that of insulin’ |
| 1959 | Severo Ochoa and Arthur Kornberg ‘for their discovery of the mechanisms in the biological synthesis of ribonucleic acid and deoxyribonucleic acid’ |
| 1962 | Max Ferdinand Perutz and John Cowdery Kendrew ‘for their studies of the structures of globular proteins’ |
| 1962 | Francis Harry Compton Crick, James Dewey Watson and Maurice Hugh Frederick Wilkins ‘for their discoveries concerning the molecular structure of nucleic acids and its significance for information transfer in living material’ |
| 1964 | Dorothy Crowfoot Hodgkin ‘for her determinations by X-ray techniques of the structures of important biochemical substances’ |
| 1965 | François Jacob, André Lwoff and Jacques Monod ‘for discoveries concerning genetic control of enzyme and virus synthesis’ |
| 1968 | Robert W. Holley, Har Gobind Khorana and Marshall W. Nirenberg ‘for…the genetic code and its function in protein synthesis’ |
| 1969 | Max Delbrück, Alfred D. Hershey and Salvador E. Luria ‘for their discoveries concerning the replication mechanism and the genetic structure of viruses’ |
| 1972 | Christian B. Anfinsen ‘for his work on ribonuclease, especially concerning the connection between the amino acid sequence and the biologically active conformation’; Stanford Moore and William H. Stein ‘for their contribution to the understanding of the connection between chemical structure and catalytic activity of…the ribonuclease molecule’ |
| 1972 | Gerald M. Edelman and Rodney R. Porter ‘for their discoveries concerning the chemical structure of antibodies’ |
| 1975 | John Warcup Cornforth ‘for his work on the stereochemistry of enzyme-catalyzed reactions’; Vladimir Prelog ‘for his research into the stereochemistry of organic molecules and reactions’ |
| 1975 | David Baltimore, Renato Dulbecco and Howard Martin Temin ‘for their discoveries concerning the interaction between tumour viruses and the genetic material of the cell’ |
| 1978 | Werner Arber, Daniel Nathans and Hamilton O. Smith ‘for the discovery of restriction enzymes and their application to problems of molecular genetics’ |
| 1980 | Paul Berg ‘for his fundamental studies of the biochemistry of nucleic acids, with particular regard to recombinant-DNA’; Walter Gilbert and Frederick Sanger ‘for their contributions concerning the determination of base sequences in nucleic acids’ |
| 1982 | Aaron Klug ‘development of crystallographic electron microscopy and structural elucidation of nucleic acid-protein complexes’ |
| 1984 | Robert Bruce Merrifield ‘for his development of methodology for chemical synthesis on a solid matrix’ |
| 1984 | Niels K. Jerne, Georges J.F. Köhler and César Milstein ‘for theories concerning the specificity in development and control of the immune system and the discovery of the principle for production of monoclonal antibodies’ |
| 1988 | Johann Deisenhofer, Robert Huber and Hartmut Michel ‘for the determination of the structure of a photosynthetic reaction centre’ |
| 1989 | J. Michael Bishop and Harold E. Varmus ‘for their discovery of the cellular origin of retroviral oncogenes’ |
| 1991 | Richard R. Ernst ‘for…the methodology of high resolution nuclear magnetic resonance spectroscopy’ |
| 1992 | Edmond H. Fischer and Edwin G. Krebs ‘for their discoveries concerning reversible protein phosphorylation as a biological regulatory mechanism’ |
| 1993 | Kary B. Mullis ‘for his invention of the polymerase chain reaction (PCR) method’; Michael Smith ‘for his fundamental contributions to the establishment of oligonucleotide-based, site-directed mutagenesis’ |
| 1994 | Alfred G. Gilman and Martin Rodbell ‘for their discovery of G-proteins and the role of these proteins in signal transduction’ |
| 1997 | Paul D. Boyer and John E. Walker ‘for their elucidation of the enzymatic mechanism underlying the synthesis of adenosine triphosphate (ATP)’; Jens C. Skou ‘for the first discovery of an ion-transporting enzyme, Na+,K+-ATPase’ |
| 1997 | Stanley B. Prusiner ‘for his discovery of prions, a new biological principle of infection’ |
| 1999 | Günter Blobel ‘for the discovery that proteins have intrinsic signals that govern their transport and localization in the cell’ |
| 2000 | Arvid Carlsson, Paul Greengard and Eric Kandel ‘signal transduction in the nervous system’ |
| 2001 | Paul Nurse, Tim Hunt and Leland Hartwell ‘for discoveries of key regulators of the cell cycle’ |
| 2002 | Kurt Wüthrich ‘for development of NMR spectroscopy as a method of determining the structure of biological macromolecules in solution’; John B. Fenn and Koichi Tanaka ‘for their development of soft desorption ionization methods for mass spectrometric analyses of biological macromolecules’; Sydney Brenner, H. Robert Horvitz and John E. Sulston ‘for their discoveries concerning genetic regulation of organ development and programmed cell death’ |
Cells divide, synthesize new products, secrete unwanted products and generate chemical energy to sustain these processes via specific chemical reactions; in all of these examples the common theme is mediation by proteins.
In 1944 the physicist Erwin Schrödinger posed the question ‘What is Life?’ in an attempt to understand the physical properties of a living cell. Schrödinger suggested that living systems obeyed all laws of physics and should not be viewed as exceptional but instead reflected the statistical nature of these laws. More importantly, living systems are amenable to study using many of the techniques familiar to chemistry and physics. The last 50 years of biochemistry have demonstrated this hypothesis emphatically with tools developed by physicists and chemists rapidly employed in biological studies. A casual perusal of Table 1.1 shows how quickly methodologies progress from discovery to application.
Proteins have diverse biological functions, ranging from replicating DNA and forming cytoskeletal structures to transporting oxygen around the bodies of multicellular organisms and converting one molecule into another. The types of functional properties are almost endless and are continually being increased as we learn more about proteins. Some important biological functions are outlined in Table 1.2 but it is to be expected that this rudimentary list of properties will expand each year as new proteins are characterized. A formal demarcation of proteins into one class should not be pursued too far since proteins can have multiple roles or functions; many proteins do not lend themselves easily to classification schemes. However, for all chemical reactions occurring in cells a protein is involved intimately in the biological process. These proteins are united through their composition, which is based on the same group of 20 amino acids, yet they differ in the proportions of each – some contain a surfeit of one amino acid whilst others may lack one or two members of the group of 20 entirely. It was realized early in the study of proteins that variation in size and complexity is common and the molecular weight and number of subunits (polypeptide chains) show tremendous diversity. There is no correlation between size and number of polypeptide chains. For example, insulin has a relative molecular mass of 5700 and contains two polypeptide chains, haemoglobin has a mass of approximately 65 000 and contains four polypeptide chains, and hexokinase is a single polypeptide chain with an overall mass of ~100 000 (see Table 1.3).
Table 1.2 A selective list of some functional roles for proteins within cells
| Function | Examples |
| --- | --- |
| Enzymes or catalytic proteins | Trypsin, DNA polymerases and ligases |
| Contractile proteins | Actin, myosin, tubulin, dynein |
| Structural or cytoskeletal proteins | Tropocollagen, keratin |
| Transport proteins | Haemoglobin, myoglobin, serum albumin, ceruloplasmin, transthyretin |
| Effector proteins | Insulin, epidermal growth factor, thyroid stimulating hormone |
| Defence proteins | Ricin, immunoglobulins, venoms and toxins, thrombin |
| Electron transfer proteins | Cytochrome oxidase, bacterial photosynthetic reaction centre, plastocyanin, ferredoxin |
| Receptors | CD4, acetylcholine receptor |
| Repressor proteins | Jun, Fos, Cro |
| Chaperones (accessory folding proteins) | GroEL, DnaK |
| Storage proteins | Ferritin, gliadin |
Table 1.3 The molecular masses of proteins together with the number of subunits. The term ‘subunit’ is used interchangeably with ‘polypeptide chain’
| Protein | Molecular mass | Subunits |
| --- | --- | --- |
| Insulin | 5700 | 2 |
| Haemoglobin | 64 500 | 4 |
| Tropocollagen | 285 000 | 3 |
| Subtilisin | 27 500 | 1 |
| Ribonuclease | 12 600 | 1 |
| Aspartate transcarbamoylase | 310 000 | 12 |
| Bacteriorhodopsin | 26 800 | 1 |
| Hexokinase | 102 000 | 1 |
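The spread of values in Table 1.3 can be made more concrete by dividing each molecular mass by the number of subunits. The short Python sketch below does this for the tabulated values only; the names `PROTEINS` and `mean_subunit_mass` are illustrative, and the result is an average mass per chain that assumes nothing about whether the chains of a given protein are identical.

```python
# Values copied from Table 1.3: (molecular mass, number of subunits).
PROTEINS = {
    "Insulin": (5700, 2),
    "Haemoglobin": (64500, 4),
    "Tropocollagen": (285000, 3),
    "Subtilisin": (27500, 1),
    "Ribonuclease": (12600, 1),
    "Aspartate transcarbamoylase": (310000, 12),
    "Bacteriorhodopsin": (26800, 1),
    "Hexokinase": (102000, 1),
}


def mean_subunit_mass(total_mass: float, subunits: int) -> float:
    """Average mass per polypeptide chain (chains need not be identical)."""
    return total_mass / subunits


if __name__ == "__main__":
    # List proteins in order of increasing total mass.
    for name, (mass, n) in sorted(PROTEINS.items(), key=lambda kv: kv[1][0]):
        per_chain = mean_subunit_mass(mass, n)
        print(f"{name:30s} {mass:>7d} {n:>3d} {per_chain:>10.0f}")
```

On these figures the average mass per chain ranges from a few thousand (insulin) to about 100 000 (hexokinase), so a single chain can be heavier than an entire multi-subunit protein.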
Proteins are joined covalently and non-covalently with other biomolecules including lipids, carbohydrates, nucleic acids, phosphate groups, flavins, heme groups and metal ions. Components such as hemes or metal ions are often called prosthetic groups. Complexes formed between lipids and proteins are lipoproteins, those with carbohydrates are called glycoproteins, whilst complexes with metal ions lead to metalloproteins, and so on. The complexes formed between metal ions and proteins increase the involvement of elements of the periodic table beyond that expected of typical organic molecules (namely carbon, hydrogen, nitrogen and oxygen). Inspection of the periodic table (Figure 1.2) shows that at least 20 elements have been implicated directly in the structure and function of proteins (Table 1.4). Surprisingly, elements such as aluminium and silicon that are very abundant in the Earth’s crust (8.1 and 25.7 percent by weight, respectively) do not occur in high concentration within cells. Aluminium is rarely, if ever, found as part of proteins whilst the role of silicon is confined to biomineralization where it is the core component of shells. The involvement of carbon, hydrogen, oxygen, nitrogen, phosphorus and sulfur is clear although the role of other elements, particularly transition metals, has been difficult to establish. Where transition metals occur in proteins there is frequently only one metal atom per molecule of protein, and in the past this led to a failure to detect the metal. Other elements have an inferred involvement from growth studies showing that depletion from the diet leads to an inhibition of normal cellular function. For metalloproteins the absence of the metal can lead to a loss of structure and function.
Metals such as Mo, Co and Fe are often found associated with organic co-factors such as pterin, flavins, cobalamin and porphyrin (Figure 1.3). These organic ligands hold metal centres and are often tightly associated with proteins.
Table 1.4 The involvement of trace elements in the structure and function of proteins
| Element | Functional role |
| --- | --- |
| Sodium | Principal extracellular ion, osmotic balance |
| Potassium | Principal intracellular ion, osmotic balance |
| Magnesium | Bound to ATP/GTP in nucleotide binding proteins, found as structural component of hydrolase and isomerase enzymes |
| Calcium | Activator of calcium binding proteins such as calmodulin |
| Vanadium | Bound to enzymes such as chloroperoxidase |
| Manganese | Component of the water-splitting enzyme in higher plants |
| Iron | Important catalytic component of heme enzymes involved in oxygen transport as well as electron transfer. Important examples are haemoglobin, cytochrome oxidase and catalase |
| Cobalt | Metal component of vitamin B12 found in many enzymes |
| Nickel | Co-factor found in hydrogenase enzymes |
| Copper | Involved as co-factor in oxygen transport systems and electron transfer proteins such as haemocyanin and plastocyanin |
| Zinc | Catalytic component of enzymes such as carbonic anhydrase and superoxide dismutase |
| Molybdenum | Bound to pterin co-factor in enzymes such as xanthine oxidase or sulphite oxidase. Also found in nitrogenase |
| Chlorine | Principal extracellular anion, osmotic balance |
| Iodine | Iodinated tyrosine residues form part of the hormone thyroxine and are bound to proteins |
| Selenium | Bound at active centre of glutathione peroxidase |
Figure 1.2 The periodic table, with the elements known to be involved in the structure and/or function of proteins highlighted in red. The involvement of some elements is contentious: tungsten and cadmium are claimed to be associated with proteins, yet these elements are also known to be toxic
Figure 1.3 Organic co-factors found in proteins. These co-factors are pterin, the isoalloxazine ring found as part of flavin in FAD and FMN, the pyridine ring of NAD and its close analogue NADP, and the porphyrin skeletons of heme and chlorophyll. R represents the remaining part of the co-factor whilst M and V signify methyl and vinyl side chains
Recognition of the diverse roles of proteins in biological systems increased largely as a result of the enormous amount of sequencing information generated via the Human Genome Mapping project. Similar schemes aimed at deciphering the genomes of Escherichia coli, yeast (Saccharomyces cerevisiae) and mouse provided related information. With the completion of the first draft of the human genome mapping project in 2001 it was estimated that human chromosomes contain approximately 25 000–30 000 genes. This allows a conservative estimate of the number of polypeptides making up most human cells as ~25 000, although alternative splicing of genes and variations in subunit composition increase the number of proteins further. Despite sequencing the human genome it is an unfortunate fact that we do not know the role performed by most proteins. Of those thousands of polypeptides we know the structures of only a small number, emphasizing a large imbalance between the abundance of sequence data and the presence of structure/function information. An analysis of protein databases suggests about 1000 distinct structures or folds have been determined for globular proteins. Many proteins are retained within cell membranes and we know virtually nothing about the structures of these proteins and only slightly more about their functional roles. This observation has enormous consequences for understanding protein structure and function.
The question ‘Why study proteins?’ is often asked, not entirely without reason, by many undergraduates during their first introduction to the subject. Perhaps the best reply that can be given is that proteins underpin every aspect of biological activity. This is particularly important in areas where protein structure and function have an impact on human endeavour such as medicine. Advances in molecular genetics reveal that many diseases stem from specific protein defects. A classic example is cystic fibrosis, an inherited condition that alters a protein, called the cystic fibrosis transmembrane conductance regulator (CFTR), involved in the transport of sodium and chloride across epithelial cell membranes. This defect is found in Caucasian populations at a ratio of ~1 in 20, a surprisingly high frequency. This is the proportion of the population ‘carrying’ a single defective copy of the gene; individuals who inherit a defective copy from each parent suffer from the disease. In the UK the incidence of cystic fibrosis is approximately 1 in 2000 live births, making it one of the most common inherited disorders. The disease results in the body producing a thick, sticky mucus that blocks the lungs, leading to serious infection, and inhibits the pancreas, stopping digestive enzymes from reaching the intestines where they are required to digest food. The severity of cystic fibrosis is related to CFTR gene mutation, and the most common mutation, found in approximately 65 percent of all cases, involves the deletion of a single amino acid residue from the protein at position 508. A loss of one residue out of a total of nearly 1500 amino acid residues results in a severe decrease in the quality of life, with individuals suffering from this disease requiring constant medical care and supervision.
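The link between the carrier frequency and the incidence of the disease can be checked with a few lines of arithmetic. The Python sketch below is a simplified estimate, not an epidemiological model: it assumes random mating, that both parents must be carriers, and that each child of two carriers has a one in four chance of inheriting both defective copies. The function name `expected_incidence` is illustrative.

```python
from fractions import Fraction


def expected_incidence(carrier_frequency: Fraction) -> Fraction:
    """Estimate incidence of a recessive disorder from the carrier frequency.

    Assumptions (simplifications): random mating, both parents must be
    carriers, and a child of two carriers inherits both defective copies
    with probability 1/4.
    """
    both_parents_carriers = carrier_frequency * carrier_frequency
    return both_parents_carriers * Fraction(1, 4)


incidence = expected_incidence(Fraction(1, 20))
print(incidence)  # 1/1600
```

Under these assumptions a carrier frequency of 1 in 20 gives an expected incidence of 1 in 1600, the same order of magnitude as the figure of approximately 1 in 2000 live births quoted for the UK.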
Figure 1.4 The shape of erythrocytes in normal individuals and in sickle cell anaemia. The sickle shape arises from a mutation to the haemoglobin found within the red blood cell. (Reproduced with permission from Voet, D, Voet, J.G and Pratt, C.W. Fundamentals of Biochemistry. John Wiley & Sons Inc.)
Further examples emphasize the need to understand more about proteins. The pioneering studies of Vernon Ingram in the 1950s showed that sickle cell anaemia arose from a mutation in the β chain of haemoglobin. Haemoglobin is a tetrameric protein containing two α and two β chains. In each of the β chains a mutation is found that involves the change of the sixth amino acid residue from a glutamic acid to a valine. The alteration of two residues out of 574 leads to a drastic change in the appearance of red blood cells from their normal biconcave disks to an elongated sickle shape (Figure 1.4).
As the name of the disease suggests, individuals are anaemic, showing decreased haemoglobin content in red blood cells from approximately 15 g per 100 ml to under half that figure, and suffer frequent illness. Our understanding of cystic fibrosis and of sickle cell anaemia has advanced in parallel with our understanding of protein structure and function although at best we have very limited and crude means of treating these diseases.
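The residue count quoted for sickle cell haemoglobin, and the single substitution in each β chain, can be illustrated with a short Python sketch. The chain lengths (141 residues for α, 146 for β) and the one-letter sequence of the first eight residues of the mature β chain are not given in the text above and are supplied here as assumptions for illustration; the helper name `substitute` is hypothetical.

```python
# Assumed chain lengths for human haemoglobin (not stated in the text above).
ALPHA_LENGTH = 141
BETA_LENGTH = 146

# Haemoglobin is a tetramer of two alpha and two beta chains.
total_residues = 2 * ALPHA_LENGTH + 2 * BETA_LENGTH
print(total_residues)  # 574

# Assumed first eight residues of the mature beta chain, one-letter code.
BETA_N_TERMINUS = "VHLTPEEK"


def substitute(sequence: str, position: int, new_residue: str) -> str:
    """Return a copy of sequence with the residue at a 1-based position replaced."""
    index = position - 1
    return sequence[:index] + new_residue + sequence[index + 1:]


# Residue 6 is glutamic acid (E); in sickle cell haemoglobin it is valine (V).
sickle_n_terminus = substitute(BETA_N_TERMINUS, 6, "V")
print(sickle_n_terminus)  # VHLTPVEK
```

Because the substitution occurs once in each of the two β chains, the tetramer differs from normal haemoglobin at two positions out of 574.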
However, perhaps the greatest impetus to understand protein structure and function lies in the hope of overcoming two major health issues confronting the world in the 21st century. The first of these is cancer. Cancer is the uncontrolled proliferation of cells that have lost their normal regulated cell division often in response to a genetic or environmental trigger. The development of cancer is a multistep, multifactorial process often occurring over decades but the precise involvement of specific proteins has been demonstrated in some instances. One of the best examples is a protein called p53, normally present at low levels in cells, that ‘switches on’ in response to cellular damage and as a transcription factor controls the cell cycle process. Mutations in p53 alter the normal cycle of events leading eventually to cancer and several tumours including lung, colorectal and skin carcinomas are attributed to molecular defects in p53. Future research on p53 will enable its physicochemical properties to be thoroughly appreciated and by understanding the link between structure, folding, function and regulation comes the prospect of unravelling its role in tumour formation and manipulating its activity via therapeutic intervention. Already some success is being achieved in this area and the future holds great promise for ‘halting’ cancer by controlling the properties of p53 and similar proteins.
A second major problem facing the world today is the estimated number of people infected with the human immunodeficiency virus (HIV). In 2003 the World Health Organization (WHO) estimated that over 40 million individuals were infected with this virus worldwide. For many individuals, particularly those in the ‘Third World’, the prospect of prolonged good health is unlikely as the virus slowly degrades the body’s ability to fight infection through damage to the immune response mechanism and in particular to a group of cells called helper T cells. HIV infection encompasses many aspects of protein structure and function, as the virus enters cells through the interaction of specific viral coat proteins with receptors on the surface of white blood cells. Once inside cells the virus ‘hides’ but is secretly replicating and integrating genetic material into host DNA through the action of specific enzymes (proteins). Halting the destructive influence of HIV relies on understanding many different, yet inter-related, aspects of protein structure and function. Again, considerable progress has been made since the 1980s when the causative agent of the disease was recognized as a retrovirus. These advances have focussed on understanding the structure of HIV proteins and on designing specific inhibitors of, for example, the reverse transcriptase enzyme. Although in advanced health care systems these drugs (inhibitors) prolong life expectancy, the eradication of HIV’s destructive action within the body, and hence an effective cure, remains unachieved. Achieving this goal should act as a timely reminder for all students of biology, chemistry and medicine that success in this field will have a dramatic impact on the quality of human life in the forthcoming decades.
Central to success in treating any of the above diseases is the development of new medicines, many based on proteins. The development of new therapies has been rapid during the last 20 years with the list of new treatments steadily increasing and including minimizing serious effects of different forms of cancer via the use of specific proteins including monoclonal antibodies, alleviating problems associated with diabetes by the development of improved recombinant ‘insulins’ and developing ‘clot-busting’ drugs (proteins) for the management of strokes and heart attacks. This highly selective list is the productive result of understanding protein structure and function and has contributed to a marked improvement in disease management. For the future these advances will need to be extended to other diseases and will rely on an extensive and thorough knowledge of proteins of increasing size and complexity. We will need to understand the structure of proteins, their interaction with other biomolecules, their roles within different biological systems and their potential manipulation by genetic or chemical methods. The remaining chapters in this book represent an attempt to introduce and address some of these issues in a fundamental manner helpful to students.
2 Amino acids: the building blocks of proteins
Despite enormous functional diversity all proteins consist of a linear arrangement of amino acid residues assembled together into a polypeptide chain. Amino acids are the ‘building blocks’ of proteins and in order to understand the properties of proteins we must first describe the properties of the constituent 20 amino acids. All amino acids contain carbon, hydrogen, nitrogen and oxygen with two of the 20 amino acids also containing sulfur. Throughout this book a colour scheme based on the CPK model (after Corey, Pauling and Koltun, pioneers of ‘space-filling’ representations of molecules) is used. This colouring scheme shows nitrogen atoms in blue, oxygen atoms in red, carbon atoms in light grey (occasionally black), sulfur in yellow, and hydrogen, when shown, in either white or, to enhance viewing on a white background, a lighter shade of grey. To avoid unnecessary complexity ‘ball and stick’ representations of molecular structures are often shown instead of space-filling models. In other instances cartoon representations of structure are shown since they enhance visualization of organization whilst maintaining clarity of presentation.
In their isolated state amino acids are white crystalline solids. It is surprising that crystalline materials form the building blocks for proteins since these latter molecules are generally viewed as ‘organic’. The crystalline nature of amino acids is further emphasized by their high melting and boiling points and together these properties are atypical of most organic molecules. Organic molecules are not commonly crystalline nor do they have high melting and boiling points. Compare, for example, alanine and propionic acid – the former is a crystalline amino acid and the other is a volatile organic acid. Despite similar molecular weights (89 and 74) their respective melting points are 314°C and –20.8°C. The origin of these differences and the unique properties of amino acids resides in their ionic and dipolar nature.
Amino acids are held together in a crystalline lattice by charged interactions and these relatively strong forces contribute to high melting and boiling points. Charged groups are also responsible for electrical conductivity in aqueous solutions (amino acids are electrolytes), their relatively high solubility in water and the large dipole moment associated with crystalline material. Consequently amino acids are best viewed as charged molecules that crystallize from solutions containing dipolar ions. These dipolar ions are called zwitterions. A proper representation of amino acids reflects amphoteric behaviour and amino acids are always represented as the zwitterionic state in this textbook as opposed to the undissociated form. For 19 of the 20 amino acids commonly found in proteins a general structure for the zwitterionic state has charged amino (NH3+) and carboxyl (COO–) groups attached to a central carbon atom called the α carbon. The remaining atoms connected to the α carbon are a single hydrogen atom and the R group or side chain (Figure 2.1).
Figure 2.1 A skeletal model of a generalized amino acid showing the amino (blue), carboxyl (red) and R groups attached to a central or α carbon
At pH 7 the amino and carboxyl groups are charged but over a pH range from 1 to 14 these groups exhibit a series of equilibria involving binding and dissociation of a proton. The binding and dissociation of a proton reflects the role of these groups as weak acids or weak bases. The acid–base behaviour of amino acids is important since it influences the eventual properties of proteins, permits methods of identification for different amino acids and dictates their reactivity. The amino group, characterized by a basic pK value of approximately 9, is a weak base. Whilst the amino group ionizes around pH 9.0 the carboxyl group remains charged until a pH of ~2.0 is reached. At this pH a proton binds neutralizing the charge of the carboxyl group. In each case the carboxyl and amino groups ionize according to the equilibrium
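$$\mathrm{HA} \rightleftharpoons \mathrm{H^+} + \mathrm{A^-} \qquad (2.1)$$
where HA, the proton donor, is either –COOH or –NH3+ and A– the proton acceptor is either –COO– or –NH2. The extent of ionization depends on the equilibrium constant
$$K = \frac{[\mathrm{H^+}][\mathrm{A^-}]}{[\mathrm{HA}]} \qquad (2.2)$$
and it becomes straightforward to derive the relationship
$$\mathrm{pH} = \mathrm{p}K + \log_{10}\frac{[\mathrm{A^-}]}{[\mathrm{HA}]} \qquad (2.3)$$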
known as the Henderson–Hasselbalch equation (see appendix). For a simple amino acid such as alanine a biphasic titration curve is observed when a solution of the amino acid (a weak acid) is titrated with sodium hydroxide (a strong base). The titration curve shows two zones where the pH changes very slowly after additions of small amounts of acid or alkali (Figure 2.2). Each phase reflects different pK values associated with ionizable groups.
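The derivation of the Henderson–Hasselbalch equation referred to above can be sketched in a few steps. The block below is a standard textbook derivation written in LaTeX, starting from the equilibrium constant for the dissociation of a proton donor HA into H+ and the proton acceptor A–; it uses concentrations in place of activities, which is an approximation.

```latex
\begin{align}
K &= \frac{[\mathrm{H^+}][\mathrm{A^-}]}{[\mathrm{HA}]}
  && \text{(equilibrium constant for } \mathrm{HA} \rightleftharpoons \mathrm{H^+} + \mathrm{A^-}\text{)} \\
-\log_{10} K &= -\log_{10}[\mathrm{H^+}] - \log_{10}\frac{[\mathrm{A^-}]}{[\mathrm{HA}]}
  && \text{(take } -\log_{10} \text{ of both sides)} \\
\mathrm{p}K &= \mathrm{pH} - \log_{10}\frac{[\mathrm{A^-}]}{[\mathrm{HA}]}
  && \text{(definitions of pK and pH)} \\
\mathrm{pH} &= \mathrm{p}K + \log_{10}\frac{[\mathrm{A^-}]}{[\mathrm{HA}]}
  && \text{(rearrange)}
\end{align}
```

When the concentrations of A– and HA are equal the logarithmic term is zero and pH = pK. This is why each slowly changing zone of the titration curve is centred on the pK of one ionizable group.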
Figure 2.2 Titration curve for alanine showing changes in pH with addition of sodium hydroxide
Figure 2.3 The three major forms of alanine occurring in titrations between pH 1 and 14
Amino acids lacking charged side chains show similar values for pK1 of about 2.3 that are significantly lower than the corresponding values seen in simple organic acids such as acetic acid (pK ~4.7). Amino acids are stronger acids than acetic acid as a result of the electron-withdrawing properties of the charged α amino group, which increase the tendency for the carboxyl hydrogen to dissociate.
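As a rough numerical illustration of the Henderson–Hasselbalch relationship, the following Python sketch estimates how much of each ionizable group of a simple amino acid is deprotonated at a given pH, and from that the average net charge. The pK values are the approximate figures quoted in the text (about 2.3 for the carboxyl group and about 9 for the amino group), not tabulated values for a particular amino acid, and the function names are illustrative.

```python
PK_CARBOXYL = 2.3  # approximate pK1 quoted in the text
PK_AMINO = 9.0     # approximate pK of the amino group quoted in the text


def fraction_deprotonated(ph: float, pk: float) -> float:
    """Fraction of a group present as A- (the proton acceptor) at a given pH.

    From the Henderson-Hasselbalch equation, [A-]/[HA] = 10 ** (pH - pK).
    """
    ratio = 10 ** (ph - pk)
    return ratio / (1 + ratio)


def net_charge(ph: float) -> float:
    """Average net charge of an amino acid with no ionizable side chain."""
    positive = 1 - fraction_deprotonated(ph, PK_AMINO)  # fraction as -NH3+
    negative = fraction_deprotonated(ph, PK_CARBOXYL)   # fraction as -COO-
    return positive - negative


if __name__ == "__main__":
    for ph in (1.0, 2.3, 7.0, 9.0, 13.0):
        print(f"pH {ph:4.1f}: net charge {net_charge(ph):+.2f}")
```

With these approximate values the net charge is close to +1 at pH 1, close to zero at pH 7 (the zwitterion) and close to –1 at pH 13, matching the three major forms expected during a titration.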
Although an amino acid is represented by the skeletal diagram of Figure 2.1 it is more revealing, and certainly more informative, to impose a stereochemical view on the arrangement of atoms. In these views an attempt is made to represent the positions in space of each atom. The amino, carboxyl, hydrogen and R groups are arranged tetrahedrally around the central α carbon (Figure 2.4).
Table 2.1 The pK values for the α-carboxyl, α-amino groups and side chains found in the individual amino acids
Figure 2.4 The spatial arrangement of atoms in the amino acid alanine
The nitrogen atom (blue) is part of the amino (–NH3+) group, the oxygen atoms (red) are part of the carboxyl (–COO–) group. The remaining groups joined to the α carbon are one hydrogen atom and the R group.
The R group is responsible for the different properties of individual amino acids. As amino acids make up proteins the properties of the R group contribute considerably to the physical properties of proteins. Nineteen of the 20 amino acids found in proteins have the arrangement shown by Figure 2.4 but for the remaining amino acid, proline, an unusual ring is formed by the side chain bonding directly to the α-amino nitrogen (Figure 2.5).
Figure 2.5 The structure of proline – an unusual amino acid containing a five-membered pyrrolidine ring
A glance at the structures of the 20 different side chains reveals major differences in, for example, size, charge and hydrophobicity although the R group is always attached to the α carbon (C2 carbon). From the α carbon subsequent carbon atoms in the side chains are designated as β, γ, δ, ε and ζ. In some databases of protein structures the Cβ is written as CB, the Cδ as CD, Cζ as CZ, etc. Both nomenclatures are widely used. The nomenclature is generally unambiguous but care needs to be exercised when describing the atoms of the side chain of isoleucine. Isoleucine has a branched side chain in which the Cγ or CG is either a methyl group or a methylene group. In this instance the two groups are distinguished by the numbers 1 and 2, i.e. CG1 and CG2. A similar line of reasoning applies to the carbon atoms of aromatic rings. In phenylalanine, for example, the aromatic ring is linked to the Cβ atom by the Cγ atom and contains two Cδ and two Cε atoms (Cδ1 and Cδ2, Cε1 and Cε2) before the ring is completed at the Cζ (or CZ) atom.
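The correspondence between the two nomenclatures described above can be expressed as a simple character mapping. The Python sketch below converts Greek-letter labels such as Cβ or Cγ1 into the Latin-letter form used in some structure databases; the names `GREEK_TO_LATIN` and `database_atom_name` are illustrative and do not belong to any particular database or library.

```python
# Greek position labels and the Latin letters that replace them.
GREEK_TO_LATIN = {
    "α": "A",
    "β": "B",
    "γ": "G",
    "δ": "D",
    "ε": "E",
    "ζ": "Z",
}


def database_atom_name(label: str) -> str:
    """Convert a label such as 'Cγ1' into the Latin-letter form 'CG1'."""
    return "".join(GREEK_TO_LATIN.get(char, char) for char in label)


# Side-chain carbon atoms of phenylalanine as described in the text.
phenylalanine = ["Cβ", "Cγ", "Cδ1", "Cδ2", "Cε1", "Cε2", "Cζ"]
print([database_atom_name(atom) for atom in phenylalanine])
# ['CB', 'CG', 'CD1', 'CD2', 'CE1', 'CE2', 'CZ']
```

Characters that are not Greek position labels, including the element symbol and the branch numbers 1 and 2, pass through unchanged.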
Amino acids are joined together by the formation of a peptide bond where the amino group of one molecule reacts with the carboxyl group of the other. The reaction is described as a condensation resulting in the elimination of water and the formation of a dipeptide (Figure 2.6).
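Because each peptide bond is formed with the elimination of one molecule of water, the mass of a peptide is the sum of the masses of its free amino acids minus one water per bond. The Python sketch below applies this bookkeeping to a dipeptide of two alanine molecules; the molecular masses used for alanine (89.09) and water (18.015) are standard values supplied here for illustration, and the function name `peptide_mass` is hypothetical.

```python
WATER_MASS = 18.015    # molecular mass of water
ALANINE_MASS = 89.09   # molecular mass of free alanine


def peptide_mass(amino_acid_masses: list[float]) -> float:
    """Mass of a linear peptide built from free amino acids.

    A chain of n residues contains n - 1 peptide bonds, and one water
    molecule is eliminated for each bond formed.
    """
    if not amino_acid_masses:
        raise ValueError("at least one amino acid is required")
    bonds = len(amino_acid_masses) - 1
    return sum(amino_acid_masses) - bonds * WATER_MASS


dipeptide = peptide_mass([ALANINE_MASS, ALANINE_MASS])
print(f"{dipeptide:.2f}")
```

The same function covers longer chains: a tripeptide loses two water molecules, a tetrapeptide three, and so on.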
