Proteins - David Whitford - E-Book

Proteins E-Book

David Whitford

Description

Proteins: Structure and Function is a comprehensive introduction to the study of proteins and their importance to modern biochemistry. Each chapter addresses the structure and function of proteins with a definitive theme designed to enhance student understanding. Opening with a brief historical overview of the subject, the book moves on to discuss the ‘building blocks’ of proteins and their respective chemical and physical properties. Later chapters explore experimental and computational methods of comparing proteins, methods of protein purification, and protein folding and stability.

The latest developments in the field are included and key concepts introduced in a user-friendly way to ensure that students are able to grasp the essentials before moving on to more advanced study and analysis of proteins.

An invaluable resource for students of Biochemistry, Molecular Biology, Medicine and Chemistry providing a modern approach to the subject of Proteins.


Page count: 1118

Year of publication: 2013




Contents

Preface

1 An Introduction to protein structure and function

A brief and very selective historical perspective

The biological diversity of proteins

Proteins and the sequencing of the human and other genomes

Why study proteins?

2 Amino acids: the building blocks of proteins

The 20 amino acids found in proteins

The acid–base properties of amino acids

Stereochemical representations of amino acids

Peptide bonds

The chemical and physical properties of amino acids

Detection, identification and quantification of amino acids and proteins

Stereoisomerism

Non-standard amino acids

Summary

Problems

3 The three-dimensional structure of proteins

Primary structure or sequence

Secondary structure

Tertiary structure

Quaternary structure

The globin family and the role of quaternary structure in modulating activity

Immunoglobulins

Cyclic proteins

Summary

Problems

4 The structure and function of fibrous proteins

The amino acid composition and organization of fibrous proteins

Keratins

Fibroin

Collagen

Summary

Problems

5 The structure and function of membrane proteins

The molecular organization of membranes

Membrane protein topology and function seen through organization of the erythrocyte membrane

Bacteriorhodopsin and the discovery of seven transmembrane helices

The structure of the bacterial reaction centre

Oxygenic photosynthesis

Photosystem I

Membrane proteins based on transmembrane β barrels

Respiratory complexes

Complex III, the ubiquinol-cytochrome c oxidoreductase

Complex IV or cytochrome oxidase

The structure of ATP synthetase

ATPase family

Summary

Problems

6 The diversity of proteins

Prebiotic synthesis and the origins of proteins

Evolutionary divergence of organisms and its relationship to protein structure and function

Protein sequence analysis

Protein databases

Gene fusion and duplication

Secondary structure prediction

Genomics and proteomics

Summary

Problems

7 Enzyme kinetics, structure, function, and catalysis

Enzyme nomenclature

Enzyme co-factors

Chemical kinetics

The transition state and the action of enzymes

The kinetics of enzyme action

Catalytic mechanisms

Enzyme structure

Lysozyme

The serine proteases

Triose phosphate isomerase

Tyrosyl tRNA synthetase

EcoRI restriction endonuclease

Enzyme inhibition and regulation

Irreversible inhibition of enzyme activity

Allosteric regulation

Covalent modification

Isoenzymes or isozymes

Summary

Problems

8 Protein synthesis, processing and turnover

Cell cycle

The structure of Cdk and its role in the cell cycle

Cdk–cyclin complex regulation

DNA replication

Transcription

Eukaryotic transcription factors: variation on a ‘basic’ theme

The spliceosome and its role in transcription

Translation

Transfer RNA (tRNA)

The composition of prokaryotic and eukaryotic ribosomes

A structural basis for protein synthesis

An outline of protein synthesis

Antibiotics provide insight into protein synthesis

Affinity labelling and RNA ‘footprinting’

Structural studies of the ribosome

Post-translational modification of proteins

Protein sorting or targeting

The nuclear pore assembly

Protein turnover

Apoptosis

Summary

Problems

9 Protein expression, purification and characterization

The isolation and characterization of proteins

Recombinant DNA technology and protein expression

Purification of proteins

Centrifugation

Solubility and ‘salting out’ and ‘salting in’

Chromatography

Dialysis and ultrafiltration

Polyacrylamide gel electrophoresis

Mass spectrometry

How to purify a protein?

Summary

Problems

10 Physical methods of determining the three-dimensional structure of proteins

Introduction

The use of electromagnetic radiation

X-ray crystallography

Nuclear magnetic resonance spectroscopy

Cryoelectron microscopy

Neutron diffraction

Optical spectroscopic techniques

Vibrational spectroscopy

Raman spectroscopy

ESR and ENDOR

Summary

Problems

11 Protein folding in vivo and in vitro

Introduction

Factors determining the protein fold

Factors governing protein stability

Folding problem and Levinthal’s paradox

Models of protein folding

Amide exchange and measurement of protein folding

Kinetic barriers to refolding

In vivo protein folding

Membrane protein folding

Protein misfolding and the disease state

Summary

Problems

12 Protein structure and a molecular approach to medicine

Introduction

Sickle cell anaemia

Viruses and their impact on health as seen through structure and function

HIV and AIDS

The influenza virus

p53 and its role in cancer

Emphysema and α1-antitrypsin

Summary

Problems

Epilogue

Glossary

Appendices

Bibliography

References

Index

Copyright © 2005

John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England

 

Telephone (+44) 1243 779777

Email (for orders and customer service enquiries): [email protected]

Visit our Home Page on www.wiley.com

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T 4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should be addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England, or emailed to [email protected], or faxed to (+44) 1243 770620.

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the Publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Other Wiley Editorial Offices

John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA

Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA

Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany

John Wiley & Sons Australia Ltd, 33 Park Road, Milton, Queensland 4064, Australia

John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809

John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada M9W 1L1

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

British Library Cataloguing in Publication Data

A catalogue record for this book is available from the British Library

ISBN 0-471-49893-9 HB

ISBN 0-471-49894-7 PB

For my parents, Elizabeth and Percy Whitford, to whom I owe everything

Preface

When I first started studying proteins as an undergraduate I encountered for the first time complex areas of biochemistry arising from the pioneering work of Pauling, Sumner, Kendrew, Perutz, Anfinsen, together with other scientific ‘giants’ too numerous to describe at length in this text. The area seemed complete. How wrong I was and how wrong an undergraduate’s perception can be! The last 30 years have seen an explosion in the area of protein biochemistry so that my 1975 edition of Biochemistry by Albert Lehninger remains, perhaps, of historical interest only. The greatest change has occurred through the development of molecular biology where fragments of DNA are manipulated in ways previously unimagined. This has enabled DNA to be sequenced, cloned, manipulated and expressed in many different cells. As a result areas of recombinant DNA technology and protein engineering have evolved rapidly to become specialist disciplines in their own right. Almost any protein whose primary sequence is known can be produced in large quantity via the expression of cloned or synthetic genes in recombinant host cells. Not only is the method allowing scientists to study some proteins for the first time but the increased amount of protein derived from recombinant DNA technology is also allowing the application of new and continually advancing structural techniques. In this area X-ray crystallography has remained at the forefront for over 40 years as a method of determining protein structure but it is now joined by nuclear magnetic resonance (NMR) spectroscopy and more recently by cryoelectron microscopy whilst other methods such as circular dichroism, infrared and Raman spectroscopy, electron spin resonance spectroscopy, mass spectrometry and fluorescence provide more limited, yet often vital and complementary, structural data. 
In many instances these methods have become established techniques only in the last 20 years and are consequently absent in many of those familiar textbooks occupying the shelves of university libraries.

An even greater impact on biochemistry has occurred with the rapid development of cost-effective, powerful, desktop computers with performance equivalent to the previous generation of supercomputers. Many experimental techniques relied on the codevelopment of computer hardware but software has also played a vital role in protein biochemistry. We can now search databases comparing proteins at the level of DNA or amino acid sequences, building up patterns of homology and relationships that provide insight into origin and possible function. In addition we use computers routinely to calculate properties such as isoelectric point, number of hydrophobic residues or secondary structure – something that would have been extraordinarily tedious, time consuming and problematic 20 years ago. Computers have revolutionized all aspects of protein biochemistry and there is little doubt that their influence will continue to increase in the forthcoming decades. The new area of bioinformatics reflects these advances in computing.
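The kind of routine calculation mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from this book: the set of residues counted as hydrophobic and the pKa values are assumptions (published scales differ), and the isoelectric point is estimated simply by finding the pH at which the Henderson–Hasselbalch net charge is zero.

```python
# Assumed classification: one common (not universal) choice of hydrophobic residues.
HYDROPHOBIC = frozenset("AVLIMFWP")

# Assumed, illustrative pKa values; real values vary with source and environment.
PKA_POSITIVE = {"N_term": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"C_term": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def count_hydrophobic(sequence):
    """Count residues (one-letter codes) that fall in the assumed hydrophobic set."""
    return sum(1 for residue in sequence.upper() if residue in HYDROPHOBIC)


def net_charge(sequence, ph):
    """Estimate the net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    sequence = sequence.upper()
    charge = 0.0
    for group, pka in PKA_POSITIVE.items():
        n = 1 if group == "N_term" else sequence.count(group)
        charge += n / (1.0 + 10 ** (ph - pka))   # fraction protonated (+1 each)
    for group, pka in PKA_NEGATIVE.items():
        n = 1 if group == "C_term" else sequence.count(group)
        charge -= n / (1.0 + 10 ** (pka - ph))   # fraction deprotonated (-1 each)
    return charge


def isoelectric_point(sequence, tolerance=1e-4):
    """Find the pH of zero net charge by bisection between pH 0 and pH 14."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid    # still positively charged: the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    peptide = "MKTAYIAKQR"  # hypothetical sequence, for illustration only
    print("hydrophobic residues:", count_hydrophobic(peptide))
    print("estimated pI:", round(isoelectric_point(peptide), 2))
```

Bisection works here because the estimated net charge falls steadily as the pH rises, so there is a single crossing point between pH 0 and pH 14.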

In my attempt to construct an introductory yet extensive text on proteins I have, of necessity, been circumspect in my description of the subject area. I have often relied on qualitative rather than quantitative descriptions and I have attempted to minimise the introduction of unwieldy equations or formulae. This does not reflect my own interests in physical biochemistry because my research, I hope, was often quantitative. In some cases particularly the chapters on enzymes and physical methods the introduction of equations is unavoidable but also necessary to an initial description of the content of these chapters. I would be failing in my duty as an educator if I omitted some of these equations and I hope students will keep going at these ‘difficult’ points or failing that just omit them entirely on first reading this book. However, in general I wish to introduce students to proteins by describing principles governing their structure and function and to avoid over-complication in this presentation through rigorous and quantitative treatment. This book is firmly intended to be a broad introductory text suitable for undergraduate and postgraduate study, perhaps after an initial exposure to the subject of protein biochemistry, whilst at the same time introducing specialist areas prior to future advanced study. I hope the following chapters will help to direct students to the amazing beauty and complexity of protein systems.

Target audience

The present text should be suitable for all introductory modes of biochemistry, molecular biology, chemistry, medicine and dentistry. In the UK this generally means the book is suitable for all undergraduates between years 1 and 3 and this book has stemmed from lectures given as parts of biochemistry courses to students of biochemistry, chemistry, medicine and dentistry in all 3 years. Where possible each chapter is structured to increase progressively in complexity. For purely introductory courses, as would occur in years 1 or 2, it is sufficient to read only the first parts, or selected sections, of each chapter. More advanced courses may require thorough reading of each chapter together with consultation of the bibliography and the list of references given at the end of the book.

The world wide web

In the last ten years the world wide web (WWW) has transformed information available to students. It provides a new and useful medium with which to deliver lecture notes and an exciting and new teaching resource for all. Consequently within this book URLs direct students to learning resources and a list of important addresses is included in the appendix. In an effort to exploit the power of the internet this book is associated with ‘web-based’ tutorials, problems and content and is accessed from the following URL http://www.wiley.com/go/whitfordproteins. These ‘pages’ are continually updated and point the interested reader towards new areas as they emerge. The Bibliography points interested readers towards further study material suitable for a first introduction to a subject whilst the list of references provides original sources for many areas covered in each of the twelve chapters.

For the problems included at the end of each chapter there are approximately 10 questions that aim to build on the subject matter discussed in the preceding text. Often the questions will increase in difficulty although this is not always the case. In this book I have limited the bibliography to broad reviews or accessible journal papers and I have deliberately restricted the number of ‘high-powered’ (difficult!) articles since I believe this organization is of greater use to students studying these subjects for the first time. To aid the learning process the web edition has multiple-choice questions for use as a formative assessment exercise. I should certainly like to hear of all mistakes or omissions encountered in this text and my hope is that educators and students will let me know via the e-mail address at the end of this section of any required corrections or additions.

Proteins are three-dimensional (3D) objects that are inadequately represented on book pages. Consequently many proteins are best viewed as molecular images using freely available software. Here, real-time manipulation of coordinate files is possible and will prove helpful to understanding aspects of structure and function. The importance of viewing, manipulating and even changing the representation of proteins to comprehending structure and function cannot be overestimated. Experience has suggested that the use of computers in this area can have a dramatic effect on students’ understanding of protein structures. The ability to visualize in 3D conveys so much information – far more than any simple 2D picture in this book could ever hope to portray. Alongside many figures I have written the Protein DataBank files (e.g. PDB: 1HKO) used to produce diagrams. These files can be obtained from databases at several permanent sites based around the world such as http://www.rscb.org/pdb or one of the many ‘mirrors’ that exist (for example, in the UK this data is found at http://pdb.ccdc.cam.ac.uk). For students with Internet access each PDB file can be retrieved and manipulated independently to produce comparable images to those shown in the text. To explore these macromolecular images with reasonable efficiency does not require the latest ‘all-powerful’ desktop computer. A computer with a Pentium III (or later) based processor, a clock speed of 200 MHz or greater, 32–64 MB RAM, a hard disk of 10 GB, a graphics video card with at least 8 MB memory and a connection to the internet is sufficient to view and store a significant number of files together with representative images. Of course things are easier with a computer with a surfeit of memory (>256 MB) and a high ‘clock’ speed (>2 GHz) but this is not obligatory to see ‘on-line’ content or to manipulate molecular images. This book was started on a 700 MHz Pentium III based processor equipped with 256 MB RAM and 16 MB graphics card.
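For readers who wish to go beyond viewing, coordinate files can also be read directly. The sketch below is a hedged illustration of how the fixed-column ATOM records of the PDB text format might be parsed in Python. The helper names (`Atom`, `format_atom`, `parse_atoms`, `distance`) are hypothetical, the two demonstration atoms are invented rather than taken from any real entry, and `format_atom` left-aligns atom names in their four-column field, which is a simplification of the usual alignment rules.

```python
from dataclasses import dataclass
from math import dist  # Python 3.8 or later


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def format_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a minimal ATOM record (columns 1-54) for demonstration purposes."""
    return (
        "ATOM".ljust(6)                 # columns 1-6: record name
        + f"{serial:>5}"                # columns 7-11: atom serial number
        + " "                           # column 12: unused
        + f"{name:<4}"                  # columns 13-16: atom name (simplified alignment)
        + " "                           # column 17: alternate location (blank here)
        + f"{res_name:>3}"              # columns 18-20: residue name
        + " "                           # column 21: unused
        + chain                         # column 22: chain identifier
        + f"{res_seq:>4}"               # columns 23-26: residue sequence number
        + " " * 4                       # columns 27-30: insertion code and padding
        + f"{x:8.3f}{y:8.3f}{z:8.3f}"   # columns 31-54: x, y, z in angstroms
    )


def parse_atoms(pdb_text):
    """Return an Atom for every ATOM or HETATM record in the given text."""
    atoms = []
    for line in pdb_text.splitlines():
        if line[:6].strip() in ("ATOM", "HETATM"):
            atoms.append(Atom(
                serial=int(line[6:11]),
                name=line[12:16].strip(),
                res_name=line[17:20].strip(),
                chain=line[21],
                res_seq=int(line[22:26]),
                x=float(line[30:38]),
                y=float(line[38:46]),
                z=float(line[46:54]),
            ))
    return atoms


def distance(a, b):
    """Euclidean distance between two atoms, in angstroms."""
    return dist((a.x, a.y, a.z), (b.x, b.y, b.z))


if __name__ == "__main__":
    # Two invented alpha-carbon atoms placed 3.8 angstroms apart.
    demo = "\n".join([
        format_atom(1, "CA", "GLY", "A", 1, 0.0, 0.0, 0.0),
        format_atom(2, "CA", "ALA", "A", 2, 3.8, 0.0, 0.0),
        "END",
    ])
    first, second = parse_atoms(demo)
    print(first.res_name, second.res_name, distance(first, second))
```

The same `parse_atoms` function could be applied to the text of a downloaded coordinate file. Slicing by column position rather than splitting on whitespace matters because adjacent fields in real files are sometimes not separated by spaces.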

Organization of this book

This book will address the structure and function of proteins in 12 subsequent chapters each with a definitive theme. After an initial chapter describing why one would wish to study proteins and a brief historical background the second chapter deals with the ‘building blocks’ of proteins, namely the amino acids together with their respective chemical and physical properties. No attempt is made at any point to describe the metabolism connected with these amino acids and the reader should consult general textbooks for descriptions of the synthesis and degradation of amino acids. This is a major area in its own right and would have lengthened the present book too much. However, I would like to think that students will not avoid these areas because they remain an equally important subject that should be covered at some point within the undergraduate curriculum. Chapter 3 covers the assembly of amino acids into polypeptide chains and levels of organizational structure found within proteins. Almost all detailed knowledge of protein structure and function has arisen through studies of globular proteins but the presence of fibrous proteins with different structures and functional properties necessitated a separate chapter devoted to this area (Chapter 4). Within this class the best understood structures are those belonging to the collagen class of proteins, the keratins and the extended β sheet structures such as silk fibroin. The division between globular proteins and fibrous proteins was made at a time when the only properties one could compare readily were a protein’s amino acid composition and hydrodynamic radius. It is now apparent that other proteins exist with properties intermediate between globular and fibrous proteins that do not lend themselves to simple classification. However, the ‘old’ schemes of identification retain their value and serve to emphasize differences in proteins.

Membrane proteins represent a third group with different composition and properties. Most of these proteins are poorly understood, but there have been spectacular successes from the initial low-resolution structure of bacteriorhodopsin to the highly defined structure of bacterial photosynthetic reaction centres. These advances paved the way towards structural studies of G proteins and G-protein coupled receptors, the respiratory complexes from aerobic bacteria and the structure of ATP synthetases.

Chapter 6 focuses both on experimental and computational methods of comparing proteins where in silico methods have become increasingly important as a vital tool to assist with modern protein biochemistry. Chapter 7 focuses on enzymes and by discussing basic reaction rate theories and kinetics the chapter leads to a discussion of enzyme-catalysed reactions. Enzymes catalyse reactions through a variety of mechanisms including acid–base catalysis, nucleophilic driven chemistry and transition state stabilization. These and other mechanisms are described along with the principles of regulation, active site chemistry and binding.

The involvement of proteins in the cell cycle, transcription, translation, sorting and degradation of proteins is described in Chapter 8. In 50 years we have progressed from elucidating the structure of DNA to uncovering how this information is converted into proteins. The chapter is based around the structure of two macromolecular systems: the ribosome devoted towards accurate and efficient synthesis and the proteasome designed to catalyse specific proteolysis. Chapter 9 deals with the methods of protein purification. Very often, biochemistry textbooks describe techniques without placing the technique in the correct context. As a result, in Chapter 9 I have attempted to describe equipment as well as techniques so that students may obtain a proper impression of this area.

Structural methods determine the topology or fold of proteins. With an elucidation of structure at atomic levels of resolution comes an understanding of biological function. Chapter 10 addresses this area by describing different techniques. X-ray crystallography remains at the forefront of research with new variations of the basic principle allowing faster determination of structure at improved resolution. NMR methods yield structures of comparable resolution to crystallography for small soluble proteins. In ideal situations these methods provide complete structural determination of all heavy atoms but they are complemented by other spectroscopic methods such as absorbance and fluorescence methods, mass spectrometry and infrared spectroscopy. These techniques provide important ancillary information on tertiary structure such as the helical content of the protein, the proportion and environment of aromatic residues within a protein as well as secondary structure content.

Chapter 11 describes protein folding and stability – a subject that has generated intense research interest with the recognition that disease states arise from aberrant folding or stability. The mechanism of protein folding is illustrated by in vitro and in vivo studies. Whilst the broad concepts underlying protein folding were deduced from studies of ‘model’ proteins such as ribonuclease, analysis of cell folding pathways has highlighted specialised proteins, chaperones, with a critical function to the overall process. The GroES–GroEL complex is discussed to highlight the integrated process of synthesis and folding in vivo.

The final chapter builds on the preceding 11 chapters using a restricted set of well-studied proteins (case studies) with significant impact on molecular medicine. These proteins include haemoglobin, viral proteins, p53, prions and α1-antitrypsin. Although still a young subject area this branch of protein science will expand in the next few years and will rely on the techniques, knowledge and principles elucidated in Chapters 1–11. The examples emphasize the impact of protein science and molecular medicine on the quality of human life.

Acknowledgements

I am indebted to all research students and post-docs who shared my laboratories at the Universities of London and Oxford during the last 15 years in many cases acting as ‘test subjects’ for teaching ideas. I should like to thank Drs Roger Hewson, Richard Newbold and Susan Manyusa whose comments throughout my research and teaching career were always valued. I would also like to thank individuals, too numerous to name, with whom I interacted at King’s College London, Imperial College of Science, Technology and Medicine and the University of Oxford. In this context I should like to thank Dr John Russell, formerly of Imperial College London whose goodwill, humour and fantastic insight into the history of science, the scientific method and ‘day to day’ experimentation prevented absolute despair.

During preparation of this book many individuals read and contributed valuable comments to the manuscript’s content, phrasing and ideas. In particular I wish to thank these unnamed and sometimes unknown individuals who read one or more of the chapters of this book. As is often said by authors at this point, despite their valuable contributions all of the remaining errors and deficiencies in the current text are my responsibility. In this context I could easily have spent more months attempting to perfect the current text. I am very aware that this text has deficiencies but I hope these defects will not detract from its value. In addition my wish to try other avenues, other roads not taken, dictates that this manuscript is completed without delay.

Writing and producing a textbook would not be possible without the support of a good publisher. I should like to thank all the staff at John Wiley & Sons, Chichester, UK. This exhaustive list includes particularly Andrew Slade as senior Publishing Editor who helped smooth the bumpy route towards production of this book, Lisa Tickner who first initiated events leading to commissioning this book, Rachel Ballard who supervised day to day business on this book, replacing every form I lost without complaint and monitoring tactfully and gently about possible completion dates, Robert Hambrook who translated my text and diagrams into a beautiful book, and the remainder of the production team of John Wiley and Sons. Together we inched our way towards the painfully slow production of this text, although the pace was entirely attributable to the author.

Lastly I must also thank Susan who tolerated the protracted completion of this book, reading chapters and offering support for this project throughout whilst coping with the arrival of Alexandra and Ethan effortlessly (unlike their father).

David Whitford

April [email protected]

1

An Introduction to protein structure and function

Biochemistry has exploded as a major scientific endeavour over the last one hundred years to rival previously established disciplines such as chemistry and physics. This occurred with the recognition that living systems are based on the familiar elements of organic chemistry (carbon, oxygen, nitrogen and hydrogen) together with the occasional involvement of inorganic chemistry and elements such as iron, copper, sodium, potassium and magnesium. More importantly the laws of physics including those concerning thermodynamics, electricity and quantum physics are applicable to biochemical systems and no ‘vital’ force distinguishes living from non-living systems. As a result the laws of chemistry and physics are successfully applied to biochemistry and ideas from physics and chemistry have found widespread application, frequently revolutionizing our understanding of complex systems such as cells.

This book focuses on one major component of all living systems – the proteins. Proteins are found in all living systems ranging from bacteria and viruses through the unicellular and simple eukaryotes to vertebrates and higher mammals such as humans. Proteins make up over 50 percent of the dry weight of cells and are present in greater amounts than any other biomolecule. Proteins are unique amongst the macromolecules in underpinning every reaction occurring in biological systems. It goes without saying that one should not ignore the other components of living systems since they have indispensable roles, but in this text we will consider only proteins.

A brief and very selective historical perspective

With the vast accumulation of knowledge about proteins over the last 50 years it is perhaps surprising to discover that the term protein was introduced nearly 170 years ago. One early description was by Gerhardus Johannes Mulder in 1839, whose studies on the composition of animal substances, chiefly fibrin, albumin and gelatin, showed the presence of carbon, hydrogen, oxygen and nitrogen. In addition he recognized that sulfur and phosphorus were sometimes present in ‘animal substances’ that contained large numbers of atoms. In other words, he established that these ‘substances’ were macromolecules. Mulder communicated his results to Jöns Jakob Berzelius and it is suggested that the term protein arose from this interaction; the origin of the word has been variously ascribed to derivation from the Latin word primarius or from the Greek god Proteus. The definition of proteins was timely since in 1828 Friedrich Wöhler had shown that heating ammonium cyanate resulted in isomerization and the formation of urea (Figure 1.1). Organic compounds characteristic of living systems, such as urea, could be derived from simple inorganic chemicals. For many historians this marks the beginning of biochemistry and it is appropriate that the discovery of proteins occurred in the same period.

Figure 1.1 The decomposition of ammonium cyanate yields urea
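
Written as a formula, the conversion in Figure 1.1 is a rearrangement of atoms rather than a loss of them, since ammonium cyanate and urea share the molecular formula CH4N2O. A sketch in LaTeX notation (the `\xrightarrow` and `\text` commands assume the amsmath package):

```latex
\[
\mathrm{NH_4^{+}\;OCN^{-}}
\;\xrightarrow{\ \text{heat}\ }\;
\mathrm{(NH_2)_2CO}
\qquad \text{(both } \mathrm{CH_4N_2O}\text{)}
\]
```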

The development of biochemistry and the study of proteins was assisted by analysis of their composition and structure by Heinrich Hlasiwetz and Josef Habermann around 1873 and the recognition that proteins were made up of smaller units called amino acids. They established that hydrolysis of casein with strong acids or alkali yielded glutamic acid, aspartic acid, leucine, tyrosine and ammonia whilst the hydrolysis of other proteins yielded a different group of products. Importantly their work suggested that the properties of proteins depended uniquely on the constituent parts – a theme that is equally relevant today in modern biochemical study.

Another landmark in the study of proteins occurred in 1902 with Franz Hofmeister establishing the constituent atoms of the peptide bond with the polypeptide backbone derived from the condensation of free amino acids. Five years earlier Eduard Buchner revolutionized views of protein function by demonstrating that yeast cell extracts catalysed fermentation of sugar into ethanol and carbon dioxide. Previously it was believed that only living systems performed this catalytic function. Emil Fischer further studied biological catalysis and proposed that components of yeast, which he called enzymes, combined with sugar to produce an intermediate compound. With the realization that cells were full of enzymes 100 years of research has developed and refined these discoveries. Further landmarks in the study of proteins could include Sumner’s crystallization of the first enzyme (urease) in 1926 and Pauling’s description of the geometry of the peptide bond; however, extensive discussion of these advances and many other important discoveries in protein biochemistry are best left to history of science textbooks.

A brief look at the award of the Nobel Prizes for Chemistry, Physiology and Medicine since 1900 highlighted in Table 1.1 reveals the involvement of many diverse areas of science in protein biochemistry. At first glance it is not obvious why William and Lawrence Bragg’s discovery of the diffraction of X-rays by sodium chloride crystals is relevant, but diffraction by protein crystals is the main route towards biological structure determination. Their discovery was the first step in the development of this technique. Discoveries in chemistry and physics have been implemented rapidly in the study of proteins. By 1958 Max Perutz and John Kendrew had determined the first protein structure and this was soon followed by the larger, multiple subunit, structure of haemoglobin and the first enzyme, lysozyme. This remarkable advance in knowledge extended from initial understanding of the atomic composition of proteins around 1900 to the determination of the three-dimensional structure of proteins in the 1960s and represents a major chapter of modern biochemistry. However, advances have continued with new areas of molecular biology proving equally important to understanding protein structure and function.

Life may be defined as the ordered interaction of proteins, and all forms of life from viruses to complex, specialized, mammalian cells are based on proteins made up of the same building blocks or amino acids. Many proteins found in simple unicellular organisms such as bacteria are closely similar in structure and function to those found in human cells, illustrating the evolutionary lineage from simple to complex organisms.

Molecular biology starts with the dramatic elucidation of the structure of the DNA double helix by James Watson, Francis Crick, Rosalind Franklin and Maurice Wilkins in 1953. Today, details of DNA replication, transcription into RNA and the synthesis of proteins (translation) are extensive. This has established an enormous body of knowledge representing a whole new subject area. All cells encode the information content of proteins within genes, or more accurately the order of bases along the DNA strand, yet it is the conversion of this information or expression into proteins that represents the tangible evidence of a living system or life.

Table 1.1 Selected landmarks in the study of protein structure and function from 1900–2002 as seen by the award of the Nobel Prize for Chemistry, Physiology or Medicine

| Date | Discoverer + Discovery |
| --- | --- |
| 1901 | Wilhelm Conrad Röntgen ‘in recognition of the…discovery of the remarkable rays subsequently named after him’ |
| 1907 | Eduard Buchner ‘cell-free fermentation’ |
| 1914 | Max von Laue ‘for his discovery of the diffraction of X-rays by crystals’ |
| 1915 | William Henry Bragg and William Lawrence Bragg ‘for their services in the analysis of crystal structure by…X-rays’ |
| 1923 | Frederick Grant Banting and John James Richard Macleod ‘for the discovery of insulin’ |
| 1930 | Karl Landsteiner ‘for his discovery of human blood groups’ |
| 1946 | James Batcheller Sumner ‘for his discovery that enzymes can be crystallized’; John Howard Northrop and Wendell Meredith Stanley ‘for their preparation of enzymes and virus proteins in pure form’ |
| 1948 | Arne Wilhelm Kaurin Tiselius ‘for his research on electrophoresis and adsorption analysis, especially for his discoveries concerning the complex nature of the serum proteins’ |
| 1952 | Archer John Porter Martin and Richard Laurence Millington Synge ‘for their invention of partition chromatography’ |
| 1952 | Felix Bloch and Edward Mills Purcell ‘for their development of new methods for nuclear magnetic precision measurements and discoveries in connection therewith’ |
| 1954 | Linus Carl Pauling ‘for his research into the nature of the chemical bond and…to the elucidation of…complex substances’ |
| 1958 | Frederick Sanger ‘for his work on the structure of proteins, especially that of insulin’ |
| 1959 | Severo Ochoa and Arthur Kornberg ‘for their discovery of the mechanisms in the biological synthesis of ribonucleic acid and deoxyribonucleic acid’ |
| 1962 | Max Ferdinand Perutz and John Cowdery Kendrew ‘for their studies of the structures of globular proteins’ |
| 1962 | Francis Harry Compton Crick, James Dewey Watson and Maurice Hugh Frederick Wilkins ‘for their discoveries concerning the molecular structure of nucleic acids and its significance for information transfer in living material’ |
| 1964 | Dorothy Crowfoot Hodgkin ‘for her determinations by X-ray techniques of the structures of important biochemical substances’ |
| 1965 | François Jacob, André Lwoff and Jacques Monod ‘for discoveries concerning genetic control of enzyme and virus synthesis’ |
| 1968 | Robert W. Holley, Har Gobind Khorana and Marshall W. Nirenberg ‘for…the genetic code and its function in protein synthesis’ |
| 1969 | Max Delbrück, Alfred D. Hershey and Salvador E. Luria ‘for their discoveries concerning the replication mechanism and the genetic structure of viruses’ |
| 1972 | Christian B. Anfinsen ‘for his work on ribonuclease, especially concerning the connection between the amino acid sequence and the biologically active conformation’; Stanford Moore and William H. Stein ‘for their contribution to the understanding of the connection between chemical structure and catalytic activity of…ribonuclease molecule’ |
| 1972 | Gerald M. Edelman and Rodney R. Porter ‘for their discoveries concerning the chemical structure of antibodies’ |
| 1975 | John Warcup Cornforth ‘for his work on the stereochemistry of enzyme-catalyzed reactions’; Vladimir Prelog ‘for his research into the stereochemistry of organic molecules and reactions’ |
| 1975 | David Baltimore, Renato Dulbecco and Howard Martin Temin ‘for their discoveries concerning the interaction between tumour viruses and the genetic material of the cell’ |
| 1978 | Werner Arber, Daniel Nathans and Hamilton O. Smith ‘for the discovery of restriction enzymes and their application to problems of molecular genetics’ |
| 1980 | Paul Berg ‘for his fundamental studies of the biochemistry of nucleic acids, with particular regard to recombinant-DNA’; Walter Gilbert and Frederick Sanger ‘for their contributions concerning the determination of base sequences in nucleic acids’ |
| 1982 | Aaron Klug ‘development of crystallographic electron microscopy and structural elucidation of nucleic acid-protein complexes’ |
| 1984 | Robert Bruce Merrifield ‘for his development of methodology for chemical synthesis on solid matrix’ |
| 1984 | Niels K. Jerne, Georges J.F. Köhler and César Milstein ‘for theories concerning the specificity in development and control of the immune system and the discovery of the principle for production of monoclonal antibodies’ |
| 1988 | Johann Deisenhofer, Robert Huber and Hartmut Michel ‘for the determination of the structure of photosynthetic reaction centre’ |
| 1989 | J. Michael Bishop and Harold E. Varmus ‘for their discovery of the cellular origin of retroviral oncogenes’ |
| 1991 | Richard R. Ernst ‘for…the methodology of high resolution nuclear magnetic resonance spectroscopy’ |
| 1992 | Edmond H. Fischer and Edwin G. Krebs ‘for their discoveries concerning reversible protein phosphorylation as biological regulatory mechanism’ |
| 1993 | Kary B. Mullis ‘for his invention of the polymerase chain reaction (PCR) method’; Michael Smith ‘for his fundamental contributions to the establishment of oligonucleotide-based, site-directed mutagenesis’ |
| 1994 | Alfred G. Gilman and Martin Rodbell ‘for their discovery of G-proteins and the role of these proteins in signal transduction’ |
| 1997 | Paul D. Boyer and John E. Walker ‘for their elucidation of the enzymatic mechanism underlying the synthesis of adenosine triphosphate (ATP)’; Jens C. Skou ‘for the first discovery of an ion-transporting enzyme, Na+,K+-ATPase’ |
| 1997 | Stanley B. Prusiner ‘for his discovery of prions, a new biological principle of infection’ |
| 1999 | Günter Blobel ‘for the discovery that proteins have intrinsic signals that govern their transport and localization in the cell’ |
| 2000 | Arvid Carlsson, Paul Greengard and Eric Kandel ‘signal transduction in the nervous system’ |
| 2001 | Paul Nurse, Tim Hunt and Leland Hartwell ‘for discoveries of key regulators of the cell cycle’ |
| 2002 | Kurt Wüthrich ‘for development of NMR spectroscopy as method of determining biological macromolecules structure in solution’; John B. Fenn and Koichi Tanaka ‘for their development of soft desorption ionization methods for mass spectrometric analyses of biological macromolecules’; Sydney Brenner, H. Robert Horvitz and John E. Sulston ‘for their discoveries concerning genetic regulation of organ development and programmed cell death’ |

Cells divide, synthesize new products, secrete unwanted products and generate chemical energy to sustain these processes via specific chemical reactions; in all of these examples the common theme is mediation by proteins.

In 1944 the physicist Erwin Schrödinger posed the question ‘What is Life?’ in an attempt to understand the physical properties of a living cell. Schrödinger suggested that living systems obeyed all laws of physics and should not be viewed as exceptional but instead reflected the statistical nature of these laws. More importantly, living systems are amenable to study using many of the techniques familiar to chemistry and physics. The last 50 years of biochemistry have demonstrated this hypothesis emphatically with tools developed by physicists and chemists rapidly employed in biological studies. A casual perusal of Table 1.1 shows how quickly methodologies progress from discovery to application.

The biological diversity of proteins

Proteins have diverse biological functions ranging from DNA replication, the formation of cytoskeletal structures and the transport of oxygen around the bodies of multicellular organisms to the conversion of one molecule into another. The types of functional properties are almost endless and are continually being increased as we learn more about proteins. Some important biological functions are outlined in Table 1.2 but it is to be expected that this rudimentary list of properties will expand each year as new proteins are characterized. A formal demarcation of proteins into one class should not be pursued too far since proteins can have multiple roles or functions; many proteins do not lend themselves easily to classification schemes. However, for all chemical reactions occurring in cells a protein is involved intimately in the biological process. These proteins are united through their composition based on the same group of 20 amino acids, yet they differ in the proportions of each; some contain a surfeit of one amino acid whilst others may lack one or two members of the group of 20 entirely. It was realized early in the study of proteins that variation in size and complexity is common and the molecular weight and number of subunits (polypeptide chains) show tremendous diversity. There is no correlation between size and number of polypeptide chains. For example, insulin has a relative molecular mass of 5700 and contains two polypeptide chains, haemoglobin has a mass of approximately 65 000 and contains four polypeptide chains, and hexokinase is a single polypeptide chain with an overall mass of ~100 000 (see Table 1.3).

Table 1.2 A selective list of some functional roles for proteins within cells

| Function | Examples |
| --- | --- |
| Enzymes or catalytic proteins | Trypsin, DNA polymerases and ligases |
| Contractile proteins | Actin, myosin, tubulin, dynein |
| Structural or cytoskeletal proteins | Tropocollagen, keratin |
| Transport proteins | Haemoglobin, myoglobin, serum albumin, ceruloplasmin, transthyretin |
| Effector proteins | Insulin, epidermal growth factor, thyroid stimulating hormone |
| Defence proteins | Ricin, immunoglobulins, venoms and toxins, thrombin |
| Electron transfer proteins | Cytochrome oxidase, bacterial photosynthetic reaction centre, plastocyanin, ferredoxin |
| Receptors | CD4, acetylcholine receptor |
| Repressor proteins | Jun, Fos, Cro |
| Chaperones (accessory folding proteins) | GroEL, DnaK |
| Storage proteins | Ferritin, gliadin |

Table 1.3 The molecular masses of proteins together with the number of subunits. The term ‘subunit’ is synonymous with the number of polypeptide chains and is used interchangeably

| Protein | Molecular mass | Subunits |
| --- | --- | --- |
| Insulin | 5700 | 2 |
| Haemoglobin | 64500 | 4 |
| Tropocollagen | 285000 | 3 |
| Subtilisin | 27500 | 1 |
| Ribonuclease | 12600 | 1 |
| Aspartate transcarbamoylase | 310000 | 12 |
| Bacteriorhodopsin | 26800 | 1 |
| Hexokinase | 102000 | 1 |
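The statement that size and number of polypeptide chains are unrelated can be checked directly against the values in Table 1.3. The short Python sketch below divides each molecular mass by its number of subunits. It assumes, purely for illustration, that the chains of a protein share the mass equally, which is not true of every protein; the name `mean_chain_mass` is a hypothetical helper introduced here.

```python
# Molecular masses and subunit counts copied from Table 1.3.
PROTEINS = {
    "Insulin": (5700, 2),
    "Haemoglobin": (64500, 4),
    "Tropocollagen": (285000, 3),
    "Subtilisin": (27500, 1),
    "Ribonuclease": (12600, 1),
    "Aspartate transcarbamoylase": (310000, 12),
    "Bacteriorhodopsin": (26800, 1),
    "Hexokinase": (102000, 1),
}


def mean_chain_mass(total_mass, subunits):
    """Average mass per polypeptide chain.

    Illustrative assumption: all chains of a protein have equal mass.
    """
    return total_mass / subunits


# List the proteins from smallest to largest total mass.
for name, (mass, n) in sorted(PROTEINS.items(), key=lambda kv: kv[1][0]):
    print(f"{name:<28} {mass:>7} {n:>3} {mean_chain_mass(mass, n):>10.0f}")
```

On these figures the single chain of hexokinase is several times heavier than an average chain of the 12-subunit aspartate transcarbamoylase, so a large mass does not imply many subunits.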

Proteins are joined covalently and non-covalently with other biomolecules including lipids, carbohydrates, nucleic acids, phosphate groups, flavins, heme groups and metal ions. Components such as hemes or metal ions are often called prosthetic groups. Complexes formed between lipids and proteins are lipoproteins, those with carbohydrates are called glycoproteins, whilst complexes with metal ions lead to metalloproteins, and so on. The complexes formed between metal ions and proteins increase the involvement of elements of the periodic table beyond that expected of typical organic molecules (namely carbon, hydrogen, nitrogen and oxygen). Inspection of the periodic table (Figure 1.2) shows that at least 20 elements have been implicated directly in the structure and function of proteins (Table 1.4). Surprisingly, elements such as aluminium and silicon that are very abundant in the Earth’s crust (8.1 and 25.7 percent by weight, respectively) do not occur in high concentration within cells. Aluminium is rarely, if ever, found as part of proteins whilst the role of silicon is confined to biomineralization where it is the core component of shells. The involvement of carbon, hydrogen, oxygen, nitrogen, phosphorus and sulfur is clear although the role of other elements, particularly transition metals, has been difficult to establish. Where transition metals occur in proteins there is frequently only one metal atom per protein molecule, and this led in the past to a failure to detect the metal. Other elements have an inferred involvement from growth studies showing that depletion from the diet leads to an inhibition of normal cellular function. For metalloproteins the absence of the metal can lead to a loss of structure and function.

Metals such as Mo, Co and Fe are often found associated with organic co-factors such as pterin, flavins, cobalamin and porphyrin (Figure 1.3). These organic ligands hold metal centres and are often tightly associated to proteins.

Table 1.4 The involvement of trace elements in the structure and function of proteins

| Element | Functional role |
| --- | --- |
| Sodium | Principal extracellular ion, osmotic balance |
| Potassium | Principal intracellular ion, osmotic balance |
| Magnesium | Bound to ATP/GTP in nucleotide binding proteins, found as structural component of hydrolase and isomerase enzymes |
| Calcium | Activator of calcium binding proteins such as calmodulin |
| Vanadium | Bound to enzymes such as chloroperoxidase |
| Manganese | Component of water splitting enzyme in higher plants |
| Iron | Important catalytic component of heme enzymes involved in oxygen transport as well as electron transfer. Important examples are haemoglobin, cytochrome oxidase and catalase |
| Cobalt | Metal component of vitamin B12 found in many enzymes |
| Nickel | Co-factor found in hydrogenase enzymes |
| Copper | Involved as co-factor in oxygen transport systems and electron transfer proteins such as haemocyanin and plastocyanin |
| Zinc | Catalytic component of enzymes such as carbonic anhydrase and superoxide dismutase |
| Molybdenum | Bound to pterin co-factor in enzymes such as xanthine oxidase or sulphite oxidase. Also found in nitrogenase |
| Chlorine | Principal intracellular anion, osmotic balance |
| Iodine | Iodinated tyrosine residues form part of hormone thyroxine and bound to proteins |
| Selenium | Bound at active centre of glutathione peroxidase |

Figure 1.2 The periodic table showing the elements highlighted in red known to have involvement in the structure and/or function of proteins. The involvement of some elements is contentious; tungsten and cadmium are claimed to be associated with proteins yet these elements are also known to be toxic

Figure 1.3 Organic co-factors found in proteins. These co-factors are pterin, the isoalloxazine ring found as part of flavin in FAD and FMN, the pyridine ring of NAD and its close analogue NADP and the porphyrin skeletons of heme and chlorophyll. R represents the remaining part of the co-factor whilst M and V signify methyl and vinyl side chains

Proteins and the sequencing of the human and other genomes

Recognition of the diverse roles of proteins in biological systems increased largely as a result of the enormous amount of sequencing information generated via the Human Genome Mapping project. Similar schemes aimed at deciphering the genomes of Escherichia coli, yeast (Saccharomyces cerevisiae), and mouse provided related information. With the completion of the first draft of the human genome mapping project in 2001 it was estimated that human chromosomes contain approximately 25 000–30 000 genes. This allows a conservative estimate of the number of polypeptides making up most human cells as ~25 000, although alternative splicing of genes and variations in subunit composition increase the number of proteins further. Despite sequencing the human genome it is an unfortunate fact that we do not know the role performed by most proteins. Of those thousands of polypeptides we know the structures of only a small number, emphasizing a large imbalance between the abundance of sequence data and the presence of structure/function information. An analysis of protein databases suggests about 1000 distinct structures or folds have been determined for globular proteins. Many proteins are retained within cell membranes and we know virtually nothing about the structures of these proteins and only slightly more about their functional roles. This observation has enormous consequences for understanding protein structure and function.

Why study proteins?

This question is often asked, not entirely without reason, by many undergraduates during their first introduction to the subject. Perhaps the best reply that can be given is that proteins underpin every aspect of biological activity. This is particularly important in areas where protein structure and function have an impact on human endeavour such as medicine. Advances in molecular genetics reveal that many diseases stem from specific protein defects. A classic example is cystic fibrosis, an inherited condition that alters a protein, called the cystic fibrosis transmembrane conductance regulator (CFTR), involved in the transport of sodium and chloride across epithelial cell membranes. A defective copy of the gene is found in Caucasian populations at a ratio of ~1 in 20, a surprisingly high frequency. These 1 in 20 of the population are ‘carriers’ of a single defective copy of the gene, whereas individuals who inherit a defective copy from each parent suffer from the disease. In the UK the incidence of cystic fibrosis is approximately 1 in 2000 live births, making it one of the most common inherited disorders. The disease results in the body producing a thick, sticky mucus that blocks the lungs, leading to serious infection, and inhibits the pancreas, stopping digestive enzymes from reaching the intestines where they are required to digest food. The severity of cystic fibrosis is related to the particular CFTR gene mutation, and the most common mutation, found in approximately 65 percent of all cases, involves the deletion of a single amino acid residue from the protein at position 508. A loss of one residue out of a total of nearly 1500 amino acid residues results in a severe decrease in the quality of life, with individuals suffering from this disease requiring constant medical care and supervision.
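The link between a carrier frequency of about 1 in 20 and an incidence of roughly 1 in 2000 live births can be illustrated with a little arithmetic. The Python sketch below is a simplified estimate, not an epidemiological model: it assumes random mating, a recessive disorder, and that each carrier parent passes on the defective copy with probability 1/2. The function name `expected_incidence` is introduced here for illustration only.

```python
from fractions import Fraction


def expected_incidence(carrier_frequency):
    """Expected fraction of births inheriting two defective copies.

    Simplifying assumptions: random mating, recessive inheritance, and
    only carrier-by-carrier pairings considered.
    """
    both_parents_carriers = carrier_frequency ** 2
    child_inherits_both_copies = Fraction(1, 4)  # 1/2 from each parent
    return both_parents_carriers * child_inherits_both_copies


incidence = expected_incidence(Fraction(1, 20))
print(incidence)  # 1/1600
```

The estimate of 1 in 1600 is of the same order as the observed figure of approximately 1 in 2000, which is as close as such a rough calculation, starting from an approximate carrier frequency, can be expected to come.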

Figure 1.4 The shape of erythrocytes in normal individuals and in sickle cell anaemia. The sickle shape arises from a mutation in the haemoglobin found within the red blood cell. (Reproduced with permission from Voet, D, Voet, J.G and Pratt, C.W. Fundamentals of Biochemistry. John Wiley & Sons Inc.)

Further examples emphasize the need to understand more about proteins. The pioneering studies of Vernon Ingram in the 1950s showed that sickle cell anaemia arose from a mutation in the β chain of haemoglobin. Haemoglobin is a tetrameric protein containing 2α and 2β chains. In each of the β chains a mutation is found that involves the change of the sixth amino acid residue from a glutamic acid to a valine. The alteration of two residues out of 574 leads to a drastic change in the appearance of red blood cells from their normal biconcave disks to an elongated sickle shape (Figure 1.4).

As the name of the disease suggests individuals are anaemic showing decreased haemoglobin content in red blood cells from approximately 15 g per 100 ml to under half that figure, and show frequent illness. Our understanding of cystic fibrosis and of sickle cell anaemia has advanced in parallel with our understanding of protein structure and function although at best we have very limited and crude means of treating these diseases.

However, perhaps the greatest impetus to understand protein structure and function lies in the hope of overcoming two major health issues confronting the world in the 21st century. The first of these is cancer. Cancer is the uncontrolled proliferation of cells that have lost their normal regulated cell division often in response to a genetic or environmental trigger. The development of cancer is a multistep, multifactorial process often occurring over decades but the precise involvement of specific proteins has been demonstrated in some instances. One of the best examples is a protein called p53, normally present at low levels in cells, that ‘switches on’ in response to cellular damage and as a transcription factor controls the cell cycle process. Mutations in p53 alter the normal cycle of events leading eventually to cancer and several tumours including lung, colorectal and skin carcinomas are attributed to molecular defects in p53. Future research on p53 will enable its physicochemical properties to be thoroughly appreciated and by understanding the link between structure, folding, function and regulation comes the prospect of unravelling its role in tumour formation and manipulating its activity via therapeutic intervention. Already some success is being achieved in this area and the future holds great promise for ‘halting’ cancer by controlling the properties of p53 and similar proteins.

A second major problem facing the world today is the estimated number of people infected with the human immunodeficiency virus (HIV). In 2003 the World Health Organization (WHO) estimated that over 40 million individuals were infected with this virus. For many individuals, particularly those in the ‘Third World’, the prospect of prolonged good health is unlikely as the virus slowly degrades the body’s ability to fight infection through damage to the immune response mechanism and in particular to a group of cells called helper T cells. HIV infection encompasses many aspects of protein structure and function, as the virus enters cells through the interaction of specific viral coat proteins with receptors on the surface of white blood cells. Once inside cells the virus ‘hides’ but is secretly replicating and integrating genetic material into host DNA through the action of specific enzymes (proteins). Halting the destructive influence of HIV relies on understanding many different, yet inter-related, aspects of protein structure and function. Again, considerable progress has been made since the 1980s when the causative agent of the disease was recognized as a retrovirus. These advances have focussed on understanding the structure of HIV proteins and on designing specific inhibitors of, for example, the reverse transcriptase enzyme. Although in advanced health care systems these drugs (inhibitors) prolong life expectancy, the eradication of HIV’s destructive action within the body, and hence an effective cure, remains unachieved. This goal should act as a timely reminder for all students of biology, chemistry and medicine that success in this field will have a dramatic impact on the quality of human life in the forthcoming decades.

Central to success in treating any of the above diseases is the development of new medicines, many based on proteins. The development of new therapies has been rapid during the last 20 years and the list of new treatments is steadily increasing. It includes minimizing serious effects of different forms of cancer via the use of specific proteins including monoclonal antibodies, alleviating problems associated with diabetes by the development of improved recombinant ‘insulins’ and developing ‘clot-busting’ drugs (proteins) for the management of strokes and heart attacks. This highly selective list is the productive result of understanding protein structure and function and has contributed to a marked improvement in disease management. For the future these advances will need to be extended to other diseases and will rely on an extensive and thorough knowledge of proteins of increasing size and complexity. We will need to understand the structure of proteins, their interaction with other biomolecules, their roles within different biological systems and their potential manipulation by genetic or chemical methods. The remaining chapters in this book represent an attempt to introduce and address some of these issues in a fundamental manner helpful to students.

2

Amino acids: the building blocks of proteins

Despite enormous functional diversity all proteins consist of a linear arrangement of amino acid residues assembled together into a polypeptide chain. Amino acids are the ‘building blocks’ of proteins and in order to understand the properties of proteins we must first describe the properties of the constituent 20 amino acids. All amino acids contain carbon, hydrogen, nitrogen and oxygen with two of the 20 amino acids also containing sulfur. Throughout this book a colour scheme based on the CPK model (after Corey, Pauling and Koltun, pioneers of ‘space-filling’ representations of molecules) is used. This colouring scheme shows nitrogen atoms in blue, oxygen atoms in red, carbon atoms in light grey (occasionally black), sulfur in yellow, and hydrogen, when shown, in either white or, to enhance viewing on a white background, a lighter shade of grey. To avoid unnecessary complexity ‘ball and stick’ representations of molecular structures are often shown instead of space-filling models. In other instances cartoon representations of structure are shown since they enhance visualization of organization whilst maintaining clarity of presentation.

The 20 amino acids found in proteins

In their isolated state amino acids are white crystalline solids. It is surprising that crystalline materials form the building blocks for proteins since these latter molecules are generally viewed as ‘organic’. The crystalline nature of amino acids is further emphasized by their high melting and boiling points and together these properties are atypical of most organic molecules. Organic molecules are not commonly crystalline nor do they have high melting and boiling points. Compare, for example, alanine and propionic acid – the former is a crystalline amino acid and the other is a volatile organic acid. Despite similar molecular weights (89 and 74) their respective melting points are 314°C and –20.8°C. The origin of these differences and the unique properties of amino acids resides in their ionic and dipolar nature.
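The molecular weights quoted for alanine and propionic acid (89 and 74) can be reproduced from their elemental compositions. The Python sketch below assumes the molecular formulae C3H7NO2 for alanine and C3H6O2 for propionic acid and uses rounded standard atomic masses; neither the formulae nor the atomic masses are stated in the text, and the helper `molecular_mass` is introduced here for illustration.

```python
# Rounded standard atomic masses (assumed values, not taken from the text).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def molecular_mass(composition):
    """Sum atomic masses over an {element: count} composition."""
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())


alanine = {"C": 3, "H": 7, "N": 1, "O": 2}   # assumed formula C3H7NO2
propionic_acid = {"C": 3, "H": 6, "O": 2}    # assumed formula C3H6O2

print(round(molecular_mass(alanine)))         # 89
print(round(molecular_mass(propionic_acid)))  # 74
```

The two molecules differ by little more than an NH group in mass, so the gap of over 300 °C between their melting points has to come from the charged, dipolar nature of the amino acid rather than from its size.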

Amino acids are held together in a crystalline lattice by charged interactions and these relatively strong forces contribute to high melting and boiling points. Charged groups are also responsible for electrical conductivity in aqueous solutions (amino acids are electrolytes), their relatively high solubility in water and the large dipole moment associated with crystalline material. Consequently amino acids are best viewed as charged molecules that crystallize from solutions containing dipolar ions. These dipolar ions are called zwitterions. A proper representation of amino acids reflects amphoteric behaviour and amino acids are always represented as the zwitterionic state in this textbook as opposed to the undissociated form. For 19 of the 20 amino acids commonly found in proteins a general structure for the zwitterionic state has charged amino (NH3+) and carboxyl (COO–) groups attached to a central carbon atom called the α carbon. The remaining atoms connected to the α carbon are a single hydrogen atom and the R group or side chain (Figure 2.1).

Figure 2.1 A skeletal model of a generalized amino acid showing the amino (blue), carboxyl (red) and R groups attached to a central or α carbon

The acid–base properties of amino acids

At pH 7 the amino and carboxyl groups are charged but over a pH range from 1 to 14 these groups exhibit a series of equilibria involving binding and dissociation of a proton. The binding and dissociation of a proton reflects the role of these groups as weak acids or weak bases. The acid–base behaviour of amino acids is important since it influences the eventual properties of proteins, permits methods of identification for different amino acids and dictates their reactivity. The amino group, characterized by a basic pK value of approximately 9, is a weak base. Whilst the amino group ionizes around pH 9.0 the carboxyl group remains charged until a pH of ~2.0 is reached. At this pH a proton binds neutralizing the charge of the carboxyl group. In each case the carboxyl and amino groups ionize according to the equilibrium

$$\mathrm{HA} \rightleftharpoons \mathrm{H^{+}} + \mathrm{A^{-}} \tag{2.1}$$

where HA, the proton donor, is either –COOH or –NH3+ and A– the proton acceptor is either –COO– or –NH2. The extent of ionization depends on the equilibrium constant

$$K = \frac{[\mathrm{H^{+}}][\mathrm{A^{-}}]}{[\mathrm{HA}]} \tag{2.2}$$

and it becomes straightforward to derive the relationship

$$\mathrm{pH} = \mathrm{p}K + \log_{10}\frac{[\mathrm{A^{-}}]}{[\mathrm{HA}]} \tag{2.3}$$

known as the Henderson–Hasselbalch equation (see appendix). For a simple amino acid such as alanine a biphasic titration curve is observed when a solution of the amino acid (a weak acid) is titrated with sodium hydroxide (a strong base). The titration curve shows two zones where the pH changes very slowly after additions of small amounts of acid or alkali (Figure 2.2). Each phase reflects different pK values associated with ionizable groups.

Figure 2.2 Titration curve for alanine showing changes in pH with addition of sodium hydroxide

Figure 2.3 The three major forms of alanine occurring in titrations between pH 1 and 14
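The behaviour described by the Henderson–Hasselbalch equation (equation 2.3) can be explored numerically. The Python sketch below computes the fraction of a group in its proton acceptor form and the resulting net charge of an amino acid with no ionizable side chain. The pK values used (2.3 for the carboxyl group and 9.7 for the amino group) are illustrative assumptions for alanine, and the function names are introduced here for illustration.

```python
def fraction_deprotonated(pH, pK):
    """Fraction of a group present as A- (proton acceptor).

    Rearranged from pH = pK + log10([A-]/[HA]).
    """
    return 1.0 / (1.0 + 10 ** (pK - pH))


def net_charge(pH, pK1=2.3, pK2=9.7):
    """Net charge of an amino acid lacking an ionizable side chain.

    pK1 (carboxyl) and pK2 (amino) are assumed illustrative values.
    """
    carboxyl = -fraction_deprotonated(pH, pK1)    # -COO- carries -1
    amino = 1.0 - fraction_deprotonated(pH, pK2)  # -NH3+ carries +1
    return carboxyl + amino


for pH in (1, 2.3, 6, 9.7, 13):
    print(pH, round(net_charge(pH), 3))
```

When the pH equals a pK the group is half dissociated, which is why the pH changes slowly around those points in the titration curve. Midway between the two pK values the net charge is zero and the zwitterion is the dominant form.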

Amino acids lacking charged side chains show similar values for pK1 of about 2.3 that are significantly lower than the corresponding values seen in simple organic acids such as acetic acid (pK1 ~4.7). Amino acids are stronger acids than acetic acid as a result of the electrophilic properties of the α amino group that increase the tendency for the carboxyl hydrogen to dissociate.

Stereochemical representations of amino acids

Although an amino acid is represented by the skeletal diagram of Figure 2.1 it is more revealing, and certainly more informative, to impose a stereochemical view on the arrangement of atoms. In these views an attempt is made to represent the positions in space of each atom. The amino, carboxyl, hydrogen and R groups are arranged tetrahedrally around the central α carbon (Figure 2.4).

Table 2.1 The pK values for the α-carboxyl, α-amino groups and side chains found in the individual amino acids

Figure 2.4 The spatial arrangement of atoms in the amino acid alanine

The nitrogen atom (blue) is part of the amino (–NH3+) group, the oxygen atoms (red) are part of the carboxyl (–COO–) group. The remaining groups joined to the α carbon are one hydrogen atom and the R group.

The R group is responsible for the different properties of individual amino acids. As amino acids make up proteins the properties of the R group contribute considerably to the physical properties of proteins. Nineteen of the 20 amino acids found in proteins have the arrangement shown by Figure 2.4 but for the remaining amino acid, proline, an unusual cyclic ring is formed by the side chain bonding directly to the amide nitrogen (Figure 2.5).

Figure 2.5 The structure of proline – an unusual amino acid containing a five-membered pyrrolidine ring

A glance at the structures of the 20 different side chains reveals major differences in, for example, size, charge and hydrophobicity although the R group is always attached to the α carbon (C2 carbon). From the α carbon subsequent carbon atoms in the side chains are designated as β, γ, δ, ε and ζ. In some databases of protein structures the Cβ is written as CB, the Cδ as CD, Cζ as CZ, etc. Both nomenclatures are widely used. The nomenclature is generally unambiguous but care needs to be exercised when describing the atoms of the side chain of isoleucine. Isoleucine has a branched side chain in which the Cγ or CG is either a methyl group or a methylene group. In this instance the two groups are distinguished by the use of a subscript 1 and 2, i.e. CG1 and CG2. A similar line of reasoning applies to the carbon atoms of aromatic rings. In phenylalanine, for example, the aromatic ring is linked to the Cβ atom by the Cγ atom and contains two Cδ and two Cε atoms (Cδ1 and Cδ2, Cε1 and Cε2) before the ring is completed at the Cζ (or CZ) atom.
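The correspondence between the Greek-letter and database-style atom names can be written down as a simple lookup. The Python sketch below is a minimal illustration of the convention described above; the dictionary and the helper `database_atom_name` are hypothetical names introduced here and do not belong to any particular database software.

```python
# Greek position labels and their single-letter database equivalents.
GREEK_TO_LETTER = {"α": "A", "β": "B", "γ": "G", "δ": "D", "ε": "E", "ζ": "Z"}


def database_atom_name(element, position, branch=""):
    """Build a database-style atom name, e.g. ("C", "γ", "1") gives "CG1"."""
    return f"{element}{GREEK_TO_LETTER[position]}{branch}"


# Side chain carbons of phenylalanine as described in the text:
# Cβ, Cγ, two Cδ, two Cε and the ring-closing Cζ.
PHENYLALANINE = [("β", ""), ("γ", ""), ("δ", "1"), ("δ", "2"),
                 ("ε", "1"), ("ε", "2"), ("ζ", "")]

names = [database_atom_name("C", position, branch) for position, branch in PHENYLALANINE]
print(names)  # ['CB', 'CG', 'CD1', 'CD2', 'CE1', 'CE2', 'CZ']

# The two γ carbons of isoleucine are distinguished by a branch number.
print(database_atom_name("C", "γ", "1"), database_atom_name("C", "γ", "2"))  # CG1 CG2
```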

Peptide bonds

Amino acids are joined together by the formation of a peptide bond where the amino group of one molecule reacts with the carboxyl group of the other. The reaction is described as a condensation resulting in the elimination of water and the formation of a dipeptide (Figure 2.6).
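Because each peptide bond is formed with the elimination of one molecule of water, the mass of a peptide is less than the sum of the masses of its free amino acids. The Python sketch below illustrates this bookkeeping for a dipeptide of two alanine molecules. The masses used for alanine (89.094) and water (18.015) are assumed values calculated from rounded atomic masses, and `peptide_mass` is a hypothetical helper introduced here.

```python
WATER_MASS = 18.015    # assumed value for H2O
ALANINE_MASS = 89.094  # assumed value for free alanine


def peptide_mass(amino_acid_masses):
    """Mass of a linear peptide built from free amino acids.

    One water molecule is lost for each peptide bond formed, and a
    linear chain of n amino acids contains n - 1 peptide bonds.
    """
    n_bonds = max(len(amino_acid_masses) - 1, 0)
    return sum(amino_acid_masses) - n_bonds * WATER_MASS


dipeptide = peptide_mass([ALANINE_MASS, ALANINE_MASS])
print(round(dipeptide, 3))  # 160.173
```

The same function applies to longer chains: a polypeptide of n residues has lost n - 1 water molecules relative to its free amino acids.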