Pharmaceutical Biotechnology - Gary Walsh - E-Book

Pharmaceutical Biotechnology E-Book

Gary Walsh

Description

Pharmaceutical Biotechnology offers students taking Pharmacy and related Medical and Pharmaceutical courses a comprehensive introduction to the fast-moving area of biopharmaceuticals. Approaching the subject from a pharmaceutical perspective, the initial chapters offer a broad introduction to protein science and recombinant DNA technology, key areas that underpin the whole subject. Subsequent chapters focus upon the development, production and analysis of these substances. Finally, the book moves on to explore the science, biotechnology and medical applications of specific biotech product categories. These include not only protein-based substances but also nucleic-acid- and cell-based products.

  • introduces the essential principles underpinning modern biotechnology: recombinant DNA technology and protein science
  • provides an invaluable introduction to this fast-moving subject aimed specifically at pharmacy and medical students
  • includes specific ‘product category chapters’ focusing on the pharmaceutical, medical and therapeutic properties of numerous biopharmaceutical products
  • devotes an entire chapter to the principles of genetic engineering and how these drugs are developed
  • includes numerous relevant case studies to enhance student understanding
  • assumes no prior knowledge of protein structure


Page count: 856

Year of publication: 2013

Contents

Preface

Acronyms

1 Pharmaceuticals, biologics and biopharmaceuticals

1.1 Introduction to pharmaceutical products

1.2 Biopharmaceuticals and pharmaceutical biotechnology

1.3 History of the pharmaceutical industry

1.4 The age of biopharmaceuticals

1.5 Biopharmaceuticals: current status and future prospects

2 Protein structure

2.1 Introduction

2.2 Overview of protein structure

2.3 Higher level structure

2.4 Protein stability and folding

2.5 Protein post-translational modification

3 Gene manipulation and recombinant DNA technology

3.1 Introduction

3.2 Nucleic acids: function and structure

3.3 Recombinant production of therapeutic proteins

3.4 Classical gene cloning and identification

4 The drug development process

4.1 Introduction

4.2 Discovery of biopharmaceuticals

4.3 The impact of genomics and related technologies upon drug discovery

4.4 Gene chips

4.5 Proteomics

4.6 Structural genomics

4.7 Pharmacogenetics

4.8 Initial product characterization

4.9 Patenting

4.10 Delivery of biopharmaceuticals

4.11 Preclinical studies

4.12 Pharmacokinetics and pharmacodynamics

4.13 Toxicity studies

4.14 The role and remit of regulatory authorities

4.15 Conclusion

5 Sources and upstream processing

5.1 Introduction

5.2 Sources of biopharmaceuticals

5.3 Upstream processing

6 Downstream processing

6.1 Introduction

6.2 Initial product recovery

6.3 Cell disruption

6.4 Removal of nucleic acid

6.5 Initial product concentration

6.6 Chromatographic purification

6.7 High-performance liquid chromatography of proteins

6.8 Purification of recombinant proteins

6.9 Final product formulation

7 Product analysis

7.1 Introduction

7.2 Protein-based contaminants

7.3 Removal of altered forms of the protein of interest from the product stream

7.4 Detection of protein-based product impurities

7.5 Immunological approaches to detection of contaminants

7.6 Endotoxin and other pyrogenic contaminants

8 The cytokines: The interferon family

8.1 Cytokines

8.2 The interferons

8.3 Interferon biotechnology

8.4 Conclusion

9 Cytokines: Interleukins and tumour necrosis factor

9.1 Introduction

9.2 Interleukin-2

9.3 Interleukin-1

9.4 Interleukin-11

9.5 Tumour necrosis factors

10 Growth factors

10.1 Introduction

10.2 Haematopoietic growth factors

10.3 Growth factors and wound healing

11 Therapeutic hormones

11.1 Introduction

11.2 Insulin

11.3 Glucagon

11.4 Human growth hormone

11.5 The gonadotrophins

11.6 Medical and veterinary applications of gonadotrophins

11.7 Additional recombinant hormones now approved

11.8 Conclusion

12 Recombinant blood products and therapeutic enzymes

12.1 Introduction

12.2 Haemostasis

12.3 Anticoagulants

12.4 Thrombolytic agents

12.5 Enzymes of therapeutic value

13 Antibodies, vaccines and adjuvants

13.1 Introduction

13.2 Traditional polyclonal antibody preparations

13.3 Monoclonal antibodies

13.4 Vaccine technology

13.5 Adjuvant technology

14 Nucleic-acid- and cell-based therapeutics

14.1 Introduction

14.2 Gene therapy

14.3 Vectors used in gene therapy

14.4 Gene therapy and genetic disease

14.5 Gene therapy and cancer

14.6 Gene therapy and AIDS

14.7 Antisense technology

14.8 Oligonucleotide pharmacokinetics and delivery

14.9 Aptamers

14.10 Cell- and tissue-based therapies

14.11 Conclusion

Index

Copyright © 2007

John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England

Telephone (+44) 1243 779777

Email (for orders and customer service enquiries): [email protected]

Visit our Home Page on www.wileyeurope.com or www.wiley.com

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T 4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should be addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England, or emailed to [email protected], or faxed to (+44) 1243 770620.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The Publisher is not associated with any product or vendor mentioned in this book.

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the Publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Other Wiley Editorial Offices

John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA

Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA

Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany

John Wiley & Sons Australia Ltd, 42 McDougall Street, Milton, Queensland 4064, Australia

John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809

John Wiley & Sons Canada Ltd, 6045 Freemont Blvd, Mississauga, Ontario, L5R 4J3

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Anniversary Logo Design: Richard J. Pacifico

Library of Congress Cataloging-in-Publication Data

Walsh, Gary, Dr.

Pharmaceutical biotechnology : concepts and applications / Gary Walsh.

p.; cm.

Includes bibliographical references.

ISBN 978-0-470-01244-4 (cloth)

1. Pharmaceutical biotechnology. I. Title.

[DNLM: 1. Technology, Pharmaceutical. 2. Biotechnology. 3. Pharmaceutical Preparations. QV 778 W224p 2007]

RS380.W35 2007

615′.19–dc22      2007017884

British Library Cataloguing in Publication Data

A catalogue record for this book is available from the British Library

ISBN 978-0-470-01244-4 (HB)

ISBN 978-0-470-01245-1 (PB)

I dedicate this book to my beautiful daughter Alice. To borrow a phrase:

‘without her help, it would have been written in half the time’!

Preface

This book has been written as a sister publication to Biopharmaceuticals: Biochemistry and Biotechnology, a second edition of which was published by John Wiley and Sons in 2003. The latter textbook caters mainly for advanced undergraduate/postgraduate students undertaking degree programmes in biochemistry, biotechnology and related disciplines. Such students have invariably pursued courses/modules in basic protein science and molecular biology in the earlier parts of their degree programmes; hence, the basic principles of protein structure and molecular biology were not considered as part of that publication. This current publication is specifically tailored to meet the needs of a broader audience, particularly to include students undertaking programmes in pharmacy/pharmaceutical science, medicine and other branches of biomedical/clinical sciences. Although evolving from Biopharmaceuticals: Biochemistry and Biotechnology, its focus is somewhat different, reflecting its broader intended readership. This text, therefore, includes chapters detailing the basic principles of protein structure and molecular biology. It also increases/extends the focus upon topics such as formulation and delivery of biopharmaceuticals, and it contains numerous case studies in which both biotech and clinical aspects of a particular approved product of pharmaceutical biotechnology are overviewed. The book, of course, should also meet the needs of students undertaking programmes in core biochemistry, biotechnology or related scientific areas and be of use as a broad reference source to those already working within the pharmaceutical biotechnology sector.

As always, I owe a debt of gratitude to the various people who assisted in the completion of this textbook. Thanks to Sandy for her help in preparing various figures, usually at ridiculously short notice. To Gerard Wall, for all the laughs and for several useful discussions relating to molecular biology. Thank you to Nancy, my beautiful wife, for accepting my urge to write (rather than to change baby’s nappies) with good humour – most of the time anyway! I am also grateful to the staff of John Wiley and Sons for their continued professionalism and patience with me when I keep overrunning submission deadlines. Finally, I have a general word of appreciation to all my colleagues at the University of Limerick for making this such an enjoyable place to work.

Gary Walsh

November 2006

Acronyms

ADCC: antibody-dependent cell cytotoxicity
BAC: bacterial artificial chromosome
BHK: baby hamster kidney
cDNA: complementary DNA
CHO: Chinese hamster ovary
CNTF: ciliary neurotrophic factor
CSF: colony-stimulating factor
dsRNA: double-stranded RNA
EDTA: ethylenediaminetetraacetic acid
ELISA: enzyme-linked immunosorbent assay
EPO: erythropoietin
FGF: fibroblast growth factor
FSH: follicle-stimulating hormone
GDNF: glial cell-derived neurotrophic factor
GH: growth hormone
hCG: human chorionic gonadotrophin
HIV: human immunodeficiency virus
HPLC: high-performance liquid chromatography
IGF: insulin-like growth factor
ISRE: interferon-stimulated response element
JAK: Janus kinase
LAF: lymphocyte activating factor
LIF: leukaemia inhibitory factor
LPS: lipopolysaccharide
MHC: major histocompatibility complex
MPS: mucopolysaccharidosis
mRNA: messenger RNA
PDGF: platelet-derived growth factor
PEG: polyethylene glycol
PTK: protein tyrosine kinase
PTM: post-translational modification
rDNA: recombinant DNA
RNAi: RNA interference
rRNA: ribosomal RNA
SDS: sodium dodecyl sulfate
ssRNA: single-stranded RNA
STATs: signal transducers and activators of transcription
TNF: tumour necrosis factor
tPA: tissue plasminogen activator
tRNA: transfer RNA
WAP: whey acidic protein
WFI: water for injections

1 Pharmaceuticals, biologics and biopharmaceuticals

1.1 Introduction to pharmaceutical products

Pharmaceutical substances form the backbone of modern medicinal therapy. Most traditional pharmaceuticals are low molecular weight organic chemicals (Table 1.1). Although some (e.g. aspirin) were originally isolated from biological sources, most are now manufactured by direct chemical synthesis. Two types of manufacturing company thus comprise the ‘traditional’ pharmaceutical sector: the chemical synthesis plants, which manufacture the raw chemical ingredients in bulk quantities, and the finished product pharmaceutical facilities, which purchase these raw bulk ingredients, formulate them into final pharmaceutical products, and supply these products to the end user.

In addition to chemical-based drugs, a range of pharmaceutical substances (e.g. hormones and blood products) are produced by/extracted from biological sources. Such products, some major examples of which are listed in Table 1.2, may thus be described as products of biotechnology. In some instances, categorizing pharmaceuticals as products of biotechnology or chemical synthesis becomes somewhat artificial. For example, certain semi-synthetic antibiotics are produced by chemical modification of natural antibiotics produced by fermentation technology.

1.2 Biopharmaceuticals and pharmaceutical biotechnology

Terms such as ‘biologic’, ‘biopharmaceutical’ and ‘products of pharmaceutical biotechnology’ or ‘biotechnology medicines’ have now become an accepted part of the pharmaceutical literature. However, these terms are sometimes used interchangeably and can mean different things to different people.

Although it might be assumed that ‘biologic’ refers to any pharmaceutical product produced by biotechnological endeavour, its definition is more limited. In pharmaceutical circles, ‘biologic’ generally refers to medicinal products derived from blood, as well as vaccines, toxins and allergen products. ‘Biotechnology’ has a much broader and long-established meaning. Essentially, it refers to the use of biological systems (e.g. cells or tissues) or biological molecules (e.g. enzymes or antibodies) for/in the manufacture of commercial products.

Table 1.1 Some traditional pharmaceutical substances that are generally produced by direct chemical synthesis

The term ‘biopharmaceutical’ was first used in the 1980s and came to describe a class of therapeutic proteins produced by modern biotechnological techniques, specifically via genetic engineering (Chapter 3) or, in the case of monoclonal antibodies, by hybridoma technology (Chapter 13). Although the majority of biopharmaceuticals or biotechnology products now approved or in development are proteins produced via genetic engineering, these terms now also encompass nucleic-acid-based, i.e. deoxyribonucleic acid (DNA)- or ribonucleic acid (RNA)-based products, and whole-cell-based products.

1.3 History of the pharmaceutical industry

The pharmaceutical industry, as we now know it, is barely 60 years old. From very modest beginnings, it has grown rapidly, reaching an estimated value of US$100 billion by the mid 1980s. Its current value is likely double this figure or more. There are well in excess of 10 000 pharmaceutical companies in existence, although only about 100 of these can claim to be of true international significance. These companies manufacture in excess of 5000 individual pharmaceutical substances used routinely in medicine.

Table 1.2 Some pharmaceuticals that were traditionally obtained by direct extraction from biological source material. Many of the protein-based pharmaceuticals mentioned are now also produced by genetic engineering.

Substance: Medical application
Blood products (e.g. coagulation factors): Treatment of blood disorders such as haemophilia A or B
Vaccines: Vaccination against various diseases
Antibodies: Passive immunization against various diseases
Insulin: Treatment of diabetes mellitus
Enzymes: Thrombolytic agents, digestive aids, debriding agents (i.e. cleansing of wounds)
Antibiotics: Treatment against various infectious agents
Plant extracts (e.g. alkaloids): Various, including pain relief

The first stages of development of the modern pharmaceutical industry can be traced back to the turn of the twentieth century. At that time (apart from folk cures), the medical community had at their disposal only four drugs that were effective in treating specific diseases:

Digitalis (extracted from foxglove) was known to stimulate heart muscle and, hence, was used to treat various heart conditions.

Quinine, obtained from the bark/roots of a plant (Cinchona genus), was used to treat malaria.

Ipecacuanha (active ingredient is a mixture of alkaloids), used for treating dysentery, was obtained from the bark/roots of the plant genus Cephaelis.

Mercury was used for the treatment of syphilis.

This lack of appropriate, safe and effective medicines contributed in no small way to the low life expectancy characteristic of those times.

Developments in biology (particularly the growing realization of the microbiological basis of many diseases), as well as a developing appreciation of the principles of organic chemistry, helped underpin future innovation in the fledgling pharmaceutical industry. The successful synthesis of various artificial dyes, which proved to be therapeutically useful, led to the formation of pharmaceutical/chemical companies such as Bayer and Hoechst in the late 1800s. Scientists at Bayer, for example, succeeded in synthesizing aspirin in 1897.

Despite these early advances, it was not until the 1930s that the pharmaceutical industry began to develop in earnest. The initial landmark of this era was probably the discovery, and chemical synthesis, of the sulfa drugs. These are a group of related molecules derived from the red dye prontosil rubrum. These drugs proved effective in the treatment of a wide variety of bacterial infections (Figure 1.1). Although insulin was first used therapeutically in the early 1920s, its large-scale industrial production also commenced in the 1930s.

The medical success of these drugs gave new emphasis to the pharmaceutical industry, which was boosted further by the commencement of industrial-scale penicillin manufacture in the early 1940s. Around this time, many of the current leading pharmaceutical companies (or their forerunners) were founded. Examples include Ciba Geigy, Eli Lilly, Wellcome, Glaxo and Roche. Over the next two to three decades, these companies developed drugs such as tetracyclines, corticosteroids, oral contraceptives, antidepressants and many more. Most of these pharmaceutical substances are manufactured by direct chemical synthesis.

1.4 The age of biopharmaceuticals

Biomedical research continues to broaden our understanding of the molecular mechanisms underlying both health and disease. Research undertaken since the 1950s has pinpointed a host of proteins produced naturally in the body that have obvious therapeutic applications. Examples include the interferons and interleukins (which regulate the immune response), growth factors, such as erythropoietin (EPO; which stimulates red blood cell production), and neurotrophic factors (which regulate the development and maintenance of neural tissue).

Figure 1.1 Sulfa drugs and their mode of action. The first sulfa drug to be used medically was the red dye prontosil rubrum (a). In the early 1930s, experiments illustrated that the administration of this dye to mice infected with haemolytic streptococci prevented the death of the mice. This drug, although effective in vivo, was devoid of in vitro antibacterial activity. It was first used clinically in 1935 under the name Streptozon. It was subsequently shown that prontosil rubrum was enzymatically reduced by the liver, forming sulfanilamide, the actual active antimicrobial agent (b). Sulfanilamide induces its effect by acting as an anti-metabolite with respect to para-aminobenzoic acid (PABA) (c). PABA is an essential component of tetrahydrofolic acid (THF) (d). THF serves as an essential cofactor for several cellular enzymes. Sulfanilamide (at sufficiently high concentrations) inhibits manufacture of THF by competing with PABA. This effectively inhibits essential THF-dependent enzyme reactions within the cell. Unlike humans, who can derive folates from their diets, most bacteria must synthesize folate de novo, as they cannot absorb it intact from their surroundings.
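The competition between sulfanilamide and PABA described in the caption to Figure 1.1 can be illustrated numerically. The following is a minimal sketch only: it assumes simple Michaelis-Menten kinetics with a purely competitive inhibitor, which the text does not state, and the function name, the parameters (vmax, km, ki) and all concentrations are hypothetical values in arbitrary units, not measured data.

```python
def competitive_rate(s, i, vmax=1.0, km=1.0, ki=1.0):
    """Reaction rate under competitive inhibition (assumed model).

    v = vmax * [S] / (km * (1 + [I] / ki) + [S])

    s  : substrate concentration (here standing in for PABA)
    i  : inhibitor concentration (here standing in for sulfanilamide)
    vmax, km, ki : hypothetical kinetic constants, arbitrary units
    """
    return vmax * s / (km * (1.0 + i / ki) + s)


paba = 1.0  # hypothetical substrate concentration
for sulfanilamide in (0.0, 1.0, 10.0, 100.0):
    rate = competitive_rate(paba, sulfanilamide)
    print(f"[I] = {sulfanilamide:6.1f}  relative rate = {rate:.3f}")
```

Under these assumptions the rate falls steadily as the inhibitor concentration rises, which is in keeping with the caption's remark that sufficiently high sulfanilamide concentrations are needed. Because the inhibition is modelled as competitive, raising the substrate far above the inhibitor restores most of the rate.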

Although the pharmaceutical potential of these regulatory molecules was generally appreciated, their widespread medical application was in most cases rendered impractical due to the tiny quantities in which they were naturally produced. The advent of recombinant DNA technology (genetic engineering) and monoclonal antibody technology (hybridoma technology) overcame many such difficulties, and marked the beginning of a new era of the pharmaceutical sciences.

Recombinant DNA technology has had a fourfold positive impact upon the production of pharmaceutically important proteins:

It overcomes the problem of source availability.

Many proteins of therapeutic potential are produced naturally in the body in minute quantities. Examples include interferons (Chapter 8), interleukins (Chapter 9) and colony-stimulating factors (CSFs; Chapter 10). This rendered impractical their direct extraction from native source material in quantities sufficient to meet likely clinical demand. Recombinant production (Chapters 3 and 5) allows the manufacture of any protein in whatever quantity it is required.

It overcomes problems of product safety.

Direct extraction of product from some native biological sources has, in the past, led to the unwitting transmission of disease. Examples include the transmission of blood-borne pathogens such as hepatitis B and C and human immunodeficiency virus (HIV) via infected blood products and the transmission of Creutzfeldt–Jakob disease to persons receiving human growth hormone (GH) preparations derived from human pituitaries.

It provides an alternative to direct extraction from inappropriate/dangerous source material.

A number of therapeutic proteins have traditionally been extracted from human urine. Follicle-stimulating hormone (FSH), the fertility hormone, for example, is obtained from the urine of post-menopausal women, and a related hormone, human chorionic gonadotrophin (hCG), is extracted from the urine of pregnant women (Chapter 11). Urine is not considered a particularly desirable source of pharmaceutical products. Although several products obtained from this source remain on the market, recombinant forms have now also been approved. Other potential biopharmaceuticals are produced naturally in downright dangerous sources. Ancrod, for example, is a protein displaying anti-coagulant activity (Chapter 12) and, hence, is of potential clinical use. It is, however, produced naturally by the Malaysian pit viper. Although retrieval by milking snake venom is possible, and indeed may be quite an exciting procedure, recombinant production in less dangerous organisms, such as Escherichia coli or Saccharomyces cerevisiae, would be considered preferable by most.

It facilitates the generation of engineered therapeutic proteins displaying some clinical advantage over the native protein product.

Techniques such as site-directed mutagenesis facilitate the logical introduction of predefined changes in a protein’s amino acid sequence. Such changes can be as minimal as the insertion, deletion or alteration of a single amino acid residue, or can be more substantial (e.g. the alteration/deletion of an entire domain, or the generation of a novel hybrid protein). Such changes can be made for a number of reasons, and several engineered products have now gained marketing approval. An overview of some engineered product types now on the market is provided in Table 1.3. These and other examples will be discussed in subsequent chapters.

Despite the undoubted advantages of recombinant production, many protein-based products extracted directly from native source material remain on the market. In certain circumstances, direct extraction from native source material can prove as attractive as, or more attractive than, recombinant production. This may be for an economic reason if, for example, the protein is produced in very large quantities by the native source and is easy to extract/purify, e.g. human serum albumin (HSA; Chapter 12). Also, some blood factor preparations purified from donor blood actually contain several different blood factors and, hence, can be used to treat several haemophilia patient types. Recombinant blood factor preparations, on the other hand, contain but a single blood factor and, hence, can be used to treat only one haemophilia type (Chapter 12).

The advent of genetic engineering and monoclonal antibody technology underpinned the establishment of literally hundreds of start-up biopharmaceutical (biotechnology) companies in the late 1970s and early 1980s. The bulk of these companies were founded in the USA, with smaller numbers of start-ups emanating from Europe and other world regions.

Table 1.3 Selected engineered biopharmaceutical types/products that have now gained marketing approval. These and additional such products will be discussed in detail in subsequent chapters

Many of these fledgling companies were founded by academics/technical experts who sought to take commercial advantage of developments in the biotechnological arena. These companies were largely financed by speculative monies attracted by the hype associated with the establishment of the modern biotech era. Although most of these early companies displayed significant technical expertise, the vast majority lacked experience in the practicalities of the drug development process (Chapter 4). Most of the well-established large pharmaceutical companies, on the other hand, were slow to invest heavily in biotech research and development. However, as the actual and potential therapeutic significance of biopharmaceuticals became evident, many of these companies did diversify into this area. Most either purchased small, established biopharmaceutical concerns or formed strategic alliances with them. An example was the long-term alliance formed by Genentech (see later) and the well-established pharmaceutical company Eli Lilly. Genentech developed recombinant human insulin, which was then marketed by Eli Lilly under the trade name Humulin. The merger of biotech capability with pharmaceutical experience helped accelerate development of the biopharmaceutical sector.

Table 1.4 Pharmaceutical companies who manufacture and/or market biopharmaceutical products approved for general medical use in the USA and EU

Sanofi-Aventis
Hoechst AG
Bayer
Wyeth
Novo Nordisk
Genzyme
Isis Pharmaceuticals
Abbott
Genentech
Roche
Centocor
Novartis
Boehringer Mannheim
Serono
Galenus Mannheim
Organon
Eli Lilly
Amgen
Ortho Biotech
GlaxoSmithKline
Schering-Plough
Cytogen
Hoffmann-La Roche
Immunomedics
Chiron
Biogen

Many of the earlier biopharmaceutical companies no longer exist. The overall level of speculative finance available was not sufficient to sustain them all long term (it can take 6–10 years and US$800 million to develop a single drug; Chapter 4). Furthermore, the promise and hype of biotechnology sometimes exceeded its ability actually to deliver a final product. Some biopharmaceutical substances showed little efficacy in treating their target condition, and/or exhibited unacceptable side effects. Mergers and acquisitions also led to the disappearance of several biopharmaceutical concerns. Table 1.4 lists many of the major pharmaceutical concerns which now manufacture/market biopharmaceuticals approved for general medical use. Box 1.1 provides a profile of three well-established dedicated biopharmaceutical companies.

Box 1.1 Amgen, Biogen and Genentech

Amgen, Biogen and Genentech represent three pioneering biopharmaceutical companies that still remain in business.

Founded in the 1980s as AMGen (Applied Molecular Genetics), Amgen now employs over 9000 people worldwide, making it one of the largest dedicated biotechnology companies in existence. Its headquarters are situated in Thousand Oaks, California, although it has research, manufacturing, distribution and sales facilities worldwide. Company activities focus upon developing novel (mainly protein) therapeutics for application in oncology, inflammation, bone disease, neurology, metabolism and nephrology. By mid 2006, seven of its recombinant products had been approved for general medical use: the EPO-based products ‘Aranesp’ and ‘Epogen’ (Chapter 10), the CSF-based products ‘Neupogen’ and ‘Neulasta’ (Chapter 10), the interleukin-1 (IL-1) receptor antagonist ‘Kineret’, the anti-rheumatoid arthritis fusion protein ‘Enbrel’ (Chapter 9) and the keratinocyte growth factor ‘Kepivance’, indicated for the treatment of severe oral mucositis. Total product sales for 2004 reached US$9.9 billion. In July 2002, Amgen acquired Immunex Corporation, another dedicated biopharmaceutical company founded in Seattle in the early 1980s.

Biogen was founded in Geneva, Switzerland, in 1978 by a group of leading molecular biologists. Currently, its global headquarters are located in Cambridge, MA, and it employs in excess of 2000 people worldwide. The company developed and directly markets the interferon-based product ‘Avonex’ (Chapter 8), but also generates revenues from sales of other Biogen-discovered products that are licensed to various other pharmaceutical companies. These include Schering Plough’s ‘Intron A’ (Chapter 8) and a number of hepatitis B-based vaccines sold by SmithKline Beecham (SKB) and Merck (Chapter 13).

Genentech was founded in 1976 by the scientist Herbert Boyer and the venture capitalist Robert Swanson. Headquartered in San Francisco, it employs almost 5000 staff worldwide and has 10 protein-based products on the market. These include the hGH product ‘Nutropin’ (Chapter 11), the antibody-based products ‘Herceptin’ and ‘Rituxan’ (Chapter 13) and the thrombolytic agents ‘Activase’ and ‘TNKase’ (Chapter 12). The company also has 20 or so products in clinical trials. In 2004, it generated some US$4.6 billion in revenues.

1.5 Biopharmaceuticals: current status and future prospects

Approximately one in every four new drugs now coming on the market is a biopharmaceutical. By mid 2006, some 160 biopharmaceutical products had gained marketing approval in the USA and/or EU. Collectively, these represent a global biopharmaceutical market in the region of US$35 billion (Table 1.5), and the market value is estimated to surpass US$50 billion by 2010. The products include a range of hormones, blood factors and thrombolytic agents, as well as vaccines and monoclonal antibodies (Table 1.6). All but two are protein-based therapeutic agents. The exceptions are two nucleic-acid-based products: ‘Vitravene’, an antisense oligonucleotide, and ‘Macugen’, an aptamer (Chapter 14). Many additional nucleic-acid-based products for use in gene therapy or antisense technology are in clinical trials, although the range of technical difficulties that still beset this class of therapeutics will ensure that protein-based products will overwhelmingly predominate for the foreseeable future (Chapter 14).
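The two market figures quoted above imply a rough annual growth rate, which can be worked out as follows. This is an illustrative sketch only: it assumes that the US$35 billion figure applies to 2006 and that growth is compounded annually, neither of which the text states. Since the text says the market is expected to surpass US$50 billion, the result is best read as a lower bound. The function name is hypothetical.

```python
def implied_annual_growth(start_value, end_value, years):
    """Compound annual growth rate that takes start_value to end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures from the text, in US$ billion; the 2006 base year is an assumption.
rate = implied_annual_growth(35.0, 50.0, 2010 - 2006)
print(f"Implied minimum annual growth: {rate:.1%}")
```

On these assumptions the implied growth rate is a little over 9 per cent per year.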

Many of the initial biopharmaceuticals approved were simple replacement proteins (e.g. blood factors and human insulin). The ability to alter the amino acid sequence of a protein logically, coupled to an increased understanding of the relationship between protein structure and function (Chapters 2 and 3), has facilitated the more recent introduction of several engineered therapeutic proteins (Table 1.3). Thus far, the vast majority of approved recombinant proteins have been produced in the bacterium E. coli, the yeast S. cerevisiae or in animal cell lines (most notably Chinese hamster ovary (CHO) cells or baby hamster kidney (BHK) cells). These production systems are discussed in Chapter 5.

Although most biopharmaceuticals approved to date are intended for human use, a number of products destined for veterinary application have also come on the market. One early such example is that of recombinant bovine GH (Somatotrophin), which was approved in the USA in the early 1990s and used to increase milk yields from dairy cattle. Additional examples of approved veterinary biopharmaceuticals include a range of recombinant vaccines and an interferon-based product (Table 1.7).

Table 1.5 Approximate annual market values of some leading approved biopharmaceutical products. Data gathered from various sources, including company home pages, annual reports and industry reports

Table 1.6 Summary categorization of biopharmaceuticals approved for general medical use in the EU and/or USA by 2006

At least 1000 potential biopharmaceuticals are currently being evaluated in clinical trials, although the majority of these are in early stage trials. Vaccines and monoclonal antibody-based products represent the two biggest product categories. Regulatory factors (e.g. hormones and cytokines) and gene therapy and antisense-based products also represent significant groupings. Although most protein-based products likely to gain marketing approval over the next 2–3 years will be produced in engineered E. coli, S. cerevisiae or animal cell lines, some products now in clinical trials are being produced in the milk of transgenic animals (Chapter 5). Additionally, plant-based transgenic expression systems may potentially come to the fore, particularly for the production of oral vaccines (Chapter 5).

Table 1.7 Some recombinant (r) biopharmaceuticals recently approved for veterinary application in the EU

Interestingly, the first generic biopharmaceuticals are already entering the market. Patent protection for many first-generation biopharmaceuticals (including recombinant human GH (rhGH), insulin, EPO, interferon-α (IFN-α) and granulocyte-CSF (G-CSF)) has ended or is now coming to an end. Most of these drugs command an overall annual market value in excess of US$1 billion, rendering them attractive potential products for many biotechnology and pharmaceutical companies. Companies already producing, or soon to produce, generic biopharmaceuticals include Biopartners (Switzerland), Genemedix (UK), Sicor and Ivax (USA), Congene and Microbix (Canada) and BioGenerix (Germany). Genemedix, for example, secured approval for sale of a recombinant CSF in China in 2001 and is also commencing the manufacture of recombinant EPO. Sicor currently markets hGH and IFN-α in eastern Europe and various developing nations. A generic hGH also gained approval in both Europe and the USA in 2006.

As of mid 2006, no gene-therapy-based product has been approved for general medical use in the EU or USA, although one such product (‘Gendicine’; Chapter 14) has been approved in China. Although gene therapy trials were initiated as far back as 1989, the results have been disappointing. Many technical difficulties remain in relation to, for example, gene delivery and regulation of expression. Product effectiveness was not apparent in the majority of trials undertaken and safety concerns have been raised in several trials.

Only one antisense-based product has been approved to date (in 1998) and, although several such antisense agents continue to be clinically evaluated, it is unlikely that a large number of such products will be approved over the next 3–4 years. Aptamers represent an additional emerging class of nucleic-acid-based therapeutic. These are short DNA- or RNA-based sequences that adopt a specific three-dimensional structure, enabling them to bind (and thereby inhibit) specific target molecules. One such product (Macugen) has been approved to date. RNA interference (RNAi) represents yet another mechanism of achieving downregulation of gene expression (Chapter 14). It shares many characteristics with antisense technology and, like antisense, provides a potential means of treating medical conditions triggered or exacerbated by the inappropriate overexpression of specific gene products. Despite the disappointing results thus far generated by nucleic-acid-based products, future technical advances will almost certainly ensure the approval of gene therapy and antisense-based products in the intermediate to longer term future.

Technological developments in areas such as genomics, proteomics and high-throughput screening are also beginning to impact significantly upon the early stages of drug development (Chapter 4). By linking changes in gene/protein expression to various disease states, for example, these technologies will identify new drug targets for such diseases. Most such targets will themselves be proteins, and drugs will be designed and developed specifically to interact with them. These drugs may be protein based or (more often) low molecular mass ligands.

Additional future innovations likely to impact upon pharmaceutical biotechnology include the development of alternative product production systems, alternative methods of delivery and the development of engineered cell-based therapies, particularly stem cell therapy. As mentioned previously, protein-based biotechnology products approved to date are produced in either microbial or animal cell lines. Work continues on the production of such products in transgenic-based production systems, specifically either transgenic plants or animals (Chapter 5).

Virtually all therapeutic proteins must enter the blood in order to promote a therapeutic effect. Such products must usually be administered parenterally. However, research continues on the development of non-parenteral routes, which may prove more convenient, less costly and improve patient compliance. Alternative potential delivery routes include transdermal, nasal, oral and buccal approaches, although most progress to date has been recorded with pulmonary-based delivery systems (Chapter 4). An inhaled insulin product (‘Exubera’, Chapters 4 and 11) was approved in 2006 for the treatment of type I and II diabetes.

A small number of whole-cell-based therapeutic products have also been approved to date (Chapter 14). All contain mature, fully differentiated cells extracted from a native biological source. Improved techniques now allow the harvest of embryonic and, indeed, adult stem cells, bringing the development of stem-cell-based drugs one step closer. However, the use of stem cells to replace human cells or even entire tissues/organs remains a long term goal (Chapter 14). Overall, therefore, products of pharmaceutical biotechnology play an important role in the clinic and are likely to assume an even greater relative importance in the future.

Further reading

Books

Crommelin, D. and Sindelar, R. 2002. Pharmaceutical Biotechnology, second edition. Taylor and Francis, London, UK.

Goldberg, R. 2001. Pharmaceutical Medicine, Biotechnology and European Law. Cambridge University Press.

Grindley, J. and Ogden, J. 2000. Understanding Biopharmaceuticals. Manufacturing and Regulatory Issues. Interpharm Press.

Kayser, O. and Muller, RH. 2004. Pharmaceutical Biotechnology. Wiley VCH, Weinheim, Germany.

Oxender, D. and Post, L. 1999. Novel Therapeutics from Modern Biotechnology. Springer Verlag.

Spada, S. and Walsh, G. 2005. Directory of Approved Biopharmaceutical Products. CRC Press, Florida, USA.

Articles

Mayhall, E., Paffett-Lugassy, N. and Zon, L.I. 2004. The clinical potential of stem cells. Current Opinion in Cell Biology 16, 713–720.

Reichert, J. and Paquette, C. 2003. Therapeutic recombinant proteins: trends in US approvals 1982–2002. Current Opinion in Molecular Therapy 5, 139–147.

Reichert, J. and Pavlov, A. 2004. Recombinant therapeutics – success rates, market trends and values to 2010. Nature Biotechnology 22, 1513–1519.

Walsh, G. 2005. Biopharmaceuticals: recent approvals and likely directions. Trends in Biotechnology 23, 553–558.

Walsh, G. 2006. Biopharmaceutical benchmarks 2006. Nature Biotechnology 24, 769–776.

Weng, Z. and DeLisi, C. 2000. Protein therapeutics: promises and challenges of the 21st century. Trends in Biotechnology 20, 29–36.

2

Protein structure

2.1 Introduction

Almost all products of modern pharmaceutical biotechnology, be they on the market or likely to gain approval in the short to intermediate term, are protein based. As such, an understanding of protein structure is central to this topic. A comprehensive treatment of the subject would easily constitute a book on its own, and many such publications are available. The aim of this chapter is to provide a basic overview of the subject in order to equip the reader with a knowledge of protein science sufficient to understand relevant concepts outlined in the remaining chapters of this book. The interested reader is also referred to the ‘Further reading’ section, which lists several excellent specialist publications in the field. Much additional information may also be sourced via the web sites mentioned within the chapter.

2.2 Overview of protein structure

Proteins are macromolecules consisting of one or more polypeptides (Table 2.1). Each polypeptide consists of a chain of amino acids linked together by peptide (amide) bonds. The exact amino acid sequence is determined by the gene coding for that specific polypeptide. When synthesized, a polypeptide chain folds up, assuming a specific three-dimensional shape (i.e. a specific conformation) that is unique to it. The conformation adopted is dependent upon the polypeptide’s amino acid sequence, and this conformation is largely stabilized by multiple, weak non-covalent interactions. Any influence (e.g. certain chemicals and heat) that disrupts such weak interactions results in disruption of the polypeptide’s native conformation, a process termed denaturation. Denaturation usually results in loss of functional activity, clearly demonstrating the dependence of protein function upon protein structure. A protein’s structure currently cannot be predicted solely from its amino acid sequence. Its conformation can, however, be determined by techniques such as X-ray diffraction and nuclear magnetic resonance (NMR) spectroscopy.

Proteins are sometimes classified as ‘simple’ or ‘conjugated’. Simple proteins consist exclusively of polypeptide chain(s) with no additional chemical components present or being required for biological activity. Conjugated proteins, in addition to their polypeptide component(s), contain one or more non-polypeptide constituents known as prosthetic group(s). The most common prosthetic groups found in association with proteins include carbohydrates (glycoproteins), phosphate groups (phosphoproteins), vitamin derivatives (e.g. flavoproteins) and metal ions (metalloproteins).

Table 2.1 Selected examples of proteins. The number of polypeptide chains and amino acid residues constituting the protein are listed, along with its molecular mass and biological function

Table 2.2 The 20 commonly occurring amino acids. They may be subdivided into five groups on the basis of side-chain structure. Their three- and one-letter abbreviations are also listed (one-letter abbreviations are generally used only when compiling extended sequence data, mainly to minimize writing space and effort). In addition to their individual molecular masses, the percentage occurrence of each amino acid in an ‘average’ protein is also presented. These data were generated from sequence analysis of over 1000 different proteins

2.2.1 Primary structure

Polypeptides are linear, unbranched polymers, potentially containing up to 20 different monomer types (i.e. the 20 commonly occurring amino acids) linked together in a precise predefined sequence. The primary structure of a polypeptide refers to its exact amino acid sequence, along with the exact positioning of any disulfide bonds present (described later). The 20 commonly occurring amino acids are listed in Table 2.2, along with their three-letter and one-letter abbreviations. The structures of these amino acids are presented in Figure 2.1. Nineteen of these amino acids contain a central (α) carbon atom, to which is attached a hydrogen atom (H), an amino group (NH2), a carboxyl group (COOH) and an additional side chain (R) group, which differs from amino acid to amino acid. The amino acid proline is unusual in that its R group forms a direct covalent bond with the nitrogen atom of what is the free amino group in other amino acids (Figure 2.1).

Figure 2.1 The chemical structure of the 20 amino acids commonly found in proteins

As will be evident from Section 2.2.2, peptide bond formation between adjacent amino acid residues entails the establishment of covalent linkages between the amino and carboxyl groups attached to their respective central (α) carbon atoms. Hence, the free functional (i.e. chemically reactive) groups in polypeptides are almost entirely present as part of the constituent amino acids’ side chains (R groups). In addition to determining the chemical reactivity of a polypeptide, these R groups also very largely dictate the final conformation adopted by a polypeptide. Stabilizing/repulsive forces between different R groups (as well as between R groups and the surrounding aqueous media) largely dictate what final shape the polypeptide adopts, as will be described later.

The R groups of the non-polar, aliphatic amino acids (Gly, Ala, Val, Leu, Ile and Pro) are devoid of chemically reactive functional groups. These R groups are noteworthy in that, when present in a polypeptide’s backbone, they tend to interact with each other non-covalently (via hydrophobic interactions). These interactions have a significant stabilizing influence on protein conformation.

Glycine is noteworthy in that its R group is a hydrogen atom. This means that the α-carbon of glycine is not asymmetric, i.e. is not a chiral centre. (To be a chiral centre the carbon would have to have four different chemical groups attached to it; in this case, two of its four attached groups are identical.) As a consequence, glycine does not occur in multiple stereo-isomeric forms, unlike the remaining amino acids, which occur as either D or L isomers. Only L-amino acids are naturally found in polypeptides.

The side chains of the aromatic amino acids (Phe, Tyr and Trp) are not particularly reactive chemically, but they all absorb ultraviolet (UV) light. Tyr and Trp in particular absorb strongly at 280 nm, allowing detection and quantification of proteins in solution by measuring the absorbance at this wavelength.
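The relationship between absorbance at 280 nm and protein concentration can be illustrated with the Beer-Lambert law (A = ε × c × l). The following is a minimal sketch only. It assumes that the molar extinction coefficient can be estimated from the number of Trp and Tyr residues, using commonly quoted approximate contributions of 5500 and 1490 M⁻¹ cm⁻¹ respectively; these values, the function names and the example peptide are illustrative assumptions and are not taken from this chapter.

```python
# Approximate molar absorbance contributions at 280 nm (M^-1 cm^-1).
# Assumed, commonly quoted values; not taken from this chapter.
EPSILON_280 = {"W": 5500, "Y": 1490}


def molar_extinction_280(sequence: str) -> int:
    """Estimate the molar extinction coefficient from Trp and Tyr content."""
    return sum(EPSILON_280.get(residue, 0) for residue in sequence.upper())


def concentration_molar(a280: float, sequence: str, path_cm: float = 1.0) -> float:
    """Beer-Lambert law, A = epsilon * c * l, rearranged to give c (mol/L)."""
    epsilon = molar_extinction_280(sequence)
    if epsilon == 0:
        raise ValueError("sequence has no Trp or Tyr; A280 is not informative")
    return a280 / (epsilon * path_cm)


if __name__ == "__main__":
    peptide = "GWYAYK"  # hypothetical peptide: one Trp, two Tyr
    print(molar_extinction_280(peptide))
    print(concentration_molar(0.848, peptide))
```

In practice, measured extinction coefficients can differ from such estimates, so the result is best treated as an approximation.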

Of the six polar but uncharged amino acids, two (cysteine and methionine) are unusual in that they contain a sulfur atom. The side chain of methionine is non-polar and relatively unreactive, although the sulfur atom is susceptible to oxidation. In contrast, the thiol (—C—SH) portion of cysteine’s R group is the most reactive functional group of any amino acid side chain. In vivo, this group can form complexes with various metal ions and is readily oxidized, forming ‘disulfide linkages’ (covalent linkages between two cysteine residues within the same or even different polypeptide backbones). These help stabilize the three-dimensional structure of such polypeptides. Interchain disulfide linkages can also form, in which cysteines from two different polypeptides participate. This is a very effective way of covalently linking adjacent polypeptides.

Of the four remaining polar but uncharged amino acids, the R groups of serine and threonine contain hydroxyl (OH) groups and the R groups of asparagine and glutamine contain amide (CONH2) groups. None are particularly reactive chemically; however, upon exposure to high temperatures or extremes of pH, the latter two can deamidate, yielding aspartic acid and glutamic acid respectively.

Aspartic and glutamic acids are themselves negatively charged under physiological conditions. This allows them to chelate certain metal ions, and also to markedly influence the conformation adopted by polypeptide chains in which they are found.

Lysine, arginine and histidine are positively charged amino acids. The lysine R group consists of a hydrophobic chain of four —CH2 groups (Figure 2.1), capped with an amino (NH2) group, which is ionized (NH3+) under most physiological conditions. However, within most polypeptides there is normally a fraction of un-ionized lysines, and these (unlike their ionized counterparts) are quite chemically reactive. Such lysine side chains can be chemically converted into various analogues. The arginine side chain is also quite bulky, consisting of three CH2 groups, an amino group (—NH2) and an ionized guanido group (=NH2+). The ‘imidazole’ side chain of histidine can be described chemically as a tertiary amine (R3—N), and thus it can act as a strong nucleophilic catalyst (the nitrogen atom houses a lone pair of electrons, making it a ‘nucleus lover’ or nucleophile; it can donate its electron pair to an ‘electron lover’ or electrophile). As such, the histidine side chain often constitutes an essential part of some enzyme active sites.
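The size of the un-ionized lysine fraction can be estimated with the Henderson-Hasselbalch relationship. The sketch below assumes a side chain pKa of about 10.5 for lysine, an approximate value for the free amino acid that is not given in this chapter; the pKa of any individual lysine within a folded polypeptide may differ considerably, so the numbers are indicative only.

```python
def fraction_unprotonated(ph: float, pka: float) -> float:
    """Fraction of a basic group present in its neutral (un-ionized) form.

    Derived from the Henderson-Hasselbalch equation:
    [base] / ([base] + [acid]) = 1 / (1 + 10 ** (pKa - pH)).
    """
    return 1.0 / (1.0 + 10 ** (pka - ph))


LYSINE_SIDE_CHAIN_PKA = 10.5  # assumed approximate value

if __name__ == "__main__":
    for ph in (7.4, 9.0, 10.5):
        fraction = fraction_unprotonated(ph, LYSINE_SIDE_CHAIN_PKA)
        print(f"pH {ph}: {fraction:.5f} of lysine side chains un-ionized")
```

Under these assumptions, fewer than one in a thousand lysine side chains is un-ionized at pH 7.4, and the fraction reaches one half only when the pH equals the pKa.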

In addition to the 20 ‘common’ amino acids, some modified amino acids are also found in several proteins. These amino acids are normally altered via a process of post-translational modification (PTM) reactions (i.e. modified after protein synthesis is complete). Almost 200 such modified amino acids have been characterized to date. The more common such modifications are discussed separately in Section 2.5.

Figure 2.2 (a) Peptide bond formation. (b) Polypeptides consist of a linear chain of amino acids successively linked via peptide bonds. (c) The peptide bond displays partial double-bonded character

2.2.2 The peptide bond

Successive amino acids are joined together during protein synthesis via a ‘peptide’ (i.e. amide) bond (Figure 2.2). This is a condensation reaction, as a water molecule is eliminated during bond formation. Each amino acid in the resultant polypeptide is termed a ‘residue’, and the polypeptide chain will display a free amino (NH2) group at one end and a free carboxyl (COOH) group at the other end. These are termed the amino and carboxyl termini respectively.
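Because one water molecule is eliminated for each peptide bond formed, the mass of a polypeptide equals the sum of its residue masses plus one water (accounting for the free amino and carboxyl termini). The following sketch illustrates this bookkeeping. The average residue masses are standard values rounded to two decimal places, included here as an assumption rather than taken from Table 2.2, and the function names are hypothetical.

```python
WATER = 18.015  # average mass of H2O, Da

# Average masses of amino acid residues (free amino acid minus one water), Da.
# Assumed standard values, rounded; not copied from Table 2.2.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}


def waters_released(sequence: str) -> int:
    """Number of water molecules eliminated when the chain is assembled."""
    return max(len(sequence) - 1, 0)


def polypeptide_mass(sequence: str) -> float:
    """Average mass of an unmodified linear polypeptide, in Da."""
    return sum(RESIDUE_MASS[residue] for residue in sequence.upper()) + WATER


if __name__ == "__main__":
    tripeptide = "GAG"  # hypothetical example: Gly-Ala-Gly
    print(waters_released(tripeptide), round(polypeptide_mass(tripeptide), 2))
```

The calculation ignores disulfide bonds and any post-translational modifications, both of which alter the true mass.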

The peptide bond has a rigid, planar structure and is in the region of 1.33 Å in length. Its rigid nature is a reflection of the fact that the amide nitrogen lone pair of electrons is delocalized across the bond (i.e. the bond structure is a halfway house between the two forms illustrated in Figure 2.2c). In most instances, peptide groups assume a ‘trans’ configuration (Figure 2.2b). This minimizes steric interference between the R groups of successive amino acid residues.

Figure 2.3 Fragment of polypeptide chain backbone illustrating rigid peptide bonds and the intervening N—Cα and Cα—C backbone linkages, which are free to rotate

Whereas the peptide bond is rigid, the other two bond types found in the polypeptide backbone (i.e. the N—Cα bond and the Cα—C bond, Figure 2.3) are free to rotate. The polypeptide backbone can thus be viewed as a series of planar ‘plates’ that can rotate relative to one another. The angle of rotation around the N—Cα bond is termed ϕ (phi) and that around the Cα—C bond is termed ψ (psi) (Figure 2.3). These angles are also known as rotation angles, dihedral angles or torsion angles. By convention, these angles are defined as being 180° when the polypeptide chain is in its fully extended, trans form. In principle, each bond can rotate to any value between –180° and +180°. However, the degrees of rotation actually observed are restricted due to the occurrence of steric hindrance between atoms of the polypeptide backbone and those of amino acid side chains.

For each amino acid residue in a polypeptide backbone, the actual ϕ and ψ angles that are physically possible can be calculated, and these angle pairs are often plotted against each other in a diagram termed a Ramachandran plot. Sterically allowable angles fall within relatively narrow bands in most instances. A greater than average degree of ϕ/ψ rotational freedom is observed around glycine residues, due to the latter’s small R group – hence steric hindrance is minimized. On the other hand, bond angle freedom around proline residues is quite restricted due to this amino acid’s unusual structure (Figure 2.1). The ϕ and ψ angles allowable around each Cα in a polypeptide backbone obviously exert a major influence upon the final three-dimensional shape assumed by the polypeptide.
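A torsion angle such as ϕ or ψ can be computed from the coordinates of four consecutive backbone atoms: C(i–1), N, Cα and C for ϕ, and N, Cα, C and N(i+1) for ψ. The sketch below uses a standard vector-projection method and plain Python; the function name and the test coordinates are illustrative assumptions, not data from a real protein.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle (degrees, -180 to +180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)  # unit vector along the central bond
    # Remove the components of b0 and b2 that lie along the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Four coplanar points in a fully extended (trans) zigzag arrangement.
    trans = [(1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)]
    print(abs(dihedral_degrees(*trans)))
```

Applying such a function to every residue of an experimentally determined structure yields the ϕ/ψ pairs that are plotted in a Ramachandran plot.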

2.2.3 Amino acid sequence determination

The amino acid sequence of a polypeptide may be determined directly via chemical sequencing or by physical fragmentation and analysis, usually by mass spectrometry. Direct chemical sequencing was the only method available until the 1970s. Insulin was the first protein to be sequenced by this approach (in 1953), requiring several years and several hundred grams of protein to complete. The method has been refined and automated over the years, such that, today, polypeptides containing 100 amino acids or more can be automatically sequenced within a few hours, using microgram to milligram levels of protein. The actual chemical sequencing procedure employed is termed the Edman degradation method.

Table 2.3 Representative organisms whose genomes have been or will soon be completely/almost completely sequenced. Data taken largely from http://wit.integratedgenomics.com/GOLD/eucaryoticgenomes.html and http://www.tigr.org/tdb/mdb/mdcomplete.html. Updated information is available on these sites

Table 2.4 The major primary sequence (protein and nucleic acid) databases and the web addresses from which they may be accessed

Database       Web address

Protein
PIR            http://www-nbrf.georgetown.edu/
Swiss-Prot     http://www.ebi.ac.uk/swissprot/
MIPS           http://www.mips.biochem.mpg.de/
NRL-3D         http://www-nbrf.georgetown.edu/pirwww/dbinfo/nrl3d.html
TrEMBL         http://www.ebi.ac.uk/index.html
Owl            http://www.bis.med.jhmi.edu/Dan/proteins/owl.html

Nucleic acid
EMBL           http://www.ebi.ac.uk/embl/index.html/
GenBank        http://www.ncbi.nlm.nih.gov
DDBJ           http://www.ddbj.nig.ac.jp/

An alternative approach to amino acid sequence determination is to sequence the gene that codes for the polypeptide (Chapter 3). The amino acid sequence can then be inferred from the nucleotide sequence obtained. This approach has gained favour in recent years. Refinements to DNA sequencing methodologies and equipment have made such sequence analysis both rapid and relatively inexpensive. The ongoing genome projects continue to generate enormous amounts of sequence data. By the early 2000s, substantial or complete sequence data for some 300 organisms were available (Table 2.3). As a result, the putative amino acid sequences of an enormous number of proteins (most of unknown function and structure) had been determined.
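Inferring an amino acid sequence from a nucleotide sequence amounts to reading the coding sequence three bases at a time and looking up each codon in the genetic code. The following minimal sketch uses the standard genetic code; it assumes that the input is an uninterrupted coding sequence already in the correct reading frame, and the function name and example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_sequence: str) -> str:
    """Translate a DNA or mRNA coding sequence into one-letter amino acid codes.

    Translation stops at the first stop codon; trailing bases that do not
    form a complete codon are ignored.
    """
    dna = coding_sequence.upper().replace("U", "T")
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCCTGTGGATGCGCTAA"))  # short illustrative sequence
```

Such a translation yields only the putative primary sequence; it says nothing about signal peptide removal or other processing of the mature polypeptide.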

Upon its generation, sequence information is normally submitted to various databases. The major databases in which protein primary sequence data are available are listed in Table 2.4. Also included in this table are the major nucleic acid sequence databases, as amino acid sequence information can potentially be derived from these.

The Swiss-Prot database is probably the most widely used protein database. It is maintained collaboratively by the European Bioinformatics Institute (EBI) and the Swiss Institute for Bioinformatics. It is relatively easy to access and search via the World Wide Web (Table 2.4). A sample entry for human insulin is provided in Figure 2.4. Additional information detailing such databases is available via the web addresses provided in Table 2.4 and in the bioinformatics publications listed at the end of this chapter.

A polypeptide’s amino acid sequence can thus be determined by direct chemical (Edman) or physical (mass spectrometry) means, or indirectly via gene sequencing. In practice, these methods are complementary to one another and can be used to cross-check sequence accuracy. If the target gene/messenger RNA (mRNA) has been previously isolated, then DNA sequencing is usually most convenient. However, this approach reveals little information regarding any PTMs present in the mature polypeptide, many of which are of critical significance in the context of therapeutic proteins (discussed in Section 2.5).

Figure 2.4 Sample entry for human insulin as present in the Swiss-Prot database. Refer to text for further details. Reproduced from the Swiss-Prot database on the Uniprot website http://www.ebi.uniprot.org/

2.2.4 Polypeptide synthesis

Full-scale polypeptide characterization usually requires modest to large (milligram to gram) amounts of the purified target polypeptide. Even larger quantities are then generally required if the polypeptide has a commercial application. In some cases a polypeptide can be obtained in sufficient quantities by direct extraction from its natural producer source. However, polypeptides may also be produced by direct chemical synthesis, as long as their amino acid sequence (and any PTMs) has been elucidated. Alternatively, synthesis can be undertaken via a biological route (recombinant DNA technology), as is the case for virtually all modern therapeutic proteins.

2.3 Higher level structure

Thus far we have concentrated on the primary structure (amino acid sequence) of a polypeptide. Higher level protein structure can be described at various levels, i.e. secondary, tertiary and quaternary:

Secondary structure can be described as the local spatial conformation of a polypeptide’s backbone, excluding the constituent amino acids’ side chains. The major elements of secondary structure are the α-helix and β-strands, as described below.

Tertiary structure refers to the three-dimensional arrangement of all the atoms that contribute to the polypeptide.

Quaternary structure refers to the overall spatial arrangement of polypeptide subunits within a protein composed of two or more polypeptides.

2.3.1 Secondary structure

By studying the backbone of most proteins, stretches of amino acids that adopt a regular, recurring shape usually become evident. The most commonly observed secondary structural elements are termed the α-helix and β-strands, which are usually separated by stretches largely devoid of regular, recurring conformation. The α-helix and β-sheets are commonly formed because they maximize formation of stabilizing intramolecular hydrogen bonds and minimize steric repulsion between adjacent side chain groups, while also being compatible with the rigid planar nature of the peptide bonds.

The α-helix contains 3.6 amino acid residues in a full turn (Figure 2.5). This approximates to a length of 0.54 nm along the long axis of the helix. The participating amino acid side chains protrude outward from the helical backbone. Amino acids most conducive to α-helix formation include alanine, leucine, methionine and glutamate. Proline, as well as the occurrence in close proximity of multiple residues with either bulky side groups or side groups of the same charge, tends to disrupt α-helical formation. The helical structure is stabilized by hydrogen bonding, with every backbone C=O group forming a hydrogen bond with the N—H group four residues ahead of it in the helix. Stretches of α-helix found in globular (i.e. tightly folded, approximately spherical) polypeptides can vary in length from a single helical turn to greater than 10 consecutive helical turns. The average length is about three turns.

Figure 2.5 Ball-and-stick and ribbon representations of an α-helix. Reproduced from Sun, P. and Boyington. 1997. Current Protocols in Protein Science by kind permission of the publisher, John Wiley and Sons

Stretches of α-helix are most often positioned on the protein’s surface, with one face of the helix facing the hydrophobic interior and the other facing the surrounding aqueous medium. The amino acid sequence of these helices is such that hydrophobic amino acid residues are positioned on one face of the helix, whereas hydrophilic amino acids line the other. The transmembrane sections of polypeptides that span biological membranes often display one (or more) α-helical stretches. In such instances, almost all the residues found in the helix display hydrophobic side chains.
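The segregation of residues onto opposite faces of a surface helix can be checked numerically. With 3.6 residues per turn, successive residues are offset by 100° around the helix axis, so each side chain can be treated as a vector pointing outward at its angular position and weighted by its hydrophobicity. The sketch below computes the resulting ‘hydrophobic moment’, a measure that is not described in this chapter; the Kyte-Doolittle hydropathy values and both test sequences are assumptions chosen for illustration.

```python
import math

# Kyte-Doolittle hydropathy values (assumed scale; positive = hydrophobic).
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
    "R": -4.5,
}

DEGREES_PER_RESIDUE = 360.0 / 3.6  # 100 degrees for an alpha-helix


def hydrophobic_moment(sequence: str) -> float:
    """Mean hydrophobic moment per residue, assuming an ideal alpha-helix."""
    sum_x = sum_y = 0.0
    for index, residue in enumerate(sequence.upper()):
        angle = math.radians(index * DEGREES_PER_RESIDUE)
        sum_x += HYDROPATHY[residue] * math.cos(angle)
        sum_y += HYDROPATHY[residue] * math.sin(angle)
    return math.hypot(sum_x, sum_y) / len(sequence)


if __name__ == "__main__":
    two_faced = "LKKLLKKLLKLLKKLLKK"  # hypothetical surface-type helix
    uniform = "L" * 18                # hypothetical transmembrane-type helix
    print(round(hydrophobic_moment(two_faced), 2))
    print(round(hydrophobic_moment(uniform), 2))
```

A helix with distinct hydrophobic and hydrophilic faces gives a large moment, whereas a uniformly hydrophobic stretch, as is typical of transmembrane helices, gives a moment close to zero.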

Figure 2.6 The β-sheet. (a) Two segments of β-strands (antiparallel) forming a β-sheet via hydrogen bonding. The β-strand is drawn schematically as a thick arrow. By convention the arrowhead points in the direction of the polypeptide’s C terminus. (b) Schematic illustration of a two-strand β-sheet in parallel and antiparallel modes

β-strands represent the other major recurring structural element of proteins. β-strands usually are 5–10 amino acid residues in length, with the residues adopting an almost fully extended zigzag conformation. Single β-strands are rarely, if ever, found alone. Instead, two or more of these strands align themselves together to form a β-sheet. The β-sheet is a common structural element stabilized by maximum hydrogen bonding (Figure 2.6). The individual β-strands participating in β-sheet formation may all be present in the same polypeptide, or may be present in two polypeptides held in close juxtaposition. β-sheets are described as being parallel, antiparallel or mixed. A parallel sheet is formed when all the participating β-strands are running in the same direction (e.g. from the amino terminus to the carboxy terminus; Figure 2.6). An antiparallel sheet is formed when successive strands have alternating directions (N-terminus to C-terminus followed by C-terminus to N-terminus, etc.). A β-sheet containing both parallel and antiparallel strands is termed a mixed sheet.

In terms of secondary structure, most proteins consist of several segments of α-helix and/or β-strands separated from each other by various loop regions. These regions can vary in length and shape, and allow the overall polypeptide to fold into a compact tertiary structure. In addition to their obvious role in connecting stretches of regular secondary elements, loop regions themselves often contribute directly to the polypeptide’s biological function. The antigen-binding region of antibodies, for example, is largely constructed from six loop regions (Chapter 13). Such loops also often form the active site of enzymes (Chapter 12). One loop structure, termed a β-turn or β-bend, is a characteristic feature of many polypeptides (Figure 2.7).

Figure 2.7 (a) The β-bend or β-turn is often found between two stretches of antiparallel β-strands. (b) It is stabilized in part by hydrogen bonding between the C=O bond and the NH groups of the peptide bonds at the neck of the turn

2.3.2 Tertiary structure

As mentioned previously, a polypeptide’s tertiary structure refers to its exact three-dimensional structure, relating the relative positioning in space of all the polypeptide’s constituent atoms to each other. The tertiary structure of small polypeptides (approximately 200 amino acid residues or fewer) usually forms a single discrete structural unit. However, when the three-dimensional structure of many larger polypeptides is examined, the presence of two or more structural subunits within the polypeptide becomes apparent. These are termed domains. Domains, therefore, are (usually) tightly folded subregions of a single polypeptide, connected to each other by more flexible or extended regions. As well as being structurally distinct, domains often serve as independent units of function. Cell surface receptors, for example, usually contain one or more extracellular domains (some or all of which participate in ligand binding), a transmembrane domain (hydrophobic in nature and serving to stabilize the protein in the membrane) and one or more intracellular domains that play an effector function (e.g. generation of second messengers). Many therapeutic proteins also display several domains. Tissue plasminogen activator (tPA), for example (Chapter 12), consists of five such domains.

2.3.3 Higher structure determination

There are three potential methods by which a protein’s three-dimensional structure can be visualized: X-ray diffraction, NMR and electron microscopy. The last of these reveals structural information at low resolution, giving little or no atomic detail. It is used mainly to obtain the gross three-dimensional shape of very large (multi-polypeptide) proteins, or of protein aggregates such as the outer viral capsid. X-ray diffraction and NMR are the techniques most widely used to obtain high-resolution protein structural information, and details of both the principles and practice of these techniques may be sourced from selected references provided at the end of this chapter. The experimentally determined three-dimensional structures of some polypeptides are presented in Figure 2.8.

Figure 2.8 Three-dimensional structure of (a) human interleukin-4, as determined by NMR, and (b) human follicle-stimulating hormone, as determined by X-ray diffraction. Reproduced from the Protein Data Bank (www.rcsb.org/pdb, molecule ID numbers 1ITM and 1FL7 respectively)

2.4 Protein stability and folding

Upon biosynthesis, a polypeptide folds into its native conformation, which is structurally stable and functionally active. The conformation adopted ultimately depends upon the polypeptide’s amino acid sequence, explaining why different polypeptide types have different characteristic conformations. We have previously noted that stretches of secondary structure are stabilized by short-range interactions between adjacent amino acid residues. Tertiary structure, on the other hand, is stabilized by interactions between amino acid residues that may be far apart from each other in terms of amino acid sequence, but which are brought into close proximity by protein folding. The major stabilizing forces of a polypeptide’s overall conformation are:

hydrophobic interactions

electrostatic attractions

covalent linkages.

Hydrophobic interactions are the single most important stabilizing influence of protein native structure. The ‘hydrophobic effect’ refers to the tendency of non-polar substances to minimize contact with a polar solvent such as water. Non-polar amino acid residues constitute a significant proportion of the primary sequence of virtually all polypeptides. These polypeptides will fold in such a way as to maximize the number of such non-polar residue side chains buried in the polypeptide’s interior, i.e. away from the surrounding aqueous environment. This situation is most energetically favourable.
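As a rough illustration of how large the non-polar proportion of a sequence can be, the short Python sketch below counts non-polar residues in a one-letter-code sequence. The residue grouping used is one common textbook convention and is an assumption here (classifications differ, for example over glycine and proline), and the example sequence is hypothetical rather than that of any real protein.

```python
# Non-polar side-chain set in one-letter code. This grouping is an
# assumption (one common convention); other texts classify Gly and Pro
# differently.
NONPOLAR = set("AVLIMFWPG")


def nonpolar_fraction(sequence: str) -> float:
    """Return the fraction of residues in `sequence` that are non-polar."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must contain at least one residue")
    return sum(1 for residue in seq if residue in NONPOLAR) / len(seq)


if __name__ == "__main__":
    # Hypothetical sequence, used only to show the call.
    example = "MKVLAAGIVGLLLAQSE"
    print(f"Non-polar fraction: {nonpolar_fraction(example):.2f}")
```

The count says nothing about where those residues sit in the folded structure; it only indicates how many side chains are candidates for burial in the polypeptide's interior.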

Stabilizing electrostatic interactions include van der Waals forces (which are relatively weak), hydrogen bonds and ionic interactions. Although nowhere near as strong as covalent linkages (Table 2.5), the large number of such interactions existing within a polypeptide renders them collectively quite strong.

Although polypeptides display extensive networks of intramolecular hydrogen bonds, such bonds do not contribute very significantly to overall conformational stability. This is because atoms hydrogen bonding with each other in a folded polypeptide can form energetically equivalent hydrogen bonds with water molecules if the polypeptide is in the unfolded state. Ionic attractions between (oppositely) charged amino acid side chains also contribute modestly to overall protein conformational stability. Such linkages are termed salt bridges, and, as one would expect, they are located primarily on the polypeptide surface.

Table 2.5 Approximate bond energies associated with various (non-covalent) electrostatic interactions, compared with a carbon–carbon single bond

Bond type                Bond strength (kJ mol−1)
Van der Waals forces     10
Hydrogen bond            20
Ionic interactions       86
Carbon–carbon bond       350
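The arithmetic behind the statement that many weak interactions are collectively strong can be sketched in a few lines of Python, using only the approximate values in Table 2.5. The function names and the interaction counts in the example are hypothetical. Note also that the sum is a gross bond energy, not a net contribution to conformational stability: as described above for hydrogen bonds, equivalent interactions can form with water in the unfolded state.

```python
from typing import Mapping

# Approximate bond energies from Table 2.5, in kJ per mol.
BOND_ENERGY_KJ_PER_MOL = {
    "van der Waals": 10,
    "hydrogen bond": 20,
    "ionic": 86,
    "carbon-carbon": 350,
}


def interactions_to_match(bond: str, reference: str = "carbon-carbon") -> float:
    """How many `bond` interactions sum to the energy of one `reference` bond."""
    return BOND_ENERGY_KJ_PER_MOL[reference] / BOND_ENERGY_KJ_PER_MOL[bond]


def gross_energy(counts: Mapping[str, int]) -> int:
    """Sum of bond energies (kJ per mol) for the given interaction counts."""
    return sum(BOND_ENERGY_KJ_PER_MOL[bond] * n for bond, n in counts.items())


if __name__ == "__main__":
    for bond in ("van der Waals", "hydrogen bond", "ionic"):
        print(bond, interactions_to_match(bond))
    # Hypothetical counts, chosen only to illustrate the summation.
    print(gross_energy({"hydrogen bond": 100, "van der Waals": 200}))
```

On these figures, between 17 and 18 hydrogen bonds, or 35 van der Waals contacts, carry roughly the same bond energy as a single carbon–carbon bond.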

Disulfide bonds represent the major covalent bond type that can help stabilize a polypeptide’s native three-dimensional structure. Intracellular proteins, although generally harbouring multiple cysteine residues, rarely form disulfide linkages, due to the reducing environment that prevails within the cell. Extracellular proteins, in contrast, are usually exposed to a more oxidizing environment, conducive to disulfide bond formation. In many cases the reduction (i.e. breaking) of disulfide linkages has little effect upon a polypeptide’s native conformation. However, in other cases (particularly disulfide-rich proteins), disruption of these covalent linkages does render the protein less conformationally stable. In these cases the disulfide linkages likely serve to ‘lock’ functionally or structurally important elements of domain/tertiary structure in place.

The description of protein structure as presented thus far may lead to the conclusion that proteins are static, rigid structures. This is not the case. A protein’s constituent atoms are constantly in motion, and groups ranging from individual amino acid side chains to entire domains can be displaced via random motion by anything up to approximately 0.2 nm. A protein’s conformation, therefore, displays a limited degree of flexibility, and such movement is termed ‘breathing’.

Breathing can sometimes be functionally significant by, for example, allowing small molecules to diffuse in or out of the protein’s interior. In addition to breathing, some proteins may undergo more marked (usually reversible) conformational changes. Such changes are usually functionally significant. Most often they are induced by biospecific ligand interactions (e.g. binding of a substrate to an enzyme or antigen binding to an antibody).

2.4.1 Structural prediction

Currently, there exists an enormous and growing deficit between the number of polypeptides whose amino acid sequence has been determined and the number of polypeptides whose three-dimensional structure has been resolved. Given the complexities of resolving three-dimensional structure experimentally, it is not surprising that scientists are continually attempting to develop methods by which they could predict higher order structure from amino acid sequence data. Although modestly successful secondary structure predictive approaches have been developed, no method by which tertiary structure may be predicted from primary sequence data has thus far been developed.

Over 20 different methods of secondary structure prediction have been reported (Table 2.6). The approaches taken fall into two main categories:

Table 2.6 Some secondary structure predictive methods currently used. Refer to text for further details

Method                                   Basis of prediction
Chou and Fasman                          Empirical statistical method
Garnier, Osguthorpe and Robson (GOR)     Empirical statistical method
EMBL profile neural network (PHD)        Empirical statistical method
Protein sequence analysis (PSA)          Empirical statistical method
Lim                                      Physicochemical criteria

1. Empirical statistical methods, which are based upon data generated from studying proteins of known three-dimensional structure and correlation of such proteins’ primary amino acid sequences with structural features.

2. Methods based upon physicochemical criteria, such as fold compactness (i.e. the generation of a folded form displaying a tightly packed hydrophobic core and a polar surface).

Most such predictive methods are at best 50–70 per cent accurate. The relatively large inaccuracy stems from the fact that the folded (tertiary) structure imposes constraints upon the nature/extent of secondary structure within some regions of the polypeptide chain. Any generalized ‘rules’ relating secondary structure to amino acid sequence data, by nature, will not take such issues into consideration.
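The general idea behind an empirical statistical method, and one way of scoring its accuracy, can be sketched in Python as follows. This is a minimal illustration and not an implementation of any method listed in Table 2.6. The training sequences and their per-residue states (H for helix, E for strand, C for other) are invented for the example, and the accuracy measure assumed here is the simple fraction of residues whose predicted state matches the observed one.

```python
from collections import Counter

# Hypothetical training data: (sequence, per-residue state) pairs.
# H = helix, E = strand, C = other. Not taken from real structures.
TRAINING = [
    ("AEELLKKA", "HHHHHHHC"),
    ("VIVTYGSP", "EEEEECCC"),
    ("GSPAELLK", "CCCHHHHH"),
]


def propensities(training, state):
    """Propensity of each residue type for `state`:
    (its frequency among residues in `state`) / (its overall frequency)."""
    overall, in_state = Counter(), Counter()
    for seq, states in training:
        if len(seq) != len(states):
            raise ValueError("sequence and state strings differ in length")
        for residue, s in zip(seq, states):
            overall[residue] += 1
            if s == state:
                in_state[residue] += 1
    n_all = sum(overall.values())
    n_state = sum(in_state.values())
    if n_state == 0:
        return {}
    return {
        residue: (in_state[residue] / n_state) / (overall[residue] / n_all)
        for residue in overall
    }


def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Fraction of positions at which predicted and observed states agree."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("state strings must be non-empty and of equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


if __name__ == "__main__":
    helix = propensities(TRAINING, "H")
    for residue in sorted(helix, key=helix.get, reverse=True):
        print(residue, round(helix[residue], 2))
    print(per_residue_accuracy("HHHECC", "HHEECC"))
```

A propensity above 1 means the residue type occurs in that state more often than its overall abundance would suggest. Because each propensity is derived from local sequence statistics alone, the sketch shares the limitation described above: it cannot account for constraints imposed by the folded tertiary structure.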