66,99 €
The increasing integration between gene manipulation and genomics is embraced in this new book, Principles of Gene Manipulation and Genomics, which brings together for the first time the subjects covered by the best-selling books Principles of Gene Manipulation and Principles of Genome Analysis & Genomics. * Comprehensively revised, updated and rewritten to encompass within one volume, basic and advanced gene manipulation techniques, genome analysis, genomics, transcriptomics, proteomics and metabolomics * Includes two new chapters on the applications of genomics * An accompanying website - www.blackwellpublishing.com/primrose - provides instructional materials for both student and lecturer use, including multiple choice questions, related websites, and all the artwork in a downloadable format. * An essential reference for upper level undergraduate and graduate students of genetics, genomics, molecular biology and recombinant DNA technology.
Sie lesen das E-Book in den Legimi-Apps auf:
Seitenzahl: 1747
Preface
Abbreviations
Chapter 1 Gene manipulation in the post-genomics era
Introduction
Gene manipulation involves the creation and cloning of recombinant DNA
The genomics era began in earnest in 1995 with the complete sequencing of a bacterial genome
Outline of the rest of the book
Part I Fundamental Techniques of Gene Manipulation
Chapter 2 Basic techniques
Introduction
Three technical problems had to be solved before in vitro gene manipulation was possible on a routine basis
A number of basic techniques are common to most gene-cloning experiments
Gel electrophoresis is used to separate different nucleic acid molecules on the basis of their size
Blotting is used to transfer nucleic acids from gels to membranes for further analysis
Southern blotting is the method used to transfer DNA from agarose gels to membranes so that the compositional properties of the DNA can be analyzed
Northern blotting is a variant of Southern blotting that is used for RNA analysis
Western blotting is used to transfer proteins from acrylamide gels to membranes
A number of techniques have been devised to speed up and simplify the blotting process
The ability to transform E. coli with DNA is an essential prerequisite for most experiments on gene manipulation
Electroporation is a means of introducing DNA into cells without making them competent for transformation
The ability to transform organisms other than E. coli with recombinant DNA enables genes to be studied in different host backgrounds
The polymerase chain reaction (PCR) has revolutionized the way that biologists manipulate and analyze DNA
The principle of the PCR is exceedingly simple
RT-PCR enables the sequences on a mRNA molecule to be amplified as DNA
The basic PCR is not efficient at amplifying long DNA fragments
The success of a PCR experiment is very dependent on the choice of experimental variables
By using special instrumentation it is possible to make the PCR quantitative
There are a number of different ways of generating fluorescence in quantitative PCR reactions
It is now possible to amplify whole genomes as well as gene segments
Useful website
Chapter 3 Cutting and joining DNA molecules
Cutting DNA molecules
Joining DNA molecules
Chapter 4 Basic biology of plasmid and phage vectors
Plasmid biology and simple plasmid vectors
Bacteriophage λ
DNA cloning with single-stranded DNA vectors
Chapter 5 Cosmids, phasmids, and other advanced vectors
Introduction
Vectors for cloning large fragments of DNA
Specialist-purpose vectors
Chapter 6 Gene-cloning strategies, 96 Introduction
Introduction
Genomic DNA libraries are generated by fragmenting the genome and cloning overlapping fragments in vectors
The PCR can be used as an alternative to genomic DNA cloning
Complementary DNA (cDNA) libraries are generated by the reverse transcription of mRNA
The PCR can be used as an alternative to cDNA cloning
Many different strategies are available for library screening
Difference cloning exploits differences in the abundance of particular DNA fragments
Chapter 7 Sequencing genes and short stretches of DNA
Chapter 8 Changing genes: site-directed mutagenesis and protein engineering
Protein engineering
Chapter 9 Bioinformatics
Introduction
Databases are required to store and cross-reference large biological datasets
Sequence analysis of genomic DNA involves the de novo identification of genes and other features
Caution must be exercised when using purely in silico methods to annotate genomes
Sequencing also provides new data for molecular phylogenetics
Part II Manipulating DNA in Microbes, Plants, and Animals
Chapter 10 Cloning in bacteria other than Escherichia coli
Cloning in Gram-negative bacteria other than E. coli
Cloning in Gram-positive bacteria
Cloning in Archaea
Chapter 11 Cloning in Saccharomyces cerevisiae and other fungi
Cloning and manipulating large fragments of DNA
Chapter 12 Gene transfer to animal cells, 218 Introduction
Introduction
There are four major strategies for gene transfer to animal cells
There are several chemical transfection techniques for animal cells but all are based on similar principles
Physical transfection techniques have diverse mechanisms
Cells can be transfected with either replicating or non-replicating DNA
Three types of selectable marker have been developed for animal cells
Plasmid vectors for the transfection of animal cells contain modules from bacterial and animal genes
DNA can be delivered to animal cells using bacterial vectors
Chapter 13 Genetic manipulation of animals
Introduction
Three major methods have been developed for the production of transgenic mice
ES cells can be used for gene targeting in mice
Applications of genetically modified mice
Applications of gene targeting
Standard transgenesis methods are more difficult to apply in other mammals and birds
Nuclear transfer technology can be used to clone animals
Gene transfer to Xenopus can result in transient expression or germline transformation
Gene transfer to fish is generally carried out by microinjection, but other methods are emerging
Gene transfer to fruit flies involves the microinjection of DNA into the pole plasma
Chapter 14 Gene transfer to plants
Introduction
Plant tissue culture is required for most transformation procedures
There are four major strategies for gene transfer to plant cells
Agrobacterium-mediated transformation
Direct DNA transfer to plants
Gene targeting in plants
Inplanta transformation minimizes or eliminates the tissue culture steps usually needed for the generation of transgenic plants
Plant viruses can be used as episomal expression vectors
Chapter 15 Advanced transgenic technology
Introduction
Inducible expression systems allow transgene expression to be controlled by physical stimuli or the application of small chemical modulators
Recombinant inducible systems are built from components that are not found in the host animal or plant
Site-specific recombination allows precise manipulation of the genome in organisms where gene targeting is inefficient
Many strategies for gene inactivation do not require the direct modification of the target gene
Gene inhibition is also possible at the protein level
Part III Genome Analysis, Genomics, and Beyond
Chapter 16 The organization and structure of genomes
The organization of nuclear DNA in eukaryotes
Chapter 17 Mapping and sequencing genomes
Sequencing genomes
Chapter 18 Comparative genomics
Comparative genomics of bacteria
Comparative genomics of organelles
Comparative genomics of eukaryotes
Chapter 19 Large-scale mutagenesis and interference
Introduction
Genome-wide gene targeting is the systematic approach to large-scale mutagenesis
Genome-wide random mutagenesis is a strategy applicable to all organisms
Insertional mutagenesis in invertebrates
Libraries of knock-down phenocopies can be created by RNA interference
Chapter 20 Analysis of the transcriptome
Introduction
The transcriptome is the collection of all messenger RNAs in the cell
Steady-state mRNA levels can be quantified directly by sequence sampling
DNA microarray technology allows the parallel analysis of thousands of genes on a convenient miniature device
As transcriptomics technology matures, standardization of data processing and presentation become important challenges
Expression profiling with DNA arrays has permeated almost every area of biology
Chapter 21 Proteomics I - Expression analysis and characterization of proteins
Introduction
Protein expression analysis is more challenging than mRNA profiling because proteins cannot be amplified like nucleic acids
There are two major technologies for protein separation in proteomics
Mass spectrometry is used for protein characterization
Protein microarrays can also be used for expression analysis
Chapter 22 Proteomics II - Analysis of protein structures
Introduction
Structural proteomics has required developments in structural analysis techniques and bioinformatics
International structural proteomics initiatives have been established to solve protein structures on a large scale
Chapter 23 Proteomics III - Protein interactions, 453 Introduction
Introduction
Protein interactions can be inferred by a variety of genetic approaches
New methods based on comparative genomics can also infer protein interactions
Traditional biochemical methods for protein interaction analysis cannot be applied on a large scale
Library-based screening methods allow the large-scale analysis of binary interactions
Systematic analysis of protein complexes can be achieved by affinity purification and mass spectrometry
Interaction screening produces large data sets which require extensive bioinformatic support
Chapter 24 Metabolomics and global biochemical networks
Introduction
Part IV Applications of Gene Manipulation and Genomics
Chapter 25 Applications of genomics: understanding the basis of polygenic disorders and identifying quantitative trait loci
Investigating discrete traits in outbreeding populations (genetic diseases of humans)
Investigating quantitative trait loci (QTLs) in inbred populations
Understanding responses to drugs (pharmacogenomics)
Chapter 26 Applications of recombinant DNA technology
Introduction
Theme 1: Producing useful molecules
Theme 2: Improving agronomic traits by genetic modification
Theme 3: Using genetic modification to study, prevent, and cure disease
References
Appendix: the genetic code and single-letter amino acid designations
Index
© 2006 Blackwell Publishing
BLACKWELL PUBLISHING
350 Main Street, Malden, MA 02148-5020, USA
9600 Garsington Road, Oxford OX4 2DQ, UK
550 Swanston Street, Carlton, Victoria 3053, Australia
The rights of Sandy Primrose and Richard Twyman to be identified as the Authors of this Work have been asserted in accordance with the UK Copyright, Designs, and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs, and Patents Act 1988, without the prior permission of the publisher.
This material was originally published in two separate volumes: Principles of Gene Manipulation, 6th edition (2001) and Principles of Genetic Analysis and Genomics, 3rd edition (2003).
First published 1980Second edition published 1981Third edition published 1985Fourth edition published 1989Fifth edition published 1994Sixth edition published 2001Seventh edition published 2006
1 2006
Library of Congress Cataloging-in-Publication Data
Primrose, S.B.
Principles of gene manipulation and genomics / S.B. Primrose and R.M. Twyman.—7th ed.
p.;cm.
Rev. ed. of: Principles of gene manipulation. 6th ed. 2001 and: Principles of genome analysis and genomics / Sandy B. Primrose, Richard M. Twyman. 3rd ed. 2003.
Includes bibliographical references and index.
ISBN 1-4051-3544-1 (pbk.: alk. paper) 1. Genetic engineering. 2. Genomics. 3. Gene mapping. 4. Nucleotide sequence.
[DNLM: 1. Genetic Engineering. 2. Base Sequence. 3. Chromosome Mapping. 4. DNA, Recombinant. 5. Genomics. QH 442 P952pa 2006] I. Twyman, Richard M. II. Primrose, S.B. Principles of gene manipulation. III. Primrose, S. B. Principles of genome analysis and genomics. IV. Title.
QH442.O42 2006
660.6’5—dc22
2005018202
A catalogue record for this title is available from the British Library.
For further information onBlackwell Publishing, visit our website:www.blackwellpublishing.com
Preface
The first edition of Principles of Gene Manipulation was published over 25 years ago when the recombinant DNA era was in its infancy and the idea of sequencing the entire human genome was inconceivable. In writing the first edition, the aim was to explain a new and rapidly growing technology. The basic philosophy was to present the principles of gene manipulation, and its associated techniques, in sufficient detail to enable the non-specialist reader to understand them. However, as the techniques became more sophisticated and advanced, so the book grew in size and complexity. Eventually, recombinant DNA technology advanced to the stage where the sequencing and analysis of entire genomes became possible. This gave rise to a whole new biological discipline, known as genomics, with its own principles and associated techniques. From this emerged the first edition of another book, Principles of Genome Analysis, whose title changed to Principles of Genome Analysis andGenomics in its third edition to reflect the rapid growth of post-sequencing technologies aiming at the large-scale analysis of gene function. It is now five years since the draft human genome sequence was published and we are reaching the stage where the technologies of gene manipulation and genomics are becoming increasingly integrated. Genome mapping and sequencing technologies borrow extensively from the early recombinant DNA technologies of library construction, cloning, and amplification using the polymerase chain reaction; gene transfer to microbes, animals, and plants is now widely used for the functional analysis of genomes; and the applications of genomics and recombinant DNA are becoming difficult to separate.
This new edition, entitled Principles of Gene Manipulation and Genomics, therefore unites the themes covered formerly by the two separate books and provides for the first time a fully integrated approach to the principles and practice of gene manipulation in the context of the genomics era. As in previous editions of the two books, we have written the text at an advanced undergraduate level, assuming a basic knowledge of molecular biology and genetics but no knowledge of recombinant DNA technology or genomics. However, we are aware that the book is favored not only by newcomers to the field but also by experts, and we have tried to remain faithful to both audiences with our coverage. As before we have not changed the level at which the book is written nor the general style, but we have divided the book into sections to enable the book to be used in different ways by different readers.
The basic methodologies are presented in the first part of the book, which is devoted to cloning in Escherichia coli, while more advanced gene-transfer techniques (applying to other microbes and to animals and plants) are presented in the second part. The reader who has read and understood the material in the first part, or already knows it, should have no difficulty in understanding any of the material in the second part of the book. The third part moves from the basic gene-manipulation technologies to genomics, transcriptomics, proteomics, and metabolomics, the major branches of the high-throughput, large-scale biology that has become synonymous with the new millennium. Finally, the fourth part of the book contains two chapters that discuss how recombinant DNA technology and genomics are being applied in the fields of medicine, agriculture, diagnostics, forensics, and biotechnology.
In writing the first part of the book, we thought carefully about the inclusion of early “historical” information. Although older readers may feel that some of this material is dated, we elected to leave much of it in place because it has an important bearing on today’s methods and an understanding of it is incorrectly assumed in many of today’s publications. We have included such information where it illustrates how modern techniques and procedures have evolved, but we have tried not to catalog outmoded or redundant methods that are no longer used. This is particularly the case in the genomics section where new technologies seem to come and go every day, and few stand the test of time or become truly indispensable. We have aimed to avoid as much jargon as possible, and to explain it clearly where it is absolutely necessary. As is common in all areas of science, the principles of gene manipulation and genomics abound with acronyms and synonyms which are often confusing particularly now molecular biology is becoming increasingly commercial in both basic research and its applications. Where appropriate, we have provided lists of definitions as boxes set aside from the text. Boxes are also used to illustrate key experiments or principles, historical information, and applications. While the text is fully referenced throughout, we have also provided a list of classic papers and reviews at the end of each chapter to ease the wary reader into the scientific literature.
This book would not have been possible without the help and advice of many colleagues. Particular thanks are due to Sue Goddard and her library staff at HPA Porton for assistance with many literature searches. Sandy Primrose would like to dedicate this book to his wife Jill and Richard Twyman would like to dedicate this book to his parents, Irene and Peter, to his children Emily and Lucy, and to Liz for her endless support and encouragement.
Abbreviations
2DE
two-dimensional gel electrophoresis
Ac
Activator
ADME
adsorption, distribution, metabolism and excretion
AFBAC
affected family-based control
AFLP
amplified fragment length polymorphism
ALL
acute lymphoblastic leukemia
AML
acute myeloid leukemia
AMV
avian myeloblastosis virus
APL
acute promyelocytic leukemia
ARS
autonomously replicating sequence
ATRA
all-trans
-retinoic acid
BAC
bacterial artificial chromosome
BCG
Bacille Calmette–Guérin
bFGF
basic fibroblast growth factor
BIND
Biomolecular Interaction Network Database
BLAST
Basic Local Alignment Search Tool
BLOSUM
Blocks Substitution Matrix
BMP
bone morphogenetic protein
bp
base pair
BRET
bioluminescence resonance energy transfer
CAPS
cleavable amplified polymorphic sequences
CASP
Critical Assessment of Structural Prediction
CATH
Class, Architecture, Topology and Homologous superfamily (database)
ccc DNA
covalently closed circular DNA
CCD
charge couple device
CD
circular dichroism
cDNA
complementary DNA
CEPH
Centre d’ Etude du Polymorphisme Humain
cfu
commonly forming unit
CHEF
contour-clamped homogeneous electrical field
CID
chemically induced dimerization Also: collision-induced dissociation
cM
centimorgan
COG
cluster of orthologous groups
cR
centiRay
cRNA
complementary RNA
CSSL
chromosome segment substitution line
ct
chloroplast
DALPC
direct analysis of large protein complexes
DAS
distributed annotation system
DAS
downstream activation site
DBM
diazobenzyloxymethyl
DDBJ
DNA Databank of Japan
DIP
Database of Interacting Proteins
DMD
Duchenne muscular dystrophy
DNA
deoxyribonucleic acid
dNTP
deoxynucleoside triphosphate
Ds
Dissociation
dsDNA
double-stranded DNA
dsRNA
double-stranded RNA
EGF
epidermal growth factor
ELISA
enzyme-linked immunosorbent sandwich assay
EMBL
European Molecular Biology Laboratory
ENU
ethylnitrosourea
EOP
efficiency of plating
ES
embryonic stem (cells)
ESI
electrospray ionization
EST
expressed sequence tag
EUROFAN
European Functional Analysis Network (consortium)
FACS
fluorescence-activated cell sorting
FEN
flap endonuclease
FIAU
Fialuridine (1–2′-deoxy-2′-fluoro-β-d-arabinofuranosyl-5-iodouracil
FIGE
field-inversion gel electrophoresis
FISH
fluorescence
in
situ hybridization
FPC
fingerprinted contigs
FRET
fluorescence resonance energy transfer
FSSP
Fold classification based on Structure–Structure alignment of Proteins (database)
GASP
Genome Annotation aSsessment Project
G-CSF
granulocyte colony stimulating factor
GeneEMAC
gene external marker-based automatic congruencing
GGTC
German Gene Trap Consortium
GST
gene trap sequence tag
GST
glutathione-
S
-transferase
HAT
hypoxanthine, aminopterin and thymidine
HDL
high-density lipoprotein
HERV
human endogenous retrovirus
HGP
Human Genome Project
HLA
human leukocyte antigen
HPRT
hypoxanthine phosphoribosyltransferase
HTF
Hpa
II tiny fragment
htSNP
haplotype tag single nucleotide polymorphism
ibd
identical by descent
ICAT
isotope-coded affinity tag
IDA
interaction defective allele
IEF
isoelectric focusing
Ihh
Indian hedgehog
IPTG
isopropylthio-β-d-galactopyranoside
IST
interaction sequence tag
ITCHY
incremental truncation for the creation of hybrid enzymes
IVET
in vivo
expression technology
kb
kilobase
LCR
low complexity region
LD
linkage disequilibrium
LINE
long interspersed nuclear element
LOD
logarithm
10
of odds
LTR
long terminal repeat
m: z
mass : charge ratio
MAD
multiwavelength anomalous diffraction
MAGE
microarray and gene expression
MAGE-ML
microarray and gene expression mark-up language
MAGE-OM
microarray and gene expression object model
MALDI
matrix assisted laser desorption ionization
MAR
matrix attachment region
Mb
megabase
MCAT
mass coded abundance tag
MCS
multiple cloning site
MDA
multiple displacement amplification
MGED
Microarray Gene Expression Database
MHC
major histocompatibility complex
MIAME
minimum information about a microarray experiment
MIP
molecularly imprinted polymer
MIPS
Munich Information Center for Protein Sequences
MM
‘mismatch’ oligonucleotide
MMTV
mouse mammary tumor virus
MPSS
massively parallel signature sequencing
mRNA
messenger RNA
MS
mass spectrometry
MS/MS
tandem mass spectroscopy
mt
mitochondrial
MTM
Maize Targeted Mutagenesis project
Mu
Mutator
MudPIT
multidimensional protein identification technology
MuLV
Moloney murine leukemia virus
NCBI
National Center for Biotechnology Information
NDB
Nucleic Acid Databank
NGF
nerve growth factor
NIGMS
National Institute of General Medical Sciences
NIL
near isogenic line
NMR
nuclear magnetic resonance
NOE
nuclear Overhauser effect
NOESY
NOE spectroscopy
nt
nucleotide
oc DNA
open circular DNA
OFAGE
orthogonal-field-alternation gel electrophoresis
OMIM
on-line Mendelian inheritance in man
ORF
open-reading frame
ORFan
orphan open-reading frame
P/A
presence/absence polymorphism
PAC
P1-derived artificial chromosome
PAGE
polyacrylaminde gel electrophoresis
PAI
pathogenicity island
PAM
percentage of accepted point mutations
PCR
polymerase chain reaction
PDB
Protein Databank (database)
Pfam
Protein families database of alignments
PFGE
pulsed field gel electrophoresis
PM
‘perfect match’ oligonucleotide
poly(A)
+
polyadenylated
PQL
protein quantity loci
PRINS
primed
in situ
PS
position shift polymorphism
PSI-BLAST
Position-Specific Iterated BLAST (software)
PTGS
post-transcriptional gene silencing
PVDF
polyvinylidine difluoride
QTL
quantitative trait loci
RACE
rapid amplification of cDNA ends
RAGE
recombinase-activated gene expression
RAPD
randomly amplified polymorphic DNA
RARE
RecA-assisted restriction endonuclease
RC
recombinant congenic (strains)
RCA
rolling circle amplification
RCSB
Research Collaboratory for Structural Bioinformatics
rDNA/RNA
ribosomal DNA/RNA
REMI
restriction enzyme-mediated integration
RFLP
restriction fragment length polymorphism
RIL
recombinant inbred line
R-M
restriction-modification
RNA
ribonucleic acid
RNAi
RNA interference
RNase
ribonuclease
RPMLC
reverse phase microcapillary liquid chromatography
RRS
Ras recruitment system
RT-PCR
reverse transcriptase polymerase chain reaction
RTX
repeats in toxins
SAGE
serial analysis of gene expression
SCOP
Structural Classification of Proteins (database)
SCOPE
structure-based combinatorial protein engineering
SDS
sodium dodecyl sulfate
SELDI
surface-enhanced laser desorption and ionization
SGA
synthetic genetic array
SGDP
Saccharomyces
Gene Deletion Project
Shh
sonic hedgehog
SILAC
stable-isotope labeling with amino acids in cell culture
SINE
short interspersed nuclear element
SINS
sequenced insertion sites
SISDC
sequence-independent site-directed chimeragenesis
SNP
single nucleotide polymorphism
SPIN
Surface Properties of protein–protein Interfaces (database)
Spm
Suppressor–mutator
SPR
surface plasmon resonance
SRCD
synchrotron radiation circular dichroism
SRS
sequence retrieval system
SRS
SOS recruitment system
SSLP
simple sequence length polymorphism
SSR
simple sequence repeat
STC
sequence-tagged connector
STM
signature-tagged mutagenesis
STS
sequence-tagged site
TAC
transformation-competent artificial chromosome
TAFE
transversely alternating-field electrophoresis
TAP
tandem affinity purification
TAR
transformation-associated recombination
T-DNA
Agrobacterium
transfer DNA
TIGR
The Institute for Genomic Research
TIM
triose phosphate isomerase
TOF
time of flight
tRNA
transfer RNA
TUSC
Trait Utility System for Corn
UAS
upstream activation site
UPA
universal protein array
URS
upstream repression site
USPS
ubiquitin-based split protein sensor
UTR
untranslated region
VDA
variant detector array
VIGS
virus-induced gene silencing
WGA
whole-genome amplification
Y2H
yeast two-hybrid
YAC
yeast artificial chromosome
YCp
yeast centromere plasmid
YEp
yeast episomal plasmid
YIp
yeast integrating plasmid
YRp
yeast replicating plasmid
Since the beginning of the last century, scientists have been interested in genes. First, they wanted to find out what genes were made of, how they worked, and how they were transmitted from generation to generation with the seemingly mythic ability to control both heredity and variation. Genes were initially thought of in functional terms as hereditary units responsible for the appearance of particular biological characteristics, such as eye or hair color in human beings, but their physical properties were unclear. It was not until the 1940s that genes were shown to be made of DNA, and that a workable physical and functional definition of the gene – a length of DNA encoding a particular protein – was achieved (Box 1.1). Next, scientists wanted to find ways to study the structure, behavior, and activity of genes in more detail. This required the simultaneous development of novel techniques for DNA analysis and manipulation. These developments began in the early 1970s with the first experiments involving the creation and manipulation of recombinant DNA. Thus began the recombinant DNA revolution.
The definition of recombinant DNA is any artificially created DNA molecule which brings together DNA sequences that are not usually found together in nature. Gene manipulation refers to any of a variety of sophisticated techniques for the creation of recombinant DNA and, in many cases, its subsequent introduction into living cells. In the developed world there is a precise legal definition of gene manipulation as a result of government legislation to control it. In the UK, for example, gene manipulation is defined as: “… the formation of new combinations of heritable material by the insertion of nucleic acid molecules,produced by whatever means outside the cell, into any virus, bacterial plasmid or other vector system so as to allow their incorporation into a host organism in which they do not naturally occur but in which they are capable of continued propagation.” The propagation of recombinant DNA inside a particular host cell so that many copies of the same sequence are produced is known as cloning.
Cloning was a significant breakthrough in molecular biology because it became possible to obtain homogeneous preparations of any desired DNA molecule in amounts suitable for laboratory-scale experiments. A single organism, the bacterium Escherichia coli, played the dominant role in the early years of the recombinant DNA era. This bacterium had always been a popular model system for molecular geneticists and, prior to the development of recombinant DNA technology, there were already a large number of well-characterized mutants, gene regulation was understood, and many plasmids had been isolated. It is not surprising that the first cloning experiments were undertaken in E. coli and that this organism became the primary cloning host. Subsequently, cloning techniques were extended to a range of other microorganisms, such as Bacillus subtilis, Pseudomonas spp., yeasts, and filamentous fungi, and then to higher eukaryotes. Despite these advances, E. coli remains the most widely used cloning host even today because gene manipulation in this bacterium is technically easier than in any other organism. As a result, it is unusual for researchers to clone DNA directly in other organisms. Rather, DNA from the organism of choice is first manipulated in E. coli and subsequently transferred back to the original host or another organism, as appropriate. Without the ability to clone and manipulate DNA in E. coli, the application of recombinant DNA technology to other organisms would be greatly hindered.
Until the mid-1980s, all cloning was cell-based (i.e. the DNA molecule of interest had to be introduced into E. coli or another host for amplification).
In 1983, there was a further mini-revolution in molecular biology with the invention of the polymerase chain reaction (PCR). This technique allowed DNA sequences to be amplified in vitro using pure enzymes. The great sensitivity and robustness of the PCR allows DNA to be prepared rapidly from very small amounts of starting material and material of very poor quality, but it is not as accurate as cell-based cloning and only works on relatively short DNA sequences. Therefore cell-based cloning and the PCR have complementary but overlapping uses in gene manipulation.
Although the initial cloning experiments generated a great deal of excitement, it is unlikely that any of the early workers in this field could have predicted the immense impact recombinant DNA technology would have on the progress of scientific understanding and indeed on society as a whole, particularly in the fields of medicine and agriculture. Today, gene manipulation underlies a multi-billion dollar industry, employing hundreds of thousands of people worldwide and offering solutions to some of mankind’s most intractable problems. The ability to insert new combinations of genetic material into microbes, animals, and plants offers novel ways to produce valuable small molecules and proteins; provides the means to produce plants and animals that are disease-resistant, tolerant of harsh environments, and have higher yields of useful products; and provides new methods to treat and prevent human disease.
Fig. 1.1 The impact of gene manipulation on the practice of medicine.
The developments in gene manipulation that have taken place in the last 30 years have revolutionized medicine by increasing our understanding of the basis of disease, providing new tools for disease diagnosis, and opening the way to the discovery or development of new drugs, treatments, and vaccines.
The first medical benefit to arise from recombinant DNA technology was the availability of significant quantities of therapeutic proteins, such as human growth hormone (HGH), which is used to treat growth defects. Originally HGH was purified from pituitary glands removed from cadavers. However, many pituitary glands are required to produce enough HGH to treat just one child. Furthermore, some children treated with pituitary-derived HGH have developed Creutzfeld–Jakob syndrome originating from cadavers. Following the cloning and expression of the HGH gene in E. coli, it became possible to produce enough HGH in a 10-liter fermenter to treat hundreds of children. Since then, many different therapeutic proteins have become available for the first time. Many of these proteins are also manufactured in E. coli but others are made in yeast or animal cells and some in plants or the milk of genetically modified animals. The only common factor is that the relevant gene has been cloned and overexpressed using the techniques of gene manipulation.
Medicine has benefited from recombinant DNA technology in other ways (Fig. 1.1). For example, novel routes to vaccines have been developed: the current hepatitis B vaccine is produced by the expression of a viral antigen on the surface of yeast cells, and a recombinant vaccine has been used to eliminate rabies from foxes in a large part of Europe. Gene manipulation can also be used to increase the levels of small molecules within microbial or plant cells. This can be done by cloning all the genes for a particular biosynthetic pathway and overexpressing them. Alternatively, it is possible to shut down particular metabolic pathways and thus redirect intermediates towards the desired end product. This approach has been used to facilitate production of chiral intermediates, antibiotics, and novel therapeutic entities. New antibiotics can also be created by mixing and matching genes from organisms producing different but related molecules in a technique known as combinatorial biosynthesis.
Gene cloning enables nucleic acid probes to be produced readily, and such probes have many uses in medicine. For example, they can be used to determine or confirm the identity of a microbial pathogen or to carry out pre- or peri-natal diagnosis of an inherited genetic disease. Increasingly, probes are being used to determine the likelihood of adverse reactions to drugs or to select the best class of drug to treat a particular illness in different groups of patients. Nucleic acids are also being used as therapeutic entities in their own right. For example, antisense nucleic acids are being used to downregulate gene expression in certain diseases, and the relatively new phenomenon of RNA interference is poised to become a breakthrough technology for the development of new therapeutic approaches. In other cases, nucleic acids are being administered to correct or repair inherited gene defects (gene therapy, gene repair) or as vaccines. In the reverse of gene repair, animals are being generated that have mutations identical to those found in human disease. These are being used as models to learn more about disease pathology and to test novel therapies.
As well as techniques for DNA cloning and transfer to new host cells, the recombinant DNA revolution spawned new technologies for gene mapping (ordering genes on chromosomes) and DNA sequencing (determining the order of bases, identified by the letters A, C, G, and T, along the DNA molecule). Within the gene itself, the order of bases determines the protein encoded by the gene by specifying the order of amino acids. Thus, DNA sequencing made it possible to work out the amino acid sequence of the encoded protein without the direct analysis of the protein itself. This was extremely useful because, at the time DNA sequencing was first developed, only the most abundant proteins in the cell could be purified in sufficient quantities to facilitate direct analysis. Further elements surrounding the coding region of the gene were identified as control regions, specifying each gene’s expression profile. As more sequence data accumulated, it became possible to identify common features in related genes, both in the coding region and the regulatory regions. This type of sequence analysis was greatly facilitated by the foundation of sequence databases, and the development of computer-aided techniques for sequence analysis and comparison, a field now known as bioinformatics. Today, DNA molecules can be scanned quickly for a whole series of structural features, e.g. restriction enzyme recognition sites, matches or overlaps with other sequences, start and stop signals for transcription and translation, and sequence repeats, using programs available on the Internet.
The original goal of sequencing was to determine the precise order of nucleotides in a gene, but soon the goal became the sequence of a small genome. A genome is the complete content of genetic information in an organism, i.e. all the genes and other sequences it contains. The first target was the genome of a small virus called ϕX174, then larger plasmid and viral genomes, then chromosomes and microbial genomes until ultimately the complete genomes of higher eukaryotes were sequenced (Table 1.1). In the mid-1980s, scientists began to discuss seriously how the entire human genome might be sequenced. To put these discussions in context, the largest stretch of DNA that can be sequenced in a single pass (even today) is 600–800 nucleotides and the largest genome that had been sequenced in 1985 was that of the 172-kb Epstein–Barr virus (Baer et al. 1984). By comparison, the human genome is 3000 Mb in size, over 17,000 times bigger! One school of thought was that a completely new sequencing methodology would be required, and a number of different technologies were explored but with little success. Early on, however, it was realized that existing sequencing technology could be used if a large genome could be broken down into more manageable pieces for sequencing in a highly parallel fashion, and then the pieces could be joined together again. A strategy was agreed upon in which a map of the human genome would be used as a scaffold to assemble the sequence.
Table 1.1 Timeline of genome sequencing, showing the increasing genome sizes that have been achieved.
The problem here was that in 1985 there were not enough markers, or points of reference, on the human genome map to produce a physical scaffold on which to assemble the complete sequence. Genetic maps are based on recombination frequencies, and in model organisms they are constructed by carrying out large-scale crosses between different mutant strains. The principle of a genetic map is that the further apart two loci are on a chromosome, the more likely that a crossover will occur between them during meiosis. Recombination events resulting from crossovers can be scored in genetically amenable organisms such as the fruit fly Drosophila melanogaster and yeast by looking for new combinations of the mutant phenotypes in the offspring of the cross. This approach cannot be used in human populations because it would involve setting up large-scale matings between people with different inherited diseases. Instead, human genetic maps rely on the analysis of DNA sequence polymorphisms, i.e. naturally occurring DNA sequence differences in the population which do not have an overt, debilitating effect. A major breakthrough was the development of methods for using DNA probes to identify polymorphic sequences (Botstein et al. 1980).
Prior to the Human Genome Project (HGP), low-resolution genetic maps had been constructed using restriction fragment length polymorphisms (RFLPs). These are naturally occurring variations that create or destroy sites for restriction enzymes and therefore generate different sized bands on Southern blots (Fig. 1.2). The Southern blot is a technique for separating DNA fragments by size, see Fig. 2.6, p. 23. The problem with RFLPs was that they were too few and too widely spaced to be of much use for constructing a framework for physical mapping – the first RFLP map had just over 400 markers and a resolution of 10 cM, equivalent to one marker for every 10 Mb of DNA (Donis-Keller et al. 1987). The necessary breakthrough came with the discovery of new polymorphic markers, known as microsatellites, which were abundant and widely dispersed in the genome (Fig. 1.3). By 1992, a genetic map based on microsatellites had been constructed with a resolution of 1 cM (equivalent to one marker for every 1 Mb of DNA) which was a suitable template for physical mapping.
Fig. 1.2 Restriction fragment length polymorphisms (RFLPs) are sequence variants that create or destroy a restriction site in DNA therefore altering the length of the restriction fragment that is detected. The top panel shows two alternative alleles, in which the restriction fragment detected by a specific probe differs in length due to the presence or absence of the middle of three restriction sites (represented by vertical arrows). Alleles a and b therefore produce hybridizing bands of different sizes in Southern blots (lower panel). This allows the alleles to be traced through a family pedigree. For example child II.2 has inherited two copies of allele a, one from each parent, while child II.4 has inherited one copy of allele a and one copy of allele b.
Unlike genetic maps, physical maps are based on real units of DNA and therefore provide a basis for sequence assembly. The physical mapping phase of the HGP involved the creation of genomic DNA libraries and the identification and assembly of overlapping clones to form contigs (unbroken series of clones representing contiguous segments of the genome). When the HGP was initiated, the highest-capacity vectors available for cloning were cosmids, with a maximum insert size of 40 kb. Because hundreds of thousands of cosmid clones would have to be screened to assemble a physical map, the HGP would not have progressed very quickly without the development of novel high-capacity vectors and methods to find overlaps between them so that clone contigs could be assembled on the genomic scaffold.
Fig. 1.3 Microsatellites are sequence variants that cause restriction fragments or PCR products to differ in length due to the number of copies of a short tandem repeat sequence, 1–12 nt in length. The top panel shows four alternative alleles, in which the restriction fragment detected by a specific probe differs in length due to a variable number of tandem repeats. All four alleles produce bands of different sizes on Southern blots (lower panel) or different sized PCR products (not shown). Unlike RFLPs, multiple allelism is common for microsatellites so the precise inheritance pattern in a family pedigree can be tracked. For example, the mother and father in the pedigree have alleles b/d and a/c, respectively (the smaller DNA fragments move further during electrophoresis). The first child, II.1, has inherited allele b from his mother and allele a from his father.
The late 1980s and early 1990s saw much debate about the desirability of sequencing the human genome. This debate often strayed from rational scientific debate into the realms of politics, personalities, and egos. Among the genuine issues raised were questions such as:
Is the sequencing of the human genome an intellectually appropriate project for biologists?
Is sequencing the human genome feasible?
What benefits might arise from the project?
Will these benefits justify the cost and are there alternative ways of achieving the same benefits?
Will the project compete with other areas of biology for funding and intellectual resources?
Behind the debate was a fear that sequencing the human genome was an end in itself, much like a mountaineer who climbs a new peak just because it is there.
The publicly funded Human Genome Project was officially launched in 1990, and the scientific community began to develop new strategies to enable the large-scale mapping and sequencing that were required to complete the project, strategies which centered around high-throughput, highly parallel automated sequencing. One of the benefits of this new technology development was the completion of several pilot genome projects, beginning with that of the bacterium Hemophilus influenzae (Fleischmann et al. 1995). The net effect was that by the time the human genome had been sequenced (International Human Genome Sequencing Consortium 2001, Venter et al. 2001), the complete sequence was already known for over 30 bacterial genomes plus that of a yeast (Saccharomyces cerevisiae), the fruit fly, a nematode (Caenorhabditis elegans), and a plant (Arabidopsis thaliana).
Parallel developments in the field of bioinformatics were required to handle and analyze the exponentially increasing amounts of sequence data arising from the genome projects, but bioinformatics also facilitated the development of new sequencing strategies. For example, when a European consortium set itself the goal of sequencing the entire genome of the budding yeast S. cerevisiae (15 Mb), they segmented the task by allocating the sequencing of each chromosome to different groups. That is, they subdivided the genome into more manageable parts. At the time this project was initiated there was no other way of achieving the objective and when the resulting genomic sequence was published (Goffeau et al. 1996), it was the result of a unique multi-institution collaboration. While the S. cerevisiae sequencing project was underway, a new genomic sequencing strategy was unveiled: shotgun sequencing. In this approach, large numbers of genomic fragments are sequenced and sophisticated bioinformatics algorithms used to construct the finished sequence. In contrast to the consortium approach used with S. cerevisiae, a single laboratory set up as a sequencing factory undertook shotgun sequencing.
The first success with shotgun sequencing was the complete sequence of the bacterium H. influenzae (Fleischmann et al. 1995) and this was quickly followed with the sequences of Mycoplasmagenitalium (Fraser et al. 1995), Mycoplasma pneumoniae (Himmelreich et al. 1996) and Methanococcus jannaschii (Bult et al. 1996). It should be noted that H. influenzae was selected for sequencing because so little was known about it: there was no genetic map and not much biochemical data either. By contrast, S. cerevisiae was a well-mapped and well-characterized organism. As will be seen in Chapter 17, the relative merits of shotgun sequencing vs. ordered, map-based sequencing are still being debated today. Nevertheless, the fact that a major sequencing laboratory can turn out the entire sequence of a bacterium in 1–2 months shows the power of shotgun sequencing.
Fears that sequencing the human genome would be an end in itself have proved groundless. Because so many different genomes have been sequenced it is now possible to undertake comparative analyses of genomes, a topic known as comparative genomics. By comparing genomes from distantly related species we can begin to decipher the major stages in evolution. By comparing more closely related species we can begin to uncover more recent events such as genome rearrangement which have facilitated speciation (see e.g. Murphy et al. 2004). Currently, the most fertile area of comparative genomics is the analysis of bacterial genomes because so many have been sequenced. Already this analysis is throwing up some interesting questions. For example, over 25% of the genes in any one bacterial genome have no equivalents in any other sequenced genome. Is this an artifact resulting from limited sequence data or does it reflect the unique evolutionary events that have shaped the genomes of these organisms? Similarly, comparative analysis of the genomes of a wide range of thermophiles has revealed numerous interesting features, including strong evidence of extensive horizontal gene transfer. However, what is the genomic basis for thermophily? We still do not know.
One of the fascinating aspects of the classic paper by Fleischmann et al. (1995) was their analysis of the metabolic capabilities of H. influenzae, which they deduced from sequence information alone. This analysis has been extended to every other sequenced genome and is providing tremendous insight into the physiology and ecological adaptability of different organisms. For example, obligate parasitism in bacteria is linked to the absence of genes for certain enzymes involved in central metabolic pathways. Another example is the correlation between genome size and the diversity of ecological niches that can be colonized. The larger the bacterial genome, the greater are the metabolic capabilities of the host organism and this means that the organism can be found in a greater number of habitats.
Another benefit of genome mapping and sequencing that deserves mention is the proliferation of international scientific collaborations. In magnitude, the goal of sequencing the human genome was equivalent to putting a man on the moon. However, putting a man on the moon was a race between two nations and was driven by global political ambitions as much as by scientific challenge. By contrast, genome sequencing truly has been an international effort requiring laboratories in Europe, North America, and Japan to collaborate in a way never seen before. The extent of this collaboration can be seen by looking at the affiliations of the authors on many of the classic genome papers (e.g. The Arabidopsis Genome Initiative 2000, International Human Genome Sequencing Consortium 2001). The fact that one US company, Celera Genomics Inc., has successfully undertaken many sequencing projects in no way diminishes this collaborative effort. Rather, they have constantly challenged the accepted way of doing things and have increased the efficiency with which key tasks have been undertaken.
Three other aspects of genome sequencing and genomics deserve mention. First, in other branches of science such as nuclear physics and space exploration, the concept of “superfacilities” is well established. With the advent of whole genome sequencing, biology is moving into the superfacility league and a number of sequencing “factories” have been established. Secondly, high throughput methodologies have become commonplace and this has meant a partnering of biology with automation, instrumentation, and data management. Thirdly, many biologists have eschewed chemistry, physics, and mathematics but progress in genomics demands that biologists have a much greater understanding of these subjects. For example, methodologies such as mass spectrometry, X-ray crystallography, and protein structure modeling are now fundamental to the identification of gene function. The impact that this has on undergraduate recruitment in the sciences remains to be seen.
Knowing the complete genome sequence of any organism is very useful, but more important is finding the genes and determining their functions. One of the most surprising results from the early genome projects was the discovery of how little was known about even the best-characterized organisms. In the case of the bakers’ yeast (S. cerevisiae), which was considered a very well-characterized model species, only one-third of the genes identified in the sequencing project had been identified before. Over 4000 genes were discovered with no known function. Some of these could be assigned tentative functions on the basis of similarity to known genes either in the yeast or in other organisms, but this still left over 2000 genes whose function could only be established by direct experiments.
Following sequencing and annotation (gene finding) scientists then turned their attention to the functional characterization of newly identified genes. This has given rise to two new branches of biology, completely unheard of before 1995. These are transcriptomics (the large-scale study of mRNA expression) and proteomics (the large-scale study of proteins). While mRNA can yield useful information in terms of sequence, expression profile, and abundance, direct analysis of proteins is much more informative, since proteins can be analyzed not only in terms of sequence and abundance but also in terms of structure, post-translational modification, localization, and interactions with other molecules. No-one working in the 1970s, when recombinant DNA was a novel technology and protein analysis was laborious, could have imagined today’s large-scale experiments, where thousands of proteins can be separated on a high-resolution gel, digested into peptides, and identified rapidly by mass spectrometry. In the post-genomics era, it is becoming possible to carry out complete characterizations of cells, at the level of the genome, the transcriptome, the proteome, and now even the metabolome (the global profile of small-molecule metabolites in the cell).
The early successes in overproducing mammalian proteins in E. coli suggested to a few entrepreneurial individuals that a new company should be formed to exploit the potential of recombinant DNA technology. Thus was Genentech Inc. born (Box 1.2). Since then, thousands of biotechnology companies have been formed worldwide. As soon as major new developments in the science of gene manipulation are reported, a rash of new companies is formed to commercialize the new technology. For example, many recently formed companies are hoping the data from the Human Genome Project will result in the identification of a large number of new proteins with potential for human therapy. Other companies have been founded to exploit novel technologies for recombinant protein expression or the applications of therapeutic nucleic acids.
Although there are thousands of biotechnology companies, fewer than 100 have sales of their products and even fewer are profitable. Already many biotechnology companies have failed, but the technology advances at such a rate that there is no shortage of new company start-ups to take their place. One group of biotechnology companies that has prospered is those supplying specialist reagents to laboratory workers engaged in gene manipulation, genomics, and proteomics. In the very beginning, researchers had to make their own restriction enzymes and this limited the technology to those with protein chemistry skills. Soon a number of companies were formed which catered to the needs of researchers by supplying high-quality enzymes for DNA manipulation. Despite the availability of these enzymes, many people had great difficulty in cloning DNA. The reason for this was the need for careful quality control of all the components used in the preparation of reagents, something researchers are not good at! The supply companies responded by making easy-to-use cloning kits in addition to enzymes. Today, these supply companies can provide almost everything that is needed to clone, express, and analyze DNA and have thereby accelerated the use of recombinant DNA technology in all biological disciplines. In the early days of recombinant DNA technology, the development of methodology was an end in itself for many academic researchers. This is no longer true. The researchers have gone back to using the tools to further our knowledge of biology, and the development of new methodologies has largely fallen to the supply companies.
The remainder of this book is divided into four parts. Part I is devoted to the basic methodology for manipulating genes, and covers techniques for cloning and gene manipulation in E. coli as well as in vitro methods such as the PCR (Fig. 1.4). Basic techniques for gene and protein analysis are also described. Chapter 2 covers many of the techniques that are common to all cloning experiments and are fundamental to the success of the technology. Chapter 3 is devoted to methods for selectively cutting DNA molecules into fragments that can be readily joined together again. Without the ability to do this, there would be no recombinant DNA technology. If fragments of DNA are inserted into cells, they fail to replicate except in those rare cases where they integrate into the chromosome. To enable such fragments to be propagated, they are inserted into DNA molecules (vectors) that are capable of extrachromosomal replication. These vectors are derived from plasmids and bacteriophages and their basic properties are described in Chapter 4.
Table B1.1 Key events at Genentech.
1976Genentech founded1977Genentech produced first human protein (somatostatin) in a microorganism1978Human insulin cloned by Genentech scientists1979Human growth hormone cloned by Genentech scientists1980Genentech went public, raising $35 million1982First recombinant DNA drug (human insulin) marketed (Genentech product licensed to Eli Lilly & Co.)1984First laboratory production of factor VIII for therapy of hemophilia. License granted to Cutter Biological1985Genentech launched its first product, Protropin (human growth hormone), for growth hormone deficiency in children1987Genentech launched Activase (tissue plasminogen activator) for dissolving blood clots in heart-attack patients1990Genentech launched Actimmune (interferon-γ1β) for treatment of chronic granulomatous disease1990Genentech and the Swiss pharmaceutical company Roche complete a $2.1 billion mergerOriginally, the purpose of vectors was the propagation of cloned DNA but today vectors fulfil many other roles, such as facilitating DNA sequencing, promoting expression of cloned genes, facilitating purification of cloned gene products, and reporting the activity and localization of proteins. The specialist vectors for these tasks are described in Chapter 5. With this background in place it is possible to describe in detail how to clone the particular DNA sequences that one wants. There are two basic strategies. Either one clones all the DNA from an organism and then selects the very small number of clones of interest or one amplifies the DNA sequences of interest and then clones these. Both these strategies are described in Chapter 6, which focuses on methods for cloning individual genes. Once the DNA of interest has been cloned, it can be sequenced and this will yield information on the proteins that are encoded and any regulatory signals that are present (Chapter 7). There might also be a wish to modify the DNA and/or protein sequence and determine the biological effects of such changes. The techniques for sequencing and changing cloned genes and the properties of the encoded protein are described in Chapter 8. Finally, Chapter 9 provides an overview of bioinformatics, the essential computer-based methods for the analysis of genes and their products.
Fig. 1.4 Roadmap outlining the first section of the book, which covers basic techniques in gene manipulation and their relationships.
Part II of the book describes the specialist techniques for cloning in organisms other than E. coli (Fig. 1.5). Each of these chapters can be read in isolation from the other chapters in this section provided that there is a thorough understanding of the material from the first part of the book. Chapter 10 details the methods for cloning in other bacteria. Originally it was thought that some of these bacteria, e.g. B. subtilis, would usurp the position of E. coli. This has not happened and gene manipulation techniques are used simply to better understand the biology of these bacteria. Chapter 11 focuses on cloning in fungi, although the emphasis is on the yeast S. cerevisiae. Fungi are eukaryotes and are useful model systems for investigating topics such as meiosis, mitosis, and the control of cell division. Animal cells can be cultured like microorganisms and the techniques for introducing genes into them are described in Chapter 12. Chapters 13 and 14 describe basic procedures for the introduction of genes into animals and plants, respectively, while Chapter 15 covers some of the more cutting-edge techniques for these same systems.
Part III of the book moves from gene manipulation to genomics (Fig. 1.6). Chapter 16 introduces the topic of genomics by providing a biological survey of genomes. The genomes of free-living cellular organisms range in size from less than 1 Mb for some bacteria to millions, or tens of millions, of megabases for some plants. The sheer size of the genome of even a simple bacterium is such that to handle it in the laboratory we need to break it down into smaller pieces that are propagated as clones. As stated above, one way to approach this problem is to create a genome map, which can then be populated with physical landmarks onto which the smaller DNA fragments can be assembled. Another approach is to dispense with the map and break the entire genome into pieces, sequence them, and reassemble them. The methods for mapping genomes and assembling physical clone maps are discussed in Chapter 17.
Fig. 1.5 Roadmap outlining the second section of the book, which covers advanced techniques in gene manipulation and their application to organisms other than E. coli.
Fig. 1.6 Roadmap covering the early chapters of Part III, which discuss different methodologies for mapping and sequencing genomes.
Sequencing a genome is not an end in itself. Rather, it is just the first stage in a long journey whose goal is a detailed understanding of all the biological functions encoded in that genome and their evolution. To achieve this goal it is necessary to define all the genes in the genome and the functions that they encode. There are a number of different ways of doing this, one of which is comparative genomics (Chapter 18). The premise here is that DNA sequences encoding important cellular functions are likely to be conserved whereas dispensable or non-coding sequences will not. However, comparative genomics only gives a broad overview of the capabilities of different organisms. For a more detailed view one needs to identify each gene in the genome and determine its function. Over the last few years, technology developments in this new discipline of functional genomics have been nothing short of breathtaking. The final six chapters in this section look at ways in which large-scale functional analysis can be carried out (Fig. 1.7).