Principles of Gene Manipulation and Genomics - Sandy B. Primrose - E-Book

Principles of Gene Manipulation and Genomics E-Book

Sandy B. Primrose



The increasing integration between gene manipulation and genomics is embraced in this new book, Principles of Gene Manipulation and Genomics, which brings together for the first time the subjects covered by the best-selling books Principles of Gene Manipulation and Principles of Genome Analysis & Genomics.

* Comprehensively revised, updated and rewritten to encompass within one volume basic and advanced gene manipulation techniques, genome analysis, genomics, transcriptomics, proteomics and metabolomics
* Includes two new chapters on the applications of genomics
* An accompanying website provides instructional materials for both student and lecturer use, including multiple choice questions, related websites, and all the artwork in a downloadable format
* An essential reference for upper level undergraduate and graduate students of genetics, genomics, molecular biology and recombinant DNA technology


Number of pages: 1747





Chapter 1 Gene manipulation in the post-genomics era


Gene manipulation involves the creation and cloning of recombinant DNA

The genomics era began in earnest in 1995 with the complete sequencing of a bacterial genome

Outline of the rest of the book

Part I Fundamental Techniques of Gene Manipulation

Chapter 2 Basic techniques


Three technical problems had to be solved before in vitro gene manipulation was possible on a routine basis

A number of basic techniques are common to most gene-cloning experiments

Gel electrophoresis is used to separate different nucleic acid molecules on the basis of their size

Blotting is used to transfer nucleic acids from gels to membranes for further analysis

Southern blotting is the method used to transfer DNA from agarose gels to membranes so that the compositional properties of the DNA can be analyzed

Northern blotting is a variant of Southern blotting that is used for RNA analysis

Western blotting is used to transfer proteins from acrylamide gels to membranes

A number of techniques have been devised to speed up and simplify the blotting process

The ability to transform E. coli with DNA is an essential prerequisite for most experiments on gene manipulation

Electroporation is a means of introducing DNA into cells without making them competent for transformation

The ability to transform organisms other than E. coli with recombinant DNA enables genes to be studied in different host backgrounds

The polymerase chain reaction (PCR) has revolutionized the way that biologists manipulate and analyze DNA

The principle of the PCR is exceedingly simple

RT-PCR enables the sequences on a mRNA molecule to be amplified as DNA

The basic PCR is not efficient at amplifying long DNA fragments

The success of a PCR experiment is very dependent on the choice of experimental variables

By using special instrumentation it is possible to make the PCR quantitative

There are a number of different ways of generating fluorescence in quantitative PCR reactions

It is now possible to amplify whole genomes as well as gene segments

Useful website

Chapter 3 Cutting and joining DNA molecules

Cutting DNA molecules

Joining DNA molecules

Chapter 4 Basic biology of plasmid and phage vectors

Plasmid biology and simple plasmid vectors

Bacteriophage λ

DNA cloning with single-stranded DNA vectors

Chapter 5 Cosmids, phasmids, and other advanced vectors


Vectors for cloning large fragments of DNA

Specialist-purpose vectors

Chapter 6 Gene-cloning strategies


Genomic DNA libraries are generated by fragmenting the genome and cloning overlapping fragments in vectors

The PCR can be used as an alternative to genomic DNA cloning

Complementary DNA (cDNA) libraries are generated by the reverse transcription of mRNA

The PCR can be used as an alternative to cDNA cloning

Many different strategies are available for library screening

Difference cloning exploits differences in the abundance of particular DNA fragments

Chapter 7 Sequencing genes and short stretches of DNA

Chapter 8 Changing genes: site-directed mutagenesis and protein engineering

Protein engineering

Chapter 9 Bioinformatics


Databases are required to store and cross-reference large biological datasets

Sequence analysis of genomic DNA involves the de novo identification of genes and other features

Caution must be exercised when using purely in silico methods to annotate genomes

Sequencing also provides new data for molecular phylogenetics

Part II Manipulating DNA in Microbes, Plants, and Animals

Chapter 10 Cloning in bacteria other than Escherichia coli

Cloning in Gram-negative bacteria other than E. coli

Cloning in Gram-positive bacteria

Cloning in Archaea

Chapter 11 Cloning in Saccharomyces cerevisiae and other fungi

Cloning and manipulating large fragments of DNA

Chapter 12 Gene transfer to animal cells


There are four major strategies for gene transfer to animal cells

There are several chemical transfection techniques for animal cells but all are based on similar principles

Physical transfection techniques have diverse mechanisms

Cells can be transfected with either replicating or non-replicating DNA

Three types of selectable marker have been developed for animal cells

Plasmid vectors for the transfection of animal cells contain modules from bacterial and animal genes

DNA can be delivered to animal cells using bacterial vectors

Chapter 13 Genetic manipulation of animals


Three major methods have been developed for the production of transgenic mice

ES cells can be used for gene targeting in mice

Applications of genetically modified mice

Applications of gene targeting

Standard transgenesis methods are more difficult to apply in other mammals and birds

Nuclear transfer technology can be used to clone animals

Gene transfer to Xenopus can result in transient expression or germline transformation

Gene transfer to fish is generally carried out by microinjection, but other methods are emerging

Gene transfer to fruit flies involves the microinjection of DNA into the pole plasm

Chapter 14 Gene transfer to plants


Plant tissue culture is required for most transformation procedures

There are four major strategies for gene transfer to plant cells

Agrobacterium-mediated transformation

Direct DNA transfer to plants

Gene targeting in plants

In planta transformation minimizes or eliminates the tissue culture steps usually needed for the generation of transgenic plants

Plant viruses can be used as episomal expression vectors

Chapter 15 Advanced transgenic technology


Inducible expression systems allow transgene expression to be controlled by physical stimuli or the application of small chemical modulators

Recombinant inducible systems are built from components that are not found in the host animal or plant

Site-specific recombination allows precise manipulation of the genome in organisms where gene targeting is inefficient

Many strategies for gene inactivation do not require the direct modification of the target gene

Gene inhibition is also possible at the protein level

Part III Genome Analysis, Genomics, and Beyond

Chapter 16 The organization and structure of genomes

The organization of nuclear DNA in eukaryotes

Chapter 17 Mapping and sequencing genomes

Sequencing genomes

Chapter 18 Comparative genomics

Comparative genomics of bacteria

Comparative genomics of organelles

Comparative genomics of eukaryotes

Chapter 19 Large-scale mutagenesis and interference


Genome-wide gene targeting is the systematic approach to large-scale mutagenesis

Genome-wide random mutagenesis is a strategy applicable to all organisms

Insertional mutagenesis in invertebrates

Libraries of knock-down phenocopies can be created by RNA interference

Chapter 20 Analysis of the transcriptome


The transcriptome is the collection of all messenger RNAs in the cell

Steady-state mRNA levels can be quantified directly by sequence sampling

DNA microarray technology allows the parallel analysis of thousands of genes on a convenient miniature device

As transcriptomics technology matures, standardization of data processing and presentation become important challenges

Expression profiling with DNA arrays has permeated almost every area of biology

Chapter 21 Proteomics I - Expression analysis and characterization of proteins


Protein expression analysis is more challenging than mRNA profiling because proteins cannot be amplified like nucleic acids

There are two major technologies for protein separation in proteomics

Mass spectrometry is used for protein characterization

Protein microarrays can also be used for expression analysis

Chapter 22 Proteomics II - Analysis of protein structures


Structural proteomics has required developments in structural analysis techniques and bioinformatics

International structural proteomics initiatives have been established to solve protein structures on a large scale

Chapter 23 Proteomics III - Protein interactions


Protein interactions can be inferred by a variety of genetic approaches

New methods based on comparative genomics can also infer protein interactions

Traditional biochemical methods for protein interaction analysis cannot be applied on a large scale

Library-based screening methods allow the large-scale analysis of binary interactions

Systematic analysis of protein complexes can be achieved by affinity purification and mass spectrometry

Interaction screening produces large data sets which require extensive bioinformatic support

Chapter 24 Metabolomics and global biochemical networks


Part IV Applications of Gene Manipulation and Genomics

Chapter 25 Applications of genomics: understanding the basis of polygenic disorders and identifying quantitative trait loci

Investigating discrete traits in outbreeding populations (genetic diseases of humans)

Investigating quantitative trait loci (QTLs) in inbred populations

Understanding responses to drugs (pharmacogenomics)

Chapter 26 Applications of recombinant DNA technology


Theme 1: Producing useful molecules

Theme 2: Improving agronomic traits by genetic modification

Theme 3: Using genetic modification to study, prevent, and cure disease


Appendix: the genetic code and single-letter amino acid designations


© 2006 Blackwell Publishing


350 Main Street, Malden, MA 02148-5020, USA

9600 Garsington Road, Oxford OX4 2DQ, UK

550 Swanston Street, Carlton, Victoria 3053, Australia

The rights of Sandy Primrose and Richard Twyman to be identified as the Authors of this Work have been asserted in accordance with the UK Copyright, Designs, and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs, and Patents Act 1988, without the prior permission of the publisher.

This material was originally published in two separate volumes: Principles of Gene Manipulation, 6th edition (2001) and Principles of Genome Analysis and Genomics, 3rd edition (2003).

First published 1980
Second edition published 1981
Third edition published 1985
Fourth edition published 1989
Fifth edition published 1994
Sixth edition published 2001
Seventh edition published 2006

1 2006

Library of Congress Cataloging-in-Publication Data

Primrose, S.B.

Principles of gene manipulation and genomics / S.B. Primrose and R.M. Twyman.—7th ed.


Rev. ed. of: Principles of gene manipulation. 6th ed. 2001 and: Principles of genome analysis and genomics / Sandy B. Primrose, Richard M. Twyman. 3rd ed. 2003.

Includes bibliographical references and index.

ISBN 1-4051-3544-1 (pbk.: alk. paper) 1. Genetic engineering. 2. Genomics. 3. Gene mapping. 4. Nucleotide sequence.

[DNLM: 1. Genetic Engineering. 2. Base Sequence. 3. Chromosome Mapping. 4. DNA, Recombinant. 5. Genomics. QH 442 P952pa 2006] I. Twyman, Richard M. II. Primrose, S.B. Principles of gene manipulation. III. Primrose, S. B. Principles of genome analysis and genomics. IV. Title.

QH442.O42 2006



A catalogue record for this title is available from the British Library.

For further information on Blackwell Publishing, visit our website.


The first edition of Principles of Gene Manipulation was published over 25 years ago when the recombinant DNA era was in its infancy and the idea of sequencing the entire human genome was inconceivable. In writing the first edition, the aim was to explain a new and rapidly growing technology. The basic philosophy was to present the principles of gene manipulation, and its associated techniques, in sufficient detail to enable the non-specialist reader to understand them. However, as the techniques became more sophisticated and advanced, so the book grew in size and complexity. Eventually, recombinant DNA technology advanced to the stage where the sequencing and analysis of entire genomes became possible. This gave rise to a whole new biological discipline, known as genomics, with its own principles and associated techniques. From this emerged the first edition of another book, Principles of Genome Analysis, whose title changed to Principles of Genome Analysis and Genomics in its third edition to reflect the rapid growth of post-sequencing technologies aiming at the large-scale analysis of gene function. It is now five years since the draft human genome sequence was published and we are reaching the stage where the technologies of gene manipulation and genomics are becoming increasingly integrated. Genome mapping and sequencing technologies borrow extensively from the early recombinant DNA technologies of library construction, cloning, and amplification using the polymerase chain reaction; gene transfer to microbes, animals, and plants is now widely used for the functional analysis of genomes; and the applications of genomics and recombinant DNA are becoming difficult to separate.

This new edition, entitled Principles of Gene Manipulation and Genomics, therefore unites the themes covered formerly by the two separate books and provides for the first time a fully integrated approach to the principles and practice of gene manipulation in the context of the genomics era. As in previous editions of the two books, we have written the text at an advanced undergraduate level, assuming a basic knowledge of molecular biology and genetics but no knowledge of recombinant DNA technology or genomics. However, we are aware that the book is favored not only by newcomers to the field but also by experts, and we have tried to remain faithful to both audiences with our coverage. As before we have not changed the level at which the book is written nor the general style, but we have divided the book into sections to enable the book to be used in different ways by different readers.

The basic methodologies are presented in the first part of the book, which is devoted to cloning in Escherichia coli, while more advanced gene-transfer techniques (applying to other microbes and to animals and plants) are presented in the second part. The reader who has read and understood the material in the first part, or already knows it, should have no difficulty in understanding any of the material in the second part of the book. The third part moves from the basic gene-manipulation technologies to genomics, transcriptomics, proteomics, and metabolomics, the major branches of the high-throughput, large-scale biology that has become synonymous with the new millennium. Finally, the fourth part of the book contains two chapters that discuss how recombinant DNA technology and genomics are being applied in the fields of medicine, agriculture, diagnostics, forensics, and biotechnology.

In writing the first part of the book, we thought carefully about the inclusion of early “historical” information. Although older readers may feel that some of this material is dated, we elected to leave much of it in place because it has an important bearing on today’s methods and an understanding of it is incorrectly assumed in many of today’s publications. We have included such information where it illustrates how modern techniques and procedures have evolved, but we have tried not to catalog outmoded or redundant methods that are no longer used. This is particularly the case in the genomics section where new technologies seem to come and go every day, and few stand the test of time or become truly indispensable. We have aimed to avoid as much jargon as possible, and to explain it clearly where it is absolutely necessary. As is common in all areas of science, the principles of gene manipulation and genomics abound with acronyms and synonyms, which are often confusing, particularly now that molecular biology is becoming increasingly commercial in both basic research and its applications. Where appropriate, we have provided lists of definitions as boxes set aside from the text. Boxes are also used to illustrate key experiments or principles, historical information, and applications. While the text is fully referenced throughout, we have also provided a list of classic papers and reviews at the end of each chapter to ease the wary reader into the scientific literature.

This book would not have been possible without the help and advice of many colleagues. Particular thanks are due to Sue Goddard and her library staff at HPA Porton for assistance with many literature searches. Sandy Primrose would like to dedicate this book to his wife Jill and Richard Twyman would like to dedicate this book to his parents, Irene and Peter, to his children Emily and Lucy, and to Liz for her endless support and encouragement.



two-dimensional gel electrophoresis




absorption, distribution, metabolism and excretion


affected family-based control


amplified fragment length polymorphism


acute lymphoblastic leukemia


acute myeloid leukemia


avian myeloblastosis virus


acute promyelocytic leukemia


autonomously replicating sequence



all-trans-retinoic acid


bacterial artificial chromosome


Bacille Calmette–Guérin


basic fibroblast growth factor


Biomolecular Interaction Network Database


Basic Local Alignment Search Tool


Blocks Substitution Matrix


bone morphogenetic protein


base pair


bioluminescence resonance energy transfer


cleavable amplified polymorphic sequences


Critical Assessment of Structural Prediction


Class, Architecture, Topology and Homologous superfamily (database)

ccc DNA

covalently closed circular DNA


charge-coupled device


circular dichroism


complementary DNA


Centre d’Etude du Polymorphisme Humain


colony forming unit


contour-clamped homogeneous electrical field


chemically induced dimerization Also: collision-induced dissociation




cluster of orthologous groups




complementary RNA


chromosome segment substitution line




direct analysis of large protein complexes


distributed annotation system


downstream activation site




DNA Databank of Japan


Database of Interacting Proteins


Duchenne muscular dystrophy


deoxyribonucleic acid


deoxynucleoside triphosphate




double-stranded DNA


double-stranded RNA


epidermal growth factor


enzyme-linked immunosorbent sandwich assay


European Molecular Biology Laboratory




efficiency of plating


embryonic stem (cells)


electrospray ionization


expressed sequence tag


European Functional Analysis Network (consortium)


fluorescence-activated cell sorting


flap endonuclease


Fialuridine (1-2′-deoxy-2′-fluoro-β-D-arabinofuranosyl-5-iodouracil)


field-inversion gel electrophoresis




fluorescence in situ hybridization


fingerprinted contigs


fluorescence resonance energy transfer


Fold classification based on Structure–Structure alignment of Proteins (database)


Genome Annotation aSsessment Project


granulocyte colony stimulating factor


gene external marker-based automatic congruencing


German Gene Trap Consortium


gene trap sequence tag






hypoxanthine, aminopterin and thymidine


high-density lipoprotein


human endogenous retrovirus


Human Genome Project


human leukocyte antigen


hypoxanthine phosphoribosyltransferase



HpaII tiny fragment


haplotype tag single nucleotide polymorphism


identical by descent


isotope-coded affinity tag


interaction defective allele


isoelectric focusing


Indian hedgehog




interaction sequence tag


incremental truncation for the creation of hybrid enzymes


in vivo expression technology




low complexity region


linkage disequilibrium


long interspersed nuclear element




log of odds


long terminal repeat

m: z

mass : charge ratio


multiwavelength anomalous diffraction


microarray and gene expression


microarray and gene expression mark-up language


microarray and gene expression object model


matrix assisted laser desorption ionization


matrix attachment region




mass coded abundance tag


multiple cloning site


multiple displacement amplification


Microarray Gene Expression Database


major histocompatibility complex


minimum information about a microarray experiment


molecularly imprinted polymer


Munich Information Center for Protein Sequences


‘mismatch’ oligonucleotide


mouse mammary tumor virus


massively parallel signature sequencing


messenger RNA


mass spectrometry


tandem mass spectroscopy




Maize Targeted Mutagenesis project




multidimensional protein identification technology


Moloney murine leukemia virus


National Center for Biotechnology Information


Nucleic Acid Databank


nerve growth factor


National Institute of General Medical Sciences


near isogenic line


nuclear magnetic resonance


nuclear Overhauser effect


NOE spectroscopy



oc DNA

open circular DNA


orthogonal-field-alternation gel electrophoresis


on-line Mendelian inheritance in man


open-reading frame


orphan open-reading frame


presence/absence polymorphism


P1-derived artificial chromosome


polyacrylamide gel electrophoresis


pathogenicity island


percentage of accepted point mutations


polymerase chain reaction


Protein Databank (database)


Protein families database of alignments


pulsed field gel electrophoresis


‘perfect match’ oligonucleotide





protein quantity loci



primed in situ labeling


position shift polymorphism


Position-Specific Iterated BLAST (software)


post-transcriptional gene silencing


polyvinylidene difluoride


quantitative trait loci


rapid amplification of cDNA ends


recombinase-activated gene expression


randomly amplified polymorphic DNA


RecA-assisted restriction endonuclease


recombinant congenic (strains)


rolling circle amplification


Research Collaboratory for Structural Bioinformatics


ribosomal DNA/RNA


restriction enzyme-mediated integration


restriction fragment length polymorphism


recombinant inbred line




ribonucleic acid


RNA interference




reverse phase microcapillary liquid chromatography


Ras recruitment system


reverse transcriptase polymerase chain reaction


repeats in toxins


serial analysis of gene expression


Structural Classification of Proteins (database)


structure-based combinatorial protein engineering


sodium dodecyl sulfate


surface-enhanced laser desorption and ionization


synthetic genetic array



Saccharomyces Gene Deletion Project


sonic hedgehog


stable-isotope labeling with amino acids in cell culture


short interspersed nuclear element


sequenced insertion sites


sequence-independent site-directed chimeragenesis


single nucleotide polymorphism


Surface Properties of protein–protein Interfaces (database)




surface plasmon resonance


synchrotron radiation circular dichroism


sequence retrieval system


SOS recruitment system


simple sequence length polymorphism


simple sequence repeat


sequence-tagged connector


signature-tagged mutagenesis


sequence-tagged site


transformation-competent artificial chromosome


transversely alternating-field electrophoresis


tandem affinity purification


transformation-associated recombination



transfer DNA


The Institute for Genomic Research


triose phosphate isomerase


time of flight


transfer RNA


Trait Utility System for Corn


upstream activation site


universal protein array


upstream repression site


ubiquitin-based split protein sensor


untranslated region


variant detector array


virus-induced gene silencing


whole-genome amplification


yeast two-hybrid


yeast artificial chromosome


yeast centromere plasmid


yeast episomal plasmid


yeast integrating plasmid


yeast replicating plasmid


Gene manipulation in the post-genomics era


Since the beginning of the last century, scientists have been interested in genes. First, they wanted to find out what genes were made of, how they worked, and how they were transmitted from generation to generation with the seemingly mythic ability to control both heredity and variation. Genes were initially thought of in functional terms as hereditary units responsible for the appearance of particular biological characteristics, such as eye or hair color in human beings, but their physical properties were unclear. It was not until the 1940s that genes were shown to be made of DNA, and that a workable physical and functional definition of the gene – a length of DNA encoding a particular protein – was achieved (Box 1.1). Next, scientists wanted to find ways to study the structure, behavior, and activity of genes in more detail. This required the simultaneous development of novel techniques for DNA analysis and manipulation. These developments began in the early 1970s with the first experiments involving the creation and manipulation of recombinant DNA. Thus began the recombinant DNA revolution.

Gene manipulation involves the creation and cloning of recombinant DNA

Recombinant DNA is any artificially created DNA molecule that brings together DNA sequences not usually found together in nature. Gene manipulation refers to any of a variety of sophisticated techniques for the creation of recombinant DNA and, in many cases, its subsequent introduction into living cells. In the developed world there is a precise legal definition of gene manipulation as a result of government legislation to control it. In the UK, for example, gene manipulation is defined as: “… the formation of new combinations of heritable material by the insertion of nucleic acid molecules, produced by whatever means outside the cell, into any virus, bacterial plasmid or other vector system so as to allow their incorporation into a host organism in which they do not naturally occur but in which they are capable of continued propagation.” The propagation of recombinant DNA inside a particular host cell so that many copies of the same sequence are produced is known as cloning.

Cloning was a significant breakthrough in molecular biology because it became possible to obtain homogeneous preparations of any desired DNA molecule in amounts suitable for laboratory-scale experiments. A single organism, the bacterium Escherichia coli, played the dominant role in the early years of the recombinant DNA era. This bacterium had always been a popular model system for molecular geneticists and, prior to the development of recombinant DNA technology, there were already a large number of well-characterized mutants, gene regulation was understood, and many plasmids had been isolated. It is not surprising that the first cloning experiments were undertaken in E. coli and that this organism became the primary cloning host. Subsequently, cloning techniques were extended to a range of other microorganisms, such as Bacillus subtilis, Pseudomonas spp., yeasts, and filamentous fungi, and then to higher eukaryotes. Despite these advances, E. coli remains the most widely used cloning host even today because gene manipulation in this bacterium is technically easier than in any other organism. As a result, it is unusual for researchers to clone DNA directly in other organisms. Rather, DNA from the organism of choice is first manipulated in E. coli and subsequently transferred back to the original host or another organism, as appropriate. Without the ability to clone and manipulate DNA in E. coli, the application of recombinant DNA technology to other organisms would be greatly hindered.

Until the mid-1980s, all cloning was cell-based (i.e. the DNA molecule of interest had to be introduced into E. coli or another host for amplification).

Box 1.1 What is a gene?
The concept of the gene as a unit of hereditary information was introduced by the Austrian monk Gregor Mendel in an 1866 paper entitled ‘Experiments in plant hybridization’. In this paper, he detailed the results of numerous crosses between pea plants of different characteristics, and from these data put forward a number of postulates concerning the principles of heredity.
Although Mendel introduced the concept, the word gene was not used until 25 years after his death. It was coined by Wilhelm Johannsen in 1909 to describe a heritable factor responsible for the transmission and expression of a given biological trait. In Mendel’s work, published over 40 years earlier, these hereditary factors were given the rather less catchy name Formbildungelementen (form-building elements).
Mendel had no clear idea what his hereditary elements consisted of in a physical sense, and described them as purely mathematical entities. The first evidence as to the physical and functional nature of genes emerged in 1902. In this year, the chromosome theory of inheritance was put forward by Walter Sutton, after he noticed that chromosomes during meiosis behaved in the same way as Mendel’s elements. Also in 1902, Archibald Garrod showed that the metabolic disorder alkaptonuria resulted from the failure of a specific enzyme and could be transmitted in an autosomal recessive fashion. This he called an inborn error of metabolism. This was the first evidence that genes were necessary to make proteins. In 1911, Thomas Hunt Morgan and colleagues performed the first genetic linkage experiments in the fruit fly Drosophila melanogaster, and hence showed that genes were located on chromosomes and were physically linked together.
A more precise idea of the physical and functional basis for the gene emerged during the Second World War. In 1942, George Beadle and Edward Tatum found that X-ray-induced mutations in fungi often caused specific biochemical defects, reflecting the absence or malfunction of a single enzyme. This led to the one gene one enzyme model of gene function. In 1944, Oswald Avery and colleagues showed that DNA was the genetic material. Thus evolved a simple picture of the gene – a length of DNA in a chromosome which encoded the information required to produce a single enzyme.
This definition had to be expanded in the following years to encompass new discoveries. For example, not all genes encode enzymes: many encode proteins with other functions, and some do not encode proteins at all, but produce functional RNA molecules. Further complexity results from the selective use of information in the gene to generate multiple products. In eukaryotes, this often reflects alternative splicing, but in both prokaryotes and eukaryotes multiple gene products can be generated by alternative promoter or polyadenylation site usage. In more obscure cases, two or more genes may be required to generate a single polypeptide, e.g. the rare phenomenon of trans-splicing.

In 1983, there was a further mini-revolution in molecular biology with the invention of the polymerase chain reaction (PCR). This technique allowed DNA sequences to be amplified in vitro using pure enzymes. The great sensitivity and robustness of the PCR allows DNA to be prepared rapidly from very small amounts of starting material and material of very poor quality, but it is not as accurate as cell-based cloning and only works on relatively short DNA sequences. Therefore cell-based cloning and the PCR have complementary but overlapping uses in gene manipulation.
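The sensitivity of the PCR described above can be illustrated with a small calculation. The sketch below assumes the standard idealized model, which is not spelled out in this passage, in which each cycle multiplies the number of target copies by (1 + E), where E is the per-cycle efficiency (E = 1 means perfect doubling). The function name pcr_copies and its parameters are hypothetical and chosen only for illustration; real reactions plateau as reagents are consumed, which this model ignores.

```python
def pcr_copies(initial_templates: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after a given number of PCR cycles.

    Assumed model (not taken from the text): N = N0 * (1 + E) ** n,
    where N0 is the starting number of template molecules, n the number
    of cycles, and E the per-cycle efficiency between 0 and 1.
    """
    if initial_templates < 0 or cycles < 0:
        raise ValueError("initial_templates and cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_templates * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Compare perfect doubling with a slightly less efficient reaction,
    # starting from a single template molecule.
    for n_cycles in (10, 20, 30):
        ideal = pcr_copies(1, n_cycles)
        reduced = pcr_copies(1, n_cycles, efficiency=0.9)
        print(f"{n_cycles} cycles: ideal {ideal:.3g}, at 90% efficiency {reduced:.3g}")
```

Under this assumption a single template molecule yields on the order of a billion copies after 30 cycles, which is consistent with the statement that DNA can be prepared from very small amounts of starting material.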

Although the initial cloning experiments generated a great deal of excitement, it is unlikely that any of the early workers in this field could have predicted the immense impact recombinant DNA technology would have on the progress of scientific understanding and indeed on society as a whole, particularly in the fields of medicine and agriculture. Today, gene manipulation underlies a multi-billion dollar industry, employing hundreds of thousands of people worldwide and offering solutions to some of mankind’s most intractable problems. The ability to insert new combinations of genetic material into microbes, animals, and plants offers novel ways to produce valuable small molecules and proteins; provides the means to produce plants and animals that are disease-resistant, tolerant of harsh environments, and have higher yields of useful products; and provides new methods to treat and prevent human disease.

Fig. 1.1 The impact of gene manipulation on the practice of medicine.

Recombinant DNA has opened new horizons in medicine

The developments in gene manipulation that have taken place in the last 30 years have revolutionized medicine by increasing our understanding of the basis of disease, providing new tools for disease diagnosis, and opening the way to the discovery or development of new drugs, treatments, and vaccines.

The first medical benefit to arise from recombinant DNA technology was the availability of significant quantities of therapeutic proteins, such as human growth hormone (HGH), which is used to treat growth defects. Originally HGH was purified from pituitary glands removed from cadavers. However, many pituitary glands are required to produce enough HGH to treat just one child. Furthermore, some children treated with pituitary-derived HGH have developed Creutzfeldt–Jakob disease originating from the cadavers. Following the cloning and expression of the HGH gene in E. coli, it became possible to produce enough HGH in a 10-liter fermenter to treat hundreds of children. Since then, many different therapeutic proteins have become available for the first time. Many of these proteins are also manufactured in E. coli but others are made in yeast or animal cells and some in plants or the milk of genetically modified animals. The only common factor is that the relevant gene has been cloned and overexpressed using the techniques of gene manipulation.

Medicine has benefited from recombinant DNA technology in other ways (Fig. 1.1). For example, novel routes to vaccines have been developed: the current hepatitis B vaccine is produced by the expression of a viral antigen on the surface of yeast cells, and a recombinant vaccine has been used to eliminate rabies from foxes in a large part of Europe. Gene manipulation can also be used to increase the levels of small molecules within microbial or plant cells. This can be done by cloning all the genes for a particular biosynthetic pathway and overexpressing them. Alternatively, it is possible to shut down particular metabolic pathways and thus redirect intermediates towards the desired end product. This approach has been used to facilitate production of chiral intermediates, antibiotics, and novel therapeutic entities. New antibiotics can also be created by mixing and matching genes from organisms producing different but related molecules in a technique known as combinatorial biosynthesis.

Gene cloning enables nucleic acid probes to be produced readily, and such probes have many uses in medicine. For example, they can be used to determine or confirm the identity of a microbial pathogen or to carry out pre- or peri-natal diagnosis of an inherited genetic disease. Increasingly, probes are being used to determine the likelihood of adverse reactions to drugs or to select the best class of drug to treat a particular illness in different groups of patients. Nucleic acids are also being used as therapeutic entities in their own right. For example, antisense nucleic acids are being used to downregulate gene expression in certain diseases, and the relatively new phenomenon of RNA interference is poised to become a breakthrough technology for the development of new therapeutic approaches. In other cases, nucleic acids are being administered to correct or repair inherited gene defects (gene therapy, gene repair) or as vaccines. In the reverse of gene repair, animals are being generated that have mutations identical to those found in human disease. These are being used as models to learn more about disease pathology and to test novel therapies.

Mapping and sequencing technologies formed a crucial link between gene manipulation and genomics

As well as techniques for DNA cloning and transfer to new host cells, the recombinant DNA revolution spawned new technologies for gene mapping (ordering genes on chromosomes) and DNA sequencing (determining the order of bases, identified by the letters A, C, G, and T, along the DNA molecule). Within the gene itself, the order of bases determines the protein encoded by the gene by specifying the order of amino acids. Thus, DNA sequencing made it possible to work out the amino acid sequence of the encoded protein without the direct analysis of the protein itself. This was extremely useful because, at the time DNA sequencing was first developed, only the most abundant proteins in the cell could be purified in sufficient quantities to facilitate direct analysis. Further elements surrounding the coding region of the gene were identified as control regions, specifying each gene’s expression profile. As more sequence data accumulated, it became possible to identify common features in related genes, both in the coding region and the regulatory regions. This type of sequence analysis was greatly facilitated by the foundation of sequence databases, and the development of computer-aided techniques for sequence analysis and comparison, a field now known as bioinformatics. Today, DNA molecules can be scanned quickly for a whole series of structural features, e.g. restriction enzyme recognition sites, matches or overlaps with other sequences, start and stop signals for transcription and translation, and sequence repeats, using programs available on the Internet.
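The two ideas in this paragraph, reading a protein sequence off a DNA sequence and scanning DNA for features such as restriction sites, can be illustrated with a minimal Python sketch. The function names and the example sequence are hypothetical and chosen only for illustration. The sketch assumes an uppercase sequence containing only A, C, G, and T, uses the standard genetic code, and reads a single frame starting at the first ATG. Real analysis programs handle both strands, all reading frames, and ambiguous bases.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three positions ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate from the first ATG until a stop codon or the end of the sequence."""
    start = dna.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def find_sites(dna, site):
    """Return the 0-based start positions of every occurrence of a recognition site."""
    positions = []
    pos = dna.find(site)
    while pos != -1:
        positions.append(pos)
        pos = dna.find(site, pos + 1)
    return positions


# Hypothetical example: a short open reading frame containing an EcoRI site (GAATTC).
example = "CCATGGCTGAATTCAAATAAGG"
print(translate(example))             # MAEFK
print(find_sites(example, "GAATTC"))  # [8]
```

The codon table is built from a compact string rather than typed out entry by entry, which keeps the sketch short while still covering all 64 codons.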

The original goal of sequencing was to determine the precise order of nucleotides in a gene, but soon the goal became the sequence of a small genome. A genome is the complete content of genetic information in an organism, i.e. all the genes and other sequences it contains. The first target was the genome of a small virus called ϕX174, then larger plasmid and viral genomes, then chromosomes and microbial genomes until ultimately the complete genomes of higher eukaryotes were sequenced (Table 1.1). In the mid-1980s, scientists began to discuss seriously how the entire human genome might be sequenced. To put these discussions in context, the largest stretch of DNA that can be sequenced in a single pass (even today) is 600–800 nucleotides and the largest genome that had been sequenced in 1985 was that of the 172-kb Epstein–Barr virus (Baer et al. 1984). By comparison, the human genome is 3000 Mb in size, over 17,000 times bigger! One school of thought was that a completely new sequencing methodology would be required, and a number of different technologies were explored but with little success. Early on, however, it was realized that existing sequencing technology could be used if a large genome could be broken down into more manageable pieces for sequencing in a highly parallel fashion, and then the pieces could be joined together again. A strategy was agreed upon in which a map of the human genome would be used as a scaffold to assemble the sequence.
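The scale of the problem can be checked with a few lines of arithmetic. The following Python sketch uses only the figures quoted above. The read length of 700 nucleotides is an assumed midpoint of the 600–800 range, and the read count is a lower bound that ignores the overlap and redundancy any real project needs.

```python
import math

HUMAN_GENOME_BP = 3_000_000_000  # 3000 Mb
EBV_GENOME_BP = 172_000          # 172 kb, Epstein-Barr virus
READ_LENGTH_BP = 700             # assumed midpoint of the 600-800 nt range

# How many times larger the human genome is than the largest genome sequenced by 1985.
size_ratio = HUMAN_GENOME_BP / EBV_GENOME_BP

# Minimum number of single-pass reads needed to cover every base exactly once.
minimum_reads = math.ceil(HUMAN_GENOME_BP / READ_LENGTH_BP)

print(f"Human genome is about {size_ratio:,.0f} times larger than EBV")
print(f"At least {minimum_reads:,} reads of {READ_LENGTH_BP} nt are needed")
```

Even under these idealized assumptions, more than four million separate reads are required, which is why breaking the genome into pieces that could be sequenced in parallel was essential.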

Table 1.1 Timeline of genome sequencing, showing the increasing genome sizes that have been achieved.

The problem here was that in 1985 there were not enough markers, or points of reference, on the human genome map to produce a physical scaffold on which to assemble the complete sequence. Genetic maps are based on recombination frequencies, and in model organisms they are constructed by carrying out large-scale crosses between different mutant strains. The principle of a genetic map is that the further apart two loci are on a chromosome, the more likely that a crossover will occur between them during meiosis. Recombination events resulting from crossovers can be scored in genetically amenable organisms such as the fruit fly Drosophila melanogaster and yeast by looking for new combinations of the mutant phenotypes in the offspring of the cross. This approach cannot be used in human populations because it would involve setting up large-scale matings between people with different inherited diseases. Instead, human genetic maps rely on the analysis of DNA sequence polymorphisms, i.e. naturally occurring DNA sequence differences in the population which do not have an overt, debilitating effect. A major breakthrough was the development of methods for using DNA probes to identify polymorphic sequences (Botstein et al. 1980).
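The principle behind a genetic map can be sketched numerically. In the simplest model, the map distance in centimorgans (cM) is the percentage of offspring that carry a recombinant combination of markers. The Python sketch below assumes that model, and the function name and offspring counts are hypothetical. The proportionality holds only for closely linked loci, because multiple crossovers between distant loci cause the observed frequency to underestimate the true distance.

```python
def map_distance_cm(parental, recombinant):
    """Estimate map distance (cM) as the percentage of recombinant offspring."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("no offspring were scored")
    return 100 * recombinant / total


# Hypothetical cross: 1000 offspring scored, 100 of them recombinant.
distance = map_distance_cm(parental=900, recombinant=100)
print(f"Estimated distance between the two loci: {distance} cM")  # 10.0 cM
```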

Prior to the Human Genome Project (HGP), low-resolution genetic maps had been constructed using restriction fragment length polymorphisms (RFLPs). These are naturally occurring variations that create or destroy sites for restriction enzymes and therefore generate different sized bands on Southern blots (Fig. 1.2). Southern blotting is a technique for transferring size-separated DNA fragments from a gel to a membrane, where they can be detected with a probe (see Fig. 2.6). The problem with RFLPs was that they were too few and too widely spaced to be of much use for constructing a framework for physical mapping – the first RFLP map had just over 400 markers and a resolution of 10 cM, equivalent to one marker for every 10 Mb of DNA (Donis-Keller et al. 1987). The necessary breakthrough came with the discovery of new polymorphic markers, known as microsatellites, which were abundant and widely dispersed in the genome (Fig. 1.3). By 1992, a genetic map based on microsatellites had been constructed with a resolution of 1 cM (equivalent to one marker for every 1 Mb of DNA) which was a suitable template for physical mapping.

Fig. 1.2 Restriction fragment length polymorphisms (RFLPs) are sequence variants that create or destroy a restriction site in DNA therefore altering the length of the restriction fragment that is detected. The top panel shows two alternative alleles, in which the restriction fragment detected by a specific probe differs in length due to the presence or absence of the middle of three restriction sites (represented by vertical arrows). Alleles a and b therefore produce hybridizing bands of different sizes in Southern blots (lower panel). This allows the alleles to be traced through a family pedigree. For example child II.2 has inherited two copies of allele a, one from each parent, while child II.4 has inherited one copy of allele a and one copy of allele b.
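The logic of an RFLP can be sketched in a few lines of Python. The example assumes that the positions of the restriction sites and of the probe are already known, which is of course not the case in a real experiment, where only the band sizes are observed. The coordinates and names are hypothetical.

```python
from bisect import bisect_right


def detected_fragment_length(sites, probe_pos):
    """Length of the restriction fragment that contains the probe position."""
    sites = sorted(sites)
    i = bisect_right(sites, probe_pos)
    if i == 0 or i == len(sites):
        raise ValueError("probe is not flanked by restriction sites on both sides")
    return sites[i] - sites[i - 1]


# Hypothetical coordinates (bp) of restriction sites on two alleles.
with_middle_site = [1000, 4000, 9000]
without_middle_site = [1000, 9000]
probe = 2000  # the probe hybridizes between the first and middle sites

print(detected_fragment_length(with_middle_site, probe))     # 3000
print(detected_fragment_length(without_middle_site, probe))  # 8000
```

Loss of the middle site fuses two adjacent fragments, so the same probe detects a longer band. This difference in band size is what distinguishes the two alleles on the blot.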

Unlike genetic maps, physical maps are based on real units of DNA and therefore provide a basis for sequence assembly. The physical mapping phase of the HGP involved the creation of genomic DNA libraries and the identification and assembly of overlapping clones to form contigs (unbroken series of clones representing contiguous segments of the genome). When the HGP was initiated, the highest-capacity vectors available for cloning were cosmids, with a maximum insert size of 40 kb. Because hundreds of thousands of cosmid clones would have to be screened to assemble a physical map, the HGP would not have progressed very quickly without the development of novel high-capacity vectors and methods to find overlaps between them so that clone contigs could be assembled on the genomic scaffold.
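The idea of a contig can be illustrated with a small Python sketch that merges overlapping clones into contiguous segments. It assumes that each clone's start and end coordinates are known. In practice this is the hard part: overlaps have to be inferred experimentally, so the sketch shows only the final assembly logic. The clone coordinates and the five-fold redundancy figure are hypothetical.

```python
def build_contigs(clones):
    """Merge overlapping (start, end) clones into contigs."""
    contigs = []
    for start, end in sorted(clones):
        if contigs and start <= contigs[-1][1]:
            # This clone overlaps the current contig, so extend the contig.
            contigs[-1][1] = max(contigs[-1][1], end)
        else:
            # There is a gap, so this clone starts a new contig.
            contigs.append([start, end])
    return [tuple(contig) for contig in contigs]


# Hypothetical clone coordinates in kb.
clones = [(150, 190), (0, 40), (30, 70), (65, 100), (180, 220)]
print(build_contigs(clones))  # [(0, 100), (150, 220)]

# Rough scale of the task with 40-kb cosmids and a 3000-Mb genome.
GENOME_KB = 3_000_000
COSMID_INSERT_KB = 40
ASSUMED_REDUNDANCY = 5  # hypothetical coverage needed to find overlaps
min_cosmids = GENOME_KB // COSMID_INSERT_KB
print(min_cosmids, min_cosmids * ASSUMED_REDUNDANCY)  # 75000 375000
```

The gap between 100 kb and 150 kb leaves two separate contigs. Closing such gaps requires additional clones that span them.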

Fig. 1.3 Microsatellites are sequence variants that cause restriction fragments or PCR products to differ in length due to the number of copies of a short tandem repeat sequence, 1–12 nt in length. The top panel shows four alternative alleles, in which the restriction fragment detected by a specific probe differs in length due to a variable number of tandem repeats. All four alleles produce bands of different sizes on Southern blots (lower panel) or different sized PCR products (not shown). Unlike RFLPs, multiple allelism is common for microsatellites so the precise inheritance pattern in a family pedigree can be tracked. For example, the mother and father in the pedigree have alleles b/d and a/c, respectively (the smaller DNA fragments move further during electrophoresis). The first child, II.1, has inherited allele b from his mother and allele a from his father.
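The relationship between repeat number and fragment length can be shown with a short Python sketch. The repeat unit, flanking sequences, and allele names below are hypothetical and serve only to illustrate that alleles differing in repeat number give products of different lengths.

```python
import re


def longest_tandem_run(seq, unit):
    """Number of copies in the longest uninterrupted tandem run of a repeat unit."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)


# Hypothetical alleles: identical flanking sequences, different numbers of CA repeats.
LEFT_FLANK = "GGATT"
RIGHT_FLANK = "TTGAG"
alleles = {
    name: LEFT_FLANK + "CA" * copies + RIGHT_FLANK
    for name, copies in {"a": 5, "b": 8, "c": 11, "d": 14}.items()
}

for name, seq in alleles.items():
    print(name, longest_tandem_run(seq, "CA"), len(seq))
```

Because each allele carries a different number of repeats, each gives a product of a distinct length. This is why a single microsatellite locus can distinguish all four parental chromosomes in a pedigree.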

The genomics era began in earnest in 1995 with the complete sequencing of a bacterial genome

The late 1980s and early 1990s saw much debate about the desirability of sequencing the human genome. This debate often strayed from rational scientific debate into the realms of politics, personalities, and egos. Among the genuine issues raised were questions such as:

Is the sequencing of the human genome an intellectually appropriate project for biologists?

Is sequencing the human genome feasible?

What benefits might arise from the project?

Will these benefits justify the cost and are there alternative ways of achieving the same benefits?

Will the project compete with other areas of biology for funding and intellectual resources?

Behind the debate was a fear that sequencing the human genome was an end in itself, much like a mountaineer who climbs a new peak just because it is there.

The publicly funded Human Genome Project was officially launched in 1990, and the scientific community began to develop new strategies to enable the large-scale mapping and sequencing that were required to complete the project, strategies which centered around high-throughput, highly parallel automated sequencing. One of the benefits of this new technology development was the completion of several pilot genome projects, beginning with that of the bacterium Haemophilus influenzae (Fleischmann et al. 1995). The net effect was that by the time the human genome had been sequenced (International Human Genome Sequencing Consortium 2001, Venter et al. 2001), the complete sequence was already known for over 30 bacterial genomes plus that of a yeast (Saccharomyces cerevisiae), the fruit fly, a nematode (Caenorhabditis elegans), and a plant (Arabidopsis thaliana).

Parallel developments in the field of bioinformatics were required to handle and analyze the exponentially increasing amounts of sequence data arising from the genome projects, but bioinformatics also facilitated the development of new sequencing strategies. For example, when a European consortium set itself the goal of sequencing the entire genome of the budding yeast S. cerevisiae (15 Mb), they segmented the task by allocating the sequencing of each chromosome to different groups. That is, they subdivided the genome into more manageable parts. At the time this project was initiated there was no other way of achieving the objective and when the resulting genomic sequence was published (Goffeau et al. 1996), it was the result of a unique multi-institution collaboration. While the S. cerevisiae sequencing project was underway, a new genomic sequencing strategy was unveiled: shotgun sequencing. In this approach, large numbers of genomic fragments are sequenced and sophisticated bioinformatics algorithms used to construct the finished sequence. In contrast to the consortium approach used with S. cerevisiae, a single laboratory set up as a sequencing factory undertook shotgun sequencing.
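The core idea of shotgun assembly, reconstructing a sequence from overlapping fragments, can be illustrated with a toy greedy assembler in Python. This is a deliberately simplified sketch and not the algorithm used in any actual genome project. It assumes error-free reads from a single strand and a sequence without repeats, and it repeatedly merges the pair of reads with the longest suffix-prefix overlap. The reads and function names are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that matches a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two reads that share the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no remaining overlaps: the reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical reads sampled from the sequence ATGGCGTACGTTAGC.
reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGT"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
```

Real genomes contain repeated sequences and reads contain errors, both of which can mislead a greedy strategy. This is one reason sophisticated assembly algorithms were needed.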

The first success with shotgun sequencing was the complete sequence of the bacterium H. influenzae (Fleischmann et al. 1995) and this was quickly followed with the sequences of Mycoplasma genitalium (Fraser et al. 1995), Mycoplasma pneumoniae (Himmelreich et al. 1996) and Methanococcus jannaschii (Bult et al. 1996). It should be noted that H. influenzae was selected for sequencing because so little was known about it: there was no genetic map and not much biochemical data either. By contrast, S. cerevisiae was a well-mapped and well-characterized organism. As will be seen in Chapter 17, the relative merits of shotgun sequencing vs. ordered, map-based sequencing are still being debated today. Nevertheless, the fact that a major sequencing laboratory can turn out the entire sequence of a bacterium in 1–2 months shows the power of shotgun sequencing.

Genome sequencing greatly increases our understanding of basic biology

Fears that sequencing the human genome would be an end in itself have proved groundless. Because so many different genomes have been sequenced it is now possible to undertake comparative analyses of genomes, a topic known as comparative genomics. By comparing genomes from distantly related species we can begin to decipher the major stages in evolution. By comparing more closely related species we can begin to uncover more recent events such as genome rearrangement which have facilitated speciation (see e.g. Murphy et al. 2004). Currently, the most fertile area of comparative genomics is the analysis of bacterial genomes because so many have been sequenced. Already this analysis is throwing up some interesting questions. For example, over 25% of the genes in any one bacterial genome have no equivalents in any other sequenced genome. Is this an artifact resulting from limited sequence data or does it reflect the unique evolutionary events that have shaped the genomes of these organisms? Similarly, comparative analysis of the genomes of a wide range of thermophiles has revealed numerous interesting features, including strong evidence of extensive horizontal gene transfer. However, what is the genomic basis for thermophily? We still do not know.

One of the fascinating aspects of the classic paper by Fleischmann et al. (1995) was their analysis of the metabolic capabilities of H. influenzae, which they deduced from sequence information alone. This analysis has been extended to every other sequenced genome and is providing tremendous insight into the physiology and ecological adaptability of different organisms. For example, obligate parasitism in bacteria is linked to the absence of genes for certain enzymes involved in central metabolic pathways. Another example is the correlation between genome size and the diversity of ecological niches that can be colonized. The larger the bacterial genome, the greater are the metabolic capabilities of the host organism and this means that the organism can be found in a greater number of habitats.

Another benefit of genome mapping and sequencing that deserves mention is the proliferation of international scientific collaborations. In magnitude, the goal of sequencing the human genome was equivalent to putting a man on the moon. However, putting a man on the moon was a race between two nations and was driven by global political ambitions as much as by scientific challenge. By contrast, genome sequencing truly has been an international effort requiring laboratories in Europe, North America, and Japan to collaborate in a way never seen before. The extent of this collaboration can be seen by looking at the affiliations of the authors on many of the classic genome papers (e.g. The Arabidopsis Genome Initiative 2000, International Human Genome Sequencing Consortium 2001). The fact that one US company, Celera Genomics Inc., has successfully undertaken many sequencing projects in no way diminishes this collaborative effort. Rather, they have constantly challenged the accepted way of doing things and have increased the efficiency with which key tasks have been undertaken.

Three other aspects of genome sequencing and genomics deserve mention. First, in other branches of science such as nuclear physics and space exploration, the concept of “superfacilities” is well established. With the advent of whole genome sequencing, biology is moving into the superfacility league and a number of sequencing “factories” have been established. Secondly, high throughput methodologies have become commonplace and this has meant a partnering of biology with automation, instrumentation, and data management. Thirdly, many biologists have eschewed chemistry, physics, and mathematics but progress in genomics demands that biologists have a much greater understanding of these subjects. For example, methodologies such as mass spectrometry, X-ray crystallography, and protein structure modeling are now fundamental to the identification of gene function. The impact that this has on undergraduate recruitment in the sciences remains to be seen.

The post-genomics era aims at the complete characterization of cells at all levels

Knowing the complete genome sequence of any organism is very useful, but more important is finding the genes and determining their functions. One of the most surprising results from the early genome projects was the discovery of how little was known about even the best-characterized organisms. In the case of the bakers’ yeast (S. cerevisiae), which was considered a very well-characterized model species, only one-third of the genes identified in the sequencing project had been identified before. Over 4000 genes were discovered with no known function. Some of these could be assigned tentative functions on the basis of similarity to known genes either in the yeast or in other organisms, but this still left over 2000 genes whose function could only be established by direct experiments.

Following sequencing and annotation (gene finding) scientists then turned their attention to the functional characterization of newly identified genes. This has given rise to two new branches of biology, completely unheard of before 1995. These are transcriptomics (the large-scale study of mRNA expression) and proteomics (the large-scale study of proteins). While mRNA can yield useful information in terms of sequence, expression profile, and abundance, direct analysis of proteins is much more informative, since proteins can be analyzed not only in terms of sequence and abundance but also in terms of structure, post-translational modification, localization, and interactions with other molecules. No-one working in the 1970s, when recombinant DNA was a novel technology and protein analysis was laborious, could have imagined today’s large-scale experiments, where thousands of proteins can be separated on a high-resolution gel, digested into peptides, and identified rapidly by mass spectrometry. In the post-genomics era, it is becoming possible to carry out complete characterizations of cells, at the level of the genome, the transcriptome, the proteome, and now even the metabolome (the global profile of small-molecule metabolites in the cell).

Recombinant DNA technology and genomics form the foundation of the biotechnology industry

The early successes in overproducing mammalian proteins in E. coli suggested to a few entrepreneurial individuals that a new company should be formed to exploit the potential of recombinant DNA technology. Thus was Genentech Inc. born (Box 1.2). Since then, thousands of biotechnology companies have been formed worldwide. As soon as major new developments in the science of gene manipulation are reported, a rash of new companies is formed to commercialize the new technology. For example, many recently formed companies are hoping the data from the Human Genome Project will result in the identification of a large number of new proteins with potential for human therapy. Other companies have been founded to exploit novel technologies for recombinant protein expression or the applications of therapeutic nucleic acids.

Although there are thousands of biotechnology companies, fewer than 100 have sales of their products and even fewer are profitable. Already many biotechnology companies have failed, but the technology advances at such a rate that there is no shortage of new company start-ups to take their place. One group of biotechnology companies that has prospered is those supplying specialist reagents to laboratory workers engaged in gene manipulation, genomics, and proteomics. In the very beginning, researchers had to make their own restriction enzymes and this limited the technology to those with protein chemistry skills. Soon a number of companies were formed which catered to the needs of researchers by supplying high-quality enzymes for DNA manipulation. Despite the availability of these enzymes, many people had great difficulty in cloning DNA. The reason for this was the need for careful quality control of all the components used in the preparation of reagents, something researchers are not good at! The supply companies responded by making easy-to-use cloning kits in addition to enzymes. Today, these supply companies can provide almost everything that is needed to clone, express, and analyze DNA and have thereby accelerated the use of recombinant DNA technology in all biological disciplines. In the early days of recombinant DNA technology, the development of methodology was an end in itself for many academic researchers. This is no longer true. The researchers have gone back to using the tools to further our knowledge of biology, and the development of new methodologies has largely fallen to the supply companies.

Outline of the rest of the book

The remainder of this book is divided into four parts. Part I is devoted to the basic methodology for manipulating genes, and covers techniques for cloning and gene manipulation in E. coli as well as in vitro methods such as the PCR (Fig. 1.4). Basic techniques for gene and protein analysis are also described. Chapter 2 covers many of the techniques that are common to all cloning experiments and are fundamental to the success of the technology. Chapter 3 is devoted to methods for selectively cutting DNA molecules into fragments that can be readily joined together again. Without the ability to do this, there would be no recombinant DNA technology. If fragments of DNA are inserted into cells, they fail to replicate except in those rare cases where they integrate into the chromosome. To enable such fragments to be propagated, they are inserted into DNA molecules (vectors) that are capable of extrachromosomal replication. These vectors are derived from plasmids and bacteriophages and their basic properties are described in Chapter 4.

Box 1.2 The birth of an industry
Biotechnology is not new. Cheese, bread, and yogurt are products of biotechnology and have been known for centuries. However, the stock-market excitement about biotechnology stems from the potential of gene manipulation, which is the subject of this book. The birth of this modern version of biotechnology can be traced to the founding of the company Genentech.
In 1976, a 27-year-old venture capitalist called Robert Swanson had a discussion over a few beers with a University of California professor, Herb Boyer. The discussion centered on the commercial potential of gene manipulation. Swanson’s enthusiasm for the technology and his faith in it were contagious. By the close of the meeting the decision was taken to found Genentech (Genetic Engineering Technology). Although Swanson and Boyer faced skepticism from both the academic and business communities they forged ahead with their idea. Successes came thick and fast (see Table B1.1) and within a few years they had proved their detractors wrong. Over 1000 biotechnology companies have been set up in the USA alone since the founding of Genentech but very, very few have been as successful.

Table B1.1 Key events at Genentech.

1976  Genentech founded
1977  Genentech produced first human protein (somatostatin) in a microorganism
1978  Human insulin cloned by Genentech scientists
1979  Human growth hormone cloned by Genentech scientists
1980  Genentech went public, raising $35 million
1982  First recombinant DNA drug (human insulin) marketed (Genentech product licensed to Eli Lilly & Co.)
1984  First laboratory production of factor VIII for therapy of hemophilia. License granted to Cutter Biological
1985  Genentech launched its first product, Protropin (human growth hormone), for growth hormone deficiency in children
1987  Genentech launched Activase (tissue plasminogen activator) for dissolving blood clots in heart-attack patients
1990  Genentech launched Actimmune (interferon-γ1b) for treatment of chronic granulomatous disease
1990  Genentech and the Swiss pharmaceutical company Roche complete a $2.1 billion merger

Originally, the purpose of vectors was the propagation of cloned DNA but today vectors fulfil many other roles, such as facilitating DNA sequencing, promoting expression of cloned genes, facilitating purification of cloned gene products, and reporting the activity and localization of proteins. The specialist vectors for these tasks are described in Chapter 5. With this background in place it is possible to describe in detail how to clone the particular DNA sequences that one wants. There are two basic strategies. Either one clones all the DNA from an organism and then selects the very small number of clones of interest or one amplifies the DNA sequences of interest and then clones these. Both these strategies are described in Chapter 6, which focuses on methods for cloning individual genes. Once the DNA of interest has been cloned, it can be sequenced and this will yield information on the proteins that are encoded and any regulatory signals that are present (Chapter 7). There might also be a wish to modify the DNA and/or protein sequence and determine the biological effects of such changes. The techniques for sequencing and changing cloned genes and the properties of the encoded protein are described in Chapter 8. Finally, Chapter 9 provides an overview of bioinformatics, the essential computer-based methods for the analysis of genes and their products.

Fig. 1.4 Roadmap outlining the first section of the book, which covers basic techniques in gene manipulation and their relationships.

Part II of the book describes the specialist techniques for cloning in organisms other than E. coli (Fig. 1.5). Each of these chapters can be read in isolation from the other chapters in this section provided that there is a thorough understanding of the material from the first part of the book. Chapter 10 details the methods for cloning in other bacteria. Originally it was thought that some of these bacteria, e.g. B. subtilis, would usurp the position of E. coli. This has not happened and gene manipulation techniques are used simply to better understand the biology of these bacteria. Chapter 11 focuses on cloning in fungi, although the emphasis is on the yeast S. cerevisiae. Fungi are eukaryotes and are useful model systems for investigating topics such as meiosis, mitosis, and the control of cell division. Animal cells can be cultured like microorganisms and the techniques for introducing genes into them are described in Chapter 12. Chapters 13 and 14 describe basic procedures for the introduction of genes into animals and plants, respectively, while Chapter 15 covers some of the more cutting-edge techniques for these same systems.

Part III of the book moves from gene manipulation to genomics (Fig. 1.6). Chapter 16 introduces the topic of genomics by providing a biological survey of genomes. The genomes of free-living cellular organisms range in size from less than 1 Mb for some bacteria to millions, or tens of millions, of megabases for some plants. The sheer size of the genome of even a simple bacterium is such that to handle it in the laboratory we need to break it down into smaller pieces that are propagated as clones. As stated above, one way to approach this problem is to create a genome map, which can then be populated with physical landmarks onto which the smaller DNA fragments can be assembled. Another approach is to dispense with the map and break the entire genome into pieces, sequence them, and reassemble them. The methods for mapping genomes and assembling physical clone maps are discussed in Chapter 17.

Fig. 1.5 Roadmap outlining the second section of the book, which covers advanced techniques in gene manipulation and their application to organisms other than E. coli.

Fig. 1.6 Roadmap covering the early chapters of Part III, which discuss different methodologies for mapping and sequencing genomes.

Sequencing a genome is not an end in itself. Rather, it is just the first stage in a long journey whose goal is a detailed understanding of all the biological functions encoded in that genome and their evolution. To achieve this goal it is necessary to define all the genes in the genome and the functions that they encode. There are a number of different ways of doing this, one of which is comparative genomics (Chapter 18). The premise here is that DNA sequences encoding important cellular functions are likely to be conserved whereas dispensable or non-coding sequences will not. However, comparative genomics only gives a broad overview of the capabilities of different organisms. For a more detailed view one needs to identify each gene in the genome and determine its function. Over the last few years, technology developments in this new discipline of functional genomics have been nothing short of breathtaking. The final six chapters in this section look at ways in which large-scale functional analysis can be carried out (Fig. 1.7).