Genomic Approaches in Earth and Environmental Sciences - Gregory Dick - E-Book

Description

The first comprehensive synthesis of genomic techniques in earth sciences

The past 15 years have witnessed an explosion of DNA sequencing technologies that provide unprecedented insights into biology. Although this technological revolution has been driven by the biomedical sciences, it also offers extraordinary opportunities in the earth and environmental sciences. In particular, the application of "omics" methods (genomics, transcriptomics, proteomics) directly to environmental samples offers exciting new vistas of complex microbial communities and their roles in environmental and geochemical processes. This unique book fills a gap in the resources and infrastructure available to educate and train geoscientists about the opportunities, approaches, and analytical methods for applying omic technologies to problems in the geosciences. 

Genomic Approaches in Earth and Environmental Sciences begins by covering the role of microorganisms in earth and environmental processes. It then discusses how omics approaches provide new windows into geobiological processes. It delves into the DNA sequencing revolution and the impact that genomics has made on the geosciences. The book then discusses the methods used in the field, beginning with an overview of current technologies. After that it offers in-depth coverage of single cell genomics, metagenomics, metatranscriptomics, metaproteomics, and functional approaches, before concluding with an outlook on the future of the field. 

  • The very first synthesis of an important new family of techniques
  • Shows strengths and limitations (both practical and theoretical) of the techniques
  • Deals with both theoretical and laboratory basics
  • Shows use of techniques in a variety of applications, including various aspects of environmental science, geobiology, and evolution

Genomic Approaches in Earth and Environmental Sciences is a welcome addition to the library of all earth and environmental scientists and students working within a wide range of subdisciplines.


Number of pages: 328

Year of publication: 2018




Table of Contents

Cover

Title page

Copyright

Preface

Acknowledgments

Abbreviations

Chapter 1: Introduction

1.1 Exploring the Microbial World

1.2 The DNA Sequencing Revolution: Historical Perspectives

References

Chapter 2: The Architecture of Microbial Genomes

Introduction

2.1 Genome Size, Organization, and Replication

2.2 Nucleotide Composition

2.3 Ecological and Evolutionary Aspects of Microbial Genomes

2.4 Genomic Diversity in Microbial Communities

2.5 Does Genomic Diversity Matter?

References

Chapter 3: Application of Omics Approaches to Earth and Environmental Sciences: Opportunities and Challenges

Introduction

3.1 New Perspectives on Microbial Biogeochemistry

3.2 A Genomic Record of Biological and Geochemical Evolution

3.3 Challenges and Limitations of Omics Approaches

References

Chapter 4: Overview of Approaches: From Whole‐Community Shotgun Sequencing to Single‐Cell Genomics

Introduction

4.1 Choosing the Right Approach

4.2 Experimental Design and Sampling Considerations

4.3 Overview of Current DNA Sequencing Technologies

4.4 Quality Control and Sequence Processing

References

Chapter 5: Genomics of Single Species and Single Cells

Introduction

5.1 Algorithms for Genome Assembly

5.2 Challenges of Genome Assembly

5.3 Scaffolding

5.4 Programs and Pipelines for Genome Assembly

5.5 Evaluation of Genome Assemblies

5.6 Single‐Cell Genomics

References

Chapter 6: Metagenomics: Assembly and Database‐Dependent Approaches

Introduction

6.1 To Assemble or Not To Assemble?

6.2 Database‐Dependent Approaches

6.3 Database‐Independent Approaches: De Novo Assembly

6.4 Evaluation of Metagenomic Assemblies

6.5 A Philosophy of Metagenome Assemblies

References

Chapter 7: Metagenomic Binning

Introduction

7.1 Genomic Signatures of Nucleotide Composition

7.2 Binning Programs

7.3 Additional Signal and Steps for Binning: Coverage, Taxonomic Data, and Mini‐Assemblies

7.4 Identifying, Evaluating, and Assessing the Completeness of Genomic Bins

References

Chapter 8: Annotation: Gene Calling, Taxonomy, and Function

Introduction

8.1 Gene Calling

8.2 Determining Taxonomic Composition

8.3 Functional Annotation

References

Chapter 9: Metatranscriptomics

Introduction

9.1 Sample Collection

9.2 RNA Extraction and Preparation of cDNA Libraries

9.3 Assigning Transcripts to Genes or Other Features

9.4 De Novo Assembly

9.5 Absolute Versus Relative Abundance and Normalization

9.6 Detecting Differential Expression

References

Chapter 10: Metaproteomics

Introduction

10.1 Methodologies for Basic Proteomics

10.2 The Importance of Genomic Databases for Interpreting Proteomics Data

10.3 Quantitative Proteomics

10.4 Combining Stable Isotope Probing with Proteomics to Track Microbial Metabolism

References

Chapter 11: Lipidomics and Metabolomics

Introduction

11.1 Lipidomics

11.2 Metabolomics

References

Chapter 12: Downstream and Integrative Approaches and Future Outlook

Introduction

12.1 Comparative Omics

12.2 Statistical Approaches

12.3 Visualization

12.4 Cyberinfrastructure for Environmental Omics

12.5 Data and Sample Archival

12.6 Modeling

12.7 Emerging Trends and Future Outlook

References

Index

End User License Agreement

List of Tables

Chapter 2: The Architecture of Microbial Genomes

Table 2.1 Ecological and evolutionary aspects of genome size.

Chapter 4: Overview of Approaches: From Whole‐Community Shotgun Sequencing to Single‐Cell Genomics

Table 4.1 DNA sequencing technologies.*

Chapter 5: Genomics of Single Species and Single Cells

Table 5.1 Methods for de novo genome assembly. Note that assemblers optimized for metagenomes are shown in Table 6.1.

Table 5.2 Designating the stage of genome finishing. Note that Parks et al. (2015) have proposed simplifying this scheme to finished, noncontiguous finished, and draft, with the quality of draft genomes being further quantitatively described by their completeness and contamination.

Chapter 6: Metagenomics: Assembly and Database‐Dependent Approaches

Table 6.1 Selected sequence assemblers for metagenomic data. Note that assemblers not optimized for metagenomic data are shown in Table 5.1.

Chapter 7: Metagenomic Binning

Table 7.1 Selected methods for metagenomic binning.

Chapter 9: Metatranscriptomics

Table 9.1 Goals and strategies for normalizing metatranscriptomic data.

List of Illustrations

Chapter 1: Introduction

Figure 1.1 Generalized structure of a bacterial or archaeal cell. Inset details translation and protein synthesis.

Figure 1.2 Macromolecules that serve as the basis for the three main omics approaches.

Figure 1.3 Major milestones in microbial community omics (top) and the decreasing cost and increasing throughput of DNA sequencing (bottom).

Chapter 2: The Architecture of Microbial Genomes

Figure 2.1 Genome organization in bacteria and archaea. Genomes typically consist of one circular chromosome (shown here), though multiple chromosomes and/or plasmids are possible. Each circle represents one of the two strands of DNA. Arrows represent genes and blocks of arrows are operons. See text for more details.

Chapter 4: Overview of Approaches: From Whole‐Community Shotgun Sequencing to Single‐Cell Genomics

Figure 4.1 Overview of approaches and procedures for applying omics to the Earth and environmental sciences. Sampling is conducted from various aquatic and terrestrial environments. Omics studies can be performed on the whole microbial community (far right) or on specific portions of the community that are targeted, for example, by (1) single‐cell approaches, (2) enrichment of populations by techniques such as flow cytometry, or (3) isolation of pure or mixed cultures.

Figure 4.2 Pipeline for quality control of next‐generation sequence data.

Figure 4.3 (a) The FASTQ format; (b) screenshot of a FASTQ file.
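A FASTQ file (Figure 4.3) stores each read as four lines: a header beginning with @, the base calls, a separator line beginning with +, and one quality character per base. The Python sketch below reads such records and decodes the qualities, assuming the Phred+33 encoding used by current Illumina output and unwrapped four-line records. A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q = 20 means a 1% chance that the base call is wrong. The 3′ trimming rule and its threshold of 20 are illustrative choices, not the procedure of any particular quality-control tool.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly four lines per record:
    '@' + read name, sequence, '+' separator, quality string."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality strings differ in length")
        yield FastqRecord(header[1:].rstrip("\r\n"), seq, qual)


def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode quality characters to Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]


def error_probability(q: int) -> float:
    """Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def trim_3prime(record: FastqRecord, min_q: int = 20) -> FastqRecord:
    """Drop bases from the 3' end while their quality is below min_q."""
    scores = phred_scores(record.qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return FastqRecord(record.name, record.seq[:end], record.qual[:end])


if __name__ == "__main__":
    demo = io.StringIO("@read1\nACGT\n+\nII5#\n")
    for rec in read_fastq(demo):
        print(rec.name, phred_scores(rec.qual), trim_3prime(rec).seq)
```

For the demonstration record, the quality string II5# decodes to scores 40, 40, 20, and 2. Only the final base falls below the threshold, so trimming leaves ACG.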

Chapter 5: Genomics of Single Species and Single Cells

Figure 5.1 Schematic overview of genome sequencing. (Top) A genome is randomly fragmented and sequenced, often as paired‐end or mate‐pair reads; the two reads of a pair are physically linked, and this information is useful for subsequent assembly. Solid arrows represent sequence reads; dotted lines represent unsequenced regions of DNA fragments. (Bottom) Schematic representation of assembled contigs and scaffolds, showing overlapping sequences from contigs and paired ends (large bold arrows), which are used to link contigs into scaffolds. Coverage, indicated above, quantifies the number of reads at each position within a contig or scaffold.

Figure 5.2 Challenges of genome assembly due to repeated sequence regions. (a) Schematic of a genome showing an identical repeat sequence region that is present in the genome in four copies (red). Unique sequence regions used further below are shown in other colors. For simplicity, the gray portion of the genome is not depicted in subsequent panels. (b) The genome is randomly fragmented for library preparation and sequencing. (c) Schematic of short reads (e.g., from Illumina), shown as dotted arrows in relation to genome fragments from (b). Key reads depicted below in (e) are shown in black and numbered. (d) Schematic of long reads (e.g., from PacBio), shown as dotted arrows in relation to genome fragments from (b). (e) Repeats sequenced with short‐read technology can result in disagreement between sequence reads that came from the repeat region at different genomic loci, leading to fragmentation of contigs. (f) Repeats sequenced with short‐read technology can also result in chimeric assemblies in which genomic loci are erroneously brought together (yellow and brown in this case). (g) Long reads can resolve such repeat regions and thus are invaluable for producing accurate assemblies. Note that the various elements are not to scale.

Figure 5.3 Identification of genome assembly problems using intrinsic features. (a) Schematic of a contig showing coverage and underlying paired‐end reads as in Figure 5.1. a. A minimum in coverage can indicate a weak join. b. A peak in coverage well above the genome average may reflect erroneous assembly of a repeat region. c. Paired‐end reads (black) that are improperly oriented indicate misassembly. d. Paired‐end reads that are too far apart relative to the expected insert size. (b) Screenshot of read mapping (top) and average coverage (bottom). e. A weak join and likely chimera indicated by a coverage minimum and mismatched sequences between overlapping sequence reads.
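The intrinsic checks in Figure 5.3 can be computed directly from read alignments. The Python sketch below makes three assumptions: alignments have already been reduced to 0-based, half-open (start, end) intervals on one contig; the library uses forward-reverse paired-end orientation; and the coverage fractions and insert-size bounds are hypothetical parameters. In practice the insert-size bounds would be set from the data, for example from the observed insert-size distribution.

```python
from dataclasses import dataclass


def per_base_coverage(contig_length: int, reads: list[tuple[int, int]]) -> list[int]:
    """Read depth at each contig position, from 0-based half-open (start, end) alignments.

    A difference array (+1 at each read start, -1 at each read end) followed by a
    running sum gives all depths in one pass over the reads and one over the contig.
    """
    diff = [0] * (contig_length + 1)
    for start, end in reads:
        if not 0 <= start < end <= contig_length:
            raise ValueError(f"read ({start}, {end}) lies outside the contig")
        diff[start] += 1
        diff[end] -= 1
    depth, running = [], 0
    for pos in range(contig_length):
        running += diff[pos]
        depth.append(running)
    return depth


def flag_coverage_anomalies(depth: list[int], low_frac: float = 0.2,
                            high_frac: float = 3.0) -> tuple[list[int], list[int]]:
    """Positions far below the mean (possible weak join) or far above it
    (possible collapsed repeat). The two fractions are hypothetical defaults."""
    mean = sum(depth) / len(depth)
    low = [i for i, d in enumerate(depth) if d < low_frac * mean]
    high = [i for i, d in enumerate(depth) if d > high_frac * mean]
    return low, high


@dataclass
class ReadPair:
    left_start: int    # 0-based start of the leftmost mate on the contig
    left_strand: str   # "+" or "-"
    right_end: int     # 0-based, exclusive end of the rightmost mate
    right_strand: str  # "+" or "-"


def classify_pair(pair: ReadPair, min_insert: int, max_insert: int) -> str:
    """Classify a pair mapped to one contig, assuming a forward-reverse library."""
    if (pair.left_strand, pair.right_strand) != ("+", "-"):
        return "improper_orientation"          # Figure 5.3, item c
    insert = pair.right_end - pair.left_start  # fragment length spanned by the pair
    if not min_insert <= insert <= max_insert:
        return "unexpected_insert_size"        # Figure 5.3, item d
    return "proper"
```

Summed over the contig, the depths equal the total number of aligned bases. The mean depth is therefore the familiar average coverage: total aligned read length divided by contig length.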

Chapter 6: Metagenomics: Assembly and Database‐Dependent Approaches

Figure 6.1 An overview of approaches for analyzing metagenomic data.

Figure 6.2 A schematic example of fragment recruitment. Reads from a metagenome (or transcriptome) are plotted against a reference genome according to their percent identity. It is often the case that one population in the environment will have high similarity to the reference genome and several others will be more divergent, with a natural “gap” in sequence space in between.

Figure 6.3 Pitfalls of metagenomic assembly. (a) Schematic of microbial cells of three different species and genomes. Here we focus on the red species with circular cells, in which the thicker blocks of the genome represent identical sequence repeats between different strains. (b) The genomes are randomly fragmented for library preparation and sequencing. (c) Schematic of short‐read sequencing technology (e.g., Illumina). Sequence reads are shown as dotted arrows in relation to genome fragments from (b). Key reads depicted below in (e) are shown in black and numbered. (d) Schematic of long‐read sequencing technology (e.g., PacBio). Sequence reads are shown as dotted arrows in relation to genome fragments from (b). (e) Repeats sequenced with short‐read technology can result in disagreement between sequence reads that came from the repeat region at different genomic loci, leading to fragmentation of contigs. (f) Repeats sequenced with short‐read technology can also result in chimeric assemblies in which genomic loci are erroneously brought together (yellow and brown in this case). (g) Long reads can resolve such repeat regions and thus are invaluable for producing accurate assemblies. Note that the various elements are not to scale.
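Figures 5.2 and 6.3 both turn on one property. A read shorter than a repeat can align equally well to every copy, whereas a read that spans the repeat plus unique flanking sequence has a single placement. The Python sketch below makes this concrete on a hypothetical toy sequence by finding the shortest length k at which every k-mer is unique. It assumes error-free reads taken from one strand, an idealization that real sequencing data do not satisfy.

```python
from collections import Counter


def repeated_kmers(genome: str, k: int) -> set[str]:
    """k-mers that occur more than once in the sequence (one strand only)."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    return {kmer for kmer, n in counts.items() if n > 1}


def min_unique_k(genome: str) -> int:
    """Shortest length at which every k-mer, and hence every error-free read
    of that length or longer, has a single location in the sequence."""
    for k in range(1, len(genome) + 1):
        if not repeated_kmers(genome, k):
            return k
    return len(genome)  # not reached for non-empty input


def placements(genome: str, read: str) -> list[int]:
    """All 0-based start positions where the read matches exactly."""
    hits, pos = [], genome.find(read)
    while pos != -1:
        hits.append(pos)
        pos = genome.find(read, pos + 1)
    return hits


if __name__ == "__main__":
    toy = "TTGATTACAGGCGATTACACC"  # hypothetical sequence; repeat GATTACA at 2 and 12
    print(min_unique_k(toy))
    print(placements(toy, "GATTACA"))
    print(placements(toy, "TGATTACAG"))
```

In the toy sequence, the 7-base repeat GATTACA occurs twice, so a 7-base read drawn from it has two placements (positions 2 and 12). Every 8-mer is unique, so min_unique_k returns 8, and the 9-base read TGATTACAG, which includes unique flanking bases, maps only to position 1.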

Chapter 7: Metagenomic Binning

Figure 7.1 Schematic illustration of genome signatures of two different microorganisms. For tetranucleotides, a four‐base window is slid one base at a time and the frequency of each four‐letter oligonucleotide is counted. The resulting frequency pattern (histograms at bottom) is conserved across the whole genome and distinct between genomes, provided sufficient evolutionary divergence. Because metagenomic sequences may derive from either DNA strand, strand is disregarded by summing the counts of each pair of reverse‐complementary oligonucleotides. There are 256 possible tetranucleotides, but after reverse complements are summed, only 136 are unique.
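The counting procedure in Figure 7.1 can be sketched in Python as follows. Each tetranucleotide is represented by the lexicographically smaller of itself and its reverse complement, which yields the 136 canonical forms noted in the caption. Windows containing ambiguous bases are skipped. Euclidean distance appears here only as one simple way to compare two profiles; binning programs differ in how they normalize and compare signatures, so this illustrates the signature itself, not any specific binner.

```python
from itertools import product
from math import sqrt

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Represent a k-mer and its reverse complement by the lexicographically smaller one."""
    return min(kmer, reverse_complement(kmer))


# 256 tetranucleotides collapse to 136 canonical forms (16 are their own reverse complement).
CANONICAL_TETRAMERS = sorted({canonical("".join(p)) for p in product("ACGT", repeat=4)})


def tetranucleotide_frequencies(seq: str) -> dict[str, float]:
    """Slide a 4-base window one base at a time and return relative frequencies
    of canonical tetranucleotides. Windows with non-ACGT symbols are skipped."""
    seq = seq.upper()
    counts = dict.fromkeys(CANONICAL_TETRAMERS, 0)
    total = 0
    for i in range(len(seq) - 3):
        window = seq[i:i + 4]
        if set(window) <= set("ACGT"):
            counts[canonical(window)] += 1
            total += 1
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


def signature_distance(a: dict[str, float], b: dict[str, float]) -> float:
    """Euclidean distance between two frequency profiles (one simple choice)."""
    return sqrt(sum((a[k] - b[k]) ** 2 for k in CANONICAL_TETRAMERS))
```

For ACGTACGT, the five windows collapse to ACGT (twice), CGTA (CGTA and its reverse complement TACG), and the palindrome GTAC. The relative frequencies are therefore 0.4, 0.4, and 0.2.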

Chapter 8: Annotation: Gene Calling, Taxonomy, and Function

Figure 8.1 Schematic illustration of workflow for a metagenomic sequencing project with emphasis on materials used for taxonomic and functional annotation. Short thin lines represent individual sequencing reads, with different colours representing different populations, and thicker longer lines represent consensus sequences of contigs. Assembly of sequencing reads into contigs and genomic bins is preferable, but even where that is possible, a large portion of the dataset is often on short contigs or individual reads that do not assemble. Thus, functional and taxonomic annotation of individual reads is also desirable.
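Gene calling, the first annotation step in this workflow, can be illustrated with a naive open reading frame (ORF) scan. The Python sketch below searches all six reading frames for stretches that run from ATG to the next in-frame stop codon. The 100-codon minimum is a hypothetical default. Dedicated gene callers use additional statistical signals and handle alternative start codons and partial genes, all of which this sketch ignores.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int, int]]:
    """Naive six-frame ORF scan: an ORF runs from an ATG to the next in-frame stop.

    Returns (strand, frame, start, end) tuples with 0-based, half-open
    coordinates; for the "-" strand the coordinates refer to the reverse
    complement of seq. Lengths count codons including the stop. ORFs with
    no stop before the sequence end (partial genes, common on short contigs
    and reads) are not reported.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", seq.translate(_COMPLEMENT)[::-1])):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs
```

With a toy sequence and a deliberately small minimum, find_orfs("CCATGAAATTTTAACC", min_codons=2) reports one ORF on the forward strand in frame 2 (ATG AAA TTT TAA, coordinates 2 to 14). The reverse strand contains an ATG, but no stop follows it, so it is not reported.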

Chapter 9: Metatranscriptomics

Figure 9.1 Annotation of transcripts in metatranscriptomic sequencing projects. To match a transcript with its corresponding gene and to facilitate annotation, transcripts are often mapped to genes from reference genomes or from paired metagenomes. The disadvantage of this approach is that genes and associated contigs and genomes may be missing from the metagenome. Another option is to perform de novo assembly of the metatranscriptomic reads.

Chapter 10: Metaproteomics

Figure 10.1 An overview of a shotgun proteomic workflow. Proteins are symbolized by chains of circles, with each circle representing an amino acid with its one‐letter abbreviation. (a) Workflow from sample collection to mass spectrometry. (b) Bioinformatic generation of databases from genomic data. See text for details.

Chapter 12: Downstream and Integrative Approaches and Future Outlook

Figure 12.1 The multidimensional nature of microbial community omics data. Double arrows reflect the need to track linkages between forms of data across all dimensions. For example, such linkages are critical to study relationships between gene abundance/expression and geochemical conditions or process rates (stored as sample metadata).
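At its simplest, the linkage in Figure 12.1 between gene abundance and sample metadata means joining tables on a shared sample identifier before computing any statistics. The Python sketch below shows that join followed by a Pearson correlation. The sample IDs, the metadata variable name oxygen_uM, and all values are hypothetical, and correlation is only one of many statistical approaches that could be applied.

```python
from math import sqrt


def link_to_metadata(abundance: dict[str, float],
                     metadata: dict[str, dict[str, float]],
                     variable: str) -> tuple[list[str], list[float], list[float]]:
    """Pair each sample's gene abundance with one metadata variable.

    Only samples present in both tables are kept, in sorted ID order, so the
    two value lists stay aligned sample by sample.
    """
    shared = sorted(set(abundance) & set(metadata))
    x = [abundance[s] for s in shared]
    y = [metadata[s][variable] for s in shared]
    return shared, x, y


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical example: abundance of one gene and dissolved oxygen per sample.
    gene_abundance = {"S1": 1.0, "S2": 2.0, "S3": 3.0, "S4": 9.9}
    sample_metadata = {"S1": {"oxygen_uM": 10.0},
                       "S2": {"oxygen_uM": 20.0},
                       "S3": {"oxygen_uM": 30.0}}
    ids, x, y = link_to_metadata(gene_abundance, sample_metadata, "oxygen_uM")
    print(ids, pearson(x, y))  # S4 has no metadata and is dropped
```

Joining by identifier first, rather than relying on list order, keeps a sample without metadata (S4 here) from silently shifting every later value out of alignment.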


New Analytical Methods in Earth and Environmental Science Series

 

Introducing New Analytical Methods in Earth and Environmental Science, a new series providing accessible introductions to important new techniques, lab and field protocols, suggestions for data handling and interpretation, and useful case studies.

New Analytical Methods in Earth and Environmental Science represents an invaluable and trusted source of information for researchers, advanced students, and applied earth scientists wishing to familiarize themselves with emerging techniques in their field.

All titles in this series are available in a variety of full-color, searchable e-book formats.

See below for the full list of books from the series.

 

Digital Terrain Modelling

John P. Wilson

 

Structure from Motion in the Geosciences

Jonathan L. Carrivick, Mark W. Smith, Duncan J. Quincey

 

Ground-penetrating Radar for Geoarchaeology

Lawrence B. Conyers

 

Rock Magnetic Cyclostratigraphy

Kenneth P. Kodama, Linda A. Hinnov

 

Techniques for Virtual Palaeontology

Mark Sutton, Imran Rahman, Russell Garwood

Genomic Approaches in Earth and Environmental Sciences

 

Gregory Dick

University of Michigan, Michigan, USA

 

 

 

 

 

This edition first published 2019

© 2019 John Wiley & Sons

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Gregory Dick to be identified as the author of this work has been asserted in accordance with law.

Registered Office(s)

John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial Office

9600 Garsington Road, Oxford, OX4 2DQ, UK

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty

While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging-in-Publication Data has been applied for

9781118708248

Cover Design: Wiley

Cover Image: © cosmin4000/Gettyimages

Preface

In recent years we have witnessed an explosion of DNA sequencing technologies that provide unprecedented insights into biology. Although this technological revolution has been driven by the biomedical sciences, it also offers extraordinary opportunities in the Earth and environmental sciences. In particular, the application of “omics” methods (genomics, transcriptomics, proteomics) directly to environmental samples offers exciting new vistas of complex microbial communities and their roles in environmental and geochemical processes. However, there is currently a lack of resources and infrastructure to educate and train geoscientists about the opportunities, approaches, and analytical methods available in the application of omic technologies to problems in the geosciences. This book aims to begin to fill this gap. Due to the rapidly advancing nature of DNA sequencing technologies, this book will almost certainly be well out of date by the time of publication. Nevertheless, my hope is that the accompanying e‐book format will allow relatively frequent updates and will serve as a foundation and a gateway for students and other scientists to access this exciting field. I apologize in advance to the many researchers whose excellent work was inevitably not cited, due to either my own ignorance or constraints on space. I welcome suggestions for citations, additions, and corrections that can be incorporated into future editions.

Gregory Dick

Ann Arbor, Michigan, USA

August 2017

Acknowledgments

This book is the product of many interactions with numerous people over several years. It was developed from lecture notes for a graduate class that I teach at the University of Michigan, Earth 523, and thus benefited from numerous questions, comments, and input from students over the years. Several students, including Shilva Shrestha and Matthew Hoostal, provided detailed comments and edits directly, for which I am most grateful. Sunit Jain, a bioinformatician in my lab, assisted with and substantially contributed to Earth 523, and thus he indirectly contributed to the content of this book as well, particularly Chapter 4. Other current and former lab members including Brett Baker, Karthik Anantharaman, and Sharon Grim also provided valuable material and feedback. Vincent Denef provided insightful feedback and edits to Chapter 2, especially the section on the ecological and evolutionary aspects of microbial genomes. Mike Wilkins, Mary Ann Moran, Frank Stewart, Mak Saito, Jake Waldbauer, Ann Pearson, Murat Eren, and Titus Brown also provided thoughtful comments and suggested edits on individual chapters. Illustrations were drafted by Stephanie O'Neil, an undergraduate student at the time. Chapters 1 and 3 drew on early material from drafts of an article now published in Elements magazine (Dick and Lam 2015, Elements 11: 403–408) with permission from the Mineralogical Society of America. I am grateful for permission to reuse this material as well as figures from other papers as described herein.

I owe several people thanks for their patience. First, the publisher, Wiley, including Ian Francis, Delia Sandford, Ramya Raghaven, and Sonali Melwani, for their assistance and for tolerating the tardiness of this book. Finally, thanks to my wife, Jenna, and kids, Adeline and Ben, for their support and patience.

Abbreviations

BAC

bacterial artificial chromosome

DNA

deoxyribonucleic acid

dnGASP

de novo genome assembly project

GAGE

genome assembly gold standard evaluations

GMG

geomicrobiology and microbial geochemistry

IGV

Integrative Genomics Viewer

mRNA

messenger RNA

OLC

overlap‐layout‐consensus

ORF

open reading frame

OTU

operational taxonomic unit

PCR

polymerase chain reaction

RAM

random access memory

RNA

ribonucleic acid

rrn

ribosomal RNA

Chapter 1: Introduction

1.1 Exploring the Microbial World

Microorganisms shaped the geochemical evolution of our planet throughout its history, and they continue to play a key role in the modern world. In deep time they oxygenated Earth’s atmosphere and set the stage for life as we know it. Today, microbes mediate global biogeochemical cycles, influence the speciation and fate of pollutants, and modulate climate change through the production and consumption of greenhouse gases. The field of geomicrobiology and microbial geochemistry (GMG), which studies the interplay between microbes and the Earth system, has roots in the 19th century (Druschel & Kappler 2015; Druschel et al. 2014). However, only recently have the breadth of geomicrobiological processes and the extent to which they shape geological, geochemical, and environmental processes become clear. Many methods and concepts central to GMG, including the omics approaches that are the focus of this book, are also relevant to environmental engineering (e.g., drinking water and wastewater treatment) and medicine (e.g., the human microbiome).

How can we study this microbial world? Inherent challenges abound. Microorganisms are small, and their cellular morphology is typically not informative of their phylogeny, physiology, or role in biogeochemical or ecological processes. Microbes often live in highly diverse communities in which it is hard to decipher the activities of different microorganisms or to trace specific microbial processes. Traditional microbiological approaches revolve around the cultivation of bacteria and archaea, which enables powerful laboratory‐based methods of dissecting microbial physiology, biochemistry, and genetics as they relate to geochemical processes (Newman et al. 2012). Yet most microorganisms in nature resist cultivation owing to symbiotic lifestyles or unknown nutritional requirements (Staley & Konopka 1985). Further, it can be impractical to grow pure cultures because many microorganisms grow extremely slowly; their state in the environment is perhaps more akin to stationary phase than to that of growing cultures (Roy et al. 2012). Comprehensive culturing is also impractical because of the stunning complexity of natural microbial communities, which can contain thousands of species. Finally, results from pure cultures may not be representative of in situ processes (Madsen 2005).

Traditional geochemical methods, such as measuring process rates and products or applying biological poisons and inhibitors of specific microbial enzymes, offer critical quantitative data and some mechanistic insights (Madsen 2005; Oremland et al. 2005). However, these approaches provide little information about the identity or nature of the microorganisms that underpin processes of interest. Exciting advances in microscopy and spectroscopy that provide opportunities to link microorganisms to biogeochemical processes are reviewed elsewhere (Behrens et al. 2012; Newman et al. 2012; Wagner 2009).

Recent advances in DNA sequencing technologies open up entirely new avenues for studying geomicrobiology by circumventing the cultivation step and providing extensive information on microorganisms as they exist in natural settings. These data come from the sequences of the macromolecules (Box 1.1) that constitute microbial cells (Fig. 1.1). This book focuses on DNA, RNA, and protein, and also touches on lipids and the pool of small molecules within a cell (metabolites). The collection of genes that encodes an organism is known as its genome. Genes are transcribed into messenger RNAs, or transcripts, the total pool of which is called the transcriptome. Transcripts are then translated into proteins, which perform the structural and biochemical functions of the cell. The total protein content of a cell is known as the proteome. The total content of small molecules within a cell is referred to as the metabolome; these small molecules include metabolites, the substrates, intermediates, and products of enzyme‐catalyzed biochemical reactions. The study of the whole collection of each of these molecules in a pure culture is referred to as genomics, transcriptomics, proteomics, and metabolomics, respectively. When such information is derived from a whole community of microorganisms, we say “community genomics” or “metagenomics” (or metatranscriptomics, metaproteomics). Collectively, these approaches, whether applied to a single organism or to a community of organisms, are referred to in shorthand as “omics.”

Figure 1.1 Generalized structure of a bacterial or archaeal cell. Inset details translation and protein synthesis.

Source: Druschel and Kappler (2015), p. 390, Fig. 1. Reproduced with permission from the Mineralogical Society of America

Box 1.1 Definitions of key macromolecules studied by omics approaches

Deoxyribonucleic acid (DNA): DNA consists of four nucleotide bases – guanine (G), adenine (A), thymine (T), and cytosine (C) – that are joined together in a sequence to form genes.

Gene: a unit of genetic information encoding protein, tRNA, or ribosomal RNA. Genes are about 1000 bases long, on average.

Genome: the genome is the collection of all genetic information in an organism, including the genes as well as elements between genes that are involved in regulating gene expression. Microbial genomes range in size from approximately 400 000 to 10 million bases and from 400 to 10 000 genes.

Ribonucleic acid (RNA): There are several major forms of RNA, including messenger RNA (mRNA), transfer RNA (tRNA), and ribosomal RNA (rRNA). mRNA is an intermediate between DNA and protein (see Fig. 1.1); rRNA is a structural and catalytic component of ribosomes, the machinery that translates mRNA into protein. tRNAs are small molecules that recognize the three‐base codons of mRNA and deliver the corresponding amino acids during protein synthesis.

Protein: proteins are polymers (long chains) of amino acids. The two main roles of proteins are (1) to provide structure or scaffolding, e.g., in the cell wall or in the machinery of protein synthesis; and (2) to catalyze biochemical reactions in the cell, including those required for energy metabolism, biosynthesis of macromolecules, transport of elements into and out of the cell, and generation of biogenic minerals (“biomineralization”). Proteins can also “sense” the environment and transduce signals that elicit cellular responses.

Lipids: hydrocarbons, often with polar head groups, that are the primary constituents of cell membranes. In some cases, specific lipids are diagnostic of specific microbial groups or metabolisms. Unlike other biological macromolecules, lipids may be preserved in sediments over geological time (millions to billions of years), so they have great value in potentially providing information on ancient ecosystems. Like other macromolecules, the synthesis of lipids is conducted by proteins that are encoded by genes. Hence, the “lipidome” can theoretically be predicted from the genome.

Carbohydrates: macromolecules consisting of carbon, hydrogen, and oxygen. Carbohydrates decorate the cell surface and are an important interface between cells and their environment. Because they are often negatively charged, they can play important roles in binding cations and influencing biomineralization.
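The flow of information described in Box 1.1, from gene to mRNA to protein, read three bases (one codon) at a time, can be sketched in Python. The lookup table below is the standard genetic code, whose amino acid assignments most bacteria and archaea also use; they differ mainly in which codons can serve as start codons. The sketch assumes that the input is the coding strand, that reading begins at the first base, and that codons containing ambiguous bases are marked X.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """mRNA carries the coding-strand sequence with uracil (U) in place of thymine (T)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str, to_stop: bool = True) -> str:
    """Map successive codons to amino acids, optionally stopping at the first stop codon."""
    seq = mrna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for codons with ambiguous bases
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)
```

For example, the gene fragment ATGGCCTAA is transcribed to AUGGCCUAA, which translates to MA: methionine, then alanine, then the stop codon UAA.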

Whereas genomes encode all the proteins that could possibly be made in a given cell, a genome does not give any information about which proteins and RNA are actually being produced at any given time, or about the quantities in which they are produced. Transcriptomics and proteomics provide this information. DNA, RNA, and protein have different lifetimes based on the stability of the molecules and the biochemical mechanisms that degrade them. Thus these molecules provide information at different time scales (Fig. 1.2). Genomes also provide a “molecular fossil record” of how genes and organisms have evolved over the billions of years of life on Earth (David & Alm 2011; Macalady & Banfield 2003; Zerkle et al. 2005).

Figure 1.2 Macromolecules that serve as the basis for the three main omics approaches.

Source: Dick and Lam (2015), p. 404, Fig. 1, with permission from the Mineralogical Society of America

1.2 The DNA Sequencing Revolution: Historical Perspectives

The “meta‐omics” revolution has its roots in the pioneering work of Carl Woese and colleagues, who sequenced microbial rRNA genes in order to uncover their phylogenetic relationships (Woese & Fox 1977). This work recognized that, because rRNA genes serve critical functions, they are present in every organism and are highly conserved at the sequence level. Thus, they hold invaluable information about the evolutionary relationships of microorganisms. Through painstaking labor, rRNA gene sequences from a wide range of organisms were deciphered, leading to an astonishing discovery: methane‐producing microorganisms previously assumed to be bacteria were actually a new and completely separate domain of life, the archaea (Sapp & Fox 2013). This transformed our understanding of the tree of life by revealing that it is composed of three domains: bacteria, archaea, and eukarya (Woese & Fox 1977). The advent of rRNA gene sequencing also provided a practical and objective tool for classifying microorganisms, a task that had previously been declared impossible (Woese & Goldenfeld 2009).

Soon after, Pace and colleagues applied sequencing to rRNA genes purified directly from uncultured communities of microorganisms (Stahl et al. 1984). Subsequent use of the polymerase chain reaction (PCR) to amplify rRNA genes directly from the environment (focusing on one of these genes, the 16S rRNA gene) increased the throughput of this approach and revealed startling insights into the microbial world in seawater and other environments (DeLong 1992; Fuhrman et al. 1992). Spurred by the ever‐declining cost and increasing throughput of DNA sequencing (Loman et al. 2012), the culture‐independent approach quickly exposed the staggering diversity of the microbial world (Pace 2009), showing that only a tiny fraction of microbial groups have been studied in culture (Baker & Dick 2013; Pace 2009).

In parallel with the explosion of 16S rRNA gene sequencing, faster, cheaper DNA sequencing also enabled a new era of sequencing whole microbial genomes (Land et al. 2015). Information on the complete gene content theoretically provides a picture of the metabolic and physiological potential of microorganisms (however, see the caveats and challenges discussed in Chapter 3). The first bacterial genomes were published in 1995 (Fleischmann et al. 1995; Fraser et al. 1995), and the number of microbial genomes sequenced has expanded exponentially ever since (Fournier et al. 2013).

A major initial finding of these sequencing efforts was that microbial genomes have startling variability of gene content (Tettelin et al. 2005; Welch et al. 2002). This led to concepts of the pangenome, core genome, and flexible genome (Cordero & Polz 2014) (see Chapter 2). Genome sequences from cultured organisms are valuable because they enable studies of the links between genotype and phenotype and represent taxonomic and functional anchors in the tree of life for interpreting metagenomic data. Particularly valuable are genomes from type strains that have been validly described and named, which are estimated to account for a substantial portion (~15%) of phylogenetic diversity (Kyrpides et al. 2014). However, despite the microbial genome sequencing revolution, less than 3% of these type strains have had their genomes sequenced (Kyrpides et al. 2014). Thus, even the genomic coverage of cultured microbial life remains woefully inadequate, and of course, the cultured portion is just a small fraction of the total microbial world. The Microbial Earth Project (www.microbial‐earth.org/) was recently launched to track the inventory of type strains of bacteria and archaea and their genome sequencing projects.
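The pangenome, core genome, and flexible genome are, at their simplest, set operations over the gene content of related genomes. The following is a minimal Python sketch under stated assumptions: genes are assumed to have already been clustered into gene families (for example, by sequence similarity), the family IDs and strain names are hypothetical, and "core" is taken strictly as present in every genome, whereas published analyses often relax this to a threshold such as most genomes.

```python
from typing import Dict, Set, Tuple


def partition_pangenome(
    genomes: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split gene families into pangenome, core genome, and flexible genome.

    `genomes` maps a genome (strain) name to the set of gene-family IDs it
    carries; clustering genes into families is assumed to be done beforehand.
    """
    gene_sets = list(genomes.values())
    if not gene_sets:
        return set(), set(), set()
    pangenome = set().union(*gene_sets)   # families found in any genome
    core = set.intersection(*gene_sets)   # families found in every genome
    flexible = pangenome - core           # families missing from at least one genome
    return pangenome, core, flexible


# Hypothetical gene-family IDs for three strains of one species (not real data).
strains = {
    "strain_A": {"fam01", "fam02", "fam03", "fam04"},
    "strain_B": {"fam01", "fam02", "fam05"},
    "strain_C": {"fam01", "fam02", "fam03", "fam06"},
}
pan, core, flexible = partition_pangenome(strains)
print(sorted(pan))       # ['fam01', 'fam02', 'fam03', 'fam04', 'fam05', 'fam06']
print(sorted(core))      # ['fam01', 'fam02']
print(sorted(flexible))  # ['fam03', 'fam04', 'fam05', 'fam06']
```

Note that fam03 falls into the flexible genome even though two of the three strains carry it; with a strict definition, a single genome lacking a family is enough to exclude it from the core.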

At the confluence of environmental 16S rRNA gene sequencing of microbial communities and whole genome sequencing of cultured microbes is the direct retrieval of genomes from uncultured microbial communities. Early metagenomic approaches used cloning of environmental DNA followed by sequencing and/or screening of expressed products for functions of interest (Riesenfeld et al. 2004; Stein et al. 1996). The term “metagenomics” was coined in 1998, in the context of accessing natural products (e.g., antibiotics) from uncultured soil microorganisms (Handelsman et al. 1998). The power of the functional metagenomics approach lies in the direct connection of sequence to function and was illustrated beautifully by the discovery of bacterial light‐driven proton pumps as a new form of phototrophy in the oceans (Béjà et al. 2000). This method can also link phylogenetic marker genes directly to function (Pham et al. 2008), particularly when the cloned fragments are large, as in BAC or fosmid libraries. However, because of the cost and labor involved in constructing and screening such clone libraries, this approach was not readily scalable. The “functional metagenomic” approach also faces practical challenges such as genetic and biochemical incompatibility between environmental genes and hosts (e.g., differences in codon bias or required cofactors). Recent synthetic genomic approaches can overcome some of these issues, but such incompatibilities still limit the throughput of exploratory, discovery‐driven functional screening.

Shotgun metagenomics, in which community DNA is randomly fragmented and sequenced, was then demonstrated as a viable and valuable approach (Tyson et al. 2004; Venter et al. 2004) and quickly emerged as the dominant method used in metagenomics studies. For the first time, whole genomes of uncultured organisms could be reconstructed from microbial communities, revealing their metabolic potential (Tyson et al. 2004) and evolutionary processes (Allen & Banfield 2005). Several spectacular discoveries, including the linking of ammonia oxidation to archaea (Venter et al. 2004), demonstrated the power and promise of metagenomics. A vision for the potential advances that metagenomics could bring to science and society was beginning to come into view (National Research Council 2007). Hugenholtz and Tyson (2008) recount a brief history and highlights of these early stages and different approaches of metagenomics. For a more in‐depth historical account see Handelsman (2004) and Gilbert and Dupont (2011). The rapid decrease in costs and increase in throughput of DNA sequencing has enabled shotgun sequencing of more complex microbial communities (Fig. 1.3). Recent papers report the reconstruction of thousands of genomes from metagenomes (Anantharaman et al. 2016).

Figure 1.3 Major milestones in microbial community omics (top) and the decreasing cost and increasing throughput of DNA sequencing (bottom).

Source: Modified from Dick and Lam (2015), p. 406, Fig. 3, with permission from the Mineralogical Society of America

While the genomic sequence provides information on the metabolic and physiological potential of microorganisms, it does not indicate whether those functions are being carried out at a particular point in time or space. Answering this question requires characterizing gene expression at the level of mRNA or protein. Metatranscriptomics was applied with great success to surface seawater microbial communities, revealing that flexible genes are highly expressed in the environment (Frias‐Lopez et al. 2008). Critically, this study also used qPCR to independently evaluate the accuracy of RNA amplification, which is required to obtain sufficient cDNA for many sequencing applications (see Chapter 9).

Whereas the DNA‐ and RNA‐based analyses described above rely on the sequencing of nucleotides, proteomic tools use mass spectrometry to accurately measure the masses of small peptide fragments and even individual amino acids. Matching these measured masses with calculated peptide masses derived from genomic information enables the identification of protein fragments. Metaproteomics is challenging because analytical methods for translating MS/MS spectra into protein sequence are complex and rely heavily on having the corresponding genomic sequence for interpretation. In addition, recovering total protein from many environmental samples is more challenging than extracting DNA and RNA. Not surprisingly, initial progress on application of proteomics to microbial communities was accomplished in low‐diversity communities for which genomic sequence was available (Ram et al. 2005; Verberkmoes et al. 2009). Indeed, with sufficient genomic information, protein expression from very closely related strains can be differentiated (Lo et al. 2007). These studies yielded insights into the biochemical mechanisms of iron oxidation, a central process sustaining primary production and pyrite dissolution in acid mine drainage, and showed that among the most highly expressed proteins are “hypothetical” and “conserved hypothetical” proteins (Ram et al. 2005). With growing databases of genomic sequence and improving algorithms for interpreting MS/MS spectra, metaproteomics is now a viable approach for studying more complex microbial communities (see Chapter 10).
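The idea of matching measured masses against peptide masses calculated from genomic sequence can be illustrated with a minimal Python sketch. It assumes a hypothetical open reading frame, translation with the standard genetic code, a simplified trypsin rule (cleave after K or R unless the next residue is P, with no missed cleavages), neutral monoisotopic masses, and a 10 ppm tolerance. This is essentially peptide mass matching; real metaproteomic search engines also score MS/MS fragment‐ion spectra, handle modifications and charge states, and control false discoveries, none of which is shown here. The sequence, ORF ID, and measured mass are illustrative, not real data.

```python
from typing import Dict, List, Tuple

# Standard genetic code, codons indexed in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # one H2O per peptide (free N- and C-termini)


def translate(orf: str) -> str:
    """Translate a DNA reading frame (A/C/G/T only), stopping at the first stop codon."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        codon = orf[i:i + 3].upper()
        idx = 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])
        aa = AMINO_ACIDS[idx]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def tryptic_digest(protein: str) -> List[str]:
    """Simplified trypsin rule: cut after K or R unless followed by P; no missed cleavages."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_masses(
    measured: List[float], proteins: Dict[str, str], tol_ppm: float = 10.0
) -> List[Tuple[float, str, str]]:
    """Return (measured mass, protein ID, peptide) for every match within tol_ppm."""
    hits = []
    for name, seq in proteins.items():
        for pep in tryptic_digest(seq):
            calc = peptide_mass(pep)
            for m in measured:
                if abs(m - calc) / calc * 1e6 <= tol_ppm:
                    hits.append((m, name, pep))
    return hits


# Hypothetical ORF, as if predicted from a metagenome assembly.
protein = translate("ATGAAACCGCTGCGTGGCAAATAA")
print(protein)                                          # MKPLRGK
print(tryptic_digest(protein))                          # ['MKPLR', 'GK']
print(match_masses([203.127], {"orf_0001": protein}))   # [(203.127, 'orf_0001', 'GK')]
```

The example also shows why genomic context matters: without the predicted protein sequence, there would be no list of candidate peptide masses to compare against, which is why early metaproteomic work focused on communities with available genomic sequence.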

References

Allen, E. E. & Banfield, J. F. (2005) Community genomics in microbial ecology and evolution. Nature Reviews Microbiology, 3, 489–498.

Anantharaman, K., Brown, C. T., Hug, L. A., et al. (2016) Thousands of microbial genomes shed light on interconnected biogeochemical processes in an aquifer system. Nature Communications, 7, 13219.

Baker, B. J. & Dick, G. J. (2013) Omic approaches in microbial ecology: charting the unknown. Microbe, 8, 353–360.

Behrens, S., Kappler, A. & Obst, M. (2012) Linking environmental processes to the in situ functioning of microorganisms by high‐resolution secondary ion mass spectrometry (NanoSIMS) and scanning transmission X‐ray microscopy (STXM). Environmental Microbiology, 14, 2851–2869.

Béjà, O., Aravind, L., Koonin, E. V., et al. (2000) Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science, 289, 1902–1906.

Cordero, O. X. & Polz, M. F. (2014) Explaining microbial genomic diversity in light of evolutionary ecology. Nature Reviews Microbiology, 12, 263–273.

David, L. A. & Alm, E. J. (2011) Rapid evolutionary innovation during an Archaean genetic expansion. Nature, 469, 93–96.

DeLong, E. F. (1992) Archaea in coastal marine environments. Proceedings of the National Academy of Sciences of the United States of America, 89, 5685–5689.

Dick, G. J. & Lam, P. (2015) Omics approaches to microbial geochemistry. Elements, 11, 403–408.

Druschel, G. K. & Kappler, A. (2015) Geomicrobiology and microbial geochemistry. Elements, 11, 389–394.

Druschel, G. K., Dick, G. J. & Boyd, E. S. (2014) Geomicrobiology and Microbial Geochemistry 2014 Workshop Report. Available at: https://dx.doi.org/10.6084/m9.figshare.3083524.v1 (accessed 25 October 2017).

Fleischmann, R. D., Adams, M. D., White, O., et al. (1995) Whole‐genome random sequencing and assembly of Haemophilus influenzae Rd. Science, 269, 496–512.

Fournier, P. E., Drancourt, M., Colson, P., Rolain, J. M., La Scola, B. & Raoult, D. (2013) Modern clinical microbiology: new challenges and solutions. Nature Reviews Microbiology, 11, 574–585.

Fraser, C. M., Gocayne, J. D., White, O., et al. (1995) The minimal gene complement of Mycoplasma genitalium. Science, 270, 397–403.

Frias‐Lopez, J., Shi, Y., Tyson, G. W., et al. (2008) Microbial community gene expression in ocean surface waters. Proceedings of the National Academy of Sciences of the United States of America, 105, 3805–3810.

Fuhrman, J. A., McCallum, K. & Davis, A. A. (1992) Novel major archaebacterial group from marine plankton. Nature, 356, 148–149.

Gilbert, J. A. & Dupont, C. L. (2011) Microbial metagenomics: beyond the genome. Annual Review of Marine Science, 3, 347–371.

Handelsman, J. (2004) Metagenomics: application of genomics to uncultured microorganisms. Microbiology and Molecular Biology Reviews, 68, 669–685.

Handelsman, J., Rondon, M. R., Brady, S. F., Clardy, J. & Goodman, R. M. (1998) Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products. Chemistry and Biology, 5, R245–259.

Hugenholtz, P. & Tyson, G. W. (2008) Metagenomics. Nature, 455, 481–483.

Kyrpides, N. C., Hugenholtz, P., Eisen, J. A., et al. (2014) Genomic encyclopedia of bacteria and archaea: sequencing a myriad of type strains. PLoS Biology, 12, e1001920.

Land, M., Hauser, L., Jun, S. R., et al. (2015) Insights from 20 years of bacterial genome sequencing. Functional and Integrative Genomics, 15, 141–161.

Lo, I., Denef, V. J., Verberkmoes, N. C., et al. (2007) Strain‐resolved community proteomics reveals recombining genomes of acidophilic bacteria. Nature, 446, 537–541.

Loman, N. J., Constantinidou, C., Chan, J. Z. M., et al. (2012) High‐throughput bacterial genome sequencing: an embarrassment of choice, a world of opportunity. Nature Reviews Microbiology, 10, 599–606.

Macalady, J. & Banfield, J. F. (2003) Molecular geomicrobiology: genes and geochemical cycling. Earth and Planetary Science Letters, 209, 1–17.

Madsen, E. L. (2005) Identifying microorganisms responsible for ecologically significant biogeochemical processes. Nature Reviews Microbiology, 3, 439–446.

National Research Council (2007)