Functional Genomics in Aquaculture - Marco Saroglia - E-Book

Description

Genomics has revolutionized biological research over the course of the last two decades. Genome maps of key agricultural species have offered increased understanding of the structure, organization, and evolution of animal genomes. Building upon this foundation, researchers are now emphasizing research on genome function. Published with the World Aquaculture Society, Functional Genomics in Aquaculture looks at the advances in this field as they directly relate to key traits and species in aquaculture production.

Functional Genomics in Aquaculture opens with two chapters that provide a useful general introduction to the field of functional genomics. The second section of the book focuses on key production traits such as growth, development, reproduction, nutrition, and physiological response to stress and diseases. The final five chapters focus on a variety of key aquaculture species. Examples looking at our understanding of the functional genomes of salmonids, Mediterranean sea bass, Atlantic cod, catfish, shrimp, and molluscs, are included in the book.

Providing valuable insights and discoveries into the functional genomes of finfish and shellfish species, Functional Genomics in Aquaculture will be an invaluable resource to researchers and professionals in aquaculture, genetics, and animal science.

Page count: 826

Year of publication: 2012




Contents

Cover

Title Page

Copyright

List of Contributors

Preface

Chapter 1: Functional Genomics Research in Aquaculture: Principles and General Approaches

Introduction

The Concept of Functional Genomics

Approaches to Functional Genomics

Functional Genomics Approaches Suitable for Aquaculture

Acknowledgments

Chapter 2: Genomic Resources for Functional Genomics in Aquaculture Species

Introduction

Polymorphic DNA Markers

Expressed Sequence Tags (ESTs)

Microarrays

Sequence Read Archive (SRA)

SNP Genotyping Platforms

Databases for Aquaculture Species

Whole Genome Sequence Assemblies

Chapter 3: Production, Growth, and Insulin-Like Growth Factor-I (IGF-I) Gene Expression as an Instantaneous Growth Indicator in Nile Tilapia Oreochromis niloticus

Introduction

Aquaculture and Health Benefits

Economic Prominence of Tilapia

Growth Trials

The Relationship of Insulin-Like Growth Factor-I (IGF-I) to Growth

Practical Utility of IGF-I mRNA as an Indicator for Growth Rate

Conclusions

Acknowledgments

Chapter 4: Gene Expression Pattern during European Sea Bass Larvae Development: Impact of Dietary Vitamins

Introduction

Larval to Juvenile Development: Morphological Aspects

Development of Genomic Resources and Tools

Transcriptomic Investigation of European Sea Bass Larvae Development

Impact of Nutritional Factors

Conclusions

Acknowledgments

Chapter 5: Transcriptomics of the Compensatory Growth in European Sea Bass Dicentrarchus labrax

Introduction

Key Players in Fatty Acid Metabolism

Key Players in Intestinal Oligopeptides Transport

Molecular Cloning and Sequencing

In Silico Analysis

Phylogenetic Analysis

Dietary Manipulation of PepT1 and Δ6 Desaturase Gene Expression

Conclusions

Chapter 6: Functional Genomic Analysis of the Nutritional and Hormonal Regulation of Fish Glucose and Lipid Metabolism

Introduction

Carbohydrate Metabolism

Lipid Metabolism

Chapter 7: Genomic Responses to Stress Challenges in Fish

The Nature of the Stress Response

Genomic Approaches to Evaluate the Stress Response in Fish

Studies on Individual Genes

Microarray-Based Studies

Conclusions

Acknowledgments

Chapter 8: Functional Genomic Analysis of Sex Determination and Differentiation in Teleost Fish

Reproduction-Related Problems in Finfish Aquaculture

Introduction to Sex Determination and Differentiation

Genotypic Sex Determination

Environmental Sex Determination

Evolution of Sex Determining Systems

Sex Differentiation

Approaches to Study Genomics of Fish Sex Determination and Differentiation

Growth–Sex Differentiation Relationships

Contribution of Epigenetics

Concluding Remarks and Future Prospects

Acknowledgments

Chapter 9: Functional Genomics of Stress: Molecular Biomarkers for Evaluating Fish CNS Activity

Introduction

Aquaculture and Model Organisms

Genomic Resources for Fish

Molecular Biomarkers for Animal Welfare

BDNF: Gene Structure, mRNA, and Protein Quantification after Stress

Chapter 10: The SoLute Carrier (SLC) Family Series in Teleost Fish

Introduction

Transport Processes in Teleost Fish via SLC Transporters

Function in Teleost Fish SLC Transporters

Conclusions

Chapter 11: Next-Generation Sequencing and Functional Genomic Analysis in Rainbow Trout

Introduction

Next-Generation Sequencing Technology

Applications of Next-generation Sequencing in Rainbow Trout

Conclusion and Future Perspectives

Chapter 12: Functional Genomics Research of Atlantic Cod Gadus morhua

Introduction

Functional Genomics Research on Atlantic Cod Defense Responses

Atlantic Cod DNA Microarray Platforms for Functional Genomics Research

Conclusions and Future Perspectives

Acknowledgments

Chapter 13: Catfish Functional Genomics: Progress and Perspectives

Introduction

Catfish Industry Overview

Focus Areas for Catfish Functional Genomics Research

Historical Development of Catfish Functional Genomic Resources

Utilization of Transcriptome Resources for Macro- and Microarray-based Studies

Next-Gen Perspective and Future Directions

Chapter 14: Functional Genomics in Shrimp Disease Control

Introduction

Captive Breeding Programs

Specific Pathogen-Free (SPF) Shrimp Programs

Linkage Maps of Shrimp

DNA Markers for Disease Resistance in Shrimp

Differentially Expressed Genes as Candidate Markers for Disease Resistance in Shrimp

Conclusions

Chapter 15: Applications of Functional Genomics in Molluscs Aquaculture

Introduction

Sequencing Methods and Bivalve Genomes

Functional Genomics

Future Applications of Mollusc Genomics to Aquaculture

Acknowledgments

Index

This edition first published 2012 © 2012 by John Wiley & Sons, Inc.

Wiley-Blackwell is an imprint of John Wiley & Sons, formed by the merger of Wiley's global Scientific, Technical and Medical business with Blackwell Publishing.

Editorial offices: 2121 State Avenue, Ames, Iowa 50014-8300, USA; The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK; 9600 Garsington Road, Oxford, OX4 2DQ, UK

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com/wiley-blackwell.

Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by Blackwell Publishing, provided that the base fee is paid directly to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923. For those organizations that have been granted a photocopy license by CCC, a separate system of payments has been arranged. The fee codes for users of the Transactional Reporting Service are ISBN-13: 978-0-4709-6008-0/2012.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data Functional genomics in aquaculture / edited by Marco Saroglia and Zhanjiang (John) Liu. p. cm. Includes bibliographical references and index. ISBN 978-0-470-96008-0 (hardcover : alk. paper) 1. Fishes–Breeding. 2. Shellfish–Breeding. 3. Genomics. 4. Aquaculture. I. Saroglia, Marco. II. Liu, Zhanjiang. SH155.5.F86 2012 639.2–dc23 2012007484

A catalogue record for this book is available from the British Library.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Disclaimer

The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or Website is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or Website may provide or recommendations it may make. Further, readers should be aware that Internet Websites listed in this work may have changed or disappeared between when this work was written and when it is read.

1 2012

List of Contributors

Amilcare Barca
Department of Biological and Environmental Sciences and Technologies (DBEST) Laboratory of General Physiology University of Salento Via Provinciale Lecce-Monteroni Lecce I-73100, Italy

Giovanni Bernardini
Department of Biotechnology and Life Sciences University of Insubria Via J.H. Dunant 3 Varese I-21100, Italy and Inter-University Centre for Research in Protein Biotechnologies “The Protein Factory” Polytechnic University of Milan and University of Insubria Via J.H. Dunant 3 Varese I-21100, Italy

Remedios B. Bolivar
Central Luzon State University Science City of Muñoz Nueva Ecija 3120, The Philippines

Russell J. Borski
Department of Biology North Carolina State University Raleigh, NC 27695, USA

Christopher L. Brown
US Department of Commerce NOAA Aquaculture and Enhancement Division The Milford Laboratory Milford, CT 06460, USA

Chantal Cahu
Ifremer Fish Nutrition Laboratory Plouzané 29200, France

Samuela Cora
Department of Biotechnology and Life Sciences University of Insubria Via J.H. Dunant 3 Varese I-21100, Italy

M.M. Costa
Instituto Investigaciones Marinas (IIM) Consejo Superior de Investigaciones Cientificas (CSIC) Eduardo Cabello 6 Vigo 36208, Spain

Emmanuel M. Vera Cruz
Central Luzon State University Science City of Muñoz 3120, The Philippines

Maria Darias
IRTA Centre de Sant Carles de la Ràpita (IRTA-SCR) Unitat de Cultius Experimentals Crta. del Poble Nou s/n Sant Carles de la Ràpita 43540, Spain

Arun K. Dhar
Viracine Therapeutics Corporation Columbia, MD 21046, USA

Noelia Díaz
Institut de Ciències del Mar Consejo Superior de Investigaciones Científicas (CSIC) Barcelona, Spain

Ignacio Fernandez
IRTA Centre de Sant Carles de la Ràpita (IRTA-SCR) Unitat de Cultius Experimentals Crta. del Poble Nou s/n Sant Carles de la Ràpita 43540, Spain

A. Figueras
Instituto Investigaciones Marinas (IIM) Consejo Superior de Investigaciones Cientificas (CSIC) Eduardo Cabello 6 Vigo 36208, Spain

Enric Gisbert
IRTA Centre de Sant Carles de la Ràpita (IRTA-SCR) Unitat de Cultius Experimentals Crta. del Poble Nou s/n Sant Carles de la Ràpita 43540, Spain

Rosalba Gornati
Department of Biotechnology and Life Sciences University of Insubria, Via J.H. Dunant 3 Varese I-21100, Italy and Inter-University Centre for Research in Protein Biotechnologies “The Protein Factory” Polytechnic University of Milan and University of Insubria Via J.H. Dunant 3 Varese I-21100, Italy

Yanliang Jiang
The Fish Molecular Genetics and Biotechnology Laboratory Department of Fisheries and Allied Aquacultures and Program of Cell and Molecular Biosciences Aquatic Genomics Unit Auburn University Auburn, AL 36849, USA

Chao Li
The Fish Molecular Genetics and Biotechnology Laboratory Department of Fisheries and Allied Aquacultures and Program of Cell and Molecular Biosciences Aquatic Genomics Unit Auburn University Auburn, AL 36849, USA

Shikai Liu
The Fish Molecular Genetics and Biotechnology Laboratory Department of Fisheries and Allied Aquacultures and Program of Cell and Molecular Biosciences Aquatic Genomics Unit Auburn University Auburn, AL 36849, USA

Zhanjiang (John) Liu
The Fish Molecular Genetics and Biotechnology Laboratory Department of Fisheries and Allied Aquacultures and Program of Cell and Molecular Biosciences Aquatic Genomics Unit Auburn University Auburn, AL 36849, USA

Paulino Martínez
Departamento de Genética Universidad de Santiago de Compostela Lugo 27002, Spain

David Mazurais
Ifremer Fish Nutrition Laboratory Plouzané 29200, France

B. Novoa
Instituto Investigaciones Marinas (IIM) Consejo Superior de Investigaciones Cientificas (CSIC) Eduardo Cabello 6 Vigo 36208, Spain

Stéphane Panserat
INRA, UMR1067 Nutrition Aquaculture et Génomique Saint-Pée-sur-Nivelle F-64310, France

Eric Peatman
The Fish Molecular Genetics and Biotechnology Laboratory Department of Fisheries and Allied Aquacultures and Program of Cell and Molecular Biosciences Aquatic Genomics Unit Auburn University Auburn, AL 36849, USA

Francesc Piferrer
Institut de Ciències del Mar Consejo Superior de Investigaciones Científicas (CSIC) Barcelona, Spain

Paola Pisani
Department of Biological and Environmental Sciences and Technologies (DBEST) Laboratory of General Physiology University of Salento Via Provinciale Lecce-Monteroni Lecce I-73100, Italy

Laia Ribas
Institut de Ciències del Mar Consejo Superior de Investigaciones Científicas (CSIC) Barcelona, Spain

Matthew L. Rise
Ocean Sciences Center Memorial University of Newfoundland A1C 5S7, Canada

Refugio Robles-Sikisaka
University of California Department of Pathology School of Medicine San Diego, CA 92182, USA

Alessandro Romano
Department of Biological and Environmental Sciences and Technologies (DBEST) Laboratory of General Physiology University of Salento Via Provinciale Lecce-Monteroni Lecce I-73100, Italy

Mohamed Salem
Laboratory of Animal Biotechnology and Genomics Division of Animal and Nutritional Sciences West Virginia University Morgantown, WV 26506-6108, USA

Carlo Storelli
Department of Biological and Environmental Sciences and Technologies (DBEST) Laboratory of General Physiology University of Salento Via Provinciale Lecce-Monteroni Lecce I-73100, Italy

Marco Saroglia
Department of Biotechnology and Molecular Sciences Animal Biotechnology and Aquaculture Unit University of Insubria Via J.H. Dunant 3 Varese I-21100, Italy

Fanyue Sun
The Fish Molecular Genetics and Biotechnology Laboratory Department of Fisheries and Allied Aquacultures and Program of Cell and Molecular Biosciences Aquatic Genomics Unit Auburn University Auburn, AL 36849, USA

Genciana Terova
Department of Biotechnology and Life Sciences University of Insubria, Via J.H. Dunant 3 Varese I-21100, Italy and Inter-University Centre for Research in Protein Biotechnologies “The Protein Factory” Polytechnic University of Milan and University of Insubria Via J.H. Dunant 3 Varese I-21100, Italy

Tiziano Verri
Department of Biological and Environmental Sciences and Technologies Laboratory of General Physiology University of Salento Via Provinciale Lecce-Monteroni Lecce I-73100, Italy

Ana Viñas
Departamento de Genética, Facultad de Biología (CIBUS) Santiago de Compostela, Spain

Ruijia Wang
The Fish Molecular Genetics and Biotechnology Laboratory Department of Fisheries and Allied Aquacultures and Program of Cell and Molecular Biosciences Aquatic Genomics Unit Auburn University Auburn, AL 36849, USA

Jose-Luis Zambonino-Infante
Ifremer Fish Nutrition Laboratory Plouzané 29200, France

Jiaren Zhang
The Fish Molecular Genetics and Biotechnology Laboratory Department of Fisheries and Allied Aquacultures and Program of Cell and Molecular Biosciences Aquatic Genomics Unit Auburn University Auburn, AL 36849, USA

Yu Zhang
The Fish Molecular Genetics and Biotechnology Laboratory Department of Fisheries and Allied Aquacultures and Program of Cell and Molecular Biosciences Aquatic Genomics Unit Auburn University Auburn, AL 36849, USA

Preface

In less than 30 years, genomics as a discipline has generated more biological information than had been accumulated in all of prior history. Currently, the genetic composition of a large number of organisms has been or is being deciphered, including a large number of aquaculture species. As understanding of the structure, organization, and evolution of genomes increases, scientists have finally reached the point of asking serious questions about gene functions. Genomics is now undergoing a transition or expansion from the mapping and sequencing of genomes to an emphasis on genome functions. Along with the transition, the major goals of genomics shift from structural genomics to functional genomics.

Functional genomics represents a new phase of genome analysis. It requires the development of innovative technologies that make use of the vast resource of structural genomics information. Specifically, functional genomics refers to the development and application of genome-wide experimental approaches to assess gene functions by making use of the information and resources provided by structural genomics. It is characterized by high-throughput or large-scale experimental methodologies combined with statistical and computational analysis of results. The fundamental strategy in functional genomics is to expand the scope of biological investigation from studying a single gene or protein, or a few genes, to studying all genes or proteins at once on a genome-wide scale. Such operations allow the generation of tremendously large data sets that demand additional capacities of data analysis to draw information relevant to biology. Computational biology will play a critical and expanding role in functional genomics, as it is characterized by mining the data sets for valuable biological information.

Functional analyses of genomes in aquaculture include all performance and production traits such as growth, development, nutrition, sex determination, reproduction, immunity and disease resistance, and response to various environmental and physiological conditions. Initially, such genome-scale issues can be addressed through association of genome expression with performance, and then the functions of the genes involved in specific traits can be characterized into gene pathways. Eventually, however, the functions for the vast majority of aquaculture-related traits can be approached by the combination of transcriptome profiling for expression candidates, genome-wide association studies for positional candidates, comparative genome analysis for functional candidates, quantitative genomic analysis, and finally, directly testing the candidate genes to validate their functions. Because aquaculture species involve a wide spectrum of organisms ranging from invertebrate to vertebrate animals, and many of the involved species have very special biological characteristics such as extremely high fecundity or extremely high or low tolerance to low dissolved oxygen, among many others, functional genomics research of aquaculture species will undoubtedly have a profound impact on biology.

This book has 15 chapters. The first two chapters provide a general introduction to functional genomics and general approaches for functional genomics research in aquaculture, and existing genomic resources for functional genomics in aquaculture species. Chapters 3–10 are devoted to functional genomics of growth, development, reproduction, nutrition, and responses to stress and diseases; and the last five chapters are specific chapters that cover functional genomic studies in rainbow trout, Atlantic cod, catfish, shrimp, and molluscs.

This book covers the current status, technologies, progress, and perspectives of functional genomics in aquaculture. It should be useful to academic professionals, research scientists, graduate students and college students in agriculture, as well as to students of aquaculture and fisheries. We are grateful to all the contributors of this book. It is their great experience and efforts that have made this book possible. We have had another year of pleasant experience interacting with Ms. Susan Engelken, Editorial Program Coordinator; Anna Ehler, Senior Editorial Assistant; and Justin Jeffryes, Commissioning Editor for Plant Science, Agriculture, and Aquaculture with Wiley-Blackwell of John Wiley & Sons.

During the course of editing this book, we have worked extremely hard to fulfill our duties as professors and administrators. We would like to thank our families, particularly Genciana Terova and Dongya Gao, for their persistent support and love.

Marco Saroglia
John Liu

Chapter 1

Functional Genomics Research in Aquaculture: Principles and General Approaches

Shikai Liu, Yu Zhang, Fanyue Sun, Yanliang Jiang, Ruijia Wang, Chao Li, Jiaren Zhang, and Zhanjiang (John) Liu

Abstract: Functional analysis has always been more difficult than structural analysis. This is especially true when the number of genes under study is increased to cover various systems and pathways on a genome scale. In this chapter, we provide an overview of functional genomics, focusing on general approaches that include (1) functional inference based on expression profiling, such as expressed sequence tag (EST) analysis, microarray analysis, and RNA-Seq; (2) inference of gene functions based on positional analysis, such as genome-wide association studies (GWAS), quantitative trait loci (QTL) mapping, and expression quantitative trait loci (eQTL) mapping; (3) inference of gene functions by comparative genome analysis; (4) gene pathway analysis; and (5) experimental determination of gene functions using novel technologies, such as the zinc finger nuclease (ZFN) technology. We also provide a section on epigenetics and analysis of protein–DNA interactions. At the end of the chapter, we offer our assessment of the potential of various technologies for functional genomics in aquaculture.

Introduction

Genomics as a branch of science started to make headway during the early to mid-1980s. It began with a major research project, the Human Genome Project. As genomic information accumulated and data were analyzed, a series of specific genomic methodologies were developed. With the rapid advances in technology, particularly the advances in PCR and sequencing technologies, a series of highly efficient approaches for genomic studies were developed. As a result of scientific demand and technological advances, a very specific branch of science evolved that is now called genomics.

To gain a better understanding of genomics, we must examine its roots. The term “genome” itself is more than 75 years old and refers to the entire genetic material of an organism, or its complete set of genes located on chromosomes (Hieter and Boguski, 1997). In 1986, “genomics” was coined by Thomas Roderick to describe the scientific discipline of mapping, sequencing, and analyzing genomes (McKusick, 1989). The term genomics has become universally accepted over the past two decades. However, genomics is now undergoing a transition or expansion from the mapping and sequencing of genomes to an emphasis on genome functions. To reflect this shift, genome analysis may be generally divided into “structural genomics” and “functional genomics.” Structural genomics represents an initial phase of genome analysis and studies the structure, organization, and evolution of genomes, while functional genomics studies the expression and functions of genomes. Structural genomics has a clear end point: the construction of high-resolution genetic, physical, and sequence maps of an organism. The ultimate map of an organism is its complete DNA sequence with a resolution of every single base pair (Hieter and Boguski, 1997). Although genomics can be divided by its major research objectives into structural genomics and functional genomics, there is no clear separation between these subdisciplines. Furthermore, structural genomics is the basis for functional genomics.

Functional genomics represents a new phase of genome analysis (Hieter and Boguski, 1997). It requires the development of innovative technologies that make use of the vast resource of structural genomics information. Specifically, functional genomics refers to the development and application of genome-wide experimental approaches to assess gene functions by making use of the information and reagents provided by structural genomics. It is characterized by high-throughput or large-scale experimental methodologies combined with statistical and computational analysis of the results. The fundamental strategy in a functional genomics approach is to expand the scope of biological investigation from studying a single gene or protein to studying all genes or proteins at once on a genome-wide scale. Such operations allow the generation of tremendously large data sets that demand additional capacities of data analysis to draw information relevant to biology. Assistance is needed from all areas of biology, and even more so from disciplines outside biology that can handle large data sets. Computer science and mathematics are among the first disciplines from which genomics has demanded cooperation. Computational biology will play a critical and expanding role in this area: structural genomics has been characterized by data generation and management, whereas functional genomics will be characterized by mining the data sets for valuable biological information. Functional genomics promises to rapidly narrow the gap between sequence and function and to yield new insights into the behavior of biological systems (Hieter and Boguski, 1997).

The goal of this chapter is to introduce some basic concepts of functional genomics and to describe general approaches for functional genomics research in aquaculture.

The Concept of Functional Genomics

Genes and Gene Functions

Most eukaryotic organisms harbor tens of thousands of genes, in most cases perhaps more than 20,000 but fewer than 40,000. Each gene has its own functions. Historically, gene functions were determined by observing a mutant phenotype, genetically mapping that phenotype, and eventually tracing it to the gene controlling the trait. Such an approach is usually considered to be a forward genetics approach. This approach is highly straightforward; however, mutations, whether spontaneous or induced, are rare, and mutant phenotypes are often difficult to observe in the first place. With the rapid progress of DNA sequencing technologies, scientists began to learn the sequence of a gene and the structure of its protein much faster than they could learn the functions of the gene. As a result, a new set of reverse genetics approaches was developed. In reverse genetics, the simple concept is to inactivate the gene and then determine the changes in phenotype; alternatively, to add more of the gene product and then determine the changes in phenotype. In the former, a gene can be specifically targeted and “knocked out.” This has worked well with some model species where embryonic stem cell technology is available. For instance, with rat or mouse, a specific gene can be knocked out in an embryonic cell line, individuals can then be developed from the cells, and phenotypes can be observed in the “knocked out” animals. However, this approach is to date not applicable to aquaculture species because embryonic stem cell technologies have not been developed for aquaculture species.

In recent years, the discovery of RNA interference (RNAi) has provided new research tools for the study of gene functions. RNAi is an RNA-dependent gene silencing process that has been applied to knock down gene expression (Hannon, 2002). This is particularly useful for some aquaculture species, as RNAi technologies have been applied to investigate gene function as well as to develop antiviral agents to combat various infections in some fish and crustacean species (Acosta et al., 2005; Copf et al., 2006; Liu et al., 2006; Wargelius et al., 1999; Kelly and Hurlstone, 2011). For instance, highly efficient gene knockdown was observed to result in embryonic defects similar to the known mutant phenotypes in zebrafish (Wargelius et al., 1999). In a study of genes expressed differentially in freshwater crayfish infected with the white spot syndrome virus (WSSV), the anti-lipopolysaccharide factor gene (AF) was shown to protect against WSSV infection, and knockdown of AF by RNAi specifically resulted in higher rates of viral propagation (Liu et al., 2006).

In addition to gene knockout and knockdown technologies, transgenic technology has been widely used to demonstrate the functions of genes. The basic principle is that if a gene has certain functions, its over-expression should cause changes in phenotypes. Such an approach was best demonstrated by transgenic fish harboring the growth hormone gene that grow much faster and bigger than their non-transgenic controls (Du et al., 1992; Gross et al., 1992; Devlin et al., 1994; Rahman et al., 1998; Rahman et al., 2001).

All the traditional approaches for the study of gene functions are effective, but they have serious limitations that include the following:

1. Not all genes can cause a visible phenotype.
2. Knockout of one gene may cause numerous other changes in genome expression; many of these are compensatory or consequential, making analysis of gene functions very difficult.
3. Studying gene function “one gene at a time” is too laborious, too expensive, and too slow.

As a result, scientists have explored ways to study gene functions on a genomic scale, which has led to the emergence of functional genomics.

Concept of Functional Genomics

Functional genomics can be defined as a discipline for the understanding of gene functions and regulation on a genome-wide scale. It is a field of molecular biology that attempts to make use of the vast wealth of data produced by genomic projects (such as genome sequencing projects) to describe gene (and protein) functions and their interactions. Unlike structural genomics and proteomics, functional genomics focuses on dynamic aspects such as gene transcription, translation, protein–protein interactions, and interactions between proteins, DNA, and RNA. As opposed to the static aspects of the genomic information such as nucleotide and amino acid sequences or structures, functional genomics attempts to answer questions about the functions of DNA at the levels of genes, RNA transcripts, and protein products (Hackett and Clark, 2007). The ultimate goal for functional genomics is to bridge the gap between the blueprint (genome sequence or genotype) and the living organism (trait or phenotype) under various environmental conditions (Cogburn et al., 2007). Spawned by the technological revolution in genome sequencing (Venter et al., 1996; Rowen et al., 1997), functional genomics studies currently are greatly stimulated by high-throughput sequencing and screening technologies. Therefore, a key characteristic of functional genomics studies is the genome-wide approach, generally involving high-throughput technologies rather than the traditional “gene-by-gene” approach.

Goals of Functional Genomics

The goals of functional genomics are to gain a better understanding of the roles of functional elements in the genome that directly or indirectly affect the development, growth, metabolism, immunity, behavior, reproduction, and various other processes of an organism. One of the primary tasks of functional genomics is assigning specific functions to genes, noncoding RNAs, and cis-acting DNA elements involved in these processes (Hackett and Clark, 2007). The goals of functional genomics vary depending on organism and project. In the broad sense, functional genomics studies are conducted with the following objectives:

1. To discover functional elements from genomic sequence of organisms, including protein coding genes and regulatory noncoding regions;
2. To obtain the global assessment of how the expression of all genes in the genome varies under changing conditions;
3. To generate resources and develop methodologies for genome-wide mutagenesis, deducing the functions of novel genes by mutating them and studying the mutant phenotypes;
4. To genetically manipulate organisms for specific purposes;
5. To understand the evolution of genomes in relation to biology of organisms across a spectrum of evolutionarily-related species.

Approaches to Functional Genomics

Numerous approaches with sophisticated experimental techniques have been developed for functional genomics studies. The huge numbers of genes, and even larger numbers of transcripts contribute to the complexity of functional genomics analysis. Such complexities are amplified again by an even greater diversity of polypeptides and their post-translational modifications, the variation in expression from different tissues under different developmental stages, and under various environmental conditions (Hackett and Clark, 2007). High throughput methodologies are required to simultaneously examine the expression and functions of all genes of an organism. Clearly, high-speed computation is an essential feature of most functional genomics techniques. In the later sections, we describe some of the general approaches currently used in functional genomics studies, with the understanding that more powerful methods are constantly emerging, especially as computational power continues to improve. The improvements include tools and instruments for digital recording of information and newer algorithms for sorting and analyzing information developed as our understanding of gene function increases. Because of the large quantity of data generated by these techniques and the desire to discover biologically meaningful patterns, bioinformatics is crucial to functional genomics data analysis (Hackett and Clark, 2007).

While whole genome sequencing is now routine, functional analysis continues to be a great challenge. There are several general strategies for elucidating the functions and regulation of gene expression. In terms of strategies, functional genomics approaches can be divided into four general categories: (1) Functional correlation of gene expression and thereby inference of gene functions; (2) Functional correlation of gene positions and thereby inference of candidate gene functions; (3) Functional assignment by association with gene pathways; and (4) Direct testing of gene functions.

Functional Correlation of Gene Expression Profiling

Short of direct testing that is often very difficult, gene functions can be inferred from their correlations of expression with function, or their correlations of position with function. The basis for functional inference from expression correlation is expression profiling. The idea is that if a gene is involved in a specific trait, e.g., defense response, its expression would respond to infection. Gene expression profiling is the measurement of expression levels for thousands of genes to paint a general global picture of gene expression under a specific developmental stage, environmental condition, or treatment. In practice, gene expression profiling experiments often involve measuring the relative amount of mRNA expressed in two or more experimental conditions (“treatment”). The correlation of “treatment” and gene expression profile could provide inference on gene functions.
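The inference step described above can be illustrated with a minimal sketch. The gene names, abundance values, pseudocount, and two-fold threshold below are all hypothetical choices made for illustration; real experiments would also require replicates and statistical testing.

```python
import math

# Hypothetical normalized mRNA abundance values (arbitrary units) per gene.
control = {"geneA": 100.0, "geneB": 50.0, "geneC": 80.0}
treatment = {"geneA": 400.0, "geneB": 50.0, "geneC": 20.0}

def log2_fold_changes(treated, untreated, pseudocount=1.0):
    """Return log2(treatment/control) per gene; the pseudocount avoids division by zero."""
    return {
        gene: math.log2((treated[gene] + pseudocount) / (untreated[gene] + pseudocount))
        for gene in untreated
    }

def responsive_genes(fold_changes, threshold=1.0):
    """Genes whose absolute log2 fold change meets the threshold (1.0 means two-fold)."""
    return sorted(g for g, lfc in fold_changes.items() if abs(lfc) >= threshold)

lfc = log2_fold_changes(treatment, control)
print(responsive_genes(lfc))
```

Genes that pass the threshold are only candidates for involvement in the treatment response; the correlation does not by itself establish function.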

Various technologies have been developed to quantify gene expression, including hybridization-based and sequence-based approaches. Hybridization-based approaches typically involve incubating fluorescently-labeled complementary DNA (cDNA) with custom-made microarrays or commercial high-density oligo microarrays. In contrast to microarray methods, sequence-based approaches determine gene expression levels by directly sequencing cDNAs. The relative abundance of cDNAs reflects gene expression levels. Initially, Sanger sequencing of cDNA or expressed sequence tag (EST) libraries was used (Gerhard et al., 2004), and then tag-based methods were developed to improve the throughput, including serial analysis of gene expression (SAGE) (Velculescu et al., 1995; Harbers and Carninci, 2005) and massively parallel signature sequencing (MPSS) (Brenner et al., 2000). These tag-based sequencing approaches are high throughput and can provide precise, digital gene expression (DGE) levels. Recently, the development of high-throughput DNA sequencing technologies has provided additional strengths for gene expression profiling. This method, termed RNA-Seq, has clear advantages over previous sequence-based approaches and is expected to revolutionize the manner in which transcriptomes are analyzed for both gene discovery and global gene expression profiling.

In terms of functional genomics, the largest portion of published literature to date involves gene expression profiling through sequencing-based approaches such as EST analysis, SAGE, MPSS, and more recently developed RNA-Seq, and hybridization-based microarray analysis. We will limit our discussion to EST analysis, microarrays, and RNA-Seq as these are the most commonly used approaches.

Analysis of Expressed Sequence Tags (ESTs)

ESTs are single-pass sequences of random cDNA clones from cDNA libraries. They are traditionally generated using Sanger sequencing and therefore the resultant sequences are approximately 500 to 800 base pairs in length. Several years ago, because sequencing was relatively cheap, large numbers of ESTs could be generated at a reasonably low cost from either the 5′ or 3′ end of a cDNA clone to gain insight into transcriptionally active regions. ESTs were used as a primary resource for human gene discovery (Adams et al., 1991). Since then, there has been exponential growth in the generation and accumulation of EST data in public databases for various organisms, with approximately 71 million ESTs now available in public databases (http://www.ncbi.nlm.nih.gov/dbEST/, September 2011, all species). Readers can refer to Chapter 2 of this volume for availability of ESTs among various aquaculture species.

EST analysis is an effective genomic approach for rapid identification of expressed genes, and has been widely used in genome-wide gene expression studies in various tissues, developmental stages or under different environmental conditions (Adams et al., 1995; Ronning et al., 2003). In addition, the availability of cDNA sequences has accelerated further molecular characterization of genes of interest and provided sequence information for microarray construction and genome annotation (Bailey et al., 1998; Lo et al., 2003; Kim et al., 2006).

Gene expression analysis plays an important role in identifying genes that are differentially expressed under different environmental conditions and in studying the regulation of gene expression, shedding light on gene functions. EST analysis has been demonstrated to be effective for detecting differential expression and regulation of certain genes. Without normalization or subtraction in library construction, the number of sequenced ESTs for a given gene reflects the abundance of its expression in the corresponding scenario (e.g., environmental conditions, developmental stages, treatments, etc.).
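As a rough illustration of how EST counts from non-normalized libraries act as a digital measure of expression, the following sketch counts ESTs per gene and scales the counts to a common library size. The gene names, library contents, and the per-10,000 scaling are hypothetical.

```python
from collections import Counter

def est_frequencies(est_gene_ids):
    """Count ESTs per gene and express each count per 10,000 ESTs sequenced."""
    counts = Counter(est_gene_ids)
    total = sum(counts.values())
    return {gene: 10000 * n / total for gene, n in counts.items()}

# Hypothetical gene assignments of ESTs from two non-normalized libraries.
control_library = ["actb"] * 6 + ["hsp70"] * 1 + ["mhc1"] * 3
stress_library = ["actb"] * 6 + ["hsp70"] * 9 + ["mhc1"] * 5

control_freq = est_frequencies(control_library)
stress_freq = est_frequencies(stress_library)
for gene in sorted(control_freq):
    print(gene, control_freq[gene], stress_freq.get(gene, 0.0))
```

Scaling by library size matters because the two libraries differ in sequencing depth; raw counts alone would not be comparable.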

Direct EST sequencing is inefficient in discovery of rarely expressed genes. To solve this problem, the method to construct normalized cDNA libraries was developed (Soares et al., 1994; Bonaldo et al., 1996). The basic principle is using hybridization to reduce redundant genes and increase the representation of rarely expressed genes.

Initial annotation of ESTs can be conducted by simple sequence similarity comparisons. Further annotation analysis can be carried out after obtaining the consensus sequences (putative unigenes), such as determination of gene identity based on homology search, open reading frame (ORF) identification, Gene Ontology (GO) annotation and gene-enrichment analysis (e.g., Nakaya et al., 2007).

In order to assign gene identity to contigs and singletons, homology search is widely used. Such an approach is especially helpful for newly-studied species. BLAST is the most widely used program for high-throughput EST analysis and annotation. The BLAST package provides different flavors of algorithm for sequence similarity searching. BLASTX is used to search translated consensus EST sequences against protein databases, while BLASTN is used to search against nucleotide sequence databases. NCBI, ENSEMBL and Swiss-Prot are three important databases for BLAST searches. For instance, Swiss-Prot, the manually curated and annotated section of the UniProt protein database, can be used to identify putative functions for unigenes by BLASTX. NCBI provides the dbEST database, which can be searched by BLASTN to identify novel transcripts. dbEST is a main EST resource database including ESTs for over 200 aquaculture species. The ENSEMBL database can provide chromosome location information for genes, which is useful for comparative genome analysis. However, BLAST sequence similarity comparison provides only sequence homology information, and one cannot rely purely on BLAST for gene identification. Detailed phylogenetic analysis and/or orthology analysis is needed to determine the identities of genes.
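A minimal sketch of how putative identities might be assigned from BLAST results is given below. It assumes the results were saved in the standard 12-column tabular layout; the contig names, subject names, scores, and E-value cutoff are hypothetical.

```python
# Hypothetical BLASTX results in the 12-column tabular layout:
# query, subject, % identity, alignment length, mismatches, gap opens,
# query start, query end, subject start, subject end, E-value, bit score
BLAST_TABLE = """\
contig1\tprotA\t85.0\t200\t30\t0\t1\t600\t1\t200\t1e-90\t320
contig1\tprotB\t60.0\t180\t72\t2\t10\t550\t5\t184\t1e-40\t150
contig2\tprotC\t35.0\t90\t58\t3\t1\t270\t1\t90\t0.5\t30
"""

def best_hits(table, max_evalue=1e-10):
    """Return {query: (subject, evalue, bit_score)}, keeping the highest-scoring hit per query."""
    hits = {}
    for line in table.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bit_score = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # hit too weak to support a putative identity
        if query not in hits or bit_score > hits[query][2]:
            hits[query] = (subject, evalue, bit_score)
    return hits

annotations = best_hits(BLAST_TABLE)
print(annotations)
```

Here contig2 remains unannotated because its only hit fails the cutoff. As noted above, a top hit is a putative identity only and should be confirmed by phylogenetic or orthology analysis.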

For a greater level of annotation, the ORF is identified to determine the full coding region, or a portion of it, in the unigene. A unigene with a full ORF usually represents a full-length cDNA. There are some useful tools for ORF detection. For example, ESTScan (Iseli et al., 1999) can extract coding regions from low-quality ESTs and correct frameshift errors. OrfPredictor (Min et al., 2005) is another program that identifies protein-coding sequences from ESTs by predicting the most probable coding regions from all six translation frames.
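The idea of scanning all six reading frames can be sketched as follows. This is a deliberately simplified illustration, not the algorithm of ESTScan or OrfPredictor: it reports only complete ORFs that begin with ATG and end with a stop codon, and it does not handle partial ORFs, sequencing errors, or ambiguous bases. The example sequence is made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]

def longest_orf(seq):
    """Return (length_nt, strand, frame, orf) for the longest ATG-to-stop ORF in six frames."""
    best = (0, "+", 0, "")
    forward = seq.upper()
    for strand, strand_seq in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand_seq) - 2, 3):
                codon = strand_seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    orf = strand_seq[start:i + 3]
                    if len(orf) > best[0]:
                        best = (len(orf), strand, frame, orf)
                    start = None
    return best

print(longest_orf("CCATGAAATTTGGGTAACC"))
```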

GO annotation can provide description of gene products behaving in a cellular context. Gene functions are placed into three categories: biological processes, cellular components, and molecular functions. Consensus sequences can be linked to GO terms and assigned a possible function by Blast2GO (Conesa et al., 2005).

GO enrichment analysis clusters the most relevant GO terms associated with a certain biological pathway. GOEAST (Zheng and Wang, 2008), Ontologizer (Bauer et al., 2008), GeneTrail (Backes et al., 2007), and the DAVID functional annotation tool (Huang et al., 2009) are useful tools for these analyses.
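One statistic commonly used to test whether a GO term is over-represented in a gene set is the hypergeometric test. The sketch below is an illustration under that assumption; the tools cited above may use other statistics and usually add multiple-testing correction, and the numbers in the example are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(total_genes, genes_with_term, selected_genes, selected_with_term):
    """P(X >= selected_with_term) when selected_genes are drawn without replacement."""
    upper = min(genes_with_term, selected_genes)
    tail = sum(
        comb(genes_with_term, k) * comb(total_genes - genes_with_term, selected_genes - k)
        for k in range(selected_with_term, upper + 1)
    )
    return tail / comb(total_genes, selected_genes)

# Hypothetical example: 10 annotated genes, 3 carry the GO term,
# and all 3 genes in the differentially expressed set carry it.
p_value = hypergeom_enrichment_p(10, 3, 3, 3)
print(p_value)
```

A small p-value suggests that the term appears in the selected gene set more often than expected by chance.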

EST analysis is an efficient approach for gene discovery and gene identification. For instance, from 2001 to 2007, catfish ESTs increased from 10,000 to 44,000 and the number of putative genes increased from 5905 to 25,000 (Li et al., 2007). In Pacific oyster (Crassostrea gigas), 40,845 high-quality ESTs represented 29,745 unique transcribed sequences (Fleury et al., 2009). In gilthead sea bream (Sparus aurata), 30,000 ESTs represented 18,196 putative unigenes (Louro et al., 2010). Currently, over 180 aquaculture species have more than 100 ESTs in dbEST (see Chapter 2 of this volume on existing genome resources for details).

EST analysis can provide comparisons of gene expression profiles in different tissues and conditions. For instance, in a recent study with rainbow trout (Oncorhynchus mykiss), Kondo et al. (2011) sequenced over 30,000 ESTs from rainbow trout adipose tissue. These ESTs were used to search for adipokine-related genes. The results showed that none of them encoded adipokines or the PPAR-γ gene, which play important roles in mammalian adipocytes. Further qRT-PCR results confirmed the EST analysis: rainbow trout adiponectin transcripts were weakly detected in adipose tissue but strongly detected in muscle, suggesting a difference in energy metabolism between fish and mammals (Kondo et al., 2011). Chini et al. (2008) constructed normalized cDNA libraries from liver, ovary and testis in bluefin tuna (Thunnus thynnus), identifying several sequences with known function in other organisms but not previously described in this species. Sequences were also described as being expressed in one, two, or more tissue libraries. Similarly, Zou et al. (2011) constructed normalized cDNA libraries from testis, ovary, and mixed organs of mud crab (Scylla paramamosain). Through EST analysis, sex-specific transcripts were identified.

EST resources provide sequence information for microarray development. For instance, in a recent study, Booman et al. (2011) developed a large-scale oligonucleotide microarray platform containing 20,000 features (20K), which was used to study immune response of the Atlantic cod spleen with stimulation of formalin-killed, atypical Aeromonas salmonicida (Booman et al., 2011). Similarly, oligo microarray for gilthead sea bream (Sparus aurata) was developed based on ESTs, and the microarray was used to identify 1050 differentially expressed genes between two developmental stages (Ferraresso et al., 2008).

Although EST analysis has been important for transcriptome characterization, it is now becoming expensive, relative to several of the most recently developed approaches, as described in the following text. However, EST resources still have a great value to serve as reference for RNA-Seq analysis. We found that ESTs are essential for high-quality reference-guided assembly of next-generation sequencer-generated short reads (Liu et al., 2011).

Microarrays

The microarray is a powerful tool that allows analysis of global gene expression in individual cells or tissues under different conditions (Schena et al., 1995). The core principle of the microarray is dense placement of gene target sequences in a small area, followed by hybridization. Tens of thousands of DNA sequences termed probes are anchored or spotted onto the solid surface of a chip. Fluorescence-labeled probes are used to hybridize with the features on the microarray. The microarray combines simple nucleic acid hybridization with high-density spotting robots, fluorescence-based signal detection and high-resolution laser scanners (Peatman and Liu, 2007). High-density spotting robots and photolithography allow each feature to be placed accurately on the slide at high density. Fluorescence labeling provides a much clearer signal than traditional radioactive labeling. Moreover, the high-resolution laser scanner allows accurate quantification of the fluorescence signal.

Based on construction and sample labeling, there are two primary approaches to DNA microarrays used in aquaculture species: spotted arrays (or printed arrays) and in situ arrays. Spotted arrays are constructed by spotting cDNA, small fragments of PCR products, or long oligos using a robot. This technique is adopted by most researchers to produce “in-house” printed microarrays because it is relatively low-cost and flexible. The researchers can choose the probes, generate their own probes, spot the array, hybridize the samples to the array, and scan the arrays with their own machine. However, it is labor-intensive. The number of spots (features) is limited to avoid cross-contamination. Two fluorescent dyes, such as Cy3 and Cy5, are usually used for sample labeling with spotted arrays (Schena et al., 1996).

In situ arrays are constructed by synthesizing short oligos directly onto the slide surface by photolithography instead of depositing intact sequences. The oligo probes may be longer, like 65-mer (Mathavan et al., 2005), or shorter, like 24-mer (Peatman et al., 2007). Longer probes are more specific, whereas the shorter ones are cheaper and can be placed at higher densities. To compensate for the short probe length and improve specificity and sensitivity, in situ arrays contain high-density features, usually with multiple probes per target (Miller and Tang, 2009). A perfect match (PM) and mismatch (MM) system is used to further improve the specificity (Irizarry et al., 2003; Han et al., 2004). MM probes contain one or more mismatched nucleotides within the PM probe sequences and act as a negative control to detect false-positive signals resulting from nonspecific cross-hybridization.
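The role of the MM probes as a negative control can be illustrated with a simple calculation. The sketch below averages the PM minus MM differences across the probe pairs of one target; it is a simplified illustration rather than the summarization method of any particular vendor software, and the intensity values are hypothetical.

```python
def probe_set_signal(pm_intensities, mm_intensities):
    """Average PM - MM difference across probe pairs, floored at zero for each pair."""
    if len(pm_intensities) != len(mm_intensities):
        raise ValueError("each PM probe needs a matching MM probe")
    diffs = [max(pm - mm, 0.0) for pm, mm in zip(pm_intensities, mm_intensities)]
    return sum(diffs) / len(diffs)

# Hypothetical intensities for four probe pairs targeting one transcript.
pm = [1200.0, 900.0, 1500.0, 400.0]
mm = [200.0, 300.0, 500.0, 450.0]
print(probe_set_signal(pm, mm))
```

In the last pair the MM signal exceeds the PM signal, which suggests that the PM reading there is mostly nonspecific hybridization, so that pair contributes nothing to the estimate.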

Affymetrix (http://www.affymetrix.com/) is one of the most widely known manufacturers of in situ arrays. Semiconductor-based photochemical synthesis and photolithographic masks are used to synthesize oligo probes for the Affymetrix GeneChip. The photolithographic masks either block or allow light to reach the microarray surface. In the area the mask covers, the addition of nucleotides is prevented because the UV light is blocked, whereas in the area exposed to the UV light, the specific nucleotide can be added. After many cycles of unmasking, nucleotide addition, and masking, the sequences of all oligo probes are fully constructed. Another commercial microarray manufacturer, Roche NimbleGen (http://www.nimblegen.com/), has developed maskless array synthesis, which uses digital mirrors instead of photolithographic masks (Nuwaysir et al., 2002) at a significantly lower start-up cost. Agilent Technologies constructs the oligo probes for in situ arrays on glass slides by inkjet printing, using neither photolithographic masks nor digital mirrors. Instead of the Cy3/Cy5 labeling system, samples for in situ arrays are usually biotin–streptavidin labeled. No matter which platform is used, spotted array or in situ array, the basic procedure for a gene expression experiment is similar, starting with RNA. As shown in Figure 1.1, the RNA is extracted from the sample of interest and, after quantification and a quality check, reverse transcribed to cDNA. The cDNA is fluorescently labeled and hybridized to the probes on the microarray. The hybridization results in a fluorescence signal, which can be measured by a fluorescence scanner and then analyzed using software, e.g., R/Bioconductor (http://www.bioconductor.org). Background correction and data normalization are then conducted to minimize variation caused by nonbiological effects (Xiang and Chen, 2000), followed by cluster analysis (Eisen et al., 1998), which establishes gene expression patterns and defines the relationships between gene expression profiles across different samples (e.g., treatment sample vs. control sample).
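A minimal sketch of background correction and normalization for a two-color array is shown below. It assumes a single background value per channel and a global median-centering normalization, which in turn assumes that most genes do not change between samples; these are illustrative simplifications, and packages such as those in R/Bioconductor offer more refined methods. All intensity values are hypothetical.

```python
import math
from statistics import median

def normalized_log_ratios(treatment, control, background_t, background_c):
    """Background-subtract each channel, then center the log2 ratios on their median."""
    ratios = []
    for t, c in zip(treatment, control):
        t_corr = max(t - background_t, 1.0)  # floor avoids log of zero or negatives
        c_corr = max(c - background_c, 1.0)
        ratios.append(math.log2(t_corr / c_corr))
    center = median(ratios)
    return [r - center for r in ratios]

# Hypothetical raw intensities for five features; the treatment channel is
# systematically brighter (a dye or labeling effect), except for the last feature.
treatment_raw = [1100.0, 2100.0, 4100.0, 8100.0, 600.0]
control_raw = [600.0, 1100.0, 2100.0, 4100.0, 2100.0]
print(normalized_log_ratios(treatment_raw, control_raw, 100.0, 100.0))
```

After centering, the systematic two-fold channel bias disappears and only the last feature stands out as differentially expressed.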

Figure 1.1 A schematic presentation of a microarray experiment in which the expression of two samples is compared, with A being the sample under treatment and B being the control sample. RNA is separately isolated from the samples, and fluorescently-labeled probes are made separately with different labels, e.g., Cy3 for treatment (green), and Cy5 for control (red). The probes are simultaneously used to hybridize an array containing the features representing the transcripts of the organism. After hybridization, signals are scanned and analyzed. If the signals are high in the treatment, green will be detected, and if the signals are high in control, red will be detected; if the signals are equal, yellow will be detected. Based on the relative signals of red and green, expression levels are determined.

When starting a microarray experiment, researchers need to keep in mind that all designs should meet the standards of the Microarray Gene Expression Data Society (MGED) and be compliant with the Minimum Information About a Microarray Experiment (MIAME) guidelines (http://www.mged.org/Workgroups/MIAME/miame.html). Depending on the biological questions of interest and the availability of financial and genetic resources, researchers need to make appropriate decisions when designing the microarray experiment.

In recent years, as more and more genetic resources such as ESTs, transcriptome sequences, and whole genome sequences have become available, microarray technologies have advanced dramatically and been broadly applied. A variety of microarrays including low-density or high-density cDNA arrays and oligo arrays have been developed in aquaculture species, such as zebrafish (Ton et al., 2002; Mathavan et al., 2005), Salmonidae (von Schalburg et al., 2005; Koop et al., 2008), catfish (Li and Waldbieser, 2006; Peatman et al., 2007), Atlantic cod (Booman et al., 2011; Edvardsen et al., 2011), shrimp (Wongsurawat et al., 2010; Aoki et al., 2011; Leelatanawit et al., 2011), oyster (Wang et al., 2010; Dheilly et al., 2011), and so on. See Chapter 2 for detailed information on microarray resources. The applications of microarrays in aquaculture species are mainly focused on the following aspects: (1) Development: Determining how genes interact and change their expression levels during development has been the main goal of developmental biology. The microarray, an efficient technology for globally identifying patterns of gene expression, is well utilized in aquaculture for studies of development, especially embryogenesis. Ton et al. (2002) constructed a zebrafish cDNA array to reveal dynamic changes in the expression levels of genes involved in development. An Atlantic cod cDNA microarray representing 7000 genes was used to analyze the temporal activity of the transcriptome during early cod embryogenesis (Drivenes et al., 2011). (2) Immunity or disease resistance: Many studies have used microarrays to screen for and identify genes that are involved in the immune system and disease resistance. For instance, Meijer et al. (2005) used a zebrafish oligo array containing 16K features to analyze the host transcriptome response to Mycobacterium marinum infection at the organismal level. 28K oligo arrays have been developed in catfish to identify immune-related genes (Peatman et al., 2007). (3) Response to environmental variation or stress: Environmental variations, such as hypoxia, water temperature, and salinity, cause changes in the physiology and gene expression of aquaculture species. A microarray containing 8046 medaka unigenes was developed to measure gene expression profiles in the brain, gill, and liver of medaka after exposure to hypoxia (Ju et al., 2007). Microarray analyses have also been conducted on gene expression changes in response to water temperature (Kassahn et al., 2007; Hirayama et al., 2008). (4) Reproduction: Reproduction is an important trait in the aquaculture industry. Karoonuthaisiri et al. (2009) utilized a cDNA microarray to screen for reproduction-related genes in giant tiger shrimp, and several transcripts were identified that play important roles during shrimp ovarian development.

Currently, microarrays are mainly used to accelerate gene expression analysis under various experimental conditions. In the future, it looks promising to use microarrays for single nucleotide polymorphism (SNP) analysis, quantitative trait loci (QTL) mapping, and disease diagnosis. However, microarray study in aquaculture species is in its infancy, mostly because whole genome sequences are incomplete for most aquaculture species. It is an essential task for the aquaculture community to exploit and adapt these advances for their respective species.

High Throughput Sequencing of mRNA (RNA-Seq)

RNA-Seq takes advantage of high-throughput DNA sequencing technology to capture the complete set of mRNA transcripts in a cell of an organism (Nagalakshmi et al., 2010). In this approach, mRNA is reverse transcribed into cDNA and fragmented, then sequenced using a next-generation technology to generate reads that can be assembled to cover a good portion of the transcripts, if not the full length of transcripts (see illustrations in Figure 1.2). Based upon different choices of sequencing technology, the sequencing yields and read lengths vary.

Figure 1.2 A schematic presentation of RNA-Seq. The extracted RNA is first converted into a library of complementary DNA (cDNA) fragments through either RNA fragmentation (left) or DNA fragmentation (right). Sequencing adaptors (depicted by short red bars and short purple bars) are subsequently ligated to each cDNA fragment (green lines) and short sequence reads (single end or paired ends) from each cDNA are generated using high-throughput sequencing technology. The resulting sequence reads [short lines beneath the genome sequence with three genes shown (fat blue bars)] are aligned with the reference genome to evaluate gene expression by counting mapped reads. In the given example, the gene on the far left is expressed at a very high level, and the gene in the middle is expressed at a relatively lower level.

Currently, three main next-generation sequencing platforms are widely used for RNA-Seq: 454, Illumina, and ABI SOLiD. Among these platforms, the throughput varies from hundreds of thousands of reads for the 454 system to hundreds of millions of reads for the Illumina and ABI SOLiD systems (Marguerat and Bahler, 2010). The read lengths typically range from 30–100 bp for Illumina and SOLiD to 200–500 bp for 454. In general, Illumina and SOLiD platforms are relatively inexpensive, while the 454 technology offers longer reads, but is more expensive per run. Illumina, SOLiD and 454 technologies can be combined in a “hybrid assembly” strategy: short reads that are sequenced at a greater depth are assembled into contigs, and long reads are subsequently used to scaffold the contigs and resolve variants (Martin and Wang, 2011).

Two main approaches can be used for RNA-Seq data analysis. One is to map the resulting reads to a reference genome or reference transcriptome. This approach is usually taken in well-studied species with a sequenced genome. The other is de novo assembly, for species without a reference genome or transcriptome. Either way, a genome-scale map composed of the transcriptional structure and/or the level of expression for each gene can be generated (Wang et al., 2009).
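Once reads have been mapped, expression levels are typically derived from read counts. One widely used normalization, which is not described in this chapter and is introduced here as an assumption, is reads per kilobase of transcript per million mapped reads (RPKM); it corrects for both gene length and sequencing depth. The gene names, lengths, and counts below are hypothetical.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

# Hypothetical mapped-read counts and gene lengths for three genes.
counts = {"gene1": 2000, "gene2": 500, "gene3": 0}
lengths_bp = {"gene1": 2000, "gene2": 1000, "gene3": 1500}
total_mapped = 10_000_000

expression = {g: rpkm(counts[g], lengths_bp[g], total_mapped) for g in counts}
print(expression)
```

Although gene1 has four times as many reads as gene2, it is also twice as long, so its length-corrected expression is only twice as high.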

RNA-Seq, as a high-throughput sequencing method, is being widely used in functional genomics studies of aquaculture species and related model fish species such as zebrafish (Hegedus et al., 2009; Aanes et al., 2011; Bontems et al., 2011; Ordas et al., 2011; Rosel et al., 2011; Vesterlund et al., 2011), catfish (Liu et al., 2011), Japanese sea bass (Xiang et al., 2010), Atlantic cod (Johansen et al., 2011), large yellow croaker (Mu et al., 2010), rainbow trout (Lewis et al., 2010; Salem et al., 2010; Purcell et al., 2011), European eel (Coppe et al., 2010), and spotted gar (Amores et al., 2011). The applications of RNA-Seq in aquaculture species are focused on these aspects: (1) Gene expression profiling, (2) Transcriptome characterization and gene annotation, and (3) Identification of gene-associated markers.

RNA-Seq can be used to identify differentially expressed genes under different treatments by measuring expression levels. For instance, in studies of transcriptome changes in zebrafish with mycobacterium infection (Hegedus et al., 2009) and zebrafish embryos with Salmonella infection (Ordas et al., 2011), Illumina's DGE system revealed the high degree of transcriptional complexity of the host response to both infections and resulted in the discovery of a common set of infection-responsive genes with induced expression in infected individuals. Stockhammer et al. (2010) used the combination of microarray analysis and whole transcriptome deep sequencing to analyze the response to bacterial infection with emphasis on identification of a gene set whose responsiveness during infection is highly dependent on Traf6. In a study with large yellow croaker infected with Aeromonas hydrophila, changes in multiple signaling pathways involved in immunity were revealed, which will facilitate a comprehensive understanding of the mechanisms involved in the immune response to bacterial infection (Mu et al., 2010). Deep sequencing-based transcriptome profiling analysis of bacteria-challenged Japanese sea bass provided insight into the immune-relevant genes in marine fish. In that study, over 1000 strongly infection-responsive transcripts were identified as significantly up- or down-regulated, suggesting considerable alteration of the host transcriptome profile after Vibrio harveyi infection (Xiang et al., 2010). In a study focusing on gene expression changes during development, RNA-Seq was used to compare the transcription profiles of four early developmental stages in zebrafish on a global scale. An enrichment of gene transcripts with molecular functions of DNA binding, protein folding and processing as well as metal ion binding was observed with progression of development (Vesterlund et al., 2011).

Transcriptome characterization and gene annotation is another area where RNA-Seq can be applied. Transcriptome data generated through RNA-Seq can provide accurate and effective reagents for annotating protein-coding genes. Even when a complete genome sequence is available, a complete genome annotation requires knowledge of all transcription start and polyadenylation sites, exon–intron boundaries, splice variants, and regulatory sequences (Morozova et al., 2009). Because of its longer read lengths compared with other new sequencing technologies, 454 has been effectively used for de novo assembly of the transcriptome in several aquaculture species including lake sturgeon (Hale et al., 2009), rainbow trout (Salem et al., 2010), Atlantic cod (Johansen et al., 2011), and Yesso scallop (Hou et al., 2011). In an early study characterizing gene expression in the gonad transcriptome of polyploid lake sturgeon using 454 sequencing, thousands of contigs were assembled and characterized from 454 reads, providing an overview of transcription in lake sturgeon gonads, including the discovery of genes and SNPs (Hale et al., 2009). High-throughput sequencing of the rainbow trout transcriptome using 454 sequencing technology significantly increased the suite of ESTs available for rainbow trout, allowing improved assembly and annotation of the transcriptome (Salem et al., 2010). In a more recent study, de novo assembly of the guppy (Poecilia reticulata) transcriptome using 454 sequence reads was conducted to detect sex-specific transcripts and provide a reference for gene expression analysis (Fraser et al., 2011).

Although the shorter reads produced by Illumina or SOLiD may be more challenging for de novo sequence assembly than those from the 454 technology, preexisting ESTs produced by Sanger sequencing can be used to facilitate the assembly (Liu et al., 2011), and algorithms for de novo assembly of short reads are being developed (e.g., Grabherr et al., 2011). Xiang et al. assembled the short reads from Illumina RNA-Seq deep sequencing to generate nonredundant consensus sequences, which were subsequently used as references for DGE profile analysis (Xiang et al., 2010). RNA deep sequencing of the Atlantic cod transcriptome was conducted using the combination of 454, Illumina and ABI SOLiD platforms to increase the efficiency of assembly (Johansen et al., 2011).

RNA-Seq has been extensively used for the identification of gene-associated markers. In catfish, hundreds of thousands of gene-associated SNPs have been identified by deep sequencing of RNA from many individuals of both channel catfish and blue catfish, which will be used in the development of high-density catfish SNP chips for genome-wide association studies (GWAS) (Liu et al., 2011). In a study to understand the adaptive divergence between dwarf and normal lake whitefish species, 454 sequencing was used to generate a set of SNP markers; 89 SNPs showed pronounced allele frequency differences between sympatric normal and dwarf whitefish (Renaut et al., 2010).
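The comparison of allele frequencies between populations can be sketched as follows. This is an illustration only: the SNP names, read counts, and the 0.5 difference threshold are hypothetical, and the statistical methods used in the cited studies are not described here.

```python
def allele_frequency(ref_reads, alt_reads):
    """Frequency of the alternative allele estimated from read counts."""
    return alt_reads / (ref_reads + alt_reads)

def divergent_snps(pop1_counts, pop2_counts, min_difference=0.5):
    """SNPs whose alternative-allele frequency differs between populations by the threshold or more."""
    selected = []
    for snp in pop1_counts:
        f1 = allele_frequency(*pop1_counts[snp])
        f2 = allele_frequency(*pop2_counts[snp])
        if abs(f1 - f2) >= min_difference:
            selected.append(snp)
    return sorted(selected)

# Hypothetical (reference, alternative) read counts per SNP in two populations.
normal = {"snp1": (90, 10), "snp2": (50, 50)}
dwarf = {"snp1": (20, 80), "snp2": (45, 55)}
print(divergent_snps(normal, dwarf))
```

Read counts from pooled RNA are only a proxy for population allele frequencies, because expression differences between alleles and individuals can distort them.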

Comparisons of Gene Expression Profiling Techniques

Hybridization-based approaches, represented by microarrays, are currently the most popular for gene expression profiling and are readily affordable for many laboratories. Various commercial and academic microarray platforms have been developed that vary in genome coverage, availability, specificity, and sensitivity (e.g., Affymetrix, Agilent, and NimbleGen). Microarray approaches are high throughput and relatively inexpensive. However, these methods have several limitations, including reliance on prior knowledge of the genome sequence; high background levels owing to cross-hybridization (Royce et al., 2007); and a limited dynamic range of detection because of both background and saturation of signals. Moreover, comparing gene expression levels across different microarray experiments is often difficult and can require complicated normalization methods (Wang et al., 2009), although meta-analysis is possible.

In contrast to microarrays, direct sequencing of cDNAs is a digital method that measures gene expression by counting mRNA molecules in the sample. Sequencing of cDNA or EST libraries was initially conducted using Sanger sequencing technology (Gerhard et al., 2004), but Sanger sequencing is relatively low throughput, expensive, and laborious. Therefore, EST analysis using Sanger sequencing is no longer a good choice for gene expression profiling.
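The counting principle behind sequencing-based expression measurement can be illustrated with a minimal Python sketch. It assumes that each sequenced cDNA or EST has already been assigned to a gene; the gene names, the read list, and the function name counts_per_million are hypothetical and serve only as an illustration. Scaling to counts per million is one simple way to compare libraries of different sizes, not a method prescribed by the studies cited here.

```python
from collections import Counter


def counts_per_million(read_gene_ids):
    """Turn per-read gene assignments into counts per million reads."""
    counts = Counter(read_gene_ids)      # digital count: one tally per sequenced molecule
    total = sum(counts.values())         # library size
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


# Hypothetical library: each entry is the gene a sequenced cDNA was assigned to.
reads = ["geneA"] * 6 + ["geneB"] * 3 + ["geneC"]
cpm = counts_per_million(reads)

for gene in sorted(cpm):
    print(gene, cpm[gene])
```

Because the measurement is a count rather than a hybridization signal, it has no saturation ceiling, but rarely expressed genes are detected only if enough molecules are sequenced, which is why throughput matters.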

More recently, RNA-Seq has become an alternative to microarrays (Wang et al., 2009). RNA-Seq provides many advantages over traditional tag-based transcriptome analysis or microarray analysis: (1) Like traditional SAGE analysis and MPSS, RNA-Seq does not require any prior knowledge of genome sequence information; (2) Its extremely high throughput greatly improves the coverage of the transcriptome. Currently, the Illumina HiSeq 2000 can generate over 200 million reads per lane, and this output can be tripled with the newest chemistry, i.e., to over 600 million reads per lane, allowing capture of the vast majority of the transcriptome, including many of the rarely expressed transcripts; (3) It overcomes many of the shortcomings of microarrays, such as biases introduced during hybridization; (4) Its cost is relatively low. One lane of RNA-Seq costs approximately $4000, while the construction of a comprehensive microarray often costs more; (5) The large number of reads allows assembly of sequences into contigs for additional studies, such as analysis of gene structures. However, this strength is also a weakness, as assembly of RNA-Seq reads without a reference genome or transcriptome poses challenges. De novo assembly, in particular, requires a greater level of bioinformatic expertise.

While microarrays have limitations for in-depth gene expression analyses, they have the advantage of being very useful for high-throughput analysis of multiple samples (Reinartz et al., 2002). Therefore, it may be helpful to consider microarrays and RNA-Seq as complementary tools suited to different types of experiments. For instance, to generate in-depth and quantitative gene expression data for species lacking genome information, such as most aquaculture species, RNA-Seq would be the technology of choice. Once expressed sequence data have been generated, it may be necessary to examine whether sets of genes are differentially expressed in a large number of samples (e.g., individual gene expression variation) or under different conditions (e.g., different treatments); in that case, microarray analysis may be a better choice in terms of results and costs.

Functional Correlation of Gene Positions

In addition to expression correlation, gene functions can also be inferred from the correlation of gene positions with traits that have been genetically mapped to the same genomic location. To locate the positions of genes that are responsible for a certain trait, GWAS can be conducted. GWAS is a quantitative approach that analyzes the association between whole-genome DNA polymorphisms and a phenotypic trait, thereby localizing the genes underlying the trait.

Genome-Wide Association Studies (GWAS)

GWAS is a holistic whole-genome approach to robustly determine the association of DNA polymorphisms with correlated phenotypic traits. Most often, GWAS requires genome-wide polymorphic markers such as SNPs and at least hundreds of individuals with phenotype information. We must stress the number of markers used because the more markers there are, the better they cover the entire genome; we must also stress the number of individuals because the use of fewer than 200 individuals does not provide confidence in the association. Many scientists feel that SNP association studies conducted on fewer than 200 animals, without either a confirmation population or functional data to support the association data, are not reliable (e.g., James Reecy, Iowa State University, personal communication). The basis for GWAS analysis is that, on a genome scale, most SNPs are distributed randomly in relation to the trait of interest, and only those SNPs tightly linked with the genes underlying the trait are in linkage disequilibrium (LD) with it. GWAS usually involve tens or hundreds of thousands of SNPs that are tested on hundreds or thousands of individuals. These studies normally compare the DNA of two groups of participants: individuals with the trait, e.g., resistant fish, and individuals without the trait, e.g., susceptible fish.
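The two-group comparison described above can be sketched in Python as a per-SNP allelic test. This is a minimal illustration under stated assumptions, not the procedure of any study cited here: it assumes biallelic SNPs, uses a Pearson chi-square test on a 2x2 table of allele counts (1 degree of freedom), and applies a Bonferroni correction for the number of SNPs tested. The SNP names, the allele counts, and the function names allelic_chi_square and gwas_scan are hypothetical. Real analyses typically rely on dedicated software and also account for population structure and relatedness, which this sketch ignores.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square test (1 d.f.) on a 2x2 table of allele counts.

    case_counts, control_counts: (count of allele 1, count of allele 2).
    Returns (chi2, p_value).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        return 0.0, 1.0  # monomorphic SNP or empty group: nothing to test
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 d.f., the chi-square upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def gwas_scan(snp_tables, alpha=0.05):
    """Test every SNP and keep those passing a Bonferroni-corrected threshold."""
    threshold = alpha / len(snp_tables)
    hits = {}
    for snp, (cases, controls) in snp_tables.items():
        chi2, p_value = allelic_chi_square(cases, controls)
        if p_value < threshold:
            hits[snp] = (chi2, p_value)
    return hits


# Hypothetical allele counts for 200 resistant and 200 susceptible fish
# (400 alleles per group at each SNP).
snp_tables = {
    "snp_1": ((300, 100), (200, 200)),  # strong allele frequency difference
    "snp_2": ((210, 190), (200, 200)),  # small difference
    "snp_3": ((150, 250), (155, 245)),  # essentially no difference
}
hits = gwas_scan(snp_tables)
print(sorted(hits))
```

With these invented counts, only the first SNP passes the corrected threshold. The Bonferroni step shows why large marker panels demand large sample sizes: the more SNPs are tested, the stronger the evidence each one must carry to be declared associated.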

GWAS has been extensively used for human disease research. For instance, in 2005, an association was found between age-related macular degeneration (ARMD) and a variation in the gene for complement factor H (CFH). Complement factor H is a protein that regulates the complement system, which is involved in inflammation. Use of variation in the CFH gene, along with four other variants, can predict half the risk of ARMD between siblings, and this work was regarded as among the most successful examples of GWAS (Klein et al., 2005). Similarly, a GWAS involving genotyping of around 400K SNPs in a French case–control cohort allowed detection of an association between Type 2 diabetes and variation at several SNPs in the genes TCF7L2, SLC30A8, and others (Sladek et al., 2007), which can explain a substantial portion of disease risk. In 2007, the Wellcome Trust Case Control Consortium carried out a GWAS of 14,000 cases of seven common diseases: coronary heart disease, Type 1 diabetes, Type 2 diabetes, rheumatoid arthritis, Crohn's disease, bipolar disorder, and hypertension. This study was successful in uncovering many new genes underlying these diseases (The Wellcome Trust Case Control Consortium, 2007).

Although it is very powerful, GWAS has not yet been applied to aquaculture species. The primary reason has been the lack, until recently, of genome-wide polymorphic markers. Now that a large number of SNPs are available for a number of aquaculture species, future application of GWAS in aquaculture species is clearly technically feasible. However, the limited funding available for aquaculture species remains a major challenge, as genotyping a large number of polymorphic markers in a large number of individuals is expensive.

The problems associated with functional inference based on genomic positions come from the limited precision of GWAS analysis. On the one hand, the genomic locations suspected of being involved in the trait can still span large genomic segments, e.g., millions of base pairs that include many genes. On the other hand, GWAS may point to several or even many genomic locations for the trait of interest, complicating further functional analysis.

Analysis of Quantitative Trait Loci (QTL)