Computational and Statistical Methods for Protein Quantification by Mass Spectrometry - Ingvar Eidhammer - E-Book


Description

The definitive introduction to data analysis in quantitative proteomics

This book provides all the necessary knowledge about mass spectrometry based proteomics methods and computational and statistical approaches to pursue the planning, design and analysis of quantitative proteomics experiments. The author’s carefully constructed approach allows readers to easily make the transition into the field of quantitative proteomics. Through detailed descriptions of wet-lab methods, computational approaches and statistical tools, this book covers the full scope of a quantitative experiment, allowing readers to acquire new knowledge as well as acting as a useful reference work for more advanced readers.

Computational and Statistical Methods for Protein Quantification by Mass Spectrometry:

  • Introduces the use of mass spectrometry in protein quantification and how the bioinformatics challenges in this field can be solved using statistical methods and various software programs.
  • Is illustrated by a large number of figures and examples as well as numerous exercises.
  • Provides both clear and rigorous descriptions of methods and approaches.
  • Is thoroughly indexed and cross-referenced, combining the strengths of a text book with the utility of a reference work.
  • Features detailed discussions of both wet-lab approaches and statistical and computational methods.

With clear and thorough descriptions of the various methods and approaches, this book is accessible to biologists, informaticians, and statisticians alike and is aimed at readers across the academic spectrum, from advanced undergraduate students to post doctorates entering the field.


Number of pages: 510

Year of publication: 2012




Contents

Cover

Title Page

Copyright

Preface

Terminology

Acknowledgements

Chapter 1: Introduction

1.1 The composition of an organism

1.2 Homeostasis, physiology, and pathology

1.3 Protein synthesis

1.4 Site, sample, state, and environment

1.5 Abundance and expression – protein and proteome profiles

1.6 The importance of exact specification of sites and states

1.7 Relative and absolute quantification

1.8 In vivo and in vitro experiments

1.9 Goals for quantitative protein experiments

1.10 Exercises

Chapter 2: Correlations of mRNA and protein abundances

2.1 Investigating the correlation

2.2 Codon bias

2.3 Main results from experiments

2.4 The ideal case for mRNA-protein comparison

2.5 Exploring correlation across genes

2.6 Exploring correlation within one gene

2.7 Correlation across subsets

2.8 Comparing mRNA and protein abundances across genes from two situations

2.9 Exercises

2.10 Bibliographic notes

Chapter 3: Protein level quantification

3.1 Two-dimensional gels

3.2 Protein arrays

3.3 Western blotting

3.4 ELISA – Enzyme-Linked Immunosorbent Assay

3.5 Bibliographic notes

Chapter 4: Mass spectrometry and protein identification

4.1 Mass spectrometry

4.2 Isotope composition of peptides

4.3 Presenting the intensities – the spectra

4.4 Peak intensity calculation

4.5 Peptide identification by MS/MS spectra

4.6 The protein inference problem

4.7 False discovery rate for the identifications

4.8 Exercises

4.9 Bibliographic notes

Chapter 5: Protein quantification by mass spectrometry

5.1 Situations, protein, and peptide variants

5.2 Replicates

5.3 Run – experiment – project

5.4 Comparing quantification approaches/methods

5.5 Classification of approaches for quantification using LC-MS/MS

5.6 The peptide (occurrence) space

5.7 Ion chromatograms

5.8 From peptides to protein abundances

5.9 Protein inference and protein abundance calculation

5.10 Peptide tables

5.11 Assumptions for relative quantification

5.12 Analysis for differentially abundant proteins

5.13 Normalization of data

5.14 Exercises

5.15 Bibliographic notes

Chapter 6: Statistical normalization

6.1 Some illustrative examples

6.2 Non-normally distributed populations

6.3 Testing for normality

6.4 Outliers

6.5 Variance inequality

6.6 Normalization and logarithmic transformation

6.7 Exercises

6.8 Bibliographic notes

Chapter 7: Experimental normalization

7.1 Sources of variation and level of normalization

7.2 Spectral normalization

7.3 Normalization at the peptide and protein level

7.4 Normalizing using sum, mean, and median

7.5 MA-plot for normalization

7.6 Local regression normalization – LOWESS

7.7 Quantile normalization

7.8 Overfitting

7.9 Exercises

7.10 Bibliographic notes

Chapter 8: Statistical analysis

8.1 Use of replicates for statistical analysis

8.2 Using a set of proteins for statistical analysis

8.3 Missing values

8.4 Prediction and hypothesis testing

8.5 Statistical significance for multiple testing

8.6 Exercises

8.7 Bibliographic notes

Chapter 9: Label based quantification

9.1 Labeling techniques for label based quantification

9.2 Label requirements

9.3 Labels and labeling properties

9.4 Experimental requirements

9.5 Recognizing corresponding peptide variants

9.6 Reference free vs. reference based

9.7 Labeling considerations

9.8 Exercises

9.9 Bibliographic notes

Chapter 10: Reporter based MS/MS quantification

10.1 Isobaric labels

10.2 iTRAQ

10.3 TMT – Tandem Mass Tag

10.4 Reporter based quantification runs

10.5 Identification and quantification

10.6 Peptide table

10.7 Reporter based quantification experiments

10.8 Exercises

10.9 Bibliographic notes

Chapter 11: Fragment based MS/MS quantification

11.1 The label masses

11.2 Identification

11.3 Peptide and protein quantification

11.4 Exercises

11.5 Bibliographic notes

Chapter 12: Label based quantification by MS spectra

12.1 Different labeling techniques

12.2 Experimental setup

12.3 MaxQuant as a model

12.4 The MaxQuant procedure

12.5 Exercises

12.6 Bibliographic notes

Chapter 13: Label free quantification by MS spectra

13.1 An ideal case – two protein samples

13.2 The real world

13.3 Experimental setup

13.4 Forms

13.5 The quantification process

13.6 Form detection

13.7 Pair-wise retention time correction

13.8 Approaches for form-tuple detection

13.9 Pair-wise alignment

13.10 Using a reference run for alignment

13.11 Complete pair-wise alignment

13.12 Hierarchical progressive alignment

13.13 Simultaneous iterative alignment

13.14 The end result and further analysis

13.15 Exercises

13.16 Bibliographic notes

Chapter 14: Label free quantification by MS/MS spectra

14.1 Abundance measurements

14.2 Normalization

14.3 Proposed methods

14.4 Methods for single abundance calculation

14.5 Methods for relative abundance calculation

14.6 Comparing methods

14.7 Improving the reliability of spectral count quantification

14.8 Handling shared peptides

14.9 Statistical analysis

14.10 Exercises

14.11 Bibliographic notes

Chapter 15: Targeted quantification – Selected Reaction Monitoring

15.1 Selected Reaction Monitoring – the concept

15.2 A suitable instrument

15.3 The LC-MS/MS run

15.4 Label free and label based quantification

15.5 Requirements for SRM transitions

15.6 Finding optimal transitions

15.7 Validating transitions

15.8 Assay development

15.9 Exercises

15.10 Bibliographic notes

Chapter 16: Absolute quantification

16.1 Performing absolute quantification

16.2 Label based absolute quantification

16.3 Label free absolute quantification

16.4 Exercises

16.5 Bibliographic notes

Chapter 17: Quantification of post-translational modifications

17.1 PTM and mass spectrometry

17.2 Modification degree

17.3 Absolute modification degree

17.4 Relative modification degree

17.5 Discovery based modification stoichiometry

17.6 Exercises

17.7 Bibliographic notes

Chapter 18: Biomarkers

18.1 Evaluation of potential biomarkers

18.2 Evaluating threshold values for biomarkers

18.3 Exercises

18.4 Bibliographic notes

Chapter 19: Standards and databases

19.1 Standard data formats for (quantitative) proteomics

19.2 Databases for proteomics data

19.3 Bibliographic notes

Chapter 20: Appendix A: Statistics

20.1 Samples, populations, and statistics

20.2 Population parameter estimation

20.3 Hypothesis testing

20.4 Performing the test – test statistics and p-values

20.5 Comparing means of populations

20.6 Comparing variances

20.7 Percentiles and quantiles

20.8 Correlation

20.9 Regression analysis

20.10 Types of values and variables

Chapter 21: Appendix B: Clustering and discriminant analysis

21.1 Clustering

21.2 Discriminant analysis

21.3 Bibliographic notes

Bibliography

Index

This edition first published 2013 © 2013 John Wiley & Sons, Ltd

Registered office: John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data applied for.

A catalogue record for this book is available from the British Library.

ISBN: 978-1-119-96400-1

Preface

Mass spectrometry based proteomics has quickly become the method of choice for the high-throughput analysis of entire proteomes. Through the advent of a variety of powerful approaches for quantitative proteomics, these experiments now also yield large volumes of quantitative data on protein expression levels.

The ability to quantify a proteome in high throughput opens up many new possibilities for research, including protein biomarker discovery, fundamental insight into the dynamics of a cell or tissue proteome over time, and analysis of expression changes as a function of (external) perturbations. Such analyses in turn benefit both biological insight and clinical applications, and form a crucial element of the emerging field of systems biology.

Adding quantitative data to the already complex output of a proteomics experiment does, however, further increase the importance of correct data processing and results evaluation. As such, the computational methods employed to analyze such data are increasingly seen as a crucial step in the overall workflow.

With this in mind, the intention of this book is to systematize and describe the different experimental approaches for protein quantification by mass spectrometry, and to present the corresponding computational and statistical methods used to analyze data from such experiments.

The first three chapters serve as an introduction, presenting terms, concepts, and protein quantification as well as its relation to mRNA quantification. Chapter 4 then provides a brief introduction to mass spectrometry based proteomics, of which a more detailed description can be found in Eidhammer et al. (2007). Chapter 5 systematizes and classifies the different quantification approaches that are described in more detail in Chapters 9 to 15, each of which is dedicated to a specific approach. Statistical data processing and analysis for quantitative proteomics experiments form the subject of Chapters 6, 7, and 8. The final four chapters (16 to 19) deal with more specific quantification tasks and a few orthogonal topics associated with protein quantification.

Given that statistics form an essential part of the computational methods used to process, analyze, and interpret quantitative mass spectrometry data, we have also included two appendices that can be used as a reference on the statistical knowledge required for a complete understanding of the methods described in the various chapters.

It is not the goal of this book to provide an exhaustive theoretical foundation for the computational analysis of quantitative proteomics, but rather to present the main challenges, and extract and systematize the common principles used in solving these problems. The presentation is illustrated throughout by numerous figures and examples.

We have tried to restrict the description of a given subject to one section or chapter, but for some of the subjects it is unavoidable to treat them several times in different contexts. In these cases we have included the necessary cross references.

Note that we have also included many references and websites, although we are aware that websites can change rapidly.

The book is directed at biologists, (bio-)informaticians, and statisticians alike, and is aimed at readers across the academic spectrum, from advanced undergraduate students to post docs first entering the field of protein quantification.

Terminology

Many of the terms used in molecular biology and proteomics do not have a unique or commonly accepted definition. We have mainly tried to follow the IUPAC (International Union of Pure and Applied Chemistry) Compendium of analytical nomenclature. IUPAC is currently revising the terms used in the larger field of mass spectrometry, and a first draft exists.

IUPAC: http://www.iupac.org/publications/analytical_compendium
Ongoing revision: http://www.msterms.com/
First draft: http://www.sgms.ch/links/IUPAC_MS_Terms_Draft.pdf

We have endeavored to employ customarily used terms wherever possible, defining our own terms only where it was necessary or more appropriate for clarity. When several synonyms exist for a term, these are also given.

Acknowledgements

Numerous papers and websites have been used in the writing of this book, and citations have been given for these works. However, in order to enhance readability we have tried to minimize literature references in the text, and have instead included a bibliographic section at the end of each chapter. The sources we consulted are detailed there, as well as additional literature that may be interesting or relevant to the reader. It is our sincere hope that this structure will serve to provide the authors of the source literature with the appropriate acknowledgments.

In addition, there are a lot of people whom we have consulted during the work, and we especially want to thank Rein Aasland, Thin Thin Aye, Frode Berven, Niklaas Colaert, Sven Degroeve, Olav Mjaavatten, Jill Anette Opsahl, Eystein Oveland, Kjell Petersen, Pål Puntervoll, and An Staes for helpful discussions and for sharing their insights and profound knowledge.

IE dedicates his work to his wife Maria.

GEE thanks the Centre for Clinical Research, Haukeland University Hospital and the Department of Public Health and Primary Health Care, University of Bergen for approving and providing facilities to work with this book. He also thanks his colleagues in the Western Regional Health Authorities Statisticians Network and in the Life Style Epidemiology Research Group for an inspiring work environment, as well as his co-workers in numerous medical and health related research projects through the years. Finally, GEE sincerely thanks his wife Kirsti, his family and his friends for continuing love and support.

HB would like to thank his friends and colleagues at the Proteomics Unit at the Department of Biomedicine, and at the Department of Informatics / the Computational Biology Unit for numerous interesting discussions and collaborations over the last years. Thanks also to my international collaborators in the extended Computational Omics and Systems Biology group and the PRIDE team at the European Bioinformatics Institute. Our ongoing collaborations make it a lot easier to see how the work I do every day constitutes a small part of the bigger puzzle that is proteomics. Finally, I would like to thank my friends and family for their continuing support.

LM would like to thank Ghent University, VIB, and his Computational Omics and Systems Biology (CompOmics) group members for creating a vibrant and intellectually stimulating atmosphere, and for many useful discussions. Special thanks go out to my wife Leen, and two sons, Ruben and Alexander, for their patience and understanding during the long hours of writing. Ruben, at 4 years of age, was fond of asking why daddy was always working on his computer, while Alexander at 9 months was simply looking bemused at the incessant sound of fingers tapping away at the keyboard. For better or for worse, both of them will now have to learn to live with a lot more fatherly attention!

3

Protein level quantification

Quantitative proteomics experiments have been performed for many years, using different methods and techniques. In this chapter we briefly describe the most common methods that do not rely on the combination of liquid chromatography and mass spectrometry.

3.1 Two-dimensional gels

In general, 2D SDS-PAGE gel quantification is based on the signal intensity of the spot in which the protein has been found. It is important to note that this intensity is an indirect measure, since protein spots are first stained or labeled in order to become visible, and it is the intensity of the stain or label that is subsequently measured. Obviously, spots containing more than one protein present challenges to any such quantitative study as the unexpected proteins add staining intensity that will be incorrectly assumed to be derived from the identified protein. And even if the spot is known to hold more than one protein, it is not possible to deconvolute the contributions of the individual proteins to the total amount of staining.

On the other hand, different variants such as truncated or modified forms of the protein are typically spread over different spots, making the global quantification of a protein’s expression level problematic. Indeed, such an analysis requires the addition of all intensities across all the different variant spots. Missed spots, that is, spots that could not be identified for whatever reason, will thus result in an underestimation of the quantification. Also remember that each individual spot has a chance of containing contaminant proteins that can contribute to staining intensity. The fact that different forms often localize in different spots can however be beneficial if one wants to compare different variants with one another, for example, comparing a phosphorylated form of a protein to its unphosphorylated form.
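The effect of a missed spot on the summed intensity can be illustrated with a minimal sketch. The function name, the use of `None` for a missed spot, and all intensity values below are hypothetical and chosen only for illustration; they are not taken from the book.

```python
from typing import Optional, Sequence


def total_protein_intensity(spot_intensities: Sequence[Optional[float]]) -> float:
    """Sum the stain intensities of all identified variant spots of one protein.

    A missed (unidentified) spot is represented as None and contributes
    nothing, so the returned total can only underestimate the true value.
    """
    return sum((v for v in spot_intensities if v is not None), 0.0)


# Hypothetical intensities for three variant spots of the same protein.
all_found = [1200.0, 300.0, 150.0]
one_missed = [1200.0, None, 150.0]  # assumed: the modified form was not identified

print(total_protein_intensity(all_found))   # 1650.0
print(total_protein_intensity(one_missed))  # 1350.0
```

Note that the sketch ignores contaminant proteins, which would push the total in the opposite direction by adding staining intensity that does not belong to the protein of interest.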

Another difficulty for 2D gel based quantification lies in the usable range of the staining procedures used for quantification. Several staining procedures do not stain at all below a certain amount of protein present, and become saturated above a certain amount of protein present. The interval between these amounts can be considered ‘covered’ by the staining procedure and is referred to as the dynamic range.

In addition to high sensitivity, it is desirable to have a linear response in staining intensity versus amount of protein present over the dynamic range. Most nonfluorescent staining agents have a relatively poor dynamic range, with even the best nonfluorescent technique (silver staining) only yielding a dynamic range of about one order of magnitude. Fluorescent stains can attain the sensitivity of silver staining when used under optimal conditions, and can result in a much better dynamic range, providing up to five orders of magnitude. Fluorescent detection does come with its own caveats, however, including fading signals if the fluorophore is (slowly) decomposed by the influx of light, and interference from the fluorescence of fluorophores that become associated with detergent micelles rather than proteins.
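As a rough illustration of what a dynamic range expressed in orders of magnitude means, the sketch below computes it as the base-10 logarithm of the ratio between the saturation amount and the detection limit, and models an idealized stain that gives no signal below the lower limit, responds linearly in between, and saturates above the upper limit. The function names, the idealized response, and the limits in the example are assumptions made for illustration, not measured values for any specific stain.

```python
import math


def dynamic_range_orders(lower_limit: float, upper_limit: float) -> float:
    """Dynamic range in orders of magnitude: log10(upper_limit / lower_limit)."""
    if lower_limit <= 0 or upper_limit <= lower_limit:
        raise ValueError("need 0 < lower_limit < upper_limit")
    return math.log10(upper_limit / lower_limit)


def observed_signal(amount: float, lower_limit: float, upper_limit: float) -> float:
    """Idealized stain response (an assumption, not a real calibration):
    zero below the detection limit, linear within the dynamic range,
    and constant (saturated) above the upper limit."""
    if amount < lower_limit:
        return 0.0
    return min(amount, upper_limit)


# Hypothetical limits in arbitrary units: a stain covering 1 to 10 units
# spans about one order of magnitude; one covering 1 to 100000 spans about five.
narrow = dynamic_range_orders(1.0, 10.0)
wide = dynamic_range_orders(1.0, 100000.0)

# Outside the covered interval two different amounts become indistinguishable.
saturated_a = observed_signal(50.0, 1.0, 10.0)
saturated_b = observed_signal(500.0, 1.0, 10.0)
```

With the narrow range, amounts of 50 and 500 units both yield the saturated signal, which is why quantitative comparisons are only meaningful for spots that fall inside the dynamic range.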

3.1.1 Comparing results from different experiments – DIGE

One of the most serious issues of 2D gel based quantification lies in the poor reproducibility of gels. As a result, comparing gels is not a simple process. A widely-used technique to avoid the typical low reproducibility between different gels is the fluorescence-based Differential In-Gel Electrophoresis (DIGE) approach. In this technique, the proteins in the different samples are labeled with different fluorescent dyes having different excitation wavelengths. The samples are then mixed, and the proteins separated on the same 2D PAGE gel. Because of the different excitation wavelengths of the labels, separate gel patterns can be obtained for each sample. This effectively allows the samples to be run under identical circumstances, greatly limiting the factors that can cause variation, and the spots from two samples are therefore much more directly comparable. Figure 3.1 illustrates the process.

Figure 3.1 Illustration of the DIGE procedure. Two samples plus a reference sample (usually obtained by pooling) are each labeled with different dyes and then mixed. A 2D gel is then used to separate the proteins. Using different excitation wavelengths of light, three different gel images can be obtained from the same gel, one for each sample.
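One way the three images from a single gel might be combined is sketched below: each sample spot is expressed as a log2 ratio against the same spot in the pooled reference image. The text above states only that a reference sample is included; using it as a common denominator is an assumption of this sketch, and the function name, spot identifiers, and intensities are hypothetical.

```python
import math
from typing import Dict


def log2_ratios_vs_reference(
    sample: Dict[str, float], reference: Dict[str, float]
) -> Dict[str, float]:
    """Per-spot log2(sample / reference).

    Only spots with a positive intensity in both images are kept,
    since the ratio is undefined otherwise.
    """
    ratios: Dict[str, float] = {}
    for spot, intensity in sample.items():
        ref = reference.get(spot)
        if ref is not None and ref > 0 and intensity > 0:
            ratios[spot] = math.log2(intensity / ref)
    return ratios


# Hypothetical spot intensities read from the three images of one gel.
reference = {"spot1": 100.0, "spot2": 100.0}
sample_a = {"spot1": 200.0, "spot2": 50.0}
sample_b = {"spot1": 100.0, "spot2": 100.0}

ratios_a = log2_ratios_vs_reference(sample_a, reference)
ratios_b = log2_ratios_vs_reference(sample_b, reference)

# Difference of log ratios compares the two samples spot by spot.
a_vs_b = {spot: ratios_a[spot] - ratios_b[spot] for spot in ratios_a if spot in ratios_b}
print(a_vs_b)  # {'spot1': 1.0, 'spot2': -1.0}
```

Because all three images come from the same gel, the spot positions coincide and no matching of spots between gels is needed for this comparison.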

3.2 Protein arrays

Another often-used method for discovery oriented protein quantification is provided by protein arrays. The use of protein arrays can be explained by considering three components:

  • a glass plate or slide in a regular grid;
  • a set of capture molecules that are fixed to the plate;
  • a set of binding molecules that are to be bound to the capture molecules.

The capture molecules are typically antibodies, proteins, or peptides, although other (macro-)molecules such as DNA can be used as well. The different types of capture molecules enable different types of analyses to be carried out. We can divide the protein arrays into forward arrays and reverse arrays.

3.2.1 Forward arrays

In forward arrays the proteins to be analyzed are the binding molecules.

Antibody arrays

In this design the capture molecules are a collection of antibodies. Upon sample application, these antibodies bind the proteins they were raised against, and bound proteins can subsequently be detected. Despite the straightforward design, there are several problems with this type of array. First and foremost, there must be a sufficiently specific and sensitive antibody available for each protein to be measured. This is, for instance, not the case for every human protein, let alone for less well-studied species. Furthermore, antibodies tend to be expensive, and can often suffer from nonspecific binding (leading to a falsely exaggerated signal) or poor binding efficiency (leading to a falsely subdued signal). Indeed, antibodies that function quite well in simpler situations such as Western blotting (Section 3.3) may not function as well when confronted with a whole-proteome mixture. Since proteins cover a very broad range of physicochemical parameters, it is furthermore difficult to optimize binding and washing conditions to ensure reasonable capture of all proteins.

Other types of forward arrays

Since antibody arrays can be challenging from a practical perspective, other types of forward arrays have been developed, where the capture molecules consist of peptides, small molecules or stretches of DNA sequence. Usually, these arrays are designed to capture a specific subset of proteins, such as DNA-binding, or drug-binding proteins.

3.2.2 Reverse arrays

As the name implies, reverse arrays take the opposite approach, and the proteins to be analyzed are the capture molecules. The objective here is often antibody profiling, where a sample of plasma or serum from a patient is run over the array to detect antibodies against the spotted proteins. Such arrays are particularly useful for the detection of auto-antibodies, which are aimed at the patient’s own proteins and form important factors in a variety of auto-immune diseases. These arrays are more easily produced than antibody arrays, but still pose the challenge that the relevant proteins must all be (recombinantly) expressed and purified. Reverse arrays that cover most of the proteome of various model organisms, and of course humans, are now commercially available.

3.2.3 Detection of binding molecules

Detection of the binding molecules is usually carried out using fluorescence, although more advanced techniques such as surface plasmon resonance and atomic force microscopy can also be used. In the case of fluorescent detection, the fluorophore can either be attached directly to the sample proteins after lysis, or it can be added after capture through the use of a secondary antibody in so-called sandwich designs.

3.2.4 Analysis of protein array readouts

Since protein arrays closely resemble RNA microarrays, the processing of readout data is commonly handled in the same fashion. Protein arrays typically follow design guidelines copied from RNA microarrays as well, including the use of on-array replicate spots that are scattered over the array.
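A minimal sketch of how on-array replicate spots might be summarized is given below. The text does not prescribe a particular summary; taking the median per capture molecule is an assumption made here because it is less sensitive to a single aberrant spot than the mean. The function name, capture molecule names, and readout values are hypothetical.

```python
from statistics import median
from typing import Dict, List, Tuple


def summarize_replicates(readouts: List[Tuple[str, float]]) -> Dict[str, float]:
    """Group spot readouts by capture molecule and return the median per group.

    `readouts` lists (capture_molecule, intensity) pairs in any order,
    reflecting that replicate spots are scattered over the array.
    """
    grouped: Dict[str, List[float]] = {}
    for name, intensity in readouts:
        grouped.setdefault(name, []).append(intensity)
    return {name: median(values) for name, values in grouped.items()}


# Hypothetical readouts; one P1 replicate (50.0) is an aberrant spot.
readouts = [("P1", 10.0), ("P2", 5.0), ("P1", 12.0), ("P1", 50.0), ("P2", 7.0)]
summary = summarize_replicates(readouts)
print(summary)  # {'P1': 12.0, 'P2': 6.0}
```

Scattering the replicates over the array means that a local defect, such as a scratch or uneven washing, is unlikely to affect all replicates of the same capture molecule at once.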

3.3 Western blotting

Where 2D gel electrophoresis and protein arrays provide discovery oriented methods used to detect and quantify as many proteins as possible in a sample, Western blotting provides a targeted means to quantify a single protein in a sample. In Western blotting, the protein complement of a sample after lysis is typically separated on a single gel dimension, and the separated proteins are subsequently transferred (blotted) onto a membrane with a certain affinity for proteins (typically nitrocellulose or polyvinylidene fluoride (PVDF)).

This membrane is then probed with an affinity reagent that binds specifically to the protein of interest. This affinity reagent is most commonly an antibody specific to the protein. Detection of the bound antibody can be performed through a secondary antibody carrying a linked enzyme, or directly through a linked enzyme on the primary antibody.

The linked enzyme is chosen so that it performs an observable reaction, most commonly horseradish peroxidase cleaving a chemiluminescent reagent, producing detectable light in proportion to the amount of bound enzyme, which in turn depends on the amount of protein. Other types of detection are possible, for instance using antibodies connected to a fluorophore, or radioactively labeled antibodies. This last method, however, is expensive, laborious, and dangerous, and is only used in those cases where exquisite sensitivity is required.

3.4 ELISA – Enzyme-Linked Immunosorbent Assay

Another popular means of targeted protein quantification is offered by ELISA (Enzyme-Linked Immunosorbent Assay). Here, a whole proteome lysate is affixed to a solid substrate (often a microtiter plate) and is then probed by a specific affinity reagent, usually an antibody. This affinity reagent is then either directly or indirectly (via a secondary antibody for instance) detected through a coupled enzyme (often horseradish peroxidase, similar to Western blotting).

ELISA is often used in clinical assays, and the development of an ELISA is therefore often the endpoint of a biomarker discovery pipeline.

3.5 Bibliographic notes

2D gel analysis Rabilloud et al. (2010).
Protein arrays