Biostatistical Design and Analysis Using R

Murray Logan

Publisher: John Wiley & Sons
Category: Science and new technologies
Language: English

Description

R, the statistical and graphical environment, is rapidly emerging as an important set of teaching and research tools for biologists. This book draws upon the popularity and free availability of R to couple the theory and practice of biostatistics into a single treatment, so as to provide a textbook for biologists learning statistics, R, or both. An abridged description of biostatistical principles and analysis sequence keys are combined with worked examples of the practical use of R into a complete practical guide to designing and analyzing real biological research. Topics covered include: * simple hypothesis testing, graphing * exploratory data analysis and graphical summaries * regression (linear, multiple and non-linear) * simple and complex ANOVA and ANCOVA designs (including nested, factorial, blocking, split-plot and repeated measures) * frequency analysis and generalized linear models. Linear mixed effects modeling is also incorporated extensively throughout as an alternative to traditional modeling techniques. The book is accompanied by a companion website, www.wiley.com/go/logan/r, with an extensive set of resources comprising all R scripts and data sets used in the book, additional worked examples, the biology package, and other instructional materials and links.

Pages: 813

Year of publication: 2011

Sample

Contents

Preface

R quick reference card

General key to statistical methods

1 Introduction to R

1.1 Why R?

1.2 Installing R

1.3 The R environment

1.4 Object names

1.5 Expressions, Assignment and Arithmetic

1.6 R Sessions and workspaces

1.7 Getting help

1.8 Functions

1.9 Precedence

1.10 Vectors - variables

1.11 Matrices, lists and data frames

1.12 Object information and conversion

1.13 Indexing vectors, matrices and lists

1.14 Pattern matching and replacement (character search and replace)

1.15 Data manipulation

1.16 Functions that perform other functions repeatedly

1.17 Programming in R

1.18 An introduction to the R graphical environment

1.19 Packages

1.20 Working with scripts

1.21 Citing R in publications

1.22 Further reading

2 Data sets

2.1 Constructing data frames

2.2 Reviewing a data frame - fix()

2.3 Importing (reading) data

2.4 Exporting (writing) data

2.5 Saving and loading of R objects

2.6 Data frame vectors

2.7 Manipulating data sets

2.8 Dummy data sets - generating random data

3 Introductory statistical principles

3.1 Distributions

3.2 Scale transformations

3.3 Measures of location

3.4 Measures of dispersion and variability

3.5 Measures of the precision of estimates - standard errors and confidence intervals

3.6 Degrees of freedom

3.7 Methods of estimation

3.8 Outliers

3.9 Further reading

4 Sampling and experimental design with R

4.1 Random sampling

4.2 Experimental design

5 Graphical data presentation

5.1 The plot() function

5.2 Graphical Parameters

5.3 Enhancing and customizing plots with low-level plotting functions

5.4 Interactive graphics

5.5 Exporting graphics

5.6 Working with multiple graphical devices

5.7 High-level plotting functions for univariate (single variable) data

5.8 Presenting relationships

5.9 Presenting grouped data

5.10 Presenting categorical data

5.11 Trellis graphics

5.12 Further reading

6 Simple hypothesis testing – one and two population tests

6.1 Hypothesis testing

6.2 One- and two-tailed tests

6.3 t-tests

6.4 Assumptions

6.5 Statistical decision and power

6.6 Robust tests

6.7 Further reading

6.8 Key for simple hypothesis testing

6.9 Worked examples of real biological data sets

7 Introduction to Linear models

7.1 Linear models

7.2 Linear models in R

7.3 Estimating linear model parameters

7.4 Comments about the importance of understanding the structure and parameterization of linear models

8 Correlation and simple linear regression

8.1 Correlation

8.2 Simple linear regression

8.3 Smoothers and local regression

8.4 Correlation and regression in R

8.5 Further reading

8.6 Key for correlation and regression

8.7 Worked examples of real biological data sets

9 Multiple and curvilinear regression

9.1 Multiple linear regression

9.2 Linear models

9.3 Null hypotheses

9.4 Assumptions

9.5 Curvilinear models

9.6 Robust regression

9.7 Model selection

9.8 Regression trees

9.9 Further reading

9.10 Key and analysis sequence for multiple and complex regression

9.11 Worked examples of real biological data sets

10 Single factor classification (ANOVA)

10.1 Null hypotheses

10.2 Linear model

10.3 Analysis of variance

10.4 Assumptions

10.5 Robust classification (ANOVA)

10.6 Tests of trends and means comparisons

10.7 Power and sample size determination

10.8 ANOVA in R

10.9 Further reading

10.10 Key for single factor classification (ANOVA)

10.11 Worked examples of real biological data sets

11 Nested ANOVA

11.1 Linear models

11.2 Null hypotheses

11.3 Analysis of variance

11.4 Variance components

11.5 Assumptions

11.6 Pooling denominator terms

11.7 Unbalanced nested designs

11.8 Linear mixed effects models

11.9 Robust alternatives

11.10 Power and optimisation of resource allocation

11.11 Nested ANOVA in R

11.12 Further reading

11.13 Key for nested ANOVA

11.14 Worked examples of real biological data sets

12 Factorial ANOVA

12.1 Linear models

12.2 Null hypotheses

12.3 Analysis of variance

12.4 Assumptions

12.5 Planned and unplanned comparisons

12.6 Unbalanced designs

12.7 Robust factorial ANOVA

12.8 Power and sample sizes

12.9 Factorial ANOVA in R

12.10 Further reading

12.11 Key for factorial ANOVA

12.12 Worked examples of real biological data sets

13 Unreplicated factorial designs – randomized block and simple repeated measures

13.1 Linear models

13.2 Null hypotheses

13.3 Analysis of variance

13.4 Assumptions

13.5 Specific comparisons

13.6 Unbalanced un-replicated factorial designs

13.7 Robust alternatives

13.8 Power and blocking efficiency

13.9 Unreplicated factorial ANOVA in R

13.10 Further reading

13.11 Key for randomized block and simple repeated measures ANOVA

13.12 Worked examples of real biological data sets

14 Partly nested designs: split plot and complex repeated measures

14.1 Null hypotheses

14.2 Linear models

14.3 Analysis of variance

14.4 Assumptions

14.5 Other issues

14.6 Further reading

14.7 Key for partly nested ANOVA

14.8 Worked examples of real biological data sets

15 Analysis of covariance (ANCOVA)

15.1 Null hypotheses

15.2 Linear models

15.3 Analysis of variance

15.4 Assumptions

15.5 Robust ANCOVA

15.6 Specific comparisons

15.7 Further reading

15.8 Key for ANCOVA

15.9 Worked examples of real biological data sets

16 Simple Frequency Analysis

16.1 The chi-square statistic

16.2 Goodness of fit tests

16.3 Contingency tables

16.4 G-tests

16.5 Small sample sizes

16.6 Alternatives

16.7 Power analysis

16.8 Simple frequency analysis in R

16.9 Further reading

16.10 Key for Analysing frequencies

16.11 Worked examples of real biological data sets

17 Generalized linear models (GLM)

17.1 Dispersion (over or under)

17.2 Binary data - logistic (logit) regression

17.3 Count data - Poisson generalized linear models

17.4 Assumptions

17.5 Generalized additive models (GAM’s) - non-parametric GLM

17.6 GLM and R

17.7 Further reading

17.8 Key for GLM

17.9 Worked examples of real biological data sets

Bibliography

R index

Statistics index

Companion website for this book: wiley.com/go/logan/r

Companion website

A companion website for this book is available at:

www.wiley.com/go/logan/r

The website includes figures from the book for downloading.

A John Wiley & Sons, Inc., Publication

Blackwell Publishing was acquired by John Wiley & Sons in February 2007. Blackwell’s publishing program has been merged with Wiley’s global Scientific, Technical and Medical business to form Wiley-Blackwell.

Registered office: John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial offices: 9600 Garsington Road, Oxford, OX4 2DQ, UK; The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK; 111 River Street, Hoboken, NJ 07030-5774, USA

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com/wiley-blackwell

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloguing-in-Publication Data

Logan, Murray.
Biostatistical design and analysis using R : a practical guide / Murray Logan.
p. cm.
Includes bibliographical references and index.
ISBN 978-1-4443-3524-8 (hardcover : alk. paper) – ISBN 978-1-4051-9008-4 (pbk. : alk. paper)
1. Biometry. 2. R (Computer program language) I. Title.
QH323.5.L645 2010
570.1′5195–dc22
2009053162

A catalogue record for this book is available from the British Library.

Typeset in 10.5/13pt Minion by Laserwords Private Limited, Chennai, India

Preface

R is a powerful and flexible statistical and graphical environment that is freely distributed under the GNU General Public Licence [a] for all major computing platforms (Windows, Mac OS X and Linux). This open source licence, along with a relatively simple scripting syntax, has promoted diverse and rapid evolution and contribution. As the broader scientific community continues to gain greater instruction in and exposure to the overall project, the popularity of R as a teaching and research tool continues to accelerate.

It is now widely acknowledged that R proficiency is becoming an increasingly desirable and useful skill throughout the scientific community. However, as with most open source developments, the emphasis of the R project remains on the expansive development of tools and features. Applied documentation remains somewhat sparse and somewhat incomprehensible to the average biologist. Whilst there are a number of excellent texts on R emerging, the bulk of these texts are devoted to the R language itself. Any featured examples therein are used primarily for the purpose of illustrating the suite of commonly used R features and procedures, rather than to illustrate how R can be used to perform common biostatistical analyses.

Coinciding with the increasing interest in R as both a learning and research tool for biostatistics has been the success of a relatively new major biostatistics textbook (Quinn and Keough, 2002). This text provides detailed coverage of most of the major statistical concepts and tests that biologists are likely to encounter, with an emphasis on the practical implementation of these concepts with real biological data. Undoubtedly, a large part of the appeal of this book is attributable to the extensive use of real biological examples to augment and reinforce the text. Furthermore, by concentrating on the information biologists need to implement their research, and avoiding the overuse of complex mathematical descriptions, the authors have appealed to those biologists who don’t require (or desire) a knowledge of performing or programming entire analyses from scratch. Such biologists tend to use statistical software that is already available and specifically desire information that will help them achieve reliable statistical and biological outcomes. Quinn and Keough (2002) also advocate a number of alternative texts that provide more detailed coverage of specific topics and that also adopt this real example approach.

Typically, most biostatistical texts focus on the principles of design and analysis without extending into the practical use of software to implement these principles. Similarly, R/S-plus texts tend to concentrate on documenting and showcasing the features of R without providing much of a biostatistical account of the principles behind the features or illustrating how these tools can be extended to achieve comprehensive real world analyses. Consequently, many biological students and professionals struggle to translate the theoretical advice into computational outcomes. Although some of these difficulties can be addressed after extensively reading through a number of software references, many of the difficulties remain. The inconsistency and incompatibility between theory texts and software reference texts is mainly the result of differing intentions of the two genres and is a source of great frustration.

The reluctance of biostatistical texts to promote or instruct on any particular statistical software (except for extremely specialized cases where historically only a single dedicated program was available) is in part an acknowledgment of the diversity of software packages available (each of which differs substantially in the range of features offered as well as the user interface and output provided). Furthermore, software upgrades generally involve major alterations to the way in which preexisting tasks are performed, and thus being associated with a single software package tends to restrict the longevity and audience of the text. In contrast, although contributors are constantly extending the feature set of the R environment, overall the project maintains a consistent user interface. Consequently, there is currently both a need and an opportunity for a text that fills the gap between biostatistics texts and software texts, so as to assist biologists with the practical side of performing statistical analysis.

Many biological researchers and students have at one stage or another used one or other of the major biostatistics texts and gained a good understanding of the principles. However, from time to time (and particularly when preparing to generate a new design or analyse a new data set), they require a quick refresher to help remind them of the issues and principles relevant to their current design and/or analysis scenarios. In most cases, they do not need to re-read the more discursive texts and in many cases express a reluctance to invest large amounts of valuable research time doing so. Therefore, there is also a need for a quick reference that summarizes the key concepts of contemporary biostatistics and leads users step-wise through each of the analysis procedures and options. Such a guide would also help users to identify their areas of statistical naivete and enable them to return to a more comprehensive text with a more focused and efficient objective.

Therefore, the intended focus of this book will be to highlight the major concepts, principles and issues in contemporary biostatistics as well as demonstrate how to use R (as a research design, analysis and presentation tool) to complete examples from major biostatistics textbooks. In so doing, this proposed text acknowledges the important role that statistical software and real examples play in reinforcing statistical principles and practices.

Hence, in summary, the intentions of the book are three-fold:

(i) To provide very brief refresher summaries of the main concepts, issues and options involved in a range of contemporary biostatistical analyses

(ii) To provide key guides that step users through the procedures and options of a range of contemporary biostatistical analyses

(iii) To provide detailed R scripts and documentation that enable users to perform a range of real worked examples from statistics texts that are popular among biological and environmental scientists

Worked examples

Where possible and appropriate, this book will make use of the same examples that appear in the popular biostatistical texts, so as to take advantage of the history and information surrounding those examples as well as any familiarity that users may have with them. That said, access to these other texts will not be necessary to get good value out of the materials.

Website

This book is augmented by a website (http://www.wiley.com/go/logan/r) which includes:

raw data sets and R analysis scripts associated with all worked examples

the biology package that contains many functions utilized in this book

an R reference card containing links to pages within the book

Typographical conventions

Throughout this book, all R language objects and functions will be printed in courier (monospaced) typeface. Commands will begin with the standard R command prompt (>) and lines continuing on from a previous line will begin with the continuation prompt (+). In syntax used within the chapter keys, dataset is used as an example and should be replaced by the name of the actual data frame when used. Similarly, all vector names should be replaced by the names used to denote the various variables in your data set.

Acknowledgements

The inspiration for this book came primarily from Gerry Quinn and Mick Keough, to whom I am both indebted and infuriated (in equal quantities). As authors of a statistical piece themselves, they should have known better than to encourage others to attempt such an undertaking! I also wish to acknowledge the intellectualizing and suggestions of Patrick Baker and Andrew Robinson, the former of whose regular supply of ideas remains a constant source of material and torment. Countless numbers of students and colleagues have also helped refine the materials and format of this book. As almost all of the worked examples in this book are adapted from the major biostatistical texts, the contributions of these other authors cannot be overstated. Finally, I would like to thank Nat, Kara, Saskia and Anika for your support and tolerance while I wrote this “extremely quite boring book with rid-ic-li-us pictures” (S. Logan, age 7).

[a] This is an open source licence that ensures that the application as well as its source code are freely available to use, modify and redistribute.

R quick reference card

Session management

> q() Quitting R (see page 8)

> ls() List the objects in the current environment (see page 7)

> rm(...) Remove objects from the current environment (see page 7)

> setwd(dir) Set the current working directory (see page 7)

> getwd() Get the current working directory (see page 7)
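The session management commands above can be tried together in a short script. This is only a minimal sketch: the object names (counts, site) are hypothetical, and a temporary directory stands in for a real project folder.

```r
# Hypothetical objects, created only so there is something to list and remove.
counts <- c(12, 15, 9)
site <- "reef"

objs_before <- ls()   # names of the objects in the current environment
rm(site)              # remove a single object
objs_after <- ls()    # 'site' is no longer listed

old_dir <- getwd()    # remember the current working directory
setwd(tempdir())      # move to a temporary directory (stand-in for a project folder)
new_dir <- getwd()    # the working directory is now the temporary directory
setwd(old_dir)        # restore the original working directory
```

Storing the result of getwd() before calling setwd() makes it easy to return to the starting directory at the end of a script.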

Getting help

> ?function Getting help on a function (see page 8)

> help(function) Getting help on a function (see page 8)

> example(function) Run the examples associated with the manual page for the function (see page 8)

> demo(topic) Run an installed demonstration script (see page 8)

> apropos("topic") Return names of all objects in search list that match “topic” (see page 9)

> help.search("topic") Getting help about a concept (see page 9)

> help.start() Launch R HTML documentation (see page 9)

Built in constants

> LETTERS the 26 upper-case letters of the English alphabet (see page 17)

> letters the 26 lower-case letters of the English alphabet (see page 17)

> month.name English names of the 12 months of the year

> month.abb Abbreviated English names of the 12 months of the year

> pi π – the ratio of a circle's circumference to its diameter (see page 105)
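As a brief illustration, the built-in constants behave like ordinary vectors and can be indexed or used in arithmetic. The radius used below is an arbitrary value chosen for the example.

```r
first_three <- LETTERS[1:3]         # "A" "B" "C"
last_lower  <- letters[26]          # "z"
n_months    <- length(month.name)   # 12
march       <- month.abb[3]         # "Mar"

radius <- 1.5                       # arbitrary illustrative value
circumference <- 2 * pi * radius    # circumference of a circle with that radius
```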

Packages

> installed.packages() List of all currently installed packages (see page 44)

> update.packages() Update installed packages (see page 44)

> install.packages(pkgs) Install package(s) (pkgs) from CRAN mirror (see page 45)

R CMD INSTALL package Install an add-on package (see page 43)

> library(package) Loading an add-on package (see page 45)

> data(name) Load a data set or structure inbuilt into R or a loaded package.

Importing/Exporting

> source("file") Input, parse and sequentially evaluate the file (see page 45)

> sink("file") Redirect non-graphical output to file

> read.table() Read data in table format and create a data frame, with variables in columns (see page 51)

> read.table("clipboard") Read data left on the clipboard in table format and create a data frame, with variables in columns (see page 51)

> read.systat("file.syd", to.data.frame=T) Read SYSTAT data file and create a data frame (see page 52)

> read.spss("file.sav", to.data.frame=T) Read SPSS data file and create a data frame (see page 52)

> as.data.frame(read.mtp("file.mtp")) Read Minitab Portable Worksheet data file and create a data frame (see page 52)

> read.xport("file") Read SAS XPORT data file and create a data frame (see page 52)

> write.table() Write the contents of a dataframe to file in table format (see page 53)

> save(object, file="file.RData") Write the contents of the object to file (see page 53)

> load(file="file.RData") Load the contents of a file (see page 53)

> dump(object, file="file") Save the contents of an object to a file (see page 53)
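A round trip through the text and binary formats shows how the writing and reading functions pair up. The sketch below assumes a small hypothetical data frame and writes to temporary files; the column names and values are invented for illustration.

```r
# Hypothetical data frame, used only for illustration.
dataset <- data.frame(site = c("A", "B", "C"), count = c(12, 15, 9))

# Write the data frame to a tab-delimited text file, then read it back.
txt_file <- tempfile(fileext = ".txt")
write.table(dataset, file = txt_file, sep = "\t", row.names = FALSE, quote = FALSE)
reread <- read.table(txt_file, header = TRUE, sep = "\t")

# Save the object in R's binary format, remove it, then restore it by name.
rdata_file <- tempfile(fileext = ".RData")
save(dataset, file = rdata_file)
rm(dataset)
load(file = rdata_file)   # recreates 'dataset' in the current environment
```

Note that load() restores objects under the names they were saved with, whereas read.table() returns a data frame that must be assigned to a name.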

Generating Vectors

> c(...) Concatenate objects (see page 6)

> seq(from, to, by=, length=) Generate a sequence (see page 12)

> rep(x, times, each) Replicate each of the values of x (see page 13)
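The three vector generators might be used as follows. The values are arbitrary; length.out is the full name of the argument that the card abbreviates to length.

```r
a  <- c(1, 5, 10)                    # concatenate individual values
s1 <- seq(2, 10, by = 2)             # 2 4 6 8 10
s2 <- seq(0, 1, length.out = 5)      # 0.00 0.25 0.50 0.75 1.00
r1 <- rep(c("a", "b"), times = 2)    # "a" "b" "a" "b"
r2 <- rep(c("a", "b"), each = 2)     # "a" "a" "b" "b"
```

The times argument repeats the whole vector, whereas each repeats every element in turn.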

Character vectors

> paste(..., ) Combine multiple vectors together after converting them into character vectors (see page 13)

> substr(x, start, stop) Extract substrings from a character vector (see page 14)
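A short sketch of both character functions, using made-up labels and names:

```r
# paste() converts its arguments to character and joins them element-wise.
site_labels <- paste("Site", 1:3, sep = "_")        # "Site_1" "Site_2" "Site_3"

# With collapse, the elements of a single vector are joined into one string.
joined <- paste(c("a", "b", "c"), collapse = "+")   # "a+b+c"

# substr() extracts characters start..stop from each element.
abbrev <- substr(c("Acacia", "Banksia"), 1, 3)      # "Aca" "Ban"
```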

Factors

> factor(x) Convert the vector (x) into a factor (see page 15)

> factor(x, levels=c()) Convert the vector (x) into a factor and define the order of levels (see page 15)

> gl(levels, reps, length, labels=) Generate a factor vector by specifying the pattern of levels (see page 15)

> levels(factor) Lists the levels (in order) of a factor (see page 54)

> levels(factor) <- Sets the names of the levels of a factor (see page 54)
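The factor functions can be sketched with a hypothetical treatment variable. The level names and labels below are invented for the example.

```r
# Hypothetical treatment variable.
treatment <- c("low", "high", "medium", "high", "low")

f1 <- factor(treatment)                    # levels sorted alphabetically
f2 <- factor(treatment,
             levels = c("low", "medium", "high"))  # levels in a chosen order

# gl(n, k): n levels, each repeated k times in a block.
g <- gl(2, 3, labels = c("control", "treated"))

levels(f2)                                 # "low" "medium" "high"
levels(f2) <- c("L", "M", "H")             # rename the levels, keeping their order
```

Renaming with levels(factor) <- changes the labels only; each observation keeps its original position among the levels.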

Matrices

> matrix(x,nrow, ncol, byrow=F) Create a matrix with nrow and/or ncol dimensions out of a vector (x) (see page 16)

> cbind(...) Create a matrix (or data frame) by combining the sequence of vectors, matrices or data frames by columns (see page 16)

> rbind(...) Create a matrix (or data frame) by combining the sequence of vectors, matrices or data frames by rows (see page 16)

> rownames(x) Read (or set with <-) the row names of the matrix (x) (see page 17)

> colnames(x) Read (or set with <-) the column names of the matrix (x) (see page 17)
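As an illustration with arbitrary values, the matrix functions above might be combined like this:

```r
m       <- matrix(1:6, nrow = 2)                 # filled column by column
m_byrow <- matrix(1:6, nrow = 2, byrow = TRUE)   # filled row by row

cols <- cbind(a = 1:3, b = 4:6)                  # 3 rows, 2 columns
rows <- rbind(a = 1:3, b = 4:6)                  # 2 rows, 3 columns

rownames(m) <- c("r1", "r2")                     # set row names
colnames(m) <- c("x", "y", "z")                  # set column names
m["r2", "y"]                                     # element in row 2, column 2
```

By default matrix() fills by column, so the first row of m is 1 3 5, while the first row of m_byrow is 1 2 3.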

Lists

> list(...) Generate a list of named (for arguments in the form name=x) and/or unnamed (for arguments in the form x) components from the sequence of objects (see page 17)
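A minimal sketch of a list that mixes named and unnamed components; the component names and values are hypothetical.

```r
# Two named components and one unnamed component of different types.
survey <- list(site = "reef", counts = c(12, 15, 9), TRUE)

component_names <- names(survey)   # the unnamed component has an empty name
survey$counts                      # access a named component by name
survey[[3]]                        # access the unnamed component by position
```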

Data frames

> data.frame(...) Convert a set of vectors into a data frame (see page 49)

> row.names(dataframe) Read (or set with <-) the row names of the data frame (see page 49)

> fix(dataframe) View and edit a dataframe in a spreadsheet (see page 49)
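The first two data frame functions can be sketched as follows, assuming two hypothetical vectors of equal length. fix() opens an interactive editor and is therefore left commented out.

```r
# Hypothetical vectors of equal length.
site  <- c("A", "B", "C")
count <- c(12, 15, 9)

dataset <- data.frame(site, count)           # columns take the vector names
row.names(dataset) <- c("q1", "q2", "q3")    # set the row names
dataset["q2", "count"]                       # look up a value by row and column name

# fix(dataset)   # interactive: view and edit the data frame in a spreadsheet
```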

Indexing

Vectors

> x[i] Select the ith element (see page 21)

> x[i:j] Select the ith through jth elements inclusive (see page 21)

> x[c(1,5,6,9)] Select specific elements (see page 21)

> x[-i] Select all except the ith element (see page 21)
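The four vector indexing forms can be compared on a small vector of arbitrary values:

```r
x <- c(10, 20, 30, 40, 50, 60, 70, 80, 90)

second  <- x[2]               # the 2nd element: 20
middle  <- x[2:4]             # the 2nd through 4th elements: 20 30 40
picked  <- x[c(1, 5, 6, 9)]   # specific elements: 10 50 60 90
dropped <- x[-2]              # every element except the 2nd
```

Indexing never alters x itself; each expression returns a new vector.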
