Mastering Scientific Computing with R - Paul Gerrard - E-Book

Description

If you want to learn how to quantitatively answer scientific questions for practical purposes using the powerful R language and the open source R tool ecosystem, this book is ideal for you. It is best suited for scientists who understand scientific concepts, know a little R, and want to start applying R to answer empirical scientific questions. Some R exposure is helpful, but not compulsory.

You can read this e-book in the Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Pages: 496

Year of publication: 2015

Table of Contents

Mastering Scientific Computing with R
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Programming with R
Data structures in R
Atomic vectors
Operations on vectors
Lists
Attributes
Factors
Multidimensional arrays
Matrices
Data frames
Loading data into R
Saving data frames
Basic plots and the ggplot2 package
Flow control
The for() loop
The apply() function
The if() statement
The while() loop
The repeat{} and break statement
Functions
General programming and debugging tools
Summary
2. Statistical Methods with R
Descriptive statistics
Data variability
Confidence intervals
Probability distributions
Fitting distributions
Higher order moments of a distribution
Other statistical tests to fit distributions
The propagate package
Hypothesis testing
Proportion tests
Two sample hypothesis tests
Unit root tests
Summary
3. Linear Models
An overview of statistical modeling
Model formulas
Explanatory variables interactions
Error terms
The intercept as parameter 1
Updating a model
Linear regression
Plotting a slope
Analysis of variance
Generalized linear models
Generalized additive models
Linear discriminant analysis
Principal component analysis
Clustering
Summary
4. Nonlinear Methods
Nonparametric and parametric models
The adsorption and body measures datasets
Theory-driven nonlinear regression
Visually exploring nonlinear relationships
Extending the linear framework
Polynomial regression
Performing a polynomial regression in R
Spline regression
Nonparametric nonlinear methods
Kernel regression
Kernel weighted local polynomial fitting
Optimal bandwidth selection
A practical scientific application of kernel regression
Locally weighted polynomial regression and the loess function
Nonparametric methods with the np package
Nonlinear quantile regression
Summary
5. Linear Algebra
Matrices and linear algebra
Matrices in R
Vectors in R
Matrix notation
The physical functioning dataset
Basic matrix operations
Element-wise matrix operations
Matrix subtraction
Matrix addition
Matrix sweep
Basic matrixwise operations
Transposition
Matrix multiplication
Multiplying square matrices for social networks
Outer products
Using sparse matrices in matrix multiplication
Matrix inversion
Solving systems of linear equations
Determinants
Triangular matrices
Matrix decomposition
QR decomposition
Eigenvalue decomposition
Lower upper decomposition
Cholesky decomposition
Singular value decomposition
Applications
Rasch analysis using linear algebra and a paired comparisons matrix
Calculating Cronbach's alpha
Image compression using discrete cosine transform
Importing an image into R
The compression technique
Creating the transformation and quantization matrices
Putting the matrices together for image compression
DCT in R
Summary
6. Principal Component Analysis and the Common Factor Model
A primer on correlation and covariance structures
Datasets used in this chapter
Principal component analysis and total variance
Understanding the basics of PCA
How does PCA relate to SVD?
Scaled versus unscaled PCA
PCA for dimension reduction
PCA to summarize wine properties
Choosing the number of principal components to retain
Formative constructs using PCA
Exploratory factor analysis and reflective constructs
Familiarizing yourself with the basic terms
Matrices of interest
Expressing factor analysis in a matrix model
Basic EFA and concepts of covariance algebra
Concepts of EFA estimation
The centroid method
Multiple factors
Direct factor extraction by principal axis factoring
Performing principal axis factoring in R
Other factor extraction methods
Factor rotation
Orthogonal factor rotation methods
Quartimax rotation
Varimax rotation
Oblique rotations
Oblimin rotation
Promax rotation
Factor rotation in R
Advanced EFA with the psych package
Summary
7. Structural Equation Modeling and Confirmatory Factor Analysis
Datasets
Political democracy
Physical functioning dataset
Holzinger-Swineford 1939 dataset
The basic ideas of SEM
Components of an SEM model
Path diagram
Matrix representation of SEM
The reticular action model (RAM)
An example of SEM specification
An example in R
SEM model fitting and estimation methods
Assessing SEM model fit
Using OpenMx and matrix specification of an SEM
Summarizing the OpenMx approach
Explaining an entire example
Specifying the model matrices
Fitting the model
Fitting SEM models using lavaan
The lavaan syntax
Comparing OpenMx to lavaan
Explaining an example in lavaan
Explaining an example in OpenMx
Summary
8. Simulations
Basic sample simulations in R
Pseudorandom numbers
The runif() function
Bernoulli random variables
Binomial random variables
Poisson random variables
Exponential random variables
Monte Carlo simulations
Central limit theorem
Using the mc2d package
One-dimensional Monte Carlo simulation
Two-dimensional Monte Carlo simulation
Additional mc2d functions
The mcprobtree() function
The cornode() function
The mcmodel() function
The evalmcmod() function
Data visualization
Multivariate nodes
Monte Carlo integration
Multiple integration
Other density functions
Rejection sampling
Importance sampling
Simulating physical systems
Summary
9. Optimization
One-dimensional optimization
The golden section search method
The optimize() function
The Newton-Raphson method
The Nelder-Mead simplex method
More optim() features
Linear programming
Integer-restricted optimization
Unrestricted variables
Quadratic programming
General non-linear optimization
Other optimization packages
Summary
10. Advanced Data Management
Cleaning datasets in R
String processing and pattern matching
Regular expressions
Floating point operations and numerical data types
Memory management in R
Basic R memory commands
Handling R objects in memory
Missing data
Computational aspects of missing data in R
Statistical considerations of missing data
Deletion methods
Listwise deletion or complete case analysis
Pairwise deletion
Visualizing missing data
An overview of multiple imputation
Imputation basic principles
Approaches to imputation
The Amelia package
Getting estimates from multiply imputed datasets
Extracting the mean
Extracting the standard error of the mean
The mice package
Imputation functions in mice
Summary
Index

Mastering Scientific Computing with R

Copyright © 2015 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: January 2015

Production reference: 1270115

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78355-525-3

www.packtpub.com

Cover image by Jason Dupuis Mayer

Credits

Authors

Paul Gerrard

Radia M. Johnson

Reviewers

Laurent Drouet

Ratanlal Mahanta

Mzabalazo Z. Ngwenya

Donato Teutonico

Commissioning Editor

Kartikey Pandey

Acquisition Editor

Greg Wild

Content Development Editor

Akshay Nair

Technical Editors

Rosmy George

Ankita Thakur

Copy Editors

Shivangi Chaturvedi

Pranjali Chury

Puja Lalwani

Adithi Shetty

Project Coordinator

Mary Alex

Proofreaders

Simran Bhogal

Martin Diver

Ameesha Green

Paul Hindle

Bernadette Watkins

Indexer

Priya Subramani

Graphics

Sheetal Aute

Disha Haria

Abhinash Sahu

Production Coordinator

Conidon Miranda

Cover Work

Conidon Miranda

About the Authors

Paul Gerrard is a physician and healthcare researcher who is based out of Portland, Maine, where he currently serves as the medical director of the cardiopulmonary rehabilitation program at New England Rehabilitation Hospital of Portland. He studied business economics in college. After completing medical school, he did a residency in physical medicine and rehabilitation at Harvard Medical School and Spaulding Rehabilitation Hospital, where he served as chief resident and stayed on as faculty at Harvard before moving to Portland. He continues to collaborate on research projects with researchers at other academic institutions within the Boston area and around the country. He has published and presented research on a range of topics, including traumatic brain injury, burn rehabilitation, health outcomes, and the epidemiology of disabling medical conditions.

I would like to thank my beautiful wife, Deirdre, and my son, Patrick. My work on this book is dedicated to the loving memory of Fiona.

Radia M. Johnson has a doctoral degree in immunology and currently works as a research scientist at the Institute for Research in Immunology and Cancer at the Université de Montréal, where she uses genomics and bioinformatics to identify and characterize the molecular changes that contribute to cancer development. She routinely uses R and other programming languages to analyze large datasets from ongoing collaborative projects. Since obtaining her PhD at the University of Toronto, she has also worked as a research associate in hematology at the University of Cambridge, where she gained experience using systems biology to study blood cancer.

I would like to thank Dr. Charlie Massie for teaching me to love programming in R and Dr. Phil Kousis for all his support through the years. You are both excellent mentors and wonderful friends!

About the Reviewers

Laurent Drouet holds a PhD in economics and social sciences from the University of Geneva, Switzerland, and a master's degree in applied mathematics from the Institute of Applied Mathematics of Angers, France. He was a postdoctoral research fellow at the Research Lab of Economics and Environmental Management at the Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland, and a researcher at the Public Research Center Tudor, Luxembourg. He is currently a senior researcher at Fondazione Eni Enrico Mattei (FEEM) and a research affiliate at Centro Euro-Mediterraneo sui Cambiamenti Climatici (CMCC), Italy.

His main research is related to integrated assessment modeling and energy modeling. For more than a decade, he has designed scientific tools to perform data analysis for this type of modeling. He has also built optimization frameworks to couple models of many kinds (such as climate models, air quality models, and economy models). He created and developed the bottom-up techno-economic energy model ETEM to study optimal energy policies at urban or national levels.

I want to thank my wife for her support every day both in my private life and professional life.

Ratanlal Mahanta holds an MSc in computational finance. He is currently working at GPSK Investment Group as a senior quantitative analyst. He has 4 years of experience in quantitative trading and strategy development for sell-side and risk consulting firms. He is an expert in high-frequency and algorithmic trading. He has expertise in these areas: quantitative trading (FX, equities, futures and options, and engineering on derivatives); algorithms (partial differential equations, stochastic differential equations, the finite difference method, Monte Carlo, and machine learning); code (R programming, C++, MATLAB, HPC, and scientific computing); data analysis (Big Data analytics [EOD to TBT], Bloomberg, Quandl, and Quantopian); and strategies (vol-arbitrage, vanilla and exotic options modeling, trend following, mean reversion, co-integration, Monte Carlo simulations, Value at Risk, stress testing, buy-side trading strategies with high Sharpe ratio, credit risk modeling, and credit rating).

He has reviewed Mastering R for Quantitative Finance, Packt Publishing. He is currently reviewing two other books for Packt Publishing: Mastering Python for Data Science and Machine Learning with R Cookbook.

Mzabalazo Z. Ngwenya holds a postgraduate degree in mathematical statistics from the University of Cape Town. He has worked extensively in the field of statistical consulting, where he has used a variety of statistical software, including R. His areas of interest are primarily centered on statistical computing. Previously, he was involved in reviewing Learning RStudio for R Statistical Computing, Mark P.J. van der Loo and Edwin de Jonge; R Statistical Application Development by Example Beginner's Guide, Prabhanjan Narayanachar Tattar; Machine Learning with R, Brett Lantz; R Graph Essentials, David Alexander Lillis; and R Object-oriented Programming, Kelly Black, all by Packt Publishing. He currently works as a biometrician.

Donato Teutonico has several years of experience in modeling and the simulation of drug effects and clinical trials in industrial and academic settings. He received his PharmD degree from the University of Turin, Italy, specializing in chemical and pharmaceutical technology, and his PhD in pharmaceutical sciences from Paris-Sud University, France.

He is the author of two R packages for pharmacometrics, CTStemplate and panels-for-pharmacometrics, which are both available on Google Code. He is also the author of Instant R Starter, Packt Publishing.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us by e-mail for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Preface

As an open source computing environment, R is rapidly becoming the lingua franca of the statistical computing community. R's capable base functions, extensive statistical tools, open source nature, and avid user community have given it an expansive library of cutting-edge quantitative methods, many of which are not yet available to users of high-cost commercial statistical programs.

With this book, you will learn not just about R, but how to use R to answer conceptual, scientific, and experimental questions.

Beginning with an overview of fundamental R concepts, including data types, R program flow, and basic coding techniques, you'll learn how R can be used to accomplish the most commonly needed scientific data analysis tasks, including testing for statistically significant differences between groups and modeling relationships in data. You will also learn parametric and nonparametric techniques for both difference testing and relationship modeling.

You will delve into linear algebra and matrix operations with an emphasis not on the R syntax, but on how these operations can be used to address common computational or analytical needs. This book also covers the application of matrix operations to find structure in high-dimensional data using principal component analysis, exploratory factor analysis, and confirmatory factor analysis, in addition to structural equation modeling. You will also master methods for simulation, learn about optimization as an advanced analytical method, and finish by going to the next level with advanced data management focused on the messy and problematic datasets that serious analysts deal with daily.

By the end of this book, you will be able to undertake publication-quality data analysis in R.

What this book covers

Chapter 1, Programming with R, presents an overview of how data is stored and accessed in R. Then, we will go over how to load data into R using built-in functions and useful packages for easy import from Excel worksheets. We will also cover how to use flow control statements and functions to reduce complexity and help you program more efficiently.

Chapter 2, Statistical Methods with R, presents an overview of how to summarize your data and get useful statistical information for downstream analysis. We will show you how to plot and get statistical information from probability distributions and how to test the fit of your sample distribution to well-defined probability distributions.

Chapter 3, Linear Models, covers linear models, which are probably the most commonly used statistical methods to study the relationships between variables. The Generalized linear model section will delve into a bit more detail than typical R books, discussing the nature of link functions and canonical link functions.

Chapter 4, Nonlinear Methods, reviews applications of nonlinear methods in R using both parametric and nonparametric methods for both theory-driven and exploratory analysis.

Chapter 5, Linear Algebra, covers linear algebra techniques in R. We will learn linear algebra operations including transposition, inversion, and matrix multiplication, along with a number of matrix transformations.

Chapter 6, Principal Component Analysis and the Common Factor Model, helps you understand the application of linear algebra to covariance and correlation matrices. We will cover how to use PCA to account for total variance in a set of variables and how to use EFA to model common variance among these variables in R.

Chapter 7, Structural Equation Modeling and Confirmatory Factor Analysis, covers the fundamental ideas underlying structural equation modeling, which are often overlooked in other books discussing SEM in R, and then delves into how SEM is done in R.

Chapter 8, Simulations, explains how to perform basic sample simulations and how to use simulations to answer statistical problems. We will also learn how to use R to generate random numbers, and how to simulate random variables from several common probability distributions.

Chapter 9, Optimization, explores a variety of methods and techniques for optimizing functions. We will also cover how to use a wide range of R packages and functions to set up, solve, and visualize different optimization problems.

Chapter 10, Advanced Data Management, walks you through the basic techniques for data handling and some basic memory management considerations.

What you need for this book

The software that we require for this book is R Version 3.0.1 or higher, OpenMx Version 1.4, and RStudio.

Who this book is for

If you want to learn how to quantitatively answer scientific questions for practical purposes using the powerful R language and the open source R tool ecosystem, this book is ideal for you. It is best suited for scientists who understand scientific concepts, know a little R, and want to start applying R to answer empirical scientific questions. Some R exposure is helpful, but not compulsory.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "You can also retrieve additional information on the objects stored in your environment using the str() function."

A block of code is set as follows:

> integer_vector <- c(1L, 2L, 12L, 29L)
> integer_vector
[1]  1  2 12 29

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "To install R on Windows, click on Download R for Windows, and then click on base for the download link and installation instructions."

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail us and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output.

You can download this file from: https://www.packtpub.com/sites/default/files/downloads/5253OS_ColoredImages.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books, whether in the text or the code, we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us, and we will do our best to address the problem.