The Essential R Reference - Mark Gardener - E-Book

Description

An essential library of basic commands you can copy and paste into R

The powerful and open-source statistical programming language R is rapidly growing in popularity, but it requires that you type in commands at the keyboard rather than use a mouse, so you have to learn the language of R. But there is a shortcut, and that's where this unique book comes in. A companion book to Visualize This: The FlowingData Guide to Design, Visualization, and Statistics, this practical reference is a library of basic R commands that you can copy and paste into R to perform many types of statistical analyses.

Whether you're in technology, science, medicine, business, or engineering, you can quickly turn to your topic in this handy book and find the commands you need.

  • Comprehensive command reference for the R programming language and a companion book to Visualize This: The FlowingData Guide to Design, Visualization, and Statistics
  • Combines elements of a dictionary, glossary, and thesaurus for the R language
  • Provides easy accessibility to the commands you need, by topic, which you can cut and paste into R as needed
  • Covers getting, saving, examining, and manipulating data; statistical tests and math; and all the things you can do with graphs
  • Also includes a collection of utilities that you'll find useful

Simplify the complex statistical R programming language with The Essential R Reference.


Formats: EPUB, MOBI

Number of pages: 533

Year of publication: 2012




Table of Contents

Cover

Theme 1: Data

Data Types

Creating Data

Importing Data

Saving Data

Viewing Data

Summarizing Data

Distribution of Data

Theme 2: Math and Statistics

Mathematical Operations

Summary Statistics

Differences Tests

Correlations and Associations

Analysis of Variance and Linear Modeling

Miscellaneous Methods

Theme 3: Graphics

Making Graphs

Adding to Graphs

Graphical Parameters

Theme 4: Utilities

Install

Using R

Programming

About the Author

About the Technical Editor

Credits

Acknowledgments

Introduction

Theme 1: Data

R is an object-oriented language; that means that it deals with named objects. Most often these objects are the data that you are analyzing. This theme deals with making, getting, saving, examining, and manipulating data objects.

Topics in this Theme

Data Types

Creating Data

Importing Data

Saving Data

Viewing Data

Summarizing Data

Distribution of Data

Commands in this Theme:

[]
$
addmargins
aggregate
apply
array
as.data.frame
as.xxxx
attach
attr
attributes
c
case.names
cbind
character
class
colMeans
colnames
colSums
comment
cummax
cummin
cumprod
cumsum
data
data.frame
detach
dget
dim
dimnames
dir
dput
droplevels
dump
dxxxx
ecdf
factor
file.choose
fivenum
ftable
getwd
gl
head
inherits
integer
interaction
IQR
is
is.xxxx
lapply
length
levels
list
list.files
load
logical
ls
ls.str
lsf.str
mad
margin.table
matrix
mean
median
mode
names
NCOL
ncol
nlevels
NROW
nrow
numeric
objects
order
prop.table
ptukey
pxxxx
qtukey
quantile
qxxxx
range
rank
raw
rbind
read.csv
read.csv2
read.delim
read.delim2
read.spss
read.table
read.xls
read.xlsx
relevel
remove
reorder
resample
rep
rm
RNGkind
row.names
rowMeans
rownames
rowsum
rowSums
rxxxx
sample
sapply
save
save.image
scan
sd
search
seq
seq_along
seq_len
set.seed
setwd
sort
source
storage.mode
str
subset
sum
summary
sweep
table
tabulate
tail
tapply
ts
typeof
unclass
unlist
var
variable.names
vector
View
which
with
within
write
write.csv
write.csv2
write.table
xtabs

Data Types

R recognizes many kinds of data, and these data can be in one of several forms. This topic shows you the commands relating to the kinds of data and how to switch objects from one form to another.

What’s In This Topic:

Types of data

The different types/forms of data objects

Creating blank data objects

Altering data types

Switching data from one type to another

Testing data types

How to tell what type an object is

Types of Data

Data can exist as different types and forms. These have different properties and can be coerced from one type/form into another.

Command Name

array

An array is a multidimensional object.

SEE drop for reducing dimensions of arrays in Theme 2, “Math and Statistics: Matrix Math.”

Common Usage

Related Commands

as.array
is.array
dim
dimnames
drop

Command Parameters

Examples
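
A minimal sketch of the array command, assuming illustrative values; the object name a and the dimensions are assumptions and are not taken from the book.

```r
# Build a 2 x 3 x 4 array from the integers 1 to 24 (illustrative values)
a <- array(1:24, dim = c(2, 3, 4))

dim(a)        # 2 3 4
a[2, 3, 1]    # row 2, column 3, first layer: 6
is.array(a)   # TRUE
```

The data fill the first dimension fastest, so the first layer holds the values 1 to 6.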

Command Name

character

Data in text form (not numbers) is called character data. The command creates a blank data object containing empty text data items.

Common Usage

Related Commands

as.character
is.character
numeric
integer
factor
data.frame
matrix
list
table

Command Parameters

Examples
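
As a hedged illustration, the following sketch creates a blank character vector and fills one element afterwards; the name blank and the text value are hypothetical.

```r
# A blank character vector of length 3 (each element is an empty string)
blank <- character(3)
blank                      # "" "" ""

# Elements can be filled in later
blank[2] <- "treatment"
is.character(blank)        # TRUE
```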

Command Name

data.frame

SEE also data.frame in “Adding to Existing Data.”

A data.frame is a two-dimensional, rectangular object that contains columns and rows. The columns can contain data of different types (some columns can be numbers and others text). The command makes a data frame from named objects.

Common Usage

Related Commands

matrix
list
table

Command Parameters

Examples
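
A small sketch of building a data frame from named objects; the vectors height and site and their values are invented for illustration.

```r
# Two named vectors of equal length but different types (illustrative data)
height <- c(12.5, 14.1, 9.8)
site   <- c("north", "south", "north")

# Each vector becomes a column; the column names come from the object names
plants <- data.frame(height, site)

dim(plants)      # 3 2
names(plants)    # "height" "site"
```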

Command Name

factor

This command creates factor objects. These appear without quotation marks and are used in data analyses to indicate levels of a treatment variable.

SEE subset for selecting sub-sets and droplevels for omitting unused levels.

Common Usage

Related Commands

as.factor
is.factor
character
numeric
gl
rep
interaction

Command Parameters

Examples
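
The sketch below shows one way to create a factor with an explicit level order; the treatment labels are assumptions chosen for illustration.

```r
# Text values become a factor; levels sets the order of the levels
treat <- factor(c("low", "high", "mid", "low", "high"),
                levels = c("low", "mid", "high"))

levels(treat)       # "low" "mid" "high"
nlevels(treat)      # 3
as.integer(treat)   # 1 3 2 1 3 (the internal level codes)
```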

Command Name

ftable

Creates a “flat” contingency table.

SEE ftable in “Summary Tables.”

Command Name

integer

Data objects that are numeric (not text) and contain no decimals are called integer objects. The command creates a vector containing the specified number of 0s.

Common Usage

Related Commands

as.integer
is.integer
character
factor

Command Parameters

Examples

Command Name

list

A list object is a collection of other R objects simply bundled together. A list can be composed of objects of differing types and lengths. The command makes a list from named objects.

Common Usage

list(...)

Related Commands

vector
as.list
is.list
unlist
data.frame
matrix

Command Parameters

...

Objects to be bundled together as a list. Usually named objects are separated by commas.

Examples
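
A brief sketch of bundling objects of differing types and lengths into a list; all object names here are hypothetical.

```r
# Objects of different types and lengths (illustrative)
counts <- c(3, 7, 2)
labels <- c("a", "b")

# Naming the elements lets you retrieve them later with $
bundle <- list(counts = counts, labels = labels, flag = TRUE)

length(bundle)   # 3
bundle$labels    # "a" "b"
```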

Command Name

logical

A logical value is either TRUE or FALSE. The command creates a vector of logical values (all set to FALSE).

Common Usage

Related Commands

as.logical
is.logical
vector

Command Parameters

Examples

Command Name

matrix

A matrix is a two-dimensional, rectangular object with rows and columns. A matrix can contain data of only one type (either all text or all numbers). The command creates a matrix object from data.

SEE also matrix in “Adding to Existing Data.”

Common Usage

Related Commands

data.frame
as.matrix
is.matrix
cbind
rbind
nrow
ncol
dimnames
colnames
rownames
dim

Command Parameters

Examples
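
The following sketch contrasts the default column-wise fill with a row-wise fill; the names m and mb are assumptions.

```r
# Data fill the matrix column by column by default
m <- matrix(1:6, nrow = 2, ncol = 3)
m[1, ]    # first row: 1 3 5

# byrow = TRUE fills row by row instead
mb <- matrix(1:6, nrow = 2, byrow = TRUE)
mb[1, ]   # first row: 1 2 3
```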

Command Name

numeric

Data that are numeric are numbers that may contain decimals (not integer values). The command creates a new vector of numbers (all 0).

Common Usage

Related Commands

as.numeric
is.numeric
integer
character
factor

Command Parameters

Examples

Command Name

raw

Data that are raw contain raw bytes. The command creates a vector of given length with all elements 00.

Common Usage

Related Commands

as.raw
is.raw
vector

Command Parameters

Examples

Command Name

table

The table command uses cross-classifying factors to build a contingency table of the counts at each combination of factor levels.

SEE also table in “Summary Tables.”

Related Commands

ftable

xtabs

Command Name

ts

A time-series object contains numeric data as well as information about the timing of the data. The command creates a time-series object with either a single or multiple series of data. The resulting object will have the class "ts" and, if it is a multiple series, the additional class "mts". There are dedicated plot and print methods for the "ts" class.

Common Usage

Related Commands

as.ts
is.ts

Command Parameters

Examples
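
A hedged sketch of single and multiple time series; the monthly values and the 2010 start date are invented for illustration.

```r
# Twelve monthly observations starting January 2010 (illustrative values)
monthly <- ts(c(5, 8, 6, 9, 7, 10, 12, 11, 9, 8, 6, 5),
              start = c(2010, 1), frequency = 12)
class(monthly)       # "ts"
frequency(monthly)   # 12

# A matrix with two columns gives a multiple series
two <- ts(matrix(1:24, ncol = 2), start = c(2010, 1), frequency = 12)
inherits(two, "mts") # TRUE
```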

Command Name

vector

Common Usage

Related Commands

as.vector
is.vector
matrix
data.frame

Command Parameters

Examples

Command Name

xtabs

This command carries out cross tabulation, creating a contingency table as a result.

SEE also xtabs in “Summary Tables.”

Altering Data Types

Each type of data (for example, numeric, character) can potentially be switched to a different type, and similarly, each form (for example, data frame, matrix) of data object can be coerced to a new form. In general, a command of the form as.xxxx (where xxxx is the name of the required data type) is likely to be what you need.

Command Name

as.array as.character as.data.frame as.factor as.integer as.list as.logical as.matrix as.numeric as.raw as.table as.ts as.vector

These commands attempt to coerce an object into the specified form. This will not always succeed.

SEE also as.data.frame.

Common Usage

as.character(x)

Related Commands

is.xxxx

Command Parameters

x

The object to be coerced to the new form.

Examples
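
As a sketch of coercion that partly fails, the example below converts text to numbers; the input values are assumptions.

```r
x <- c("1.5", "2", "three")

# "three" cannot be converted, so it becomes NA (with a warning, hidden here)
num <- suppressWarnings(as.numeric(x))
num                  # 1.5 2.0 NA

as.integer(2.9)      # 2 (truncated, not rounded)
as.character(1:3)    # "1" "2" "3"
```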

Command Name

as.data.frame

This command attempts to convert an object into a data frame. For example, this can be useful for cross tabulation by converting a frequency table into a data table.

SEE also xtabs in “Summarizing Data: Summary Tables.”

Testing Data Types

You can determine what sort of data an object contains and also the form of the data object. Generally, a command of the form is.xxxx (where xxxx is the object type to test) is required. The result is a logical TRUE or FALSE.

Command Name

class

Returns the class attribute of an object.

SEE class in “Data Object Properties.”

Command Name

inherits

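Tests the class attribute of an object. The return value is a logical value by default; with which = TRUE it is an integer vector giving the position of each match in the class attribute (0 for no match).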

Common Usage

Related Commands

is
is.xxxx
class

Command Parameters

Examples
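
A minimal sketch of both return forms of inherits, assuming a small hypothetical data frame df.

```r
df <- data.frame(x = 1:3)

# Default: a single logical value
inherits(df, "data.frame")   # TRUE
inherits(df, "matrix")       # FALSE

# which = TRUE: position of each name in class(df), 0 if absent
pos <- inherits(df, c("matrix", "data.frame"), which = TRUE)
pos                          # 0 1
```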

Command Name

is

Determines if an object holds a particular class attribute.

Common Usage

is(object, class2)

Related Commands

inherits
class
is.xxxx

Command Parameters

object

An R object.

class2

The name of the class to test. If this name is in the class attribute of the object, TRUE is the result.

Examples

Command Name

is.array is.character is.data.frame is.factor is.integer is.list is.logical is.matrix is.numeric is.raw is.table is.ts is.vector

These commands test an object and return a logical value (TRUE or FALSE) as the result.

Common Usage

is.character(x)

Related Commands

as.xxxx

Command Parameters

x

The object to be tested. The result is a logical TRUE or FALSE.

Examples
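
The sketch below applies several is.xxxx tests to one vector; the vector v is a hypothetical example.

```r
v <- c(1.5, 2, 3)

is.numeric(v)     # TRUE
is.integer(v)     # FALSE (stored as double)
is.character(v)   # FALSE
is.vector(v)      # TRUE

# Integer data also count as numeric
is.numeric(1:3)   # TRUE
```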

Creating Data

Data can be created by typing in values from the keyboard, using the clipboard, or by importing from another file. This topic covers the commands used in creating (and modifying) data from the keyboard or clipboard.

What’s In This Topic:

Creating data from the keyboard

Use the keyboard to make data objects

Creating data from the clipboard

Use the clipboard to transfer data from other programs

Adding to existing data

Add extra data to existing objects

Amend data in existing objects

Creating Data from the Keyboard

Relatively small data sets can be typed in from the keyboard.

Command Name

c

This command is used whenever you need to combine items. The command combines several values/objects into a single object. Can be used to add to existing data.

SEE also data.frame in “Adding to Existing Data.”

Common Usage

c(...)

Related Commands

scan
read.table
dget
data
source
load

Command Parameters

...

Objects to be joined together (concatenated); names are separated by commas.

Examples
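
A short sketch of combining values with c, including adding to an existing vector; the object names are assumptions.

```r
a <- c(2, 4, 6)

# Add to an existing vector
b <- c(a, 8, 10)
b                 # 2 4 6 8 10

# Mixing numbers and text coerces everything to character
mixed <- c(1, "two")
mixed             # "1" "two"

# Elements can be named as they are combined
named <- c(first = 1, second = 2)
names(named)      # "first" "second"
```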

Command Name

cbind

Adds a column to a matrix.

SEE cbind in “Adding to Existing Data.”

Command Name

gl

Generates factor levels. This command creates factor vectors by specifying the pattern of their levels.

Common Usage

Related Commands

rep
seq
factor
levels
nlevels
interaction

Command Parameters

Examples
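
As a hedged illustration of specifying a pattern of levels, the sketch below uses gl with invented labels.

```r
# Three levels, each repeated twice in a block
f1 <- gl(3, 2)
as.integer(f1)     # 1 1 2 2 3 3

# Two labelled levels alternating, to a total length of 6
f2 <- gl(2, 1, length = 6, labels = c("ctrl", "trt"))
as.character(f2)   # "ctrl" "trt" "ctrl" "trt" "ctrl" "trt"
```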

Command Name

interaction

This command creates a new factor variable using combinations of other factors to represent the interactions. The resulting factor is unordered. This can be useful in creating labels or generating graphs.

SEE paste in Theme 4, “Utilities,” for alternative ways to join items in label making.

Common Usage

Related Commands

gl
factor
rep

Command Parameters

Examples

USE the pw data in the Essential.RData file for these examples.

Command Name

rep

Creates replicated elements. Can be used for creating factor levels where replication is unequal, for example.

Common Usage

rep(x, times, length.out, each)

Related Commands

seq
gl
factor
interaction

Command Parameters

x

A vector or other object suitable for replicating. Usually a vector, but lists, data frames, and matrix objects can also be replicated.

times

A vector giving the number of times to repeat. If times is an integer, the entire object is repeated the specified number of times. If times is a vector, it must be the same length as the original object. Then the individual elements of the vector specify the repeats for each element in the original.

length.out

The total length of the required result.

each

Specifies how many times each element of the original is to be repeated.

Examples
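
The parameters described above can be sketched as follows; the values being replicated are illustrative.

```r
rep(c("a", "b"), times = 2)         # "a" "b" "a" "b"
rep(c("a", "b"), times = c(3, 1))   # "a" "a" "a" "b" (unequal replication)
rep(c("a", "b"), each = 2)          # "a" "a" "b" "b"
rep(1:3, length.out = 7)            # 1 2 3 1 2 3 1
```

The times = c(3, 1) form is the one suited to factor levels with unequal replication.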

Command Name

rbind

Adds a row to a matrix.

SEE rbind in “Adding to Existing Data.”

Command Name

seq seq_along seq_len

These commands generate regular sequences. The seq command is the most flexible. The seq_along command is used for index values and the seq_len command produces simple sequences up to the specified length.

Common Usage

Related Commands

rep
gl
factor

Command Parameters

Examples
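
A minimal sketch of the three sequence commands; the vector x is a hypothetical example.

```r
seq(from = 1, to = 10, by = 3)   # 1 4 7 10
seq(0, 1, length.out = 5)        # 0.00 0.25 0.50 0.75 1.00

x <- c("a", "b", "c")
seq_along(x)                     # 1 2 3 (index values for x)
seq_len(4)                       # 1 2 3 4
seq_len(0)                       # integer(0), an empty sequence
```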

Command Name

scan

This command can read data items from the keyboard, clipboard, or text file.

SEE scan in “Importing Data” and scan in “Creating Data from the Clipboard.”

Creating Data from the Clipboard

It is possible to use the clipboard to transfer data into R; the scan command is designed especially for this purpose.

Command Name

scan

This command can read data items from the keyboard, clipboard, or text file.

SEE scan in “Importing Data.”

Adding to Existing Data

If you have an existing data object, you can append new data to it in various ways. You can also amend existing data in similar ways.

Command Name

$

Allows access to parts of certain objects (for example, list and data frame objects). The $ can access named parts of a list and columns of a data frame.

SEE also $ in “Selecting and Sampling Data.”

Common Usage

object$element

Related Commands

[]
c
cbind
rbind
data.frame
unlist

Command Parameters

element

The $ provides access to named elements in a list or named columns in a data frame.

Examples
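
A short sketch of reading and adding components with $; the data frame df, the list lst, and their contents are assumptions.

```r
df <- data.frame(height = c(10, 12), site = c("n", "s"))

# Access a named column
df$height                  # 10 12

# Assigning to a new name adds a column
df$half <- df$height / 2
names(df)                  # "height" "site" "half"

# Access a named element of a list
lst <- list(a = 1:3, b = "text")
lst$a                      # 1 2 3
```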

Command Name

[]

Square brackets enable sub-setting of many objects. Components are given in the brackets; for vector or list objects a single component is given: vector[element]. For data frame or matrix objects two elements are required: matrix[row, column]. Other objects may have more dimensions. Sub-setting can extract elements or be used to add new elements to some objects (vectors and data frames).

SEE also [] in “Selecting and Sampling Data.”

Common Usage

object[elements]

Related Commands

$
c
cbind
rbind
data.frame

Command Parameters

elements

Named elements or index number. The number of elements required depends on the object. Vectors and list objects have one dimension. Matrix and data frame objects have two dimensions: [row, column]. More complicated tables may have three or more dimensions.

Examples
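
The one-dimensional and two-dimensional forms can be sketched as follows; the objects v and m are illustrative.

```r
v <- c(10, 20, 30)
v[2]            # 20

# Assigning to a new index adds an element
v[4] <- 40
v               # 10 20 30 40

m <- matrix(1:6, nrow = 2)
m[2, 3]         # row 2, column 3: 6
m[, 1]          # all rows of column 1: 1 2
```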

Command Name

c

Combines items. Used for many purposes including adding elements to existing data objects (mainly vector objects).

SEE also “Creating Data from the Keyboard.”

Common Usage

c(...)

Related Commands

$
[]
cbind
rbind
data.frame

Command Parameters

...

Objects to be combined.

Examples

Command Name

cbind

Binds together objects to form new objects column-by-column. Generally used to create new matrix objects or to add to existing matrix or data frame objects.

Common Usage

Related Commands

rbind
matrix
data.frame
[]
$

Command Parameters

Examples
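
A hedged sketch of creating a matrix column by column and then adding a further column; the column names are invented.

```r
# Create a new matrix from two vectors; the argument names become column names
m <- cbind(a = 1:3, b = 4:6)
dim(m)           # 3 2

# Add a column to the existing matrix
m2 <- cbind(m, c = 7:9)
colnames(m2)     # "a" "b" "c"
```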

Command Name

data.frame

Used to construct a data frame from separate objects or to add to an existing data frame.

SEE also “Types of Data.”

Common Usage

Related Commands

$
[]
c
cbind
rbind
matrix

Command Parameters

Examples

Command Name

matrix

A matrix is a two-dimensional, rectangular object with rows and columns. A matrix can contain data of only one type (all text or all numbers). The command creates a matrix object from data or adds to an existing matrix.

Common Usage

Related Commands

data.frame
cbind
rbind

Command Parameters

Examples

Command Name

rbind

Binds together objects to form new objects row-by-row. Generally used to create new matrix objects or to add to existing matrix or data frame objects.

Common Usage

Related Commands

cbind
matrix
data.frame
c
[]
$

Command Parameters

Examples
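
As an illustration of row-by-row binding for both a matrix and a data frame, assuming invented values:

```r
# Matrix: each vector becomes a row
m <- rbind(1:3, 4:6)
m2 <- rbind(m, 7:9)
dim(m2)          # 3 3

# Data frame: the new row needs the same column names
df <- data.frame(id = 1:2, score = c(5.5, 6.1))
df2 <- rbind(df, data.frame(id = 3, score = 7))
nrow(df2)        # 3
```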

Command Name

within

Objects may contain separate elements. For example, a data frame contains named columns. These elements are not visible in the search path and will not be listed as objects by the ls command. The within command allows an object to be opened up temporarily so that the object can be altered.

Common Usage

within(data, expr)

Related Commands

with
attach
$

Command Parameters

Examples
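
A minimal sketch of within, assuming a hypothetical data frame with columns len and wid.

```r
df <- data.frame(len = c(2, 4), wid = c(3, 5))

# Columns can be used by name inside the expression;
# the result is a modified copy of the data frame
df2 <- within(df, area <- len * wid)

df2$area     # 6 20
names(df)    # "len" "wid" (the original is unchanged)
```

Because within returns a copy, the result must be assigned to a name to keep the change.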

Importing Data

Data can be imported to R from disk files. Usually these files are plain text (for example, CSV files), but it is possible to import data saved previously in R as a binary (data) file.

What’s In This Topic:

Importing data from text files

Import data as plain text (e.g., TXT or CSV)

Importing data from data files

Import data previously saved by R

Importing Data from Text Files

Most programs can write data to disk in plain text format. The most commonly used format is CSV; that is, comma-separated values. Excel, for example, is commonly used for data entry and storage and can write CSV files easily.

Command Name

dget

Gets a text file from disk that represents an R object (usually created using dput). The object is reconstructed to re-create the original object if possible.

Common Usage

dget(file)

Related Commands

dput
read.table
read.csv
scan
source

Command Parameters

file

The filename in quotes. Defaults to the current working directory unless specified explicitly. Can also link to URL. For Windows and Mac OS the filename can be replaced by file.choose(), which brings up a file browser.

Examples
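
The round trip through dput and dget can be sketched as follows; the temporary file and the object orig are assumptions made so that the example is self-contained.

```r
# Write a text representation of an object to a temporary file
tmp <- tempfile(fileext = ".txt")
orig <- list(id = 1:3, name = c("a", "b", "c"))
dput(orig, file = tmp)

# Reconstruct the object from the text file
back <- dget(tmp)
identical(orig, back)   # TRUE for this simple object

unlink(tmp)             # remove the temporary file
```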

Command Name

file.choose

Allows the user to select a file interactively. This command can be used whenever a file parameter is required (that is, whenever a filename is needed). On Windows and Mac OS the command opens a browser window for file selection. Note that on Linux the command may instead prompt for a filename at the console.

Command Name

read.table read.csv read.csv2 read.delim read.delim2

These commands read a plain text file from disk and create a data frame. The basic read.table command enables many parameters to be specified. The read.csv command and the other variants have certain defaults permitting particular file types to be read more conveniently.

Common Usage

Related Commands

dget
scan
source
write.table
write.csv

Command Parameters

Examples
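
A hedged sketch comparing read.csv with the equivalent read.table call; the file is a temporary one written on the spot, and its contents are invented.

```r
# Create a small CSV file to read (illustrative contents)
tmp <- tempfile(fileext = ".csv")
writeLines(c("site,count", "north,4", "south,7"), tmp)

# read.csv assumes a header row and comma separators
d <- read.csv(tmp)
names(d)     # "site" "count"
d$count      # 4 7

# read.table needs those settings spelled out
d_tab <- read.table(tmp, header = TRUE, sep = ",")
d_tab$count  # 4 7

unlink(tmp)
```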

Command Name

scan

Reads data from keyboard, clipboard, or text file from disk (or URL). The command creates a vector or list. If a filename is not specified, the command waits for input from keyboard (including clipboard); otherwise, the filename is used as the target data to read.

Common Usage

Related Commands

read.table
read.csv
dget
source

Command Parameters

Examples
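
Keyboard and clipboard input cannot be shown in a script, so this sketch passes the data through the text parameter of scan instead; the values are illustrative.

```r
# Numeric data are expected by default
vals <- scan(text = "3 5 8 13", quiet = TRUE)
vals          # 3 5 8 13

# Set what to read text items instead
words <- scan(text = "red green blue", what = "character", quiet = TRUE)
words         # "red" "green" "blue"
```

Calling scan() with no file waits for items to be typed or pasted, ending at a blank line.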

Command Name

source

Reads a text file and treats it as commands typed from the keyboard. Commonly used to run saved scripts, that is, lines of R commands.

SEE also source command in Theme 4, “Programming: Saving and Running Scripts.”

Common Usage

source(file)

Related Commands

scan
read.table
read.csv
dget
dump

Command Parameters

file

The filename in quotes. Defaults to the current working directory unless specified explicitly. Can also link to URL. For Windows and Mac OS the filename can be replaced by file.choose(), which brings up a file browser.

Examples
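
A minimal sketch of running a saved script; the script is written to a temporary file here, and the names x_demo and total_demo are hypothetical.

```r
# Write a two-line script to a temporary file
script <- tempfile(fileext = ".R")
writeLines(c("x_demo <- 1:5",
             "total_demo <- sum(x_demo)"), script)

# Run the lines as if they had been typed at the keyboard
source(script)
total_demo    # 15

unlink(script)
```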

Importing Data from Data Files

R can read data that it previously saved (and so binary encoded) to disk. R can also read a variety of proprietary formats such as Excel, SPSS, and Minitab, but you will need to load additional packages to R to do this. In general, it is best to open the data in the proprietary program and save the data in CSV format before returning to R and using the read.csv command.

SEE also “Importing Data from Text Files.”

Command Name

data

The base distribution of R contains a datasets package, which contains example data. Other packages contain data sets. The data command can load a data set or show the available data. Data sets in loaded packages are usually available without any command, but the data command loads a copy into the workspace.

Common Usage

Related Commands

load
source
read.table
package: foreign
package: gdata
package: xlsx

Command Parameters

Examples

Command Name

load

Reloads data that was saved from R in binary format (usually via the save command). The save command creates a binary file containing named R objects, which may be data, results, or custom functions. The load command reinstates the named objects, overwriting any identically named objects with no warning.

SEE also load in Theme 4, “Programming: Saving and Running Scripts.”

Common Usage

load(file)

Related Commands

save
save.image
source
scan
read.table
read.csv
dget

Command Parameters

file

The filename in quotes. Defaults to the current working directory unless specified explicitly. Can also link to URL. For Windows and Mac OS the filename can be replaced by file.choose(), which brings up a file browser.

Examples
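
The silent overwriting described above can be sketched as follows; the object score and the temporary file are assumptions.

```r
f <- tempfile(fileext = ".RData")
score <- c(1, 2, 3)
save(score, file = f)

# Change the object, then reload the saved version
score <- "changed"
restored <- load(f)   # returns the names of the objects it reinstated

restored              # "score"
score                 # 1 2 3 (the saved version replaced "changed" without warning)

unlink(f)
```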

Command Name

package: foreign read.spss

This command is available in the foreign package, which is not part of the base distribution of R. The command allows an SPSS file to be read into a data frame.

Common Usage

To get the package, use the following commands:

> install.packages("foreign")
> library(foreign)

Related Commands

package: gdata

package: xlsx

library

install.packages

Command Name

package: gdata read.xls

This command is available in the gdata package, which is not part of the base distribution of R. The command allows a Microsoft Excel file to be read into a data frame.

Common Usage

To get the package, use the following commands:

> install.packages("gdata")
> library(gdata)

Related Commands

package: foreign

package: xlsx

library

install.packages

Command Name

package: xlsx read.xlsx

This command is available in the xlsx package, which is not part of the base distribution of R. The command allows a Microsoft Excel file to be read into a data frame.

Common Usage

To get the package, use the following command:

> install.packages("xlsx")

Related Commands

package: gdata

package: foreign

library

install.packages

Saving Data

The R objects you create can be saved to disk. These objects might be data, results, or customized functions, for example. Objects can be saved as plain text files or binary encoded (therefore only readable by R). Most of the commands that allow you to save an object to a file will also permit the output to be routed to the computer screen.

What’s In This Topic:

Saving data as a text file to disk

Save data items to disk file

Show data items on screen

Saving data as a data file to disk

Save individual objects

Save the entire workspace to disk

Saving Data as a Text File to Disk

In some cases it is useful to save data to disk in plain text format. This can be useful if you are going to transfer the data to a spreadsheet for example.

Command Name

cat

This command outputs objects to screen or a file as text. The command is used more for handling simple messages to screen rather than for saving complicated objects to disk. The cat command can only save vectors or matrix objects to disk (the names are not preserved for matrix objects).

SEE also Theme 4, “Utilities.”

Common Usage

Related Commands

dput
dump
write
write.table
write.csv
save

Command Parameters

Examples

Command Name

dput

This command attempts to write an ASCII representation of an object. As part of this process the object is deparsed and certain attributes passed to the representation. This is not always entirely successful and the dget command cannot always completely reconstruct the object. The dump command may be more successful. The save command keeps all the attributes of the object, but the file is not ASCII.

Common Usage

Related Commands

dget
cat
dump
write
write.table
write.csv
save

Command Parameters

Examples

Command Name

dump

This command attempts to create text representations of R objects. Once saved to disk, the objects can usually be re-created using the source command.

SEE also dump in Theme 4, “Programming: Saving and Running Scripts.”

Common Usage

Related Commands

cat
dput
write
write.table
write.csv
save

Command Parameters

Examples

Command Name

write

Writes data to a text file. The command is similar to the cat command and can handle only vector or matrix data.

Common Usage

Related Commands

cat
dput
dump
write.table
write.csv
save

Command Parameters

Examples

Command Name

write.table write.csv write.csv2

Writes data to disk as plain text. If the object is not already a data frame or matrix, the command attempts to convert it to a data frame first.

Common Usage

Related Commands

read.table
read.csv
cat
dput
dump
write
save

Command Parameters

Examples
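
A hedged sketch of writing a data frame as CSV and inspecting the result; the data and the temporary file are invented for illustration.

```r
df <- data.frame(site = c("north", "south"), count = c(4, 7))
f <- tempfile(fileext = ".csv")

# row.names = FALSE leaves out the column of row labels
write.csv(df, file = f, row.names = FALSE)

# Text items are quoted by default
lines <- readLines(f)
lines[1]    # "\"site\",\"count\""
lines[2]    # "\"north\",4"

unlink(f)
```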

Saving Data as a Data File to Disk

Any R object can be saved to disk as a binary-encoded file. The save command saves named objects to disk that can be recalled later using the load command (the data command can also work for some objects). The save.image command saves all the objects; that is, the current workspace.

Command Name

save save.image

These commands save R objects to disk as binary encoded files. These can be recalled later using the load command. The save.image command is a convenience command that saves all objects in the current workspace (similar to what happens when quitting R).

SEE also save in Theme 4, “Programming: Saving and Running Scripts.”

Common Usage

Related Commands

load
source
cat
dput
dump
write
write.table
write.csv

Command Parameters

Examples
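
As a sketch of saving named objects and checking what the file holds, the example below loads the file into a separate environment; the names a_demo, b_demo, and env are hypothetical.

```r
f <- tempfile(fileext = ".RData")
a_demo <- 1:3
b_demo <- c("x", "y")

# Save two named objects to one binary file
save(a_demo, b_demo, file = f)

# Load into a fresh environment so nothing in the workspace is overwritten
env <- new.env()
load(f, envir = env)
sort(ls(env))    # "a_demo" "b_demo"

unlink(f)
```

The save.image command takes a file parameter in the same way but needs no object names, because it saves every object in the workspace.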

Viewing Data

R works with named objects. An object could be data, a result of an analysis, or a customized function. You need to be able to see which objects are available in the memory of R and on disk. You also need to be able to see what an individual object is and examine its properties. Finally, you need to be able to view an object and possibly select certain components from it.

SEE “Data Types” for determining what an individual object is.

What’s In This Topic:

Listing data

View objects in current workspace

View files on disk

View objects within other objects (i.e., object components)

Data object properties

Selecting and sampling data

Sorting and rearranging data

Obtain an index for items in an object

Reorder the items in an object

Return the ranks of items in an object

Listing Data

You need to be able to see what data items you have in your R workspace and on disk. You also need to be able to view the objects themselves and look at the components that make up each object.

Command Name

attach

Objects can have multiple components, which will not appear separately and cannot be selected simply by typing their name. The attach command “opens” an object and allows the components to be available. Data objects that have the same names as the components can lead to confusion, so this command needs to be used with caution.

Common Usage

attach(what)

Related Commands

detach
with
$

Command Parameters

what

An R object to be “opened” and made available on the search path. Usually this is a data frame or list.

Examples
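
A minimal sketch of attach and detach, assuming a hypothetical data frame whose column name len_demo is not otherwise in use.

```r
df <- data.frame(len_demo = c(2, 4))

# "Open" the data frame so its columns can be used by name
attach(df)
m_demo <- mean(len_demo)
m_demo                 # 3

# Close it again as soon as it is no longer needed
detach(df)
exists("len_demo")     # FALSE once df is detached
```

Pairing every attach with a detach keeps the search path tidy and reduces the risk of name clashes.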