An essential library of basic commands you can copy and paste into R
The powerful and open-source statistical programming language R is rapidly growing in popularity, but it requires that you type in commands at the keyboard rather than use a mouse, so you have to learn the language of R. But there is a shortcut, and that's where this unique book comes in. A companion book to Visualize This: The FlowingData Guide to Design, Visualization, and Statistics, this practical reference is a library of basic R commands that you can copy and paste into R to perform many types of statistical analyses.
Whether you're in technology, science, medicine, business, or engineering, you can quickly turn to your topic in this handy book and find the commands you need.
Simplify the complex statistical R programming language with The Essential R Reference.
Page count: 533
Year of publication: 2012
Table of Contents
Cover
Theme 1: Data
Data Types
Creating Data
Importing Data
Saving Data
Viewing Data
Summarizing Data
Distribution of Data
Theme 2: Math and Statistics
Mathematical Operations
Summary Statistics
Differences Tests
Correlations and Associations
Analysis of Variance and Linear Modeling
Miscellaneous Methods
Theme 3: Graphics
Making Graphs
Adding to Graphs
Graphical Parameters
Theme 4: Utilities
Install
Using R
Programming
About the Author
About the Technical Editor
Credits
Acknowledgments
Introduction
R is an object-oriented language; that means that it deals with named objects. Most often these objects are the data that you are analyzing. This theme deals with making, getting, saving, examining, and manipulating data objects.
Topics in this Theme
Data Types
Creating Data
Importing Data
Saving Data
Viewing Data
Summarizing Data
Distribution of Data
Commands in this Theme:
R recognizes many kinds of data, and these data can be in one of several forms. This topic shows you the commands relating to the kinds of data and how to switch objects from one form to another.
What’s In This Topic:
Types of data
Altering data types
Testing data types
Data can exist as different types and forms. These have different properties and can be coerced from one type/form into another.
array
An array is a multidimensional object.
character
Data in text form (not numbers) is called character data. The command creates a blank data object containing empty text data items.
data.frame
A data.frame is a two-dimensional, rectangular object that contains columns and rows. The columns can contain data of different types (some columns can be numbers and others text). The command makes a data frame from named objects.
factor
This command creates factor objects. These appear without quotation marks and are used in data analyses to indicate levels of a treatment variable.
ftable
Creates a “flat” contingency table.
integer
Data objects that are numeric (not text) and contain no decimals are called integer objects. The command creates a vector containing the specified number of 0s.
list
A list object is a collection of other R objects simply bundled together. A list can be composed of objects of differing types and lengths. The command makes a list from named objects.
list(...)
...
Objects to be bundled together as a list. Usually named objects are separated by commas.
logical
A logical value is either TRUE or FALSE. The command creates a vector of logical values (all set to FALSE).
matrix
A matrix is a two-dimensional, rectangular object with rows and columns. A matrix can contain data of only one type (either all text or all numbers). The command creates a matrix object from data.
numeric
Data that are numeric are numbers that may contain decimals (not integer values). The command creates a new vector of numbers (all 0).
raw
Data that are raw contain raw bytes. The command creates a vector of given length with all elements 00.
table
The table command uses cross-classifying factors to build a contingency table of the counts at each combination of factor levels.
ftable
xtabs
ts
A time-series object contains numeric data as well as information about the timing of the data. The command creates a time-series object with either a single or multiple series of data. The resulting object will have a class attribute "ts" and an additional "mts" attribute if it is a multiple series. There are dedicated plot and print methods for the "ts" class.
vector
xtabs
This command carries out cross tabulation, creating a contingency table as a result.
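As a rough illustration of the data types and forms listed above, the following sketch creates one small object of each kind. The object names and values (height, treat, and so on) are invented for the example and do not come from the book.

```r
# Empty ("blank") vectors of each basic type
character(2)   # two empty text items
integer(3)     # three integer zeros
numeric(3)     # three (decimal-capable) zeros
logical(2)     # two FALSE values
raw(2)         # two raw bytes, each 00

# Hypothetical example data
height <- c(1.2, 3.4, 2.2, 4.1)
treat  <- factor(c("low", "high", "low", "high"))

m   <- matrix(1:6, nrow = 2)        # 2 rows, 3 columns, one data type only
df  <- data.frame(height, treat)    # columns may hold different types
lst <- list(h = height, m = m)      # bundle of objects of differing types and lengths
tab <- table(treat)                 # counts at each factor level
xt  <- xtabs(~ treat, data = df)    # the same counts, specified with a formula
```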
Each type of data (for example, numeric, character) can potentially be switched to a different type, and similarly, each form (for example, data frame, matrix) of data object can be coerced to a new form. In general, a command of the form as.xxxx (where xxxx is the name of the required data type) is likely to be what you need.
as.array as.character as.data.frame as.factor as.integer as.list as.logical as.matrix as.numeric as.raw as.table as.ts as.vector
These commands attempt to coerce an object into the specified form. This will not always succeed.
as.character(x)
x
The object to be coerced to the new form.
as.data.frame
This command attempts to convert an object into a data frame. For example, this can be useful for cross tabulation by converting a frequency table into a data table.
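A short sketch of the as.xxxx commands in use may help; the values here are made up for illustration. It also shows what "will not always succeed" typically looks like in practice: a failed conversion gives NA with a warning.

```r
x <- c("1.5", "2", "3.7")        # hypothetical character data
n <- as.numeric(x)               # text converted to numbers
i <- as.integer(n)               # decimals are dropped (truncated toward zero)
s <- as.character(i)             # numbers back to text
l <- as.logical(c(0, 1, 2))      # zero becomes FALSE, non-zero becomes TRUE

# A conversion that cannot succeed gives NA (the warning is suppressed here)
bad <- suppressWarnings(as.numeric("ten"))

# A frequency table converted to a data frame with one row per level
tab  <- table(colour = c("red", "blue", "red"))
freq <- as.data.frame(tab)       # columns named colour and Freq
```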
You can determine what sort of data an object contains and also the form of the data object. Generally, a command of the form is.xxxx (where xxxx is the object type to test) is required. The result is a logical TRUE or FALSE.
class
Returns the class attribute of an object.
inherits
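Tests the class attribute of an object. The return value is a logical value by default; with which = TRUE it is an integer vector giving the position of each matched class in the class attribute (0 where there is no match).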
is
Determines if an object holds a particular class attribute.
is(object, class2)
object
An R object.
class2
The name of the class to test. If this name is in the class attribute of the object, TRUE is the result.
is.array is.character is.data.frame is.factor is.integer is.list is.logical is.matrix is.numeric is.raw is.table is.ts is.vector
These commands test an object and return a logical value (TRUE or FALSE) as the result.
is.character(x)
x
The object to be tested. The result is a logical TRUE or FALSE.
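The testing commands can be tried on any small object. This sketch uses an invented data frame; the call to library(methods) is only a precaution, since is comes from the methods package, which is normally attached already.

```r
library(methods)   # provides is(); usually attached by default

df <- data.frame(a = 1:3, b = c("x", "y", "z"))   # hypothetical data

cls   <- class(df)                                  # the class attribute
inh   <- inherits(df, "data.frame")                 # logical result
pos   <- inherits(df, c("matrix", "data.frame"), which = TRUE)  # 0 = no match
is_df <- is(df, "data.frame")

is.data.frame(df)    # TRUE
is.matrix(df)        # FALSE
is.list(df)          # TRUE: a data frame is also a list of columns
is.integer(df$a)     # TRUE
is.numeric(df$a)     # TRUE: integer data also count as numeric
is.character(df$a)   # FALSE
```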
Data can be created by typing in values from the keyboard, using the clipboard, or by importing from another file. This topic covers the commands used in creating (and modifying) data from the keyboard or clipboard.
What’s In This Topic:
Creating data from the keyboard
Creating data from the clipboard
Adding to existing data
Relatively small data sets can be typed in from the keyboard.
c
This command is used whenever you need to combine items. The command combines several values/objects into a single object. Can be used to add to existing data.
c(...)
...
Objects to be joined together (concatenated); names are separated by commas.
cbind
Adds a column to a matrix.
gl
Generates factor levels. This command creates factor vectors by specifying the pattern of their levels.
interaction
This command creates a new factor variable using combinations of other factors to represent the interactions. The resulting factor is unordered. This can be useful in creating labels or generating graphs.
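A minimal sketch of gl and interaction working together, assuming two invented factors (a two-level treatment and a three-level block):

```r
# gl(n, k): n levels, each repeated k times
f1 <- gl(2, 3, labels = c("low", "high"))              # low low low high high high

# Cycle through three levels until six items have been produced
f2 <- gl(3, 1, length = 6, labels = c("a", "b", "c"))  # a b c a b c

# One new factor whose levels are the combinations of f1 and f2
fi <- interaction(f1, f2)
as.character(fi)   # "low.a" "low.b" "low.c" "high.a" "high.b" "high.c"
nlevels(fi)        # 6 possible combinations
```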
rep
Creates replicated elements. Can be used for creating factor levels where replication is unequal, for example.
rep(x, times, length.out, each)
x
A vector or other object suitable for replicating. Usually a vector, but lists, data frames, and matrix objects can also be replicated.
times
A vector giving the number of times to repeat. If times is an integer, the entire object is repeated the specified number of times. If times is a vector, it must be the same length as the original object. Then the individual elements of the vector specify the repeats for each element in the original.
length.out
The total length of the required result.
each
Specifies how many times each element of the original is to be repeated.
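The effect of each rep parameter is easiest to see side by side. The variable names below are illustrative only.

```r
r1 <- rep(1:3, times = 2)            # whole vector repeated: 1 2 3 1 2 3
r2 <- rep(1:3, each = 2)             # each element repeated: 1 1 2 2 3 3
r3 <- rep(1:3, times = c(3, 1, 2))   # per-element repeats:   1 1 1 2 3 3
r4 <- rep(1:3, length.out = 7)       # recycled to length 7:  1 2 3 1 2 3 1

# Factor levels with unequal replication (hypothetical treatment labels)
trt <- factor(rep(c("ctrl", "treat"), times = c(2, 4)))
```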
rbind
Adds a row to a matrix.
seq seq_along seq_len
These commands generate regular sequences. The seq command is the most flexible. The seq_along command is used for index values and the seq_len command produces simple sequences up to the specified length.
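The three sequence commands can be compared with a few short calls; the values are arbitrary examples.

```r
s1 <- seq(from = 1, to = 10, by = 3)   # 1 4 7 10
s2 <- seq(0, 1, length.out = 5)        # five evenly spaced values from 0 to 1

x  <- c("a", "b", "c")
s3 <- seq_along(x)                     # index values for x: 1 2 3
s4 <- seq_len(4)                       # 1 2 3 4
s5 <- seq_len(0)                       # an empty sequence (whereas 1:0 gives 1 0)
```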
scan
This command can read data items from the keyboard, clipboard, or text file.
It is possible to use the clipboard to transfer data into R; the scan command is designed especially for this purpose.
scan
This command can read data items from the keyboard, clipboard, or text file.
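Keyboard and clipboard entry are interactive, so they cannot be shown running in a script. The sketch below lists the interactive calls as comments and then uses the text parameter of scan as a stand-in for typed or pasted input.

```r
# Interactive use (not run here): type or paste the values,
# then press Enter on a blank line to finish.
# x <- scan()                      # numeric values
# y <- scan(what = "character")    # text values

# Non-interactive stand-in: read the same kind of input from a text string
nums  <- scan(text = "3 5 7 9", quiet = TRUE)
words <- scan(text = "a,b,c", what = "character", sep = ",", quiet = TRUE)
```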
If you have an existing data object, you can append new data to it in various ways. You can also amend existing data in similar ways.
$
Allows access to parts of certain objects (for example, list and data frame objects). The $ can access named parts of a list and columns of a data frame.
object$element
element
The $ provides access to named elements in a list or named columns in a data frame.
[]
Square brackets enable sub-setting of many objects. Components are given in the brackets; for vector or list objects a single component is given: vector[element]. For data frame or matrix objects two elements are required: matrix[row, column]. Other objects may have more dimensions. Sub-setting can extract elements or be used to add new elements to some objects (vectors and data frames).
object[elements]
elements
Named elements or index number. The number of elements required depends on the object. Vectors and list objects have one dimension. Matrix and data frame objects have two dimensions: [row, column]. More complicated tables may have three or more dimensions.
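The following sketch shows $ and square brackets used both to extract and to add elements. The data frame, vector, and list are invented for the example.

```r
df <- data.frame(id = 1:3, score = c(10, 20, 30))   # hypothetical data

col  <- df$score               # a named column
one  <- df[2, "score"]         # row 2 of the score column
high <- df[df$score > 15, ]    # all columns for rows where score exceeds 15

v <- c(5, 6, 7)
second <- v[2]                 # a single element of a vector
v[4] <- 8                      # add a new element at position 4

df$grade <- c("c", "b", "a")   # add a new column to the data frame

lst  <- list(a = 1, b = "text")
part <- lst$b                  # a named element of a list
```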
c
Combines items. Used for many purposes including adding elements to existing data objects (mainly vector objects).
c(...)
...
Objects to be combined.
cbind
Binds together objects to form new objects column-by-column. Generally used to create new matrix objects or to add to existing matrix or data frame objects.
data.frame
Used to construct a data frame from separate objects or to add to an existing data frame.
matrix
A matrix is a two-dimensional, rectangular object with rows and columns. A matrix can contain data of only one type (all text or all numbers). The command creates a matrix object from data or adds to an existing matrix.
rbind
Binds together objects to form new objects row-by-row. Generally used to create new matrix objects or to add to existing matrix or data frame objects.
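A brief sketch of adding to existing objects with c, cbind, and rbind, using invented values:

```r
v <- c(1, 2, 3)
v <- c(v, 4, 5)                    # append to a vector

m  <- matrix(1:4, nrow = 2)
m2 <- cbind(m, c(5, 6))            # add a column: now 2 rows by 3 columns
m3 <- rbind(m2, c(7, 8, 9))        # add a row:    now 3 rows by 3 columns

df <- data.frame(x = 1:2)
df <- cbind(df, y = c("a", "b"))               # add a named column
df <- rbind(df, data.frame(x = 3, y = "c"))    # add a row with matching column names
```

When adding rows to a data frame with rbind, the new row needs the same column names as the existing object.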
within
Objects may contain separate elements. For example, a data frame contains named columns. These elements are not visible in the search path and will not be listed as objects by the ls command. The within command allows an object to be opened up temporarily so that the object can be altered.
within(data, expr)
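As a small sketch of within, assuming an invented data frame with len and wid columns: the expression can refer to the columns by name, and the command returns an altered copy, so the result must be assigned for the change to be kept.

```r
df <- data.frame(len = c(10, 20), wid = c(2, 4))   # hypothetical data

# len and wid are visible inside the braces; area becomes a new column
df <- within(df, {
  area <- len * wid
})

df$area   # 20 80
```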
Data can be imported to R from disk files. Usually these files are plain text (for example, CSV files), but it is possible to import data saved previously in R as a binary (data) file.
What’s In This Topic:
Importing data from text files
Importing data from data files
Most programs can write data to disk in plain text format. The most commonly used format is CSV; that is, comma-separated values. Excel, for example, is commonly used for data entry and storage and can write CSV files easily.
dget
Gets a text file from disk that represents an R object (usually created using dput). The object is reconstructed to re-create the original object if possible.
dget(file)
file
The filename in quotes. Defaults to the current working directory unless specified explicitly. Can also link to URL. For Windows and Mac OS the filename can be replaced by file.choose(), which brings up a file browser.
file.choose
Allows the user to select a file interactively. This command can be used whenever a file parameter is required (that is, whenever a filename is needed). The command opens a browser window for file selection. Note that this does not work on Linux OS.
read.table read.csv read.csv2 read.delim read.delim2
These commands read a plain text file from disk and create a data frame. The basic read.table command enables many parameters to be specified. The read.csv command and the other variants have certain defaults permitting particular file types to be read more conveniently.
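To try these commands without an existing file, the sketch below first writes a tiny CSV file to a temporary location and then reads it back. The file contents and names are invented for the example.

```r
# Create a small CSV file to read (hypothetical contents)
path <- tempfile(fileext = ".csv")
writeLines(c("site,count", "north,12", "south,7"), path)

# read.csv assumes a header row and comma separators
dat <- read.csv(path)

# The same file read with read.table, stating those settings explicitly
dat2 <- read.table(path, header = TRUE, sep = ",")

str(dat)       # show the structure of the resulting data frame
unlink(path)   # remove the temporary file
```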
scan
Reads data from keyboard, clipboard, or text file from disk (or URL). The command creates a vector or list. If a filename is not specified, the command waits for input from keyboard (including clipboard); otherwise, the filename is used as the target data to read.
source
Reads a text file and treats it as commands typed from the keyboard. Commonly used to run saved scripts, that is, lines of R commands.
source(file)
file
The filename in quotes. Defaults to the current working directory unless specified explicitly. Can also link to URL. For Windows and Mac OS the filename can be replaced by file.choose(), which brings up a file browser.
R can read data that it previously saved (and so binary encoded) to disk. R can also read a variety of proprietary formats such as Excel, SPSS, and Minitab, but you will need to load additional packages to R to do this. In general, it is best to open the data in the proprietary program and save the data in CSV format before returning to R and using the read.csv command.
data
The base distribution of R contains a datasets package, which contains example data. Other packages contain data sets. The data command can load a data set or show the available data. Data sets in loaded packages are available without any command, but the data command adds them to the search path.
load
Reloads data that was saved from R in binary format (usually via the save command). The save command creates a binary file containing named R objects, which may be data, results, or custom functions. The load command reinstates the named objects, overwriting any identically named objects with no warning.
load(file)
file
The filename in quotes. Defaults to the current working directory unless specified explicitly. Can also link to URL. For Windows and Mac OS the filename can be replaced by file.choose(), which brings up a file browser.
package: foreign read.spss
This command is available in the foreign package, which is not part of the base distribution of R. The command allows an SPSS file to be read into a data frame.
To get the package, use the following commands:
> install.packages("foreign")
> library(foreign)
package: gdata
package: xlsx
library
install.packages
package: gdata read.xls
This command is available in the gdata package, which is not part of the base distribution of R. The command allows a Microsoft Excel file to be read into a data frame.
To get the package, use the following commands:
> install.packages("gdata")
> library(gdata)
package: foreign
package: gdata
library
install.packages
package: xlsx read.xlsx
This command is available in the xlsx package, which is not part of the base distribution of R. The command allows a Microsoft Excel file to be read into a data frame.
To get the package, use the following command:
> install.packages("xlsx")
package: gdata
package: foreign
library
install.packages
The R objects you create can be saved to disk. These objects might be data, results, or customized functions, for example. Objects can be saved as plain text files or binary encoded (therefore only readable by R). Most of the commands that allow you to save an object to a file will also permit the output to be routed to the computer screen.
What’s In This Topic:
Saving data as a text file to disk
Saving data as a data file to disk
In some cases it is useful to save data to disk in plain text format, for example if you are going to transfer the data to a spreadsheet.
cat
This command outputs objects to screen or a file as text. The command is used more for handling simple messages to screen rather than for saving complicated objects to disk. The cat command can only save vectors or matrix objects to disk (the names are not preserved for matrix objects).
dput
This command attempts to write an ASCII representation of an object. As part of this process the object is deparsed and certain attributes passed to the representation. This is not always entirely successful and the dget command cannot always completely reconstruct the object. The dump command may be more successful. The save command keeps all the attributes of the object, but the file is not ASCII.
dump
This command attempts to create text representations of R objects. Once saved to disk, the objects can usually be re-created using the source command.
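The two text-based round trips (dput with dget, and dump with source) can be sketched as follows. The object and the temporary files are invented for the example, and the dumped object is re-created in a separate environment so that nothing in the workspace is overwritten.

```r
x <- c(a = 1, b = 2)          # hypothetical named vector

# dput writes a text representation; dget rebuilds the object from it
f1 <- tempfile()
dput(x, file = f1)
y <- dget(f1)

# dump writes R commands that re-create the named object; source runs them
f2 <- tempfile()
dump("x", file = f2)
e <- new.env()
source(f2, local = e)         # x is re-created inside environment e

unlink(c(f1, f2))             # remove the temporary files
```

Note that dput takes the object itself, whereas dump takes the name of the object in quotes.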
write
Writes data to a text file. The command is similar to the cat command and can handle only vector or matrix data.
write.table write.csv write.csv2
Writes data to a text file on disk. If the object is not already a data frame or matrix, the command first attempts to convert it to a data frame.
Any R object can be saved to disk as a binary-encoded file. The save command saves named objects to disk that can be recalled later using the load command (the data command can also work for some objects). The save.image command saves all the objects; that is, the current workspace.
save save.image
These commands save R objects to disk as binary encoded files. These can be recalled later using the load command. The save.image command is a convenience command that saves all objects in the current workspace (similar to what happens when quitting R).
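A minimal sketch of saving and recalling objects in binary form, using invented objects and a temporary file. Loading into a new environment is one cautious option here, because load otherwise replaces identically named objects in the workspace.

```r
scores <- c(10, 20, 30)       # hypothetical objects to save
tags   <- c("a", "b", "c")

path <- tempfile(fileext = ".RData")
save(scores, tags, file = path)          # binary file holding both named objects

# Restore into a separate environment rather than the workspace
e <- new.env()
restored <- load(path, envir = e)        # returns the names of the restored objects

# save.image(file = "all.RData")         # would save every object in the workspace (not run)
unlink(path)
```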
R works with named objects. An object could be data, a result of an analysis, or a customized function. You need to be able to see which objects are available in the memory of R and on disk. You also need to be able to see what an individual object is and examine its properties. Finally, you need to be able to view an object and possibly select certain components from it.
What’s In This Topic:
Listing data
Data object properties
Selecting and sampling data
Sorting and rearranging data
You need to be able to see what data items you have in your R workspace and on disk. You also need to be able to view the objects themselves and look at the components that make up each object.
attach
Objects can have multiple components, which will not appear separately and cannot be selected simply by typing their name. The attach command “opens” an object and allows the components to be available. Data objects that have the same names as the components can lead to confusion, so this command needs to be used with caution.
attach(what)
what
An R object to be “opened” and made available on the search path. Usually this is a data frame or list.
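As a small sketch of attach, assuming an invented data frame: the columns become available by name until the object is detached. The with command, shown last, is a common alternative that avoids the naming confusion mentioned above because it opens the object for a single expression only.

```r
df <- data.frame(len = c(10, 20), wid = c(2, 4))   # hypothetical data

attach(df)             # columns len and wid are now on the search path
total <- sum(len)      # refers to df$len
detach(df)             # close the object again when finished

# Alternative that opens df for one expression only
total2 <- with(df, sum(len))
```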
