BRIDGES THE GAP BETWEEN SAS AND R, ALLOWING USERS TRAINED IN ONE LANGUAGE TO EASILY LEARN THE OTHER
SAS and R are widely-used, very different software environments. Prized for its statistical and graphical tools, R is an open-source programming language that is popular with statisticians and data miners who develop statistical software and analyze data. SAS (Statistical Analysis System) is the leading corporate software in analytics thanks to its faster data handling and smaller learning curve. SAS for R Users enables entry-level data scientists to take advantage of the best aspects of both tools by providing a cross-functional framework for users who already know R but may need to work with SAS. Those with knowledge of both R and SAS are of far greater value to employers, particularly in corporate settings.
Using a clear, step-by-step approach, this book presents an analytics workflow that mirrors that of the everyday data scientist. This up-to-date guide is compatible with the latest R packages as well as SAS University Edition.
Useful for anyone seeking employment in data science, this book:
* Instructs both practitioners and students fluent in one language seeking to learn the other
* Provides command-by-command translations of R to SAS and SAS to R
* Offers examples and applications in both R and SAS
* Presents step-by-step guidance on workflows, color illustrations, sample code, chapter quizzes, and more
* Includes sections on advanced methods and applications
Designed for professionals, researchers, and students, SAS for R Users is a valuable resource for those with some knowledge of coding and basic statistics who wish to enter the realm of data science and business analytics.
Page count: 172
Year of publication: 2019
Cover
Preface
Scope
1 About SAS and R
1.1 About SAS
1.2 About R
1.3 Notable Points in SAS and R Languages
1.4 Some Important Functions with Comparative Comparisons Respectively
1.5 Summary
1.6 Quiz Questions
Quiz Answers
2 Data Input, Import and Print
2.1 Importing Data
2.2 Importing Data in SAS
2.3 Importing Data in R
2.4 Providing Data Input
2.5 Data Input in SAS
2.6 Printing Data
2.7 Summary
2.8 Quiz Questions
Quiz Answers
3 Data Inspection and Cleaning
3.1 Introduction
3.2 Data Inspection
3.3 Missing Values
3.4 Data Cleaning
3.5 Quiz Questions
Quiz Answers
4 Handling Dates, Strings, Numbers
4.1 Working with Numeric Data
4.2 Working with Date Data
4.3 Handling Strings Data
4.4 Quiz Questions
Quiz Answers
5 Numerical Summary and Groupby Analysis
5.1 Numerical Summary and Groupby Analysis
5.2 Numerical Summary and Groupby Analysis in SAS
5.3 Numerical Summary and Group by Analysis in R
5.4 Quiz Questions
Quiz Answers
6 Frequency Distributions and Cross Tabulations
6.1 Frequency Distributions in SAS
6.2 Frequency Distributions in R
7 Using SQL with SAS and R
7.1 What is SQL?
7.2 SQL Select
7.3 Merges
7.4 Summary
7.5 Quiz Questions
Quiz Answers
8 Functions, Loops, Arrays, Macros
8.1 Functions
8.2 Loops
8.3 Arrays
8.4 Macros
8.5 Quiz Questions
Quiz Answers
9 Data Visualization
9.1 Importance of Data Visualization
9.2 Data Visualization in SAS
9.3 Data Visualization in R
9.4 Quiz Questions
Quiz Answers
10 Data Output
10.1 Data Output in SAS
10.2 Data Output in R
10.3 Quiz Questions
Quiz Answers
11 Statistics for Data Scientists
11.1 Types of Variables
11.2 Statistical Methods for Data Analysis
11.3 Distributions
11.4 Descriptive Statistics
11.5 Inferential Statistics
11.6 Algorithms in Data Science
11.7 Quiz Questions
Quiz Answers
Further Reading
Index
End User License Agreement
Chapter 5
Figure 5.1 Proc Univariate Output.
Figure 5.2 sessionInfo Output in R.
Chapter 6
Figure 6.1 CrossTables output in R.
Chapter 7
Figure 7.1 Proc SQL in SAS.
Figure 7.2 Sort/Order Data in SAS.
Figure 7.3 Proc SQL – Create and Insert in SAS.
Figure 7.4 Proc SQL – Where Condition Result in SAS.
Figure 7.5 sqldf – Where Condition in R.
Figure 7.6 Issued table.
Figure 7.7 Book table.
Figure 7.8 User table.
Figure 7.9 Inner Join in SAS.
Figure 7.10 Inner Join in R.
Figure 7.11 Left Join in SAS.
Figure 7.12 Left Join in R.
Figure 7.13 Download Data in SAS Studio.
Chapter 9
Figure 9.1 Anscombe Dataset in R.
Figure 9.2 Data Visualization Options in SAS.
Figure 9.3 Bar Plot in SAS.
Figure 9.4 Bar‐Line Plot in SAS.
Figure 9.5 Box Plot in SAS.
Figure 9.6 Bubble Plot in SAS.
Figure 9.7 Heat Map in SAS.
Figure 9.8 Histogram in SAS.
Figure 9.9 Line Plot in SAS.
Figure 9.10 Mosaic Plot in SAS.
Figure 9.11 Pie Plot in SAS.
Figure 9.12 Scatter Plot in SAS.
Figure 9.13 Bar Plot in R.
Figure 9.14 Bar‐Line Plot in R.
Figure 9.15 Box Plot in R.
Figure 9.16 Bubble Plot in R.
Figure 9.17 Heatmap in R.
Figure 9.18 Histogram in R.
Figure 9.19 Line Plot in R.
Figure 9.20 Mosaic Plot in R.
Figure 9.21 Pie Plot in R.
Chapter 10
Figure 10.1 Creating plots in SAS.
Figure 10.2 HTML output plot in SAS.
Figure 10.3 Output format for plots in R.
Figure 10.4 Knit document in R studio.
Figure 10.5 HTML output by knit in R.
Chapter 11
Figure 11.1 Variable types.
Figure 11.2 (a) Skewness and (b) kurtosis.
Figure 11.3 Hypothesis test types.
Figure 11.4 Types of statistical error.
Figure 11.5 Normal distribution.
Figure 11.6 Bayes theorem.
Figure 11.7 Linear regression.
Figure 11.8 Logistic regression.
Figure 11.9 Support vector machines SVM.
Figure 11.10 k Nearest Neighbors kNN.
Figure 11.11 Decision tree.
Figure 11.12 Confusion matrix.
Figure 11.13 Confusion matrix.
Figure 11.14 ROC curve.
Figure 11.15 K means cluster.
Figure 11.16 Hierarchical cluster (dendrogram).
Figure 11.17 Gaussian mixture cluster.
Figure 11.18 Time series decomposition.
Ajay Ohri
Delhi, IN
This edition first published 2020
© 2020 John Wiley & Sons, Inc.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Ajay Ohri to be identified as the author of this work has been asserted in accordance with law.
Registered Office
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
Editorial Office
111 River Street, Hoboken, NJ 07030, USA
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging‐in‐Publication Data
Names: Ohri, (Ajay), author.
Title: SAS for R users : a book for data scientists / Ajay Ohri.
Description: First edition. | Hoboken, NJ : John Wiley & Sons, Inc., 2020. | Includes bibliographical references and index.
Identifiers: LCCN 2019021408 (print) | ISBN 9781119256410 (pbk.)
Subjects: LCSH: SAS (Computer program language) | R (Computer program language) | Statistics–Data processing.
Classification: LCC QA76.73.S27 O44 2020 (print) | LCC QA76.73.S27 (ebook) | DDC 005.5/5–dc23
LC record available at https://lccn.loc.gov/2019021408
LC ebook record available at https://lccn.loc.gov/2019980765
Cover Design: Wiley
Cover Image: © DmitriyRazinkov/Shutterstock
This book is dedicated to my students and my family, my son Kush Ohri, members of my church, and my God Jesus Christ.
I would like to thank the SAS Institute and its employees for their generosity in providing SAS On Demand for Academics for free; without them this book would not exist. In addition, I also want to thank the baristas from Starbucks Gurgaon. These are the people who downvote my questions on Stack Overflow. You inspire me, guys.
SAS for R Users is aimed at entry‐level data scientists. It is not aimed at researchers in academia, nor is it aimed at high‐end data scientists working on Big Data, deep learning, or machine learning. In short, it is merely aimed at human learning business analytics (or data science as it is now called).
SAS and R are both widely used languages, yet they are very different. SAS is a programming language designed in the 1960s that is broadly divided into Data Steps and a wide variety of Procedure (PROC) steps, while R is an object‐oriented, mostly functional language designed in the 1990s.
There are many, many books covering either language, but very few covering both.
Why then write the book? After all, I have written two books on R, and one on Python for R. The SAS language remains the most widely used language in enterprises, contributing directly to the brand name and profitability of one of the largest private software companies, one that invests hugely in its own research instead of borrowing research in the name of open source. A statistics student who knows Python (especially machine learning, ML), R, SAS, Big Data (especially Spark ML), and data visualization (using Tableau) is a mythical unicorn unavailable to recruiters, who often have to settle for candidates with a few of these skills and then train them in‐house.
As a teacher, I want my students to have jobs; there is no ideological tilt to open source or any company here. The probability of students getting jobs from campus greatly increases if they know BOTH SAS and R, not just one of them. That is why this book has been written.
This book is designed for professionals and students: people who want to enter data science and who have a coding background and some basic knowledge of statistics. It is not aimed at researchers or people who like giraffes and do not read the book from the beginning.
