SAS for R Users - Ajay Ohri - E-Book

SAS for R Users E-Book

Ajay Ohri

Description

BRIDGES THE GAP BETWEEN SAS AND R, ALLOWING USERS TRAINED IN ONE LANGUAGE TO EASILY LEARN THE OTHER

SAS and R are widely used, very different software environments. Prized for its statistical and graphical tools, R is an open-source programming language that is popular with statisticians and data miners who develop statistical software and analyze data. SAS (Statistical Analysis System) is the leading corporate software in analytics thanks to its faster data handling and shorter learning curve. SAS for R Users enables entry-level data scientists to take advantage of the best aspects of both tools by providing a cross-functional framework for users who already know R but may need to work with SAS.

Those with knowledge of both R and SAS are of far greater value to employers, particularly in corporate settings. Using a clear, step-by-step approach, this book presents an analytics workflow that mirrors that of the everyday data scientist. This up-to-date guide is compatible with the latest R packages as well as SAS University Edition.

Useful for anyone seeking employment in data science, this book:

* Instructs both practitioners and students fluent in one language seeking to learn the other
* Provides command-by-command translations of R to SAS and SAS to R
* Offers examples and applications in both R and SAS
* Presents step-by-step guidance on workflows, color illustrations, sample code, chapter quizzes, and more
* Includes sections on advanced methods and applications

Designed for professionals, researchers, and students, SAS for R Users is a valuable resource for those with some knowledge of coding and basic statistics who wish to enter the realm of data science and business analytics.


Number of pages: 172

Year of publication: 2019




Table of Contents

Cover

Preface

Scope

1 About SAS and R

1.1 About SAS

1.2 About R

1.3 Notable Points in SAS and R Languages

1.4 Some Important Functions with Comparative Comparisons Respectively

1.5 Summary

1.6 Quiz Questions

Quiz Answers

2 Data Input, Import and Print

2.1 Importing Data

2.2 Importing Data in SAS

2.3 Importing Data in R

2.4 Providing Data Input

2.5 Data Input in SAS

2.6 Printing Data

2.7 Summary

2.8 Quiz Questions

Quiz Answers

3 Data Inspection and Cleaning

3.1 Introduction

3.2 Data Inspection

3.3 Missing Values

3.4 Data Cleaning

3.5 Quiz Questions

Quiz Answers

4 Handling Dates, Strings, Numbers

4.1 Working with Numeric Data

4.2 Working with Date Data

4.3 Handling Strings Data

4.4 Quiz Questions

Quiz Answers

5 Numerical Summary and Groupby Analysis

5.1 Numerical Summary and Groupby Analysis

5.2 Numerical Summary and Groupby Analysis in SAS

5.3 Numerical Summary and Group by Analysis in R

5.4 Quiz Questions

Quiz Answers

6 Frequency Distributions and Cross Tabulations

6.1 Frequency Distributions in SAS

6.2 Frequency Distributions in R

7 Using SQL with SAS and R

7.1 What is SQL?

7.2 SQL Select

7.3 Merges

7.4 Summary

7.5 Quiz Questions

Quiz Answers

8 Functions, Loops, Arrays, Macros

8.1 Functions

8.2 Loops

8.3 Arrays

8.4 Macros

8.5 Quiz Questions

Quiz Answers

9 Data Visualization

9.1 Importance of Data Visualization

9.2 Data Visualization in SAS

9.3 Data Visualization in R

9.4 Quiz Questions

Quiz Answers

10 Data Output

10.1 Data Output in SAS

10.2 Data Output in R

10.3 Quiz Questions

Quiz Answers

11 Statistics for Data Scientists

11.1 Types of Variables

11.2 Statistical Methods for Data Analysis

11.3 Distributions

11.4 Descriptive Statistics

11.5 Inferential Statistics

11.6 Algorithms in Data Science

11.7 Quiz Questions

Quiz Answers

Further Reading

Index

End User License Agreement

List of Illustrations

Chapter 5

Figure 5.1 Proc Univariate Output.

Figure 5.2 sessionInfo Output in R.

Chapter 6

Figure 6.1 CrossTables output in R.

Chapter 7

Figure 7.1 Proc SQL in SAS.

Figure 7.2 Sort/Order Data in SAS.

Figure 7.3 Proc SQL – Create and Insert in SAS.

Figure 7.4 Proc SQL – Where Condition Result in SAS.

Figure 7.5 sqldf – Where Condition in R.

Figure 7.6 Issued table.

Figure 7.7 Book table.

Figure 7.8 User table.

Figure 7.9 Inner Join in SAS.

Figure 7.10 Inner Join in R.

Figure 7.11 Left Join in SAS.

Figure 7.12 Left Join in R.

Figure 7.13 Download Data in SAS Studio.

Chapter 9

Figure 9.1 Anscombe Dataset in R.

Figure 9.2 Data Visualization Options in SAS.

Figure 9.3 Bar Plot in SAS.

Figure 9.4 Bar‐Line Plot in SAS.

Figure 9.5 Box Plot in SAS.

Figure 9.6 Bubble Plot in SAS.

Figure 9.7 Heat Map in SAS.

Figure 9.8 Histogram in SAS.

Figure 9.9 Line Plot in SAS.

Figure 9.10 Mosaic Plot in SAS.

Figure 9.11 Pie Plot in SAS.

Figure 9.12 Scatter Plot in SAS.

Figure 9.13 Bar Plot in R.

Figure 9.14 Bar‐Line Plot in R.

Figure 9.15 Box Plot in R.

Figure 9.16 Bubble Plot in R.

Figure 9.17 Heatmap in R.

Figure 9.18 Histogram in R.

Figure 9.19 Line Plot in R.

Figure 9.20 Mosaic Plot in R.

Figure 9.21 Pie Plot in R.

Chapter 10

Figure 10.1 Creating plots in SAS.

Figure 10.2 HTML output plot in SAS.

Figure 10.3 Output format for plots in R.

Figure 10.4 Knit document in R studio.

Figure 10.5 HTML output by knit in R.

Chapter 11

Figure 11.1 Variable types.

Figure 11.2 (a) Skewness and (b) kurtosis.

Figure 11.3 Hypothesis test types.

Figure 11.4 Types of statistical error.

Figure 11.5 Normal distribution.

Figure 11.6 Bayes theorem.

Figure 11.7 Linear regression.

Figure 11.8 Logistic regression.

Figure 11.9 Support vector machines SVM.

Figure 11.10 k Nearest Neighbors kNN.

Figure 11.11 Decision tree.

Figure 11.12 Confusion matrix.

Figure 11.13 Confusion matrix.

Figure 11.14 ROC curve.

Figure 11.15 K means cluster.

Figure 11.16 Hierarchical cluster (dendrogram).

Figure 11.17 Gaussian mixture cluster.

Figure 11.18 Time series decomposition.


SAS for R Users

A Book for Data Scientists

Ajay Ohri

Delhi, IN

This edition first published 2020
© 2020 John Wiley & Sons, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Ajay Ohri to be identified as the author of this work has been asserted in accordance with law.

Registered Office
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office
111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging‐in‐Publication Data

Names: Ohri, (Ajay), author.
Title: SAS for R users : a book for data scientists / Ajay Ohri.
Description: First edition. | Hoboken, NJ : John Wiley & Sons, Inc., 2020. | Includes bibliographical references and index.
Identifiers: LCCN 2019021408 (print) | ISBN 9781119256410 (pbk.)
Subjects: LCSH: SAS (Computer program language) | R (Computer program language) | Statistics–Data processing.
Classification: LCC QA76.73.S27 O44 2020 (print) | LCC QA76.73.S27 (ebook) | DDC 005.5/5–dc23
LC record available at https://lccn.loc.gov/2019021408
LC ebook record available at https://lccn.loc.gov/2019980765

Cover Design: Wiley
Cover Image: © DmitriyRazinkov/Shutterstock

This book is dedicated to my students and my family, my son Kush Ohri, members of my church, and my God Jesus Christ.

Preface

I would like to thank the SAS Institute and its employees for their generosity in providing SAS OnDemand for Academics for free; without them this book would not exist. I also want to thank the baristas at Starbucks Gurgaon. These are the people who downvote my questions on Stack Overflow. You inspire me, guys.

SAS for R Users is aimed at entry‐level data scientists. It is not aimed at researchers in academia, nor at high‐end data scientists working on Big Data, deep learning, or machine learning. In short, it is aimed at humans learning business analytics (or data science, as it is now called).

Both SAS and R are widely used languages, yet they are very different. SAS is a programming language designed in the 1960s; its programs are broadly divided into DATA steps and a wide variety of procedure (PROC) steps. R, by contrast, is an object‐oriented, mostly functional language designed in the 1990s.

There are many books covering either language, but very few covering both.

Why, then, write this book? After all, I have written two books on R and one on Python for R users. The SAS language remains the most widely used analytics language in enterprises, contributing directly to the brand name and profitability of one of the largest private software companies, one that invests heavily in its own research instead of borrowing research in the name of open source. A statistics student who knows Python (especially for machine learning), R, SAS, Big Data tools (especially Spark ML), and data visualization (using Tableau) is a mythical unicorn; recruiters often have to settle for candidates with a few of these skills and then train them in‐house.

As a teacher, I want my students to have jobs; there is no ideological tilt toward open source or toward any company here. Students' chances of being hired through campus placement increase greatly if they know BOTH SAS and R, not just one of them. That is why I wrote this book.

Scope

This book is designed for professionals and students: people who want to enter data science and who have a coding background and some basic knowledge of statistics. It is not aimed at researchers, or at people who like giraffes and do not read the book from the beginning.