Leverage the capabilities of SAS to process and analyze Big Data
SAS professionals and data analysts who wish to perform analytics on Big Data using SAS to gain actionable insights will find this book to be very useful. If you are a data science professional looking to perform large-scale analytics with SAS, this book will also help you. A basic understanding of SAS will be helpful, but is not mandatory.
SAS has been recognized by Money Magazine and Payscale as one of the top business skills to learn in order to advance one's career. Through innovative data management, analytics, and business intelligence software and services, SAS helps customers solve their business problems by allowing them to make better decisions faster. This book introduces the reader to SAS and shows how to use it to perform efficient analysis on data of any size, including Big Data.
The reader will learn how to prepare data for analysis, perform predictive, forecasting, and optimization analysis, and then deploy or report on the results of these analyses. While working through the coding examples in this book, the reader will learn how to use the web-browser-based SAS Studio and iPython Jupyter Notebook interfaces for working with SAS. Finally, the reader will learn how SAS's architecture is engineered and designed to scale up and/or out and to be combined with open source offerings such as Hadoop, Python, and R.
By the end of this book, you will be able to clearly understand how you can efficiently analyze Big Data using SAS.
The book starts off by introducing the reader to SAS and the SAS programming language, which provides data management, analytical, and reporting capabilities. Most chapters include hands-on examples that highlight how SAS provides The Power to Know®. The reader will learn that, for large-scale data analysis, SAS provides an open platform engineered and designed to scale both up and out, which allows the power of SAS to be combined with open source offerings such as Hadoop, Python, and R.
Page count: 249
Year of publication: 2017
Big Data Analytics with SAS
Get actionable insights from your Big Data using the power of SAS
David Pope
BIRMINGHAM - MUMBAI
Copyright © 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: November 2017
Production reference: 1151117
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78829-090-6
www.packtpub.com
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are registered trademarks or trademarks of their respective companies.
The output/code/data analysis for this paper was generated using SAS® software. Copyright © 2017 SAS Institute Inc. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc., Cary, NC, USA.
Author
David Pope
Copy Editors
Safis Editor
Vikrant Phadkay
Reviewers
Ruben Oliva Ramos
Project Coordinator
Nidhi Joshi
Commissioning Editor
Amey Varangaonkar
Proofreader
Safis Editing
Acquisition Editor
Viraj Madhav
Indexer
Rekha Nair
Content Development Editor
Cheryl Dsa
Graphics
Tania Dutta
Technical Editor
Suwarna Patil
Production Coordinator
Shraddha Falebhai
"A primary lesson of history is that periodically, and often at the most inconvenient times, society needs to make a sharp break with old habits and deliberately learn new ways of behaving." – Jumping the Curve, by Nicholas Imparato and Oren Harari
For my business partner and me, lunch at iHop always heralded a serious business decision. We had lunch at iHop when we decided to incorporate our consulting firm in 1992. We had lunch there when we decided to hire a CEO. We had lunch again when we decided to fire him. We found ourselves again at iHop in 2011—me nibbling on the Breakfast Sampler, he on the steak and eggs—as we discussed whether to accept an offer to acquire our firm.
We weren't for sale. Our company, Baseline Consulting, was small by management consulting standards, but we were the leaders in the niche field of analytics and data strategy. Larger companies had begun to take notice. A few competitors had reached out by email, a large systems integrator had suggested a meeting "in your office or ours," and software vendors had also come calling. We received an offer from a company that we had worked with and admired, and whose leadership we respected.
The analytics market was booming, business intelligence vendors were blowing out their numbers, and Baseline's growth promised to continue apace. The emergence of data management and curation tools, the explosion of big data, and the adoption of advanced analytics by a new crop of business users—these were just a few bellwethers of rapid industry disruption. Our engagements were getting more complicated, and our deal sizes were growing. We’d have to invest to keep up, or start taking the acquisition offers seriously.
A few short years later, big data, analytics, the cloud, IoT, and artificial intelligence are no longer the purview of tech analysts and vendors. Factories are using analytics and IoT to catch defects before products leave production. Retailers are using cloud applications to push personalized offers to your smartphone. Delivery companies are optimizing their routes, the resulting fuel savings adding double-digit percentages to their bottom lines. Physicians can now monitor a patient's vital signs in real time from their offices or golf courses. Advanced analytics is as close as an app on your tablet or the car in your garage. Your daughter is running regressions in her high school math class. Your son wants to be a data scientist.
Author David Pope, an analytics expert, has written a vital book that not only embraces the analytics industry's trends, but also proselytizes the impact of delivering newfound knowledge with the SAS® software. As the pioneer in advanced analytics and a recognized data management and analytics software leader, SAS stands out as the purveyor of leading-edge solutions in the new analytics economy.
The voice in these pages is no less authoritative than his message. David has worked for SAS for 26 years and has been on the frontlines of some of the industry's most cutting-edge use cases. In Big Data Analytics with SAS, he delivers a veritable toolbox of the techniques companies will be using to realize their digital futures. Users new to SAS and SAS veterans alike will recognize some of the book's themes and embrace its best practices.
I've always been a believer in a best-of-breed approach, choosing the right tool for the job. Amidst the din of vendor hype and the buzzword-du-jour, companies need to deliver insights of value—and quickly. There are more vendor choices than ever. David nimbly navigates the alleyways of software selection and usage, explaining how analytics is deployed and managed the right way. This alone is worth the price of admission.
As my partner and I tucked into our lunches, we deliberated about the future of our business and what was best for our employees. Could we grow at the rate of the industry? Could we stay ahead of it? Were there companies that could get us there faster, cultivating our talent while providing learning opportunities, and a channel for growth? We concluded that, all things considered, SAS would be the best choice.
We have no regrets choosing SAS. And neither will you. Happy reading!
Jill Dyché, author of The New IT
David Pope
David Pope has worked for SAS for over 26 years in a variety of departments, including research and development (R&D), information technology (IT), SAS Solutions on Demand (SSOD), and sales and marketing. He graduated from North Carolina State University with a bachelor of science in industrial engineering and a certificate in computer programming. He started his career with SAS in R&D, testing and writing code for the SAS system using C, Java, and, of course, the SAS programming language.
David has worked in both the United States and Europe in this capacity. He then moved into IT within SAS to help support running the company as a business, using SAS and other technologies such as JavaScript, HTML, and Unix/Linux scripting languages. He spent 4 years working as a consultant with SAS customers in SSOD before moving into presales support, where he worked across all industries as an analytics and SAS architecture expert. David then moved into presales management to build out a team of data scientists and technical architects who support opportunities in the US energy industry, including electric utilities and oil and gas companies. He currently holds 10 patents for SAS and is an active blogger on the SASVoices corporate blog. He is a lifelong learner who enjoys teaching and empowering people to solve business problems.
I would like to recognize all the developers who have worked on SAS, without whom this book would not have been possible to write. There are too many individuals to list here whom I have learned from over the course of my career, and who have in one way or another influenced what is in this book. However, I'd like to recognize Brian Jones for using his graphic art skills to greatly improve the visual presentation of several of the ideas included in this book.
Ruben Oliva Ramos is a computer systems engineer from Tecnologico de Leon Institute, with a master's degree in computer and electronic systems engineering, teleinformatics, and networking specialization from the University of Salle Bajio in Leon, Guanajuato, Mexico. He has more than 5 years of experience in developing web applications to control and monitor devices connected with Arduino and Raspberry Pi, using web frameworks and cloud services to build Internet of Things applications.
He is a mechatronics teacher at the University of Salle Bajio and teaches students of the master's degree in design and engineering of mechatronics systems. Ruben also works at Centro de Bachillerato Tecnologico Industrial 225 in Leon, Guanajuato, Mexico, teaching subjects such as electronics, robotics and control, automation, and microcontrollers in the Mechatronics Technician Career program. He is a consultant and developer for projects in areas such as monitoring systems and data logging, using technologies (such as Android, iOS, Windows Phone, HTML5, PHP, CSS, Ajax, JavaScript, Angular, and ASP.NET), databases (such as SQLite, MongoDB, and MySQL), web servers (such as Node.js and IIS), hardware programming (such as Arduino, Raspberry Pi, Ethernet Shield, GPS, GSM/GPRS, and ESP8266), and control and monitoring systems for data acquisition and programming.
He is the author of these books for Packt:
He is also involved in monitoring, controlling, and acquiring data with Arduino and Visual Basic .NET for Alfaomega.
I would like to thank my savior and lord, Jesus Christ, for giving me the strength and courage to pursue this project; my dearest wife, Mayte; our two lovely sons, Ruben and Dario; my dear father, Ruben; my dearest mom, Rosalia; my brother, Juan Tomas; and my sister, Rosalia, whom I love, for all their support while I reviewed this book, for allowing me to pursue my dream, and for tolerating me not being with them after my busy days. I'm very grateful to Packt Publishing for giving me the opportunity to collaborate as an author and reviewer and to belong to this honest and professional team.
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www.packtpub.com/mapt
Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.
Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1788290909.
If you'd like to join our team of regular reviewers, you can e-mail us at [email protected]. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!
This book is dedicated to all the employees of SAS, especially those who have worked together in writing all the underlying code that makes up SAS® software, without which neither the company SAS nor this book would be possible. In addition, I'd like to dedicate this book to my wife, Jeannie, and our three children, Spencer, Rachael, and Lissa, whose support and love have helped me progress in both life and my career.
This book will introduce the reader to how SAS can be used to perform analytics on data of any size and how it is designed to enable users to perform big data analytics. The reader will be provided with an introduction to learning SAS for data management, analytics, and reporting, along with examples in each chapter that allow hands-on use of The Power to Know®, thereby teaching the reader how they can use SAS® software to further their career and improve their company’s business processes.
The mission of this book is to introduce the reader to what the SAS programming language offers and how the reader can use SAS® software to further their career and improve their company’s business processes. As stated in the Money Magazine and Payscale article The 21 Most Valuable Career Skills Now (May 16, 2016) by Kerri Anne Renzulli, Cybele Weisser, and Megan Leonhardt, SAS® is the most valuable career skill. The study isolated the specific skills (from about 2,300) correlated with higher pay, advancement, and career opportunities, and SAS was found to be the most valuable in terms of average increase in salary. I have programmed in a variety of computer languages, such as C, C++, Java, and scripting languages like Korn shell, and one of the reasons I enjoy using SAS is that I am confident I can get SAS to accomplish any type of computing task or project. Don't get me wrong; this doesn't imply that SAS is the best solution or tool for everything, but it does mean I can use it to accomplish a task if I really want to. Like any good programmer, I choose the applications or tools that can efficiently accomplish the task at hand. In my career, I've found SAS to be the best solution for complex analytics-based business problems, and it is my hope that you will find this book a great introduction to SAS that will help you advance your own career.
The reader will be provided with an introduction to learning SAS for data management, analysis, and reporting, as well as examples in each chapter, which will allow them hands-on use of The Power to Know®.
While it is impossible to become an expert on everything SAS does within one book, it is possible to start down the path of learning the fundamentals of SAS, which underpin how everything in SAS works. As such, this book is meant to be an initial primer for those who want to start learning SAS and who are interested in how SAS makes it easier to solve complex business problems in a timely, efficient way.
This book will dispel some of the misconceptions you may have heard about SAS, such as that you can’t learn SAS without buying a license (not true) or that SAS is difficult to use (not true). It should also leave the reader better prepared to seek SAS certifications if they so choose.
This book uses the SAS® University Edition and a combination of the SAS Studio web-based interface and an iPython Jupyter Notebook for the hands-on examples. However, all the code examples are valid when submitted to any SAS 9.4 environment for execution.
Chapter 1, Setting Up the SAS® Software Environment, teaches you how to install and use a free version of SAS that leverages both SAS Studio and an iPython Jupyter Notebook as interfaces for working with SAS.
Chapter 2, Working with Data Using SAS® Software, shows how to use SAS to create data directly and how SAS can be used with external data sources. In addition, the reader will learn how data needs to be prepared differently for analytics than for queries and reports.
Chapter 3, Data Preparation Using SAS Data Step and SAS Procedures, introduces both SAS DATA step code and SAS procedures for preparing data for analysis and reporting. The reader will learn a couple of ways SAS can be used to transform data efficiently for analytics, and will also learn about SAS macro programming.
Chapter 4, Analysis with SAS® Software, provides examples of performing descriptive and predictive analytics along with just one technique to improve the predictive power of a model. Furthermore, this chapter provides examples for doing forecasting as well as optimization.
Chapter 5, Reporting with SAS® Software, shows the reader how to use SAS Studio tasks and snippets to generate reports and graphs. In addition, it shows how to use some of the BASE SAS procedures and the ODS to deliver reports in different formats.
Chapter 6, Other Programming Languages in BASE SAS® Software, introduces two new languages, DS2 and FedSQL, which were developed in BASE SAS software and play important roles in performing big data analytics and moving the actual processing to where the data is stored.
Chapter 7, SAS® Software Engineers the Processing Environment for You, explains the role the SAS architecture plays in the analytics processing environment, which allows analytics to return important insights on big data in a timely manner.
Chapter 8, Why SAS Programmers Love SAS, wraps up the book and provides several examples of why SAS programmers love SAS and how analytics can be used across a variety of industries. It also discusses the importance of setting up an analytics center of excellence (ACE) and the roles and skills associated with this type of group.
The reader should be curious about how SAS can be used to analyze data of any size, and should have either a PC or Mac that meets the requirements to run SAS® University Edition as a virtual application, or a compatible web browser that can run SAS® University Edition via AWS. Chapter 1, Setting Up the SAS® Software Environment, provides more details on what is needed to run SAS® University Edition.
SAS professionals and data analysts who wish to perform analytics on big data using SAS to gain actionable insights will find this book to be very useful. If you are a data science professional looking to perform large-scale analytics with SAS, this book will also help you. A basic understanding of SAS will be helpful but is not mandatory.
In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "We will write some SAS code that will print Hello World."
A block of code is set as follows:
/* This is one way to add comments to your code */
data _null_;
    text="Hello World";
    put text;
run;
* here is another way to add a comment or to comment out code;
New terms and important words are shown in bold.
Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "We will primarily make use of the default SAS Programmer view for the examples within this book."
Warnings or important notes appear in a box like this.
Tips and tricks appear like this.
Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important to us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail [email protected], and mention the book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you. You can download the code files by following these steps:
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Big-Data-Analytics-with-SAS. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from
https://www.packtpub.com/sites/default/files/downloads/BigDataAnalyticswithSAS_ColorImages.pdf
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books (maybe a mistake in the text or the code), we would be grateful if you could report it to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title. To view previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy of copyrighted material on the internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the internet, please provide us with the location address or website name immediately so that we can pursue a remedy. Please contact us at [email protected] with a link to the suspected pirated material. We appreciate your help in protecting our authors and our ability to bring you valuable content.
If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.
What is SAS? If you had never heard of SAS, most likely you would not have picked up this book. You may have thought about the airline, Scandinavian Airlines System (SAS), and wondered what an airline has to do with big data analytics. Other than the fact that airlines generate a lot of big data and need to analyze it just like any other business, we are not talking about the airline. This book is about SAS Institute, which is officially described like this: "SAS is the world's largest privately held software company." (Third-party guide for SAS trademarks, https://www.sas.com/en_us/legal/editorial-guidelines.html.)
Privately held simply means the company is privately owned and does not sell stock. SAS, the software company that develops and sells SAS® software, has been recognized as the world's leading analytics platform for 41 years and counting. SAS is also the name of the fourth-generation programming language that provides a framework designed and engineered to do data management for analytics, provide advanced analytic capabilities, and offer multiple ways to deploy the results into production systems. This book will provide an introduction to this powerful solution, give you some hands-on experience, and show you how SAS scales from small data up to big data analytics.
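As a minimal sketch of the two roles described above, the following program uses a DATA step for data management and a procedure for analysis. The data set name work.sales and its values are hypothetical, made up purely for illustration; the program uses only BASE SAS features, so it should run in any SAS 9.4 environment, including SAS University Edition.

```sas
/* Hypothetical example data: work.sales and its values are made up. */
/* Data management: a DATA step reads inline records into a data set. */
data work.sales;
    input region $ amount;
    datalines;
East 120
West 95
East 80
West 150
;
run;

/* Analysis: PROC MEANS summarizes amount separately for each region. */
proc means data=work.sales n mean sum;
    class region;
    var amount;
run;
```

The DATA step writes a SAS data set, and PROC MEANS then reports the count, mean, and sum of amount for each region (for these made-up values, East totals 200 and West totals 245). Keeping data preparation and analysis in separate steps like this is the usual pattern in SAS programs.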
