31,19 €
R Services was one of the most anticipated features in SQL Server 2016, improved significantly and rebranded as SQL Server 2017 Machine Learning Services. Prior to SQL Server 2016, many developers and data scientists were already using R to connect to SQL Server in siloed environments that left a lot to be desired, in order to do additional data analysis, superseding SSAS Data Mining or additional CLR programming functions. With R integrated within SQL Server 2017, these developers and data scientists can now benefit from its integrated, effective, efficient, and more streamlined analytics environment.
This book gives you foundational knowledge and insights to help you understand SQL Server 2017 Machine Learning Services with R. First and foremost, the book provides practical examples on how to implement, use, and understand SQL Server and R integration in corporate environments, and also provides explanations and underlying motivations. It covers installing Machine Learning Services;maintaining, deploying, and managing code;and monitoring your services.
Delving more deeply into predictive modeling and the RevoScaleR package, this book also provides insights into operationalizing code and exploring and visualizing data. To complete the journey, this book covers the new features in SQL Server 2017 and how they are compatible with R, amplifying their combined power.
Das E-Book können Sie in Legimi-Apps oder einer beliebigen App lesen, die das folgende Format unterstützen:
Seitenzahl: 267
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor:Amey VarangaonkarAcquisition Editor:Prachi BishtContent Development Editor: Deepti ThoreTechnical Editor:Nilesh SawakhandeCopy Editors: Safis Editing, Laxmi SubramanianProject Coordinator:Shweta H BirwatkarProofreader: Safis EditingIndexer:Mariammal ChettiyarGraphics:Tania DuttaProduction Coordinator:Shantanu Zagade
First published: February 2018
Production reference: 1260218
Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.
ISBN 978-1-78728-357-2
www.packtpub.com
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Tomaž Kaštrun is a SQL Server developer and data scientist with more than 15 years of experience in the fields of business warehousing, development, ETL, database administration, and query tuning. He holds over 15 years of experience in data analysis, data mining, statistical research, and machine learning. He is a Microsoft SQL Server MVP for data platform and has been working with Microsoft SQL Server since version 2000. He is a blogger, author of many articles, a frequent speaker at the community and Microsoft events. He is an avid coffee drinker who is passionate about fixed gear bikes.
Julie Koesmarnois a senior program manager in the Database Systems Business Analytics team, at Microsoft. Currently, she leads big data analytics initiatives, driving business growth and customer success for SQL Server and Azure Data businesses. She has over 10 years of experience in data management, data warehousing, and analytics for multimillion-dollar businesses as a SQL Server developer, a system analyst, and a consultant prior to joining Microsoft. She is passionate about empowering data professionals to drive impacts for customer success and business through insights.
Marlon Ribunal is a data professional primarily focusing on the Microsoft stack. His work experience includes but is not limited to database administration, SQL development, query and performance tuning, ETL, and BI. He is the primary author of SQL Server 2012 Reporting Services Blueprints. His love for continuous learning is leading him toward the path of big data and data science, which he is gearing up for in his next adventure.
Dave Wentzel is the CTO of Capax Global, a data and cloud consultancy. For years he worked at big ISVs, dealing with the scalability limitations of traditional databases. With the advent of Hadoop and big data technologies, things that were impossible to do with data were suddenly within reach.
Prior to Capax, he worked at Microsoft, assisting customers with Azure big data solutions. Success for him is solving challenging problems at companies he respects with talented people whom he admires.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
SQL Server 2017 Machine Learning Services with R
www.PacktPub.com
Why subscribe?
PacktPub.com
Contributors
About the authors
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Introduction to R and SQL Server
Using R prior to SQL Server 2016
Microsoft's commitment to the open source R language
Boosting analytics with SQL Server R integration
Summary
Overview of Microsoft Machine Learning Server and SQL Server
Analytical barriers
The Microsoft Machine learning R Server platform
Microsoft R Open (MRO)
Microsoft Machine Learning R Server
Microsoft SQL Server Machine Learning R Services
R Tools for Visual Studio (RTVS)
The Microsoft Machine Learning R Services architecture
R Limitations
Performance issues
Memory limitations
Security aspects
Language syntax
Summary
Managing Machine Learning Services for SQL Server 2017 and R
Minimum requirements
Choosing the edition
Configuring the database
Configuring the environment and installing R Tools for Visual Studio (RTVS)
Security
Resource Governor
Installing new R packages
Package information
Using R Tools for Visual Studio (RTVS) 2015 or higher
Using R.exe in CMD
Using XP_CMDSHELL
Copying files
Using the rxInstallPackages function
Managing SQL Server R Services with PowerShell
Getting to know the sp_execute_external_script external procedure
Arguments
Summary
Data Exploration and Data Visualization
Understanding SQL and R data types
Data frames in R
Data exploration and data munging
Importing SQL Server data into R
Exploring data in R
Data munging in R
Adding/removing rows/columns in data frames
More data munging with dplyr
Finding missing values
Transpose data
Pivot / Unpivot data
Example - data exploration and munging using R in T-SQL
Data visualization in R
Plot
Histogram
Boxplot
Scatter plot
Tree diagram
Example – R data visualization in T-SQL
Integrating R code in reports and visualizations
Integrating R in SSRS reports
Integrating R in Power BI
Summary
RevoScaleR Package
Overcomming R language limitations
Scalable and distributive computational environments
Functions for data preparation
Data import from SAS, SPSS, and ODBC
Importing SAS data
Importing SPSS data
Importing data using ODBC
Variable creation and data transformation
Variable creation and recoding
Dataset subsetting
Dataset merging
Functions for descriptive statistics
Functions for statistical tests and sampling
Summary
Predictive Modeling
Data modeling
Advanced predictive algorithms and analytics
Deploying and using predictive solutions
Performing predictions with R Services in the SQL Server database
Summary
Operationalizing R Code
Integrating an existing R model
Prerequisite – prepare the data
Step 1 – Train and save a model using T-SQL
Step 2 – Operationalize the model
Fast batch prediction
Prerequisites
Real-time scoring
Native scoring
Integrating the R model for fast batch prediction
Step 1 – Train and save a real-time scoring model using T-SQL
Step 2a – Operationalize the model using real-time scoring
Step 2b – Operationalize the model using native scoring
Managing roles and permissions for workloads
Extensibility framework workloads
Fast batch prediction workloads
External packages
Tools
Using SSMS as part of operationalizing R script
Using custom reports for SQL Server R Services
Adding the custom reports for the first time
Viewing an R Services custom report
Managing SQL Server Machine Learning Services with DMVs
System configuration and system resources
Resource governor
Operationalizing R code with Visual Studio
Integrating R workloads and prediction operations beyond SQL Server
Executing SQL Server prediction operations via PowerShell
Scheduling training and prediction operations
Operationalizing R script as part of SSIS
Summary
Deploying, Managing, and Monitoring Database Solutions containing R Code
Integrating R into the SQL Server Database lifecycle workflow
Preparing your environment for the database lifecycle workflow
Prerequisites for this chapter
Creating the SQL Server database project
Importing an existing database into the project
Adding a new stored procedure object
Publishing schema changes
Adding a unit test against a stored procedure
Using version control
Setting up continuous integration
Creating a build definition in VSTS
Deploying the build to a local SQL Server instance
Adding the test phase to the build definition
Automating the build for CI
Setting up continuous delivery
Monitoring the accuracy of the productionized model
Useful references
Summary
Machine Learning Services with R for DBAs
Gathering relevant data
Exploring and analyzing data
Creating a baseline and workloads, and replaying
Creating predictions with R - disk usage
Summary
R and SQL Server 2016/2017 Features Extended
Built-in JSON capabilities
Accessing external data sources using PolyBase
High performance using ColumnStore and in memory OLTP
Testing rxLinMod performance on a table with a primary key
Testing rxLinMod performance on a table with a clustered ColumnStore index
Testing rxLinMod performance on a memory-optimized table with a primary key
Testing rxLinMod performance on a memory-optimized table with a clustered ColumnStore index
Comparing results
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think
SQL Server has supported machine learning capabilities since SQL Server 2016. Previously known as SQL Server 2016 R Services, SQL Server 2017 Machine Learning Services come in two flavors, R and Python. This book provides hands-on reference and learning materials for data professionals, DBAs, and data scientists on how to install, develop, deploy, maintain, and administer data science and advanced analytical solutions using SQL Server with R. Whether you are new to SQL Server, or an experienced SQL Server professional, Hands-on SQL Server Machine Learning Services with R contains practical explanations, tips, and examples to enable you to make the most of keeping analytics close to the data for better efficiency and security.
This book is for data analysts, data scientists, and database administrators with some or no experience in R, and who are eager to deliver practical data science solutions easily in their day-to-day work or future projects using SQL Server.
Chapter 1, Introduction to R and SQL Server, begins our data science journey in SQL Server, prior to SQL Server 2016, and brings us to today's SQL Server R integration.
Chapter 2, Overview of Microsoft Machine Learning Server and SQL Server, gives a brief outline and overview of Microsoft Machine Learning Server with an emphasis on SQL Server Machine Learning Services, while exploring how it works and the different versions of R environment. This includes key discussions on the architecture behind it, different computational environments, how the integration among systems work, and how to achieve parallelism and load distribution.
Chapter 3, Managing Machine Learning Services for SQL Server 2017 and R, covers the installation and setup, including how to use PowerShell. It covers exploring the capabilities of a resource governor, setting up roles and security for users to work with SQL Server Machine Learning Services with R, working with sessions and logs, installing any missing or additional R packages for data analysis or predictive modeling, and taking the first steps with using thesp_execute_external_scriptexternal procedure.
Chapter 4,Data Exploration and Data Visualization, explores the R syntax for data browsing, analysis, munging, and wrangling for visualization and predictive analysis. Developing these techniques is essential to the next steps covered in this chapter and throughout this book. This chapter introduces various useful R packages for visualization and predictive modeling. In addition, readers will learn how to integrate R with Power BI,SQL Server Reporting Services(SSRS), and mobile reports.
Chapter 5,RevoScaleR Package, discusses the advantages of using RevoScaleR for scalable and distributed statistical computation over large datasets. Using RevoScaleR improves CPU and RAM utilization and improves performance. This chapter introduces readers to RevoScaleR functions on data preparation, descriptive statistics, statistical tests, and sampling, as well as predictive modeling.
Chapter 6,Predictive Modeling, focuses on helping readers who are stepping into the world of prediction modeling for the first time. Using SQL Server and SQL Server Machine Learning Services with R, readers will learn how to create predictions, perform data modeling, explore advanced predictive algorithms available in RevoScaleR and other packages, and how to easily deploy the models and solutions. Finally, calling and running predictions and exposing the results to different proprietary tools (such as Power BI, Excel, and SSRS) complete the world of prediction modeling.
Chapter 7,Operationalizing R Code, provides tips and tricks in operationalizing R code and R predictions. Readers will learn the importance as stable and reliable process flows are essential to combining R code, persistent data, and prediction models in production. In this chapter, readers will have a chance to explore ways to adopt existing and create new R code, followed by integrating this in SQL Server through various readily available client tools such asSQL Server Management Studio(SSMS) and Visual Studio. Furthermore, this chapter covers how readers can use SQL Server Agent jobs, stored procedures, CLR with .NET, and PowerShell to productized R code.
Chapter 8,Deploying, Managing, and Monitoring Database Solutions containing R Code, covers how to manage deployment and change control to database deployment when integrating R code. This chapter provides guidelines on how to do an integrated deployment of the solution and how to implement continuous integration, including automated deployment and how to manage the version control. Here, readers will learn efficient ways to monitor the solution, monitor the effectiveness of the code, and predictive models after the solution is deployed.
Chapter 9, Machine Learning Services with R for DBAs, examines and explores monitoring, performance, and troubleshooting for daily, weekly, and monthly tasks the DBAs are doing. Using simple examples showing that R Services can also be useful for other roles involved in SQL Server, this chapter shows how R Services integrated in SQL Server enables DBAs to be more empowered by evolving their rudimentary monitoring activities into more useful actionable predictions.
Chapter 10, R and SQL Server 2016/2017 Features Extended, covers how new features of SQL Server 2016 and 2017 and R services can be used together, such as taking advantage of the new JSON format with the R language, using new improvements to the in-memory OLTP technology to deliver almost real-time analytics, combining new features in Column store index and R, and how to get the most out of them. It also considers how to leverage PolyBase and Stretch DB to reach beyond on-premises to hybrid and cloud possibilities. Lastly, the query store holds many statistics from execution plans, and R is a perfect tool to perform deeper analysis.
In order to work with SQL Server Machine Learning Services, and to run the code examples found in this book, the following software will be required:
SQL Server 2016 and/or SQL Server 2017 Developer or Enterprise Edition
SQL Server Management Studio (SSMS)
R IDE such as R Studio or Visual Studio 2017 with RTVS extension
Visual Studio 2017 Community edition with the following extensions installed:
R Tools for Visual Studio (RTVS)
SQL Server Data Tools (SSDT)
VisualStudio.com online account
The chapters in this book go through the installation and configuration steps as the software is introduced.
You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
Log in or register at
www.packtpub.com
.
Select the
SUPPORT
tab.
Click on
Code Downloads & Errata
.
Enter the name of the book in the
Search
box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub athttps://github.com/PacktPublishing/SQL-Server-2017-Machine-Learning-Services-with-R. We also have other code bundles from our rich catalog of books and videos available athttps://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/SQLServer2017MachineLearningServiceswithR_ColorImages.pdf.
Feedback from our readers is always welcome.
General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packtpub.com.
SQL Server 2016 came with great new features, and among them was R integration into SQL Server, partly with advanced analytics and partly with new programmability capabilities. Microsoft R Services for SQL Server is part of the family of new extensibilities for highly scalable and parallel advanced analytics. R Services allows you to perform advanced analytics (statistical, multivariate statistics, predictive analytics, machine learning, and deep learning) on large quantities of data stored in the database. Microsoft published R Services as part of Microsoft R Server (MRS), which was specially designed for reading data directly from the SQL Server database within the same SQL Server computational context.
We will cover the following aspects in this chapter:
Using R prior to SQL Server 2016
Microsoft's commitment on open source R language
Boosting analytics with SQL Server R integration
Outline of the book
With a growing popularity and community, R has become and continues to be a big player in the field of advanced analytics and data visualization. R and machine learning servers (or services) are not just buzzword that will be forgotten in the next cycle of SQL Server, but it is infiltrating more and more into different layers of open source and corporate software. In the past five years, many big analytical players have introduced R integration, interpreters, and wrappers for the R language, because of the language's practicality, usability, and inter-disciplinarily and open source orientation. As Microsoft's making a bold and strategic move toward being open source friendly, the use cases for integrating R in SQL Server are growing, making this move even more natural and at the right point in time. This move had been very well appreciated in the SQL community and the business as well.
In comparison to other big analytical tools, Microsoft took integration very seriously. It addressed many of the issues and limitations of the language itself, and created complete integration of R with the SQL Server in order to give the best user experience. Many competitors (such as SAS, IBM, SAP, and Oracle) have done similar integration, but failed to take into account many aspects that contribute to a holistic user experience. Microsoft has announced that joining the R consortium will give them the ability to help the development of the R language and to support future development. In addition, Microsoft has created its own package repository called MRAN (from CRAN, where M stands for Microsoft) and is giving support and SLA agreement for R as well, even though the language and engine is based on Open R (a free, open-sourced version). All these steps tell us how dedicated Microsoft is in bringing an open source, statistical and programming language into the SQL Server environment.
We can only expect more R integration into other services. For example, Power BI supports native R visuals (https://powerbi.microsoft.com/en-us/blog/r-powered-custom-visuals) since October 2016, and R language since December 2015. Therefore, I am a strong believer that R will soon be part of the whole SQL Server ecosystem such as SSAS, SSIS, and SSRS natively as well. With Azure Analysis Services, R is again one step closer to analysis services.
Data science is in the forefront of the SQL Server and R integration. Every task performed by DBA, sysadmin, the analyst, wrangler, or any other role that is working with SQL server can have these tasks supported with any kind of statistics, data correlation, data analysis, or data prediction. R integration should not be restricted only to the fields of data science. Instead, it should be explored and used in all tasks. DBA can gain from R integration by using switching from monitoring tasks to understanding and predicting what might or will happen next. Likewise, this idea can be applied to sysadmin, data wranglers, and so on. R integration also brings different roles of people closer to understand statistics, metrics, measures, and learn how to improve them by using statistical analysis and predictions.
Besides bringing siloed individual teamwork into more coherent and cohesive teams, R integration also brings less data movement, because different users can now—with the help of R code—execute, drill down, and feel the data, instead of waiting to have data first prepared, exported, and imported again. With smoother workflows comes faster time to deployment, whether it is a simple report, a predictive model, or analysis. This allows the boundaries of data ownership to shift into insights ownership, which is a positive aspect of faster reactions to business needs.