SQL Server 2017 Machine Learning Services with R - Tomaž Kaštrun - E-Book

SQL Server 2017 Machine Learning Services with R E-Book

Tomaž Kaštrun

0,0
31,19 €

-100%
Sammeln Sie Punkte in unserem Gutscheinprogramm und kaufen Sie E-Books und Hörbücher mit bis zu 100% Rabatt.
Mehr erfahren.
Beschreibung

R Services was one of the most anticipated features in SQL Server 2016, improved significantly and rebranded as SQL Server 2017 Machine Learning Services. Prior to SQL Server 2016, many developers and data scientists were already using R to connect to SQL Server in siloed environments that left a lot to be desired, in order to do additional data analysis, superseding SSAS Data Mining or additional CLR programming functions. With R integrated within SQL Server 2017, these developers and data scientists can now benefit from its integrated, effective, efficient, and more streamlined analytics environment.
This book gives you foundational knowledge and insights to help you understand SQL Server 2017 Machine Learning Services with R. First and foremost, the book provides practical examples on how to implement, use, and understand SQL Server and R integration in corporate environments, and also provides explanations and underlying motivations. It covers installing Machine Learning Services;maintaining, deploying, and managing code;and monitoring your services.
Delving more deeply into predictive modeling and the RevoScaleR package, this book also provides insights into operationalizing code and exploring and visualizing data. To complete the journey, this book covers the new features in SQL Server 2017 and how they are compatible with R, amplifying their combined power.

Das E-Book können Sie in Legimi-Apps oder einer beliebigen App lesen, die das folgende Format unterstützen:

EPUB
MOBI

Seitenzahl: 267

Bewertungen
0,0
0
0
0
0
0
Mehr Informationen
Mehr Informationen
Legimi prüft nicht, ob Rezensionen von Nutzern stammen, die den betreffenden Titel tatsächlich gekauft oder gelesen/gehört haben. Wir entfernen aber gefälschte Rezensionen.



SQL Server 2017 Machine Learning Services with R
Data exploration, modeling, and advanced analytics
Tomaž Kaštrun
Julie Koesmarno
BIRMINGHAM - MUMBAI

SQL Server 2017 Machine Learning Services with R

Copyright © 2018 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor:Amey VarangaonkarAcquisition Editor:Prachi BishtContent Development Editor: Deepti ThoreTechnical Editor:Nilesh SawakhandeCopy Editors: Safis Editing, Laxmi SubramanianProject Coordinator:Shweta H BirwatkarProofreader: Safis EditingIndexer:Mariammal ChettiyarGraphics:Tania DuttaProduction Coordinator:Shantanu Zagade

First published: February 2018

Production reference: 1260218

Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.

ISBN 978-1-78728-357-2

www.packtpub.com

mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Improve your learning with Skill Plans built especially for you

Get a free eBook or video every month

Mapt is fully searchable

Copy and paste, print, and bookmark content

PacktPub.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Contributors

About the authors

Tomaž Kaštrun is a SQL Server developer and data scientist with more than 15 years of experience in the fields of business warehousing, development, ETL, database administration, and query tuning. He holds over 15 years of experience in data analysis, data mining, statistical research, and machine learning. He is a Microsoft SQL Server MVP for data platform and has been working with Microsoft SQL Server since version 2000. He is a blogger, author of many articles, a frequent speaker at the community and Microsoft events. He is an avid coffee drinker who is passionate about fixed gear bikes.

I would firstly like to thank my dear friend and co-author, Julie Koesmarno, for making this book a pleasing journey. It was also a great pleasure to work with Deepti Thore and Nilesh Sawakhande. Special thanks to my daughter, Rubi Kaštrun, for your patience while writing this book and your inspiration. Thanks to my family's support. Thanks to the SQL Server community and friends, for kindling my curiosity with all the questions.

Julie Koesmarnois a senior program manager in the Database Systems Business Analytics team, at Microsoft. Currently, she leads big data analytics initiatives, driving business growth and customer success for SQL Server and Azure Data businesses. She has over 10 years of experience in data management, data warehousing, and analytics for multimillion-dollar businesses as a SQL Server developer, a system analyst, and a consultant prior to joining Microsoft. She is passionate about empowering data professionals to drive impacts for customer success and business through insights.

Special thanks to my coauthor, Tomaž Kaštrun, for his expertise, passion, and dedication to this book and to the Data Community—it has been a dream and a goal to work with him. Thanks to our editors and the team at Packt Publishing for their professionalism. Also, thanks to my husband for his support throughout this writing journey, as well as to my inspiring friend, Hamish Watson, for sharing his CI/CD best practices with me.

About the reviewers

Marlon Ribunal is a data professional primarily focusing on the Microsoft stack. His work experience includes but is not limited to database administration, SQL development, query and performance tuning, ETL, and BI. He is the primary author of SQL Server 2012 Reporting Services Blueprints. His love for continuous learning is leading him toward the path of big data and data science, which he is gearing up for in his next adventure.

Thank you to the SQL Server Community. #SQLFamily is the best! Let's keep inspiring and supporting one another.

Dave Wentzel is the CTO of Capax Global, a data and cloud consultancy. For years he worked at big ISVs, dealing with the scalability limitations of traditional databases. With the advent of Hadoop and big data technologies, things that were impossible to do with data were suddenly within reach.

Prior to Capax, he worked at Microsoft, assisting customers with Azure big data solutions. Success for him is solving challenging problems at companies he respects with talented people whom he admires.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Table of Contents

Title Page

Copyright and Credits

SQL Server 2017 Machine Learning Services with R

www.PacktPub.com

Why subscribe?

PacktPub.com

Contributors

About the authors

About the reviewers

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Introduction to R and SQL Server

Using R prior to SQL Server 2016

Microsoft's commitment to the open source R language

Boosting analytics with SQL Server R integration

Summary

Overview of Microsoft Machine Learning Server and SQL Server

Analytical barriers

The Microsoft Machine learning R Server platform

Microsoft R Open (MRO)

Microsoft Machine Learning R Server

Microsoft SQL Server Machine Learning R Services

R Tools for Visual Studio (RTVS)

The Microsoft Machine Learning R Services architecture

R Limitations

Performance issues

Memory limitations

Security aspects

Language syntax

Summary

Managing Machine Learning Services for SQL Server 2017 and R

Minimum requirements

Choosing the edition

Configuring the database

Configuring the environment and installing R Tools for Visual Studio (RTVS)

Security

Resource Governor

Installing new R packages

Package information

Using R Tools for Visual Studio (RTVS) 2015 or higher

Using R.exe in CMD

Using XP_CMDSHELL

Copying files

Using the rxInstallPackages function

Managing SQL Server R Services with PowerShell

Getting to know the sp_execute_external_script external procedure

Arguments

Summary

Data Exploration and Data Visualization

Understanding SQL and R data types

Data frames in R

Data exploration and data munging

Importing SQL Server data into R

Exploring data in R

Data munging in R

Adding/removing rows/columns in data frames

More data munging with dplyr

Finding missing values

Transpose data

Pivot / Unpivot data

Example - data exploration and munging using R in T-SQL

Data visualization in R

Plot

Histogram

Boxplot

Scatter plot

Tree diagram

Example – R data visualization in T-SQL

Integrating R code in reports and visualizations

Integrating R in SSRS reports

Integrating R in Power BI

Summary

RevoScaleR Package

Overcomming R language limitations

Scalable and distributive computational environments

Functions for data preparation

Data import from SAS, SPSS, and ODBC

Importing SAS data

Importing SPSS data

Importing data using ODBC

Variable creation and data transformation

Variable creation and recoding

Dataset subsetting

Dataset merging

Functions for descriptive statistics

Functions for statistical tests and sampling

Summary

Predictive Modeling

Data modeling

Advanced predictive algorithms and analytics

Deploying and using predictive solutions

Performing predictions with R Services in the SQL Server database

Summary

Operationalizing R Code

Integrating an existing R model

Prerequisite – prepare the data

Step 1 – Train and save a model using T-SQL

Step 2 – Operationalize the model

Fast batch prediction

Prerequisites

Real-time scoring

Native scoring

Integrating the R model for fast batch prediction

Step 1 – Train and save a real-time scoring model using T-SQL

Step 2a – Operationalize the model using real-time scoring

Step 2b – Operationalize the model using native scoring

Managing roles and permissions for workloads

Extensibility framework workloads

Fast batch prediction workloads

External packages

Tools

Using SSMS as part of operationalizing R script

Using custom reports for SQL Server R Services

Adding the custom reports for the first time

Viewing an R Services custom report

Managing SQL Server Machine Learning Services with DMVs

System configuration and system resources

Resource governor

Operationalizing R code with Visual Studio

Integrating R workloads and prediction operations beyond SQL Server

Executing SQL Server prediction operations via PowerShell

Scheduling training and prediction operations

Operationalizing R script as part of SSIS

Summary

Deploying, Managing, and Monitoring Database Solutions containing R Code

Integrating R into the SQL Server Database lifecycle workflow

Preparing your environment for the database lifecycle workflow

Prerequisites for this chapter

Creating the SQL Server database project

Importing an existing database into the project

Adding a new stored procedure object

Publishing schema changes

Adding a unit test against a stored procedure

Using version control

Setting up continuous integration

Creating a build definition in VSTS

Deploying the build to a local SQL Server instance

Adding the test phase to the build definition

Automating the build for CI

Setting up continuous delivery

Monitoring the accuracy of the productionized model

Useful references

Summary

Machine Learning Services with R for DBAs

Gathering relevant data

Exploring and analyzing data

Creating a baseline and workloads, and replaying

Creating predictions with R - disk usage

Summary

R and SQL Server 2016/2017 Features Extended

Built-in JSON capabilities

Accessing external data sources using PolyBase

High performance using ColumnStore and in memory OLTP

Testing rxLinMod performance on a table with a primary key

Testing rxLinMod performance on a table with a clustered ColumnStore index

Testing rxLinMod performance on a memory-optimized table with a primary key

Testing rxLinMod performance on a memory-optimized table with a clustered ColumnStore index

Comparing results

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preface

SQL Server has supported machine learning capabilities since SQL Server 2016. Previously known as SQL Server 2016 R Services, SQL Server 2017 Machine Learning Services come in two flavors, R and Python. This book provides hands-on reference and learning materials for data professionals, DBAs, and data scientists on how to install, develop, deploy, maintain, and administer data science and advanced analytical solutions using SQL Server with R. Whether you are new to SQL Server, or an experienced SQL Server professional, Hands-on SQL Server Machine Learning Services with R contains practical explanations, tips, and examples to enable you to make the most of keeping analytics close to the data for better efficiency and security.

Who this book is for

This book is for data analysts, data scientists, and database administrators with some or no experience in R, and who are eager to deliver practical data science solutions easily in their day-to-day work or future projects using SQL Server.

What this book covers

Chapter 1, Introduction to R and SQL Server, begins our data science journey in SQL Server, prior to SQL Server 2016, and brings us to today's SQL Server R integration.

Chapter 2, Overview of Microsoft Machine Learning Server and SQL Server, gives a brief outline and overview of Microsoft Machine Learning Server with an emphasis on SQL Server Machine Learning Services, while exploring how it works and the different versions of R environment. This includes key discussions on the architecture behind it, different computational environments, how the integration among systems work, and how to achieve parallelism and load distribution.

Chapter 3, Managing Machine Learning Services for SQL Server 2017 and R, covers the installation and setup, including how to use PowerShell. It covers exploring the capabilities of a resource governor, setting up roles and security for users to work with SQL Server Machine Learning Services with R, working with sessions and logs, installing any missing or additional R packages for data analysis or predictive modeling, and taking the first steps with using thesp_execute_external_scriptexternal procedure.

Chapter 4,Data Exploration and Data Visualization, explores the R syntax for data browsing, analysis, munging, and wrangling for visualization and predictive analysis. Developing these techniques is essential to the next steps covered in this chapter and throughout this book. This chapter introduces various useful R packages for visualization and predictive modeling. In addition, readers will learn how to integrate R with Power BI,SQL Server Reporting Services(SSRS), and mobile reports.

Chapter 5,RevoScaleR Package, discusses the advantages of using RevoScaleR for scalable and distributed statistical computation over large datasets. Using RevoScaleR improves CPU and RAM utilization and improves performance. This chapter introduces readers to RevoScaleR functions on data preparation, descriptive statistics, statistical tests, and sampling, as well as predictive modeling.

Chapter 6,Predictive Modeling, focuses on helping readers who are stepping into the world of prediction modeling for the first time. Using SQL Server and SQL Server Machine Learning Services with R, readers will learn how to create predictions, perform data modeling, explore advanced predictive algorithms available in RevoScaleR and other packages, and how to easily deploy the models and solutions. Finally, calling and running predictions and exposing the results to different proprietary tools (such as Power BI, Excel, and SSRS) complete the world of prediction modeling.

Chapter 7,Operationalizing R Code, provides tips and tricks in operationalizing R code and R predictions. Readers will learn the importance as stable and reliable process flows are essential to combining R code, persistent data, and prediction models in production. In this chapter, readers will have a chance to explore ways to adopt existing and create new R code, followed by integrating this in SQL Server through various readily available client tools such asSQL Server Management Studio(SSMS) and Visual Studio. Furthermore, this chapter covers how readers can use SQL Server Agent jobs, stored procedures, CLR with .NET, and PowerShell to productized R code.

Chapter 8,Deploying, Managing, and Monitoring Database Solutions containing R Code, covers how to manage deployment and change control to database deployment when integrating R code. This chapter provides guidelines on how to do an integrated deployment of the solution and how to implement continuous integration, including automated deployment and how to manage the version control. Here, readers will learn efficient ways to monitor the solution, monitor the effectiveness of the code, and predictive models after the solution is deployed.

Chapter 9, Machine Learning Services with R for DBAs, examines and explores monitoring, performance, and troubleshooting for daily, weekly, and monthly tasks the DBAs are doing. Using simple examples showing that R Services can also be useful for other roles involved in SQL Server, this chapter shows how R Services integrated in SQL Server enables DBAs to be more empowered by evolving their rudimentary monitoring activities into more useful actionable predictions.

Chapter 10, R and SQL Server 2016/2017 Features Extended, covers how new features of SQL Server 2016 and 2017 and R services can be used together, such as taking advantage of the new JSON format with the R language, using new improvements to the in-memory OLTP technology to deliver almost real-time analytics, combining new features in Column store index and R, and how to get the most out of them. It also considers how to leverage PolyBase and Stretch DB to reach beyond on-premises to hybrid and cloud possibilities. Lastly, the query store holds many statistics from execution plans, and R is a perfect tool to perform deeper analysis.

To get the most out of this book

In order to work with SQL Server Machine Learning Services, and to run the code examples found in this book, the following software will be required:

SQL Server 2016 and/or SQL Server 2017 Developer or Enterprise Edition

SQL Server Management Studio (SSMS)

R IDE such as R Studio or Visual Studio 2017 with RTVS extension

Visual Studio 2017 Community edition with the following extensions installed:

R Tools for Visual Studio (RTVS)

SQL Server Data Tools (SSDT)

VisualStudio.com online account

The chapters in this book go through the installation and configuration steps as the software is introduced.

Download the example code files

You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

Log in or register at

www.packtpub.com

.

Select the

SUPPORT

tab.

Click on

Code Downloads & Errata

.

Enter the name of the book in the

Search

box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows

Zipeg/iZip/UnRarX for Mac

7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub athttps://github.com/PacktPublishing/SQL-Server-2017-Machine-Learning-Services-with-R. We also have other code bundles from our rich catalog of books and videos available athttps://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/SQLServer2017MachineLearningServiceswithR_ColorImages.pdf.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.

Introduction to R and SQL Server

SQL Server 2016 came with great new features, and among them was R integration into SQL Server, partly with advanced analytics and partly with new programmability capabilities. Microsoft R Services for SQL Server is part of the family of new extensibilities for highly scalable and parallel advanced analytics. R Services allows you to perform advanced analytics (statistical, multivariate statistics, predictive analytics, machine learning, and deep learning) on large quantities of data stored in the database. Microsoft published R Services as part of Microsoft R Server (MRS), which was specially designed for reading data directly from the SQL Server database within the same SQL Server computational context.

We will cover the following aspects in this chapter:

Using R prior to SQL Server 2016

Microsoft's commitment on open source R language

Boosting analytics with SQL Server R integration

Outline of the book

Microsoft's commitment to the open source R language

With a growing popularity and community, R has become and continues to be a big player in the field of advanced analytics and data visualization. R and machine learning servers (or services) are not just buzzword that will be forgotten in the next cycle of SQL Server, but it is infiltrating more and more into different layers of open source and corporate software. In the past five years, many big analytical players have introduced R integration, interpreters, and wrappers for the R language, because of the language's practicality, usability, and inter-disciplinarily and open source orientation. As Microsoft's making a bold and strategic move toward being open source friendly, the use cases for integrating R in SQL Server are growing, making this move even more natural and at the right point in time. This move had been very well appreciated in the SQL community and the business as well.

In comparison to other big analytical tools, Microsoft took integration very seriously. It addressed many of the issues and limitations of the language itself, and created complete integration of R with the SQL Server in order to give the best user experience. Many competitors (such as SAS, IBM, SAP, and Oracle) have done similar integration, but failed to take into account many aspects that contribute to a holistic user experience. Microsoft has announced that joining the R consortium will give them the ability to help the development of the R language and to support future development. In addition, Microsoft has created its own package repository called MRAN (from CRAN, where M stands for Microsoft) and is giving support and SLA agreement for R as well, even though the language and engine is based on Open R (a free, open-sourced version). All these steps tell us how dedicated Microsoft is in bringing an open source, statistical and programming language into the SQL Server environment.

We can only expect more R integration into other services. For example, Power BI supports native R visuals (https://powerbi.microsoft.com/en-us/blog/r-powered-custom-visuals) since October 2016, and R language since December 2015. Therefore, I am a strong believer that R will soon be part of the whole SQL Server ecosystem such as SSAS, SSIS, and SSRS natively as well. With Azure Analysis Services, R is again one step closer to analysis services.

Boosting analytics with SQL Server R integration

Data science is in the forefront of the SQL Server and R integration. Every task performed by DBA, sysadmin, the analyst, wrangler, or any other role that is working with SQL server can have these tasks supported with any kind of statistics, data correlation, data analysis, or data prediction. R integration should not be restricted only to the fields of data science. Instead, it should be explored and used in all tasks. DBA can gain from R integration by using switching from monitoring tasks to understanding and predicting what might or will happen next. Likewise, this idea can be applied to sysadmin, data wranglers, and so on. R integration also brings different roles of people closer to understand statistics, metrics, measures, and learn how to improve them by using statistical analysis and predictions.

Besides bringing siloed individual teamwork into more coherent and cohesive teams, R integration also brings less data movement, because different users can now—with the help of R code—execute, drill down, and feel the data, instead of waiting to have data first prepared, exported, and imported again. With smoother workflows comes faster time to deployment, whether it is a simple report, a predictive model, or analysis. This allows the boundaries of data ownership to shift into insights ownership, which is a positive aspect of faster reactions to business needs.