E-Book
26,39 €

Data Science with SQL Server Quick Start Guide E-Book

Dejan Sarka

0,0

26,39 €

Sammeln Sie Punkte in unserem Gutscheinprogramm und kaufen Sie E-Books und Hörbücher mit bis zu 100% Rabatt.

Mehr erfahren.

Herausgeber: Packt Publishing
Kategorie: Wissenschaft und neue Technologien
Sprache: Englisch

Beschreibung

SQL Server only started to fully support data science with its two most recent editions. If you are a professional from both worlds, SQL Server and data science, and interested in using SQL Server and Machine Learning (ML) Services for your projects, then this is the ideal book for you.
This book is the ideal introduction to data science with Microsoft SQL Server and In-Database ML Services. It covers all stages of a data science project, from businessand data understanding,through data overview, data preparation, modeling and using algorithms, model evaluation, and deployment.
You will learn to use the engines and languages that come with SQL Server, including ML Services with R and Python languages and Transact-SQL. You will also learn how to choose which algorithm to use for which task, and learn the working of each algorithm.

Details

Das E-Book können Sie in Legimi-Apps oder einer beliebigen App lesen, die das folgende Format unterstützen:

EPUB

MOBI

Seitenzahl: 219

Veröffentlichungsjahr: 2018

Bewertungen

0,0

Rezensionen(0 Rezensionen)

Leseprobe

Data Science with SQL Server Quick Start Guide

Integrate SQL Server with data science

Dejan Sarka

BIRMINGHAM - MUMBAI

Data Science with SQL Server Quick Start Guide

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Amey VarangaonkarAcquisition Editor: Reshma RamanContent Development Editor:Roshan KumarTechnical Editor:Sushmeeta JenaCopy Editor: Safis EditingProject Coordinator:Hardik BhindeProofreader: Safis EditingIndexer:Aishwarya GangawaneGraphics:Jason MonteiroProduction Coordinator:Shraddha Falebhai

First published: August 2018

Production reference: 1300818

Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.

ISBN 978-1-78953-712-3

www.packtpub.com

mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Improve your learning with Skill Plans built especially for you

Get a free eBook or video every month

Mapt is fully searchable

Copy and paste, print, and bookmark content

PacktPub.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Contributors

About the author

Dejan Sarka, MCT and Microsoft Data Platform MVP, is an independent trainer and consultant who focuses on the development of database and business intelligence applications. Besides projects, he spends about half his time on training and mentoring. He is the founder of the Slovenian SQL Server and .NET Users Group. He is the main author or co author of many books about databases and SQL Server. The last three books before this one were published by Packt, and their titles were SQL Server 2016 Developer's Guide,SQL Server 2017 Integration Services Cookbook, and SQL Server 2016 Developer's Guide. Dejan Sarka has also developed many courses and seminars for Microsoft, SolidQ, and Pluralsight.

I would like to thank to all of the wonderful people that made the writing process as smooth as possible. Without their enormous contribution, the writing would have been a very hard job for me. Special thanks goes to the content editors, Aditi Gour and Roshan Kumar. Tomaž Kaštrun reviewed my work a third time. As always, he was precise and constant. Thank you, Tomaž.

Finally, I want to thank my significant other, who has put up with me for more than 30 years.

About the reviewer

Tomaz Kastrun is an SQL Server developer and data scientist with more than 15 years, experience in the fields of business warehousing, development, ETL, database administration, and also data analysis and machine learning. He is a Microsoft Data Platform MVP, a blogger, and a frequent speaker at community and Microsoft events. When he is not working, drinking coffee, or riding a fixed-gear bike, he enjoys spending time with his daughter, Rubi.

I would like to express my deepest gratitude to my colleague, friend, and community leader, the author of this book, Dejan Sarka, for his immense work and the energy he brings to the SQL community every day, and for all the passion he puts into his work, sharing his wisdom and wit. Thank you.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Title Page

Data Science with SQL Server Quick Start Guide

Packt Upsell

Why subscribe?

PacktPub.com

Contributors

About the author

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Writing Queries with T-SQL

Before starting – installing SQL Server

SQL Server setup 

Core T-SQL SELECT statement elements

The simplest form of the SELECT statement

Joining multiple tables

Grouping and aggregating data

Advanced SELECT techniques

Introducing subqueries

Window functions

Common table expressions

Finding top n rows and using the APPLY operator

Summary

Introducing R

Obtaining R

Your first line R of code in R

Learning the basics of the R language

Using R data structures

Summary

Getting Familiar with Python

Selecting the Python environment

Writing your first python code

Using functions, branches, and loops

Organizing the data

Integrating SQL Server and ML

Summary

Data Overview

Getting familiar with a data science project life cycle

Ways to measure data values

Introducing descriptive statistics for continuous variables

Calculating centers of a distribution

Measuring the spread

Higher population moments

Using frequency tables to understand discrete variables

Showing associations graphically

Summary

Data Preparation

Handling missing values

Creating dummies

Discretizing continuous variables

Equal width discretization

Equal height discretization

Custom discretization

The entropy of a discrete variable

Advanced data preparation topics

Efficient grouping and aggregating in T-SQL

Leveraging Microsoft scalable libraries in Python

Using the dplyr package in R

Summary

Intermediate Statistics and Graphs

Exploring associations between continuous variables

Measuring dependencies between discrete variables

Discovering associations between continuous and discrete variables

Expressing dependencies with a linear regression formula

Summary

Unsupervised Machine Learning

Installing ML services (In-Database) packages 

Performing market-basket analysis

Finding clusters of similar cases

Principal components and factor analyses

Summary

Supervised Machine Learning

Evaluating predictive models

Using the Naive Bayes algorithm

Predicting with logistic regression

Trees, forests, and more trees

Predicting with T-SQL

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preface

The book will give you a jump-start in data science with Microsoft SQL Server and in-database Machine Learning Services (ML Services). It covers all stages of a data science project, from business and data understanding through data overview, data preparation, and modeling, to using algorithms, model evaluation, and deployment. The book shows how to use the engines and languages that come with SQL Server, including ML Services with R, Python, and Transact-SQL (T-SQL). You will find useful code examples in all three languages mentioned. The book also shows which algorithms to use for which tasks, and briefly explains each algorithm.

Who this book is for

SQL Server only started to fully support data science with its two latest versions, 2016 and 2017. Therefore, SQL Server is not widely used for data science yet. However, there are professionals from the worlds of SQL Server and data science who are interested in using SQL Server and ML Services for their projects. Therefore, this book is intended for SQL Server professionals who want to start with data science, and data scientists who would like to start using SQL Server in their projects.

What this book covers

Chapter 1, Writing Queries with T-SQL, gives a brief overview of T-SQL queries. It introduces all of the important parts of the mighty SELECT statement and focuses on analytical queries.

Chapter 2, Introducing R, introduces the second language in this book, R. R has been supported in SQL Server since version 2016. In order to use it properly, you have to understand the language constructs and data structures.

Chapter 3, Getting Familiar with Python, gives an overview of the second most popular data science language, Python. As a more general language, Python is probably even more popular than R. Lately, Python has been catching up with R in the data science field.

Chapter 4, Data Overview, deals with understanding data. You can use introductory statistics and basic graphs for this task. You will learn how to perform a data overview in all three languages used in this book.

Chapter 5, Data Preparation, teaches you how to work with the data that you get from your business systems and from data warehouses, which is typically not suited for direct use in a machine learning project. You need to add derived variables, deal with outliers and missing values, and more.

Chapter 6, Intermediate Statistics and Graphs, starts with the real analysis of the data. You can use intermediate-level statistical methods and graphs for the beginning of your advanced analytics journey.

Chapter 7, Unsupervised Machine Learning, explains the algorithms that do not use a target variable. It is like fishing in the mud - you try and see if some meaningful information can be extracted from your data. The most common undirected techniques are clustering, dimensionality reduction, and affinity grouping, also known as basket analysis or association rules.

Chapter 8, Supervised Machine Learning, deals with the algorithms that need a target variable. Some of the most important directed techniques include classification and estimation. Classification means examining a new case and assigning it to a predefined discrete class, for example, assigning keywords to articles and assigning customers to known segments. Next is estimation, where you try to estimate the value of a continuous variable of a new case. You can, for example, estimate the number of children or the family income. This chapter also shows you how you can evaluate your machine learning models and use them for predictions.

To get the most out of this book

In order to run the demo code associated with this book, you will need SQL Server 2017, SQL Server Management Studio, and Visual Studio 2017.

All of the information about the installation of the software needed to run the code is included in the first three chapters of the book.

Download the example code files

You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

www.packtpub.com

Select the

SUPPORT

tab.

Click on

Code Downloads & Errata

Enter the name of the book in the

box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows

Zipeg/iZip/UnRarX for Mac

7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Data-Science-with-SQL-Server-Quick-Start-Guide. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/DataSciencewithSQLServerQuickStartGuide_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system."

A block of code is set as follows:

# R version and contributorsR.version.stringcontributors()

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

1 + 22 + 5 * 43 ^ 4

sqrt(81)

Any command-line input or output is written as follows:

install.packages("RODBC")

library(RODBC)

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."

Warnings or important notes appear like this.

Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.

Writing Queries with T-SQL

This book is intended for any SQL Server developer or database administrator (DBA) who wants to start working in the data science field. In addition, this book is also aimed at existing data scientists who want to start using SQL Server with related services and tools. I will use, and show examples in, three programming languages in this book: Transact-SQL (or T-SQL), R, and Python. Therefore, it makes sense to start with a brief introduction of the three languages. This is what the first three chapters are about. If you are already a SQL Server developer, proficient in writing T-SQL queries, you can simply skip the first chapter. If you are already working with R, skip the second chapter. If you are familiar with Python, please feel free to skip the third chapter.

This first chapter is not a comprehensive reference guide to T-SQL; I will focus on the mighty SELECT statement only, the statement you need to use immediately when your data is located in a SQL Server database. However, besides the basic clauses, I will also explain advanced techniques, such as window functions, common table expressions, and the APPLY operator.

This chapter will cover the following points:

Core Transact-SQL SELECT statement elements

Advanced SELECT techniques

Before starting – installing SQL Server

If you don't have a SQL Server yet, you can use a free SQL Server Evaluation Edition or Developer Edition. You can download any of them from the SQL Server downloads site at https://www.microsoft.com/en-ca/sql-server/sql-server-downloads.

SQL Server setup

You just start SQL Server setup, and then from the Feature Selection page select the following:

Database Engine Services

Underneath

Machine Learning

(ML)

Services (In-Database)

With both languages,

and

Python

, selected, like you can see in the next screenshot

After that, all you need is client tools, and you can start writing the code. The following screenshot shows the SQL Server setup Feature Selection page with the appropriate features selected:

Figure 1.1: SQL Server Setup feature selection

The next step is to install client tools. Of course, you need SQL Server Management Studio (SSMS). You can download it at https://docs.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms?view=sql-server-2017. If you are not familiar with this tool, please use the SSMS at https://docs.microsoft.com/en-us/sql/ssms/tutorials/tutorial-sql-server-management-studio?view=sql-server-2017 to learn the basics about this tool.

In the next two chapters, I will explain what other tools you need to efficiently write R and Python code, respectively.

Finally, you need some demo data. I will mostly use the AdventureWorksDW2017 demo database in this book. You can find this demo database and other Microsoft samples at https://github.com/Microsoft/sql-server-samples/releases. I will point you to the appropriate sources when I use any other demo data in this book.

After you install everything mentioned so for, you are ready to start learning or renewing the knowledge of the T-SQL SELECT statement.

Core T-SQL SELECT statement elements

You probably already know that the most important SQL statement is the mighty SELECT statement you use to retrieve data from your databases. Every database developer knows the basic clauses and their usage:

SELECT

: Defines the columns returned, or a projection of all table columns

FROM

: Lists the tables used in the query and how they are associated, or joined

WHERE

: Filters the data to return only the rows that satisfy the condition in the predicate

GROUP BY

: Defines the groups over which the data is aggregated

HAVING

: Filters the data after the grouping with conditions that refer to aggregations

ORDER BY

: Sorts the rows returned to the client application