Introducing Microsoft SQL Server 2019
Reliability, scalability, and security both on premises and in the cloud
Kellyn Gorman, Allan Hirt, Dave Noderer, Mitchell Pearson, James Rowland-Jones, Dustin Ryan, Arun Sirpal, and Buck Woody
Introducing Microsoft SQL Server 2019
Copyright © 2019 Packt Publishing
All rights reserved. No part of this course may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this course to ensure the accuracy of the information presented. However, the information contained in this course is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this course.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this course by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Authors: Kellyn Gorman, Allan Hirt, Dave Noderer, Mitchell Pearson, James Rowland-Jones, Dustin Ryan, Arun Sirpal, and Buck Woody
Additional material: Mitchell Pearson
Managing Editor: Alexander Mazonowicz
Acquisitions Editor: Alicia Wooding
Production Editor: Nitesh Thakur
Editorial Board: Shubhopriya Banerjee, Bharat Botle, Ewan Buckingham, Mahesh Dhyani, Taabish Khan, Manasa Kumar, Alex Mazonowicz, Pramod Menon, Bridget Neale, Dominic Pereira, Shiny Poojary, Erol Staveley, Ankita Thakur, Nitesh Thakur, and Jonathan Wray
First Published: April 2020
Production Reference: 1160420
ISBN: 978-1-83882-621-5
Published by Packt Publishing Ltd.
Livery Place, 35 Livery Street
Birmingham B3 2PB, UK
About the Authors
Kellyn Gorman is an Azure Data Platform Architect for Microsoft with a long history in multi-platform technology. She spends a 60/40% split between Oracle on Azure and Analytics with her present team at Microsoft. A recent Idera ACE, a current friend of Redgate in the Microsoft community and an Oracle ACE Director alumnus, she has been awarded numerous awards over the years for her technical contributions and community volunteerism. She is one of only six women part of the Oak Table, a network for the Oracle scientist. She has extensive experience in environment optimization, automation and architect of robust environments, specializing in multi-terabyte management of OLAP/DSS systems. A consistent advocate for logical tuning of code and design before hardware solutions. She's recently become known for her expertise in DevOps, Enterprise Manager, AWR, (Automatic Workload Repository) Warehouse and virtualization of database environments with complex cloud management. The technical knowledge required to support these features offers great educational opportunities to learn by attending her technical presentations, engaging with her on social media presence as DBAKevlar or reading her blog, https://dbakevlar.com/.
Boston-based Cloud and Data Center and Data Plaform Dual MVP Allan Hirt has been working with SQL Server since 1992 and clustering in Windows Server since the days just after Wolfpack. He got his start with databases as an intern at SQL Solutions, which then got purchased by Sybase where he remained an intern until the end of college. Allan has used every version of SQL Server that Microsoft has released for Windows. He founded his own company, Megahirtz, in 2007 and is now partners with Max Myrick in SQLHA.
You will often find Allan speaking at local user groups, SQL Saturdays, and various conferences like PASS Summit, SQLBits, and TechEd as well as doing various webcasts during the years. He has authored quite a bit of content over the years including articles for SQL Server Magazine and whitepapers for Microsoft that are up on TechNet and MSDN. He is the author or co-author of a quite a few books, and is working on his latest, Mission Critical SQL Server, which will be due out soon.
Dave Noderer is the CEO / President and founder of Computer Ways, Inc., a software development company in Deerfield Beach, FL. Dave is an electrical engineer by training, designed computers for 20 years and has been writing software since founding Computer Ways, Inc. in 1994. Dave spent three years as an officer and director of INETA (International .NET Association) where he oversaw the enrollment and support of hundreds of user groups worldwide and 16 years as a Microsoft MVP. He co-founded Florida .NET User groups in 2001 and has been holding meetups in South Florida ever since. Since 2005, he has led the annual, free South Florida Code Camp. This event attracts over 1000 developer attendees. Dave is involved in local community activities as a board member of the Deerfield Beach Historical Society, the Hillsboro Lighthouse Society, and TechLauderdale.org.
James Rowland-Jones (JRJ) is a Principal Program Manager at Microsoft. He is currently part of the SQL Server team working on SQL Server 2019 Big Data Clusters and data virtualization. Prior to joining SQL Server, JRJ worked extensively on Azure SQL Data Warehouse. He helped the team launch Gen 1 of the service and led the product management effort to bring Gen 2 into preview.
JRJ is passionate about delivering highly scalable solutions that are creative, simple and elegant. He is also a keen advocate for the worldwide SQL community; previously serving on the Board of Directors for PASS while also helping to build SQLBits—Europe's largest data event. JRJ was awarded Microsoft's MVP accreditation from 2008 to 2015 for his services to the community.
For Jane, Lucy, Kate, and Oliver. Forever x.
Dustin Ryan is a Senior Cloud Solution Architect at Microsoft. Dustin has worked in the business intelligence and data warehousing field since 2008, has spoken at community events such as SQL Saturday, SQL Rally, and PASS Summit, and has a wide range of experience designing and building solutions featuring SQL Server and Azure. Prior to his time at Microsoft, Dustin worked as a business intelligence consultant and trainer for Pragmatic Works. Dustin is also an author, contributor, and technical editor of books.
Dustin resides outside Jacksonville, Florida with his wife, three children, and a three-legged cat and enjoys spending time with his family and serving at his local church.
Arun Sirpal is a SQL Server consultant and currently a Microsoft Data Platform MVP. Specializing in both SQL Server and Microsoft Azure, he has over 12 years' experience architecting, designing, and performance tuning physical and virtualized SQL Servers and has a wealth of experience with designing solutions using the Azure Data Platform including Azure SQL Database, Azure SQL Database Managed Instances, elastic pools, and hybrid concepts. Arun is also a frequent writer, speaker, and technical reviewer and a member of Microsoft's Azure Advisors and SQL Advisors groups.
Buck Woody works on the Azure Data Services team at Microsoft and uses data and technology to solve business and science problems. With over 35 years of professional and practical experience in computer technology, he is also a popular speaker at conferences around the world; author of over 700 articles and eight books on databases, machine learning, and R, he also sits on various Data Science Boards at two US Universities, and specializes in advanced data analysis techniques.
Table of Contents
Preface i
1. Optimizing for performance, scalability and real-time insights 1
Hybrid transactional and analytical processing (HTAP) 2
Clustered Columnstore Indexes 3
Adding Clustered Columnstore Indexes to memory-optimized tables 5
Disk-based tables versus memory-optimized tables 5
In-memory OLTP 6
Planning data migration to memory-optimized tables 7
Natively compiled stored procedures 18
TempDB enhancements 21
Enabling memory-optimized TempDB metadata 21
Limitations of memory-optimized TempDB metadata 21
Intelligent Query Processing 22
Hybrid Buffer Pool 24
Query Store 25
Changes to default parameter values 27
QUERY_CAPTURE_MODE 27
QUERY_CAPTURE_MODE: CUSTOM 27
Support for FAST_FORWARD and STATIC Cursors 29
Automatic tuning 29
Automatic plan correction 29
Lightweight query profiling 32
New functionality in 2019 32
sys.database_scoped_configurations 33
Activity monitor 34
Columnstore stats in DBCC CLONEDATABASE 34
Columnstore statistics support 35
DBCC CLONEDATABASE validations 35
Understanding DBCC CLONEDATABASE syntax 35
Estimate compression for Columnstore Indexes 36
sp_estimate_data_compression_savings Syntax 37
Troubleshooting page resource waits 39
sys.dm_db_page_info 39
sys.fn_pagerescracker 41
2. Enterprise Security 45
SQL Data Discovery and Classification 46
SQL Vulnerability Assessment 51
Transparent Data Encryption 55
Setup 57
New features – suspend and resume 59
Extensible Key Management 60
Always Encrypted 60
Algorithm types 61
Setup 61
Confidential computing with secure enclaves 61
Dynamic Data Masking 64
Types 64
Implementing DDM 64
Row-Level Security 67
Auditing 71
Securing connections 73
Configuring the MMC snap-in 73
Enabling via SQL Server Configuration Manager 74
Azure SQL Database 74
SSL/TLS 75
Firewalls 75
Azure Active Directory (AD) authentication 75
Advanced data security 77
Advanced threat detection 78
3. High Availability and Disaster Recovery 81
SQL Server availability feature overview 82
Backup and restore 82
Always On features 83
Log shipping 91
What About Database Mirroring and Replication? 92
Availability improvements in SQL Server 2019 92
Accelerated database recovery 92
Configuration-only replica 92
Certificate management in SQL Server Configuration Manager 94
Clustered columnstore index online rebuild 95
Database scoped default setting for online and resumable DDL operations 95
Failover Cluster Instance Support for Machine Learning Services 96
Increased number of synchronous replicas in the Enterprise edition 96
Online builds or rebuilds for Clustered Columnstore Indexes 97
Read-only routing configuration in SQL Server Management Studio 98
Replication for Linux-based configurations 99
Secondary-to-primary read/write connection redirection 100
Windows Server 2019 availability enhancements 102
Changing domains for a Windows Server Failover Cluster 103
Cluster Shared Volumes support for Microsoft Distributed Transaction Coordinator 103
File share witness without a domain 103
Improved Windows Server Failover Cluster security 104
Storage Replica in the Standard edition 104
Storage Spaces Direct two-node configuration 106
Windows Server Failover Cluster improvements in Azure 107
4. Hybrid Features – SQL Server and Microsoft Azure 111
Backup to URL 112
Benefits 112
Requirements 112
The storage account 113
Setup 114
SQL Server data files in Azure 118
Setup and concepts 119
Considerations 121
File-snapshot backups 123
Setup 123
Extending on-premises Availability Groups to Azure 125
Replication to Azure SQL Database 126
Classic approach 127
Transactional replication 127
Prerequisites 129
Setup 129
5. SQL Server 2019 on Linux 143
2019 platform support 144
Why move databases to SQL Server on Linux? 145
Installation and configuration 146
Improvements in SQL Server 2019 150
Machine Learning Services on Linux 150
Kubernetes 152
Working with Docker and Linux 154
Change data capture 155
Hybrid Buffer Pool and PMEM 155
Distributed Transaction Coordinator on Linux 157
Replication 158
SQL Server tools 159
Azure Data Studio 159
Command-line query tools for SQL in Linux 163
SQLCMD 163
MSSQL-CLI 164
Enhanced focus on scripting 165
The SQL DBA in the Linux world 165
Users and groups 166
Azure Cloud Shell 166
Windows Subsystem for Linux 167
Root, the super-user 167
6. SQL Server 2019 in Containers and Kubernetes 171
Why containers matter 172
Container technical fundamentals 173
Deploying an SQL Server container using Docker 174
Using Docker and Bash 179
Using local SQL Server utilities 179
Customizing SQL Server containers 180
Availability for SQL Server containers 180
7. Data Virtualization 185
Data integration challenges 186
Introducing data virtualization 186
Data virtualization use cases 188
Data virtualization and hybrid transactional analytical processing 188
Data virtualization and caching 188
Data virtualization and federated systems 188
Data virtualization and data lakes 189
Contrasting data virtualization and data movement 189
Data virtualization in SQL Server 2019 190
Secure data access 190
The database master key 191
Database scoped credentials 191
External data sources 192
Supported data sources 193
Extending your environment using an ODBC external data source 194
Accessing external data sources in Azure 196
External file formats 197
PolyBase external tables 198
Creating external tables with Azure Data Studio 200
Contrasting linked servers and external tables 201
Installing PolyBase in SQL Server 2019 202
General pre-installation guidance 203
Installing PolyBase on Windows 204
Installing PolyBase on Linux 205
Installing PolyBase on SQL Server running in Docker 206
Post-installation steps 208
Installing PolyBase as a scale-out group 209
Tip #1: Use different resource groups for each part of the architecture 210
Tip #2: Create the virtual network and secure subnets before building virtual machines 210
Tip #3: Place your scale-out group SQL Server instances inside one subnet 210
Tip #4: Complete this pre-installation checklist! 211
Scale-out group installation 212
Bringing it all together: your first data virtualization query 215
8. Machine Learning Services Extensibility Framework 219
Machine learning overview 220
How machine learning works 220
Use cases for machine learning 221
Languages and tools for machine learning 222
SQL Server 2019 Machine Learning Services architecture and components 224
Components 226
Configuration 228
Machine learning using the Machine Learning Services extensibility framework 230
R for machine learning in SQL Server 2019 230
Python for machine learning in SQL Server 2019 232
Java and machine learning in SQL Server 233
Machine learning using the PREDICT T-SQL command 237
Machine learning using the sp_rxPredict stored procedure 239
Libraries and packages for machine learning 240
Management 241
Security 242
Monitoring and Performance 242
Using the team data science process with Machine Learning Services 244
Understanding the team data science process 244
Phase 1: Business understanding 245
Phase 2: Data acquisition and understanding 245
Phase 3: Modeling 245
Phase 4: Deployment 245
Phase 5: Customer acceptance 246
9. SQL Server 2019 Big Data Clusters 249
Big data overview 250
Applying scale-out architectures to SQL Server 250
Containers 251
Kubernetes 253
SQL Server on Linux 254
PolyBase 255
SQL Server 2019 big data cluster components 256
Installation and configuration 257
Platform options 258
Using a Kubernetes service 258
Using an on-premises Kubernetes installation 259
Working with a Dev/Test environment 259
Deploying the big data clusters on a Kubernetes cluster 260
Programming SQL Server 2019 big data clusters 262
Azure Data Studio 262
Relational operations 264
Creating scale-out tables 266
Creating a data lake 268
Working with Spark 269
Submitting a job from Azure Data Studio 270
Submitting a Spark job from IntelliJ 272
Spark job files and data locations 273
Management and monitoring 273
SQL Server components and operations 273
Kubernetes operations 273
SQL Server 2019 big data cluster operations 274
Monitoring performance and operations with Grafana 275
Monitoring logs with Kibana 276
Spark operations 277
Security 277
Access 278
Security setup and configuration 278
Authentication and authorization 280
10. Enhancing the Developer Experience 283
SQL Graph Database 285
Why use SQL Graph? 287
Edge constraints 287
SQL Graph data integrity enhancements 290
SQL Graph MATCH support in MERGE 290
Using a derived table or view in a graph MATCH query 294
Java language extensions 296
Why language extensions? 296
Installation 297
Sample program 300
JSON 307
Why use JSON? 307
JSON example 308
UTF-8 support 309
Why UTF-8? 309
Temporal tables 310
Why temporal tables? 311
Temporal table example 311
Spatial data types 314
Why spacial data types? 315
Dealer locator example 315
11. Data Warehousing 319
Extract-transform-load solutions with SQL Server Integration Services 320
Best practices for loading your data warehouse with SSIS 321
Clustered Columnstore Indexes 322
Partitioning 324
Online index management 325
Enabling online DML processing 326
Resuming online index create or rebuild 327
Build and rebuild online clustered columnstore indexes 329
Using ALTER DATABASE SCOPE CONFIGURATION 329
Creating and maintaining statistics 330
Automatically managing statistics 331
The AUTO_CREATE_STATISTICS option 331
The AUTO_UPDATE_STATISTICS option 331
The AUTO_UPDATE_STATISTICS_ASYNC option 331
Statistics for columnstore indexes 332
Modern data warehouse patterns in Azure 332
Introduction to Azure SQL Data Warehouse 333
Control node 333
Compute nodes 334
Storage 334
Data movement services (DMSes) 334
Best practices for working with Azure SQL Data Warehouse 334
Reduce costs by scaling up and down 335
Use PolyBase to load data quickly 335
Manage the distributions of data 336
Do not over-partition data 336
Using Azure Data Factory 337
New capabilities in ADF 337
Understanding ADF 338
Copying data to Azure SQL Data Warehouse 340
Hosting SSIS packages in ADF 343
Azure Data Lake Storage 344
Key features of Azure Data Lake Storage Gen2 345
Azure Databricks 346
Working with streaming data in Azure Stream Analytics 347
Analyzing data by using Power BI – and introduction to Power BI 349
Understanding the Power BI ecosystem 349
Connecting Power BI to Azure SQL Data Warehouse 352
12. Analysis Services 355
Introduction to tabular models 356
Introduction to multidimensional models 358
Enhancements in tabular mode 359
Query interleaving with short query bias 360
Memory settings for resource governance 360
Calculation groups 361
Dynamic format strings 362
DirectQuery 363
Bidirectional cross-filtering 366
Many-to-many relationships 367
Governance settings for Power BI cache refreshes 368
Online attach 368
Introducing DAX 369
Calculated columns 370
Calculated measures 370
Calculated tables 372
Row filters 373
DAX calculation best practices 375
Writing DAX queries 376
Using variables in DAX 379
Introduction to Azure Analysis Services 380
Selecting the right tier 381
Scale-up, down, pause, resume, and scale-out 382
Connecting to your data where it lives 382
Securing your data 383
Using familiar tools 383
Built-in monitoring and diagnostics 384
Provisioning an Azure Analysis Services server and deploying a tabular model 384
13. Power BI Report Server 389
SSRS versus Power BI Report Server 389
Report content types 391
Migrating existing paginated reports to Power BI Report Server 392
Exploring new capabilities 395
Performance Analyzer 396
The new Modeling view 398
Row-level security for Power BI data models 398
Report theming 400
Managing parameter layouts 401
Developing KPIs 402
Publishing reports 405
Managing report access and security 406
Publishing mobile reports 409
Viewing reports in modern browsers 409
Viewing reports on mobile devices 412
Exploring Power BI reports 415
Using the FILTERS panel 416
Crossing-highlighting and cross-filtering 416
Sorting a visualization 417
Displaying a visualization's underlying data 417
Drill-down in a visualization 418
Automating report delivery with subscriptions 418
Pinning report items to the Power BI service 420
14. Modernization to the Azure Cloud 423
The SQL data platform in Azure 424
Azure SQL Database managed instance 424
Deployment of a managed instance in Azure 425
Managed instance via the Azure portal 426
Managed instance via templates 427
Migrating SQL Server to Managed Instance 430
Azure Database Migration Service (DMS) 431
Application Connectivity 431
Requirements for the DMS 432
Data Migration Assistant 433
Managed Instance Sizing 433
Migration 433
Monitoring Managed Instance 434
SQL Server in Azure virtual machines 435
Creating an Azure VM from the Azure portal 436
Storage options for VMs 438
Diagnostics and advanced options 438
Creating a SQL Server 2019 VM from the command line in Azure 440
Security for SQL Server on an Azure VM 443
Backups of Azure VM SQL Server instances 444
Built-in security for Azure VMs 444
SQL Server IaaS agent extension 446
Disaster Recovery environment in the cloud 447
Azure Site Recovery 447
Extended support for SQL 2008 and 2008 R2 448
Preface
About
This section briefly introduces the coverage of this book, the technical skills you'll need to get started, and the hardware and software required to complete the book.
1. Optimizing for performance, scalability and real-time insights