SQL Query Design Patterns and Best Practices - Steve Hughes - E-Book

SQL Query Design Patterns and Best Practices E-Book

Steve Hughes

0,0
25,19 €

-100%
Sammeln Sie Punkte in unserem Gutscheinprogramm und kaufen Sie E-Books und Hörbücher mit bis zu 100% Rabatt.

Mehr erfahren.
Beschreibung

SQL has been the de facto standard when interacting with databases for decades and shows no signs of going away. Through the years, report developers or data wranglers have had to learn SQL on the fly to meet the business needs, so if you are someone who needs to write queries, SQL Query Design and Pattern Best Practices is for you.
This book will guide you through making efficient SQL queries by reducing set sizes for effective results. You’ll learn how to format your results to make them easier to consume at their destination. From there, the book will take you through solving complex business problems using more advanced techniques, such as common table expressions and window functions, and advance to uncovering issues resulting from security in the underlying dataset. Armed with this knowledge, you’ll have a foundation for building queries and be ready to shift focus to using tools, such as query plans and indexes, to optimize those queries. The book will go over the modern data estate, which includes data lakes and JSON data, and wrap up with a brief on how to use Jupyter notebooks in your SQL journey.
By the end of this SQL book, you’ll be able to make efficient SQL queries that will improve your report writing and the overall SQL experience.

Das E-Book können Sie in Legimi-Apps oder einer beliebigen App lesen, die das folgende Format unterstützen:

EPUB
MOBI

Seitenzahl: 276

Veröffentlichungsjahr: 2023

Bewertungen
0,0
0
0
0
0
0
Mehr Informationen
Mehr Informationen
Legimi prüft nicht, ob Rezensionen von Nutzern stammen, die den betreffenden Titel tatsächlich gekauft oder gelesen/gehört haben. Wir entfernen aber gefälschte Rezensionen.



SQL Query Design Patterns and Best Practices

A practical guide to writing readable and maintainable SQL queries using its design patterns

Steve Hughes

Dennis Neer

Dr. Ram Babu Singh

Shabbir H. Mala

Leslie Andrews

Chi Zhang

BIRMINGHAM—MUMBAI

SQL Query Design Patterns and Best Practices

Copyright © 2023 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Publishing Product Manager: Arindam Majumdar

Senior Editor: Tiksha Abhimanyu Lad

Technical Editor: Sweety Pagaria

Copy Editor: Safis Editing

Project Coordinator: Farheen Fathima

Proofreader: Safis Editing

Indexer: Sejal Dsilva

Production Designer: Arunkumar Govinda Bhat

Marketing Coordinator: Nivedita Singh

First published: February 2023

Production reference: 2190423

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-83763-328-9

www.packtpub.com

Contributors

About the authors

Steve Hughes is a senior director of data and analytics at 3Cloud. In this role, he focuses on growing the teams’ skills and capabilities to deliver data projects with Azure. He has worked with technology for over 20 years, with much of that time spent on creating business intelligence (BI) solutions and helping customers implement leading BI and cloud solutions. He is passionate about using data effectively and helping customers understand that data is valuable and profitable. Steve has recently been diagnosed with ALS but continues to work and share with others what he has learned. Steve is also the founder of Data on Wheels where he blogs with one of his daughters on topics such as data, analytics, and work enablement.

I want to thank my family, friends, and peers for supporting me through my health changes, and I am especially thankful to Sheila, my wife. I could not do this without her support and encouragement.

Dennis Neer is a senior architect of data and analytics at 3Cloud. In this role, he focuses on working with clients to design Azure solutions for their data and analytic needs so that they may use that data to make business decisions based on that data. This includes using tools such as SQL Server databases, Synapse, data lakes, and Power BI. He has worked with technology for over 30 years, with 25 years in designing and building database and visualization solutions. He is passionate about data and helping businesses to understand the information contained in their data and how it can be used to make important decisions regarding their business to improve the strength of their business.

I want to thank my wife, Jeanie, for all of the support and encouragement she has given me throughout my career, and while working on this book; I couldn’t do this without you.

Dr. Ram Babu Singh is an accomplished data analytics and data science professional with a Ph.D. in computer science and a Microsoft Certified Professional designation. With over 2 decades of experience in data platforms, he is recognized as an expert in his field. As the lead data architect at 3Cloud, he has developed complex data analytics and data science solutions for high-profile clients such as Microsoft, Accenture, and HP. An accomplished academic with published papers in international journals, he has a patent in artificial intelligence, highlighting his innovative problem-solving approach. With over 10 years of leadership experience in data and analytics, he has earned a reputation as a thought leader and mentor in the industry.

I want to thank my wife, Dr. Nirmal Lodhi, for supporting me throughout my journey, and my son Rudra and daughter Riddhima for letting me steal their family time. I also want to thank my colleagues for their support.

Shabbir H. Mala is a director of data and analytics at 3Cloud. He has over 23 years of experience in thought leadership and consulting, developing complex data solutions, business systems, and processes using Azure Data Services and Power BI. He currently manages a data architecture team of over 40 principals and senior architects focusing on business growth, successful project delivery, and client experience. He loves and has done talks at Power BI conferences and local Power BI user groups. He has been awarded Microsoft FastTrack Solution Architect for the last 2 consecutive years, in 2021 and 2022. He was born and raised in Mumbai (India), and came to the United States in 2006 and is currently living in Chicago. He is married with three kids.

Leslie Andrews is a lead data architect of data and analytics at 3Cloud. Working in the IT sector for over 20 years in local government, electrical supply, law enforcement, and healthcare, she has broad experience to draw from to assist clients in making data more accessible for end users. Leslie’s expertise includes SQL development, ETL, data warehousing, data modeling, and analytics. Leslie currently holds several Microsoft certifications related to Azure technologies for data engineering. She approaches each day as an opportunity to use data and technology to make it easier for others to do their daily work. Leslie has been doing public speaking since 2015, was an IDERA ACE in 2019, and is a supporter of and contributor to the SQL community.

Chi Zhang is a lead data architect at 3Cloud. After completing her master’s degree at Carnegie Mellon University, she worked in data consulting for the past 5 years. She has helped clients from various industries to deliver customized data solutions within Azure. Focusing on ETL, data warehousing, data modeling, and BI reporting, Chi solves data problems for clients and builds data solutions that provide business users with better operational insights and a smoother experience. Recently, she has continued to grow her footprint in the Azure community: earning certifications in Azure data engineering and data science, giving her first public tech talk, co-authoring a technical book, and becoming an active contributor to the SQL community.

I want to thank my husband, Joey, for all the help and company he has been giving me throughout my life so far. You push me to be a better person, and I'd do this all over again with you.

About the reviewers

Paul Turley is a competency lead director for the 3Cloud solutions BI&A practice, and Microsoft Data Platform MVP. He consults, writes, speaks, teaches, and blogs about BI and reporting solutions. He works with companies to model data and visualize and deliver critical information to make informed business decisions, using the Microsoft Data Platform and business analytics tools. He is the director of the Oregon Data Community PASS chapter and user group, the author and lead author of Professional SQL Server 2016 Reporting Services, and 14 other titles from Wrox & Microsoft Press. Paul is a FastTrack Recognized Solution Architect and holds several certifications related to the Microsoft Data Platform and BI.

Christopher J Knapp is a cloud solutions director at 3Cloud. For the last 17 years, he has worked primarily in the healthcare industry in all aspects of data, including transformation, storage, data modeling, BI, advanced analytics, and machine learning. He has spoken in multiple settings across the nation on various data topics. More recently, CJ has focused on Spark-based ETL and analytical methods, as well as cloud-based data platform systems at scale for healthcare. He is a father, husband, polyglot, and constant consumer of culture outside of work. He spends the majority of his free time having adventures with his family, traveling, and mixing craft cocktails at home. Sometimes, he even manages to mix them well.

Andie Letourneau is a cloud solution consultant with over 25 years of experience specializing in MS SQL Server databases, data warehousing, and BI. She has Azure Data Fundamentals and Azure Administrator Associate Microsoft certifications. She has presented at numerous SQL user group meetings and SQL Saturday events. She lives with her husband in beautiful Cave Creek, AZ.

Table of Contents

Preface

Part 1: Refining Your Queries to Get the Results You Need

1

Reducing Rows and Columns in Your Result Sets

Technical requirements

Identifying data to remove from the dataset

Reducing the amount of data

Understanding the value of creating views versus removing data

Exploring the impact of row and column reductions on aggregations

Summary

2

Efficiently Aggregating Data

Technical requirements

Identifying data to be aggregated

Determining when data should be aggregated

The AVG() function

The SUM() function

The COUNT() function

The MAX() function

The MIN() Function

Improving performance when aggregating data

Summary

3

Formatting Your Results for Easier Consumption

Technical requirements

Using the FORMAT() function

Format() with culture

Format() with custom formatting strings

Formatting dates and numbers with functions

Formatting dates and numbers with CONVERT() and CAST()

Formatting numbers with ROUND() and CEILING()

Comparing FORMAT(), CONVERT(), and CAST()

Alias columns with meaningful names

Summary

4

Manipulating Data Results Using Conditional SQL

Technical requirements

Using the CASE statement

Using a simple CASE expression in a SELECT statement

Using a searched CASE expression in a SELECT statement

Using CASE in an ORDER BY statement

Using CASE in an UPDATE statement

Using CASE in a HAVING statement

Using the COALESCE() expression

How to use COALESCE()

Comparing COALESCE() and CASE()

Using ISNULL() function

How to use ISNULL()

Comparing ISNULL() and COALESCE()

Summary

Part 2: Solving Complex Business and Data Problems in Your Queries

5

Using Common Table Expressions

Technical requirements

Creating CTEs

Set theory for queries

Creating a more complex CTE

Creating a recursive CTE

Creating the hierarchical data

Creating the recursive CTE

Recursive alternative

Summary

6

Analyze Your Data Using Window Functions

Technical requirements

Understanding window functions

Using a window function in a query

Adding a partition to the results

Window functions with frames

Scenarios and business problems

Days between orders

Finding a pattern

Finding first N records of every group

Running totals

First and last record in the partition

Year-over-year growth

7

Reshaping Data with Advanced Techniques

Technical requirements

Working with the PIVOT operator

Using PIVOT dynamically

Working with the UNPIVOT operator

Understanding hierarchical data

Summary

8

Impact of SQL Server Security on Query Results

Technical requirements

Why is data missing from my result set?

Understanding SQL Server security

Validating security settings

Summary

Part 3: Optimizing Your Queries to Improve Performance

9

Understanding Query Plans

Technical requirements

Understanding query optimization techniques

Understanding the Query Optimizer

Understanding and exploring the Query Execution Plan

Comparing execution plans

Analyzing the query plan

Summary

10

Understanding the Impact of Indexes on Query Design

Technical requirements

Understanding index and index types

Using CREATE to create an index

The DROP index

Using RENAME to rename an index

Modifying indexes

The ALTER index

Exploring the types of indexes

Clustered indexes

The NONCLUSTERED index

The UNIQUE index

The filtered index

The covering index

The impact of indexes on query design

Understanding index maintenance

The REBUILD index

The REORGANIZE index

Caution

Summary

Further reading

Part 4: Working with Your Data on the Modern Data Platform

11

Handling JSON Data in SQL Server

Technical requirements

Introducing JSON functionality built into SQL Server

The basics of JSON

JSON functions in SQL Server

JSON as data in SQL Server

JSON path in SQL Server

Formatting results as JSON in SQL Server

Using FOR JSON

Filling the table

Working with JSON data in SQL queries

Discovering JSON data in SQL Server

Extracting JSON data in SQL Server

Modifying JSON data in SQL Server

Building tabular results from JSON data in SQL Server

Summary

12

Integrating File Data and Data Lake Content with SQL

Technical requirements

Understanding the OPENROWSET (BULK..) function

Required security and storage permissions

Understanding external tables

Summary

13

Organizing and Sharing Your Queries with Jupyter Notebooks

Technical requirements

Creating Jupyter notebooks in Azure Data Studio

Understanding notebooks in Azure Data Studio

Creating a notebook in Azure Data Studio

Creating the Jupyter book

Adding a notebook to your Jupyter book

Adding queries in your Jupyter notebooks

Creating connections in your notebook

Creating and executing SQL code in your notebook

Formatting results

Exporting results

Resetting results

Documenting your code with markdown in your notebooks

Adding a text block to your notebook

Creating a header for your notebook

Using basic markdown for documentation

Adding images and links to your documentation

Managing and sharing your queries with Jupyter notebooks

Simple sharing of Jupyter books and notebooks

Using GitHub to collaborate on your notebook

Integrating third-party Jupyter books

Summary

Appendix

Preparing Your Environment

Prerequisites for running the exercises

Choosing a platform

Setting up the Azure environment

Setting up the SQL Server developer environment

Index

Other Books You May Enjoy

Preface

SQL was created to support relational database management systems (RDBMSs). It was not created just for SQL Server. SQL, or as it is typically pronounced sequel, has been the de facto standard for working with relational databases for nearly 50 years. The structure and understanding of this language have been established as a standard in both ANSI and ISO.

While the language has a standard and well-established set of syntax rules and capabilities, it has been implemented in many ways throughout the years by various RDBMS vendors. Microsoft implemented Transact-SQL (T-SQL) in SQL Server and has continued to use it as the primary SQL version, used in the various Azure SQL Database implementations.

While the focus of our book is primarily around retrieving data from databases efficiently, SQL is not limited to just data retrieval. SQL can be used to manipulate the database structure, manipulate data, and retrieve data. SQL can also be used to issue commands to the underlying database system depending on what the language supports.

As we move into the modern data estates, relational data is not the only data within the environment. We are seeing more document-style databases and other non-relational datasets used in common practice. What is interesting about this is that there is always a push to get back to SQL-supported datasets. The tabular nature of the data returned by SQL is the easiest data to consume in numerous tools available in the marketplace today and is easy for users to understand. Languages and document sets such as JSON are highly flexible and support a less structured version of data. However, those sets often must be converted to a tabular format to be easily consumed by various tools and understood by the users consuming that data. Think of it like JSON a machine and developer-friendly data storage format, but tabular formats used by SQL make it easy for you to understand what is in it.

As we move into some of these modern scenarios and even in some older scenarios such as MDX, we find the SELECT... FROM... WHERE format of the SQL language has been implemented to make it easier to work with data. As a developer, it is important for you to understand the best formats and most efficient methods of writing these queries to get the results you need. A lot of these efficiencies are true for whatever database system you work in. This book will focus on working with SQL Server and T-SQL in particular; however, many of the principles will apply across all relational systems.

Who this book is for

This book is for the SQL developer who is ready to take their query development skills to the next level. This includes report writers, data scientists, or similar data gatherers and allows users to expand their skills for complex querying and build more efficient and performant queries. This book also seeks to advance their knowledge of the modern data estate and introduce query techniques for pulling data from data lakes. For those just learning SQL, this book can help you accelerate your learning and prevent you from making common mistakes.

What this book covers

The book is organized into four parts, with multiple chapters in each part. The book is designed to grow your experience from beginning to end if you choose to read it in that fashion. One of the greatest values is that each chapter is self-standing and can be used as a reference if you come across a situation where you need to improve your query writing specifically with that scenario. This also allows you to engage this book at whichever point makes the most sense for your query writing capabilities and skills.

In Chapter 1, Reducing Rows and Columns in Your Result Sets, the focus is on reducing your rows and columns to build an efficient dataset that can be used in reporting and other use cases.

The focus shifts to aggregating your results in Chapter 2, Efficiently Aggregating Data in Your Results. You can return a much more refined dataset that is easier to consume by effectively aggregating results to the right granularity in the content you need to pass along.

In Chapter 3, Formatting Your Results for Easier Consumption, the focus shifts to formatting your results. Often when working with data coming from a database, it must support levels of granularity or specific types of data to be efficient or inclusive. However, this comes at the price of storing data in a format that is not necessarily conducive to end users. Common formatting difficulties include handling currencies and dates. In this chapter, you will learn how to effectively format your data for more efficient use outside the database.

We expand on query efficiency by using conditional SQL in Chapter 4, Manipulating Your Data Results Using Conditional SQL. This will allow you to refine results even further for your end user. SQL Server supports several conditional logic statements that will be explored in this chapter.

Chapter 5, Using Common Table Expressions, delves into the world of common table expressions (CTEs) as they are commonly referred to. This technique is used to reduce subqueries and support other complex scenarios. In this chapter, you will learn how to build efficient CTEs to solve complex business problems.

Chapter 6, Analyze Your Data Using Window Functions, introduces you to window functions inside SQL. These functions allow you to do inline query aggregations or other techniques, such as row numbers. Some of the problems that are solved using window functions include building and running totals in your results.

Chapter 7, Reshaping Your Data with Advanced Techniques, dives into advanced techniques for reshaping your data. This includes handling hierarchical data and working with the PIVOT and UNPIVOT commands.

Chapter 8, Impact of SQL Security on Query Results, helps you deal with the impact of security on your query results. Citizen developers are often unaware of the impact security might have on the data they are pulling into their results. This chapter looks at the various implications of security and how to understand that impact on the results that you get and deliver to your end user.

Chapter 9, Understanding Query Plans, describes how to understand query plans. In this chapter, you will be introduced to the query plan itself and how to read it to determine what you can do to improve the performance and the query that you’re writing.

We then move on to understanding the impact of indexes on your query design in Chapter 10, Understanding the Impact of Indexes on Your Query Design. While we will not dive into authoring indexes, understanding the impact of indexes, including which indexes will improve your query performance, is the focus here. This will help you communicate your index needs to those that own the database design.

JSON data or NoSQL data has been disruptive to SQL writers around the world. In Chapter 11, Handling JSON Data in SQL Server, we will walk through the various functions and capabilities of SQL in SQL Server that supports JSON.

In Chapter 12, Integrating File Data and Data Lake Content with SQL, we will walk through techniques to integrate file and data lake content in your queries. This more complex technique is becoming very necessary in the new modern data platform.

We then have Chapter 13, Organizing and Sharing Your Queries with Jupyter Notebooks, covering Jupyter notebooks. Jupyter notebooks are available in Azure Data Studio and Synapse Workspace. These notebooks can be used to organize and share queries and their results more efficiently than SQL Server Management Studio (SSMS). In this chapter, we will walk you through notebook creation and sharing.

In the Appendix, Preparing Your Environment, we will walk you through setting up your environment to support the exercises in this book.

To get the most out of this book

Software/hardware covered in the book

Operating system requirements

Microsoft Azure Storage Explorer

Windows, macOS, or Linux

SQL Server 2022 or Azure SQL Database

Windows, macOS, or Linux

Azure Data Studio

Windows, macOS, or Linux

This book covers many illustrations and examples of working with SQL to improve your overall writing and performance. To make this simple for all our users, we have chosen to use the Wide World Importers sample databases available from Microsoft. Therefore, depending on the nature of the query that we are constructing and providing examples for, one of the two databases (operational: WideWorldImporters; data warehouse: WideWorldImportersDW) provided by Microsoft will be used. Bear in that means you will have two databases to run the exercises created within this book.

It is recommended that you install the following tools on your device, as these will be used with the exercises in this book:

SSMS and Azure Data Studio installation. You’ll find the installation instructions at the following location: https://learn.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms. Be sure to install both SQL Server Management Studio and Azure Data Studio in the installation process.Microsoft Azure Storage Explorer is used to move files in and out of Azure. You’ll find the installation instructions at this location: https://azure.microsoft.com/en-us/products/storage/storage-explorer/.You can use either SQL Server 2022 or Azure SQL Database for your database platform. If you need to install SQL Server 2022 or set up Azure SQL Database, refer to the Appendix for detailed instructions.All examples in this book will be using the Wide World Importers sample databases provided by Microsoft and found here: https://github.com/Microsoft/sql-server-samples/releases/tag/wide-world-importers-v1.0. We will be using the full backups for SQL Server and the standard backups for Azure SQL DB.

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Full instructions for installation and database configuration can be found in the Appendix.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/SQL-Query-Design-Best-Practices. If there’s an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://packt.link/Xxotr.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “All window functions can utilize the PARTITION BY expression of the OVER clause, which is always optional.”

A block of code is set as follows:

SELECT TOP (3) [Order Key]       ,[Description]   FROM [Fact].[Order]

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

SELECT TOP(3) FORMAT( [Date], 'D', 'en-US' ) 'US English'       ,FORMAT( [Date], 'D', 'en-gb' ) 'British English'       ,FORMAT( [Date], 'D', 'de-de' ) 'German'       ,FORMAT( [Date], 'D', 'zh-cn' ) 'Chinese Simplified (PRC)' FROM [Dimension].[Date]

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “Next you will expand the Security and Users folders and find your login ID.”

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you’ve read SQL Query Design Pattern Best Practices, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere? Is your eBook purchase not compatible with the device of your choice?

Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don’t stop there, you can get exclusive access to discounts, newsletters, and great free content in your inbox daily

Follow these simple steps to get the benefits:

Scan the QR code or visit the link below

https://packt.link/free-ebook/9781837633289

Submit your proof of purchaseThat’s it! We’ll send your free PDF and other benefits to your email directly

Part 1: Refining Your Queries to Get the Results You Need

The goal of the four chapters in this part is to help you become a more efficient query writer. One of the areas we commonly see where people struggle is building queries to get the results that they need without including additional content in their query results just because they might need it later:

Chapter 1, Reducing Rows and Columns in Your Result SetsChapter 2, Efficiently Aggregating Data in Your ResultsChapter 3, Formatting Your Results for Easier ConsumptionChapter 4, Manipulating Your Data Results Using Conditional SQL