39,59 €
Data is at the heart of all applications and forms the foundation of modern data-driven businesses. With the multitude of data-related use cases and the availability of different data services, choosing the right service and implementing the right design becomes paramount to successful implementation.
Data Modeling for Azure Data Services starts with an introduction to databases, entity analysis, and normalizing data. The book then shows you how to design a NoSQL database for optimal performance and scalability and covers how to provision and implement Azure SQL DB, Azure Cosmos DB, and Azure Synapse SQL Pool. As you progress through the chapters, you'll learn about data analytics, Azure Data Lake, and Azure SQL Data Warehouse and explore dimensional modeling, data vault modeling, along with designing and implementing a Data Lake using Azure Storage. You'll also learn how to implement ETL with Azure Data Factory.
By the end of this book, you'll have a solid understanding of which Azure data services are the best fit for your model and how to implement the best design for your solution.
Das E-Book können Sie in Legimi-Apps oder einer beliebigen App lesen, die das folgende Format unterstützen:
Seitenzahl: 619
Veröffentlichungsjahr: 2021
Implement professional data design and structures in Azure
Peter ter Braake
BIRMINGHAM—MUMBAI
Copyright © 2021 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Kunal Parikh
Publishing Product Manager: Aditi Gour
Senior Editor: Roshan Kumar
Content Development Editors: Sean Lobo, Priyanka Soam
Technical Editor: Arjun Varma
Copy Editor: Safis Editing
Language Support Editor: Safis Editing
Project Coordinator: Aparna Ravikumar Nair
Proofreader: Safis Editing
Indexer: Rekha Nair
Production Designer: Alishon Mendonca
First published: July 2021
Production reference: 1240621
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN: 978-1-80107-734-7
www.packt.com
Peter ter Braake started working as a developer in 1996 after studying physics in Utrecht, the Netherlands. Databases and business intelligence piqued his interest the most, leading to him specializing in SQL Server and its business intelligence components. He has worked with Power BI from the tool's very beginnings. Peter started working as an independent contractor in 2008. This has enabled him to divide his time between teaching data-related classes, consulting with customers, and writing articles and books.
Marcel Alsdorf has been working for 4 years as a cloud solution architect at Microsoft. In his role, he consults companies on cloud computing fundamentals, the construction of modern data warehouse pipelines, event systems, and machine learning applications and tooling. In addition, he lectures on big data fundamentals at a German university. Previously, he worked as an FPGA engineer for the LHC project at CERN and as a software engineer in the banking industry.
I would like to thank Christoph Körner for suggesting me as a reviewer and the team at Packt for giving me the opportunity to help Peter ter Braake create a good starting point for people in terms of understanding the fundamentals of data modeling in Azure.
Alexey Bokov is an experienced Azure architect and has been a Microsoft technical evangelist since 2011 – he works closely with Microsoft top-tier customers all around the world to develop applications based on the Azure cloud platform. Building cloud-based applications in the most challenging scenarios is his passion, as well as helping the development community to upskill and learn new things by working hands-on and hacking. He has been a long-time contributor, as a co-author and reviewer, to many Azure books and is an occasional speaker at Kubernetes events.
I'd like to thank my family – my beautiful wife, Yana, and my amazing son, Kostya, who have supported my efforts to help the author and publisher of this book.
Databases play an important role in almost all the applications that we use. The database has a direct impact on the performance and scalability of the application it supports. That makes choosing the right type of database to use and designing that database correctly a vital part of all development since scalability and performance depend on a well-chosen design. With databases hosted in Azure, the design may also have a direct impact on the costs of the database. The first part of this book teaches you when to use a relational database (Azure SQL Database) and when a NoSQL database (Cosmos DB) is the better option. You will also learn how to design that database and, finally, how to implement the chosen design.
All the data gathered by applications can be used for Business Intelligence (BI). A crucial part of BI is creating a central repository for your data to build your BI solution on. This can be a data lake with data marts, or it may be a data warehouse. In the second part of the book, you will learn to design data warehouses according to the theory of dimensional modeling or by designing a data vault. You will also learn how to design a data lake. You will then learn how to apply what you have learned by creating a data lake in Azure Storage and creating data marts using Azure Synapse Analytics.
The book ends with a chapter on Azure Data Factory. Data Factory is used to get data from the source databases you created in part one and store that data in the data platforms you implemented in part two.
After reading this book, you have a solid understanding of data modeling and of how to implement a database schema in Azure. This will help you to build scalable and cost-effective data solutions in Azure.
This book is for application developers, BI developers, and data engineers who build databases and data solutions in the Microsoft Azure cloud. This book assumes that you have basic knowledge of Azure and a basic understanding of what a database is and what it is for. The book does not assume any specific development or database knowledge or experience. However, you should have an affinity for working with data. Some T-SQL experience will also be advantageous.
This book will also help data analysts gain an understanding of how databases are set up and why they are created in the way they are. This understanding will help data analysts to get the most out of the different data stores that they may need to use.
Chapter 1, Introduction to Databases, explains what a database is and what it is used for. It also explains the common concepts of databases.
Chapter 2, Entity Analysis, explains how to create and read Entity Relationship Diagrams (ERD) from a top-down perspective.
Chapter 3, Normalizing Data, teaches you how to create a normalized database design and when to use this design technique.
Chapter 4, Provisioning and Implementing an Azure SQL Database, shows hands-on how to create and implement a normalized design in Azure using Microsoft's PaaS offering – Azure SQL Database.
Chapter 5, Designing a NoSQL Database, explains when to choose Cosmos DB over a relational database. It also teaches you how to design different types of NoSQL databases.
Chapter 6, Provisioning and Implementing an Azure Cosmos DB, shows hands-on how to create and implement a Document database in Azure using Microsoft's NoSQL database, Cosmos DB.
Chapter 7, Dimensional Modeling, teaches you how to create a star schema database design according to the rules of dimensional modeling.
Chapter 8, Provisioning and Implementing an Azure Synapse SQL Pool, shows hands-on how to create and implement a star schema database using an Azure Synapse dedicated SQL pool.
Chapter 9, Data Vault Modeling, explains when to use a data vault and how to design a data warehouse using the data vault modeling technique.
Chapter 10, Designing and Implementing a Data Lake Using Azure Storage, discusses when implementing a data lake is a better option than creating a data warehouse. It also shows how to implement a data lake using Azure Storage.
Chapter 11, Implementing ETL Using Azure Data Factory, shows how to create pipelines to automate the process of getting data out of production databases and into data lakes and data marts.
Since this book implements all database services in Azure, a computer with a modern browser and an Azure subscription with permission to create resources in Azure are required:
If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book's GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Data-modelling-for-Azure-Data-Services. If there's an update to the code, it will be updated in the GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781801077347_ColorImages.pdf.
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Column names such as PatientID, Patient_FirstName, and PostalCode would already render it more readable."
A block of code is set as follows:
CREATE TABLE [dbo].[OrderDetail]
(
OrderID INT NOT NULL,
ProductID INT NOT NULL,
UnitPrice MONEY NOT NULL,
Quantity SMALLINT NOT NULL,
Discount NUMERIC(5, 4) NOT NULL
);
GOWhen we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)
Any command-line input or output is written as follows:
$ mkdir css
$ cd css
Bold: Indicates a new term, an important word, or words that you see on screen. For instance, words in menus or dialog boxes appear in bold. Here is an example: "Click, in Azure Data Studio, on the New Query button to open a new query file."
Tips or important notes
Appear like this.
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Once you've read Data Modeling for Azure Data Services, we'd love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.
Databases are used in everyday applications such as webshops, CRM systems, and financial systems. These are our Line Of Business (LOB) applications. In this section, we will learn how to choose between a SQL database and a Cosmos DB database. We will also learn how to design a database for optimal performance and scalability.
This section comprises the following chapters:
Chapter 1, Introduction to DatabasesChapter 2, Entity AnalysisChapter 3, Normalizing DataChapter 4, Provisioning and Implementing an Azure SQL DBChapter 5, Designing a NoSQL DatabaseChapter 6, Provisioning and Implementing an Azure Cosmos DB Database