Microsoft Certified Azure Data Fundamentals (Exam DP-900) Certification Guide

Marcelo Leite
Description

Passing the DP-900 Microsoft Azure Data Fundamentals exam opens the door to a myriad of opportunities for working with data services in the cloud. But it is not an easy exam and you'll need a guide to set you up for success and prepare you for a career in Microsoft Azure.
Absolutely everything you need to pass the DP-900 exam is covered in this concise handbook. After an introductory chapter covering the core terms and concepts, you'll go through the various roles related to working with data in the cloud and learn the similarities and differences between relational and non-relational databases. This foundational knowledge is crucial, as you'll learn how to provision and deploy Azure's relational and non-relational services in detail later in the book. You'll also gain an understanding of how to glean insights with data analytics at both small and large scales, and how to visualize your insights with Power BI. Once you reach the end of the book, you'll be able to test your knowledge with practice tests with detailed explanations of the correct answers.
By the end of this book, you will be armed with the knowledge and confidence to not only pass the DP-900 exam but also have a solid foundation from which to embark on a career in Azure data services.




Microsoft Certified Azure Data Fundamentals (Exam DP-900) Certification Guide

The comprehensive guide to passing the DP-900 exam on your first attempt

Marcelo Leite

BIRMINGHAM—MUMBAI

Microsoft Certified Azure Data Fundamentals (Exam DP-900) Certification Guide

Copyright © 2022 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Reshma Raman

Publishing Product Manager: Birjees Patel

Content Development Editor: Shreya Moharir

Technical Editor: Sweety Pagaria

Copy Editor: Safis Editing

Project Coordinator: Farheen Fathima

Proofreader: Safis Editing

Indexer: Hemangini Bari

Production Designer: Joshua Misquitta

Marketing Coordinator: Shifa Ansari

First published: November 2022

Production reference: 1281022

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80324-063-3

www.packt.com

For my wife, Rayane, and my parents, Diva and Marco, who have always supported me in every decision in my career.

Contributors

About the author

Marcelo Leite is a data and artificial intelligence specialist at Microsoft. He earned the DP-900 certification in October 2020 and holds the advanced titles of Azure Data Engineer and Azure AI Engineer. He graduated in technology with a database specialization and holds an MBA in project and IT management. Marcelo has been working at Microsoft for over 5 years. He is also a professor on MBA courses covering databases, data architecture, and cloud computing. With 17+ years of experience in roles such as software engineer, consulting manager, and solutions sales specialist, he shares his knowledge of data technologies on his YouTube channel, Dicas de Dados, and is very active on LinkedIn.

I dedicate this book to everyone who is in search of a successful career in data but is still walking this journey. It’s possible, and this book was written to support you on it.

About the reviewers

James Reeves is a self-described technology nerd who has spent far too much time around data and yet keeps going back for more. As an experienced data engineer and architect, James loves to “think outside of the box” to create innovative solutions and believes that doing data discovery is like being a detective!

Abhishek Mittal is a cloud solution architect with more than 9 years of experience in the business intelligence and data warehousing space. He delivers exceptional value to customers by designing high-quality solutions and leading their successful implementations. His work entails architecting solutions to complex data problems for clients across various business domains, managing technical scope and client expectations, and managing the implementation of solutions. He is a Microsoft Azure-certified professional and works as a senior architect with Nagarro. He is very passionate about learning and exploring new skills. He is gregarious in nature and always believes in sharing knowledge and helping others. You can reach out to him on LinkedIn.

Kasam Shaikh, a hybrid and cross-cloud practitioner, is a seasoned professional with 14 years of demonstrated industry experience, working as a Microsoft Azure cloud specialist with one of the leading IT companies in Mumbai, India. He is a Microsoft MVP in AI and one of only three AI MVPs from India. He is a global Azure AI speaker and the author of four best-selling books on Microsoft Azure and AI. He is also the founder of Dear Azure – Azure INDIA (az-india), an online community for learning Azure AI, and owns a YouTube channel where he shares his expertise on Microsoft Azure AI.

First, I would like to thank the Almighty, ALLAH, my mother, my wife, and especially my little daughter, Maryam, for motivating me throughout the process, and Packt for believing in and considering me for this awesome contribution.

Anindita Basak is a cloud architect with 15+ years of experience, the last 12 of which she has spent working extensively on Azure. She has delivered various real-time implementations on Azure data analytics, as well as cloud-native and real-time event-driven architectures, for Fortune 500 enterprises ranging from banking, financial services, and insurance (BFSI) to the retail sector. She is also a cloud and DataOps trainer and consultant, and an author of cloud AI and DevOps books.

Table of Contents

Preface

Part 1: Core Data Concepts

1

Understanding the Core Data Terminologies

Understanding the core data concepts

What is data?

How is data stored in a modern cloud environment?

Describing a data solution

Transactional databases

Analytical databases

Defining the data type and proper storage

Characteristics of relational and non-relational databases

A transactional workload

An analytical workload

Understanding data ingestion

Understanding batch load

Understanding data streaming

Case study

Summary

Sample questions and answers

Answer key

2

Exploring the Roles and Responsibilities in Data Domain

Different workforces in a data domain

Most common roles in a data domain

Database Administrator

Data engineer

Data analyst

Tasks and tools for database administration profiles

Tasks of the DBA

Tools for the DBA

Tasks and tools for data engineer profiles

Tasks of the data engineer

Tools for the data engineer

Tasks and tools for the data analyst

Tasks of the data analyst

Tools for the data analyst

Case study

Summary

Sample questions and answers

Answer key

3

Working with Relational Data

Exploring the characteristics of relational data

Tables and entities

Relationship between entities

Exploring relational data structures

Data normalization

Introducing SQL

Key advantages of SQL

Key disadvantages of SQL

Understanding the categories of SQL commands

DDL

DML and DQL

Describing the database components

Views

Stored procedures

Triggers

Indexes

Case study

Summary

Sample questions and answers

Answer key

4

Working with Non-Relational Data

Exploring the characteristics of non-relational data

Understanding the types of non-relational data

Non-structured data

Semi-structured data

Non-relational data basic storage

Exploring NoSQL databases

What is a NoSQL database?

Key-value store

Document database

Column family database

Graph database

Identifying non-relational database use cases

Case study

A 360-degree customer view

Fraud detection – financial institutions

Summary

Sample questions and answers

Answer key

5

Exploring Data Analytics Concepts

Exploring data ingestion and processing

Data pipelines

Data ingestion types

Data source connectors

Exploring the analytical data store

Data warehouse

Data lake

Hybrid approaches

Exploring an analytical data model

Facts and dimensions

Exploring data visualization

Case study

Data-driven culture

Summary

Sample questions and answers

Answer key

Part 2: Relational Data in Azure

6

Integrating Relational Data on Azure

Exploring relational Azure data services

Elastic pool

Use cases

Summary

Sample questions and answers

Answer key

7

Provisioning and Configuring Relational Database Services in Azure

Technical requirements

Provisioning relational Azure data services

Provisioning Azure SQL Database

Provisioning Azure Database for PostgreSQL and MySQL

Configuring relational databases on Azure

Configuring Azure SQL Database

Configuring and managing Azure Database for PostgreSQL and MySQL

Summary

Sample questions and answers

Answer key

8

Querying Relational Data in Azure

Technical requirements

Introducing SQL on Azure

Querying relational data in Azure SQL Database

Common connection issues

Querying relational data in Azure Database for PostgreSQL

Connecting to Azure Database for PostgreSQL

Querying Azure Database for PostgreSQL

Summary

Sample questions and answers

Answer key

Part 3: Non-Relational Data in Azure

9

Exploring Non-Relational Data Offerings in Azure

Exploring Azure non-relational data stores

Exploring Azure Blob storage

Azure Data Lake Storage Gen2

Exploring Azure Files

Exploring Azure Table storage

Exploring Azure NoSQL databases

Exploring Azure Cosmos DB

Azure Cosmos DB APIs

Core (SQL) API

MongoDB API

Table API

Cassandra API

Gremlin API

Summary

Sample questions and answers

Answer key

10

Provisioning and Configuring Non-Relational Data Services in Azure

Technical requirements

Provisioning non-relational data services

Provisioning Azure Cosmos DB

Configuring Azure Cosmos DB

Creating a sample Azure Cosmos DB database

Provisioning an Azure storage account and Data Lake Storage

Summary

Sample questions and answers

Answer key

Part 4: Analytics Workload on Azure

11

Components of a Modern Data Warehouse

Describing modern data warehousing

Challenges of traditional data warehouses

The birth of big data

Azure HDInsight

Modern data warehouse

Azure for the modern data warehouse

Exploring Azure data services for modern data warehouses

Data ingestion and preparation (ELT/ETL)

Data storage – Azure Data Lake Storage Gen2

Data ingestion – Azure Data Factory and Azure Synapse Analytics

Data preparation – Azure Databricks

Modern data warehouse – Azure Synapse Analytics

Real-time data analytics – Azure Stream Analytics, Azure Synapse Data Explorer, and Spark streaming

Azure Stream Analytics

Azure Data Explorer and Azure Synapse Data Explorer pools

Apache Spark Streaming

Delta Lake

Summary

Sample questions and answers

Answer key

12

Provisioning and Configuring Large-Scale Data Analytics in Azure

Technical requirements

Understanding common practices for data loading

Provisioning an Azure Synapse workspace

Practicing data load

Data storage and processing

Azure serverless SQL pool

Azure dedicated SQL pool

Azure Spark pools

Azure Synapse Link

Azure Synapse Data Explorer

Azure Machine Learning

Summary

Sample questions and answers

Answer key

13

Working with Power BI

Technical requirements

Introducing Power BI

The building blocks of Power BI

Exploring Power BI Desktop

Creating a Power BI file

Creating a connection

Publishing a report

Exploring Power BI Service

Creating a dashboard

Power BI mobile app

Summary

Sample questions and answers

Answer key

14

DP-900 Mock Exam

Practice test – questions

Core data concepts

Relational data on Azure

Non-relational data on Azure

Modern data warehouse analytics on Azure

Practice test – answers and explanations

Core data concepts

Relational data on Azure

Non-relational data on Azure

Modern data warehouse analytics on Azure

Summary

Index

Other Books You May Enjoy

Preface

Today, the world’s leading companies are data-driven, and a good strategy for using data is one of the key success factors for organizations worldwide. Following this trend, there is a growing demand for professionals trained to work with this data, orchestrating, processing, and generating intelligence from it.

Microsoft Certified Azure Data Fundamentals (Exam DP-900) Certification Guide will introduce you to the fundamental knowledge required to ensure successful data projects in Azure, preparing you for the DP-900 certification test.

Going through basic concepts of data as well as hands-on exercises with Azure data services, this book will teach you about the different technologies offered in Azure and when to use each one.

The book is structured in four parts: the first covers core data concepts, the second relational data in Azure, the third non-relational data in Azure, and the fourth analytics workloads on Azure, ending with a mock DP-900 exam to evaluate the knowledge acquired.

Who this book is for

This book is for data engineers, database administrators, and aspiring data professionals getting ready to take the DP-900 exam. It will also be helpful for those looking for a bit of guidance on how to be better equipped for Azure-related job roles such as Azure database administrator or Azure data engineer. A basic understanding of core data concepts and relational and non-relational data will help you make the most out of this book, but it is not a prerequisite.

What this book covers

Chapter 1, Understanding the Core Data Terminologies, is all about creating a knowledge foundation around data types, transactional databases, analytical databases, data ingestion, and data stores.

Chapter 2, Exploring the Roles and Responsibilities in Data Domain, continues your introduction to the different job roles associated with creating, managing, and using databases. You will learn about the key responsibilities of these roles and the tools that these roles use on the Azure and Microsoft cloud portfolios.

Chapter 3, Working with Relational Data, explores the relational model for databases, how tables are structured, how you can use indexes to improve query performance, and how you can use views to simplify complex queries.

Chapter 4, Working with Non-Relational Data, explores non-relational databases and how they compare to relational databases. You will learn about the different types of non-relational databases commonly used by applications.

Chapter 5, Exploring Data Analytics Concepts, covers how to generate insights by processing data into a data analytics system, enabling the business to carry out data-driven operations.

Chapter 6, Integrating Relational Data on Azure, covers the Azure data services for relational databases, including Azure SQL Database, Azure Database for PostgreSQL, Azure Database for MySQL, and Azure Database for MariaDB. You will explore scenarios for using these database management systems.

Chapter 7, Provisioning and Configuring Relational Database Services in Azure, teaches you how to provision and configure Azure SQL Database, Azure Database for PostgreSQL, and Azure Database for MySQL.

Chapter 8, Querying Relational Data in Azure, explores Structured Query Language (SQL) and how you can use it to query, insert, update, and delete data in Azure SQL Database.

Chapter 9, Exploring Non-Relational Data Offerings in Azure, explores Azure data services for non-relational data, including Azure Table storage, Azure Blob Storage, Azure Files, and Azure Cosmos DB, as well as situations for using them.

Chapter 10, Provisioning and Configuring Non-Relational Data Services in Azure, looks at how to provision and configure Azure Cosmos DB and Azure Data Lake Storage.

Chapter 11, Components of a Modern Data Warehouse, examines the components of a modern data warehouse. You will understand the role of services such as Azure Databricks, Azure Synapse Analytics, and Azure HDInsight. You will also see how to use Azure Synapse Analytics to load and process data.

Chapter 12, Provisioning and Configuring Large-Scale Data Analytics in Azure, explores data ingestion options to build a data warehouse with Azure, services to perform data analytics, and features of Azure Synapse Analytics. You will create a Synapse Analytics workspace and use it to ingest and analyze data.

Chapter 13, Working with Power BI, is where you will learn what Power BI is, including its building blocks and how they work together.

Chapter 14, DP-900 Mock Exam, provides practice tests to prepare you for the DP-900 exam.

To get the most out of this book

You will need a computer running Windows or macOS with internet access to download the Azure Data Studio and Power BI Desktop software, as well as access to the Azure portal and the Azure service websites. All code examples have been tested using Azure Data Studio on the Windows 11 operating system. However, they should work with macOS, Linux, and future version releases too.

Power BI Desktop and the mobile app can be downloaded from here:

https://powerbi.microsoft.com/en-us/downloads/

Azure Data Studio can be downloaded from here:

https://learn.microsoft.com/en-us/sql/azure-data-studio/download-azure-data-studio

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Microsoft-Certified-Azure-Data-Fundamentals-Exam-DP-900-Certification-Guide. If there’s an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://packt.link/LTQeN.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system.”

A block of code is set as follows:

html, body, #map {
  height: 100%;
  margin: 0;
  padding: 0
}

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “Select System info from the Administration panel.”

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share your thoughts

Once you’ve read Microsoft Certified Azure Data Fundamentals (Exam DP-900) Certification Guide, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere?

Is your eBook purchase not compatible with the device of your choice?

Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don’t stop there, you can get exclusive access to discounts, newsletters, and great free content in your inbox daily!

Follow these simple steps to get the benefits:

Scan the QR code or visit the link below:

https://packt.link/free-ebook/9781803240633

Submit your proof of purchase

That’s it! We’ll send your free PDF and other benefits to your email directly

Part 1: Core Data Concepts

This part provides complete coverage of the knowledge and skills measured under the Describe core data concepts section of the exam syllabus. We will also cover knowledge and skills that go beyond the exam content so that you are prepared for a real-world, day-to-day Azure data-focused role.

In this part, you will learn about the concepts surrounding data projects, from the terminology and the roles of a data team to the different types of data workloads, such as relational, non-relational, and analytical.

This part comprises the following chapters:

Chapter 1, Understanding the Core Data Terminologies
Chapter 2, Exploring the Roles and Responsibilities in Data Domain
Chapter 3, Working with Relational Data
Chapter 4, Working with Non-Relational Data
Chapter 5, Exploring Data Analytics Concepts

1

Understanding the Core Data Terminologies

Welcome, dear reader!

This book has been prepared based on the knowledge that you need to pass the DP-900 Microsoft Azure Data Fundamentals exam. You will find detailed use cases and hands-on exercises, as well as sample questions and answers to help you during the exam.

This book will not only prepare you for certification but also complement the knowledge needed for planning and working in a data organization. You can look forward to learning about transactional and analytical database concepts, SQL and NoSQL, when to use each option, and the most modern tools and techniques for implementation on Azure.

Data generation and data processing have been growing exponentially in recent years. Data is being generated and processed everywhere: in information systems, cell phones, smart watches, smart TVs, city buses, subways, and cars, among others. Knowing how to capture and process this data to generate intelligence provides today’s main competitive advantage in the market.

To start understanding how these technologies and solutions work, it is necessary to know the concepts of data storage and processing, which we will cover in this introductory chapter.

By the end of this chapter, you will be able to understand the following:

The types of data and how to store it
Relational and non-relational data
Data analytics
How to differentiate between data workloads

Understanding the core data concepts

To start, let’s understand the terminologies used in the data world so that all the following concepts are easily interpreted to be applied to technologies.

What is data?

Data is a record, also called a fact, which can be a number, text, or a description used to make decisions. Data only generates intelligence once it has been processed; the processed data is then called information or insights.

Data is classified into three basic formats: structured, semi-structured, and unstructured data. We will learn about them all in the following sections.

Structured data

Structured data is formatted and typically stored in a table represented by columns and rows. This data is found in relational databases, which organize their table structures in a way that creates relationships between these tables.

The following figure shows an example of a simple table with structured data:

Figure 1.1 – Example of structured data in a database

In this example, the table called CUSTOMER has seven columns and six records (rows) with different values.

The CUSTOMER table could be part of a customer relationship management (CRM) database, a financial or enterprise resource planning (ERP) system, or other types of business applications.
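As an illustrative sketch of how such a table might be defined in SQL (the column names here are hypothetical, not taken from the figure), a seven-column CUSTOMER table could be created as follows:

CREATE TABLE CUSTOMER (
    CUSTOMER_ID   INT PRIMARY KEY,  -- unique identifier for each record
    FIRST_NAME    VARCHAR(50),
    LAST_NAME     VARCHAR(50),
    EMAIL         VARCHAR(100),
    MOBILE_PHONE  VARCHAR(20),
    CITY          VARCHAR(50),
    ZIP_CODE      VARCHAR(10)
);

Each row inserted into this table becomes one record, and every record shares the same fixed set of columns, which is the defining trait of structured data.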

Semi-structured data

Semi-structured data has a structure in which records have attributes, similar to columns, but are not organized in a tabular way like structured data. One of the most used formats for semi-structured data is the JavaScript Object Notation (JSON) file. The following example demonstrates the structure of a JSON file containing the record of one customer:

## JSON FILE - Document 1 ##
{
  "CUSTOMER_ID": "10302",
  "NAME": {
    "FIRST_NAME": "Leo",
    "LAST_NAME": "Boucher"
  },
  "ADDRESS": {
    "STREET": "54, rue Royale",
    "CITY": "Nantes",
    "ZIP_CODE": "44000",
    "COUNTRY": "France"
  }
}

In this example, each JSON file contains one record, like a row of the structured data table, but there are other JSON and similar file formats that store multiple records in the same file.

In addition to the JSON format, there is data in key-value and graph databases, which are considered semi-structured data, too.

The key-value database stores data in an associative array format. These arrays have a unique identification key per record. The values written to a record can have a variety of formats, including numbers, text, and even entire JSON files.

The following is an example of a key-value database:

Figure 1.2 – Example of a key-value database

As you can see in the preceding figure, each record can contain different attributes. They are stored in a single collection, with no predefined schema, tables, or columns, and no relationships between the entities; this differentiates the key-value database from the relational database.

The graph database is used to store data that requires complex relationships. A graph database contains nodes (object information) and edges (object relationship information). This means that the graph database predetermines what the objects are and the relationships they will have with each other, but the records can have different formats. The following is a representation of nodes and edges in a graph database of sales and deliveries:

Figure 1.3 – Example of a graph database

The diagram demonstrates how the relationships around the ORDER entity are created in a graph database, considering the CUSTOMER, LOCATION, SUPPLIER, and PRODUCT nodes in the process. This represents an interesting acceleration in query processing, because the graph is already structured to deliver the relationships faster.

Unstructured data

In addition to structured and semi-structured data, there is also unstructured data, such as audio, videos, images, or binary records without a defined organization.

This data can also be processed to generate information, but the type of storage and processing for this is different from that of structured and semi-structured data. It is common, for example, for unstructured data such as audio to be transcribed using artificial intelligence, generating a mass of semi-structured data for processing.

Now that you understand the basics of data types, let’s look at how that data is stored in a cloud environment.

How is data stored in a modern cloud environment?

Cloud platforms have different solutions depending on whether the data is structured, semi-structured, or unstructured. In Azure, to store structured data, we can count on Azure SQL Database, Azure Database for PostgreSQL, Azure Database for MySQL, and database servers installed on virtual machines, such as SQL Server on an Azure virtual machine. These are called relational databases.

Semi-structured data can be stored in Azure Cosmos DB, and unstructured data (such as videos and images) can be stored in Azure Blob storage or in Azure Data Lake Storage, a platform built on Blob storage that is optimized for queries and processing.

These services are delivered by Azure in the following formats:

Infrastructure as a service (IaaS) – Databases deployed on virtual machines
Platform as a service (PaaS) – Managed database services, where the responsibility for managing the virtual machine and the operating system lies with Azure

For these database services to be used, they must be provisioned and configured to receive the data properly.

One of the most important aspects after provisioning a service is the access control configuration. Azure allows you to create custom access role controls, but in general, we maintain at least three profiles:

Read-only – Users can read the existing data on the service, but they cannot add new records or edit or delete them
Read/Write – Users can read, create, delete, and edit records
Owner – The highest access privilege, including the ability to manage permissions for other users to use this data

With these configured profiles, you will be able to add users to the profiles to access the data storage/databases.

Let’s look at an example. You are the administrator of a CUSTOMER database, and you have the Owner profile. You configure access to this database as Read/Write for the leader of the commercial area and as Read-only for salespeople.
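At the database level, these profiles roughly map to SQL permissions. The following is a minimal sketch, assuming hypothetical sales_reader, sales_leader, and db_owner users already exist:

-- Read-only: salespeople can query but not modify the data
GRANT SELECT ON CUSTOMER TO sales_reader;

-- Read/Write: the commercial area leader gets full CRUD access
GRANT SELECT, INSERT, UPDATE, DELETE ON CUSTOMER TO sales_leader;

-- Owner: all privileges, plus the right to grant them to other users
GRANT ALL PRIVILEGES ON CUSTOMER TO db_owner WITH GRANT OPTION;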

In addition to the permissions configuration, it is important to review all network configurations, data retention, and backup patterns, among other administrative activities. These management tasks will be covered in Chapter 7, Provisioning and Configuring Relational Database Services in Azure.

In all database scenarios, we will have different access requirements, and it is important (as in the example) to accurately delimit the access level needs of each profile.

Describing a data solution

There are two types of database solutions: transactional solutions and analytical solutions. In the following sections, we will understand in detail what these solutions are and the requirements for choosing between them.

Transactional databases

Transactional databases are used by systems for basic operations: creating, reading, updating, and deleting. Transactional systems are considered the core of the informatization of business processes. With these basic operations, we can create entities such as customers, products, stores, and sales transactions, among others, to store important data.

A transactional database is commonly known as an online transaction processing (OLTP) database, considering that this type of database serves online transactional operations between the application and the database.

For an organization, transactional databases usually have their data segmented into entities, which may or may not be tables, with or without relationships between these entities to facilitate the correlation of this data.

For example, an e-commerce database can be structured with a table called Shopping_Cart, which represents the products that are being selected in the store during user navigation, and another called Purchases with the completed transaction records.

The process of segmenting entities in a database is called normalization, which will be covered in Chapter 3, Working with Relational Data.

The format of a normalized transactional database is optimized for transactional operations, but it is not the best format for data exploration and analysis.

The following is an example of a relational transactional database:

Figure 1.4 – Example of a relational transactional database

The preceding figure demonstrates a relational database for transactional workloads in a sales and delivery system. We can see the main entity, Orders, joined to Employees, Shippers, Customers, and Order Details; Order Details then details all the products of an order through its relationship with the Products entity, which in turn draws information from the Categories and Suppliers entities.

Analytical databases

When a data solution requires a good interface for queries, exploration, and data analysis, the data storage organization is different from that of transactional databases. To meet this requirement, we prioritize data aggregations and relationships for data consumption and exploration; this specialized data storage is called an analytical database.

Analytical databases use a process called online analytical processing (OLAP) and have undergone a great evolution in recent years with the emergence of data warehouses and big data platforms.

Analytical databases are populated through a process of data ingestion; they are responsible for processing and transforming the data into insights and information and then making this processed information available for consumption. The following steps describe this process:

Data ingestion – The process responsible for connecting to transactional databases or other data sources to collect raw transaction information and include it in the analytical database
Data processing – The process performed by the OLAP platform to create a data model, organize entities, perform indicator calculations, and define metrics for data consumption
Data query – After the data model is loaded with the proper organization for querying, data manipulation and reporting tools can connect to the OLAP platform to perform queries (see the sketch after this list)
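As an illustration of the data query step, a reporting tool might run an aggregate query such as the following against the analytical model (the FACT_SALES and DIM_DATE tables here are hypothetical):

-- Monthly sales totals, aggregated from transactional detail
SELECT d.CALENDAR_YEAR,
       d.CALENDAR_MONTH,
       SUM(f.SALES_AMOUNT) AS TOTAL_SALES
FROM FACT_SALES f
JOIN DIM_DATE d
  ON d.DATE_KEY = f.DATE_KEY
GROUP BY d.CALENDAR_YEAR, d.CALENDAR_MONTH
ORDER BY d.CALENDAR_YEAR, d.CALENDAR_MONTH;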

The following diagram is an example of a structured data model in an OLAP database:

Figure 1.5 – Example of an analytical relationship

The following diagram is a simple comparison of OLTP and OLAP databases:

Figure 1.6 – Data flow between OLTP and OLAP

The preceding figure demonstrates the traditional flow of data, which is sourced and stored in transactional OLTP databases and then moved to OLAP analytical databases for data intelligence generation.

Important note

There are modern data storage platforms that aim to unite OLTP and OLAP on the same platform, but these databases, often called NewSQL, still need to mature their structures to deliver the best of transactional and analytical worlds in the same database. The industry standard is to keep transactional and analytical data structures separate.

In this section, we defined what transactional and analytical data solutions are and the characteristics of each solution. In the next section, we will detail the recommended data types and storage for each of these types.

Defining the data type and proper storage

Categorizing data to identify its type and the best storage solution for it is an important process in a data solution, one that goes beyond evaluating whether the data is structured, unstructured, or semi-structured. In this section, you will learn about the characteristics of different types of data.

Characteristics of relational and non-relational databases

Relational databases are the most traditional and widely used database format, as they have an easy-to-understand design and a simple tabular data model, similar to familiar tools such as Excel spreadsheets. Relational databases have predefined schemas, which are the structures of their tables, containing columns, the data type of each column, and other parameters such as the primary and foreign keys used in relationships.

However, relational databases with these rigid schemas can pose challenges, as presented in the following example.

Your CRM system has a database structure with a CUSTOMER table, where you intend to store customer data: CUSTOMER_ID, CUSTOMER_NAME, ADDRESS, MOBILE_PHONE, and ZIP_CODE. To do this, you start by creating a CUSTOMER table with five fields:

Figure 1.7 – Example of a CUSTOMER table in a relational database

However, after setting up this table, you realize that you have clients that have more than one address and zip code, and even more than one mobile phone number. How can you solve this issue?

To face problems like this one, we can use normalization once more. Normalization is applied when there is a need to split a table (CUSTOMER, in this example) into child tables that are correlated with the initial table.

Therefore, we can change the CUSTOMER table as follows:

Figure 1.8 – A relationship model in a transactional database

Non-relational databases allow you to store data in its original format without having a predefined schema as in relational databases. The most common non-relational storage format is document storage, where each record in the database is an independent file. The benefit is that each file can have different and unique attributes.

On the other hand, because the files are independent, they can present a challenge: data duplication.

Going back to our CUSTOMER entity example: in a relational database, when two or more customers live at one address, the database records that relationship, and the normalized database keeps only one address record. In a non-relational database, however, if two customers live at the same address, the address is repeated independently in the records of both customers.

Let’s now analyze how this storage could be structured in a relational database, using the concept of normalization:

Figure 1.9 – Example of data structured into tables

The preceding figure exemplifies the data stored within the relational model tables, with the CUSTOMER, CUSTOMER_ADDRESS, and ADDRESS entities, to illustrate the structure of a normalized table.
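A minimal SQL sketch of this normalized structure could look as follows (the data types and sizes are illustrative):

CREATE TABLE CUSTOMER (
    CUSTOMER_ID  INT PRIMARY KEY,
    FIRST_NAME   VARCHAR(50),
    LAST_NAME    VARCHAR(50)
);

CREATE TABLE ADDRESS (
    ADDRESS_ID   INT PRIMARY KEY,
    STREET       VARCHAR(100)
);

-- Junction table: many customers can share one address,
-- so each address is stored only once
CREATE TABLE CUSTOMER_ADDRESS (
    CUSTOMER_ID  INT REFERENCES CUSTOMER (CUSTOMER_ID),
    ADDRESS_ID   INT REFERENCES ADDRESS (ADDRESS_ID),
    PRIMARY KEY (CUSTOMER_ID, ADDRESS_ID)
);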

Now let’s analyze the same data in a CUSTOMER table, but in the format of a non-relational database:

## JSON FILE - CUSTOMER ##
{
  "CUSTOMER_ID": "0001",
  "CUSTOMER_NAME": {
    "FIRST_NAME": "MARK",
    "LAST_NAME": "HUGGS"
  },
  "ADDRESS": {
    "STREET": "1200, Harper Str"
  }
}

## JSON FILE - CUSTOMER2 ##
{
  "CUSTOMER_ID": "0002",
  "CUSTOMER_NAME": {
    "FIRST_NAME": "KRISTI",
    "LAST_NAME": "LAMP"
  },
  "ADDRESS": {
    "STREET": "1200, Harper Str"
  }
}

In the preceding example, we can see two records in a CUSTOMER table, with each record being a JSON document structured with the attributes of each customer.

Thus, we can observe that the same data can be stored in relational and non-relational structures.

Therefore, to decide between a relational or non-relational data storage solution, you must evaluate the behavior of the application or the user that will use that database, the relationships between the entities, and possible normalization processes.

Both relational and non-relational databases should be used primarily for transactional workloads. In the upcoming sections, we will understand the differences between these transactional workloads and analytical workloads.

A transactional workload

Relational and non-relational databases can both be used as solutions for transactional workloads, which are the databases used to perform the basic data storage operations: create, read, update, and delete (CRUD). Transactional operations must be executed in sequence, with a transaction control that confirms the conclusion of the transaction (a process called a commit) only when the entire operation is successfully executed. If this does not occur, the transaction is canceled, and none of its operations are applied, a process called a rollback.
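The following is a minimal T-SQL-style sketch of this commit/rollback behavior, assuming hypothetical ORDERS and STOCK tables:

BEGIN TRANSACTION;

-- Record the sale...
INSERT INTO ORDERS (ORDER_ID, CUSTOMER_ID, PRODUCT_ID, QUANTITY)
VALUES (1001, 10302, 55, 2);

-- ...and decrement the stock within the same transaction
UPDATE STOCK
SET QUANTITY = QUANTITY - 2
WHERE PRODUCT_ID = 55;

-- Only if both statements succeed are the changes made permanent
COMMIT;

-- If anything fails before the commit, the database undoes the partial work:
-- ROLLBACK;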

An important idea that helps in understanding the difference between relational and non-relational databases is ACID, a set of properties present in most database technologies. These properties are as follows:

Atomicity: This is the property that controls the transaction and defines whether it was completely and successfully performed, in which case it is committed, or whether it must be canceled by performing a rollback. The database technology should ensure atomicity.
Consistency: For a running transaction, it is important to evaluate consistency between the database state before receiving the data and the database state after