
Learn Azure Synapse Data Explorer E-Book

Pericles (Peri) Rocha


Publisher: Packt Publishing
Category: Lifestyle
Language: English

Description

Large volumes of data are generated daily from applications, websites, IoT devices, and other free-text, semi-structured data sources. Azure Synapse Data Explorer helps you collect, store, and analyze such data, and work with other analytical engines, such as Apache Spark, to develop advanced data science projects and maximize the value you extract from data.
This book offers a comprehensive view of Azure Synapse Data Explorer, exploring not only the core scenarios of Data Explorer but also how it integrates within Azure Synapse. From data ingestion to data visualization and advanced analytics, you’ll learn to take an end-to-end approach to maximize the value of unstructured data and drive powerful insights using data science capabilities. With real-world usage scenarios, you’ll discover how to identify key projects where Azure Synapse Data Explorer can help you achieve your business goals. Throughout the chapters, you'll also find out how to manage big data as part of a platform as a service (PaaS) offering, as well as how to tune, secure, and serve data to end users.
By the end of this book, you’ll have mastered the big data life cycle and you'll be able to implement advanced analytical scenarios from raw telemetry and log data.

Details

Formats: EPUB, MOBI

Pages: 448

Year of publication: 2023


Reading sample

Learn Azure Synapse Data Explorer

A guide to building real-time analytics solutions to unlock log and telemetry data

Pericles (Peri) Rocha

BIRMINGHAM—MUMBAI

Learn Azure Synapse Data Explorer

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Publishing Product Managers: Birjees Patel and Arindam Majumder

Content Development Editor: Shreya Moharir

Technical Editor: Devanshi Ayare

Copy Editor: Safis Editing

Project Coordinator: Farheen Fathima

Proofreader: Safis Editing

Indexer: Tejal Daruwale Soni

Production Designer: Shankar Kalbhor

Marketing Coordinator: Nivedita Singh

First published: February 2023

Production reference: 1200123

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80323-395-6

www.packtpub.com

To my daughter, Isabella, I love you to the moon and all the way back. To my wife, Cecilia, my partner, and the love of my life, thank you for your patience, love, friendship, and partnership in life. I love you. To my brother, Plinio, my best friend, and my favorite companion in the things we do together. And last but not least, in loving memory of my mother, Yara, and my father, Jose. This work is dedicated to all of you.

Contributors

About the author

Pericles (Peri) Rocha is a technical product manager, architect, and data scientist with more than 25 years of experience. He has worked with diverse challenges from building highly available database environments to data science projects. He holds an MSc degree in data science from UIUC and is a member of Tau Beta Pi. He currently works at Microsoft as a product manager in the Azure Synapse engineering team. Originally from São Paulo, Brazil, Peri worked in Europe for three years before relocating to the USA in 2016. In his spare time, he enjoys playing music, studying karate, and reading. He lives near Redmond, WA, with his wife, daughter, two dogs, and nine guitars.

I’d like to thank everyone who crossed my path through 25 years of professional experience. All of you helped me shape my own story and I am deeply thankful for it.

About the reviewer

Felipe Andrade is a client technical lead at Microsoft Canada. He has been at Microsoft for 9 years and has been working with data analytics for over 10 years. He has spent most of his career at Microsoft in analytics technical roles working with Power BI, SQL, Synapse, Databricks, and machine learning. He also worked in a couple of startups as a software engineer running social network analytics.

I’d like to thank Peri Rocha for inviting me to be a technical reviewer for his book. Thanks to my family, Leticia, Luisa, and Alice, for their patience and kindness.

Preface

Part 1 Introduction to Azure Synapse Data Explorer

1 Introducing Azure Synapse Data Explorer

Technical requirements

Understanding the lifecycle of data

Introducing the Team Data Science Process

Tooling and infrastructure

The need for a fast and highly scalable data exploration service

What is Azure Synapse?

Data integration

Enterprise data warehousing

Exploration on the data lake

Apache Spark

Log and telemetry analytics

Integrated business intelligence

Data governance

Broad support for ML

Security and Managed Virtual Network

Management interface

What is Azure Synapse Data Explorer?

Integrating Data Explorer pools with other Azure Synapse services

Query experience integrated into Azure Synapse Studio’s query editor

Exploring, preparing, and modeling data with Apache Spark

Data ingestion made easy with pipelines

Unified management experience

Exploring the Data Explorer pool infrastructure and scalability

Data Explorer pool architecture

Scalability of compute resources

Managing data on distributed clusters

Mission-critical infrastructure

How much scale can Data Explorer handle?

What makes Azure Synapse Data Explorer unique?

When to use Azure Synapse Data Explorer

Summary

2 Creating Your First Data Explorer Pool

Technical requirements

Creating a free Azure account

Creating an Azure Synapse workspace

Basics tab

Security tab

Networking tab

Tags tab

Review + create tab

Finding your new workspace

Creating a Data Explorer pool using Azure Synapse Studio

Basics tab

Additional settings tab

Tags tab

Review + create tab

Creating a Data Explorer pool using the Azure portal

Creating a Data Explorer pool using the Azure CLI

Summary

3 Exploring Azure Synapse Studio

Technical requirements

Exploring the user interface of Azure Synapse Studio

Running your first query

Creating a database

Loading the data

Verifying whether your data has loaded successfully

Working with data in Azure Synapse notebooks

Saving your work and configuring source control

Managing and monitoring Data Explorer pools

Scaling Data Explorer pools

Pausing and resuming pools

Monitoring Data Explorer pools

Summary

4 Real-World Usage Scenarios

Technical requirements

Building a multi-purpose end-to-end analytics environment

Sources

Ingest

Store

Process

Enrich

Serve

User

Summary

Managing IoT data

Processing and analyzing geospatial data

Enabling real-time analytics with big data

Performing time series analytics

Summary

Part 2 Working with Data

5 Ingesting Data into Data Explorer Pools

Technical requirements

Understanding the data loading process

Defining a retention policy

Choosing a data load strategy

Streaming ingestion

Batching ingestion

Performing data ingestion

Using KQL control commands

Building an Azure Synapse pipeline

Implementing continuous ingestion

Using other data ingestion mechanisms

Summary

6 Data Analysis and Exploration with KQL and Python

Technical requirements

Analyzing data with KQL

Selecting data

Working with calculated columns

Plotting charts

Obtaining percentiles

Creating a time series

Detecting outliers

Using linear regression

Exploring Data Explorer pool data with Python

Creating an Apache Spark pool

Working with Azure Synapse notebooks

Reading data from Data Explorer pools

Plotting charts

Performing data transformation tasks

Creating a lake database

Summary

7 Data Visualization with Power BI

Technical requirements

Introduction to the Power BI integration

Creating a Power BI report

Adding data sources to your Power BI report

Connecting Power BI with your Azure Synapse workspace

Authoring Power BI reports from Azure Synapse Studio

Summary

8 Building Machine Learning Experiments

Technical requirements

Understanding the application of ML

Introducing ML into your projects with AutoML

Creating an Azure Machine Learning workspace

Configuring the Azure Machine Learning integration

Finding the best model with AutoML

Exploring additional ML capabilities in Azure Synapse

Using pre-trained models with Cognitive Services

Finding patterns using KQL

Training models with Apache Spark MLlib

Building applications with SynapseML

Summary

9 Exporting Data from Data Explorer Pools

Technical requirements

Understanding data export scenarios

Exporting data with client tools

Using server-side export to pull data

Performing robust exports with server-side data push

Exporting to cloud storage

Exporting to SQL tables

Exporting to external tables

Configuring continuous data export

Summary

Part 3 Managing Azure Synapse Data Explorer

10 System Monitoring and Diagnostics

Technical requirements

Monitoring your environment

Checking your Data Explorer pool capacity

Monitoring query execution

Reviewing object metadata and changes

Setting up alerts

Creating action groups

Creating alert rules

Summary

11 Tuning and Resource Management

Technical requirements

Implementing resource governance with workload groups

Managing workload groups

Classifying user requests

Queuing requests for delayed execution

Speeding up queries using cache policies

Summary

12 Securing Your Environment

Technical requirements

Security overview

Managing data encryption

Configuring data encryption at rest

Understanding data encryption in transit

Authenticating users

Configuring access to resources

Synapse RBAC roles

Reviewing role assignments

Assigning RBAC roles

Data Explorer database roles

Implementing network security

Using a managed virtual network

Managed private endpoint connection

Enabling data exfiltration protection

Controlling public network access

Protecting against external threats

Summary

13 Advanced Data Management

Technical requirements

Managing extents

Extent tagging

Moving extents

Dropping extents

Purging personal data

Enabling purge on Data Explorer pools

Executing data purge operations

Monitoring data purge operations

Summary

Index

Other Books You May Enjoy

Preface

Large volumes of data are generated daily from applications, websites, Internet of Things (IoT) devices, and other free-text, semi-structured data sources. Azure Synapse Data Explorer helps you collect, store, and analyze such data, and enables you to work with other analytical engines, such as Apache Spark, to develop advanced data science projects and maximize the value you get from your log and telemetry data.

This book offers a comprehensive view of Azure Synapse Data Explorer, covering not only the core scenarios of Data Explorer but also how it fits into the broader Azure Synapse platform. From data ingestion through data visualization and advanced analytics, you will learn an end-to-end approach to maximizing the value of unstructured data and driving powerful insights using data science capabilities. With real-world usage scenarios, you’ll learn how to identify key projects where Azure Synapse Data Explorer can help you achieve your business goals. You will also learn how to manage big data as part of a platform as a service (PaaS) offering, and how to tune, secure, and serve data at scale to end users.

By the end of this book, you will have mastered the big data life cycle and be able to implement advanced analytical scenarios from raw telemetry and log data.

Who this book is for

If you are a data engineer, data analyst, or business analyst working with unstructured data and want to learn how to maximize the value of such data, this book is for you. To maximize your learning experience from this book, you should be familiar with working with data and performing simple queries using SQL or KQL. Even though it is not a requirement, familiarity with Python will help you get more from the examples. This book is also excellent for professionals already working with Azure Synapse who want to incorporate unstructured data into their data science projects.

What this book covers

Chapter 1, Introducing Azure Synapse Data Explorer, is the first of four chapters in Part 1, Introduction to Azure Synapse Data Explorer, where you will be introduced to the product and learn the basics that you need before you start to work with data. It welcomes you to Azure Synapse Data Explorer and elaborates on the need for a fast and highly scalable data exploration service for telemetry and log data. It introduces Azure Synapse and explains how the Data Explorer service fits under the Azure Synapse umbrella. Finally, it discusses the architecture and infrastructure of Data Explorer pools, and the scale of the service today.

Chapter 2, Creating Your First Data Explorer Pool, gets your hands busy by walking you through the creation of your first Azure Synapse workspace and a Data Explorer pool using the Azure portal, Azure Synapse Studio, or the Azure Command-Line Interface (CLI). If you are not familiar with Azure yet, don’t worry; this chapter guides you through the steps to create your first free Azure account, allowing you to follow the examples in the book.

Chapter 3, Exploring Azure Synapse Studio, introduces the development and management environment of Azure Synapse. You will learn about the user interface elements of Azure Synapse Studio, and where to find what you are looking for by navigating through the hubs. In addition to that, in this chapter, you will load some data into a database and run your first query to help you familiarize yourself with the query editor. This chapter closes with an overview of where to manage and monitor your environment using Azure Synapse Studio.

Chapter 4, Real-World Usage Scenarios, describes some example solution architectures you can use in common log and telemetry data analytics scenarios. It looks at five real-world use cases that integrate Azure Synapse Data Explorer with other Azure services and helps you understand the blueprints so that you can build your own.

Chapter 5, Ingesting Data into Data Explorer Pools, kicks off Part 2, Working with Data. It explains the data loading process, helps you choose a data loading strategy, and walks you through different ways to load data into Data Explorer pools. This chapter builds the data assets that you will use in most chapters of the book.

Chapter 6, Data Analysis and Exploration with KQL and Python, is all about learning how to query, transform, and get insights from your data using Kusto Query Language (KQL) and Python. You will learn how to use KQL to explore the data you have at hand and familiarize yourself with the schema, plot simple charts in the query editor, obtain percentiles, and even use native KQL commands to look at trends in your data using linear regression. In the second half of this chapter, you will create an Azure Synapse notebook to explore and transform data using Python and create a lake database.

Chapter 7, Data Visualization with Power BI, complements the previous chapter by helping you configure Power BI integration with Azure Synapse and author new Power BI reports directly from Azure Synapse Studio. It walks you through the creation of reports that connect to data in Data Explorer pools, as well as to your new lake database.

Chapter 8, Building Machine Learning Experiments, provides an overview of applied machine learning and shows how to introduce advanced analytics to your Azure Synapse projects using automated machine learning (AutoML). You will use Python to prepare your data for machine learning experiments, train a series of models, and find the best model to help you predict values.

Chapter 9, Exporting Data from Data Explorer Pools, closes Part 2, Working with Data, by walking you through data export scenarios. It explains when data exports are needed and walks you through the options available for exporting data, including continuous data export.

Chapter 10, System Monitoring and Diagnostics, is the first of four chapters in Part 3, Managing Azure Synapse Data Explorer. In this chapter, you will learn what managing a platform as a service (PaaS) offering such as Azure Synapse involves, and which parts of the service you should be concerned with. Through code examples and guidance on the user interface, you will learn how to stay on top of your Data Explorer pools and monitor them proactively. By setting up alerts, you’ll learn how to get notified on your phone when an event of interest happens in your environment.

Chapter 11, Tuning and Resource Management, introduces features that help you provide predictable performance to end users, and shows how to use cache policies to speed up queries. It walks you through implementing resource governance to classify user requests, so that you can prioritize the execution of critical workloads while queuing requests that can wait.

Chapter 12, Securing Your Environment, provides you with the information you need to make sure your data is secure at rest and in transit, and that only people who are intended to access your data have access to it. It walks you through an overview of the security issues you need to consider for your own implementations, how to double-encrypt your data for an added layer of security, how to authenticate and authorize users, and how to protect the network environment that transits your data.

Chapter 13, Advanced Data Management, covers how to adhere to governmental regulations for data handling, including how to permanently purge personal data. You will learn how to use extents, or data shards, in Azure Synapse Data Explorer to move large volumes of data quickly for archival.

To get the most out of this book

To maximize your learning experience, you should have a basic understanding of concepts around data integration, data retrieval, and building basic data visualizations. Previous experience with SQL, KQL, and Python is not required, but it will help you understand the concepts in the code examples more quickly.

| Software/hardware covered in the book | Operating system requirements |
|---|---|
| Azure Synapse Studio | Windows, macOS, or Linux |
| The Azure portal | Windows, macOS, or Linux |
| Power BI Desktop | Windows |
| Microsoft Azure App | iOS or Android |

The Azure portal and Azure Synapse Studio are web-based tools that are used to manage, develop, and build solutions for Azure Synapse Data Explorer. Microsoft supports the latest versions of the following browsers: Microsoft Edge, Safari (Mac only), Chrome, and Firefox.

To install Power BI Desktop, visit https://learn.microsoft.com/power-bi/fundamentals/desktop-get-the-desktop.

To install the Microsoft Azure App, visit http://aka.ms/getazureapp on your mobile device, or look for the Microsoft Azure App in your device’s app store.

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Learn-Azure-Synapse-Data-Explorer. If there’s an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://packt.link/DQQ7A.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “To create or alter a new workload group, use the .create-or-alter workload_group command.”

A block of code is set as follows:

.alter-merge workload_group ['Engineering Department WG'] ```
{
  "RequestQueuingPolicy": {
    "IsEnabled": true
  }
}
```
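The control command above embeds a JSON policy inside a KQL multi-line string literal delimited by triple backticks, and bracket-quotes the workload group name because it contains spaces. If you generate such commands from a script, a minimal sketch could look like the following. The helper `build_alter_merge_workload_group` and the `FENCE` constant are hypothetical names used only for illustration; the code just builds the command text and does not connect to a Data Explorer pool (sending it is left to whichever client you use).

```python
import json

FENCE = "`" * 3  # KQL multi-line string literal delimiter (three backticks)


def build_alter_merge_workload_group(name: str, policy: dict) -> str:
    """Compose an `.alter-merge workload_group` command (hypothetical helper).

    The name is bracket-quoted as ['...'] so names containing spaces work,
    and the policy is serialized as JSON inside a triple-backtick literal.
    """
    if "'" in name:
        # Keep the sketch simple: refuse names that would need escaping.
        raise ValueError("workload group names with single quotes are not handled")
    policy_json = json.dumps(policy, indent=2)  # Python True becomes JSON true
    return f".alter-merge workload_group ['{name}'] {FENCE}\n{policy_json}\n{FENCE}"


if __name__ == "__main__":
    command = build_alter_merge_workload_group(
        "Engineering Department WG",
        {"RequestQueuingPolicy": {"IsEnabled": True}},
    )
    print(command)
```

Serializing the policy with `json.dumps` instead of hand-writing the JSON avoids mistakes such as Python's `True` leaking into the policy where JSON expects `true`.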

Any command-line input or output is written as follows:

az synapse kusto pool create --name "droneanalyticsadx" --resource-group "rg-AzureSynapse" --sku name="Compute optimized" size="Small" --workspace-name "drone-analytics"
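If you script pool creation, passing the arguments as a list avoids shell-quoting problems with values such as `Compute optimized` that contain spaces. The sketch below rebuilds the command above as an argument list for Python's `subprocess.run`. It assumes that the Azure CLI (with the `az synapse kusto pool` commands) is installed and that you are already signed in with `az login`; the helpers `kusto_pool_create_args` and `create_pool` are hypothetical names, and by default the sketch only prints the command (a dry run).

```python
import shlex
import shutil
import subprocess


def kusto_pool_create_args(name, resource_group, workspace, sku_name, sku_size):
    """Build the argument list for `az synapse kusto pool create` (hypothetical helper)."""
    return [
        "az", "synapse", "kusto", "pool", "create",
        "--name", name,
        "--resource-group", resource_group,
        # Each key=value pair of --sku is its own argument, exactly as the
        # shell produces from: --sku name="Compute optimized" size="Small"
        "--sku", f"name={sku_name}", f"size={sku_size}",
        "--workspace-name", workspace,
    ]


def create_pool(args, dry_run=True):
    """Print the command (dry run) or execute it with the Azure CLI."""
    if dry_run:
        # shlex.join (Python 3.8+) shows the POSIX-shell-quoted equivalent.
        print(shlex.join(args))
        return None
    # On Windows the CLI is installed as az.cmd; shutil.which resolves it via PATHEXT.
    executable = shutil.which(args[0]) or args[0]
    # check=True raises subprocess.CalledProcessError on a non-zero exit code.
    return subprocess.run([executable, *args[1:]], check=True)


if __name__ == "__main__":
    pool_args = kusto_pool_create_args(
        name="droneanalyticsadx",
        resource_group="rg-AzureSynapse",
        workspace="drone-analytics",
        sku_name="Compute optimized",
        sku_size="Small",
    )
    create_pool(pool_args)  # dry run: prints the command only
```

Calling `create_pool(pool_args, dry_run=False)` actually runs the CLI and creates a billable Data Explorer pool, so keep the dry run until the printed command matches what you intend.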

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “To enable it, you must select the Enable option next to Double encryption using a customer-managed key, in the Security tab of the Create Synapse workspace wizard.”

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you’ve read Learn Azure Synapse Data Explorer, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere? Is your eBook purchase not compatible with the device of your choice?

Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don’t stop there; you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.

Follow these simple steps to get the benefits:

1. Scan the QR code or visit the link below:

https://packt.link/free-ebook/9781803233956

2. Submit your proof of purchase.

3. That’s it! We’ll send your free PDF and other benefits to your email directly.

Part 1 Introduction to Azure Synapse Data Explorer

To maximize your learning experience, you should quickly become familiar with the core concepts and tools you will use to reproduce the examples, and understand how these concepts can help you in real-life projects. The first part of the book introduces Azure Synapse Data Explorer and all of its layers. You will learn about the service architecture and the platform elements within Azure Synapse, and you will create your own lab environment to run the book’s examples. You will also become familiar with Azure Synapse Studio, the development and management interface of Azure Synapse. Finally, you will learn about solution templates from real-world usage scenarios that will help you speed up your own Azure Synapse Data Explorer implementations.

This part comprises the following chapters:

Chapter 1, Introducing Azure Synapse Data Explorer

Chapter 2, Creating Your First Data Explorer Pool

Chapter 3, Exploring Azure Synapse Studio

Chapter 4, Real-World Usage Scenarios
