Azure Synapse Analytics Cookbook - Gaurav Agarwal - E-Book

Description

As data warehouse management becomes increasingly integral to successful organizations, choosing and running the right solution is more important than ever. Microsoft Azure Synapse is an enterprise-grade, cloud-based data warehousing platform, and this book holds the key to using Synapse to its full potential. If you want the skills and confidence to create a robust enterprise analytical platform, this cookbook is a great place to start.
You'll learn and execute enterprise-level deployments on medium-to-large data platforms. Using the step-by-step recipes and accompanying theory covered in this book, you'll understand how to integrate various services with Synapse to make it a robust solution for all your data needs. Whether you're new to Azure Synapse or have already started working with it, you'll find the instructions you need to solve the problems you're likely to face, including using Azure services for data visualization as well as for artificial intelligence (AI) and machine learning (ML) solutions.
By the end of this Azure book, you'll have the skills you need to implement an enterprise-grade analytical platform, enabling your organization to explore and manage heterogeneous data workloads and employ various data integration services to solve real-time industry problems.

You can read this e-book in the Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Page count: 144


Azure Synapse Analytics Cookbook

Implement a limitless analytical platform using effective recipes for Azure Synapse

Gaurav Agarwal

Meenakshi Muralidharan

BIRMINGHAM—MUMBAI

Azure Synapse Analytics Cookbook

Copyright © 2022 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Publishing Product Manager: Dhruv Jagdish Kataria

Senior Editor: David Sugarman

Content Development Editor: Priyanka Soam

Technical Editor: Rahul Limbachiya

Copy Editor: Safis Editing

Project Coordinator: Aparna Ravikumar Nair

Proofreader: Safis Editing

Indexer: Rekha Nair

Production Designer: Shankar Kalbhor

Marketing Coordinators: Abeer Dawe, Shifa Ansari

First published: April 2022

Production reference: 1130422

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80323-150-1

www.packt.com

Foreword

The digitalization of economies across sectors such as health, mobility, energy, and finance has generated a significant amount of data. We are living in an age where data is becoming central to how organizations, and indeed our economies and society, function. The increasing value of data is driving organizations to retain more of it so that they can explore new opportunities to increase customer loyalty, bring new services to market, and compete more effectively.

However, the exponential growth in the volume, variety, and velocity of data poses challenges in unlocking the value from that data – at the scale, scope, and speed of business. Today, for organizations that are innovating and transforming with data, the hyperscale cloud is the technology of choice for collecting, transforming, processing, and analyzing data at scale.

Azure Synapse Analytics is a limitless analytics service on Microsoft Azure that brings together data integration, enterprise data warehousing, and big data analytics. It gives you the freedom to query data on your terms, using either serverless or dedicated options—at scale. Azure Synapse brings these worlds together with a unified experience to ingest, explore, prepare, transform, manage, and serve data for immediate Business Intelligence (BI) and Machine Learning (ML) needs.

I am delighted to present to you this book on Azure Synapse Analytics. The audience for this book is data architects, data engineers, and developers who want to learn and understand the main concepts of Azure Synapse Analytics and implement them in real-world scenarios.

This is a practical, hands-on book to help the reader learn how to load data into Synapse, create robust data pipelines using the Synapse notebook, visualize data, and manage other big data scenarios. It describes the Synapse SQL architecture components and how to leverage scale-out capabilities in Azure for distributed processing of data across nodes. Data transformation and analytics in real time using Azure Synapse Link and Cosmos DB will enable the reader to learn how to perform real-time analytics for many applications, including IoT. Big data processing and transformation with Synapse notebooks will enable the reader to work with Azure Data Lake Storage Gen2. Data enrichment using Azure ML helps the reader harness the power of Azure Machine Learning along with Spark MLlib in Synapse Studio. Visualization and reporting with petabytes of data using Power BI, along with data cataloging and governance within Synapse, enable exploration of data using the power of the serverless pool. Finally, the book helps the reader migrate legacy data warehouses using Azure Synapse Pathway.

I invite you to leverage this book to unlock the value of data at scale, using the best of the SQL technologies used in enterprise data warehousing, the Spark technologies used for big data, Data Explorer for log and time series analytics, and pipelines for data integration and ETL/ELT, and to take advantage of the deep integration of Azure Synapse Analytics with other Azure services such as Power BI, Cosmos DB, and Azure ML.

Rohini Srivathsa

National Technology Officer - Microsoft India

Contributors

About the authors

Gaurav Agarwal is a cloud solution architect at Microsoft India Corp. Ltd, working closely with Microsoft clients on Azure data and AI, Azure ML, big data, IoT, and Power BI. He has extensive experience in architecting and transforming data solutions for the modern cloud, and expertise in building large-scale enterprise data warehouse solutions.

Meenakshi Muralidharan leads modern data platforms and is a renowned chief architect for an Indian multinational information technology services and consulting company. She has extensive experience in building large-scale data platforms and applications in Microsoft Azure.

About the reviewers

Shaleen Thapa is a partner technology strategist for ISVs at Microsoft India and has over 20 years of experience. With more than 15 years of Microsoft experience, his primary job responsibilities are to help ISVs and start-ups adopt the latest technologies on Azure, build solutions on Azure, and strategize ISVs'/start-ups' roadmaps for their product development and GTM. He primarily focuses on Azure apps and infra, Azure services, Azure Synapse Analytics, blockchain, Azure DevOps, and GitHub. He has authored several articles published in MSDN Magazine on various Azure and on-premises technologies. He has delivered technical talks on various internal as well as external forums, such as Microsoft Ready, Open Source India, Barracuda Networks, and others.

Ajay Agarwal was born and brought up in India. He completed his Master of Technology at BITS. He has significant experience in product management in the analytics domain. For years, he has managed and evolved multiple cloud capabilities and analytics products in the data science and machine learning domains. He is known for his passion for technology and leadership.

Table of Contents

Preface

Chapter 1: Choosing the Optimal Method for Loading Data to Synapse

Choosing a data loading option

Getting ready

How to do it…

How it works…

There's more…

Achieving parallelism in data loading using PolyBase

Moving and transforming using a data flow

Getting ready

How to do it…

How it works…

Adding a trigger to a data flow pipeline

Getting ready

How to do it…

How it works…

Unsupported data loading scenarios

How to do it…

There's more…

Data loading best practices

How to do it…

Chapter 2: Creating Robust Data Pipelines and Data Transformation

Reading and writing data from ADLS Gen2 using PySpark

Getting ready

How to do it…

How it works…

Visualizing data in a Synapse notebook

Getting ready

How to do it…

How it works…

Chapter 3: Processing Data Optimally across Multiple Nodes

Working with the resource consumption model of Synapse SQL

Architecture components of Synapse SQL

Resource consumption

Optimizing analytics with dedicated SQL pool and working on data distribution

Understanding columnstore storage details

Knowing when to use round-robin, hash-distributed, and replicated distributions

Knowing when to partition a table

Checking for skewed data and space usage

Best practices

Workload management for dedicated SQL pool

Working with serverless SQL pool

Getting ready

How to do it…

There's more…

Processing and querying very large datasets

Getting ready

How to do it…

Script for statistics in Synapse SQL

How to do it…

There's more…

Chapter 4: Engineering Real-Time Analytics with Azure Synapse Link Using Cosmos DB

Integrating an Azure Synapse ETL pipeline with Cosmos DB

Introducing Cosmos DB

Azure Synapse Link integration

Supported features of Azure Synapse Link

Azure Synapse runtime support

Structured streaming support

Network and data security support for Azure Synapse Link with Cosmos DB

Setting up Azure Cosmos DB analytical store

Getting ready

How to do it…

Enabling Azure Synapse Link and connecting Azure Cosmos DB to Azure Synapse

Getting ready

How to do it…

IoT end-to-end solutions and getting real-time insights

Getting ready

How to do it…

Use cases using Synapse Link

Chapter 5: Data Transformation and Processing with Synapse Notebooks

Landing data in ADLS Gen2

Getting ready

How to do it…

Exploring data with ADLS Gen2 to pandas DataFrame in Synapse notebook

Getting ready

How to do it…

There's more…

Processing data from a PySpark notebook within Synapse

How to do it…

Performing read-write operations to a Parquet file using Spark in Synapse

Getting ready

How to do it…

Analytics with Spark

Getting ready

How it works…

Chapter 6: Enriching Data Using the Azure ML AutoML Regression Model

Training a model using AutoML in Synapse

Getting ready

How to do it…

How it works…

Building a regression model from Azure Machine Learning in Synapse Studio

Getting ready

How to do it…

How it works…

Modeling and scoring using SQL pools

Getting ready

How to do it…

How it works…

An overview of Spark MLlib and Azure Synapse

Integrating AI and Cognitive Services

Getting ready

How to do it…

How it works…

Chapter 7: Visualizing and Reporting Petabytes of Data

Combining Power BI and a serverless SQL pool

Getting ready

How to do it…

How it works…

Working on a composite model

Getting ready

How to do it…

How it works…

Using materialized views to improve performance

Getting ready

How to do it…

How it works…

Chapter 8: Data Cataloging and Governance

Configuring your Azure Purview account for Synapse SQL pool

Getting ready

How to do it…

How it works…

Scanning data using the Purview data catalog

Getting ready

How to do it…

How it works…

Enumerating resources within Synapse Studio

Getting ready

How to do it…

How it works…

Chapter 9: MPP Platform Migration to Synapse

Understanding data migration challenges

Tables and databases

Data modeling

Data Manipulation Language statements

Functions, stored procedures, sequences, and triggers

Configuring Azure Synapse Pathway

Getting ready

How to do it…

How it works…

Evaluating a data source to be migrated

Getting ready

How to do it…

Generating a data migration assessment

Getting ready

How to do it…

Supported data sources for migration

IBM Netezza and Azure Synapse platform differences

Oracle Exadata and Azure Synapse platform differences

Snowflake and Azure Synapse platform differences

Microsoft SQL Server and Azure Synapse platform differences

Other Books You May Enjoy

Chapter 1: Choosing the Optimal Method for Loading Data to Synapse

In this chapter, we will cover how to enrich and load data to Azure Synapse