Azure Databricks Cookbook

Phani Raj

Description

Azure Databricks is a unified collaborative platform for performing scalable analytics in an interactive environment. The Azure Databricks Cookbook provides recipes to get hands-on with the analytics process, including ingesting data from various batch and streaming sources and building a modern data warehouse.
The book starts by teaching you how to create an Azure Databricks instance using the Azure portal, the Azure CLI, and ARM templates. You'll then work with clusters in Databricks and explore recipes for ingesting data from sources including files, databases, and streaming sources such as Apache Kafka and Azure Event Hubs. The book will help you explore all the features supported by Azure Databricks for building powerful end-to-end data pipelines. You'll also find out how to build a modern data warehouse by using Delta tables and Azure Synapse Analytics. Later, you'll learn how to write ad hoc queries and extract meaningful insights from the data lake by creating visualizations and dashboards with Databricks SQL. Finally, you'll deploy and productionize a data pipeline, as well as deploy notebooks and the Azure Databricks service, using continuous integration and continuous delivery (CI/CD).
By the end of this Azure book, you'll be able to use Azure Databricks to streamline different processes involved in building data-driven apps.
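As a flavor of the ARM-template approach the book covers, the following is a minimal template sketch that provisions a Databricks workspace. The parameter name, SKU choice, and managed resource group naming are illustrative assumptions, not the book's exact recipe:

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "workspaceName": { "type": "string" }
  },
  "resources": [
    {
      "type": "Microsoft.Databricks/workspaces",
      "apiVersion": "2018-04-01",
      "name": "[parameters('workspaceName')]",
      "location": "[resourceGroup().location]",
      "sku": { "name": "standard" },
      "properties": {
        "managedResourceGroupId": "[concat(subscription().id, '/resourceGroups/', parameters('workspaceName'), '-managed-rg')]"
      }
    }
  ]
}
```

A template like this would typically be deployed with `az deployment group create`, passing the workspace name as a parameter.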

Format: EPUB

Page count: 363

Publication year: 2021




Azure Databricks Cookbook

Accelerate and scale real-time analytics solutions using the Apache Spark-based analytics service

Phani Raj

Vinod Jaiswal

BIRMINGHAM—MUMBAI

Azure Databricks Cookbook

Copyright © 2021 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Kunal Parikh

Publishing Product Manager: Aditi Gour

Senior Editor: Mohammed Yusuf Imaratwale

Content Development Editor: Sean Lobo

Technical Editor: Devanshi Ayare

Copy Editor: Safis Editing

Project Coordinator: Aparna Ravikumar Nair

Proofreader: Safis Editing

Indexer: Manju Arasan

Production Designer: Jyoti Chauhan

First published: September 2021

Production reference: 2150921

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-78980-971-8

www.packt.com

Contributors

About the authors

Phani Raj is an Azure data architect at Microsoft. He has more than 12 years of IT experience and works primarily on the architecture, design, and development of complex data warehouses, OLTP, and big data solutions on Azure for customers across the globe.

Vinod Jaiswal is a data engineer at Microsoft. He has more than 13 years of IT experience and works primarily on the architecture, design, and development of complex data warehouses, OLTP, and big data solutions on Azure using Azure data services for a variety of customers. He has also worked on designing and developing real-time data processing and analytics reports from the data ingested from streaming systems using Azure Databricks.

About the reviewers

Ankur Nayyar is an architect who builds and deploys customer-specific big data analytics solutions, drawing on deep knowledge of data architecture, data mining, machine learning, computer programming, and emerging open source technologies such as Apache Spark. He stays up to date with recent advancements in the machine learning/AI industry, such as TensorFlow, and actively prototypes with these open source frameworks. He has participated in internal and external technology and data science forums, competitions, conferences, and discussions.

Alan Bernardo Palacio is a data scientist and engineer with broad experience across different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst & Young and Globant, and now holds a data engineer position at Ebiquity Media, helping the company create a scalable data pipeline. Alan graduated with a mechanical engineering degree from the National University of Tucumán in 2015, founded startups, and later earned a master's degree from the Faculty of Mathematics at the Autonomous University of Barcelona in 2017. Originally from Argentina, he now lives and works in the Netherlands.

Table of Contents

Preface

Chapter 1: Creating an Azure Databricks Service

Technical requirements

Creating a Databricks workspace in the Azure portal

Getting ready

How to do it…

How it works…

Creating a Databricks service using the Azure CLI (command-line interface)

Getting ready

How to do it…

How it works…

There's more…

Creating a Databricks service using Azure Resource Manager (ARM) templates

Getting ready

How to do it…

How it works…

Adding users and groups to the workspace

Getting ready

How to do it…

How it works…

There's more…

Creating a cluster from the user interface (UI)

Getting ready

How to do it…

How it works…

There's more…

Getting started with notebooks and jobs in Azure Databricks

Getting ready

How to do it…

How it works…

Authenticating to Databricks using a PAT

Getting ready

How to do it…

How it works…

There's more…

Chapter 2: Reading and Writing Data from and to Various Azure Services and File Formats

Technical requirements

Mounting ADLS Gen2 and Azure Blob storage to Azure DBFS

Getting ready

How to do it…

How it works…

There's more…

Reading and writing data from and to Azure Blob storage

Getting ready

How to do it…

How it works…

There's more…

Reading and writing data from and to ADLS Gen2

Getting ready

How to do it…

How it works…

Reading and writing data from and to an Azure SQL database using native connectors

Getting ready

How to do it…

How it works…

Reading and writing data from and to Azure Synapse SQL (dedicated SQL pool) using native connectors 

Getting ready

How to do it…

How it works…

Reading and writing data from and to Azure Cosmos DB 

Getting ready

How to do it…

How it works…

Reading and writing data from and to CSV and Parquet 

Getting ready

How to do it…

How it works…

Reading and writing data from and to JSON, including nested JSON 

Getting ready

How to do it…

How it works…

Chapter 3: Understanding Spark Query Execution

Technical requirements

Introduction to jobs, stages, and tasks

Getting ready

How to do it…

How it works…

Checking the execution details of all the executed Spark queries via the Spark UI

Getting ready

How to do it…

How it works…

Deep diving into schema inference

Getting ready

How to do it…

How it works…

There's more…

Looking into the query execution plan

Getting ready

How to do it…

How it works…

How joins work in Spark 

Getting ready

How to do it…

How it works…

There's more…

Learning about input partitions 

Getting ready

How to do it…

How it works…

Learning about output partitions 

Getting ready

How to do it…

How it works…

Learning about shuffle partitions

Getting ready

How to do it…

How it works…

Storage benefits of different file types

Getting ready

How to do it…

How it works…

Chapter 4: Working with Streaming Data

Technical requirements

Reading streaming data from Apache Kafka

Getting ready

How to do it…

How it works…

Reading streaming data from Azure Event Hubs

Getting ready

How to do it…

How it works…

Reading data from Event Hubs for Kafka

Getting ready

How to do it…

How it works…

Streaming data from log files

Getting ready

How to do it…

How it works…

Understanding trigger options

Getting ready

How to do it…

How it works…

Understanding window aggregation on streaming data

Getting ready

How to do it…

How it works…

Understanding offsets and checkpoints

Getting ready

How to do it…

How it works…

Chapter 5: Integrating with Azure Key Vault, App Configuration, and Log Analytics

Technical requirements

Creating an Azure Key Vault to store secrets using the UI

Getting ready

How to do it…

How it works…

Creating an Azure Key Vault to store secrets using ARM templates

Getting ready

How to do it…

How it works…

Using Azure Key Vault secrets in Azure Databricks

Getting ready

How to do it…

How it works…

Creating an App Configuration resource

Getting ready

How to do it…

How it works…

Using App Configuration in an Azure Databricks notebook

Getting ready

How to do it…

How it works…

Creating a Log Analytics workspace

Getting ready

How to do it…

How it works…

Integrating a Log Analytics workspace with Azure Databricks

Getting ready

How to do it…

How it works…

Chapter 6: Exploring Delta Lake in Azure Databricks

Technical requirements

Delta table operations – create, read, and write

Getting ready

How to do it…

How it works…

There's more…

Streaming reads and writes to Delta tables

Getting ready

How to do it…

How it works…

Delta table data format

Getting ready

How to do it…

How it works…

There's more…

Handling concurrency

Getting ready

How to do it…

How it works…

Delta table performance optimization

Getting ready

How to do it…

How it works…

Constraints in Delta tables

Getting ready

How to do it…

How it works…

Versioning in Delta tables

Getting ready

How to do it…

How it works…

Chapter 7: Implementing Near-Real-Time Analytics and Building a Modern Data Warehouse

Technical requirements

Understanding the scenario for an end-to-end (E2E) solution

Getting ready

How to do it…

How it works…

Creating required Azure resources for the E2E demonstration

Getting ready

How to do it…

How it works…

Simulating a workload for streaming data

Getting ready

How to do it…

How it works…

Processing streaming and batch data using Structured Streaming

Getting ready

How to do it…

How it works…

Understanding the various stages of transforming data

Getting ready

How to do it…

How it works…

Loading the transformed data into Azure Cosmos DB and a Synapse dedicated pool

Getting ready

How to do it…

How it works…

Creating a visualization and dashboard in a notebook for near-real-time analytics

Getting ready

How to do it…

How it works…

Creating a visualization in Power BI for near-real-time analytics

Getting ready

How to do it…

How it works…

Using Azure Data Factory (ADF) to orchestrate the E2E pipeline

Getting ready

How to do it…

How it works…

Chapter 8: Databricks SQL

Technical requirements

How to create a user in Databricks SQL

Getting ready

How to do it…

How it works…

Creating SQL endpoints

Getting ready

How to do it…

How it works…

Granting access to objects to the user

Getting ready

How to do it…

How it works…

Running SQL queries in Databricks SQL

Getting ready

How to do it…

How it works…

Using query parameters and filters

Getting ready

How to do it…

How it works…

Introduction to visualizations in Databricks SQL

Getting ready

How to do it…

Creating dashboards in Databricks SQL

Getting ready

How to do it…

How it works…

Connecting Power BI to Databricks SQL

Getting ready

How to do it…

Chapter 9: DevOps Integrations and Implementing CI/CD for Azure Databricks

Technical requirements

How to integrate Azure DevOps with an Azure Databricks notebook

Getting ready

How to do it…

Using GitHub for Azure Databricks notebook version control

Getting ready

How to do it…

How it works…

Understanding the CI/CD process for Azure Databricks

Getting ready

How to do it…

How it works…

How to set up an Azure DevOps pipeline for deploying notebooks

Getting ready

How to do it…

How it works…

Deploying notebooks to multiple environments

Getting ready

How to do it…

How it works…

Enabling CI/CD in an Azure DevOps build and release pipeline

Getting ready

How to do it…

Deploying an Azure Databricks service using an Azure DevOps release pipeline

Getting ready

How to do it…

Chapter 10: Understanding Security and Monitoring in Azure Databricks

Technical requirements

Understanding and creating RBAC in Azure for ADLS Gen2

Getting ready

How to do it…

Creating ACLs using Storage Explorer and PowerShell

Getting ready

How to do it…

How it works…

How to configure credential passthrough

Getting ready

How to do it…

How to restrict data access to users using RBAC

Getting ready

How to do it…

How to restrict data access to users using ACLs

Getting ready

How to do it…

Deploying Azure Databricks in a VNet and accessing a secure storage account

Getting ready

How to do it…

There's more…

Using Ganglia reports for cluster health

Getting ready

How to do it…

Cluster access control

Getting ready

How to do it…

Why subscribe?

Other Books You May Enjoy