Cloud Analytics with Microsoft Azure - Has Altaiar - E-Book

Cloud Analytics with Microsoft Azure E-Book

Has Altaiar


Page count: 207

Year of publication: 2021




Cloud Analytics with Microsoft Azure, Second Edition

Transform your business with the power of analytics in Azure

Has Altaiar, Jack Lee, and Michael Peña

Copyright © 2021 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Authors: Has Altaiar, Jack Lee, and Michael Peña

Technical Reviewer: Aaditya Pokkunuri

Managing Editors: Aditya Datar and Neha Pande

Acquisitions Editor: Ben Renow-Clarke

Production Editor: Deepak Chavan

Editorial Board: Vishal Bodwani, Ben Renow-Clarke, Arijit Sarkar, Dominic Shakeshaft, and Lucy Wan

First Published: October 2019

Production Reference: 2270121

ISBN: 978-1-80020-243-6

Published by Packt Publishing Ltd.

Livery Place, 35 Livery Street

Birmingham B3 2PB, UK

Table of Contents

Preface

1. Introducing analytics on Azure

The power of data

Big data analytics

Internet of Things (IoT)

Machine learning

Artificial intelligence (AI)

DataOps

Why Microsoft Azure?

Security

Cloud scale

Top business drivers for adopting data analytics in the cloud

Rapid growth and scale

Reducing costs

Driving innovation

Why do you need a modern data warehouse?

Bringing your data together

Creating a data pipeline

Data ingestion

Data storage

Data pipeline orchestration and monitoring

Data sharing

Data preparation

Data transform, predict, and enrich

Data serve

Data visualization

Smarter applications

Summary

2. Introducing the Azure Synapse Analytics workspace and Synapse Studio

What is Azure Synapse Analytics?

Why do we need Azure Synapse Analytics?

Customer challenges

Azure Synapse Analytics to the rescue

Deep dive into Azure Synapse Analytics

Introducing the Azure Synapse Analytics workspace

Free Azure account

Quickstart guide

Introducing Synapse Studio

Launching Synapse Studio

Provisioning a dedicated SQL pool

Exploring data in the dedicated SQL pool

Creating an Apache Spark pool

Integrating with pipelines

The Monitor hub

Summary

3. Processing and visualizing data

Power BI

Features and benefits

Power BI and Azure Synapse Analytics

Features and benefits

Quick start guide (Data modeling and visualization)

Machine learning on Azure

ML.NET

Automated machine learning

Cognitive services

Bot framework

Azure Machine Learning features and benefits

Software Development Kit (SDK)

Designer

AutoML

Flexible deployment targets

Accelerated Machine Learning Operations (MLOps)

Azure Machine Learning and Azure Synapse Analytics

Quick start guide (Azure Machine Learning)

Prerequisites

Creating a machine learning model using Designer

Summary

4. Business use cases

Use case 1: Real-time customer insights with Azure Synapse Analytics

The problem

Capturing and processing new data

Bringing all the data together

Finding insights and patterns in data

Real-time discovery

Design brainstorming

Data ingestion

Data storage

Data science

Dashboards and reports

The solution

Data flow

Azure services

Azure Data Lake Storage Gen2

Azure Synapse Analytics

Azure Synapse Hybrid Integration (Pipelines)

Power BI

Azure supporting services

Insights and actions

Reducing waste by 18%

Social media trends drive sales up by 14%

Conclusion

Use case 2: Using advanced analytics on Azure to create a smart airport

The problem

Business challenges

Technical challenges

Design brainstorming

Data sources

Data storage

Data ingestion

Security and access control

Discovering patterns and insights

The solution

Why Azure for NIA?

Solution architecture

Azure services

Azure Synapse Analytics

Azure Cosmos DB

Azure Machine Learning

Azure Container Registry

Azure Kubernetes Service

Power BI

Supporting services

Insights and actions

Reducing flight delays by 17% using predictive analytics

Reducing congestion and improving retail using smart visualization

Conclusion

5. Conclusion

Final words

For further learning

Index

Preface

About

This section briefly introduces the authors and reviewer, the coverage of this book, the technical skills you'll need to get started, and the hardware and software requirements needed to complete all of the activities and exercises.

About Cloud Analytics with Microsoft Azure, Second Edition

Cloud Analytics with Microsoft Azure serves as a comprehensive guide for big data analysis and processing using a range of Microsoft Azure features. This book covers everything you need to build your own data warehouse and learn numerous techniques to gain useful insights by analyzing big data.

The book begins by introducing you to the power of data with big data analytics, the Internet of Things (IoT), machine learning, artificial intelligence, and DataOps. You will learn about cloud-scale analytics and the services Microsoft Azure offers to empower businesses to discover insights. You will also be introduced to the new features and functionalities added to the modern data warehouse.

Finally, you will look at two real-world business use cases to demonstrate high-level solutions using Microsoft Azure. The aim of these use cases will be to illustrate how real-time data can be analyzed in Azure to derive meaningful insights and make business decisions. You will learn to build an end-to-end analytics pipeline on the cloud with machine learning and deep learning concepts.

By the end of this book, you will be proficient in analyzing large amounts of data with Azure and using it effectively to benefit your organization.

About the authors

Has Altaiar is a software engineer at heart and a consultant by trade. Has lives in Melbourne, Australia, and is the Executive Director at vNext Solutions. His work focuses on data, IoT, and AI on Microsoft Azure, and two of his latest IoT projects won multiple awards. Has is a Microsoft Azure MVP and a regular organizer and speaker at local and international conferences, including Microsoft Ignite, NDC, and ServerlessDays. He's also a board member of the Global AI Community. You can follow him on Twitter at @hasaltaiar.

Jack Lee is a senior Azure certified consultant and an Azure practice lead with a passion for software development, cloud, and DevOps innovations. He is an active Microsoft tech community contributor and has presented at various user groups and conferences, including the Global Azure Bootcamp at Microsoft Canada. Jack is an experienced mentor and judge at hackathons and is also the president of a user group that focuses on Azure, DevOps, and software development. He is the co-author of Azure for Architects and Cloud Analytics with Microsoft Azure, published by Packt Publishing. He has been recognized as a Microsoft MVP for his contributions to the tech community. You can follow Jack on Twitter at @jlee_consulting.

Michael Peña is an experienced technical consultant based in Sydney, Australia. He is a Microsoft MVP and a certified professional with over 10 years of experience in data, mobile, cloud, web, and DevOps. Throughout these years, he wore various hats but considered himself a developer at heart. He is also an international speaker, having spoken at numerous events, including Microsoft Ignite, NDC, DDD, Cross-Platform Summit, and various in-person and virtual meet-ups. Michael has interned with Microsoft and is also a Microsoft student partner alumnus. You can follow him on Twitter at @mjtpena. 

About the reviewer

Aaditya Pokkunuri is an experienced senior database engineer with a history of working in the information technology and services industry; he has a total of 11 years of experience. He is skilled in performance tuning, MS SQL Database server administration, SSIS, SSRS, Power BI, and SQL development.

He possesses strong knowledge about replication, clustering, SQL Server high availability options, and ITIL processes, as well as expertise in Windows administration tasks, Active Directory, and Microsoft Azure technologies.

He also has expertise in AWS Cloud and is an AWS Solution Architect Associate. Aaditya is a strong information technology professional with a Bachelor of Technology degree focused on computer science and engineering from Sastra University, Tamil Nadu.

Learning objectives

Explore the concepts of modern data warehouses and data pipelines
Discover unique design considerations while applying a cloud analytics solution
Design an end-to-end analytics pipeline on the cloud
Differentiate between structured, semi-structured, and unstructured data
Choose a cloud-based service for your data analytics solutions
Use Azure services to ingest, store, and analyze data of any scale

Audience

This book is designed to benefit software engineers, Azure developers, cloud consultants, and anyone who is keen to learn the process of deriving business insights from huge amounts of data using Azure.

Though not necessary, a basic understanding of data analytics concepts such as data streaming, data types, the machine learning life cycle, and Docker containers will help you get the most out of the book.

Approach

Cloud Analytics with Microsoft Azure introduces complex concepts with real-world examples so that you get hands-on experience while also understanding the fundamentals. The book contains numerous quick-start guides that enable you to learn faster. 

Hardware and software requirements

Hardware requirements

For the optimal student experience, we recommend the following hardware configuration:

Memory: Minimum 4 GB RAM
Display: Minimum 1440x900 or 1600x900 (16:9) recommended
CPU: 1 gigahertz (GHz) or faster x86- or x64-bit processor recommended

Software requirements

We also recommend that you have the following software configuration in advance:

Windows 10 latest version or Windows Server latest version
Azure subscription. You can set up a free Azure account at https://azure.microsoft.com/free/synapse-analytics/
Microsoft Edge latest version

Conventions

Code words in the text, database names, folder names, filenames, and file extensions are shown as follows.

The following code snippet makes use of the Azure SQL Database linked service to create a dataset that references sales_table in Coolies' SQL Database (the optional schema array is left empty here):

{
   "name": "CooliesSalesDataset",
   "properties": {
      "type": "AzureSqlTable",
      "linkedServiceName": {
         "referenceName": "CooliesSalesAzureSqlDbLS",
         "type": "LinkedServiceReference"
      },
      "schema": [],
      "typeProperties": {
         "tableName": "sales_table"
      }
   }
}
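As a rough illustration, a dataset definition like the one above can be loaded and sanity-checked with Python's standard json module before it is deployed. The validate_dataset helper and the particular checks it performs are assumptions made for this sketch, not part of any Azure SDK, and the optional schema entry is written as an empty list so that the text parses as valid JSON.

```python
import json

# Dataset definition from the text; the optional schema is an empty list here.
DATASET_JSON = """
{
   "name": "CooliesSalesDataset",
   "properties": {
      "type": "AzureSqlTable",
      "linkedServiceName": {
         "referenceName": "CooliesSalesAzureSqlDbLS",
         "type": "LinkedServiceReference"
      },
      "schema": [],
      "typeProperties": {
         "tableName": "sales_table"
      }
   }
}
"""


def validate_dataset(definition):
    """Return a list of problems found (hypothetical checks, not an Azure API)."""
    problems = []
    props = definition.get("properties", {})
    if not definition.get("name"):
        problems.append("missing dataset name")
    if props.get("linkedServiceName", {}).get("type") != "LinkedServiceReference":
        problems.append("linkedServiceName must be a LinkedServiceReference")
    if not props.get("typeProperties", {}).get("tableName"):
        problems.append("missing typeProperties.tableName")
    return problems


dataset = json.loads(DATASET_JSON)
problems = validate_dataset(dataset)
print(problems or "dataset definition looks consistent")
```

A check of this kind only confirms that the expected keys are present; it says nothing about whether the linked service or table actually exists.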

Installation and setup

You can install Power BI Desktop (https://packt.live/37hUTmK) and start creating interactive reports.

1. Introducing analytics on Azure

According to a survey by Dresner Advisory Services in 2019, an all-time high of 48% of organizations say business intelligence in the cloud is either critical or very important in conducting their business operations. The Cloud Computing and Business Intelligence Market Study also showed that sales and marketing teams get the most value out of analytics.

As businesses grow, they generate massive amounts of data every day. This data comes from different sources, such as mobile phones, Internet of Things (IoT) sensors, and various Software as a Service (SaaS) products such as Customer Relationship Management (CRM) systems. Enterprises and businesses need to scale and modernize their data architecture and infrastructure in order to cope with the demand and stay competitive in their respective industries.

Having cloud-scale analytics capabilities is the go-to strategy for achieving this growth. Instead of managing your own data center, harnessing the power of the cloud allows your business to be more accessible to your users. With the help of a cloud service provider such as Microsoft Azure, you can accelerate your data analytics practice without the limitations of your IT infrastructure. The game has changed in terms of maintaining IT infrastructures, as data lakes and cloud data warehouses are capable of storing and maintaining massive amounts of data.

Simply gathering data does not add value to your business; you need to derive insights from it and help your business grow using data analytics, or your data store will just become a data swamp. Azure is more than just a hub for gathering data; it is an invaluable resource for data analytics. Data analytics provides you with the ability to understand your business and customers better. By applying various data science concepts, such as machine learning (ML), regression analysis, classification algorithms, and time series forecasting, you can test your hypotheses and make data-driven decisions for the future. However, one of the challenges that organizations continuously face is how to derive these analytical modeling capabilities quickly when processing billions of data rows. This is where having a modern data warehouse and data pipeline can help (more on this in the next sections).

There are a number of ways in which data analytics can help your business thrive. In the case of retail, if you understand your customers better, you will have a better idea of what products you should sell, where to sell them, when to sell them, and how to sell them. In the financial sector, data analytics is helping authorities fight crime by detecting fraudulent transactions and providing more informed risk assessments based on historical criminal intelligence.

This chapter will cover fundamental topics on the power of data with:

Big data analytics
IoT
Machine Learning (ML)
Artificial Intelligence (AI)
DataOps

You will also learn why Microsoft Azure is the platform of choice for performing analytics on the cloud. Lastly, you will study the fundamental concepts of a modern data warehouse and data pipelines.

The power of data

As a consumer, you have seen how data has come to influence our daily activities. Most popular entertainment applications, such as YouTube, now provide a customized user experience with features such as video recommendations based on our interests and search history. It is now child's play to discover new content that's similar to our preferred content, and also to find new and popular trending content.

Due to the major shift in wearable technology, it has also become possible to keep track of our health statistics by monitoring heart rates, blood pressure, and so on. These devices then formulate a tailored recommendation based on the averages of these statistics. But these personalized health stats are only a sample of the massive data collection happening every day on a global scale, to which we actively contribute.

Millions of people all over the world use social networking platforms and search engines every day. Internet giants such as Facebook, Instagram, and Google use clickstream data to come up with innovations and improve their services. Data collection is also carried out extensively under projects such as The Great Elephant Census and eBird that aim to boost wildlife conservation. Data-driven techniques have been adopted for tiger conservation projects in India. It even plays an invaluable role in global efforts to compile evidence, causes, and possible responses to climate change—to understand sea surface temperature, analyze natural calamities such as coastal flooding, and highlight global warming patterns in a collective effort to save the ecosystem.

Initiatives such as Global Open Data for Agriculture and Nutrition (GODAN), whose open data can be used by farmers, ranchers, and consumers alike, contribute to this tireless data collection as well.

Furthermore (as with the advent of wearable technology), data analysis is contributing to pioneering advancements in the healthcare sector. Patient datasets are analyzed to identify patterns and early symptoms of diseases in order to devise better solutions to known problems.

The scale of data being talked about here is massive. Hence, the popular term big data is used to describe harnessing the power of data at this scale.

Note

You can read more about open data at https://www.data.gov/.

Big data analytics

The term "big data" is often used to describe massive volumes of data that traditional tools cannot handle. It can be characterized by the five Vs:

Volume: This indicates the volume of data that needs to be analyzed for big data analytics. We are now dealing with larger datasets than ever before. This has been made possible because of the availability of electronic products such as mobile devices and IoT sensors that have been widely adopted all over the globe for commercial purposes.

Velocity: This refers to the rate at which data is being generated. Devices and platforms, such as those just mentioned, constantly produce data on a large scale and at rapid speed. This makes collecting, processing, analyzing, and serving data at rapid speeds necessary.

Variety: This refers to the structure of data being produced. Data sources are inconsistent, having a mix of structured, unstructured, and some semi-structured data (you will learn more about this in the Bringing your data together section).

Value: This refers to the value of the data being extracted. Accessible data may not always be valuable. With the right tools, you can derive value from the data in a cost-effective and scalable way.

Veracity: This is the quality or trustworthiness of data. A raw dataset will usually contain a lot of noise and bias and will need cleaning. Having a large dataset is not useful if most of the data is not accurate.

Big data analytics is the process of finding patterns, trends, and correlations in unstructured data to derive meaningful insights that shape business decisions. This unstructured data is usually large in file size (images, videos, and social graphs, for instance).

This does not mean that relational databases are not relevant for big data. In fact, modern data warehouse platforms such as Azure Synapse Analytics (formerly known as Azure SQL Data Warehouse) support structured and semi-structured data (such as JSON) and can scale to support terabytes to petabytes of data. Using Microsoft Azure, you have the flexibility to choose any platform. These technologies can complement each other to achieve a robust data analytics pipeline.

Here are some of the best use cases of big data analytics:

Social media analysis: Through social media sites such as Twitter, Facebook, and Instagram, companies can learn what customers are saying about their products and services. Social media analysis helps companies to target their audiences by utilizing user preferences and market trends. The challenges here are the massive amount of data and the unstructured nature of tweets and posts.

Fraud prevention: This is one of the most familiar use cases of big data. One of the prominent features of big data analytics when used for fraud prevention is the ability to detect anomalies in a dataset. Validating credit card transactions by understanding transaction patterns such as location data and categories of purchased items is an example of this. The biggest challenge here is ensuring that the AI/ML models are clean and unbiased. A model may have been trained on just one parameter, such as a user's country of origin; it will then look for patterns in the user's location alone and might miss other parameters.

Price optimization: Using big data analytics, you can predict what price points will yield the best results based on historical market data. This allows companies to ensure that they do not price their items too high or too low. The challenge here is that many factors can affect prices. Focusing on just one factor, such as a competitor's price, might train your model to concentrate on that area and disregard other factors such as weather and traffic data.
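The anomaly detection idea behind the fraud prevention use case can be sketched with a simple z-score check on transaction amounts. The sample amounts, the threshold of 2.0, and the flag_anomalies function are all invented for illustration; a real system would combine many more signals, such as location data and categories of purchased items.

```python
from statistics import mean, stdev


def flag_anomalies(amounts, threshold=2.0):
    """Return the amounts whose z-score magnitude exceeds the threshold."""
    mu = mean(amounts)
    sigma = stdev(amounts)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [a for a in amounts if abs(a - mu) / sigma > threshold]


# Hypothetical card transactions: seven typical purchases and one outlier.
history = [42.0, 38.5, 45.0, 40.0, 39.5, 41.0, 43.5, 950.0]
suspicious = flag_anomalies(history)
print(suspicious)
```

Relying on the amount alone is exactly the single-parameter trap described above, which is why this should be read as a starting point rather than a fraud model.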

Big data for businesses and enterprises is usually accompanied by the concept of having an IoT infrastructure, where hundreds, thousands, or even millions of devices are connected to a network that constantly sends data to a server.

Internet of Things (IoT)

IoT plays a vital role in scaling your application beyond your current data sources. IoT is simply an interconnection of devices, each embedded in an everyday object to serve a single purpose, that send and receive data. IoT allows us to constantly gather data about "things" without manually entering it into a database.

A smartwatch is a good example of an IoT device that constantly measures your body's vital signs. Instead of taking a reading with a separate measuring device and entering it into a system by hand, a smartwatch records your data automatically. Another good example is a device tracker for an asset that captures location, temperature, and humidity information. This allows logistics companies to monitor their items in transit, ensuring the quality and efficiency of their services.

At scale, these IoT devices generate anywhere from gigabytes to terabytes of data. This data is usually stored in a data lake in a raw, unstructured format, and is later analyzed to derive business insights. A data lake is a centralized repository of all structured, semi-structured, and unstructured data. In the example of the logistics company mentioned previously, patterns (such as the best delivery routes) could be generated. The data could also be used to understand anomalies such as data leakage or suspected fraudulent activities.

Machine learning

As your data grows in size, it opens a lot of opportunities for businesses to go beyond understanding business trends and patterns. Machine learning and artificial intelligence are examples of innovations that you can exploit with your data. Building your artificial intelligence and ML capabilities is relatively easy now because of the availability of the requisite technologies and the ability to scale your storage and compute on the cloud.

Machine learning and artificial intelligence are terms that are often mixed up. In a nutshell, machine learning is a subset (or application) of artificial intelligence. Machine learning aims to allow systems to learn from past datasets and adapt automatically without human assistance. This is made possible by a series of algorithms being applied to the dataset; the algorithm analyzes the data in near-real-time and then comes up with possible actions based on accuracy or confidence derived from previous experience.

The word "learning" indicates that the program is constantly learning from data fed to it. The aim of machine learning is to strive for accuracy rather than success. There are three main categories of machine learning algorithms: supervised, unsupervised, and reinforcement.

Supervised machine learning algorithms create a mapping function to map input variables with an output variable. The algorithm uses existing datasets to train itself to predict the output. Classification is a form of supervised ML that can be used in applications such as image categorization or customer segmentation, which is used for targeted marketing campaigns.
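To make the idea of a learned mapping from inputs to an output concrete, here is a minimal sketch of a nearest-centroid classifier for customer segmentation. The technique is chosen only for brevity, and the features, labels, and sample customers are hypothetical.

```python
from collections import defaultdict


def train_centroids(samples, labels):
    """Learn one centroid (mean feature vector) per label from labeled data."""
    grouped = defaultdict(list)
    for features, label in zip(samples, labels):
        grouped[label].append(features)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Map an input to the label whose centroid is closest (squared distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: dist(centroids[label]))


# Hypothetical customers: (orders per month, average basket value)
samples = [(1, 20.0), (2, 25.0), (1, 30.0), (8, 120.0), (10, 150.0), (9, 135.0)]
labels = ["occasional", "occasional", "occasional",
          "frequent", "frequent", "frequent"]

centroids = train_centroids(samples, labels)
segment = predict(centroids, (7, 110.0))
print(segment)
```

The labeled samples play the role of the existing dataset, and predict is the mapping function applied to a customer the model has not seen before.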

Unsupervised machine learning, on the other hand, is when you let a program find a pattern of its own without any labels. A good example is understanding customer purchase patterns when buying products. You get inherent groupings (clustering) according to purchasing behaviors, and the program can associate customers and products according to patterns of purchase. For instance, you may discern that customers who buy Product A tend to buy Product B too. This is an example of a user-based recommendation algorithm and market basket analysis. For users, this means that when they buy a particular item, such as a book, they are also encouraged to buy other books that belong to the same series, genre, or category.
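The "customers who buy Product A tend to buy Product B" pattern can be estimated without any labels by counting how often items appear together. The sketch below computes a simple confidence score for every ordered pair of items; the baskets and the pair_confidence function are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_confidence(baskets):
    """Estimate confidence(A -> B): the share of baskets with A that also hold B."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)
        item_counts.update(items)
        for a, b in combinations(sorted(items), 2):
            # Count the pair in both directions so either item can be the trigger.
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return {
        pair: count / item_counts[pair[0]]
        for pair, count in pair_counts.items()
    }


# Hypothetical purchase history: one list of products per checkout.
baskets = [
    ["Product A", "Product B"],
    ["Product A", "Product B", "Product C"],
    ["Product A", "Product C"],
    ["Product B"],
]
confidence = pair_confidence(baskets)
print(confidence[("Product A", "Product B")])
```

Pairs with a high score are natural candidates for a "frequently bought together" suggestion.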

Reinforcement Learning (RL) provides meaningful insights and actions based on rewards and punishment. The main difference between this and supervised learning is that it does not need labeled input and output as part of the algorithm. An excellent example of this is the new financial trend for "robo-advisors." Robo-advisors run using agents that get rewarded and punished based on their stock performance (that is, gains and losses). In time, the agent can recognize whether to hold, buy, or sell stocks. This has been a game-changer because, in the past, analysts had to make every single decision; now most of the complicated data trends are already analyzed for you and analysts can choose to listen to the agent or not. However, financial trading is very complex given the nature of parameters present in the world, and so not all robo-advisors' predictions are accurate.
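The reward-and-punishment loop can be illustrated with a deliberately simplified agent that keeps one value estimate per action and nudges it toward each observed reward. This is a stateless, bandit-style sketch rather than a full reinforcement learning algorithm, and the price moves, learning rate, and reward function are assumptions made for the example.

```python
ACTIONS = ("hold", "buy", "sell")


def reward(action, price_change):
    """Reward (gain) or punishment (loss) for an action given the next price move."""
    if action == "buy":
        return price_change
    if action == "sell":
        return -price_change
    return 0.0


def train(price_changes, learning_rate=0.5):
    """Update one value estimate per action from the rewards it earns."""
    values = {action: 0.0 for action in ACTIONS}
    for step, change in enumerate(price_changes):
        # Deterministic exploration: try the actions in turn.
        action = ACTIONS[step % len(ACTIONS)]
        r = reward(action, change)
        # Move the estimate part of the way toward the observed reward.
        values[action] += learning_rate * (r - values[action])
    return values


# Hypothetical sequence of price moves in a rising market.
rising_market = [1.0, 2.0, 1.5, 0.5, 1.0, 2.0]
values = train(rising_market)
best_action = max(values, key=values.get)
print(best_action, values)
```

No labeled examples of the "right" action are supplied; the agent's preference emerges only from the gains and losses it experiences.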

Artificial intelligence (AI)

Artificial intelligence extends beyond what machine learning can do. It is about making decisions and aiming for success rather than accuracy. One way to think of it is that machine learning aims to gain knowledge while artificial intelligence aims for wisdom or intelligence. An example of AI in action would be Boston Dynamics' Atlas robot, which can navigate freely in the open world and avoid obstacles without the aid of human control. The robot does not fully depend on historical map data to navigate. Machine learning, by contrast, is about creating or predicting a pattern from historical data analysis. Applied to navigation, it would mean working out the optimal route by finding patterns in historical and crowd-sourced traffic data.