Develop and manage effective real-time streaming solutions by leveraging the power of Microsoft Azure
If you are looking for a resource that teaches you how to process continuous streams of data in real time, this book is what you need. A basic understanding of the concepts in analytics is all you need to get started with this book.
Microsoft Azure is a very popular cloud computing service used by many organizations around the world. Its latest analytics offering, Stream Analytics, allows you to process different kinds of data in real time and get actionable insights from them.
This book is your guide to understanding the basics of how Azure Stream Analytics works and to building your own analytics solution using its capabilities. You will start by understanding what stream analytics is and why it is a popular choice for getting real-time insights from data. Then, you will be introduced to Azure Stream Analytics and see how you can use the tools and functions in Azure to develop your own streaming analytics solution. Over the course of the book, you will get comparative guidance on using Azure Stream Analytics with other Microsoft data platform resources, such as integrating it into a Big Data Lambda Architecture for real-time data analysis, and on the architecture scenarios that favor Azure HDInsight Hadoop clusters with Storm versus Stream Analytics. The book also shows you how you can manage, monitor, and scale your solution for optimal performance.
By the end of this book, you will be well-versed in using Azure Stream Analytics to develop an efficient analytics solution that can work with any type of data.
A comprehensive guide to developing real-time event processing with Azure Stream Analytics
Page count: 231
Year of publication: 2017
BIRMINGHAM - MUMBAI
Copyright © 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: November 2017
Production reference: 1201117
ISBN 978-1-78839-590-8
www.packtpub.com
Authors
Anindita Basak
Krishna Venkataraman
Ryan Murphy
Manpreet Singh
Copy Editors
Tasneem Fatehi
Safis Editing
Reviewers
Ryan Murphy
Richard Iwasa
Vandana Saini
Project Coordinator
Manthan Patel
Commissioning Editor
Amey Varangaonkar
Proofreader
Safis Editing
Acquisition Editor
Tushar Gupta
Indexer
Pratik Shirodkar
Content Development Editor
Snehal Kolte
Graphics
Tania Dutta
Technical Editor
Sayli Nikalje
Production Coordinator
Shantanu Zagade
Anindita Basak works as a big data and cloud architect for a computer software giant and has been working with Microsoft Azure for more than 8 years. She has worked with various teams at Microsoft as an FTE in the roles of Azure Development Support Engineer, Pro-Direct Delivery Manager, and Partner Technical Consultant. She has been a technical reviewer on five books by Packt on Azure HDInsight, SQL Server business intelligence, Hadoop development, smart learning with the Internet of Things (IoT), and decision science. Recently, she authored two video courses on Azure Stream Analytics for Packt. More details about her can be found on her LinkedIn profile at https://www.linkedin.com/in/aninditabasak.
Krishna Venkataraman is a cloud solution architect working with Microsoft. He has worked with large public sector and Finserv customers around the world, building and deploying innovative solutions to solve their business challenges through technology and business process change. Currently, Krishna is helping Finserv and Telco customers with their cloud journey.
More details about him can be found on his LinkedIn profile: https://www.linkedin.com/in/krishnavenk/v.
Ryan Murphy is a solution architect living in Saint Louis, Missouri, USA. He has been building and innovating with data for nearly 20 years, including extensive work in the gaming and agriculture industries. Currently, Ryan is helping some of the world’s largest organizations modernize their business with data solutions powered by Microsoft Azure Cloud.
Manpreet Singh is a consultant and author with extensive expertise in the architecture, design, and implementation of Business Intelligence and big data analytics solutions. He is passionate about enabling businesses to derive valuable insights from their data. Manpreet has been working on Microsoft technologies for more than 10 years, with a strong focus on the Microsoft Business Intelligence Stack, SharePoint BI, and Microsoft’s Big Data Analytics Platforms (Analytics Platform System and HDInsight). He also specializes in Mobile Business Intelligence solution development and has helped businesses deliver a consolidated view of their data to their mobile workforces. Manpreet has coauthored books and technical articles on Microsoft technologies, focusing on the development of data analytics and visualization solutions with the Microsoft BI Stack and SharePoint. He holds a degree in computer science and engineering from Punjab University, India.
Ryan Murphy is a solution architect living in Saint Louis, Missouri. He has been building and innovating with data for nearly twenty years, including extensive work in the gaming and agriculture industries. Currently, Ryan is helping some of the world’s largest organizations modernize their business with data solutions powered by the Microsoft Azure Cloud.
Richard Iwasa is currently working as a cloud solution architect with Microsoft and has been using Microsoft Azure since its inception. He has worked in IT and consulting for over 20 years across multiple industries, including transportation, resources, telecom, and financial services. Richard is passionate about using the cloud to solve his customers' business and technical challenges.
Vandana Saini (Vinnie) is an experienced data scientist with extensive experience in data analytics, consulting, information systems, and managing corporate clients for IT and financial institutions. She has worked with major Canadian banks and cloud service providers in the area of artificial intelligence. Lately, she has been sharing her expertise as a cloud solution architect with Microsoft partners, augmenting AI capabilities to transform various industries.
She has also delivered keynotes at various big data and analytics events.
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www.packtpub.com/mapt
Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1788395905.
If you'd like to join our team of regular reviewers, you can email us at [email protected]. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
Introducing Stream Processing and Real-Time Insights
Understanding stream processing
Understanding queues, Pub/Sub, and events
Queues
Publish and Subscribe model
Real-world implementations of the Publish/Subscribe model
Azure implementation of queues and Publish/Subscribe models
What is an event?
Event streaming
Event correlation
Azure implementation of event processing
Architectural components of Event Hubs
Simple event processing
Event stream processing
Complex event processing
Summary
Introducing Azure Stream Analytics and Key Advantages
Services offered by Microsoft
Introduction to Azure Stream Analytics
Configuration of Azure Stream Analytics
Key advantages of Azure Stream Analytics
Security
Programmer productivity
Declarative SQL constructs
Built-in temporal semantics
Lowest total cost of ownership
Mission-critical and enterprise-less scalability and availability
Global compliance
Microsoft Cortana Intelligence suite integration
Azure IoT integration
Summary
Designing Real-Time Streaming Pipelines
Differencing stream processing and batch processing
Logical flow of processing
Out of order and late arrival of data
Session grouping and windowing challenges
Message consistency
Fault tolerance, recovery, and storage
Source
Communication and collection
Ingest, queue, and transform
Hot path
Cold path
Data retention
Presentation and action
Canonical Azure architecture
Summary
Developing Real-Time Event Processing with Azure Streaming
Stream Analytics tools for Visual Studio
Prerequisites for the installation of Stream Analytics tools
Development of a Stream Analytics job using Visual Studio
Defining a Stream Analytics query for Vehicle Telemetry job analysis using Stream Analytics tools
Query to define Vehicle Telemetry (Connected Car) engine health status and pollution index over cities
Testing Stream Analytics queries locally or in the cloud
Stream Analytics job configuration parameter settings in Visual Studio
Implementation of an Azure Stream Analytics job using the Azure portal
Provisioning for an Azure Stream Analytics job using the Azure Resource Manager template
Azure ARM Template - Infrastructure as code
Getting started with provisioning Azure Stream Analytics job using the ARM template
Deployment and validation of the Stream Analytics ARM template to Azure Resource Group
Configuration of the Azure Streaming job with different input data sources and output data sinks
Data input types-data stream and reference data
Data Stream inputs
Reference data
Job topology output data sinks of Stream Analytics
Summary
Building Using Stream Analytics Query Language
Built-in functions
Scalar functions
Aggregate and analytic functions
Array functions
Other functions
Data types and formats
Complex types
Query language elements
Windowing
Tumbling windows
Hopping windows
Sliding windows
Time management and event delivery guarantees
Summary
How to achieve Seamless Scalability with Automation
Understanding parts of a Stream Analytics job definition (input, output, reference data, and job)
Deployment of Azure Stream Analytics using ARM template
Configuring input
Configuring output
Building the sample test code
How to scale queries using Streaming units and partitions
Application and Arrival Time
Partitions
Input source
Output source
Embarrassingly parallel jobs and Not embarrassingly parallel jobs
Sample use case
Configuring SU using Azure portal
Out of order and late-arriving events
Summary
Integration of Microsoft Business Intelligence and Big Data
What is Big Data Lambda Architecture?
Concepts of batch processing and stream processing in data analytics
Specifications for slow/cold path of data - batch data processing
Moving to the streaming-based data solution pattern
Evolution of Kappa Architecture and benefits
Comparison between Azure Stream Analytics and Azure HDInsight Storm
Designing data processing pipeline of an interactive visual dashboard through Stream Analytics and Power BI
Integrating Power BI as an output job connector for Stream Analytics
Summary
Designing and Managing Stream Analytics Jobs
Reference data streams with Azure Stream Analytics
Configuration of Reference data for Azure Stream Analytics jobs
Integrating a reference data stream as job topology input for an Azure Stream Analytics job
Stream Analytics query configuration for Reference Data join
Refresh schedule of a reference data stream
Configuration of output data sinks for Azure Stream Analytics with Azure Data Lake Store
Configuring Azure Data Lake Store as an output data sink of Stream Analytics
Configuring Azure Data Lake Store as an output sink of Stream Analytics jobs
Configuring Azure Cosmos DB as an output data sink for Azure Stream Analytics
Features of Azure Cosmos DB for configuring output sinks of Azure Stream Analytics
Configuring Azure Cosmos DB integrated with Azure Stream Analytics as an output sink
Stream Analytics job output to Azure Function Apps as Serverless Architecture
Provisioning steps to an Azure Function
Configuring an Azure function as a serverless architecture model integrated with Stream Analytics job output
Summary
Optimizing Intelligence in Azure Streaming
Integration of JavaScript user-defined functions using Azure Stream Analytics
Adding JavaScript UDF with a Stream Analytics job
Stream Analytics and JavaScript data type conversions
Integrating intelligent Azure machine learning algorithms with Stream Analytics function
Data pipeline Streaming application building concepts using Azure .NET Management SDK
Implementation steps of Azure Stream Analytics jobs using .NET management SDK
Summary
Understanding Stream Analytics Job Monitoring
Troubleshooting with job metrics
Visual monitoring of job diagram
Logging of diagnostics logs
Enabling diagnostics logs
Exploring the logs sent to the storage account
Configuring job alerts
Viewing resource health information with Azure resource health
Exploring different monitoring experiences
Building a monitoring dashboard
Summary
Use Cases for Real-World Data Streaming Architectures
Solution architecture design and Proof-of-Concept implementation of social media sentiment analytics using Twitter and a sentiment analytics dashboard
Definition of sentiment analytics
Prerequisites required for the implementation of Twitter sentiment analytics PoC
Steps for implementation of Twitter sentiment analytics
Remote monitoring analytics using Azure IoT Suite
Provisioning of remote device monitoring analytics using Azure IoT Suite
Implementation of a connected factory use case using Azure IoT Suite
Connected factory solution with Azure IoT Suite
Real-world telecom fraud detection analytics using Azure Stream Analytics and Cortana Intelligence Gallery with interactive visuals from Microsoft Power BI
Implementation steps of fraud detection analytics using Azure Stream Analytics
Steps for building the fraud detection analytics solution
Summary
This book provides guidance for data architecture professionals, cloud architects, big data developers, and data scientists who would like to gain an end-to-end understanding of real-time complex streaming architecture. It is a comprehensive guide to developing real-time event processing with Azure Stream Analytics.
It is also an implementation guide for interactive data processing in fields such as the Internet of Things (IoT), social media, sensor data processing with BI, integration of streaming analytics with machine learning, and so on.
Chapter 1, Introducing Stream Processing and Real-Time Insights, describes a paradigm shift that is underway in data processing, from a legacy of handling static data in batches to handling continuously moving data in streams. We explore the fundamental architectural concepts of stream processing as well as its benefits in Real-Time Insights.
Chapter 2, Introducing Azure Stream Analytics and Key Advantages, introduces Microsoft's Azure Stream Analytics, a real-time analytics service built for the stream processing era. We walk through a basic Stream Analytics job configuration and then discuss its key features that drive down the total cost of ownership of streaming solutions.
Chapter 3, Designing Real-Time Streaming Pipelines, discusses the components of stream processing pipelines and how they differ from traditional batch pipelines, including temporal concepts such as windowing and the hot and cold paths of data movement. To see how streaming design concepts can be applied to a technical architecture, we then look at the canonical Azure streaming pipeline from data generation to intelligent action.
Chapter 4, Developing Real-Time Event Processing with Azure Streaming, covers various tools for provisioning a Stream Analytics job. The integration steps of job input and output are demonstrated.
Chapter 5, Building Using Stream Analytics Query Language, explores the SQL-like query language used in Azure Stream Analytics to run transformations and computations on streaming data. Common and complex stream processing requirements can be met with straightforward queries.
Chapter 6, How to achieve Seamless Scalability with Automation, covers enterprise-grade deployment with features and patterns for scaling and deployment automation. After demonstrating automated deployment using Azure Resource Manager (ARM), we explore vertical and horizontal partitioning and scaling in Stream Analytics to increase job capacity and performance.
Chapter 7, Integration of Microsoft Business Intelligence and Big Data, discusses the modern data solution architectures Lambda and Kappa, shows how to use Stream Analytics in line with these architectures, and compares it with a popular alternative, HDInsight Storm. We then walk through a sample pipeline, implementing a real-time dashboard based on the Power BI output connector for Stream Analytics.
Chapter 8, Designing and Managing Stream Analytics Jobs, explores solutions to the complex challenges of managing streaming jobs, starting with the common need to integrate streams with static data. We then discuss integration with Azure Data Lake Store and Cosmos DB as examples of Azure services whose native integration with Stream Analytics offers unique opportunities to enhance streaming pipelines.
Chapter 9, Optimizing Intelligence in Azure Streaming, discusses building intelligence directly into Stream Analytics jobs so that extensible functions and machine learning calls execute in real time as data moves. We cover integration with the Azure Machine Learning service and implementing user-defined JavaScript functions in Stream Analytics queries. Finally, we walk through using the Azure .NET SDK to enhance job management.
Chapter 10, Understanding Stream Analytics Job Monitoring, looks into ongoing maintenance and job management. We discuss and demonstrate the job metrics, diagram, and logging features offered by Stream Analytics, as well as service health dashboarding and alerting.
Chapter 11, Use Cases for Real-World Data Streaming Architectures, presents end-to-end, real-life use cases built with the Azure IoT Suite and Stream Analytics, with implementation steps for PoCs covering social sentiment analytics, an IoT remote monitoring telemetry solution, a connected factory, and fraud detection analytics for the telecom industry.
A valid Azure subscription
Visual Studio 2017/2015
Azure SDK 2.7.1 or higher
Azure Storage Explorer
A Power BI Office 365 account
Python SDK 2.7 (64-bit) and packages
If you are looking for a resource that teaches you how to process continuous streams of data in real time, this book is what you need. A basic understanding of the concepts of analytics is all you need to get started with this book.
Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of. To send us general feedback, simply email [email protected], and mention the book's title in the subject of your message. If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you. You can download the code files by following these steps:
Log in or register to our website using your email address and password.
Hover the mouse pointer on the SUPPORT tab at the top.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box.
Select the book for which you're looking to download the code files.
Choose from the drop-down menu where you purchased this book from.
Click on Code Download.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Stream-Analytics-with-Microsoft-Azure. We also have other code bundles from our rich catalogue of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/StreamAnalyticswithMicrosoftAzure_ColorImages.pdf.
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books (maybe a mistake in the text or the code), we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title. To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy of copyrighted material on the internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the internet, please provide us with the location address or website name immediately so that we can pursue a remedy. Please contact us at [email protected] with a link to the suspected pirated material. We appreciate your help in protecting our authors and our ability to bring you valuable content.
If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.
The popularity of stream data platforms has increased significantly in recent times because of the need for real-time access to information. Enterprises are transitioning parts of their data infrastructure to a streaming paradigm in response to changing business needs. The streaming model presents a significant shift by moving from point queries against stationary data to a standing temporal query that consumes moving data. Fundamentally, we enable insight on the data before it is stored in the analytics repository. This introduces a new paradigm in thinking. Before going deep into stream processing, we have to cover a couple of key basic concepts related to events and streams. In this chapter, we'll explore the basics of the following points:
Publish/Subscribe (Pub/Sub)
Stream processing
Real-Time Insights
The core theme of this book is the Azure Streaming Service. Before diving deeper into Azure Streaming Service, we should take a moment to consider why we need stream processing, or Real-Time Insights, and why it is a tool worth adding to your repertoire.
So what is stream processing and why is it important? In traditional data processing, data is typically processed in batch mode, on a regular schedule. One fundamental challenge with conventional data processing is that it is inherently reactive, because it focuses on aging information. Stream processing, on the other hand, processes data as it flows through in real time.
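The contrast between batch and stream processing can be sketched in a few lines of Python. This is only an illustration and is not taken from Azure Stream Analytics; the function names and the sample readings are hypothetical. The batch function waits until all the data is at rest, whereas the streaming function emits an updated answer as each reading arrives.

```python
from typing import Iterable, Iterator, List


def batch_average(readings: List[float]) -> float:
    """Batch mode: all data is at rest; one result is computed on a schedule."""
    return sum(readings) / len(readings)


def streaming_average(readings: Iterable[float]) -> Iterator[float]:
    """Stream mode: emit an updated average as soon as each reading arrives."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


# Hypothetical sensor readings, used only for illustration.
readings = [20.0, 22.0, 27.0, 31.0]

batch_result = batch_average(readings)                    # 25.0, available only at the end
stream_results = list(streaming_average(iter(readings)))  # [20.0, 21.0, 23.0, 25.0]
```

Both approaches arrive at the same final value, but the streaming version has an answer after the first reading, which is what makes earlier decisions possible.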
The following are some of the highlights of why stream processing is critical:
Response time is critical:
Reducing decision latency can unlock business value
Need to ask questions about data in motion
Can't wait for data to get to rest before running computation
Actions by human actors:
See and seize insights
Live visualization
Alerts and alarms
Dynamic aggregation
Machine-to-machine interactions:
Data movement with enrichment
Kick-off workflows for automation
Before going into stream analytics, it is essential to understand the basics of events and the different models for publishing and consuming them. Getting familiar with queues, Pub/Sub, and events will help you understand the later chapters better.
In this section, we will review two key concepts, queues and Publish/Subscribe models, followed by event-based messaging models.
A queue implements one-way communication, where the sender places a message on the queue and a receiver collects the message asynchronously. Features such as dead-letter queues, paired namespaces, active/passive replication, and auto-forwarding to a chained queue in the same namespace provide a rich feature set for passing messages between applications and for building a highly available solution.
A queue consists of three key elements:
Sender: Sends the message to the receiver through a durable entity.
Durable entity: Stores the received durable message and offers persistence. The messages are stored until they are collected by the receiver.
Receiver: The final recipient of the message.
The key advantages of a queue are as follows:
Queues operate on the principle of first in, first out (FIFO): Consider a simple queue where you put messages in at one end and receive them at the other end in the same order. For example, a Service Bus queue implements the FIFO pattern.
Point-to-point: Queues fundamentally provide point-to-point messaging; even though there may be multiple senders of messages, there is only one receiver of the messages.
Asynchronous communication: Senders and receivers do not call each other directly; instead, they communicate through named channels, which may form a static structure. Asynchronous communication helps with building a decoupled architecture and gives higher resilience, because messages can still be added or processed when either the publisher or the consumer of messages has downtime.
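The three elements of a queue and its FIFO, asynchronous behavior can be sketched as follows. This is a minimal illustration using Python's standard `queue` module as a stand-in for the durable entity; an in-memory queue is not actually persistent, and the names `sender`, `receiver`, and `durable_entity` are hypothetical and not part of any Azure API.

```python
import queue
import threading

# Stand-in for the durable entity. A real message broker would persist
# messages; this in-memory queue is used purely for illustration.
durable_entity = queue.Queue()
received = []


def sender(name: str, count: int) -> None:
    """Place messages on the queue without waiting for any receiver."""
    for i in range(count):
        durable_entity.put(f"{name}-{i}")


def receiver(expected: int) -> None:
    """Collect messages later, in the order in which they were sent."""
    for _ in range(expected):
        received.append(durable_entity.get())
        durable_entity.task_done()


# The sender finishes before the receiver even starts: the two sides are
# decoupled in time, and the queue holds the messages in between.
sender("order", 3)

worker = threading.Thread(target=receiver, args=(3,))
worker.start()
worker.join()

print(received)  # ['order-0', 'order-1', 'order-2']
```

Because a single receiver drains the queue, the sketch is also point-to-point: any number of senders could call `sender`, but each message is collected exactly once.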
