Artificial Intelligence for IoT Cookbook - Michael Roshak - E-Book

Artificial Intelligence for IoT Cookbook E-Book

Michael Roshak

Description

Artificial intelligence (AI) is rapidly finding practical applications across a wide variety of industry verticals, and the Internet of Things (IoT) is one of them. Developers are looking for ways to make IoT devices smarter and to make users’ lives easier. With this AI cookbook, you’ll be able to implement smart analytics using IoT data to gain insights, predict outcomes, and make informed decisions, and you’ll explore advanced AI techniques that facilitate analytics and learning in various IoT applications.
Using a recipe-based approach, the book will take you through essential processes such as data collection, data analysis, modeling, statistics and monitoring, and deployment. You’ll use real-life datasets from smart homes, industrial IoT, and smart devices to train and evaluate simple to complex models and make predictions using trained models. Later chapters will take you through the key challenges faced while implementing machine learning, deep learning, and other AI techniques, such as natural language processing (NLP), computer vision, and embedded machine learning for building smart IoT systems. In addition to this, you’ll learn how to deploy models and improve their performance with ease.
By the end of this book, you’ll be able to package and deploy end-to-end AI apps and apply best practice solutions to common IoT problems.

You can read this e-book in Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Number of pages: 253

Year of publication: 2021




Artificial Intelligence for IoT Cookbook

Over 70 recipes for building AI solutions for smart homes, industrial IoT, and smart cities

Michael Roshak

BIRMINGHAM - MUMBAI

Artificial Intelligence for IoT Cookbook

Copyright © 2021 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Kunal Parikh
Publishing Product Manager: Devika Battike
Senior Editor: David Sugarman
Content Development Editor: Athikho Sapuni Rishana
Technical Editor: Manikandan Kurup
Copy Editor: Safis Editing
Project Coordinator: Aishwarya Mohan
Proofreader: Safis Editing
Indexer: Rekha Nair
Production Designer: Nilesh Mohite

First published: March 2021

Production reference: 1040221

Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.

ISBN 978-1-83898-198-3

www.packt.com

Contributors

About the author

Michael Roshak is a cloud architect and strategist who has gained extensive subject matter expertise in enterprise cloud transformation programs and infrastructure modernization through designing and deploying cloud-oriented solutions and architectures. He is responsible for providing strategic advisory services for cloud adoption, consultative technical sales, and driving broad cloud services consumption with highly strategic accounts across multiple industries.

 

About the reviewer

Va Barbosa is a software engineer with the Qiskit Community at IBM, focused on building open source tools and creating educational content for developers, researchers, students, and educators in the field of quantum computing. Previously, Va was a developer advocate with the Center for Open Source Data and AI Technologies, where he helped developers to discover and make use of data science and machine learning technologies. He is fueled by his passion to help others and guided by his enthusiasm for open source technology.

Table of Contents

Title Page

Copyright and Credits

Artificial Intelligence for IoT Cookbook

Contributors

About the author

About the reviewer

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Sections

Getting ready

How to do it…

How it works…

There's more…

See also

Get in touch

Reviews

Setting Up the IoT and AI Environment

Choosing a device

Dev kits  

Manifold 2-C with NVIDIA TX2

The i.MX series

LattePanda 

Raspberry Pi Class

Arduino

ESP8266 

Setting up Databricks

Storing data

Parquet

Avro

Delta Lake

Setting up IoT Hub

Getting ready

How to do it...

How it works...

Setting up an IoT Edge device

Getting ready

How to do it...

Configuring an IoT Edge device (cloud side)

Configuring an IoT Edge device (device side)

How it works...

Deploying ML modules to Edge devices

Getting ready

How to do it...

How it works...

There's more...

Setting up Kafka

Getting ready

How to do it...

How it works...

There's more...

Installing ML libraries on Databricks

Getting ready

How to do it...

Importing TensorFlow

Installing PyTorch

Installing GraphX and GraphFrames

How it works...

Handling Data

Storing data for analysis using Delta Lake

Getting ready

How to do it...

How it works...

Data collection design

Getting ready

How to do it...

Variance

Z-Spikes

Min/max

Windowing

Getting ready

How to do it...

Tumbling

Hopping

Sliding

How it works...

Exploratory factor analysis

Getting ready

How to do it...

Visual exploration

Chart types

Redundant sensors

Sample co-variance and correlation

How it works...

There's more...

Implementing analytic queries in Mongo/hot path storage

Getting ready

How to do it...

How it works...

Ingesting IoT data into Spark

Getting ready

How to do it...

How it works...

Machine Learning for IoT

Analyzing chemical sensors with anomaly detection

Getting ready

How to do it...

How it works...

There's more...

Logistic regression with the IoMT 

Getting ready

How to do it...

How it works...

There's more...

Classifying chemical sensors with decision trees

How to do it...

How it works...

There's more...

Simple predictive maintenance with XGBoost

Getting ready

How to do it...

How it works...

Detecting unsafe drivers

Getting ready

How to do it...

How it works...

There's more...

Face detection on constrained devices

Getting ready

How to do it...

How it works...

Deep Learning for Predictive Maintenance

Enhancing data using feature engineering

Getting ready

How to do it...

How it works...

There's more...

Using keras for fall detection

Getting ready

How to do it...

How it works...

There's more...

Implementing LSTM to predict device failure

Getting ready

How to do it...

How it works...

Deploying models to web services

Getting ready

How to do it...

How it works...

There's more...

Anomaly Detection

Using Z-Spikes on a Raspberry Pi and Sense HAT

Getting ready

How to do it...

How it works...

Using autoencoders to detect anomalies in labeled data

Getting ready

How to do it...

How it works...

There's more...

Using isolated forest for unlabeled datasets

Getting ready

How to do it...

How it works...

There's more...

Detecting time series anomalies with Luminol

Getting ready

How to do it...

How it works...

There's more...

Detecting seasonality-adjusted anomalies

Getting ready

How to do it...

How it works...

Detecting spikes with streaming analytics

Getting ready

How to do it...

How it works...

Detecting anomalies on the edge

Getting ready

How to do it...

How it works...

Computer Vision

Connecting cameras through OpenCV

Getting ready

How to do it...

How it works...

There's more...

Using Microsoft's custom vision to train and label your images

Getting ready

How to do it...

How it works...

Detecting faces with deep neural nets and Caffe

Getting ready

How to do it...

How it works...

Detecting objects using YOLO on Raspberry Pi 4

Getting ready

How to do it...

How it works...

Detecting objects using GPUs on NVIDIA Jetson Nano

Getting ready

How to do it...

How it works...

There's more...

Training vision with PyTorch on GPUs

Getting ready

How to do it...

How it works...

There's more...

NLP and Bots for Self-Ordering Kiosks

Wake word detection

Getting ready

How to do it...

How it works...

There's more...

Speech-to-text using the Microsoft Speech API

Getting ready

How to do it...

How it works...

Getting started with LUIS

Getting ready

How to do it...

How it works...

There's more...

Implementing smart bots

Getting ready

How to do it...

How it works...

There's more...

Creating a custom voice

Getting ready

How to do it...

How it works...

Enhancing bots with QnA Maker

Getting ready

How to do it...

How it works...

There's more...

Optimizing with Microcontrollers and Pipelines

Introduction to ESP32 with IoT 

Getting ready

How to do it...

How it works...

There's more...

Implementing an ESP32 environment monitor

Getting ready

How to do it...

How it works...

There's more...

Optimizing hyperparameters

Getting ready

How to do it...

How it works...

Dealing with BOM changes

Getting ready

How to do it...

How it works...

There's more...

Building machine learning pipelines with sklearn

Getting ready

How to do it...

How it works...

There's more...

Streaming machine learning with Spark and Kafka

Getting ready

How to do it...

How it works...

There's more...

Enriching data using Kafka's KStreams and KTables

Getting ready

How to do it...

How it works...

There's more...

Deploying to the Edge

OTA updating MCUs

Getting ready

How to do it...

How it works...

There's more...

Deploying modules with IoT Edge

Getting ready

Setting up our Raspberry Pi

Coding setup

How to do it...

How it works...

There's more...

Offloading to the web with TensorFlow.js

Getting ready

How to do it...

How it works...

There's more...

Deploying mobile models

Getting ready

How to do it...

How it works...

Maintaining your fleet with device twins

Getting ready

How to do it...

How it works...

There's more...

Enabling distributed ML with fog computing

Getting ready

How to do it...

How it works...

There's more...

About Packt

Why subscribe?

Other Books You May Enjoy

Packt is searching for authors like you

Leave a review - let other readers know what you think

Preface

Artificial intelligence (AI) is rapidly finding practical applications across a wide variety of industry verticals, and the Internet of Things (IoT) is one of them. Developers are looking for ways to make IoT devices smarter and to make users' lives easier. With this AI cookbook, you'll learn how to implement smart analytics using IoT data to gain insights, predict outcomes, and make informed decisions, and you'll explore advanced AI techniques that facilitate analytics and learning in various IoT applications.

Using a recipe-based approach, the book will take you through essential processes such as data collection, data analysis, modeling, statistics and monitoring, and deployment. You'll use real-life datasets from smart homes, industrial IoT, and smart devices to train and evaluate simple and complex models and make predictions using trained models. Later chapters will take you through the key challenges faced while implementing machine learning, deep learning, and other AI techniques such as natural language processing (NLP), computer vision, and embedded machine learning to build smart IoT systems. In addition to this, you'll learn how to deploy models and improve their performance with ease.

By the end of this book, you'll be able to package and deploy end-to-end AI apps and apply best practice solutions to common IoT problems.

Who this book is for

If you're an IoT practitioner looking to incorporate AI techniques to build smart IoT solutions without having to trawl through a lot of AI theory, this AI IoT book is for you. Data scientists and AI developers who want to build IoT-focused AI solutions will also find this book useful. Knowledge of the Python programming language and basic IoT concepts is required to grasp the concepts covered in this AI book effectively.

What this book covers

Chapter 1, Setting Up the IoT and AI Environment, will focus on getting the right environment set up for success. You will learn how to choose a device that meets your needs for AI, whether that model needs to be on the edge or in the cloud. You will also learn how to securely communicate with modules within a device, other devices, or the cloud. Finally, you will set up a way to ingest data in the cloud and then set up Spark and AI tools to perform analysis of data, train models, and run machine learning models at scale.

Chapter 2, Handling Data, talks about the basics of ensuring that data in any format can be used by data scientists effectively.

Chapter 3, Machine Learning for IoT, will discuss using machine learning models such as logistic regression and decision trees to solve common IoT issues such as classifying medical results, detecting unsafe drivers, and classifying chemical readings.

Chapter 4, Deep Learning for Predictive Maintenance, will focus on various classification techniques to enable IoT devices to be smart devices.

Chapter 5, Anomaly Detection, will explain how anomaly detection can lead to the discovery of issues that alarm detection does not classify, and how, if a device is acting in an anomalous way, you might want to send out a repair worker to examine the device.

Chapter 6, Computer Vision, will discuss implementing computer vision in the cloud as well as on edge devices such as NVIDIA Jetson Nano.

Chapter 7, NLP and Bots for Self-Ordering Kiosks, will discuss using NLP and bots to enable interaction with users ordering food at a restaurant kiosk.

Chapter 8, Optimizing with Microcontrollers and Pipelines, will discuss how reinforcement learning can be used with a smart traffic intersection to make traffic light decisions that decrease the wait time at traffic lights and allow traffic to flow better.

Chapter 9, Deploying to the Edge, will discuss various ways of applying pre-trained machine learning models to an edge device. This chapter will discuss IoT Edge in detail. Deploying is an important part of the AI pipeline. This chapter will also talk about deploying machine learning models to web applications and mobile using TensorFlow.js and ONNX.

To get the most out of this book

Readers should have a basic understanding of software development. This book uses the Python, C, and Java languages. A basic understanding of how to install libraries and packages in these languages, as well as basic coding concepts such as arrays and loops, will be helpful. A few websites that can help you brush up on the basics of these languages are:

https://www.learnpython.org/

https://www.learnjavaonline.org/

https://www.learn-c.org/

To get the most out of this book, a basic understanding of machine learning principles will be beneficial. The hardware used in this book consists of off-the-shelf sensors and common IoT development kits, which can be purchased from sites such as Adafruit.com and Amazon.com. Most of the code is portable across devices. Device code written in Python can be easily ported to a variety of microprocessor-based devices such as a Raspberry Pi, NVIDIA Jetson, or LattePanda, or sometimes even a PC, while code written in C can be ported to a variety of microcontrollers such as the ESP32, ESP8266, and Arduino. Code written in Java can be ported to any Android device, such as a tablet or phone.

This book uses Databricks for some of the experiments. Databricks has a free version at https://community.cloud.databricks.com.

If you are using the digital version of this book, we advise you to type the code yourself or access the code via the GitHub repository (link available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Artificial-Intelligence-for-IoT-Cookbook. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781838981983_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "This will give you a list of the running containers. Then, open the /data folder."

A block of code is set as follows:

import numpy as np
import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms, models
from torch.utils.data.sampler import SubsetRandomSampler

Any command-line input or output is written as follows:

cd jetson-inference

mkdir build

cd build

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Click on the New project tile. Then, fill out the Create new project wizard."

Warnings or important notes appear like this.
Tips and tricks appear like this.

Sections

In this book, you will find several headings that appear frequently (Getting ready, How to do it..., How it works..., There's more..., and See also).

To give clear instructions on how to complete a recipe, use these sections as follows:

Getting ready

This section tells you what to expect in the recipe and describes how to set up any software or any preliminary settings required for the recipe.

How to do it…

This section contains the steps required to follow the recipe.

How it works…

This section usually consists of a detailed explanation of what happened in the previous section.

There's more…

This section consists of additional information about the recipe in order to make you more knowledgeable about the recipe.

See also

This section provides helpful links to other useful information for the recipe.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.

Setting Up the IoT and AI Environment

The Internet of Things (IoT) and artificial intelligence (AI) are having a dramatic impact on people's lives. Industries such as medicine are being revolutionized by wearable sensors that can monitor patients after they leave the hospital. Machine learning (ML) on industrial devices is leading to better monitoring and less downtime through techniques such as anomaly detection, predictive maintenance, and prescriptive actions.

Building an IoT device capable of delivering results relies on gathering the right information. This book gives recipes that support the end-to-end IoT/ML life cycle. The next chapter has recipes for making sure that devices have the right sensors and that the data is the best it can be for ML outcomes, using tools such as exploratory factor analysis and data collection design.

This chapter will cover the following topics:

Choosing a device

Setting up Databricks

The following recipes will be covered:

Setting up IoT Hub

Setting up an IoT Edge device

Deploying ML modules to Edge devices

Setting up Kafka

Installing ML libraries on Databricks

Choosing a device

Before starting with the classic recipe-by-recipe formatting of a cookbook, we'll cover a couple of base topics. Choosing the right hardware sets the stage for AI. Working with IoT means working with constraints. Using ML in the cloud is often a cost-effective solution as long as the data is small. Image, video, and sound data will often bog down networks. Worse yet, if you are using a cellular network, sending that data can be highly expensive. The adage "there is no money in hardware" refers to the fact that most of the money made from IoT comes from selling services, not from producing expensive devices.

Dev kits  

Often, companies have their devices designed by electrical engineers. This is a cost-effective option because custom boards do not carry extra components, such as unnecessary Bluetooth or extra USB ports. However, predicting the CPU and RAM requirements of an ML model at board design time is difficult. Starter kits can be useful tools until the hardware requirements are understood. The following boards are among the most widely adopted on the market:

Manifold 2-C with NVIDIA TX2

The i.MX series

LattePanda

Raspberry Pi Class

Arduino

ESP8266

These boards are often used as reference points on a scale of functionality. A Raspberry Pi Class device, for example, would struggle with custom vision applications but would do great for audio or general ML applications. One determining factor for many data scientists is the programming language. The ESP8266 and Arduino need to be programmed in a low-level language such as C or C++, while devices that are Raspberry Pi Class or above can be programmed in almost any language.

Different devices come at different prices and functionalities. Devices that are Raspberry Pi Class or above can handle ML running on the Edge, reducing cloud cost but increasing the cost of the device. Deciding on whether you are billing your customers with a one-time price for the device or a subscription model may help you determine what type of device you need.
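The trade-off between device cost and cloud cost can be made concrete with a little arithmetic. The following is a minimal sketch, not a pricing model: the device prices are the ones quoted in this chapter ($35 for a Raspberry Pi Class device and $5 for an ESP8266), while the monthly cloud cost per device and the function name break_even_months are hypothetical values chosen purely for illustration. It also assumes, as a simplification, that running the model on the Edge removes the cloud scoring cost entirely.

```python
def break_even_months(edge_device_cost, cloud_device_cost, monthly_cloud_cost_per_device):
    """Months after which the pricier Edge device has paid for itself.

    Assumes (for illustration only) that running ML on the Edge removes
    the per-device cloud scoring cost completely.
    """
    if monthly_cloud_cost_per_device <= 0:
        raise ValueError("monthly cloud cost must be positive")
    extra_hardware_cost = edge_device_cost - cloud_device_cost
    return extra_hardware_cost / monthly_cloud_cost_per_device


# Device prices quoted in this chapter.
raspberry_pi_price = 35.0
esp8266_price = 5.0

# Hypothetical cloud cost of scoring one device's data for a month.
assumed_monthly_cloud_cost = 2.50

months = break_even_months(raspberry_pi_price, esp8266_price, assumed_monthly_cloud_cost)
print(f"Edge device pays for itself after {months:.1f} months")
```

Under these assumed numbers, a product sold for a one-time price and expected to run for years favors the Edge device, whereas a short-lived deployment or a subscription that already covers cloud costs may favor the cheaper one.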

Manifold 2-C with NVIDIA TX2

The NVIDIA Jetson is one of the best choices for running complex ML workloads, such as real-time video analysis, on the Edge. It comes with a built-in NVIDIA GPU. The Manifold version of the product is designed to fit onto a DJI drone and perform tasks such as image recognition or self-flying. The main downside of the NVIDIA Jetson is its use of the ARM64 architecture: at the time of writing, TensorFlow did not work well on ARM64, although other libraries such as PyTorch work fine. The Manifold retails for $500, which makes it a high-price option, but this is often necessary when doing real-time ML on the Edge:

Price | Typical Models | Use Cases
$500 | Reinforcement learning, computer vision | Self-flying drones, robotics

The i.MX series

The i.MX series of chips is open source and boasts impressive RAM and CPU capabilities. The open design helps engineers build boards easily. The i.MX series is made by Freescale Semiconductor (now part of NXP), which guarantees production runs of 10 to 15 years, so a board design will remain stable for years. The i.MX 6 costs between $200 and $300 and can easily handle CPU-intensive tasks such as object recognition in live streaming video:

Price | Typical Models | Use Cases
$200+ | Computer vision, NLP | Sentiment analysis, face recognition, object recognition, voice recognition

LattePanda 

Single Board Computers (SBCs) such as the LattePanda are capable of running heavy sensor workloads. These devices can often run Windows or Linux. Like the i.MX series, they are capable of running object recognition on the device; however, the frame rate for recognizing objects can be slow:

Price | Typical Models | Use Cases
$100+ | Face detection, voice recognition, high-speed Edge models | Audio-enabled kiosk, high-frequency heart monitoring

Raspberry Pi Class

Raspberry Pis are a standard starter kit for IoT. With their $35 price tag, they give you a lot of capability for the cost: they can run ML on the Edge with containers. They run Linux or Windows IoT Core, which makes it easy to plug in components, and they are backed by a community of developers building tools for the platform. Although Raspberry Pi Class devices are capable of handling most ML tasks, they tend to have performance issues on some of the more intensive tasks, such as video recognition:

Price | Typical Models | Use Cases
$35 | Decision trees, artificial neural networks, anomaly detection | Smart home, industrial IoT

Arduino

At $15, the Arduino is a cost-effective solution. Arduino is supported by a large community and uses the Arduino language, a set of C/C++ functions. If you need to run ML models on an Arduino device, it is possible to package ML models built on popular frameworks such as PyTorch into the Embedded Learning Library (ELL). The ELL allows ML models to be deployed on the device without needing the overhead of a large operating system. Porting ML models using ELL or TensorFlow Lite can be challenging due to the limited memory and compute capacity of the Arduino:

Price | Typical Models | Use Cases
$15 | Linear regression | Sensor reading classification

ESP8266 

At under $5, devices such as the ESP8266 and smaller represent a class of devices that take data in and transmit it to the cloud for ML evaluations. Besides being inexpensive, they are also often low-power devices, so they can be powered by solar power, network power, or a long-life battery:

Price | Typical Models | Use Cases
$5 or below | In the cloud only | In the cloud only

Setting up Databricks

Processing very large amounts of data is often impractical on a single computer. That is where distributed systems such as Apache Spark come in. Spark was created at UC Berkeley by the team that later founded Databricks, and it allows you to parallelize large workloads over many computers.

Spark's early development was motivated in part by the Netflix Prize, a competition that offered $1 million to the team that built the best recommendation engine. Spark uses distributed computing to wrangle large and complex datasets. There are distributed equivalents of popular Python libraries, such as Koalas, which is a distributed equivalent of pandas. Spark also supports analytics and feature engineering that require a large amount of compute and memory, such as graph theory problems. Spark has two modes: a batch mode for training on large datasets and a streaming mode for scoring data in near real time.
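The idea of parallelizing a workload can be sketched on a single machine. The example below is not Spark code; it is a minimal illustration, using only the Python standard library, of the pattern a distributed engine applies across many computers: partition the data, compute a partial result per partition, then combine the partials. The function names and the sample readings are assumptions made for this sketch, and threads stand in for cluster workers (in CPython they do not give real CPU parallelism).

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_parts):
    """Split records into n_parts roughly equal chunks."""
    return [records[i::n_parts] for i in range(n_parts)]


def partial_stats(chunk):
    """Per-partition work: return (count, sum) so results can be merged later."""
    return len(chunk), sum(chunk)


def distributed_mean(records, n_parts=4):
    """Compute a mean by combining partial results from each partition."""
    chunks = partition(records, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(partial_stats, chunks))
    total_count = sum(count for count, _ in partials)
    total_sum = sum(subtotal for _, subtotal in partials)
    return total_sum / total_count


# Hypothetical temperature readings standing in for a large dataset.
readings = [20.0 + (i % 10) * 0.5 for i in range(1000)]
mean = distributed_mean(readings)
print(mean)
```

Each worker returns a count and a sum instead of a mean because counts and sums can be merged exactly, whereas averaging per-partition means would be wrong whenever partitions differ in size.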

IoT data tends to be large and imbalanced. A device may have 10 years of data showing it is running in normal conditions and only a few records showing it needs to be shut down immediately to prevent damage. The value of Databricks in IoT is twofold. The first is working with data and training models. Working with data at the terabyte and petabyte scale can overwhelm a single machine. Databricks solves this with its ability to scale out. The second is its streaming capabilities. ML models can be run in the cloud in near real time. Messages can then be pushed back down to the device.
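To see what this imbalance means for training, here is a minimal sketch of one common response: weighting each class inversely to its frequency so that the rare failure records are not drowned out. The label counts are hypothetical, and balanced_class_weights is an illustrative helper rather than a function from this book; the formula n_samples / (n_classes * class_count) is the widely used "balanced" heuristic.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical telemetry labels: years of normal readings, a handful of failures.
labels = ["normal"] * 9990 + ["failure"] * 10

weights = balanced_class_weights(labels)
for cls, weight in weights.items():
    print(f"{cls}: weight {weight:.4f}")
```

Many training APIs accept a mapping like this as per-class weights; whether weighting, resampling, or an anomaly detection approach works best depends on the dataset.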

Setting up Databricks is fairly straightforward. You can either go to your cloud provider and sign up for an account in the portal or sign up for the free community edition. If you are taking your product to production, then you should definitely sign up with Azure, AWS, or Google Cloud.

IoT and ML are fundamentally a big data problem. A device may send telemetry for years before it sends telemetry that would indicate an issue with the device. Searching through millions or billions of records to find the few records that are needed can be challenging from a data management perspective. Therefore, optimal data storage is key.

Storing data

Today, there are tools that make it easy to work with large amounts of data. There are a few things to remember though. There are optimal ways of storing data at scale that can make dealing with large datasets easier.

Working with the kind of large datasets that come from IoT devices can be prohibitively expensive for many companies. Storing data in Delta Lake, for example, can give the user a 340-times performance boost over accessing the same data as JSON. The next three sections will introduce three storage methods that can cut down a data analytics job from weeks to hours.

Parquet

Parquet is one of the most common file formats in big data. Parquet's columnar storage format allows it to store highly compressed data. Its advantage is that it takes up less space on the hard disk and takes up less network bandwidth, making it ideal for loading into a DataFrame. Parquet ingestion into Spark has been benchmarked at 34 times the speed of JSON.
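The effect of a columnar layout on size can be illustrated without any Parquet tooling. The sketch below is not the Parquet format; it simply stores the same hypothetical telemetry row by row and column by column as JSON, compresses both with zlib, and prints the sizes. The field names, device IDs, and readings are assumptions made for this example.

```python
import json
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical telemetry: 1,000 readings from 5 devices.
rows = [
    {
        "device_id": f"dev-{i % 5}",
        "temperature": round(rng.gauss(21.0, 1.5), 1),
        "status": "OK",
    }
    for i in range(1000)
]

# Row-oriented layout: one JSON object per reading, keys repeated every time.
row_bytes = json.dumps(rows).encode("utf-8")

# Column-oriented layout: each field's values stored together, keys stored once.
columns = {key: [row[key] for row in rows] for key in rows[0]}
col_bytes = json.dumps(columns).encode("utf-8")

row_compressed = zlib.compress(row_bytes)
col_compressed = zlib.compress(col_bytes)

print("row-oriented:   ", len(row_bytes), "bytes raw,", len(row_compressed), "compressed")
print("column-oriented:", len(col_bytes), "bytes raw,", len(col_compressed), "compressed")
```

Grouping values of the same field together puts similar data side by side, which is what lets a compressor do better. Real Parquet files add typed encodings and per-column statistics on top of this idea.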

Avro

The Avro format is a popular storage format for IoT. While it does not have the high compression ratio that Parquet does, it is less computationally expensive to write because it stores data row by row. Avro is a common format for data coming from streaming platforms such as IoT Hub or Kafka.

Delta Lake

Delta Lake is an open source project released by Databricks in 2019. It stores data as Parquet files. In addition, it keeps track of each data check-in, enabling the data scientist to look at the data as it existed at a given time. This can be useful when trying to determine why the accuracy of a particular ML model drifted. It also keeps metadata about the data, giving it a 10-times performance increase over standard Parquet for analytics workloads.
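The idea of reading data as it existed at an earlier point can be sketched with a toy in-memory table. This is not the Delta Lake API or its storage design (Delta Lake records changes in a transaction log alongside Parquet files rather than copying full snapshots); VersionedTable and its methods are hypothetical names used only to illustrate versioned reads.

```python
import copy


class VersionedTable:
    """Toy append-only table that keeps one snapshot per commit."""

    def __init__(self):
        self._snapshots = [[]]  # version 0 is the empty table

    def commit(self, new_rows):
        """Append rows as a new version and return its version number."""
        snapshot = copy.deepcopy(self._snapshots[-1]) + copy.deepcopy(new_rows)
        self._snapshots.append(snapshot)
        return len(self._snapshots) - 1

    def read(self, version=None):
        """Return the rows as of a version (the latest if omitted)."""
        if version is None:
            version = len(self._snapshots) - 1
        return copy.deepcopy(self._snapshots[version])


table = VersionedTable()
v1 = table.commit([{"device_id": "dev-1", "temperature": 21.5}])
v2 = table.commit([{"device_id": "dev-1", "temperature": 48.0}])

# Compare the data a model was trained on (v1) with what exists now.
print("rows at training time:", table.read(v1))
print("rows today:           ", table.read())
```

Being able to reproduce the exact training data for a given model version is what makes this kind of versioning useful when investigating drift.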

Choosing a device and setting up Databricks have been covered as background topics; the rest of this chapter will follow a modular, recipe-based format.

Setting up IoT Hub