Machine Learning with the Elastic Stack. - Rich Collier - E-Book

Description

Elastic Stack, previously known as the ELK stack, is a log analysis solution that helps users ingest, process, and analyze search data effectively. With the addition of machine learning, a key commercial feature, the Elastic Stack makes this process even more efficient. This updated second edition of Machine Learning with the Elastic Stack provides a comprehensive overview of Elastic Stack's machine learning features for both time series data analysis as well as for classification, regression, and outlier detection.
The book starts by explaining machine learning concepts in an intuitive way. You'll then perform time series analysis on different types of data, such as log files, network flows, application metrics, and financial data. As you progress through the chapters, you'll deploy machine learning within Elastic Stack for logging, security, and metrics. Finally, you'll discover how data frame analysis opens up a whole new set of use cases that machine learning can help you with.
By the end of this Elastic Stack book, you'll have hands-on machine learning and Elastic Stack experience, along with the knowledge you need to incorporate machine learning in your distributed search and data analysis platform.

You can read this e-book in the Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Page count: 441

Year of publication: 2021




Machine Learning with the Elastic Stack

Second Edition

Gain valuable insights from your data with Elastic Stack's machine learning features

Rich Collier

Camilla Montonen

Bahaaldine Azarmi

BIRMINGHAM—MUMBAI

Machine Learning with the Elastic Stack

Second Edition

Copyright © 2021 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Kunal Parikh

Publishing Product Manager: Devika Battike

Senior Editor: David Sugarman

Content Development Editor: Joseph Sunil

Technical Editor: Devanshi Ayare

Copy Editor: Safis Editing

Project Coordinator: Aparna Nair

Proofreader: Safis Editing

Indexer: Manju Arasan

Production Designer: Alishon Mendonca

First published: January 2019

Second edition: May 2021

Production reference: 1270521

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80107-003-4

www.packt.com

Contributors

About the authors

Rich Collier is a solutions architect at Elastic. Joining the Elastic team from the Prelert acquisition, Rich has over 20 years of experience as a solutions architect and pre-sales systems engineer for software, hardware, and service-based solutions. Rich's technical specialties include big data analytics, machine learning, anomaly detection, threat detection, security operations, application performance management, web applications, and contact center technologies. Rich is based in Boston, Massachusetts.

Camilla Montonen is a senior machine learning engineer at Elastic.

Bahaaldine Azarmi, or Baha for short, is a solutions architect at Elastic. Prior to this position, Baha co-founded ReachFive, a marketing data platform focused on user behavior and social analytics. Baha also worked for different software vendors such as Talend and Oracle, where he held solutions architect and architect positions. Before Machine Learning with the Elastic Stack, Baha authored books including Learning Kibana 5.0, Scalable Big Data Architecture, and Talend for Big Data. Baha is based in Paris and has an MSc in computer science from Polytech Paris.

About the reviewers

Apoorva Joshi is currently a security data scientist at Elastic (previously Elasticsearch) where she works on using machine learning for malware detection on endpoints. Prior to Elastic, she was a research scientist at FireEye where she applied machine learning to problems in email security. She has a diverse engineering background with a bachelor's in electrical engineering and a master's in computer engineering (with a machine learning focus).

Lijuan Zhong is an experienced Elastic and cloud engineer. She has a master's degree in information technology and nearly 20 years of working experience in IT and telecom, and now works with Elastic's major partner in Sweden, Netnordic. She began her journey with Elastic in 2019 and became an Elastic certified engineer. She has also completed the machine learning course by Stanford University. She has led many Elastic and machine learning POCs and projects, and customers have been extremely satisfied with the outcomes. She has been the co-organizer of the Elastic Stockholm meetup since 2020. She took part in the Elastic community conference 2021 and gave a talk about machine learning with the Elastic Stack. She was awarded the Elastic bronze contributor award in 2021.

Table of Contents

Preface

Section 1 – Getting Started with Machine Learning with Elastic Stack

Chapter 1: Machine Learning for IT

Overcoming the historical challenges in IT

Dealing with the plethora of data

The advent of automated anomaly detection

Unsupervised versus supervised ML

Using unsupervised ML for anomaly detection

Defining unusual

Learning what's normal

Probability models

Learning the models

De-trending

Scoring of unusualness

The element of time

Applying supervised ML to data frame analytics

The process of supervised learning

Summary

Chapter 2: Enabling and Operationalization

Technical requirements

Enabling Elastic ML features

Enabling ML on a self-managed cluster

Enabling ML in the cloud – Elasticsearch Service

Understanding operationalization

ML nodes

Jobs

Bucketing data in a time series analysis

Feeding data to Elastic ML

The supporting indices

Anomaly detection orchestration

Anomaly detection model snapshots

Summary

Section 2 – Time Series Analysis – Anomaly Detection and Forecasting

Chapter 3: Anomaly Detection

Technical requirements

Elastic ML job types

Dissecting the detector

The function

The field

The partition field

The by field

The over field

The "formula"

Exploring the count functions

Other counting functions

Detecting changes in metric values

Metric functions

Understanding the advanced detector functions

rare

Frequency rare

Information content

Geographic

Time

Splitting analysis along categorical features

Setting the split field

The difference between splitting using partition and by_field

Understanding temporal versus population analysis

Categorization analysis of unstructured messages

Types of messages that are good candidates for categorization

The process used by categorization

Analyzing the categories

Categorization job example

When to avoid using categorization

Managing Elastic ML via the API

Summary

Chapter 4: Forecasting

Technical requirements

Contrasting forecasting with prophesying

Forecasting use cases

Forecasting theory of operation

Single time series forecasting

Looking at forecast results

Multiple time series forecasting

Summary

Chapter 5: Interpreting Results

Technical requirements

Viewing the Elastic ML results index

Anomaly scores

Bucket-level scoring

Normalization

Influencer-level scoring

Influencers

Record-level scoring

Results index schema details

Bucket results

Record results

Influencer results

Multi-bucket anomalies

Multi-bucket anomaly example

Multi-bucket scoring

Forecast results

Querying for forecast results

Results API

Results API endpoints

Getting the overall buckets API

Getting the categories API

Custom dashboards and Canvas workpads

Dashboard "embeddables"

Anomalies as annotations in TSVB

Customizing Canvas workpads

Summary

Chapter 6: Alerting on ML Analysis

Technical requirements

Understanding alerting concepts

Anomalies are not necessarily alerts

In real-time alerting, timing matters

Building alerts from the ML UI

Defining sample anomaly detection jobs

Creating alerts against the sample jobs

Simulating some real-time anomalous behavior

Receiving and reviewing the alerts

Creating an alert with a watch

Understanding the anatomy of the legacy default ML watch

Custom watches can offer some unique functionality

Summary

Chapter 7: AIOps and Root Cause Analysis

Technical requirements

Demystifying the term "AIOps"

Understanding the importance and limitations of KPIs

Moving beyond KPIs

Organizing data for better analysis

Custom queries for anomaly detection datafeeds

Data enrichment on ingest

Leveraging the contextual information

Analysis splits

Statistical influencers

Bringing it all together for RCA

Outage background

Correlation and shared influencers

Summary

Chapter 8: Anomaly Detection in Other Elastic Stack Apps

Technical requirements

Anomaly detection in Elastic APM

Enabling anomaly detection for APM

Viewing the anomaly detection job results in the APM UI

Creating ML Jobs via the data recognizer

Anomaly detection in the Logs app

Log categories

Log anomalies

Anomaly detection in the Metrics app

Anomaly detection in the Uptime app

Anomaly detection in the Elastic Security app

Prebuilt anomaly detection jobs

Anomaly detection jobs as detection alerts

Summary

Section 3 – Data Frame Analysis

Chapter 9: Introducing Data Frame Analytics

Technical requirements

Learning how to use transforms

Why are transforms useful?

The anatomy of a transform

Using transforms to analyze e-commerce orders

Exploring more advanced pivot and aggregation configurations

Discovering the difference between batch and continuous transforms

Analyzing social media feeds using continuous transforms

Using Painless for advanced transform configurations

Introducing Painless

Working with Python and Elasticsearch

A brief tour of the Python Elasticsearch clients

Summary

Further reading

Chapter 10: Outlier Detection

Technical requirements

Discovering the four techniques used for outlier detection

Understanding feature influence

How does outlier detection differ from anomaly detection?

Applying outlier detection in practice

Evaluating outlier detection with the Evaluate API

Hyperparameter tuning for outlier detection

Summary

Chapter 11: Classification Analysis

Technical requirements

Classification: from data to a trained model

Feature engineering

Evaluating the model

Taking your first steps with classification

Classification under the hood: gradient boosted decision trees

Introduction to decision trees

Gradient boosted decision trees

Hyperparameters

Interpreting results

Summary

Further reading

Chapter 12: Regression

Technical requirements

Using regression analysis to predict house prices

Using decision trees for regression

Summary

Further reading

Chapter 13: Inference

Technical requirements

Examining, exporting, and importing your trained models with the Trained Models API

A tour of the Trained Models API

Exporting and importing trained models with the Trained Models API and Python

Understanding inference processors and ingest pipelines

Handling missing or corrupted data in ingest pipelines

Using inference processor configuration options to gain more insight into your predictions

Importing external models into Elasticsearch using eland

Learning about supported external models in eland

Training a scikit-learn DecisionTreeClassifier and importing it into Elasticsearch using eland

Summary

Appendix: Anomaly Detection Tips

Technical requirements

Understanding influencers in split versus non-split jobs

Using one-sided functions to your advantage

Ignoring time periods

Ignoring an upcoming (known) window of time

Ignoring an unexpected window of time, after the fact

Using custom rules and filters to your advantage

Creating custom rules

Benefiting from custom rules for a "top-down" alerting philosophy

Anomaly detection job throughput considerations

Avoiding the over-engineering of a use case

Using anomaly detection on runtime fields

Summary

Why subscribe?

Other Books You May Enjoy

Preface

Elastic Stack, previously known as the ELK Stack, is a log analysis solution that helps users ingest, process, and analyze search data effectively. With the addition of machine learning, a key commercial feature, the Elastic Stack makes this process even more efficient. This updated second edition of Machine Learning with the Elastic Stack provides a comprehensive overview of Elastic Stack's machine learning features for both time series data analysis as well as classification, regression, and outlier detection.

The book starts by explaining machine learning concepts in an intuitive way. You'll then perform time series analysis on different types of data, such as log files, network flows, application metrics, and financial data. As you progress through the chapters, you'll deploy machine learning within the Elastic Stack for logging, security, and metrics. Finally, you'll discover how data frame analysis opens up a whole new set of use cases that machine learning can help you with.

By the end of this Elastic Stack book, you'll have hands-on machine learning and Elastic Stack experience, along with the knowledge you need to incorporate machine learning into your distributed search and data analysis platform.

Who this book is for

If you're a data professional looking to gain insights into Elasticsearch data without having to rely on a machine learning specialist or custom development, then this Elastic Stack machine learning book is for you. You'll also find this book useful if you want to integrate machine learning with your observability, security, and analytics applications. Working knowledge of the Elastic Stack is needed to get the most out of this book.

What this book covers

Chapter 1, Machine Learning for IT, acts as an introductory and background primer on the historical challenges of manual data analysis in IT and security operations. This chapter also provides a comprehensive overview of the theory of operation of Elastic machine learning, so that you gain an intuitive understanding of what is happening under the hood.

Chapter 2, Enabling and Operationalization, explains how to enable the machine learning capabilities of the Elastic Stack and details the theory of operation of the Elastic machine learning algorithms. It also describes the logistical operation of Elastic machine learning in detail.

Chapter 3, Anomaly Detection, goes into detail regarding the unsupervised automated anomaly detection techniques that are at the heart of time series analysis.

Chapter 4, Forecasting, explains how Elastic machine learning's sophisticated time series models can be used for more than just anomaly detection. Forecasting capabilities enable users to extrapolate trends and behaviors into the future so as to assist with use cases such as capacity planning.

Chapter 5, Interpreting Results, explains how to fully understand the results of anomaly detection and forecasting and use them to your advantage in visualizations, dashboards, and infographics.

Chapter 6, Alerting on ML Analysis, explains the different techniques for integrating the proactive notification capability of Elastic alerting with the insights uncovered by machine learning in order to make anomaly detection even more actionable.

Chapter 7, AIOps and Root Cause Analysis, explains how using Elastic machine learning to holistically inspect and analyze data from disparate data sources in correlated views gives the analyst a leg up over legacy approaches.

Chapter 8, Anomaly Detection in Other Elastic Stack Apps, explains how anomaly detection is leveraged by other apps within the Elastic Stack to bring added value to data analysis.

Chapter 9, Introducing Data Frame Analytics, covers the concepts of data frame analytics, how it is different from time series anomaly detection, and what tools are available to the user to load, prepare, transform, and analyze data with Elastic machine learning.

Chapter 10, Outlier Detection, covers the outlier detection analysis capabilities of data frame analytics along with Elastic machine learning.

Chapter 11, Classification Analysis, covers the classification analysis capabilities of data frame analytics along with Elastic machine learning.

Chapter 12, Regression, covers the regression analysis capabilities of data frame analytics along with Elastic machine learning.

Chapter 13, Inference, covers the usage of trained machine learning models for "inference" – to actually predict output values in an operationalized manner.

Appendix: Anomaly Detection Tips, includes a variety of practical advice topics that didn't quite fit in other chapters. These useful tidbits will help you to get the most out of Elastic ML.

To get the most out of this book

You will need a system with a good internet connection and an Elastic account.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Machine-Learning-with-Elastic-Stack-Second-Edition. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781801070034_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "The analysis can also be split along categorical fields by setting partition_field_name."

A block of code is set as follows:

18/05/2020 15:16:00 DB Not Updated [Master] Table

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

export DATABRICKS_AAD_TOKEN=<azure-ad-token>

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Let's now click the View results button to investigate in detail what the anomaly detection job has found in the data."

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.

Section 1 – Getting Started with Machine Learning with Elastic Stack

This section provides an intuitive understanding of the way Elastic ML works – from the perspective of not only what the algorithms are doing but also the logistics of the operation of the software within the Elastic Stack.

This section covers the following chapters:

Chapter 1, Machine Learning for IT

Chapter 2, Enabling and Operationalization

Chapter 2: Enabling and Operationalization

We have just learned the basics of what Elastic ML is doing to accomplish both unsupervised automated anomaly detection and supervised data frame analysis. Now it is time to get detailed about how Elastic ML works inside the Elastic Stack (Elasticsearch and Kibana).

This chapter will focus on both the installation (really, the enablement) of Elastic ML features and a detailed discussion of the logistics of the operation, especially with respect to anomaly detection. Specifically, we will cover the following topics:

Enabling Elastic ML features

Understanding operationalization

Technical requirements

The information in this chapter is based on the Elastic Stack as it exists in v7.10 and the workflow of the Elasticsearch Service of Elastic Cloud as of November 2020.

Enabling Elastic ML features

The process for enabling Elastic ML features inside the Elastic Stack is slightly different depending on whether you are using a self-managed cluster or the Elasticsearch Service (ESS) of Elastic Cloud. In short, on a self-managed cluster, the features of ML are enabled via a license key (either a commercial key or a trial key). In ESS, a dedicated ML node needs to be provisioned within the cluster in order to utilize Elastic ML. In the following sections, we will explain the details of how this is accomplished in both scenarios.
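As a rough illustration of the "dedicated ML node" requirement, the following Python sketch lists the nodes that report the ml role in a node info (GET /_nodes) response. The sample response, the node names, and the base_url parameter are assumptions made for illustration, and a secured cluster would also need credentials that this sketch does not handle.

```python
import json
from urllib import request


def ml_node_names(nodes_info: dict) -> list:
    """Return the sorted names of nodes whose roles include "ml"."""
    return sorted(
        node["name"]
        for node in nodes_info.get("nodes", {}).values()
        if "ml" in node.get("roles", [])
    )


def fetch_nodes_info(base_url: str) -> dict:
    """Call GET /_nodes on a running cluster (base_url is an assumption)."""
    with request.urlopen(f"{base_url.rstrip('/')}/_nodes") as resp:
        return json.load(resp)


# Hand-written sample shaped like a node info response (illustrative only).
sample = {
    "nodes": {
        "a1": {"name": "node-data-0", "roles": ["data", "ingest", "master"]},
        "b2": {"name": "node-ml-0", "roles": ["ml"]},
    }
}

print(ml_node_names(sample))
```

An empty list from ml_node_names would suggest that no node is available to run ML jobs, which is the situation the ESS provisioning step is meant to avoid.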

Enabling ML on a self-managed cluster

If you have a self-managed cluster built from Elastic's default distributions of Elasticsearch and Kibana (available at elastic.co/downloads/), enabling Elastic ML features via a license key is very simple. Be sure not to use the Apache 2.0 licensed open source distributions, which do not contain the X-Pack code base.

Elastic ML, unlike the bulk of the capabilities of the Elastic Stack, is not free – it requires a commercial (specifically, a Platinum level) license. It is, however, open source in the sense that the source code is available on GitHub (github.com/elastic/ml-cpp), where users can look at the code, file issues, make comments, or even submit pull requests. Even so, the usage of Elastic ML is governed by a commercial agreement with Elastic, the company.

When Elastic ML was first released (back in the v5.x days), it was part of the closed source features known as X-Pack that required a separate installation step. However, as of version 6.3, the code of X-Pack was "opened" (elastic.co/what-is/open-x-pack) and folded into the default distribution of Elasticsearch and Kibana. Therefore, a separate X-Pack installation step was no longer necessary, just the "enablement" of the features via a commercial license (or a trial license).

The installation procedure for Elasticsearch and Kibana themselves is beyond the scope of this book, but it is easily accomplished by following the online documentation on the Elastic website (available at elastic.co/guide/).

Once Elasticsearch and Kibana are running, navigate to the Stack option from the left-side navigation menu and select License Management. You will see a screen like the following:

Figure 2.1 – The License management screen in Kibana

Notice that, by default, the license level applied is the free Basic tier. This enables you to use some of the advanced features not found in the Apache 2.0 licensed open source distribution, or on third-party services (such as the Amazon Elasticsearch Service). A handy guide for comparing the features that exist at the different license levels can be found on the Elastic website at elastic.co/subscriptions.
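The license level shown on this screen can also be read from the Elasticsearch license API (GET /_license). The Python sketch below is a minimal illustration of interpreting such a response; the helper names and the sample dictionaries are hypothetical, and the set of accepted license types covers only the tiers named in this chapter (Platinum and trial), so any other qualifying tier would need to be added.

```python
# License types that, per the text, unlock Elastic ML (an assumption:
# only the tiers named in this chapter are listed here).
ML_LICENSE_TYPES = {"platinum", "trial"}


def license_summary(license_response: dict) -> str:
    """Format the type and status found in a GET /_license response."""
    lic = license_response.get("license", {})
    return f"{lic.get('type', 'unknown')} ({lic.get('status', 'unknown')})"


def license_allows_ml(license_response: dict) -> bool:
    """True when the license is active and of a type listed above."""
    lic = license_response.get("license", {})
    return lic.get("status") == "active" and lic.get("type") in ML_LICENSE_TYPES


# Hand-written samples shaped like GET /_license responses (illustrative only).
basic = {"license": {"status": "active", "type": "basic"}}
trial = {"license": {"status": "active", "type": "trial"}}

print(license_summary(basic), license_allows_ml(basic))
print(license_summary(trial), license_allows_ml(trial))
```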

As previously stated, Elastic ML requires a Platinum tier license. If you have purchased a Platinum license from Elastic, you can apply that license by clicking on the Update license button, as shown on the screen in Figure 2.1. If you do not have a Platinum license, you can start a free 30-day trial by clicking the Start my trial button to enable Elastic ML and the other Platinum features (assuming you agree to the license terms and conditions):

Figure 2.2 – Starting a free 30-day trial

Once this is complete, the licensing screen will indicate that you are now in an active trial of the Platinum features of the Elastic Stack:

Figure 2.3 – Trial license activated

Once this is done, you can start to use Elastic ML right away. Additional configuration steps are needed to take advantage of the other Platinum features, but those steps are outside the scope of this book. Consult the Elastic documentation for further assistance on configuring those features.
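The same trial can be started without the Kibana UI by calling the start trial API (POST /_license/start_trial?acknowledge=true), and the result can be checked with GET /_xpack, which reports whether the ml feature is available and enabled. The following Python sketch shows one way this might look; the function names, the localhost URL, and the sample response are assumptions for illustration, and authentication is deliberately left out.

```python
import json
from urllib import request


def start_trial_request(base_url: str) -> request.Request:
    """Build (but do not send) the POST request that starts a trial license."""
    url = f"{base_url.rstrip('/')}/_license/start_trial?acknowledge=true"
    return request.Request(url, method="POST")


def ml_available(xpack_info: dict) -> bool:
    """Interpret a GET /_xpack response: is ML both available and enabled?"""
    ml = xpack_info.get("features", {}).get("ml", {})
    return bool(ml.get("available")) and bool(ml.get("enabled"))


def send(req: request.Request) -> dict:
    """Send a prepared request to a running cluster and decode the JSON body."""
    with request.urlopen(req) as resp:
        return json.load(resp)


# Assumed local cluster address; nothing is sent until send() is called.
req = start_trial_request("http://localhost:9200/")
print(req.get_method(), req.full_url)

# Hand-written sample shaped like a GET /_xpack response (illustrative only).
sample_xpack = {"features": {"ml": {"available": True, "enabled": True}}}
print(ml_available(sample_xpack))
```

Keeping request construction separate from sending makes it possible to inspect the call before it touches a cluster, which is useful because starting a trial is a one-time action per major version.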

Enabling ML in the cloud – Elasticsearch Service

If downloading, installing, and self-managing the Elastic Stack is less interesting than just getting the Elastic Stack platform offered as a service, then head on over to Elastic Cloud (cloud.elastic.co) and sign up for a free trial, using only your email:

Figure 2.4 – Elastic Cloud welcome screen

You can then perform the following steps:

Once inside the Elastic Cloud interface after logging in, you will have the ability to start a free trial by clicking the Start your free trial button:

Figure 2.5 – Elastic Cloud home screen

Once the button is clicked, you will see that your 14-day free trial of ESS has started:

Figure 2.6 – Elasticsearch Service trial enabled

Of course, in order to try out Elastic ML, you first need an Elastic Stack cluster provisioned. There are a few options to create what ESS refers to as deployments, with some that are tailored to specific use cases. For this example, we will use the Elastic Stack template on the left of Figure 2.6 and choose the I/O Optimized hardware profile, but feel free to experiment with the other options during your trial:

Figure 2.7 – Creating an ESS deployment

You can also choose what cloud provider and which region