Elastic Stack, previously known as the ELK stack, is a log analysis solution that helps users ingest, process, and analyze search data effectively. With the addition of machine learning, a key commercial feature, the Elastic Stack makes this process even more efficient. This updated second edition of Machine Learning with the Elastic Stack provides a comprehensive overview of Elastic Stack's machine learning features for both time series data analysis as well as for classification, regression, and outlier detection.
The book starts by explaining machine learning concepts in an intuitive way. You'll then perform time series analysis on different types of data, such as log files, network flows, application metrics, and financial data. As you progress through the chapters, you'll deploy machine learning within Elastic Stack for logging, security, and metrics. Finally, you'll discover how data frame analysis opens up a whole new set of use cases that machine learning can help you with.
By the end of this Elastic Stack book, you'll have hands-on machine learning and Elastic Stack experience, along with the knowledge you need to incorporate machine learning in your distributed search and data analysis platform.
Page count: 441
Year of publication: 2021
Gain valuable insights from your data with Elastic Stack's machine learning features
Rich Collier
Camilla Montonen
Bahaaldine Azarmi
BIRMINGHAM—MUMBAI
Copyright © 2021 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Kunal Parikh
Publishing Product Manager: Devika Battike
Senior Editor: David Sugarman
Content Development Editor: Joseph Sunil
Technical Editor: Devanshi Ayare
Copy Editor: Safis Editing
Project Coordinator: Aparna Nair
Proofreader: Safis Editing
Indexer: Manju Arasan
Production Designer: Alishon Mendonca
First published: January 2019
Second edition: May 2021
Production reference: 1270521
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80107-003-4
www.packt.com
Rich Collier is a solutions architect at Elastic. Joining the Elastic team from the Prelert acquisition, Rich has over 20 years of experience as a solutions architect and pre-sales systems engineer for software, hardware, and service-based solutions. Rich's technical specialties include big data analytics, machine learning, anomaly detection, threat detection, security operations, application performance management, web applications, and contact center technologies. Rich is based in Boston, Massachusetts.
Camilla Montonen is a senior machine learning engineer at Elastic.
Bahaaldine Azarmi, or Baha for short, is a solutions architect at Elastic. Prior to this position, Baha co-founded ReachFive, a marketing data platform focused on user behavior and social analytics. Baha also worked for different software vendors such as Talend and Oracle, where he held solutions architect and architect positions. Before Machine Learning with the Elastic Stack, Baha authored books including Learning Kibana 5.0, Scalable Big Data Architecture, and Talend for Big Data. Baha is based in Paris and has an MSc in computer science from Polytech Paris.
Apoorva Joshi is currently a security data scientist at Elastic (previously Elasticsearch) where she works on using machine learning for malware detection on endpoints. Prior to Elastic, she was a research scientist at FireEye where she applied machine learning to problems in email security. She has a diverse engineering background with a bachelor's in electrical engineering and a master's in computer engineering (with a machine learning focus).
Lijuan Zhong is an experienced Elastic and cloud engineer. She has a master's degree in information technology and nearly 20 years of experience in IT and telecom, and now works for Netnordic, Elastic's major partner in Sweden. She began working with Elastic in 2019 and became an Elastic certified engineer. She has also completed the machine learning course by Stanford University. She has led many Elastic and machine learning proofs of concept and projects, and customers were extremely satisfied with the outcome. She has co-organized the Elastic Stockholm meetup since 2020. She took part in the Elastic community conference 2021, where she gave a talk about machine learning with the Elastic Stack. She was awarded the Elastic bronze contributor award in 2021.
If you're a data professional looking to gain insights into Elasticsearch data without having to rely on a machine learning specialist or custom development, then this Elastic Stack machine learning book is for you. You'll also find this book useful if you want to integrate machine learning with your observability, security, and analytics applications. Working knowledge of the Elastic Stack is needed to get the most out of this book.
Chapter 1, Machine Learning for IT, acts as an introductory and background primer on the historical challenges of manual data analysis in IT and security operations. This chapter also provides a comprehensive overview of the theory of operation of Elastic machine learning in order to get an intrinsic understanding of what is happening under the hood.
Chapter 2, Enabling and Operationalization, explains how to enable the machine learning capabilities of the Elastic Stack and details the theory of operation of the Elastic machine learning algorithms. It also describes the logistical operation of Elastic machine learning in detail.
Chapter 3, Anomaly Detection, goes into detail regarding the unsupervised automated anomaly detection techniques that are at the heart of time series analysis.
Chapter 4, Forecasting, explains how Elastic machine learning's sophisticated time series models can be used for more than just anomaly detection. Forecasting capabilities enable users to extrapolate trends and behaviors into the future so as to assist with use cases such as capacity planning.
Chapter 5, Interpreting Results, explains how to fully understand the results of anomaly detection and forecasting and use them to your advantage in visualizations, dashboards, and infographics.
Chapter 6, Alerting on ML Analysis, explains the different techniques for integrating the proactive notification capability of Elastic alerting with the insights uncovered by machine learning in order to make anomaly detection even more actionable.
Chapter 7, AIOps and Root Cause Analysis, explains how leveraging Elastic machine learning to holistically inspect and analyze data from disparate data sources into correlated views gives the analyst a leg up over legacy approaches.
Chapter 8, Anomaly Detection in other Elastic Stack Apps, explains how anomaly detection is leveraged by other apps within the Elastic Stack to bring added value to data analysis.
Chapter 9, Introducing Data Frame Analysis, covers the concepts of data frame analytics, how it is different from time series anomaly detection, and what tools are available to the user to load, prepare, transform, and analyze data with Elastic machine learning.
Chapter 10, Outlier Detection, covers the outlier detection analysis capabilities of data frame analytics along with Elastic machine learning.
Chapter 11, Classification Analysis, covers the classification analysis capabilities of data frame analytics along with Elastic machine learning.
Chapter 12, Regression, covers the regression analysis capabilities of data frame analytics along with Elastic machine learning.
Chapter 13, Inference, covers the usage of trained machine learning models for "inference" – to actually predict output values in an operationalized manner.
Appendix: Anomaly Detection Tips, includes a variety of practical advice topics that didn't quite fit in other chapters. These useful tidbits will help you to get the most out of Elastic ML.
You will need a system with a good internet connection and an Elastic account.
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Machine-Learning-with-Elastic-Stack-Second-Edition. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781801070034_ColorImages.pdf.
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "The analysis can also be split along categorical fields by setting partition_field_name."
A block of code is set as follows:
18/05/2020 15:16:00 DB Not Updated [Master] Table
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
export DATABRICKS_AAD_TOKEN=<azure-ad-token>
Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Let's now click the View results button to investigate in detail what the anomaly detection job has found in the data."
Tips or important notes
Appear like this.
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
This section provides an intuitive understanding of the way Elastic ML works – from the perspective of not only what the algorithms are doing but also the logistics of the operation of the software within the Elastic Stack.
This section covers the following chapters:
Chapter 1, Machine Learning for IT
Chapter 2, Enabling and Operationalization
We have just learned the basics of what Elastic ML is doing to accomplish both unsupervised automated anomaly detection and supervised data frame analysis. Now it is time to get detailed about how Elastic ML works inside the Elastic Stack (Elasticsearch and Kibana).
This chapter will focus on both the installation (really, the enablement) of Elastic ML features and a detailed discussion of the logistics of the operation, especially with respect to anomaly detection. Specifically, we will cover the following topics:
Enabling Elastic ML features
Understanding operationalization
The information in this chapter will use the Elastic Stack as it exists in v7.10 and the workflow of the Elasticsearch Service of Elastic Cloud as of November 2020.
The process for enabling Elastic ML features inside the Elastic Stack is slightly different if you are doing so within a self-managed cluster versus using the Elasticsearch Service (ESS) of Elastic Cloud. In short, on a self-managed cluster, the features of ML are enabled via a license key (either a commercial key or a trial key). In ESS, a dedicated ML node needs to be provisioned within the cluster in order to utilize Elastic ML. In the following sections, we will explain the details of how this is accomplished in both scenarios.
If you have a self-managed cluster that was created from the downloading of Elastic's default distributions of Elasticsearch and Kibana (available at elastic.co/downloads/), enabling Elastic ML features via a license key is very simple. Be sure to not use the Apache 2.0 licensed open source distributions that do not contain the X-Pack code base.
Elastic ML, unlike the bulk of the capabilities of the Elastic Stack, is not free – it requires a commercial (specifically, a Platinum level) license. It is, however, open source in that the source code is out in the open on GitHub (github.com/elastic/ml-cpp) and that users can look at the code, file issues, make comments, or even submit pull requests. However, the usage of Elastic ML is governed by a commercial agreement with Elastic, the company.
When Elastic ML was first released (back in the v5.x days), it was part of the closed source features known as X-Pack that required a separate installation step. However, as of version 6.3, the code of X-Pack was "opened" (elastic.co/what-is/open-x-pack) and folded into the default distribution of Elasticsearch and Kibana. Therefore, a separate X-Pack installation step was no longer necessary, just the "enablement" of the features via a commercial license (or a trial license).
The installation procedure for Elasticsearch and Kibana itself is beyond the scope of this book, but it is easily accomplished by following the online documentation on the Elastic website (available at elastic.co/guide/).
Once Elasticsearch and Kibana are running, navigate to the Stack option from the left-side navigation menu and select License Management. You will see a screen like the following:
Figure 2.1 – The License management screen in Kibana
Notice that, by default, the license level applied is the free Basic tier. This enables you to use some of the advanced features not found in the Apache 2.0 licensed open source distribution, or on third-party services (such as the Amazon Elasticsearch Service). A handy guide for comparing the features that exist at the different license levels can be found on the Elastic website at elastic.co/subscriptions.
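The license information shown in Figure 2.1 can also be read from Elasticsearch's license API (GET /_license). The following is a minimal Python sketch, not taken from the book's code bundle, that summarizes such a response. The sample body is trimmed and hypothetical, and the set of license types treated as unlocking ML is an assumption based on the tiers named in this chapter:

```python
import json

# Assumption: license types that unlock Elastic ML, per the tiers named in this chapter.
ML_LICENSE_TYPES = {"platinum", "trial"}


def license_summary(response_body: str) -> dict:
    """Summarize the JSON body returned by GET /_license."""
    lic = json.loads(response_body)["license"]
    active = lic["status"] == "active"
    return {
        "type": lic["type"],
        "active": active,
        # ML is usable only when the license is both active and of a qualifying type.
        "ml_licensed": active and lic["type"] in ML_LICENSE_TYPES,
    }


# Trimmed, hand-written sample of a response from a freshly installed cluster.
sample_body = '{"license": {"status": "active", "type": "basic"}}'
print(license_summary(sample_body))
```

With the default Basic license, ml_licensed is False. It would become True once a Platinum or trial license is active.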
As previously stated, Elastic ML requires a Platinum tier license. If you have purchased a Platinum license from Elastic, you can apply that license by clicking on the Update license button, as shown on the screen in Figure 2.1. If you do not have a Platinum license, you can start a free 30-day trial by clicking the Start my trial button to enable Elastic ML and the other Platinum features (assuming you agree to the license terms and conditions):
Figure 2.2 – Starting a free 30-day trial
Once this is complete, the licensing screen will indicate that you are now in an active trial of the Platinum features of the Elastic Stack:
Figure 2.3 – Trial license activated
Once this is done, you can start to use Elastic ML right away. Additional configuration steps are needed to take advantage of the other Platinum features, but those steps are outside the scope of this book. Consult the Elastic documentation for further assistance on configuring those features.
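For readers who prefer the REST API to the Kibana screens, the trial can also be started with a POST to /_license/start_trial?acknowledge=true, and the availability of ML can then be confirmed from the features section of GET /_xpack. A minimal Python sketch follows. It assumes a self-managed cluster at http://localhost:9200 (a hypothetical address), builds the request without sending it, and uses a hand-written, trimmed sample of the /_xpack response:

```python
import urllib.request


def build_start_trial_request(base_url: str) -> urllib.request.Request:
    """Build (but do not send) the request that starts the trial license."""
    # acknowledge=true acknowledges the messages the API otherwise returns
    # instead of starting the trial.
    url = base_url.rstrip("/") + "/_license/start_trial?acknowledge=true"
    return urllib.request.Request(url, method="POST")


def ml_available(xpack_info: dict) -> bool:
    """Return True if a GET /_xpack response reports ML as available and enabled."""
    ml = xpack_info.get("features", {}).get("ml", {})
    return bool(ml.get("available")) and bool(ml.get("enabled"))


# Hypothetical local cluster address; adjust to your environment.
request = build_start_trial_request("http://localhost:9200")
print(request.get_method(), request.full_url)

# Trimmed, hand-written sample of a GET /_xpack response after the trial starts.
sample_xpack_info = {"features": {"ml": {"available": True, "enabled": True}}}
print("ML ready:", ml_available(sample_xpack_info))
```

To send the request against a running cluster, pass it to urllib.request.urlopen(). A cluster with security enabled will also require credentials, which this sketch omits.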
If downloading, installing, and self-managing the Elastic Stack is less interesting than just getting the Elastic Stack platform offered as a service, then head on over to Elastic Cloud (cloud.elastic.co) and sign up for a free trial, using only your email:
Figure 2.4 – Elastic Cloud welcome screen
You can then perform the following steps:
Once inside the Elastic Cloud interface after logging in, you will have the ability to start a free trial by clicking the Start your free trial button:
Figure 2.5 – Elastic Cloud home screen
Once the button is clicked, you will see that your 14-day free trial of ESS has started:
Figure 2.6 – Elasticsearch Service trial enabled
Of course, in order to try out Elastic ML, you first need an Elastic Stack cluster provisioned. There are a few options to create what ESS refers to as deployments, with some that are tailored to specific use cases. For this example, we will use the Elastic Stack template on the left of Figure 2.6 and choose the I/O Optimized hardware profile, but feel free to experiment with the other options during your trial:
Figure 2.7 – Creating an ESS deployment
You can also choose what cloud provider and which region