Machine Learning with the Elastic Stack - Rich Collier - E-Book

Machine Learning with the Elastic Stack E-Book

Rich Collier

Description

Leverage Elastic Stack's machine learning features to gain valuable insight from your data




Key Features



  • Combine machine learning with the analytic capabilities of Elastic Stack


  • Analyze large volumes of search data and gain actionable insight from them


  • Use external analytical tools with your Elastic Stack to improve its performance





Book Description



Machine Learning with the Elastic Stack is a comprehensive overview of the embedded commercial features of anomaly detection and forecasting. The book starts with installing and setting up Elastic Stack. You will perform time series analysis on varied kinds of data, such as log files, network flows, application metrics, and financial data.






As you progress through the chapters, you will deploy machine learning within the Elastic Stack for logging, security, and metrics. In the concluding chapters, you will see how machine learning jobs can be automatically distributed and managed across the Elasticsearch cluster and made resilient to failure.






By the end of this book, you will understand the performance aspects of incorporating machine learning within the Elastic ecosystem and create anomaly detection jobs and view results from Kibana directly.




What you will learn



  • Install the Elastic Stack to use machine learning features


  • Understand how Elastic machine learning is used to detect a variety of anomaly types


  • Apply effective anomaly detection to IT operations and security analytics


  • Leverage the output of Elastic machine learning in custom views, dashboards, and proactive alerting


  • Combine your created jobs to correlate anomalies of different layers of infrastructure


  • Learn various tips and tricks to get the most out of Elastic machine learning



Who this book is for



If you are a data professional eager to gain insight on Elasticsearch data without having to rely on a machine learning specialist or custom development, Machine Learning with the Elastic Stack is for you. Those looking to integrate machine learning within their search and analytics applications will also find this book very useful. Prior experience with the Elastic Stack is needed to get the most out of this book.

You can read this e-book in the Legimi apps or in any app that supports the following format:

EPUB

Page count: 250

Year of publication: 2019




Machine Learning with the Elastic Stack

Expert techniques to integrate machine learning with distributed search and analytics

Rich Collier
Bahaaldine Azarmi

BIRMINGHAM - MUMBAI

Machine Learning with the Elastic Stack

Copyright © 2019 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

 

Commissioning Editor: Amey Varangaonkar
Acquisition Editor: Aditi Gour
Content Development Editor: Pratik Andrade
Technical Editor: Jovita Alva
Copy Editor: Safis Editing
Project Coordinator: Namrata Swetta
Proofreader: Safis Editing
Indexer: Priyanka Dhadke
Graphics: Jisha Chirayil
Production Coordinator: Arvindkumar Gupta

First published: January 2019

Production reference: 1300119

Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.

ISBN 978-1-78847-754-3

www.packtpub.com

To the incredibly smart and talented development engineers of the Elastic Machine Learning team – thanks for making an incredible product that artfully balances complexity with simplicity.

– Rich and Baha
 
mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Improve your learning with Skill Plans built especially for you

Get a free eBook or video every month

Mapt is fully searchable

Copy and paste, print, and bookmark content

Packt.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks. 

Contributors

About the authors

Rich Collier is a solutions architect at Elastic. Joining the Elastic team from the Prelert acquisition, Rich has over 20 years' experience as a solutions architect and pre-sales systems engineer for software, hardware, and service-based solutions. Rich's technical specialties include big data analytics, machine learning, anomaly detection, threat detection, security operations, application performance management, web applications, and contact center technologies. Rich is based in Boston, Massachusetts.

 

 

 

 

Bahaaldine Azarmi, or Baha for short, is a solutions architect at Elastic. Prior to this position, Baha co-founded ReachFive, a marketing data platform focused on user behavior and social analytics. Baha also worked for different software vendors such as Talend and Oracle, where he held solutions architect and architect positions. Before Machine Learning with the Elastic Stack, Baha authored books including Learning Kibana 5.0, Scalable Big Data Architecture, and Talend for Big Data. Baha is based in Paris and has an MSc in computer science from Polytech'Paris.

About the reviewers

Dan Noble is an accomplished full-stack web developer, data engineer, entrepreneur, and author with more than 12 years of industry experience and a passion for building novel software solutions that solve meaningful problems. Dan is the founder of Geofable, a software company that helps people tell stories with spatial data. He enjoys working with a variety of programming languages and tools, particularly Python, JavaScript, React, Elasticsearch, and Postgres.

Dan has been a user and advocate of Elasticsearch since 2011. He is the author of the book Monitoring Elasticsearch, and was a technical reviewer for several other books, including The Elasticsearch Cookbook, by Alberto Paro, and Learning Elasticsearch, by Abhishek Andhavarapu.

 

 

 

 

Matias Cascallares is a software engineer with more than 15 years of experience in software development in a variety of roles, with a deep focus on open source technologies and highly scalable environments. Having lived on three different continents, he has a wealth of experience in multicultural and distributed teams.

Nowadays, in the position of principal solutions architect at Elastic, he helps organizations to get value from their data and find success using the Elastic Stack. He has been involved in projects across multiple verticals, including finance and banking, transportation, e-commerce, and telecommunications.

 

 

 

 

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Table of Contents

Title Page

Copyright and Credits

Machine Learning with the Elastic Stack

Dedication

About Packt

Why subscribe?

Packt.com

Contributors

About the authors

About the reviewers

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Machine Learning for IT

Overcoming the historical challenges

The plethora of data

The advent of automated anomaly detection

Theory of operation

Defining unusual

Learning normal, unsupervised

Probability models

Learning the models

De-trending

Scoring of unusualness

Operationalization

Jobs

ML nodes

Bucketization

The datafeed

Supporting indices

.ml-state

.ml-notifications

.ml-anomalies-*

The orchestration

Summary

Installing the Elastic Stack with Machine Learning

Installing the Elastic Stack

Downloading the software

Installing Elasticsearch

Installing Kibana

Enabling Platinum features

A guided tour of Elastic ML features

Getting data for analysis

ML job types in Kibana

Data Visualizer

The Single metric job

Multi-metric job

Population job

Advanced job

Controlling ML via the API

Summary

Event Change Detection

How to understand the normal rate of occurrence

Exploring count functions

Summarized counts

Splitting the counts

Other counting functions

Non-zero count

Distinct count

Counting in population analysis

Detecting things that rarely occur

Counting message-based logs via categorization

Types of messages that can be categorized by ML

The categorization process

Counting the categories

Putting it all together

When not to use categorization

Summary

IT Operational Analytics and Root Cause Analysis

Holistic application visibility

The importance and limitations of KPIs

Beyond the KPIs

Data organization

Effective data segmentation

Custom queries for ML jobs

Data enrichment on ingest

Leveraging the contextual information

Analysis splits

Statistical influencers

Bringing it all together for root cause analysis

Outage background

Visual correlation and shared influencers

Summary

Security Analytics with Elastic Machine Learning

Security in the field

The volume and variety of data

The geometry of an attack

Threat hunting architecture

Layer-based ingestion

Threat intelligence

Investigation analytics

Assessment of compromise

Summary

Alerting on ML Analysis

Results presentation

The results index

Bucket results

Record results

Influencer results

Alerts from the Machine Learning UI in Kibana

Anatomy of the default watch from the ML UI in Kibana

Creating ML alerts manually

Summary

Using Elastic ML Data in Kibana Dashboards

Visualization options in Kibana

Visualization examples

Timelion

Time series visual builder 

Preparing data for anomaly detection analysis

The dataset

Ingesting the data

Creating anomaly detection jobs

Global traffic analysis job

A HTTP response code profiling of the host making requests

Traffic per host analysis

Building the visualizations

Configuring the index pattern

Using ML data in TSVB

Creating a correlation Heat Map

Using ML data in Timelion

Building the dashboard

Summary

Using Elastic ML with Kibana Canvas

Introduction to Canvas

What is Canvas?

The Canvas expression

Building Elastic ML Canvas slides

Preparing your data

Anomalies in a Canvas data table

Using the new SQL integration

Summary

Forecasting

Forecasting versus prophesying

Forecasting use cases

Forecasting – theory of operation

Single time series forecasting

Dataset preparation

Creating the ML job for forecasting

Forecast results

Multiple time series forecasting

Summary

ML Tips and Tricks

Job groups

Influencers in split versus non-split jobs

Using ML on scripted fields

Using one-sided ML functions to your advantage

Ignoring time periods

Ignoring an upcoming (known) window of time

Creating a calendar event

Stopping and starting a datafeed to ignore the desired timeframe

Ignoring an unexpected window of time, after the fact

Clone the job and re-run historical data

Revert the model snapshot

Don't over-engineer the use case

ML job throughput considerations

Top-down alerting by leveraging custom rules

Sizing ML deployments

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preface

Data analysis, manual charting, thresholding, and alerting have been an inherent part of IT and security operations for decades. Until the advent of sophisticated machine learning algorithms and techniques, much of the burden of proactive insight, problem detection, and root cause analysis fell onto the shoulders of the analysts. As the complexity and scale of modern applications and infrastructure have grown exponentially, it has become apparent that humans need help. Elastic machine learning (ML) is an effective, easy-to-use solution for anomaly detection and forecasting use cases involving time-series machine data. This definitive Elastic ML guide will make the reader proficient in the operation and techniques of advanced analytics without the need to be well-versed in data science.

Who this book is for

If you are an IT professional eager to gain further insights into machine data within Elasticsearch without having to rely on an ML specialist or custom development, ML with the Elastic Stack is for you. Those looking to augment manual data analysis with automated, advanced anomaly detection and forecasting will find this book very useful. Prior experience with the Elastic Stack will be helpful in order to get the most out of this book.

What this book covers

Chapter 1, Machine Learning for IT, is an introductory and background primer on the historical challenges of manual data analysis in IT and security operations. This chapter provides a comprehensive overview of the theory of operation of Elastic ML in order to get an intrinsic understanding of what is happening under the hood.

Chapter 2, Installing the Elastic Stack with Machine Learning, walks you through the comprehensive and descriptive installation procedures for Elasticsearch, Kibana, Metricbeat, and the enabling of the ML feature. This is followed by several working examples of data analysis executed on Metricbeat data to introduce the basics of the mechanics of the ML analysis jobs.

Chapter 3, Event Change Detection, goes into detail regarding the count-based analysis techniques that are at the crux of effective log file analysis.

Chapter 4, IT Operational Analytics and Root Cause Analysis, explains how using Elastic ML to holistically inspect and analyze data from disparate data sources, and to bring it into correlated views, gives the analyst a leg up over legacy approaches.

Chapter 5, Security Analytics with Elastic Machine Learning, explains how anomaly detection and behavioral analytics have become a must-have feature for assisting security experts in detecting and unraveling the advanced persistent threats posed by today's cyber adversaries. Elastic ML's approach of detecting behavioral outliers fits perfectly into the strategies of those analysts who use the Elastic Stack for security-based machine data.

Chapter 6, Alerting on ML Analysis, explains the different techniques for integrating the proactive notification capability of Elastic Alerting with the insights uncovered by ML in order to make anomaly detection even more actionable.

Chapter 7, Using Elastic ML Data in Kibana Dashboards, explains how to augment your traditional Kibana dashboard visualizations with information gleaned from ML.

Chapter 8, Using Elastic ML with Kibana Canvas, covers how to create pixel-perfect live reports with real-time data analysis from ML.

Chapter 9, Forecasting, explains how Elastic ML's sophisticated time-series models can be used for more than just anomaly detection. Forecasting capabilities enable users to extrapolate trends and behaviors into the future so as to assist with use cases such as capacity planning.

Chapter 10, ML Tips and Tricks, includes a variety of practical advice topics that didn't quite fit in other chapters. These useful tidbits will help you to get the most out of Elastic ML.

To get the most out of this book

While this book starts from the ground up in terms of instructions on installation and configuration of the Elastic Stack and the ML feature, it is helpful to have prior experience of setting up and using the Elastic Stack or a similar big data analysis platform.

While the majority of product installation and utilization can be managed by means of a personal computer/laptop (that meets the minimum specifications), the reader can also register for a free trial setup on https://cloud.elastic.co/login?redirectTo=%2Fdeployments if that is logistically easier.

No prior experience of IT and/or security operations is necessary to get the most out of this book, but many topics and concepts are written with a view to addressing the plight of an operations analyst.

Many examples shown in this book use demo datasets that are available on the GitHub repository for this book. However, some examples (in Chapter 3, Event Change Detection, and Chapter 5, Security Analytics with Elastic Machine Learning, for example) use datasets that could not be distributed publicly. In those cases, you can either replicate the examples using similar kinds of datasets (such as web access logs) or just follow along conceptually.

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

Log in or register at www.packt.com.

Select the SUPPORT tab.

Click on Code Downloads & Errata.

Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows

Zipeg/iZip/UnRarX for Mac

7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Machine-Learning-with-the-Elastic-Stack. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/9781788477543_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "The log section will print a message to an output file, which by default is the Elasticsearch log file."

A block of code is set as follows:

GET _cat/indices/metricbeat*

Any command-line input or output is written as follows:

cd kibana-x.y.z-darwin-x86_64/

Bold: Indicates a new term, an important word, or words that you see on screen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "In the Management section of Kibana, click on the Index Patterns link."

Warnings or important notes appear like this.
Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in, and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.

Machine Learning for IT

A decade ago, the idea of using machine learning (ML)-based technology in IT operations or IT security seemed a little like science fiction. Today, however, it is one of the most common buzzwords used by software vendors. Clearly, there has been a major shift in both the perception of the need for the technology and the capabilities that the state-of-the-art implementations of the technology can bring to bear. This evolution is important to understand to fully appreciate how Elastic's ML came to be and what problems it was designed to solve.

This chapter is dedicated to reviewing the history and concepts behind how Elastic's ML works. If you are uninterested and want to jump right into the installation and usage of the product, feel free to skip to Chapter 2, Installing the Elastic Stack with Machine Learning.

Overcoming the historical challenges

IT application support specialists and application architects have a demanding job with high expectations. Not only are they tasked with moving new and innovative projects into place for the business, but they also have to keep currently deployed applications up and running as smoothly as possible. Today's applications are significantly more complicated than ever before: they are highly componentized, distributed, and possibly virtualized. They could be developed using Agile, or by an outsourced team. Plus, they are most likely constantly changing. Some DevOps teams claim they can typically make more than a hundred changes per day to a live production system. Trying to understand a modern application's health and behavior is like a mechanic trying to inspect an automobile while it is moving.

IT security operations analysts have similar struggles in keeping up with day-to-day operations, but they have a different focus: keeping the enterprise secure and mitigating emerging threats. Hackers, malware, and rogue insiders have become so ubiquitous and sophisticated that the prevailing wisdom is that it is no longer a question of if an organization will be compromised, but of when they will find out about it. Clearly, knowing about it as early as possible (before too much damage is done) is far preferable to learning about it for the first time from law enforcement or the evening news.

So, how can they be helped? Is the crux of the problem that application experts and security analysts lack access to data to help them do their job effectively? Actually, in most cases, it is the exact opposite. Many IT organizations are drowning in data.

The plethora of data

IT departments have invested in monitoring tools for decades and it is not uncommon to have a dozen or more tools actively collecting and archiving data that can be measured in terabytes, or even petabytes, per day. The data can range from rudimentary infrastructure- and network-level data to deep diagnostic data and/or system and application log files. Business-level key performance indicators (KPIs) could also be tracked, sometimes including data about the end user's experience. The sheer depth and breadth of data available, in some ways, is the most comprehensive that it has ever been.

To detect emerging problems or threats hidden in that data, there have traditionally been several main approaches to distilling the data into informational insights:

Filter/search: Some tools allow the user to define searches to help trim down the data into a more manageable set. While extremely useful, this capability is most often used in an ad hoc fashion once a problem is suspected. Even then, the success of this approach usually hinges on the user knowing what they are looking for and on their level of experience, both in having lived through similar past situations and in the search technology itself.

Visualizations: Dashboards, charts, and widgets are also extremely useful to help us understand what data has been doing and where it is trending. However, visualizations are passive and require being watched for meaningful deviations to be detected. Once the number of metrics being collected and plotted surpasses the number of eyeballs available to watch them (or even the screen real estate to display them), visual-only analysis becomes less and less useful.

Thresholds/rules: To remove the need for data to be physically watched, many tools allow the user to define rules or conditions that get triggered upon known conditions or known dependencies between items. However, it is unlikely that you can realistically define all appropriate operating ranges or model all of the actual dependencies in today's complex and distributed applications. Plus, the amount and velocity of changes in the application or environment could quickly render any static rule set useless. Analysts found themselves chasing down many false positive alerts, setting up a "boy who cried wolf" paradigm that led to resentment of the tools generating the alerts and skepticism about the value that alerting could provide.
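The brittleness of static rules is easy to demonstrate. The following Python sketch uses made-up response times and a hypothetical fixed limit of 150 ms (neither comes from the book) to show how a threshold that works for one baseline starts alerting on every reading once normal behavior shifts:

```python
from typing import List


def static_threshold_alerts(values: List[float], limit: float) -> List[int]:
    """Return the index of every reading that exceeds a fixed limit."""
    return [i for i, value in enumerate(values) if value > limit]


# Hypothetical response times in milliseconds (illustrative data only).
# Baseline of roughly 100 ms with one genuine spike at index 4.
before_change = [100, 104, 98, 101, 250, 99, 102]
# After a routine change the normal level is roughly 160 ms; nothing is unusual.
after_change = [160, 158, 163, 161, 159, 162, 160]

LIMIT_MS = 150.0  # assumed static rule: alert on anything above 150 ms

alerts_before = static_threshold_alerts(before_change, LIMIT_MS)
alerts_after = static_threshold_alerts(after_change, LIMIT_MS)

print(alerts_before)  # [4]
print(alerts_after)   # [0, 1, 2, 3, 4, 5, 6]
```

The rule catches the real spike in the first series, but after the baseline moves it fires on every ordinary reading, which is exactly the flood of false positives described above.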

Ultimately, there needed to be a different approach—one that wasn't necessarily a complete repudiation of past techniques, but one that could bring a level of automation and empirical augmentation of the evaluation of data in a meaningful way. Let's face it, humans are imperfect—we have hidden biases, limitations of capacity for remembering information, and we are easily distracted and fatigued. Algorithms, if done correctly, can easily make up for these shortcomings.

The advent of automated anomaly detection

ML, while a very broad topic that encompasses everything from self-driving cars to game-winning computer programs, was a natural place to look for a solution. If you realize that the majority of the requirements of effective application monitoring or security threat hunting are merely variations on the theme of find me something that is different than normal, then the discipline of anomaly detection emerges as the natural place to begin using ML techniques to solve these problems for IT professionals.

The science of anomaly detection is certainly nothing new, however. Many very smart people have researched and employed a variety of algorithms and techniques for many years. However, the practical application of anomaly detection for IT data poses some interesting constraints that make otherwise academically worthy algorithms inappropriate for the job. These include the following:

Timeliness: Notification of an outage, breach, or other significant anomalous situation should arrive as quickly as possible so that the situation can be mitigated. The cost of downtime or the risk of a continued security compromise is minimized if remedied or contained quickly. Algorithms that cannot keep up with the real-time nature of today's IT data have limited value.

Scalability: As mentioned earlier, the volume, velocity, and variation of IT data continue to explode in modern IT environments. Algorithms that inspect this vast data must be able to scale linearly with the data to be usable in a practical sense.

Efficiency: IT budgets are often highly scrutinized for wasteful spending, and many organizations are constantly being asked to "do more with less". Tacking on an additional fleet of super-computers to run algorithms is not practical. Rather, modest commodity hardware with typical specifications must be able to be employed as part of the solution.

Applicability: While highly specialized data science is often the best way to solve a specific information problem, the diversity of data in IT environments drives a need for something that can be broadly applicable across the vast majority of use cases. Reusability of the same techniques is much more cost-effective in the long run.

Adaptability: Ever-changing IT environments will quickly render a brittle algorithm useless. Training and retraining the ML model would only introduce yet another time-wasting venture that cannot be afforded.

Accuracy: We already know that alert fatigue from legacy threshold and rule-based systems is a real problem. Swapping one false alarm generator for another will not impress anyone.

Ease of use: Even if all of the previously mentioned constraints could be satisfied, any solution that requires an army of data scientists to implement it would be too costly and would be disqualified immediately.
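To make several of these constraints concrete, here is a minimal Python sketch of a baseline that learns from each value as it arrives. It is an illustration only, not the model that Elastic ML uses: the class name StreamingBaseline, the alpha parameter, and the sample values are all assumptions. Each update takes constant time and memory (timeliness, scalability, efficiency), and the exponential weighting lets older data fade so that no separate retraining step is needed (adaptability):

```python
import math


class StreamingBaseline:
    """Exponentially weighted running mean and variance (illustrative only)."""

    def __init__(self, alpha: float = 0.1) -> None:
        self.alpha = alpha  # weight given to each new observation
        self.mean = 0.0
        self.var = 0.0
        self.count = 0

    def score(self, x: float) -> float:
        """Distance of x from the current mean, in standard deviations."""
        if self.count < 2 or self.var == 0.0:
            return 0.0  # not enough history to judge yet
        return abs(x - self.mean) / math.sqrt(self.var)

    def update(self, x: float) -> None:
        """Fold one observation into the baseline in O(1) time and memory."""
        self.count += 1
        if self.count == 1:
            self.mean = x
            return
        delta = x - self.mean
        self.mean += self.alpha * delta
        self.var = (1.0 - self.alpha) * (self.var + self.alpha * delta * delta)


# Hypothetical metric that alternates between 100 and 102.
baseline = StreamingBaseline(alpha=0.1)
for i in range(50):
    baseline.update(100.0 if i % 2 == 0 else 102.0)

typical_score = baseline.score(101.0)
spike_score = baseline.score(200.0)
print(f"typical reading: {typical_score:.2f} standard deviations from normal")
print(f"spike reading:   {spike_score:.2f} standard deviations from normal")
```

A value that scores far above the typical range would be a candidate anomaly; where to draw that line is a separate decision that this sketch does not address.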

So, now we are getting to the real meat of the challenge—creating a fast, scalable, accurate, low-cost anomaly detection solution that everyone will use and love because it works flawlessly. No problem!

As daunting as that sounds, Prelert Founder and CTO Steve Dodson took on that challenge back in 2010. While Steve certainly brought his academic chops to the table, the technology that would eventually become Elastic's X-Pack ML had its genesis in the throes of trying to solve real IT application problems, the first being a pesky intermittent outage in a trading platform at a major London finance company. Steve, and a handful of engineers who joined the venture, helped the bank's team use the anomaly detection technology to automatically surface only the needles in the haystack, which allowed the analysts to focus on the small set of relevant metrics and log messages that were going awry. The identification of the root cause (a failing service whose recovery caused a cascade of subsequent network problems that wreaked havoc) ultimately brought stability to the application and spared the bank from spending lots of money on the prior solution, which was an unplanned, costly network upgrade.

As time passed, however, it became clear that even that initial success was only the beginning. A few years and a few thousand real-world use cases later, the marriage of Prelert and Elastic was a natural one—a combination of a platform making big data easily accessible with technology that helped overcome the limitations of human analysis.

What is described in this text is the theory and operation of the technology in Elastic ML as of version 6.5.

Theory of operation

To get a more intrinsic understanding of how the technology works, we will discuss the following:

A rigorous definition of unusual with respect to the technology

An intuitive example of learning in an unsupervised manner

A description of how the technology models, de-trends, and scores the data

Defining unusual

Anomaly detection is something almost all of us have a basic intuition for. Humans are quite good at pattern recognition, so it should come as no surprise that if I asked a hundred people on the street "what's unusual?" in the following graph, a vast majority (including non-technical people) would identify the spike in the green line: