Learning Elastic Stack 6.0 - Pranav Shukla - E-Book

Description

The Elastic Stack is a powerful combination of tools for distributed search, analytics, logging, and visualization of data from medium to massive data sets. The newly released Elastic Stack 6.0 brings new features and capabilities that empower users to find unique, actionable insights through these techniques. This book will give you a fundamental understanding of what the stack is all about, and how to use it efficiently to build powerful real-time data processing applications.
After a quick overview of the newly introduced features in Elastic Stack 6.0, you’ll learn how to set up the stack by installing the tools, and see their basic configurations. Then it shows you how to use Elasticsearch for distributed searching and analytics, along with Logstash for logging, and Kibana for data visualization. It also demonstrates the creation of custom plugins using Kibana and Beats. You’ll find out about Elastic X-Pack, a useful extension for effective security and monitoring. We also provide useful tips on how to use the Elastic Cloud and deploy the Elastic Stack in production environments.
On completing this book, you’ll have a solid foundational knowledge of the basic Elastic Stack functionalities. You’ll also have a good understanding of the role of each component in the stack to solve different data processing problems.

You can read the e-book in Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Page count: 409

Year of publication: 2017




Learning Elastic Stack 6.0

A beginner’s guide to distributed search, analytics, and visualization using Elasticsearch, Logstash, and Kibana

Pranav Shukla

Sharath Kumar M N

BIRMINGHAM - MUMBAI

Learning Elastic Stack 6.0

Copyright © 2017 Packt Publishing

 

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

 

First published: December 2017

 

Production reference: 1201217

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.

ISBN 978-1-78728-186-8

 

www.packtpub.com

Credits

Authors: Pranav Shukla, Sharath Kumar M N

Copy Editors: Safis Editing, Vikrant Phadkay

Reviewer: Marcelo Ochoa

Project Coordinator: Nidhi Joshi

Commissioning Editor: Amey Varangaonkar

Proofreader: Safis Editing

Acquisition Editor: Varsha Shetty

Indexer: Aishwarya Gangawane

Content Development Editor: Cheryl Dsa

Graphics: Tania Dutta

Technical Editor: Sagar Sawant

Production Coordinator: Shantanu Zagade

Disclaimer

Elasticsearch is a trademark of Elasticsearch BV, registered in the U.S. and in other countries. Kibana is a trademark of Elasticsearch BV, registered in the U.S. and in other countries. Logstash is a trademark of Elasticsearch BV, registered in the U.S. and in other countries. Packetbeat is a trademark of Elasticsearch BV, registered in the U.S. and in other countries. Elastic, Elastic Cloud, Elastic Cloud Enterprise, X-Pack, Beats, Winlogbeat, Libbeat, Metricbeat, Filebeat, Topbeat, and Heartbeat are trademarks of Elasticsearch BV.

About the Authors

Pranav Shukla is the founder and CEO of Valens DataLabs, a technologist, husband, and father of two. He is a big data architect and software craftsman who uses JVM-based languages. Pranav has diverse experience of over 14 years in architecting enterprise applications for Fortune 500 companies and start-ups. His core expertise lies in building JVM-based, scalable, reactive, and data-driven applications using Java/Scala, the Hadoop ecosystem, Apache Spark, and NoSQL databases. He is a big data engineering, analytics, and machine learning enthusiast.

Pranav founded Valens DataLabs with a vision to help companies leverage data to their competitive advantage. Valens DataLabs specializes in developing next-generation, cloud-based, reactive, and data-intensive applications using big data and web technologies. The company believes in agile practices, lean principles, test-driven and behavior-driven development, continuous integration, and continuous delivery for sustainable software systems.

In his free time, he enjoys reading books, playing musical instruments, singing, listening to music, and watching cricket. You can reach him via email at [email protected] and follow him on Twitter at @pranavshukla81. 

I would like to thank my wife, Kruti Shukla, for her unconditional love and support, our sons, Sauhadra and Pratishth, and my parents, Dr Sharad Shukla and Varsha Shukla. I would like to thank my brother, Vishal Shukla, for playing an inspirational role in my career and also for inspiring me to write this book. I would like to thank Parth Mistry, Gopal Ghanghar, and Krishna Meet for their valuable feedback on the book. I am grateful to the many people who have helped shape my career through fruitful interactions; in particular, I would like to thank Umesh Kakkad, Eddie Moojen, Wart Fransen, Praveen Sameneni, Vinod Patel, Gopal Shah, and Sachin Bakshi.

 

Sharath Kumar M N completed his master's in Computer Science at The University of Texas, Dallas, USA. He has been in the IT industry for more than ten years and is the Elasticsearch Solutions Architect at Oracle. He is an Elastic Stack advocate and an avid speaker who has given several tech talks at conferences such as the Oracle Code Event. Sharath is an Elastic Certified Instructor, one of the few technology experts in the world certified by Elastic Inc to deliver its official "from the creators of Elastic" training. He is also a data science and machine learning enthusiast.

In his free time, he enjoys trekking, listening to music, and playing with his lovely pets, Guddu and Milo, and the geek in him loves exploring his Python skills for stock market analysis. You can reach him via email at [email protected].

I would like to thank my parents, Geetha and Nanjaiah, my sister, Dr Shilpa M N, my brother-in-law, Dr Sridhar, and my friends; without their support I wouldn't have been able to finish my part of this book in time. I would also like to thank the Packt Publishing team (especially Cheryl, Samuel, Varsha, and Sagar) for providing a great opportunity for me to take part in this exciting journey.

About the Reviewer

Marcelo Ochoa works at the systems laboratory of Facultad de Ciencias Exactas, Universidad Nacional del Centro de la Provincia de Buenos Aires, Argentina. He is the CTO at www.scotas.com, a company that specializes in near-real-time search solutions using Apache Solr and Oracle. He divides his time between university jobs and external projects related to Oracle and big data technologies. He has worked on several Oracle-related projects, such as the translation of Oracle manuals and multimedia CBTs. His background is in database, network, web, and Java technologies. In the XML world, Marcelo is known as the developer of DB Generator for the Apache Cocoon project. He has worked on the open source projects DBPrism and DBPrism CMS, Lucene-Oracle integration using the Oracle JVM Directory implementation, and the Restlet.org project, where he worked on the Oracle XDB Restlet Adapter, an alternative to writing native REST web services inside a database-resident JVM.

Since 2006, he has been part of the Oracle ACE program, and he has recently joined the Docker Mentor program.

Marcelo has coauthored Oracle Database Programming Using Java and Web Services by Digital Press and Professional XML Databases by Wrox Press. He has been a technical reviewer on several Packt books, including Mastering Elastic Stack, Mastering Elasticsearch 5.x - Third Edition, and Elasticsearch 5.x Cookbook - Third Edition.

www.PacktPub.com

For support files and downloads related to your book, please visit www.PacktPub.com. Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available?

You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details. At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www.packtpub.com/mapt

Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

Why subscribe?

Fully searchable across every book published by Packt

Copy and paste, print, and bookmark content

On demand and accessible via a web browser

Customer Feedback

Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.in/dp/1787281868.

If you'd like to join our team of regular reviewers, you can email us at [email protected]. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!

Table of Contents

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

Introducing Elastic Stack

What is Elasticsearch, and why use it?

Schemaless and document-oriented

Searching

Analytics

Rich client library support and the REST API

Easy to operate and easy to scale 

Near real time

Lightning fast

Fault tolerant

Exploring the components of Elastic Stack

Elasticsearch

Logstash

Beats

Kibana

X-Pack

Security

Monitoring

Reporting

Alerting

Graph

Elastic Cloud

Use cases of Elastic Stack

Log and security analytics

Product search

Metrics analytics

Web search and website search

Downloading and installing

Installing Elasticsearch

Installing Kibana

Summary

Getting Started with Elasticsearch

Using the Kibana Console UI

Core concepts

Index

Type

Document

Node

Cluster

Shards and replicas

Mappings and data types

Data types

Core datatypes

Complex datatypes

Other datatypes

Mappings

Creating an index with the name catalog

Defining the mappings for the type of product

Inverted index

CRUD operations

Index API

Indexing a document by providing an ID

Indexing a document without providing an ID

Get API

Update API

Delete API

Creating indexes and taking control of mapping

Creating an index

Creating type mapping in an existing index

Updating a mapping

REST API overview

Common API conventions

Formatting the JSON response

Dealing with multiple indices

Searching all documents in one index

Searching all documents in multiple indexes

Searching all documents of a particular type in all indices

Summary

Searching - What is Relevant

Basics of text analysis

Understanding Elasticsearch analyzers

Character filters

Tokenizer

Standard Tokenizer

Token filters

Using built-in analyzers

Standard Analyzer

Implementing autocomplete with a custom analyzer

Searching from structured data

Range query

Range query on numeric types

Range query with score boosting

Range query on dates

Exists query

Term query

Searching from full text

Match query

Operator

minimum_should_match

Fuzziness

Match phrase query

Multi match query

Querying multiple fields with defaults

Boosting one or more fields

With types of multi match queries

Writing compound queries

Constant score query

Bool query

Combining OR conditions

Combining AND and OR conditions

Adding NOT conditions

Summary

Analytics with Elasticsearch

The basics of aggregations

Bucket aggregations

Metric aggregations

Matrix aggregations

Pipeline aggregations

Preparing data for analysis

Understanding the structure of data

Loading the data using Logstash

Metric aggregations

Sum, average, min, and max aggregations

Sum aggregation

Average aggregation

Min aggregation

Max aggregation

Stats and extended stats aggregations

Stats aggregation

Extended stats aggregation

Cardinality aggregation

Bucket aggregations

Bucketing on string data

Terms aggregation

Bucketing on numeric data

Histogram aggregation

Range aggregation

Aggregations on filtered data

Nesting aggregations

Bucketing on custom conditions

Filter aggregation

Filters aggregation

Bucketing on date/time data

Date Histogram aggregation

Creating buckets across time

Using a different time zone

Computing other metrics within sliced time intervals

Focusing on a specific day and changing intervals

Bucketing on geo-spatial data

Geo distance aggregation

GeoHash grid aggregation

Pipeline aggregations

Calculating the cumulative sum of usage over time

Summary

Analyzing Log Data

Log analysis challenges

Logstash 

Installation and configuration

Prerequisites

Downloading and installing Logstash

Installing on Windows

Installing on Linux

Running Logstash

Logstash architecture

Overview of Logstash plugins

Installing or updating plugins

Input plugins

Output plugins

Filter plugins

Codec plugins

Exploring plugins

Exploring Input plugins

File

Beats

JDBC

IMAP

Output plugins

Elasticsearch

CSV

Kafka

PagerDuty

Codec plugins

JSON

Rubydebug 

Multiline

Filter plugins

Ingest node

Defining a pipeline 

Ingest APIs

Put pipeline API

Get Pipeline API

Delete pipeline API

Simulate pipeline API

Summary

Building Data Pipelines with Logstash

Parsing and enriching logs using Logstash

Filter plugins

CSV filter 

Mutate filter

Grok filter

Date filter

Geoip filter

Useragent filter

Introducing Beats

Beats by Elastic.co

Filebeat

Metricbeat

Packetbeat

Heartbeat

Winlogbeat

Auditbeat

Community Beats

Logstash versus Beats

Filebeat

Downloading and installing Filebeat

Installing on Windows

Installing on Linux

Architecture

Configuring Filebeat

Filebeat prospectors

Filebeat global options

Filebeat general options

Output configuration 

Filebeat modules

Summary

Visualizing Data with Kibana

Downloading and installing Kibana

Installing on Windows

Installing on Linux

Configuring Kibana

Data preparation

Kibana UI

User interaction

Configuring the index pattern

Discover

Elasticsearch query string

Elasticsearch DSL query

Visualize

Kibana aggregations

Bucket aggregations

Metric

Creating a visualization

Visualization types

Line, area, and bar charts

Data table

MarkDown widget

Metric

Goal

Gauge

Pie charts

Co-ordinate maps

Region maps

Tag cloud

Visualizations in action

Response codes over time

Top 10 URLs requested

Bandwidth usage of top five countries over time

Web traffic originating from different countries

Most used user agent

Dashboards

Creating a dashboard

Saving the dashboard 

Cloning the dashboard

Sharing the dashboard 

Timelion

Timelion UI

Timelion expressions

Using plugins

Installing plugins

Removing plugins

Summary

Elastic X-Pack

Installing X-Pack 

Installing X-Pack on Elasticsearch

Installing X-Pack on Kibana

Uninstalling X-Pack

Configuring X-Pack

Security

User authentication

User authorization

Security in action

New user creation

Deleting a user

Changing the password

New role creation

How to Delete/Edit a role

Document-level security or field-level security

X-Pack security APIs

User management APIs

Role management APIs

Monitoring Elasticsearch

Monitoring UI

Elasticsearch metrics

Overview tab

Nodes tab

The Indices tab

Alerting

Anatomy of a watch

Alerting in action

Create a new alert

Threshold Alert

Advanced Watch

How to Delete/Deactivate/Edit a Watch

Summary

Running Elastic Stack in Production

Hosting Elastic Stack on a managed cloud

Getting up and running on Elastic Cloud

Using Kibana

Overriding configuration 

Recovering from a snapshot

Hosting Elastic Stack on your own

Selecting hardware

Selecting an operating system

Configuring Elasticsearch nodes

JVM heap size

Disable swapping

File descriptors

Thread pools and garbage collector

Managing and monitoring Elasticsearch

Running in Docker containers

Special considerations while deploying to a cloud

Choosing instance type

Changing default ports; do not expose ports!

Proxy requests

Binding HTTP to local addresses

Installing EC2 discovery plugin

Installing S3 repository plugin

Setting up periodic snapshots

Backing up and restoring

Setting up a repository for snapshots

Shared filesystem

Cloud or distributed filesystems

Taking snapshots

Restoring a specific snapshot

Setting up index aliases

Understanding index aliases

How index aliases can help

Setting up index templates

Defining an index template

Creating indexes on the fly

Modeling time series data

Scaling the index with unpredictable volume over time

Unit of parallelism in Elasticsearch

The effect of the number of shards on the relevance score

The effect of the number of shards on the accuracy of aggregations

Changing the mapping over time

New fields get added

Existing fields get removed

Automatically deleting older documents

How index-per-timeframe solves these issues

Scaling with index-per-timeframe

Changing the mapping over time

Automatically deleting older documents

Summary

Building a Sensor Data Analytics Application

Introduction to the application

Understanding the sensor-generated data

Understanding the sensor metadata

Understanding the final stored data

Modeling data in Elasticsearch

Defining an index template

Understanding the mapping

Setting up the metadata database

Building the Logstash data pipeline

Accept JSON requests over the web

Enrich the JSON with the metadata we have in the MySQL database

The jdbc_streaming plugin 

The mutate plugin

Move the looked-up fields that are under lookupResult directly in JSON

Combine the latitude and longitude fields under lookupResult as a location field

Remove the unnecessary fields

Store the resulting documents in Elasticsearch

Sending data to Logstash over HTTP

Visualizing the data in Kibana

Set up an index pattern in Kibana

Build visualizations

How does the average temperature change over time?

How does the average humidity change over time?

How do temperature and humidity change at each location over time?

Can I visualize temperature and humidity over a map?

How are the sensors distributed across departments?

Create a dashboard

Summary

Monitoring Server Infrastructure

Metricbeat

Downloading and installing Metricbeat

Installing on Windows

Installing on Linux

Architecture

Event structure

Configuring Metricbeat

Module configuration

Enabling module configs in the modules.d directory

Enabling module config in the metricbeat.yml file

General settings

Output configuration 

Logging

Capturing system metrics

Running Metricbeat with the system module

Specifying aliases

Visualizing system metrics using Kibana

 Deployment architecture

Summary

Preface

Elastic Stack is a powerful combination of tools for the distributed search, analytics, logging, and visualization of data from medium to massive data sets. The newly released Elastic Stack 6.0 brings new features and capabilities that empower users to find unique, actionable insights through these techniques. This book will give you a fundamental understanding of what the stack is all about, and how to use it efficiently to build powerful real-time data processing applications. After a quick overview of the newly introduced features in Elastic Stack 6.0, you'll learn how to set up the stack by installing the tools, and see their basic configurations. Then the book shows you how to use Elasticsearch for distributed searching and analytics, along with Logstash for logging, and Kibana for data visualization. It also demonstrates the creation of custom plugins using Kibana and Beats. You'll find out about Elastic X-Pack, a useful extension for effective security and monitoring. We also provide useful tips on how to use the Elastic Cloud and deploy Elastic Stack in production environments.

What this book covers

Chapter 1, Introducing Elastic Stack, motivates the reader by introducing the core components of Elastic Stack and the importance of the distributed, scalable search and analytics that it offers, illustrated with use cases of Elasticsearch. The chapter gives a brief introduction to all the core components, shows where they fit in the overall stack, and details the purpose of each component. It concludes with instructions for downloading and installing Elasticsearch and Kibana to get started.

Chapter 2, Getting Started with Elasticsearch, introduces the core concepts of Elasticsearch, which forms the backbone of Elastic Stack. Concepts such as indexes, types, nodes, and clusters are introduced. The reader is also introduced to the REST API for performing essential operations, as well as datatypes and mappings.

Chapter 3, Searching - What is Relevant, focuses on the search use case for Elasticsearch. It introduces the concepts of text analysis, tokenizers, and analyzers, and explains the need for analysis and relevance-based searching. The chapter uses an example use case to cover relevance-based search topics.

Chapter 4, Analytics with Elasticsearch, covers various types of aggregations, with examples to build a fundamental understanding. It progresses from very simple to complex aggregations that extract powerful insights from terabytes of data. The chapter also covers the reasons for using different types of aggregations.

Chapter 5, Analyzing Log Data, lays the foundation by covering the motivation behind Logstash, its architecture, and how to install and configure it to set up basic data pipelines. Elastic Stack 5.0 introduced the ingest node, which can be used instead of a dedicated Logstash setup. We will also cover building pipelines using Elasticsearch ingest nodes.

Chapter 6, Building Data Pipelines with Logstash, builds on the fundamental knowledge of Logstash by covering transformation and aggregation-related filters. It covers how a rich set of filters brings Logstash closer to other real-time and near-real-time stream processing frameworks with zero coding. It introduces the Beats platform and the Filebeat component, which is used to transport log files from edge machines.

Chapter 7, Visualizing Data with Kibana, covers how to effectively use Kibana to build beautiful dashboards for effective storytelling about your data. It uses a sample dataset and provides step-by-step guidance on creating visualizations in a few clicks.

Chapter 8, Elastic X-Pack, adds the extensions needed for specific use cases, now that we have covered Elasticsearch and the core components that help us build data pipelines and visualize data. This chapter shows you how to install and configure the X-Pack components in Elastic Stack and teaches you to use the security, monitoring, and alerting extensions.

Chapter 9, Building a Sensor Data Analytics Application, puts together a complete application for sensor data analytics with the concepts learned so far. It shows you how to model your data in Elasticsearch, how to build the data pipeline to ingest the data, and how to visualize it using Kibana. The chapter also demonstrates how to effectively use X-Pack components to secure and monitor your pipeline, and get alerts when certain conditions are met.

Chapter 10, Running Elastic Stack in Production, covers recommendations on how to deploy Elastic Stack to production. It provides guidelines on the typical configurations that need to be looked at for different use cases. It also covers deploying to cloud-based hosted providers such as Elastic Cloud.

Chapter 11, Monitoring Server Infrastructure, shows how we can use Elastic Stack to set up a real-time monitoring solution for your servers and applications, built completely using Elastic Stack. It introduces another component of the Beats platform, Metricbeat, which is used to monitor servers and applications.

What you need for this book

This book will guide you through the installation of all the tools that you need to follow the examples. You will need to download the following software in the versions listed:

Elasticsearch 6.0

Kibana 6.0

Who this book is for

This book is for data professionals who want to get amazing insights and business metrics from their data sources. If you want to get a fundamental understanding of Elastic Stack for the distributed, real-time processing of data, this book will help you. A fundamental knowledge of JSON would be useful, but is not mandatory. No previous experience with Elastic Stack is required.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "The next lines of code read the link and assign it to the BeautifulSoup function."

A block of code is set as follows:

#import packages into the project
from bs4 import BeautifulSoup
from urllib.request import urlopen
import pandas as pd

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)

Any command-line input or output is written as follows:

C:\Python34\Scripts> pip install --upgrade pip

C:\Python34\Scripts> pip install pandas

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "In order to download new modules, we will go to Files | Settings | Project Name | Project Interpreter."

Warnings or important notes appear like this.
Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of. To send us general feedback, simply email [email protected], and mention the book's title in the subject of your message. If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you. You can download the code files by following these steps:

Log in or register to our website using your email address and password.

Hover the mouse pointer on the SUPPORT tab at the top.

Click on Code Downloads & Errata.

Enter the name of the book in the Search box.

Select the book for which you're looking to download the code files.

Choose from the drop-down menu where you purchased this book from.

Click on Code Download.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for Windows

Zipeg / iZip / UnRarX for Mac

7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Learning-Elastic-Stack-6. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/LearningElasticStack6_ColorImages.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books (maybe a mistake in the text or the code), we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title. To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the internet, please provide us with the location address or website name immediately so that we can pursue a remedy. Please contact us at [email protected] with a link to the suspected pirated material. We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.

Introducing Elastic Stack

We are living in an advanced stage of the information age. The emergence of the web, mobiles, social networks, blogs, and photo sharing has created a massive amount of data in recent years. These new data sources create information that cannot be handled using traditional data storage technology, typically relational databases. As an application developer or business intelligence developer, your job is to fulfill the search and analytics needs of the application.

A number of data stores built for big data scale have emerged in the last few years. These include Hadoop ecosystem projects, several NoSQL databases, and search and analytics engines such as Elasticsearch. Hadoop and each NoSQL database have their own strengths and use cases.

Elastic Stack is a rich ecosystem of components serving as a full search and analytics stack. The main components of Elastic Stack are Kibana, Logstash, Beats, X-Pack, and Elasticsearch. Elasticsearch is at the heart of Elastic Stack, providing storage, search, and analytics capabilities. Kibana, which is also called a window into Elastic Stack, is a great visualization and user interface for Elastic Stack. Logstash and Beats help in getting the data into Elastic Stack. X-Pack provides powerful features including monitoring, alerting, and security to make your system production ready. Since Elasticsearch is at the heart of Elastic Stack, we will cover the stack inside-out, starting from the heart and moving on to the surrounding components.

In this chapter, we will cover the following topics:

What is Elasticsearch, and why use it?

A brief history of Elasticsearch and Apache Lucene

Elastic Stack components 

Use cases of Elastic Stack

We will look at what Elasticsearch is and why you should consider it as your data store. Once you know the key strengths of Elasticsearch, we will look at the history of Elasticsearch and its underlying technology, Apache Lucene. We will then look at some use cases of Elastic Stack, and we will provide an overview of the Elastic Stack components.

What is Elasticsearch, and why use it?

Since you are reading this book, you probably already know what Elasticsearch is. For the sake of completeness, let us define Elasticsearch.

Elasticsearch is a realtime, distributed search and analytics engine that is horizontally scalable and capable of solving a wide variety of use cases. At the heart of Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.

Elasticsearch is at the core of Elastic Stack, playing the central role of a search and analytics engine. Elasticsearch is built on a radically different technology, Apache Lucene. This fundamentally different technology in Elasticsearch sets it apart from traditional relational databases and other NoSQL solutions. Let us look at the key benefits of using Elasticsearch as your data store:

Schemaless, document-oriented

Searching

Analytics

Rich client library support and the REST API

Easy to operate and easy to scale

Near real time

Lightning fast

Fault tolerant

Let us look at each benefit one by one.

Schemaless and document-oriented

Elasticsearch does not impose a strict structure on your data; you can store any JSON documents. JSON documents are first class citizens in Elasticsearch as opposed to rows and columns in a relational database. A document is roughly equivalent to a record in a relational database table. Traditional relational databases require a schema to be defined beforehand to specify a fixed set of columns and their datatypes and sizes. Often the nature of data is very dynamic, requiring support for new or dynamic columns. The JSON documents naturally support this type of data. For example, take a look at the following document:

{ "name": "John Smith", "address": "121 John Street, NY, 10010", "age": 40 }

This document may represent a customer's record. Here the record has the name, address, and age of the customer. Another record may look like the following one:

{ "name": "John Doe", "age": 38, "email": "[email protected]" }

Note that the second customer doesn't have the address field, but instead has an email address. In fact, other customer documents may have completely different sets of fields. This provides a tremendous amount of flexibility in terms of what can be stored.
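To make this flexibility concrete, here is a minimal Python sketch that treats the two customer records above as plain dictionaries and reports which fields each one lacks. It does not talk to Elasticsearch at all; the helper names field_names and missing_fields are hypothetical, and the email value is a made-up placeholder rather than the one from the original record.

```python
import json

# The two customer documents from the text.
# The email value is a made-up placeholder, not taken from the book.
customers = [
    {"name": "John Smith", "address": "121 John Street, NY, 10010", "age": 40},
    {"name": "John Doe", "age": 38, "email": "john.doe@example.com"},
]


def field_names(documents):
    """Return the sorted union of field names across all documents."""
    names = set()
    for doc in documents:
        names.update(doc.keys())
    return sorted(names)


def missing_fields(doc, all_fields):
    """Return the fields that appear in other documents but not in this one."""
    return [field for field in all_fields if field not in doc]


all_fields = field_names(customers)
for doc in customers:
    print(json.dumps(doc), "-> missing:", missing_fields(doc, all_fields))
```

Neither document is invalid for lacking a field; each one simply carries the fields that are relevant to it.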

Searching

The core strength of Elasticsearch lies in its text processing capabilities. Elasticsearch is great at searching, especially a full-text search. Let us understand what a full-text search is.

Full-text search means searching through all the terms of all the documents available in the database. This requires the entire contents of all documents to be parsed and stored beforehand. When you hear full-text search, think of Google Search. You can enter any search term and Google looks through all of the web pages on the internet to find the best matching web pages. This is quite different from simple SQL queries run against columns of type string in relational databases. Normal SQL queries with a WHERE clause and an equals (=) or LIKE operator try to do an exact or wildcard match against the underlying data. SQL queries can, at best, just match the search term to a substring within the text column.

When you want to perform a search similar to Google search on your own data, Elasticsearch is your best bet. You can index emails, text documents, PDF files, web pages, or practically any unstructured text documents and search across all your documents with search terms.

At a high level, Elasticsearch breaks up text data into terms and makes every term searchable by building Lucene indexes. You can build your own Google-like search for your application which is very fast and flexible.
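The idea of breaking text into terms and making every term searchable can be sketched with a toy inverted index. The following Python example is only an illustration of the concept under simplifying assumptions (lowercasing and splitting on non-alphanumeric characters); it is not how Lucene is actually implemented, and the function names and sample documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents keyed by ID.
documents = {
    1: "Elasticsearch is a distributed search engine",
    2: "Kibana visualizes data stored in Elasticsearch",
    3: "Logstash ships logs to a stash of your choice",
}
index = build_inverted_index(documents)
print(sorted(search(index, "elasticsearch search")))
```

Because the lookup goes from term to documents, a query only touches the entries for its own terms instead of scanning the text of every document, which is what a substring match would have to do.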

In addition to supporting text data, Elasticsearch also supports other data types such as numbers, dates, geolocations, IP addresses, and many more. We will take an in-depth look at search in Chapter 3, Searching-What is Relevant.

Analytics

Apart from search, the second most important functional strength of Elasticsearch is analytics. Yes, what was originally known just as a full-text search engine is now used as an analytics engine in a variety of use cases. Many organizations are running analytics solutions powered by Elasticsearch in production.

Search is like zooming in and finding a needle in a haystack. Search helps zoom in on precisely what is needed in huge amounts of data. Analytics is exactly the opposite of search; it is about zooming out and taking a look at the bigger picture. For example, you may want to know how many visitors to your website are from the United States as opposed to every other country, or you may want to know how many of your website's visitors use macOS, Windows, or Linux.

Elasticsearch supports a wide variety of aggregations for analytics. Elasticsearch aggregations are quite powerful and can be applied to various datatypes. We will take a look at the analytics capabilities of Elasticsearch in Chapter 4, Analytics with Elasticsearch.
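The two questions above are simple counting problems, and the following Python sketch answers them over a handful of hypothetical page-view events. It only illustrates the zooming-out idea; the function name terms_aggregation is an assumption made for this example, and the code does not use the Elasticsearch aggregations API covered in Chapter 4.

```python
from collections import Counter

# Hypothetical page-view events; the field names are assumptions.
visits = [
    {"country": "US", "os": "macOS"},
    {"country": "IN", "os": "Windows"},
    {"country": "US", "os": "Linux"},
    {"country": "DE", "os": "Windows"},
    {"country": "US", "os": "Windows"},
]


def terms_aggregation(events, field):
    """Count the events for each distinct value of the given field."""
    return Counter(event[field] for event in events)


by_os = terms_aggregation(visits, "os")
by_country = terms_aggregation(visits, "country")

# Visitors from the United States as opposed to every other country.
us_vs_rest = {
    "US": by_country["US"],
    "other": sum(by_country.values()) - by_country["US"],
}

print(dict(by_os))
print(us_vs_rest)
```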

Rich client library support and the REST API

Elasticsearch has very rich client library support to make it accessible by many programming languages. There are client libraries available for Java, C#, Python, JavaScript, PHP, Perl, Ruby, and many more. Apart from the official client libraries, there are community driven libraries for 20 plus programming languages. 

Additionally, it has a very rich REST (Representational State Transfer) API which works over the HTTP protocol. The REST API is very well documented and quite comprehensive, making all operations available over HTTP.

All this means that Elasticsearch is very easy to integrate in any application to fulfill your search and analytics needs.
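As a rough sketch of what working over HTTP looks like, the following Python example builds a search request using only the standard library. It assumes a node listening on http://localhost:9200 and an index named customers, neither of which has been set up at this point in the book, so the request is constructed but deliberately not sent.

```python
import json
import urllib.request

# Assumption: a local Elasticsearch node on the default HTTP port.
BASE_URL = "http://localhost:9200"


def build_search_request(index, field, text):
    """Build (but do not send) an HTTP request carrying a match query."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "customers" is a hypothetical index name used only for illustration.
request = build_search_request("customers", "name", "john")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

With a node running, passing the request to urllib.request.urlopen would send it, and the response body would be a JSON document that any language with an HTTP client and a JSON parser can consume.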

Easy to operate and easy to scale 

Elasticsearch can run on a single node and easily scale out to hundreds of nodes. It is very easy to start a single node instance of Elasticsearch; it works out of the box without any configuration changes.

Horizontal scalability is the ability to scale a system horizontally by starting up multiple instances of the same type rather than making one instance more and more powerful. Vertical scaling is about upgrading a single instance by adding more processing power (by increasing the number of CPUs or CPU cores), memory, or storage capacity. There is a practical limit to how much a system can be scaled vertically due to cost and other factors, such as the availability of higher end hardware. 

Unlike most traditional databases which only allow vertical scaling, Elasticsearch can be scaled horizontally. It can run on tens or hundreds of commodity nodes instead of one extremely expensive server. Adding a node to an existing Elasticsearch cluster is as easy as starting up a new node in the same network, with virtually no extra configuration. The client application doesn't need to change, whether it is running against a single node or a hundred node cluster.
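One way to picture horizontal scaling is to split the data into a fixed number of partitions and spread those partitions over however many nodes are available. The Python sketch below does exactly that. It is a toy model built on assumptions: the partitions are called shards here, the hash is a CRC32 stand-in, and the round-robin placement is invented for illustration, so none of it should be read as the actual routing or allocation logic of Elasticsearch.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Pick a shard for a document from a stable hash of its ID."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def assign_shards(num_shards, nodes):
    """Spread the shards across the given nodes in round-robin order."""
    layout = {node: [] for node in nodes}
    for shard in range(num_shards):
        layout[nodes[shard % len(nodes)]].append(shard)
    return layout


NUM_SHARDS = 6  # hypothetical fixed shard count

one_node = assign_shards(NUM_SHARDS, ["node-1"])
three_nodes = assign_shards(NUM_SHARDS, ["node-1", "node-2", "node-3"])

print(one_node)
print(three_nodes)
print(shard_for("customer-42", NUM_SHARDS))
```

The shard chosen for a document depends only on its ID and the shard count, not on the number of nodes, which is why the client-facing behavior in this model stays the same whether one node or three nodes hold the shards.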

Near real time

Data is available for querying typically within a second after it has been indexed (saved). Not all big data storage systems are real-time capable. Elasticsearch allows you to index thousands to hundreds of thousands of documents per second and makes them available for searching almost immediately.
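The difference between real time and near real time can be illustrated with a toy index in which newly indexed documents sit in a buffer until the next refresh. The class below is a simplified sketch, not the actual Elasticsearch mechanism; the class name, the one-second refresh_interval default, and the explicit now argument (used so that the example is deterministic) are all assumptions.

```python
class NearRealTimeIndex:
    """Toy index in which writes become searchable only after a refresh."""

    def __init__(self, refresh_interval=1.0):
        self.refresh_interval = refresh_interval  # seconds between refreshes
        self._buffer = []       # indexed but not yet searchable
        self._searchable = []   # visible to search
        self._last_refresh = 0.0

    def index(self, doc, now):
        """Accept a document at time `now` (in seconds)."""
        self._maybe_refresh(now)
        self._buffer.append(doc)

    def search(self, now):
        """Return the documents that are searchable at time `now`."""
        self._maybe_refresh(now)
        return list(self._searchable)

    def _maybe_refresh(self, now):
        """Move buffered documents into view once the interval has passed."""
        if now - self._last_refresh >= self.refresh_interval:
            self._searchable.extend(self._buffer)
            self._buffer.clear()
            self._last_refresh = now


idx = NearRealTimeIndex(refresh_interval=1.0)
idx.index({"id": 1}, now=0.2)
before = idx.search(now=0.5)  # searched before the next refresh
after = idx.search(now=1.3)   # searched after the refresh interval elapsed
print(before, after)
```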

Lightning fast

Elasticsearch uses Apache Lucene as its underlying technology. By default, Elasticsearch indexes all the fields of your documents. This is extremely valuable, as you can query or search by any field in your records. You will never be in a situation in which you think, "If only I had chosen to create an index on this field." Elasticsearch contributors have leveraged Apache Lucene to its best advantage, and there are other optimizations which make it lightning fast.

Fault tolerant

Elasticsearch clusters can keep running even when there are hardware failures such as node failure and network failure. In the case of node failure, it replicates all the data that was on the failed node to another node in the cluster. In the case of network failure, Elasticsearch seamlessly elects master replicas to keep the cluster running. Whether it is node or network failure, you can rest assured that your data is safe.

Now that you know when and why Elasticsearch could be a great choice, let us take a high level view of the ecosystem—the Elastic Stack.

Exploring the components of Elastic Stack

The Elastic Stack components are shown in the following figure. It is not necessary to include all of them in your solution. Some components are general purpose and they can be used outside of Elastic Stack without using any of the other components.

Let us look at the purpose of each component and how they fit in the stack:

Elasticsearch

Elasticsearch is at the heart of Elastic Stack. It stores all your data and provides search and analytic capabilities in a scalable way. We have already looked at the strengths of Elasticsearch and why you would want to use it. Elasticsearch can be used without using any other components to power your application in terms of search and analytics. We will cover Elasticsearch in great detail in Chapter 2, Getting Started with Elasticsearch, Chapter 3, Searching-What is Relevant, and Chapter 4, Analytics with Elasticsearch.

Logstash

Logstash helps in centralizing event data such as logs, metrics, or any other data in any format. It can perform a number of transformations before sending it to a stash of your choice. It is a key component of Elastic Stack, used to centralize the collection and transformation processes in your data pipeline.

Logstash is a server side component. Its role is to centralize the collection of data from a wide number of input sources in a scalable way, and transform and send the data to an output of your choice. Typically, the output is sent to Elasticsearch, but Logstash is capable of sending it to a wide variety of outputs. Logstash has a plugin-based, extensible architecture. It supports three types of plugins: input plugins, filter plugins, and output plugins. Logstash has a collection of 200 plus supported plugins and the count is ever increasing.

Logstash is an excellent general purpose data flow engine which helps in building real-time, scalable data pipelines.
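The input, filter, and output roles described above can be mimicked with three small Python functions chained together. This is only a conceptual sketch: it does not use the Logstash configuration language or plugin API, and the stage names, the "LEVEL text" line format, and the in-memory list standing in for a stash are assumptions made for illustration.

```python
import json


def input_stage(lines):
    """Input: turn raw lines into events."""
    for line in lines:
        yield {"message": line.rstrip("\n")}


def filter_stage(events):
    """Filter: split 'LEVEL text' messages into structured fields."""
    for event in events:
        level, _, text = event["message"].partition(" ")
        yield {**event, "level": level, "text": text}


def output_stage(events, sink):
    """Output: serialize each event as JSON and append it to the sink."""
    for event in events:
        sink.append(json.dumps(event, sort_keys=True))


# Hypothetical raw log lines and an in-memory list acting as the stash.
raw_lines = ["ERROR disk full\n", "INFO service started\n"]
stash = []
output_stage(filter_stage(input_stage(raw_lines)), stash)

for line in stash:
    print(line)
```

Because each stage only consumes and produces events, a stage can be swapped for another one without touching the rest of the chain, which is the same property that a plugin-based pipeline relies on.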

Beats

Beats is a platform of open source lightweight data shippers. Its role is complementary to Logstash. Logstash is a server-side component, whereas Beats has a role on the client side. Beats consists of a core library, libbeat, which provides an API for shipping data from the source, configuring the input options, and implementing logging. Beats is installed on machines that are not part of server-side components such as Elasticsearch, Logstash, or Kibana. These agents reside on non-cluster nodes, which are sometimes called edge nodes.

There are many Beat components that have already been built by the Elastic team and the open source community. The Elastic team has built Beats including Packetbeat, Filebeat, Metricbeat, Winlogbeat, Auditbeat, and Heartbeat.

Filebeat is a single-purpose Beat built to ship log files from your servers to a centralized Logstash server or Elasticsearch server. Metricbeat is a server monitoring agent that periodically collects metrics from the operating systems and services running on your servers. There are already around 40 community Beats built for specific purposes such as monitoring Elasticsearch, Cassandra, the Apache web server, JVM performance, and so on. You can build your own beat using libbeat if you don't find one that fits your needs.

We will take a deep dive into Logstash and Beats in Chapter 5, Analyzing Log Data and Chapter 6, Building Data Pipelines with Logstash.

Kibana

Kibana is the visualization tool of Elastic Stack which can help you gain powerful insights about your data in Elasticsearch. It is often called a window into Elastic Stack. It offers many visualizations including histograms, maps, line charts, time series, and more. You can build visualizations with just a few clicks and interactively explore the data. It lets you build beautiful dashboards by combining different visualizations, sharing with others, and exporting high quality reports.

Kibana also has management and development tools. You can manage settings and configure X‑Pack security features for the Elastic Stack. Kibana also has development tools which enable developers to build and test REST API requests.

We will explore Kibana in Chapter 7, Visualizing Data with Kibana.

X-Pack

X-Pack adds essential features to make Elastic Stack production ready. It adds security, monitoring, alerting, reporting, and graph capabilities to Elastic Stack.

Security

The security plugin within X-Pack adds authentication and authorization capabilities to Elasticsearch and Kibana so that only authorized people have access to the data, and they see only what they are allowed to see. The security plugin works across components seamlessly, securing access to Elasticsearch and Kibana.

The security extension also lets you configure field-level and document-level security with the licensed version.

Monitoring

You can monitor your Elastic Stack components to help avoid downtime. The monitoring component in X-Pack lets you monitor your Elasticsearch clusters and Kibana.

You can monitor clusters, nodes, and index level metrics. The monitoring plugin maintains a history of performance so that you can compare the current metrics with the past metrics. It also has a capacity planning feature.

Reporting

The reporting plugin within X-Pack allows for generating printable, high-quality reports from Kibana visualizations. The reports can be scheduled to run periodically or on a per event basis.

Alerting

X-Pack has sophisticated alerting capabilities that can alert you in multiple possible ways when certain conditions are met. It gives tremendous flexibility in terms of when, how, and who to alert. 

You may be interested in detecting security breaches, such as when someone has five login failures within an hour from different locations, or when your product is trending on social media. You can use the full power of Elasticsearch queries to check when complex conditions are met.

Alerting provides a wide variety of options in terms of how alerts are sent. It can send alerts via email, Slack, Hipchat, and PagerDuty.
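To make the login failure condition mentioned above concrete, here is a small Python sketch that checks it over a list of hypothetical events. It does not use the X-Pack alerting API; the event field names, the constants, and the function users_to_alert are assumptions, and in X-Pack the condition would be expressed as an Elasticsearch query instead.

```python
from collections import defaultdict

WINDOW_SECONDS = 3600  # one hour
THRESHOLD = 5          # login failures needed to raise an alert


def users_to_alert(failures, now):
    """Return users with at least THRESHOLD failed logins in the last hour
    that came from more than one location."""
    recent = defaultdict(list)
    for event in failures:
        if 0 <= now - event["timestamp"] <= WINDOW_SECONDS:
            recent[event["user"]].append(event["location"])
    return sorted(
        user
        for user, locations in recent.items()
        if len(locations) >= THRESHOLD and len(set(locations)) > 1
    )


# Hypothetical failed-login events; timestamps are in seconds.
failures = [
    {"user": "alice", "timestamp": 1000 + i * 60, "location": location}
    for i, location in enumerate(["NY", "London", "NY", "Pune", "NY"])
] + [{"user": "bob", "timestamp": 1200, "location": "Berlin"}]

print(users_to_alert(failures, now=4000))
```

The list of users returned by such a check is what would then be handed to a notification channel such as email or Slack.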

Graph

Graph lets you explore relationships in your data. The data in Elasticsearch is generally perceived as a flat list of entities without connections to other entities. Being able to explore the relationships among these entities opens up the possibility of new use cases. Graph can surface relationships among entities which share common properties such as people, places, products, or preferences.

Graph consists of the Graph API and a UI within Kibana to let you explore these relationships. Under the hood, it leverages distributed querying, indexing at scale, and the relevance capabilities of Elasticsearch.

We will look at some of the X-Pack components in Chapter 8, Elastic X-Pack.

Elastic Cloud

Elastic Cloud is the cloud-based, hosted, and managed setup of Elastic Stack components. The service is provided by the company Elastic (https://www.elastic.co/). Elastic is the company behind the development of Elasticsearch and other Elastic Stack components. All Elastic Stack components are open source except X-Pack (and Elastic Cloud). The company Elastic provides services for Elastic Stack components including training, development, support, and cloud hosting.

Apart from Elastic Cloud, there are other hosted solutions available for Elasticsearch including one from Amazon Web Services (AWS). The advantage of Elastic Cloud is that it is developed and maintained by the original creators of Elasticsearch and other Elastic Stack components.

Use cases of Elastic Stack

Elastic Stack components have a variety of practical use cases, and new use cases are emerging as more plugins are added to existing components. As mentioned earlier, you may use a subset of the components for your use case. The following example use cases are by no means exhaustive, but are some of the most common ones:

Log and security analytics

Product search

Metrics analytics

Web search and website search

Let us look at each use case.

Log and security analytics

The trio of Elasticsearch, Logstash, and Kibana was previously popular as the ELK stack. The presence of these three components makes Elastic Stack an excellent stack for aggregating and analyzing logs in a central place.

The application support teams face a great challenge administering and managing large numbers of applications deployed across tens or hundreds of servers. The application infrastructure could have the following components:

Web servers

Application servers

Database servers

Message brokers

Typically, enterprise applications have all or most of the server types listed above, and there are multiple instances of each server. In the event of an error or production issue, the support team has to log in to individual servers and look at the errors. It is quite inefficient to log in to individual servers and look at the raw log files. Elastic Stack provides a complete tool set to collect, centralize, analyze, and visualize the errors, and to alert and report on them, as they occur. Here is how each component can be used to solve this problem:

The Beats framework, Filebeat in particular, can run as a lightweight agent to collect and forward the logs.

Logstash can centralize the events received from Beats, and parse and transform each log entry before sending it to the Elasticsearch cluster.

Elasticsearch indexes the logs. It enables both search and analytics on the parsed logs.

Kibana then lets you create visualizations based on errors, warnings, and other information logs. It lets you create dashboards where you can centrally monitor events as they occur, in real time.

With X-Pack, you can secure the solution, configure alerts, get reports, and analyze relationships in the data.

As you can see, you can get a complete log aggregation and monitoring solution using Elastic Stack.

A security analytics solution would be very similar to this; the logs and events being fed into the system would pertain to firewalls, switches, and other key network elements.

Product search