A beginner's guide to analyzing and visualizing your Elasticsearch data using Kibana 7 and Timelion
Kibana is a window into the Elastic Stack that enables the visual exploration and real-time analysis of your data in Elasticsearch. This book will help you understand the core concepts of using Kibana 7 for rich analytics and data visualization.
If you’re new to the tool or want to get to grips with the latest features introduced in Kibana 7, this book is the perfect beginner's guide. You’ll learn how to set up and configure the Elastic Stack and understand where Kibana sits within the architecture. As you advance, you’ll learn how to ingest data from different sources using Beats or Logstash into Elasticsearch, followed by exploring and visualizing data in Kibana. Whether working with time series data to create complex graphs using Timelion or embedding visualizations created in Kibana into your web applications, this book covers it all. It also covers topics that every Elastic developer needs to be aware of, such as installing and configuring Application Performance Monitoring (APM) servers and agents. Finally, you’ll also learn how to create effective machine learning jobs in Kibana to find anomalies in your data.
By the end of this book, you’ll have a solid understanding of Kibana, and be able to create your own visual analytics solutions from scratch.
If you’re an aspiring Elastic developer or data analyst, this book is for you. You’ll also find it useful if you want to get up to speed with the new features of Kibana 7 and perform data visualization on enterprise data. No prior knowledge of Kibana is expected, but some experience with Elasticsearch will be helpful.
Page count: 256
Year of publication: 2019
Copyright © 2019 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Amey Varangaonkar
Acquisition Editor: Nelson Morris
Content Development Editors: Pratik Andrade, Anugraha Arunagiri
Senior Editor: Ayaan Hoda
Technical Editors: Snehal Dalmet, Dinesh Pawar
Copy Editor: Safis Editing
Project Coordinator: Vaidehi Sawant
Proofreader: Safis Editing
Indexer: Manju Arasan
Production Designer: Deepika Naik
First published: February 2017
Second edition: July 2019
Production reference: 1190719
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-83855-036-3
www.packtpub.com
Packt.com
Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Fully searchable for easy access to vital information
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Anurag Srivastava is a senior technical lead in a multinational software company. He has more than 12 years' experience in web-based application development and is proficient in designing architecture for scalable and highly available applications. He has handled dev teams and multiple clients from all over the globe over the past 10 years of his professional career. He has significant experience with the Elastic Stack (Elasticsearch, Logstash, and Kibana) for creating dashboards using system metrics data, log data, application data, and relational databases. He has authored two other books, Mastering Kibana 6.x and Kibana 7 Quick Start Guide, both published by Packt.
Bahaaldine Azarmi, or Baha for short, is the head of solutions architecture in the EMEA South region at Elastic. Prior to this position, Baha co-founded ReachFive, a marketing data platform focused on user behavior and social analytics. He has also worked for a number of different software vendors, including Talend and Oracle, where he held positions as a solutions architect and architect. Prior to Machine Learning with the Elastic Stack, Baha authored books including Learning Kibana 5.0, Scalable Big Data Architecture, and Talend for Big Data. He is based in Paris and holds an MSc in computer science from Polytech'Paris.
Giacomo Veneri graduated in computer science from the University of Siena. He holds a PhD in neuroscience, along with having various scientific publications to his name. He is Predix IoT-certified and an influencer, as well as being certified in SCRUM and Oracle Java. He has 20 years' experience as an IT architect and team leader. He has been an expert on IoT in the fields of oil and gas and transportation since 2013. He lives in Tuscany, where he loves cycling. He is also the author of Hands-On Industrial Internet of Things and Maven Build Customization, both published by Packt.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Learning Kibana 7, Second Edition
Dedication
About Packt
Why subscribe?
Contributors
About the authors
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Section 1: Understanding Kibana 7
Understanding Your Data for Kibana
Industry challenges
Use cases to explain industry issues
Understanding your data for analysis in Kibana
Data shipping
Data ingestion
Storing data at scale
Visualizing data
Technology limitations
Relational databases
Hadoop
NoSQL
Components of the Elastic Stack
Elasticsearch
Beats
Logstash
Kibana
X-Pack
Security
Monitoring
Alerting
Reporting
Summary
Installing and Setting Up Kibana
Installing Elasticsearch
Elasticsearch installation using the .zip or .tar.gz archives
Downloading and installing using the .zip archive
Downloading and installing using the .tar.gz archive
Running Elasticsearch
Elasticsearch installation on Windows using the .zip package
Downloading and installing the .zip package
Running Elasticsearch
Installing Elasticsearch as a service
Elasticsearch installation using the Debian package
Installing Elasticsearch using the apt repository
Manually installing using the Debian package
Elasticsearch installation using RPM
Installing using the apt repository
Manually installing using RPM
Running Elasticsearch
Running Elasticsearch with SysV
Running Elasticsearch with systemd
Checking whether Elasticsearch is running
Installing Kibana
Kibana installation using the .zip or .tar.gz archives
Downloading and installing using the .tar.gz archive
Running Kibana
Downloading and installing using the .zip archive
Running Kibana
Kibana installation using the Debian package
Installing using the apt repository
Manually installing Kibana using the Debian package
Running Kibana
Running Kibana with SysV
Running Kibana with systemd
Kibana installation using RPM
Installing using the apt repository
Manually installing using RPM
Running Kibana
Running Kibana with SysV
Running Kibana with systemd
Installing Logstash
Installing Logstash using the downloaded binary
Installing Logstash from the package repositories
Installing Logstash using the apt package
Installing Logstash using the yum package
Running Logstash as a service
Running Logstash using systemd
Running Logstash using upstart
Running Logstash using SysV
Installing Beats
Installing Filebeat
deb
rpm
macOS
Linux
win
Installing Metricbeat
deb
rpm
macOS
Linux
win
Installing Packetbeat
deb
rpm
macOS
Linux
win
Installing Heartbeat
deb
rpm
macOS
Linux
win
Installing Winlogbeat
Summary
Section 2: Exploring the Data
Business Analytics with Kibana
Understanding logs
Data modeling
Importing data
Beats
Configuring Filebeat to import data
Reading log files using Filebeat
Logstash
Reading CSV data using Logstash
Reading MongoDB data using Logstash
Reading MySQL data using Logstash
Creating an index pattern
Summary
Visualizing Data Using Kibana
Creating visualizations in Kibana
Identifying the data to visualize
Creating an area chart, a line chart, and a bar chart
Creating a pie chart
Creating the heatmap
Creating the data table
Creating the metric visualization
Creating the tag cloud
Inspecting the visualization
Sharing the visualization
Creating dashboards in Kibana
Sharing the dashboard
Generating reports
Summary
Section 3: Tools for Playing with Your Data
Dev Tools and Timelion
Introducing Dev Tools
Console
Search profiler
Aggregation profile
Grok Debugger
Timelion
.es()
.label()
.color()
.static()
.bars()
.points()
.derivative()
.holt()
.trend()
.mvavg()
A use case of Timelion
Summary
Space and Graph Exploration in Kibana
Kibana spaces
Creating a space
Editing a space
Deleting a space
Switching between spaces
Moving saved objects between spaces
Restricting space access
Creating a role to provide access to a space
Creating a user and assigning the space access role
Checking the user space access
Kibana graphs
Differences with industry graph databases
Creating a Kibana graph
Advanced graph exploration
Summary
Section 4: Advanced Kibana Options
Elastic Stack Features
Security
Roles
Users
Monitoring
Elasticsearch Monitoring
Kibana Monitoring
Alerting
Creating a threshold alert
Reporting
CSV reports
PDF and PNG reports
Summary
Kibana Canvas and Plugins
Kibana Canvas
Introduction to Canvas
Customizing the workpad
Managing assets
Adding elements
Data tables
Designing the data table
Pie charts
Images
Creating a presentation in Canvas
Kibana plugins
Installing plugins
Removing plugins
Available plugins
Summary
Application Performance Monitoring
APM components
APM agents
The APM Server
Installing the APM Server
APT
YUM
APM Server installation on Windows
Running the APM Server
Configuring the APM Server
Elasticsearch
Kibana
Configuring an application with APM
Configuring the APM agent for the Django application
Running the Django application
Monitoring the APM data
Summary
Machine Learning with Kibana
What is Elastic machine learning?
Machine learning features
Creating machine learning jobs
Data visualizer
Single metric jobs
Practical use case to explain machine learning
Forecasting using machine learning
Multi-metric jobs
Population jobs
Job management
Job settings
Job config
Datafeed
Counts
JSON
Job messages
Datafeed preview
Forecasts
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think
This book is here to help you understand the core concepts and practical implementation of Kibana in different use cases. It covers how to ingest data from different sources into Elasticsearch using Beats or Logstash, and then how to explore, analyze, and visualize that data in Kibana. It shows how to work with time series data to create complex graphs using Timelion, display them alongside other visualizations on your dashboard, and embed your dashboard or visualizations in a web page. You will also learn how to use APM to monitor your applications by installing and configuring the APM Server and APM agents. We will explore how Canvas can be used to create striking visualizations, and we will cover different X-Pack features, such as user and role management in security, alerting, monitoring, and machine learning. This book will also explain how to create machine learning jobs to find anomalies in your data.
Aspiring Elastic developers, data analysts, and those interested in learning about the new features of Kibana 7 will find this book very useful. No prior knowledge of Kibana is expected. Previous experience with Elasticsearch will help, but is not mandatory.
Chapter 1, Understanding Your Data for Kibana, introduces the notion of data-driven architecture by explaining the main challenges in the industry, how the Elastic Stack is structured, and what data we'll use to implement some of the use cases in Kibana.
Chapter 2, Installing and Setting Up Kibana, walks the reader through the installation of the Elastic Stack on different platforms.
Chapter 3, Business Analytics with Kibana, describes what a business analytics use case is through a real-life example, and then walks the reader through the process of data ingestion.
Chapter 4, Visualizing Data Using Kibana, describes visualization and dashboarding. Readers will learn how to create different visualizations before moving on to creating a dashboard from them.
Chapter 5, Dev Tools and Timelion, focuses on Dev Tools and Timelion in Kibana. Readers will learn about the different options in Dev Tools, such as using Console to run Elasticsearch queries directly from the Kibana interface, using Search Profiler to profile Elasticsearch queries, and using Grok Debugger to create Grok patterns with which we can convert unstructured data into structured data through Logstash. After that, we will cover Timelion, which lets us work with time series data; it provides functions that can be chained together to create complex visualizations for specific use cases that can't be created using the Kibana Visualize option.
Chapter 6, Space and Graph Exploration in Kibana, describes the Elastic Stack Graph plugin, which provides graph analytics. The reader will be walked through the main use cases that the Graph plugin tries to solve, and will see how to interact with the data. After that, we will cover how to create different Spaces and assign different roles and users to them.
Chapter 7, Elastic Stack Features, describes the importance of Elastic features. We will cover security using user and role management, and will then cover reporting, with which we can export CSV and PDF reports. After that, we will explore how to use monitoring to monitor the complete Elastic Stack, and with Watcher, we will configure the alerting system to send an email whenever a value crosses a specified threshold.
Chapter 8, Kibana Canvas and Plugins, describes the Kibana Canvas and explains how we can create custom dashboards with it.
Chapter 9, Application Performance Monitoring, describes Application Performance Monitoring (APM) and how it can be configured to monitor an application. We will cover the installation of APM Server and configure it to receive data from APM agents. Then, we will cover the installation and configuration of APM agents with the application in order to fetch the application data. Lastly, we will explain how to explore data with the built-in APM UI or Kibana Dashboard.
Chapter 10, Machine Learning with Kibana, introduces machine learning and explores how to find data anomalies and predict future trends.
In this book, you will need to download and install the Elastic Stack, specifically, Elasticsearch, Kibana, Beats, Logstash, and APM. All the software is available from http://www.elastic.co/downloads. The Elastic Stack can be run on various environments on different machines and setups. The support matrix is available at https://www.elastic.co/support/matrix.
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
1. Log in or register at www.packt.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Learning-Kibana-7-Second-Edition. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781838550363_ColorImages.pdf.
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "For CentOS and older Red Hat-based distributions, we can use the yum command".
A block of code is set as follows:
input {
  file {
    path => "/home/user/Downloads/Popular_Baby_Names.csv"
    start_position => "beginning"
  }
}
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
elasticsearch {
  action => "index"
  hosts => ["127.0.0.1:9200"]
  index => "popular_baby_names"
}
Any command-line input or output is written as follows:
unzip elasticsearch-7.1.0-windows-x86_64.zip
cd elasticsearch-7.1.0/
Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Now, we need to click on the Next step button."
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
In this section, we will start with a basic introduction to the Elastic Stack and then discuss what's new in Elastic Stack 7. We will then cover the installation process of the Elastic Stack. By the end of this section, we will know how to create an index pattern in Kibana.
The following chapters will be covered in this section:
Chapter 1, Understanding Your Data for Kibana
Chapter 2, Installing and Setting Up Kibana
We are living in a digital world in which data is growing at an exponential rate; every digital device sends data on a regular basis, and it is continuously being stored. Now, storing huge amounts of data is not a problem—we can use cheap hard drives to store as much data as we want. But the most important thing that we can do with that data is to extract the information that we need or want out of it. Once we understand our data, we can then analyze or visualize it. This data can be from any domain, such as accounting, infrastructure, healthcare, business, medical, Internet of Things (IoT), and more, and it can be structured or unstructured. The main challenge for any organization is to first understand the data they are storing, analyze it to get the information they need, create visualizations, and, from this, gain insight from the data in a visual format that is easy to understand and enables people in management roles to make quick decisions.
However, it can be difficult to fetch information from data due to the following reasons:
Data brings complexity: It is not easy to get to the root cause of any issue; for example, let's say that we want to find out why the traffic system of a city behaves badly on certain days of a month. This issue could be dependent on another set of data that we may not be monitoring. In this case, we could get a better understanding by checking the weather report data for the month. We can then try and find any correlations between the data and discover a pattern.
Data comes from different sources: As I have already mentioned, one dataset can depend on another dataset, and they can come from two different sources. Now, there may be instances where we cannot get access to all the data sources that are dependent on each other and, for these situations, it is important to understand and gather data from other sources and not just the one that you are interested in.
Data is growing at a faster pace: As we move toward a digital era, we are capturing more and more data. As data grows at a quicker pace, it also creates issues in terms of what to keep, how to keep it, and how to process such huge amounts of data to get the relevant information that we need from it.
We can solve these issues by using the Elastic Stack, as we can store data from different sources by pushing it to Elasticsearch and then analyzing and visualizing it in Kibana. Kibana solves many data analysis problems by providing features that let us explore and interact with the data in a variety of ways. In this book, we will cover all of these features along with their practical implementation.
In this chapter, we will cover the following topics:
Data analysis and visualization challenges for industries
Understanding your data for analysis in Kibana
Limitations with existing tools
Components of the Elastic Stack
Depending on the industry, the use cases can be very different in terms of data usage. In any given industry, data is used in different ways and for different purposes—whether it's for security analytics or order management. Data comes in various formats and different scales of volumes. In the telecommunications industry, for example, it's very common to see projects about the quality of services where data is taken from 100,000 network devices.
The challenge for these industries is to handle the huge quantities of data and to get real-time visualizations from which decisions can be made. Data capture is usually performed for applications, but utilizing this data to create a real-time dashboard is a challenge. For that, Kibana can be used along with Beats and Logstash to push data from different sources, Elasticsearch can be used to store that data, and then, finally, Kibana can be used to analyze and visualize it. So, to summarize, industries face the same canonical issues:
How to handle huge quantities of data as this comes with a lot of complexity
How to visualize data effectively and in a real-time fashion so that we can get data insights easily
Once this is achieved, we can easily recognize visual patterns in the data and, based on these, derive the information we need without dealing with the burden of exploring tons of raw data. So, let me now explain a real scenario that will help you to understand the actual challenge. I will take a simple use case to explain the issues, and will then explain the technologies that can be used to solve them.
If we consider the ways in which we receive huge amounts of data, then you will note that there are many different sources that we can use to get structured or unstructured data. In this digital world, we use many devices that keep on generating and sending data to a central server where the data is then stored. For instance, the applications that we access generate data, the smartphones or smartwatches we use generate data, and even the cab services, railways, and air travel systems we use for transportation all generate data.
A system and its running processes also generate data, so there are many different ways in which we can obtain data. We receive this data at regular intervals, and it either accumulates on the physical drive of a computer or, more frequently, is hidden within data centers where it is hard to fetch and explore. In order to explore and analyze this data, we need to extract (ship) it from different locations (such as log files, databases, or applications), convert it from an unstructured data format into a structured data format (transform), and then push the transformed data into a central place (store) where we can access it for analysis. This flow of streaming data requires a proper architecture so that it can be shipped, transformed, stored, and accessed in a scalable and distributed way.
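The ship, transform, and store steps described above can be sketched as a single Logstash pipeline. This is a minimal illustration only; the log path, Grok field names, and index name are assumptions for the example, not values from this book:

```
# Hypothetical Logstash pipeline sketch (paths, fields, and index name are assumptions)
input {
  file {
    path => "/var/log/myapp/*.log"       # ship: read logs where they accumulate
    start_position => "beginning"
  }
}

filter {
  grok {
    # transform: turn an unstructured log line into structured fields
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
}

output {
  elasticsearch {
    # store: index into a central Elasticsearch cluster for analysis in Kibana
    hosts => ["127.0.0.1:9200"]
    index => "myapp-logs"
  }
}
```

We will build real pipelines of this shape when we cover Logstash in later chapters.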
End users, driven by the need to process increasingly higher volumes of data while maintaining real-time query responses, have turned away from more traditional, relational database or data warehousing solutions, due to poor scalability or performance. The solution is increasingly found in highly distributed, clustered data stores that can easily be monitored. Let's take the example of application monitoring, which is one of the most common use cases we meet across industries. Each application logs data, sometimes in a centralized way (for example, by using syslog), and sometimes all the logs are spread out across the infrastructure, which makes it hard to have a single point of access to the data stream.
The majority of large organizations don't retain logged data for longer than the duration of a log file rotation (that is, a few hours or even minutes). This means that, by the time an issue has occurred, the data that could provide the answers is lost.
So, when you actually have the data, what do you do? Well, there are different ways to extract the gist of logs. A lot of people start by using a simple string pattern search (GREP). Essentially, they try to find matching patterns in logs using a regular expression. That might work for a single log file but, if you want to search something from different log files, then you need to open individual log files for each date to apply the regular expression.
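To make the GREP approach concrete, here is a small sketch; the directory, file names, and log lines are made up purely for illustration:

```shell
# Hypothetical example: the directory and log contents are made up for illustration.
mkdir -p /tmp/grep_demo
printf '2019-07-01 10:00:01 ERROR payment timeout\n2019-07-01 10:00:02 INFO request ok\n' > /tmp/grep_demo/app.log.1
printf '2019-07-02 09:12:44 ERROR payment timeout\n' > /tmp/grep_demo/app.log.2

# A simple regular-expression pattern search across every log rotation at once
grep -hE 'ERROR' /tmp/grep_demo/app.log.*
```

This works for a handful of local files, but it does not scale once the logs are spread across many servers, and it offers nothing for real-time alerting.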
GREP is convenient but, clearly, it doesn't fit our need to react quickly to failure in order to reduce the Mean Time To Recovery (MTTR). Think about it: what if we were talking about a major issue in the purchasing API of an e-commerce website? What if the users experience a high latency on this page or, worse, can't go to the end of the purchase process? The time you will spend trying to recover your application from gigabytes of logs is money you could potentially lose. Another potential issue could be around a lack of security analytics and not being able to blacklist the IPs that try to brute force your application.
In the same context, I've seen use cases where people didn't know that every night there was a group of IPs attempting to get into their system, and this was just because they were not able to visualize the IPs on a map and trigger alerts based on their value. A simple, yet very effective, pattern in order to protect the system would have been to limit access to resources or services to the internal system only. The ability to whitelist access to a known set of IP addresses is essential. The consequence could be dramatic if a proper data-driven architecture with a solid visualization layer is not serving those needs. For example, it could lead to a lack of visibility and control, an increase in the MTTR, customer dissatisfaction, financial impact, security leaks, and bad response times and user experiences.
Here, we will discuss different aspects of data analysis: data shipping, data ingestion, data storage, and data visualization. These are all very important aspects of data analysis and visualization, and we need to understand each of them in detail. The objective is to avoid any confusion and to build an architecture that serves each of these aspects.
A data-shipping architecture should support any sort of structured or unstructured data or event transport. The primary goal of data shipping is to send data from remote machines to a centralized location in order to make it available for further exploration. For data shipping, we generally deploy lightweight agents that sit on the same servers from which we want to collect data. These shippers fetch the data and keep sending it to the centralized server. For data shipping, we need to consider the following:
