Learn how to make the most of the Elastic Stack (ELK Stack) products, including Elasticsearch, Kibana, Elastic Agent, and Logstash, to take data reliably and securely from any source, in any format, and then search, analyze, and visualize it in real time. This cookbook takes a practical approach to unlocking the full potential of the Elastic Stack through detailed, step-by-step recipes.
Starting with installing and ingesting data using Elastic Agent and Beats, this book guides you through data transformation and enrichment with various Elastic components and explores the latest advancements in search applications, including semantic search and Generative AI. You'll then visualize and explore your data and create dashboards using Kibana. As you progress, you'll advance your skills with machine learning for data science, get to grips with natural language processing, and discover the power of vector search. The book covers Elastic Observability use cases for log, infrastructure, and synthetics monitoring, along with essential strategies for securing the Elastic Stack. Finally, you'll gain expertise in Elastic Stack operations to effectively monitor and manage your system.
Elastic Stack 8.x Cookbook
Over 80 recipes to perform ingestion, search, visualization, and monitoring for actionable insights
Huage Chen
Yazid Akadiri
Copyright © 2024 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
The authors acknowledge the use of cutting-edge AI, such as ChatGPT, with the sole aim of enhancing the language and clarity within the book, thereby ensuring a smooth reading experience for readers. It’s important to note that the content itself has been crafted by the authors and edited by a professional publishing team.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Kaustubh Manglurkar
Publishing Product Manager: Deepesh Patel
Book Project Manager: Aparna Ravikumar Nair
Senior Editor: Tazeen Shaikh
Technical Editor: Seemanjay Ameriya
Copy Editor: Safis Editing
Proofreader: Tazeen Shaikh
Indexer: Rekha Nair
Production Designer: Prashant Ghare
Senior DevRel Marketing Executive: Nivedita Singh
First published: June 2024
Production reference: 1070624
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK.
ISBN 978-1-83763-429-3
www.packtpub.com
To my parents, Ying and Shinlang, for their love and unconditional support.
To Noël Jaffré, who has always been an inspiration throughout my career.
– Huage Chen
To my beloved parents, who have tirelessly shaped a world where I could chase my dreams—your efforts have been my foundation. To my dear wife and my children, Safaa, Adam, and Zaki, whose love, patience, and incredible support have been my strength on this incredible journey. Thank you.
– Yazid Akadiri
As we explore the growing world of data, the ability to understand it and harness its full strength becomes a key challenge for data practitioners, architects, search specialists, DevOps engineers and SREs, and others.
Since the inception of Elasticsearch and the progressive addition of key components to form what is currently known as the Elastic Stack, we’ve wanted to help people make sense of their data through the power of search and analytics. The launch of Elastic Stack 8 marks a big milestone in our journey. It is a version enriched with new capabilities, optimized performance, and an ever-stronger foundation for machine learning and AI.
This book serves as a practical resource for anyone who interacts with data and wants to learn how to exploit the power of the Elastic Stack, including Elasticsearch, Kibana, and various integrations, to make data-driven decisions and gain richer insights from their data environments.
As you turn the pages of this cookbook, you will uncover the innovations introduced in version 8.x. Our goal has always been to simplify the complex, and this book aligns perfectly with that ethos—breaking down advanced concepts into easy-to-follow, step-by-step instructions. Whether you are taking your initial steps in Elasticsearch and the Elastic Stack or looking to expand your expertise, the cookbook format provides a unique opportunity to build your skills progressively and systematically.
I am excited about the endless possibilities that Elastic Stack 8.x unlocks, and I look forward to hearing about the innovative ways in which you employ these recipes.
Shay Banon
Creator of Elasticsearch and CTO of Elastic
Huage Chen is a member of Elastic’s customer engineering team and has been with Elastic for over five years, helping users throughout Europe to innovate and implement cloud-based solutions for search, data analysis, observability, and security. Before joining Elastic, he worked for 10 years in web content management, web portals, and digital experience platforms.
Yazid Akadiri has been a solutions architect at Elastic for over four years, helping organizations and users solve their data and most critical business issues by harnessing the power of the Elastic Stack. At Elastic, he works with a broad range of customers, with a particular focus on Elastic observability and security solutions. He previously worked in web services-oriented architecture, focusing on API management and helping organizations build modern applications.
Evelien Schellekens is a senior solutions architect at Elastic. Evelien enjoys sharing knowledge through public speaking and interacting with the technical community. She’s passionate about observability and open source technologies such as Kubernetes.
Giuseppe Santoro is a senior software engineer at Elastic. With deep expertise in Kubernetes, the cloud, and observability, Giuseppe contributes to the tech community through mentoring and technical writing.
We would like to express our gratitude to Evelien Schellekens and Giuseppe Santoro for their invaluable contributions and meticulous review of this book. Their expertise and thoughtful feedback have been instrumental in refining our work. We also extend our thanks to our fellow Elasticians for their contributions: Amanda Branch, Bahaaldine Azarmi, Carson Ip, Nicholas Drost, Sean Collin, Yan Savitski, Yannick Fhima, and the entire Elastic South EMEA Solutions Architect and Customer Architect teams.
In this cookbook, you will explore practical recipes and step-by-step instructions for solving real-world data challenges using the latest versions of the Elastic Stack’s components, including Elasticsearch, Kibana, Elastic Agent, Logstash, and Beats. This book equips you with the knowledge and skills necessary to unlock the full potential of the Elastic Stack.
The book begins with practical guides on installing the stack through various deployment methods. Subsequently, it delves into the ingestion and search of general content data, illustrating how to develop enhanced search experiences. As you progress, you will explore timestamped data ingestion, data transformation, and enrichment using various components of the Elastic Stack. You will also learn how to visualize, explore, and create dashboards with your data using Kibana. Moving forward, you will refine your skills in anomaly detection and data science, employing advanced techniques in data frame analytics and natural language processing. Equipped with these concepts, you will investigate the latest advancements in search technology, including semantic search and generative AI. Additionally, you will explore Elastic Observability use cases for log, infrastructure, and synthetic monitoring, alongside essential strategies for securing the Elastic Stack. Ultimately, you will gain expertise in Elastic Stack operations, enabling you to monitor and manage your system effectively.
By the end of the book, you will have acquired the necessary knowledge and skills to build scalable, reliable, and efficient data analytics and search solutions with the Elastic Stack.
Note
The Elastic Security solution, a significant component of the Elastic Stack, would have merited considerable attention in this book. However, due to considerations regarding the length of the book and the intended audience, we have opted not to include this section in the current edition.
This book is intended for Elastic Stack users, developers, observability practitioners, and data professionals of all levels, from beginners to experts, seeking practical experience with the Elastic Stack:
Developers will find easy-to-follow recipes for utilizing APIs and features to craft powerful applications.
Observability practitioners will benefit from use cases that cover APM, Kubernetes, and cloud monitoring.
Data engineers and AI enthusiasts will be provided with dedicated recipes focusing on vector search and machine learning.
No prior knowledge of the Elastic Stack is required.
Chapter 1, Getting Started – Installing the Elastic Stack, explores the installation of the Elastic Stack across environments such as Elastic Cloud and Kubernetes, detailing the setup for Elasticsearch, Kibana, and Fleet along with insights on cluster components and deployment strategies for stack optimization.
Chapter 2, Ingesting General Content Data, dives into the data ingestion process, focusing on indexing, updating, and deleting operations within Elasticsearch, and emphasizes analyzers, index mappings, and templates for effective Elasticsearch index management.
Chapter 3, Building Search Applications, guides you through constructing search experiences using Elasticsearch’s Query DSL and new features in Elastic Stack 8, culminating in comprehensive search applications with advanced queries and analytics.
Chapter 4, Timestamped Data Ingestion, covers ingesting timestamped data such as logs and metrics into Elasticsearch, guiding you through collecting events with Elastic Agent, Fleet-managed integrations, and Beats.
Chapter 5, Transform Data, delves into data transformation techniques using Elastic Stack tools. You will learn how to structure, enrich, reorganize, and downsample your data to glean actionable insights. This chapter delivers practical know-how on utilizing ingest pipelines, processors, transforms, and Logstash for efficient data manipulation.
Chapter 6, Visualize and Explore Data, shows how to turn transformed data into visualizations, teaching data exploration in Discover, visual creation with Kibana Lens, and the use of dashboards and maps to deeply understand your data.
Chapter 7, Alerting and Anomaly Detection, outlines the setup of alerts and anomaly detection for proactive data management, covering alert creation and monitoring, anomaly investigation, and unsupervised machine learning job implementation.
Chapter 8, Advanced Data Analysis and Processing, delves into machine learning within the Elastic Stack, covering outlier detection, regression, and classification modeling, as well as deploying NLP models for deep data insights.
Chapter 9, Vector Search and Generative AI Integration, explores advanced search technologies and AI integrations, teaching you about vector search, hybrid search, and Generative AI applications for developing sophisticated AI-driven conversational tools.
Chapter 10, Elastic Observability Solution, demonstrates how to employ the Elastic Stack for comprehensive system insights, covering application instrumentation, real-user monitoring, Kubernetes observability, synthetic monitors, and incident detection.
Chapter 11, Managing Access Control, navigates access control within the Elastic Stack, detailing authentication management, custom role definition, Kibana space security, API key utilization, and single sign-on implementation.
Chapter 12, Elastic Stack Operation, provides essential recipes for Elastic Stack management, such as index life cycle, data stream optimization, and snapshot life cycle management, and explores cluster automation with Terraform and cross-cluster search.
Chapter 13, Elastic Stack Monitoring, equips you with techniques for Elastic Stack monitoring and troubleshooting, focusing on the stack monitoring setup, custom visualization creation, cluster health assessment, and audit logging strategies.
Before starting this book, you should have a basic understanding of databases, web servers, and data formats such as JSON. No prior Elastic Stack experience is needed, as the book starts with foundational topics. Familiarity with terminal commands and web technologies will be beneficial for following along. Each chapter progresses into more advanced Elastic Stack applications and techniques.
Software/hardware covered in the book | Operating system requirements
Elastic Stack 8.12 | Windows, macOS, or Linux
Python 3.11+ | Windows, macOS, or Linux
Docker 4.27.0 | Windows, macOS, or Linux
Kubernetes 1.24+ | Windows, macOS, or Linux
Node.js 19+ | Windows, macOS, or Linux
Terraform 1.8.0 | Windows, macOS, or Linux
Amazon Web Services (AWS) | Windows, macOS, or Linux
Google Cloud Platform (GCP) | Windows, macOS, or Linux
Okta | Windows, macOS, or Linux
Ollama | Windows, macOS, or Linux
OpenAI/Azure OpenAI | Windows, macOS, or Linux
If you are using the digital version of this book, we advise you to type the code yourself or access the code via the GitHub repository (link available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Elastic-Stack-8.x-Cookbook. In case there’s an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “The or and and operators yield results that are too broad or too strict; you can use the minimum_should_match parameter to filter less relevant results.”
A block of code is set as follows:
GET /movies/_search
{
  "query": {
    "multi_match": {
      "query": "come home",
      "fields": ["title", "plot"]
    }
  }
}

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
GET movies-dense-vector/_search
{
  "knn": {
    "field": "plot_vector",
    "k": 5,
    "num_candidates": 50,
    "query_vector_builder": {
      "text_embedding": {
        "model_id": ".multilingual-e5-small_linux-x86_64",
        "model_text": "romantic moment"
      }
    }
  },
  "fields": ["title", "plot"]
}

Any command-line input or output is written as follows:
$ kubectl apply -f elastic-agent-managed-kubernetes.yml
$ sudo metricbeat modules enable tomcat

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: “In Kibana, go to Observability | APM | Services, to check whether the different microservices have been correctly instrumented.”
Tips or important notes
Appear like this.
In this book, you will find several headings that appear frequently (Getting ready, How to do it..., How it works..., There’s more..., and See also).
To give clear instructions on how to complete a recipe, use these sections as follows:
Getting ready: This section tells you what to expect in the recipe and describes how to set up any software or any preliminary settings required for the recipe.
How to do it...: This section contains the steps required to follow the recipe.
How it works...: This section usually consists of a detailed explanation of what happened in the previous section.
There’s more...: This section consists of additional information that will make you more knowledgeable about the recipe.
See also: This section provides helpful links to other useful information for the recipe.
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Once you’ve read Elastic Stack 8.x Cookbook, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.
Thanks for purchasing this book!
Do you like to read on the go but are unable to carry your print books everywhere?
Is your eBook purchase not compatible with the device of your choice?
Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.
Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.
The perks don’t stop there: you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.
Follow these simple steps to get the benefits:
1. Scan the QR code or visit the link below:

https://packt.link/free-ebook/978-1-83763-429-3
2. Submit your proof of purchase.

That’s it! We’ll send your free PDF and other benefits to your email directly.

The Elastic Stack is a suite of components that allows you to ingest, store, search, analyze, and visualize your data from diverse sources. Previously known as the ELK Stack, today it consists of four core components: Elasticsearch, Logstash, Elastic Agent, and Kibana.
Elasticsearch is a distributed search and analytics engine that can handle petabytes of unstructured data. Logstash, Beats, and Elastic Agent are data ingestion tools that can collect, transform, and load data from various sources into Elasticsearch. Kibana is a web-based interface that allows you to visualize and explore your data, as well as access various solutions built on top of the Elastic Stack. All integrate seamlessly so you can use your data for a variety of use cases such as search, analytics, observability, and security.
The Elastic Stack can be deployed on Elastic Cloud, as well as on-premises, and it can be deployed in a hybrid and orchestrated setup. In this chapter, we will guide you through setting up and running Elastic deployments in different environments, including a hosted Elasticsearch service on Elastic Cloud, Kubernetes infrastructure, and self-managed solutions. We will also discuss additional components and nodes within the cluster. By the end of this chapter, you’ll have a comprehensive understanding of the various deployment strategies and how to use the Elastic Stack.
Figure 1.1 illustrates the key components of the Elastic Stack and the relationship between different components from a data flow perspective:
Figure 1.1 – The Elastic Stack components
In this chapter, we are going to learn how to install Elasticsearch, Kibana, and Fleet with different deployment options (Elastic Cloud, self-managed, and Elastic Cloud on Kubernetes (ECK)) highlighted in the right part of the following figure, and then we will proceed to the data ingestion part in the next chapters.
To determine the most suitable deployment option for your needs, Figure 1.2 provides a comparative summary of the key differences among the various deployment methods:
Figure 1.2 – Deployment options comparison
We’ll be covering the following recipes:
Deploying the Elastic Stack on Elastic Cloud
Installing the Elastic Stack with ECK
Installing a self-managed Elastic Stack
Adding data tiering to your deployment
Setting up additional nodes
Setting up Fleet Server
Setting up a snapshot repository

Elastic Cloud is the most straightforward way to deploy and manage your Elasticsearch, Kibana, Integrations Server (a combined component for the application performance monitoring server and Fleet Server), and other components of the Elastic Stack. This recipe will guide you through the process of getting started with Elastic Cloud, from signing up for an account to creating your first Elastic deployment.
Before we begin, let’s learn how to create a deployment on Elastic Cloud and verify it using this step-by-step guide:
We will create an account on Elastic Cloud:

1. Visit the Elastic Cloud website at https://cloud.elastic.co/.
2. Click on the Sign up button (a 14-day trial without needing a credit card is offered by default).
3. Fill out the registration form with your details, including your name, email address, and desired password.

Next, we will create a deployment.
On the next screen, you’ll be prompted to create your first deployment. You can choose between the following options, as shown in Figure 1.3:

- Cloud provider: Google Cloud, Azure, or AWS.
- Region: The supported regions for different cloud providers (the list of supported regions can be found here: https://www.elastic.co/guide/en/cloud/current/ec-reference-regions.html).
- Hardware profile: You can simply start with the General-purpose profile. Elastic Cloud allows you to change hardware later.
- Version: The latest minor version of Elastic Stack 7 or 8.

Figure 1.3 – Creating a cloud deployment
On the next screen (shown in Figure 1.4), you’ll be given a password. Be sure to save it, as you’ll need it to log in to Kibana (the application interface) and for command-line operations:

Figure 1.4 – Cloud deployment credentials
Finally, let’s check the created deployment.
After the deployment creation, you will be redirected to the Home page of Kibana, where you can choose one of the data onboarding guides, as shown in Figure 1.5:

Figure 1.5 – Kibana onboarding screen
You can also check the deployment status from Elastic Cloud’s main console (https://cloud.elastic.co/home):
Figure 1.6 – Cloud deployment status
You can then click on Manage to see the details of your deployment and management options, as shown in Figure 1.7:

Figure 1.7 – Cloud deployment console
At this stage, the following components have been provisioned automatically:
- 2 Elasticsearch hot nodes with 2 GB of RAM
- 1 Elasticsearch master tie-breaker node with 1 GB of RAM
- 1 Kibana node with 1 GB of RAM
- 1 Integrations Server node with 1 GB of RAM
- 1 Enterprise Search node with 2 GB of RAM

You can see the detailed list view of the components that we just mentioned in your deployment, as shown in Figure 1.8. It gives you valuable information about each component, such as Health, Size, Role, Zone, Disk, and Memory:
Figure 1.8 – Cloud deployment components view
We also get different endpoints to access different components of the Elastic Stack:
Figure 1.9 – Cloud deployment endpoints
Note
You will need to save your cloud ID from this screen, as it is a convenient way to configure Elasticsearch clients, Beats, Elastic Agent, and so on, to send data to your Elastic deployment.
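For example, Beats can be pointed at your deployment with just the cloud ID and credentials. Here is a minimal sketch of the relevant metricbeat.yml lines, where both values are placeholders to replace with your own deployment’s cloud ID and the elastic user’s password:

# Connect this Beat to an Elastic Cloud deployment (placeholder values)
cloud.id: "my-deployment:ABCDEF0123456789"
cloud.auth: "elastic:<password>"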
Once you deploy the Elastic Stack on Elastic Cloud, there are several ways to manage and configure your deployment. Let us look at a few of them.
Here’s how to scale and configure your deployment:
- Scale/autoscale your deployment to meet your growing needs (https://www.elastic.co/guide/en/cloud/current/ec-autoscaling.html).
- Add or remove nodes, change the node type, adjust the node size, and make other configuration changes (more on this in the Creating and setting up additional Elasticsearch nodes recipe in this chapter).
- Configure data tiering (see the Creating and setting up data tiering recipe in this chapter for more information).
- Back up your data and configure your backup repository (more details in the Setting up snapshot repository recipe of this chapter).
- Monitor your deployment health (more details in the Setting up stack monitoring recipe in Chapter 13).

Here’s how you secure and control access to your deployment:
- Configure authentication methods, such as username/password or single sign-on (SSO) (see Chapter 11).
- Set up role-based access control (RBAC) to define user roles and permissions (see Chapter 11).
- Configure a deployment traffic filter (https://www.elastic.co/guide/en/cloud/current/ec-traffic-filtering-deployment-configuration.html).

ECK is the official Kubernetes operator for automating the deployment and management of Elasticsearch and other Elastic components on Kubernetes. ECK enables the use of Kubernetes-native tools and APIs to manage Elasticsearch clusters, offering capabilities for monitoring and securing them. It supports scaling, rolling upgrades, availability zone awareness, and the implementation of hot-warm-cold storage architectures. ECK allows you to exploit Elasticsearch’s power and flexibility on Kubernetes, both on-premises and in the cloud. In this guide, we will first install the ECK operator in a Kubernetes cluster and then use it to deploy an Elasticsearch cluster and Kibana.
Ensure you have a Kubernetes cluster ready before deploying ECK and the Elastic Stack. For this recipe, you can use either minikube or Google Kubernetes Engine (GKE). Elastic Cloud on Kubernetes also supports other Kubernetes distributions such as OpenShift, Amazon Elastic Kubernetes Service (Amazon EKS), and Azure Kubernetes Service (Microsoft AKS). To ensure smooth deployment and optimal performance, allocate appropriate resources to your cluster. Your cluster should have at least 16 GB of RAM and 4 CPU cores to provide a seamless experience during the deployment of ECK, Elasticsearch, Kibana, Elastic Agent, and the sample application.
You can find all the related YAML files on the GitHub repository: https://github.com/PacktPublishing/Elastic-Stack-8.x-Cookbook/tree/main/Chapter1/eck.
The snippets of this recipe can be found at the following address: https://github.com/PacktPublishing/Elastic-Stack-8.x-Cookbook/blob/main/Chapter1/snippets.md#installing-elastic-stack-with-elastic-cloud-on-kubernetes.
Before installing ECK, you need to prepare your Kubernetes environment and ensure that you have the necessary resources and permissions. This recipe presumes that your Kubernetes cluster is already up and running. Your Kubernetes nodes need to have at least 2 GB of free memory. Make sure to check the supported versions of Kubernetes on the official Elastic documentation website: https://www.elastic.co/support/matrix#matrix_kubernetes.
Let’s start:
1. First, you need to have an ECK operator deployed in your Kubernetes cluster. Let’s begin by creating the ECK custom resource definitions:

$ kubectl create -f https://download.elastic.co/downloads/eck/2.11.0/crds.yaml

The following Elastic resources will be created in your Kubernetes cluster:
Figure 1.10 – Created resources when deploying ECK
2. Now that the custom resource definitions have been created, proceed with the installation of the ECK operator:

$ kubectl apply -f https://download.elastic.co/downloads/eck/2.11.0/operator.yaml

Executing the previous command will give you the following output:
Figure 1.11 – Installing the ECK operator in a Kubernetes cluster
Important note
The best practice is to use a dedicated Kubernetes namespace for all workloads related to ECK, which offers enhanced isolation for various applications and robust security with RBAC permissions by default. The provided manifest uses the elastic-system namespace by default.
3. We can then monitor the operator logs:

$ kubectl -n elastic-system logs -f statefulset.apps/elastic-operator

4. Now, let’s deploy a three-node Elasticsearch cluster by applying the YAML file provided in the GitHub repository:

$ kubectl apply -f elasticsearch.yaml

5. To check the status of Elasticsearch, you can get an overview of the cluster with the following kubectl command:

$ kubectl get elasticsearch

Note
This might take a couple of minutes if you need to pull the images.
Figure 1.12 shows the results of this command when the cluster has been successfully deployed:
Figure 1.12 – Checking the cluster status
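For reference, the elasticsearch.yaml manifest applied above is an Elasticsearch custom resource; the following is a minimal sketch of its shape (the full manifest in the book’s GitHub repository may define additional settings):

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-sample
spec:
  version: 8.12.2
  nodeSets:
  - name: default
    count: 3                        # a three-node Elasticsearch cluster
    config:
      node.store.allow_mmap: false  # avoids having to raise vm.max_map_count on the host

The metadata.name value (elasticsearch-sample) is what the elastic user secret and service names used later in this recipe are derived from.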
6. Now, deploy the Kibana instance by applying the following kibana.yaml file in your cluster:

$ kubectl apply -f kibana.yaml

7. Similar to Elasticsearch, you can find details about Kibana instances with the following command:

$ kubectl get kibana

Figure 1.13 – Checking Kibana status
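Similarly, kibana.yaml is a Kibana custom resource; here is a minimal sketch consistent with the kibana-sample names used in the next steps:

apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana-sample
spec:
  version: 8.12.2
  count: 1
  elasticsearchRef:
    name: elasticsearch-sample  # wires Kibana to the Elasticsearch resource above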
Finally, let’s connect to Kibana. This is quite straightforward, as ECK automatically creates a ClusterIP service for Kibana. Follow the next steps to log in to your Kibana instance.
1. Get the ClusterIP service created for Kibana:

$ kubectl get service kibana-sample-kb-http

You should expect to see an output like Figure 1.14:
Figure 1.14 – Printing the Kibana ClusterIP
2. Now, use kubectl port-forward to access Kibana from your host:

$ kubectl port-forward service/kibana-sample-kb-http 5601

3. Before visiting the Kibana login page, we’ll need to retrieve the password of the elastic user provisioned by the operator with the following command:

$ kubectl get secret elasticsearch-sample-es-elastic-user -o=jsonpath='{.data.elastic}' | base64 --decode; echo

Copy the output of the command.
Now that you’ve forwarded the port, open https://localhost:5601 in your web browser and use the credentials obtained in the previous steps to log in to Kibana, as shown in the following figure:
Figure 1.15 – Log in to Kibana on ECK
Important note
When accessing Kibana, you might see a security warning due to self-signed certificates not being trusted by the browser. You can safely bypass this warning and proceed to Kibana’s URL. For production environments, it’s recommended to use certificates from your own certificate authority (CA) to ensure security.
As you have seen, ECK greatly simplifies the setup of Elasticsearch and Kibana, getting you up and running in a few minutes. It accomplishes this by managing a variety of tasks on our behalf. Let’s review what ECK has done for us in the cluster:
- Security: Security features are enabled in ECK, ensuring robust protection for all deployed Elastic Stack resources. By default, all resources deployed through ECK are secured. The system provisions a built-in basic authentication user named elastic. Transport Layer Security (TLS) is configured to secure network traffic within and to your Elasticsearch cluster.
- Certificates: A self-signed, internally generated CA certificate is used by default for each cluster, providing secure communication within the Elasticsearch cluster. For advanced configurations, you have the option to use externally signed certificates or other custom certificate setups.
- Default service exposure: Your cluster is automatically set up with a ClusterIP service, which offers internal network connectivity. You also have the option to configure these services to be of the LoadBalancer type, making them accessible from external networks.
- Elasticsearch connection: You may have noticed by looking at the provided kibana.yaml file that there are no explicit Elasticsearch connection details. This information is provided to Kibana through the elasticsearchRef specification, which the ECK operator resolves.

Starting with ECK version 2.8, Logstash can be managed as a custom resource using the operator.

As an alternative installation method, ECK can also be installed using a Helm chart from the Elastic Helm repository:

$ helm repo add elastic https://helm.elastic.co
$ helm repo update
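With the repository added, the operator itself is then installed from the eck-operator chart; a minimal sketch following the chart’s defaults:

$ helm install elastic-operator elastic/eck-operator -n elastic-system --create-namespace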
In this recipe, you will learn how to install and manage the Elastic Stack on your local machine, focusing primarily on the essential components: Elasticsearch and Kibana.
Before proceeding with the installation, make sure your system meets the minimum requirements for running Elasticsearch, Kibana, and Fleet Server. Check the official documentation for the specific version you want to install to ensure compatibility with your operating system (https://www.elastic.co/support/matrix).
Let’s first look at how to download Elasticsearch:
1. Visit the Elasticsearch download page (https://www.elastic.co/downloads/elasticsearch).
2. By default, the official Elasticsearch download page provides you with the download links for the latest release. Choose the right package for your operating system.
3. Once the download is complete, extract the contents of the package to a working directory of your choice.

Next, let’s configure Elasticsearch:
1. Open the Elasticsearch configuration file located in the extracted directory. For example, on Linux, it’s found at config/elasticsearch.yml.
2. Adjust the settings as needed, such as the cluster name, network settings, and heap size.
3. Save the configuration file.

Now, let’s see how to start Elasticsearch:
1. Open a terminal or command prompt and navigate to the Elasticsearch directory.
2. Run the Elasticsearch executable or script that is appropriate for your operating system.

For Linux/macOS:

$ ./bin/elasticsearch

For Windows:

$ bin\elasticsearch.bat

On the first launch, Elasticsearch will perform an initial security configuration, which includes generating a password for the built-in elastic user, an enrollment token for Kibana (valid for 30 minutes), and certificates and keys for the transport and HTTP layers.

3. The Elasticsearch node is now up and running and reachable on HTTPS port 9200. You can check the node with the curl command line:

$ curl --cacert <PATH_TO_CERTIFICATE> -u elastic https://localhost:9200

Next, we will download and install Kibana:
1. Go to the official Kibana download page and find the Downloads section.
2. By default, the official Kibana download page provides you with the download links for the latest release of Kibana. Download the appropriate package for your operating system (tar.gz/zip, deb, or rpm).
3. Extract the downloaded Kibana package to a directory of your choice.
4. Open a terminal or command prompt and navigate to the Kibana directory.
5. Run the Kibana executable file (e.g., bin/kibana for Unix-like systems or bin\kibana.bat for Windows) to start Kibana.
6. In your browser, access Kibana at the https://localhost:5601 default URL, use the enrollment token from the earlier step when Kibana starts, and click the button to confirm the connection with Elasticsearch.
7. Use the elastic superuser and the previously generated password to log in to Kibana.

Starting with Elastic 8.0, security features such as TLS for both inter-node communication and the HTTP layer are enabled by default in self-managed clusters. As a result, certificates and keys are automatically generated during the Elasticsearch installation process. This provides stack-level security out of the box, activating both node-to-node TLS and TLS on the Elasticsearch API, as we saw during the installation of Kibana.
You can also use Docker as a self-managed deployment option – please refer to the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html.
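If you go the Docker route, a single-node test container can be started along the following lines; treat this as a sketch (the network name, container name, memory limit, and image tag are illustrative):

# Create a dedicated network and run a single Elasticsearch container
$ docker network create elastic
$ docker run --name es01 --net elastic -p 9200:9200 -it -m 1GB docker.elastic.co/elasticsearch/elasticsearch:8.12.2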
A data tier consists of several Elasticsearch nodes that have the same data role and usually run on similar hardware. Often, different hardware is configured for each tier; for example, the hot tier might use the most powerful and expensive hardware, while the cold or frozen tiers could utilize less expensive, storage-oriented hardware. Using data tiers is an efficient strategy for reducing hardware requirements in an Elasticsearch cluster while maintaining access to data and the ability to search through it. To illustrate, a single frozen node can keep up to 100 TB of data compared to 2 TB of data for a hot node.
However, there is a caveat: as data moves to colder tiers, query performance can decrease. This is expected since the data is less frequently queried.
Figure 1.16 – Elasticsearch data tiering
As we can see in Figure 1.16, there are four data tiers provided by Elasticsearch:
- Hot tier: This tier handles mostly indexing and querying for timestamped data (the most recent and frequently accessed data). This tier can also be referenced as the content tier for non-timestamped data.
- Warm tier: This tier is used for less recent timestamped data (more than seven days old) that does not need to be updated. It extends storage capacity up to five times compared to the hot tier.
- Cold tier: This tier is used for timestamped data that is less frequently accessed and no longer updated. This tier is built on searchable snapshots technology and can store twice as much data as the warm tier.
- Frozen tier: This tier is used for timestamped data that is never updated and rarely queried but needs to be kept for regulation, compliance, or security use cases such as forensics. The frozen tier stores most of the data in searchable snapshots, and only the data needed by a query is pulled and cached on a local disk inside the node.
Make sure your self-managed cluster from the earlier recipe is up and running. For the sake of simplicity, we will create two additional nodes on the same local machine. We’ll add two data tiers to our cluster:

- A node for the cold tier
- A node for the frozen tier

The code snippets for this recipe can be found at the following link: https://github.com/PacktPublishing/Elastic-Stack-8.x-Cookbook/blob/main/Chapter1/snippets.md#creating-and-setting-up-data-tiering.
On your local machine, execute the following steps:
1. Open the elasticsearch.yml file of the cluster you’ve previously set up and uncomment the transport.host setting at the end.
2. Create two new directories for the new nodes, named as follows:
- node-cold
- node-frozen
3. Download and extract the contents of the Elasticsearch package into each directory. Make sure to use the same version and operating system as previously used in the Installing a self-managed Elastic Stack recipe.
4. In a separate terminal from the one where your cluster from the previous recipe is running, navigate to the directory where Elasticsearch is installed and run the following command:

$ bin/elasticsearch-create-enrollment-token -s node

This command generates an enrollment token that you’ll copy and use to enroll new nodes with your Elasticsearch cluster.
5. Go to the node-cold directory, open the elasticsearch.yml file, and add the following settings:

node.name: node-cold
node.roles: ["data_cold"]

6. From the installation directory of the cold node, start Elasticsearch and pass the enrollment token with --enrollment-token:

$ ./bin/elasticsearch --enrollment-token <enrollment-token>

7. Check that your node has successfully started.
8. Now, let’s do the same for the frozen node. Open the elasticsearch.yml file in the node-frozen directory and add the following settings:

node.name: node-frozen
node.roles: ["data_frozen"]

9. From the installation directory of the frozen node, start Elasticsearch and pass the enrollment token with --enrollment-token:

$ ./bin/elasticsearch --enrollment-token <enrollment-token>

10. Check that the new frozen node has successfully started.
Elasticsearch now provides specific roles that match the different data tiers (hot, warm, cold, frozen). It means we can add one of the data_hot, data_warm, data_cold, or data_frozen node roles to the roles setting in the configuration file. Once the appropriate roles are defined in the configuration file, new nodes are introduced into the cluster using an enrollment token. The -s node argument specifies that we’re creating a token to enroll an Elasticsearch node into a cluster.
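To verify that each node joined the cluster with its intended tier role, you can query the cat nodes API from Kibana’s Dev Tools; the node.role column prints abbreviated role letters (for example, c for data_cold and f for data_frozen):

GET _cat/nodes?v=true&h=name,node.role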
Adding data tiers on an Elastic Cloud deployment is a more straightforward and streamlined process. There is no configuration file to edit and no infrastructure to provision; just head to your deployment and follow these steps:
1. On the Elastic Cloud deployment page, click Manage.
2. Click on Edit in the left navigation pane.
3. Click on Add capacity for any data tiers you wish to add.

Figure 1.17 – Cloud deployment data tiering
In ECK, you define your cluster’s topology using a concept called nodeSets. Within the nodeSets attribute, each entry represents a group of Elasticsearch nodes sharing the same Kubernetes and Elasticsearch configurations. For instance, you might have one nodeSets attribute for master nodes, another for your hot tier nodes, and so forth. You can find an example configuration in the GitHub repository: https://github.com/PacktPublishing/Elastic-Stack-8.x-Cookbook/blob/main/Chapter1/eck/elasticsearch-data-tiers.yaml.
When examining the provided configuration, it’s clear that there are three nodeSets attributes named hot, cold, and frozen, as illustrated in the following code block. Please note that for readability, the code has been abbreviated; the complete code is accessible at the specified GitHub repository location:
spec:
  version: 8.12.2
  nodeSets:
  - name: hot
    config:
      node.store.allow_mmap: false
    podTemplate: ...
    count: 3
  - name: cold
    config:
      node.roles: ["data_cold"]
      node.store.allow_mmap: false
    podTemplate: ...
    count: 1
  - name: frozen
    config:
      node.roles: ["data_frozen"]
      node.store.allow_mmap: false
    podTemplate: ...
    count: 1

In a real production scenario, additional configuration such as Kubernetes node affinity is necessary. Kubernetes node affinity uses NodeSelector to ensure Elasticsearch workloads are confined to selected Kubernetes nodes. Under the hood, Elasticsearch shard allocation awareness is used to allocate shards to the specified Kubernetes nodes.
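As an illustration of such a setup, a nodeSets entry can pin its pods to storage-oriented Kubernetes nodes through the pod template; the disktype label below is purely hypothetical and assumes your Kubernetes nodes are labeled accordingly:

- name: cold
  count: 1
  config:
    node.roles: ["data_cold"]
  podTemplate:
    spec:
      nodeSelector:
        disktype: hdd  # hypothetical label identifying nodes with spinning disks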
In a production scenario, adding new data tiers to a self-managed Elastic Stack cluster is a bit more complex. For high availability and resilience, you’ll need to deploy nodes on separate machines, which requires additional configuration steps not covered in this recipe, such as binding to an address other than localhost.
Data tiers are the first steps of your data management strategy with Elasticsearch. The next step is to define an index life cycle management (ILM) policy that’ll automate the migration of your data between the different tiers. This will be covered in the Setting up index life cycle policy recipe in Chapter 12.
Data tiering is primarily intended for timestamped data. To fully leverage data tiers, matching infrastructure resources must be allocated for each tier. For instance, warm and cold tiers can use spinning disks rather than SSDs and have a larger RAM-to-disk ratio, enabling them to store more data. These tiers are ideal for frequent read access to your data. Meanwhile, the frozen tier depends entirely on searchable snapshots, making it most suitable for long-term retention and infrequent searches.
An Elasticsearch cluster can have a variety of node roles, besides data tiers, to function efficiently. Figure 1.18 outlines the different types of nodes available in a cluster:
Figure 1.18 – Elasticsearch node types
Roles such as Master, Machine Learning, or Ingest can be dedicated to specific Elasticsearch instances, and this is often a best practice in a production environment.
In this recipe, we will learn how to configure dedicated nodes for both self-managed deployments and Elastic Cloud.
Ensure that your self-managed cluster from the previous recipe is operational. For simplicity, we will create additional nodes on the same local machine. The nodes will undertake the following roles:
- A dedicated master-eligible node
- A machine learning node

The snippets for this recipe are available at https://github.com/PacktPublishing/Elastic-Stack-8.x-Cookbook/blob/main/Chapter1/snippets.md#creating-and-setting-up-additional-elasticsearch-nodes.
On your local machine, proceed with the following steps:
1. Create two new directories for the new nodes, named as follows:
- node-master
- node-ml
2. Repeat Steps 1 and 2 of the Installing a self-managed Elastic Stack recipe in each directory.
3. In a separate terminal from the one where your cluster from the previous recipe is running, navigate to the directory where Elasticsearch is installed and run the following command:

$ bin/elasticsearch-create-enrollment-token -s node

4. Copy the enrollment token. You will use it to enroll the new nodes with your Elasticsearch cluster.
5. Navigate to the node-master directory, open the elasticsearch.yml file, and add the following settings:

node.name: node-master
node.roles: ["master"]

6. From the installation directory of the master node, start Elasticsearch and pass the enrollment token with --enrollment-token:

$ bin/elasticsearch --enrollment-token <enrollment-token>

7. Verify that the node has started successfully.
Now, let’s follow the same steps to add a dedicated machine learning node.
8. Open the elasticsearch.yml file in the node-ml directory and add the following settings:

node.name: node-ml
node.roles: ["ml"]

9. Start the node with the following command:

$ bin/elasticsearch --enrollment-token <enrollment-token>

10. Check that the new machine learning node has successfully started and joined the cluster.
As explained in the previous recipe, we’re essentially using the node.roles attribute to specify the roles.
On Elastic Cloud, dedicated master nodes are provisioned based on the number of Elasticsearch nodes available in your deployment. If your deployment has more than six Elasticsearch nodes, dedicated master nodes are automatically created. If your deployment has fewer than six Elasticsearch nodes, a tie-breaker node is set up behind the scenes to ensure high availability.
For machine learning and the other node types, follow the steps outlined here:
1. On the Elastic Cloud deployment page, click Manage.
2. Click on Edit in the left navigation pane.
3. Click on Add capacity for the node type you wish to add (coordinating and ingest, machine learning).

In ECK, expanding your cluster with additional node types requires you to update your YAML specification. As discussed when setting up data tiering, ECK uses the concept of nodeSets. By simply adding a nodeSets entry with the necessary role (e.g., ml, master, ingest, etc.), you instruct the operator to allocate those resources within your cluster. A sample YAML file is available at the following link: https://github.com/PacktPublishing/Elastic-Stack-8.x-Cookbook/blob/main/Chapter1/eck/elasticsearch-dedicated-master-ml.yaml.
In a production scenario, it’s always best to have dedicated hardware and hosts for specific node roles. You can also configure voting-only nodes that participate in the election for the master node but don’t serve as the master. A configuration with at least two dedicated master nodes and one voting-only node can be a suitable alternative to three full master nodes.
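A voting-only node is declared through the same node.roles setting; here is a minimal sketch of the relevant elasticsearch.yml lines (the node name is illustrative):

node.name: node-voting-only
node.roles: ["master", "voting_only"]  # takes part in master elections but is never elected master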
Fleet Server is a key component of the new ingest architecture in the Elastic Stack, which revolves around the Elastic Agent. Before delving into this recipe, let’s review some important concepts about Fleet and the Agent.
Fleet serves as the central management component, providing a UI within Kibana that manages Agents and their configurations at scale. The Elastic Agent is a single, unified binary responsible for data collection tasks – gathering logs, metrics, security events, and more, running on your hosts.
Fleet Server connects the Elastic Agent to Fleet and acts as a control plane for Elastic Agents. It is an essential piece if you intend to use Fleet for centralized management. The schema in Figure 1.19 illustrates the various components and their interactions:
Figure 1.19 – Architecture including Elastic Agent and Fleet Server
In this recipe, we’ll cover the setup of Fleet Server for self-managed deployments and Elastic Cloud.
Make sure you have an Elasticsearch cluster up and running with Kibana connected to the cluster.
For self-managed setups, this recipe assumes that you will be installing Fleet Server on the same local machine as your cluster.
Note
This configuration is not recommended for production environments.
We will use the quick-start wizard in Kibana for our setup:
1. In Kibana, on the left menu pane, go to Management | Fleet.
2. Click on Add Fleet Server. This will present instructions for adding a Fleet Server with two options: Quick Start and Advanced. We’ll use the Quick Start option:

Figure 1.20 – Fleet Server configuration

3. Fill in the name and the URL.
4. Click on Generate Fleet Server policy.
5. Copy the generated command and paste it into your terminal.

Figure 1.21 – Fleet Server installation
If the installation is successful, you will see a confirmation showing that Fleet Server is operational and connected.
By using the Quick Start option, Fleet automatically creates a Fleet Server instance and an enrollment token object in the background. Note that this option relies on self-signed certificates and is not suitable for production environments. For more details on how to set up Fleet using the Advanced mode, refer to the See also section of this recipe.
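For reference, the command generated by the wizard follows a pattern along these lines; the version, URLs, policy name, and service token below are placeholders, as the wizard produces values specific to your deployment:

$ curl -L -O https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.12.2-linux-x86_64.tar.gz
$ tar xzvf elastic-agent-8.12.2-linux-x86_64.tar.gz
$ cd elastic-agent-8.12.2-linux-x86_64
$ sudo ./elastic-agent install --fleet-server-es=https://localhost:9200 --fleet-server-service-token=<service-token> --fleet-server-policy=fleet-server-policy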
Elastic Cloud offers a hosted Integrations Server that includes Fleet Server, simplifying the setup process considerably.
To verify the availability of Fleet Server in your cloud deployment, do the following:
1. In Kibana, on the left menu pane, go to Management | Fleet.
2. Look for Elastic Cloud agent policy on the Agents tab:

Figure 1.22 – Centralized management for Elastic Agents
3. Check that the agent status is healthy.

For configuration samples to set up Fleet Server on ECK, see the following examples:
- To gain deeper insights into Fleet and Elastic Agent, you can also watch this recorded webinar: https://www.elastic.co/webinars/introducing-elastic-agent-and-fleet
- To set up Fleet Server for production, check the official documentation: https://www.elastic.co/guide/en/fleet/current/add-fleet-server-on-prem.html

After you’ve set up a functional Elastic cluster, we recommend setting up a snapshot repository according to your deployment method. This allows you to back up your valuable data. Elasticsearch features a native capability for data backup and restoration.
When you create a deployment on Elastic Cloud, it also comes with a default repository called the found repository. In this recipe, you’ll learn how to register and manage a snapshot repository backed by an Amazon S3 bucket with Elastic Cloud, a popular option. The setup concepts also apply to other cloud repositories, such as Google Cloud Storage or Azure Blob Storage, and to self-managed repositories.
Later in the book, we will provide a guide on how to configure and execute snapshot and restore operations.
Make sure that your Elastic Cloud deployment is up and running and that you have sufficient permissions to create and configure S3 buckets on AWS.
In the first step, we will create an S3 bucket:

1. Go to AWS Console | S3 | Create Bucket. Provide a name for the bucket, for instance, elasticsearch-s3-bucket-repository. Make sure to choose Block all public access before proceeding to create the bucket:

Figure 1.23 – Creating an S3 bucket
Create an AWS policy to allow the Identity and Access Management user (IAM user) to access the S3 Bucket.
1. Navigate to the AWS Management Console, then go to IAM | Policies.
2. Click on Create Policy.
3. Switch to the JSON editor and set up the policy with the following snippet (the snippet can be found at this address: https://github.com/PacktPublishing/Elastic-Stack-8.x-Cookbook/blob/main/Chapter1/snippets.md#sample-aws-s3-policy):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::elasticsearch-s3-bucket-repository",
                "arn:aws:s3:::elasticsearch-s3-bucket-repository/*"
            ]
        }
    ]
}

4. On the next screen, give the policy the name elasticsearch-s3-bucket-policy and click on Create Policy.

Next, create an IAM user and attach the policy we created.
1. Navigate to the AWS Management Console and then go to IAM | Access Management | Users.
2. Click Create User, and provide elastic-s3-default-user as the username:

Figure 1.24 – Creating Elastic S3 default user
3. On the next screen (Figure 1.25), choose Attach policies directly and attach the policy that you previously generated:

Figure 1.25 – Attaching permission policy to the user
4. On the next screen (Figure 1.26), click on Create user to complete the user creation:

Figure 1.26 – Finalizing the user creation
Now, we will generate an access key and secret access key.
1. Open the Security credentials tab, and then choose Create access key.
2. On the next screen (Figure 1.27), choose Third-party service for the use case, confirm the recommendation, and click on Next:

Figure 1.27 – Access key configuration
3. On the next screen, click on Create access key.
4. On the last screen of the wizard, choose Download .csv file to download the key pair. Store the .csv file with the keys in a secure location; we will use it in the next step.
5. Store the access secrets in the Elastic Cloud deployment keystore (if you are configuring an on-premises Elasticsearch cluster, you will have to use the elasticsearch-keystore command-line tool: https://www.elastic.co/guide/en/elasticsearch/reference/current/elasticsearch-keystore.html).
6. Go to the Elastic Cloud console, navigate to the management console of your deployment, and go to the security page. Add settings for the Elasticsearch keystore with Type set to Single string, and add the following keys with the access key and secret access key values from the previous step:

s3.client.secondary.access_key
s3.client.secondary.secret_key

7. Make sure the security keys appear on the security page of your deployment, and restart the deployment to apply the changes:

Figure 1.28 – Elastic Cloud keystore setting
For a self-managed deployment, you can set up the same keys with the following commands:

$ bin/elasticsearch-keystore add s3.client.secondary.access_key
$ bin/elasticsearch-keystore add s3.client.secondary.secret_key

We can now register the repository with Kibana. Let’s go to Kibana | Management | Stack Management | Snapshot & Restore | Repositories | Register a repository, name it my-custom-s3-repo, and choose AWS S3 as the Repository type option, as shown in Figure 1.29:

Figure 1.29 – Snapshot repository creation
Set Client to secondary; this matches the secondary client name in your s3.client.secondary.* keystore secrets. Make sure to use the exact same bucket name that you created on AWS, elasticsearch-s3-bucket-repository, as shown in Figure 1.30:

Figure 1.30 – Snapshot repository client configuration
Click to verify the repository and make sure that your S3 bucket is successfully connected as a snapshot repository:

Figure 1.31 – Snapshot repository status
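Registering the repository through the Kibana UI is equivalent to calling the snapshot repository API directly; here is a sketch using the bucket and client names from this recipe:

PUT _snapshot/my-custom-s3-repo
{
  "type": "s3",
  "settings": {
    "bucket": "elasticsearch-s3-bucket-repository",
    "client": "secondary"
  }
}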