Mastering Elasticsearch 5.x

Bharvi Dixit

Description

Elasticsearch is a modern, fast, distributed, scalable, fault tolerant, and open source search and analytics engine. Elasticsearch leverages the capabilities of Apache Lucene, and provides a new level of control over how you can index and search even huge sets of data.
This book will give you a brief recap of the basics and also introduce you to the new features of Elasticsearch 5. We will guide you through the intermediate and advanced functionalities of Elasticsearch, such as querying, indexing, searching, and modifying data. We’ll also explore advanced concepts, including aggregation, index control, sharding, replication, and clustering.
We'll show you the monitoring and administration modules available in Elasticsearch, and will also cover backup and recovery. You will get an understanding of how you can scale your Elasticsearch cluster for your context and improve its performance. We'll also show you how you can create your own analysis plugin in Elasticsearch.
By the end of the book, you will have all the knowledge necessary to master Elasticsearch and put it to efficient use.




Table of Contents

Mastering Elasticsearch 5.x - Third Edition
Credits
About the Author
Acknowledgements
About the Reviewer
www.PacktPub.com
Why subscribe?
Customer Feedback
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Revisiting Elasticsearch and the Changes
An overview of Lucene
Getting deeper into the Lucene index
Inverted index
Segments
Norms
Term vectors
Posting formats
Doc values
Document analysis
Basics of the Lucene query language
Querying fields
Term modifiers
Handling special characters
An overview of Elasticsearch
The key concepts
Working of Elasticsearch
Introducing Elasticsearch 5.x
Introducing new features in Elasticsearch
New features in Elasticsearch 5.x
New features in Elasticsearch 2.x
The changes in Elasticsearch
Changes between 1.x to 2.x
Mapping changes
Query and filter changes
Security, reliability, and networking changes
Monitoring parameter changes
Changes between 2.x to 5.x
Mapping changes
No more string fields
Floats are default
Changes in numeric fields
Changes in geo_point fields
Some more changes
Summary
2. The Improved Query DSL
The changed default text scoring in Lucene - BM25
Precision versus recall
Recalling TF-IDF
Introducing BM25 scoring
BM25 scoring formula
Example - tuning BM25 with custom similarity
How BM25 differs from TF-IDF
Saturation point
Average document length
Re-factored Query DSL
Choosing the right query for the job
Query categorization
Basic queries
Compound queries
Understanding bool queries
Non-analyzed queries
Full text search queries
Pattern queries
Similarity supporting queries
Score altering queries
Position aware queries
Structure aware queries
The use cases
Example data
Basic queries use cases
Searching for values in range
Compound queries use cases
A Boolean query for multiple terms
Boosting some of the matched documents
Ignoring lower scoring partial queries
Not analyzed queries use cases
Limiting results to given tags
Full text search queries use cases
Using Lucene query syntax in queries
Handling user queries without errors
Pattern queries use cases
Autocomplete using prefixes
Pattern matching
Similarity supporting queries use cases
Finding terms similar to a given one
Score altering query use cases
Decreasing importance of books with a certain value
Pattern query use cases
Matching phrases
Spans, spans everywhere
Some more important changes in Query DSL
Query rewrite explained
Prefix query as an example
Getting back to Apache Lucene
Query rewrite properties
An example
Query templates
Introducing search templates
The Mustache template engine
Conditional expressions
Loops
Default values
Storing templates in files
Storing templates in a cluster
Summary
3. Beyond Full Text Search
Controlling multimatching
Multimatch types
Best fields matching
Cross fields matching
Most fields matching
Phrase matching
Phrase with prefixes matching
Controlling scores using the function score query
Built-in functions under the function score query
The weight function
The field value factor function
The script score function
Decay functions - linear, exp, and gauss
Query rescoring
What is query rescoring?
Structure of the rescore query
Rescore parameters
To sum up
Elasticsearch scripting
The syntax
Scripting changes across different versions
Painless - the new default scripting language
Using Painless as your scripting language
Variable definition in scripts
Conditionals
Loops
An example
Sorting results based on scripts
Sorting based on multiple fields
Lucene expressions
The basics
An example
Summary
4. Data Modeling and Analytics
Data modeling techniques in Elasticsearch
Managing relational data in Elasticsearch
The object type
The nested documents
Parent - child relationship
Parent-child relationship in the cluster
Finding child documents with a parent ID query
A few words about alternatives
An example of data denormalization
Data analytics using aggregations
Instant aggregations in Elasticsearch 5.0
Revisiting aggregations
Metric aggregations
Bucket aggregations
Pipeline aggregations
Calculating average monthly sales using avg_bucket aggregation
Calculating the derivative for the sum of the monthly sale
The new aggregation category - Matrix aggregation
Understanding matrix stats
Dealing with missing values
Summary
5. Improving the User Search Experience
Correcting user spelling mistakes
Testing data
Getting into technical details
Suggesters
Using a suggester under the _search endpoint
Understanding the suggester response
Multiple suggestion types for the same suggestion text
The term suggester
Configuring the Elasticsearch term suggester
Common term suggester options
Additional term suggester options
The phrase suggester
Usage example
Configuring the phrase suggester
Basic configuration
Configuring smoothing models
Configuring candidate generators
The completion suggester
The logic behind the completion suggester
Using the completion suggester
Indexing data
Querying data
Custom weights
Using fuzziness with the completion suggester
Implementing your own auto-completion
Creating an index
Understanding the parameters
Configuring settings
Configuring mappings
Indexing documents
Querying documents for auto-completion
Working with synonyms
Preparing settings for synonym search
Formatting synonyms
Synonym expansion versus contraction
Summary
6. The Index Distribution Architecture
Configuring an example multi-node cluster
Choosing the right amount of shards and replicas
Sharding and overallocation
A positive example of overallocation
Multiple shards versus multiple indices
Replicas
Routing explained
Shards and data
Let's test routing
Indexing with routing
Routing in practice
Querying
Aliases
Multiple routing values
Shard allocation control
Allocation awareness
Forcing allocation awareness
Shard allocation filtering
What include, exclude, and require mean
Runtime allocation updating
Index level updates
Cluster level updates
Defining total shards allowed per node
Defining total shards allowed per physical server
Inclusion
Requirement
Exclusion
Disk-based allocation
Query execution preference
Introducing the preference parameter
An example of using query execution preference
Stripping data on multiple paths
Index versus type - a revised approach for creating indices
Summary
7. Low-Level Index Control
Altering Apache Lucene scoring
Available similarity models
Setting a per-field similarity
Similarity model configuration
Choosing the default similarity model
Configuring the chosen similarity model
Configuring the TF-IDF similarity
Configuring the BM25 similarity
Configuring the DFR similarity
Configuring the IB similarity
Configuring the LM Dirichlet similarity
Configuring the LM Jelinek Mercer similarity
Choosing the right directory implementation - the store module
The store type
The simple file system store - simplefs
The new I/O filesystem store - niofs
The mmap filesystem store - mmapfs
The default store type - fs
NRT, flush, refresh, and transaction log
Updating the index and committing changes
Changing the default refresh time
The transaction log
The transaction log configuration
Handling corrupted translogs
Near real-time GET
Segment merging under control
Merge policy changes in Elasticsearch
Configuring the tiered merge policy
Merge scheduling
The concurrent merge scheduler
Force merging
Understanding Elasticsearch caching
Node query cache
Configuring node query cache
Shard request cache
Enabling and disabling the shard request cache
Request cache settings
Cache invalidation
The field data cache
Field data or doc values
Using circuit breakers
The parent circuit breaker
The field data circuit breaker
The request circuit breaker
In flight requests circuit breaker
Script compilation circuit breaker
Summary
8. Elasticsearch Administration
Node types in Elasticsearch
Data node
Master node
Ingest node
Tribe node
Coordinating nodes/Client nodes
Discovery and recovery modules
Discovery configuration
Zen discovery
The unicast Zen discovery configuration
The master election configuration
Zen discovery fault detection and configuration
No Master Block
The Amazon EC2 discovery
The EC2 plugin installation
The EC2 plugin's generic configuration
Optional EC2 discovery configuration options
The EC2 nodes scanning configuration
Other discovery implementations
The gateway and recovery configuration
The gateway recovery process
Configuration properties
The local gateway
Low-level recovery configuration
Cluster-level recovery configuration
The indices recovery API
The human-friendly status API - using the cat API
The basics of cat API
Using the cat API
Cat API common arguments
The examples of cat API
Getting information about the master node
Getting information about the nodes
Changes in cat API - Elasticsearch 5.0
Host field removed from the cat nodes API
Changes to cat recovery API
Changes to cat nodes API
Changes to cat field data API
Backing up
The snapshot API
Saving backups on a filesystem
Creating snapshot
Registering repository path
Registering shared file system repository in Elasticsearch
Creating snapshots
Getting snapshot information
Deleting snapshots
Saving backups in the cloud
The S3 repository
The HDFS repository
The Azure repository
The Google cloud storage repository
Restoring snapshots
Example - restoring a snapshot
Restoring multiple indices
Renaming indices
Partial restore
Changing index settings during restore
Restoring to different cluster
Summary
9. Data Transformation and Federated Search
Preprocessing data within Elasticsearch with ingest nodes
Working with ingest pipeline
The ingest APIs
Creating a pipeline
Getting pipeline details
Deleting a pipeline
Simulating pipelines for debugging purposes
Handling errors in pipelines
Tagging errors within the same document and index
Indexing error prone documents in a different index
Ignoring errors altogether
Working with ingest processors
Append processor
Convert processor
Grok processor
Federated search
The test clusters
Creating the tribe node
Reading data with the tribe node
Master-level read operations
Writing data with the tribe node
Master-level write operations
Handling indices conflicts
Blocking write operations
Summary
10. Improving Performance
Query validation and profiling
Validating expensive queries before execution
Query profiling for detailed query execution reports
Understanding the profile API response
Consideration for profiling usage
Very hot threads
Usage clarification for the hot threads API
The hot threads API response
Scaling Elasticsearch
Vertical scaling
Horizontal scaling
Automatically creating replicas
Redundancy and high availability
Cost and performance flexibility
Continuous upgrades
Multiple Elasticsearch instances on a single physical machine
Preventing the shard and its replicas from being on the same node
Designated nodes' roles for larger clusters
Query aggregator nodes
Data nodes
Master eligible nodes
Using Elasticsearch for high load scenarios
General Elasticsearch-tuning advice
The index refresh rate
Thread pools tuning
Data distribution
Advice for high query rate scenarios
Node query cache and shard query cache
Think about the queries
Using routing
Parallelize your queries
Keeping size and shard_size under control
High indexing throughput scenarios and Elasticsearch
Bulk indexing
Keeping your document fields under control
The index architecture and replication
Tuning the write-ahead log
Thinking about storage
RAM buffer for indexing
Managing time-based indices efficiently using shrink and rollover APIs
The shrink API
Requirements for indices to be shrunk
Shrinking an index
Rollover API
Using the rollover API
Passing additional settings with a rollover request
Pattern for creating new index name
Summary
11. Developing Elasticsearch Plugins
Creating the Apache Maven project structure
Understanding the basics
The structure of the Maven Java project
The idea of POM
Running the build process
Introducing the assembly Maven plugin
Understanding the plugin descriptor file
Creating a custom REST action
The assumptions
Implementation details
Using the REST action class
The constructor
Handling requests
Writing responses
The plugin class
Informing Elasticsearch about our REST action
Time for testing
Building the REST action plugin
Installing the REST action plugin
Checking whether the REST action plugin works
Creating the custom analysis plugin
Implementation details
Implementing TokenFilter
Implementing the TokenFilter factory
Implementing the class custom analyzer
Implementing the analyzer provider
Implementing the analyzer plugin
Informing Elasticsearch about our custom analyzer
Testing our custom analysis plugin
Building our custom analysis plugin
Installing the custom analysis plugin
Checking whether our analysis plugin works
Summary
12. Introducing Elastic Stack 5.0
Overview of Elastic Stack 5.0
Introducing Logstash, Beats, and Kibana
Working with Logstash
Logstash architecture
Installing Logstash
Installing Logstash from binaries
Installing Logstash from APT repositories
Installing Logstash from YUM repositories
Configuring Logstash
Example - shipping system logs using Logstash
Starting Logstash
Introducing Beats as data shippers
Working with Metricbeat
Installing Metricbeat
Configuring Metricbeat
Running Metricbeat
Loading a sample Kibana dashboard into Elasticsearch
Working with Kibana
Installing Kibana
Kibana configuration
Starting Kibana
Exploring and visualizing data on Kibana
Understanding the Kibana Management screen
Discovering data on Kibana
Using the Dashboard screen to create/load dashboards
Using Sense
Summary

Mastering Elasticsearch 5.x - Third Edition

Mastering Elasticsearch 5.x - Third Edition

Copyright © 2017 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: October 2013

Second edition: February 2015

Third edition: February 2017

Production reference: 1160217

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham 

B3 2PB, UK.

ISBN 978-1-78646-018-9

www.packtpub.com

Credits

Author

Bharvi Dixit

Copy Editor

Safis Editing

Reviewer

Marcelo Ochoa

Project Coordinator

Nidhi Joshi 

Commissioning Editor

Amey Varangaonkar

Proofreader

Safis Editing

Acquisition Editor

Divya Poojari

Indexer

Tejal Daruwale Soni

Content Development Editor

Cheryl Dsa

Graphics

Tania Dutta

Technical Editor

Prasad Ramesh

Production Coordinator

Nilesh Mohite

About the Author

Bharvi Dixit is an IT professional with extensive experience of working on search servers, NoSQL databases, and cloud services. He holds a master's degree in computer science and is currently working with Sentieo, a USA-based financial data and equity research platform, where he leads the overall platform and architecture of the company, spanning hundreds of servers. At Sentieo, he also plays a key role in the search and data team.

He is also the organizer of Delhi's Elasticsearch Meetup Group, where he speaks about Elasticsearch and Lucene and is continuously building the community around these technologies.

Bharvi also works as a freelance Elasticsearch consultant and has helped more than half a dozen organizations adapt Elasticsearch to solve their complex search problems around different use cases, such as creating search solutions for big data-automated intelligence platforms in the area of counter-terrorism and risk management, as well as in other domains, such as recruitment, e-commerce, finance, social search, and log monitoring.

He has a keen interest in creating scalable backend platforms. His other areas of interest are search engineering, data analytics, and distributed computing. Java and Python are the primary languages in which he loves to write code. He has also built proprietary software for consultancy firms.

In 2013, he started working on Lucene and Elasticsearch, and in 2016, he authored his first book, Elasticsearch Essentials, which was published by Packt. He has also worked as a technical reviewer for the book Learning Kibana 5.0 by Packt.

You can connect with him on LinkedIn at https://in.linkedin.com/in/bharvidixit or follow him on Twitter at @d_bharvi.

Acknowledgements

This is my second book on Elasticsearch, and I am really fascinated by the love and feedback I got from the readers of my first book, Elasticsearch Essentials. The book you are holding covers Elasticsearch 5.x, the release of Elasticsearch that brings a whole lot of features and improvements to this great search server. Hopefully, after reading this book, you will not only get to know the underlying architecture of Lucene and Elasticsearch, but also possess command over many advanced concepts, such as scripting, improving cluster performance, writing custom Java-based plugins, and many more.

Now it is time to say thank you.

I would like to thank my family for their continuous support, especially my brother, Patanjali Dixit, who has been a pillar of strength for me at each step throughout my career. I extend my big thanks to Lavleen for the love, support, and encouragement she gave during all those days when I was busy writing this book or solving complex problems at work.

I would like to extend my thanks to the Packt team working on this book, including our technical reviewer. Without their incredible support, the book wouldn't have been as great as it is now.

I would also like to thank all the people I'm working with at Sentieo for all their love and for creating a culture that helps make work more fun. At Sentieo, I extend my special thanks to Atul Shah, who always inspired me to go into the intricacies of Lucene and Elasticsearch and solve some really complex problems using these technologies.

Finally, thanks to Shay Banon for creating Elasticsearch and to all the people who contributed to the libraries and modules published around this project.

Once again, thank you.

About the Reviewer

Marcelo Ochoa works at the system laboratory of Facultad de Ciencias Exactas of the Universidad Nacional del Centro de la Provincia de Buenos Aires and is the CTO at Scotas, a company that specializes in near real-time search solutions using Apache Solr and Oracle. He divides his time between university jobs and external projects related to Oracle and big data technologies. He has worked on several Oracle-related projects, such as the translation of Oracle manuals and multimedia CBTs. His background is in database, network, web, and Java technologies. In the XML world, he is known as the developer of the DB Generator for the Apache Cocoon project. He has worked on open source projects such as DBPrism and DBPrism CMS, the Lucene-Oracle integration using the Oracle JVM Directory implementation, and the Restlet.org project, where he worked on the Oracle XDB Restlet Adapter, which is an alternative to writing native REST web services inside a database resident JVM. Since 2006, he has been part of the Oracle ACE program. Oracle ACEs are known for their strong credentials as Oracle community enthusiasts and advocates, with candidates nominated by ACEs in the Oracle technology and applications communities. He has coauthored Oracle Database Programming using Java and Web Services by Digital Press and Professional XML Databases by Wrox Press, and has worked as a technical reviewer for several Packt books, such as Apache Solr 4 Cookbook, ElasticSearch Server, and others.

www.PacktPub.com

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www.packtpub.com/mapt

Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

Why subscribe?

Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser

Customer Feedback

Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1786460181.

If you'd like to join our team of regular reviewers, you can e-mail us at [email protected]. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!

Preface

Welcome to the world of Elasticsearch and Mastering Elasticsearch 5.x, Third Edition. While reading the book, you'll be taken through different topics, all connected to Elasticsearch. Please remember, though, that this book is not meant for beginners, and we really treat it as a follow-up to Mastering Elasticsearch, Second Edition, which was based on Elasticsearch version 1.4.x. There is a lot of new content in the book, since Elasticsearch has gone through many changes between versions 1.x and 5.x.

Throughout the book, we will discuss different topics related to Elasticsearch and Lucene. We start with an introduction to the world of Lucene and Elasticsearch, and then move on to the queries provided by Elasticsearch, discussing topics such as filtering and which query to choose in a particular situation. Of course, querying is not everything, and because of that, the book you are holding in your hands also provides information on the newly introduced aggregations and features that will help you give meaning to the data you have indexed in Elasticsearch indices and provide a better search experience for your users.

We have also decided to cover approaches to data modeling and handling relational data in Elasticsearch, along with taking you through the scripting module of Elasticsearch and showing some examples of using the latest default scripting language, Painless.

Even though, for most users, querying and data analysis are the most interesting parts of Elasticsearch, they are not all that we need to discuss. Because of this, the book tries to bring you additional information when it comes to index architecture, such as choosing the right number of shards and replicas, adjusting the shard allocation behavior, and so on. We will also get into places where Elasticsearch meets Lucene, and we will discuss topics such as different scoring algorithms, choosing the right store mechanism, what the differences between them are, and why choosing the proper one matters.

Last but not least, we touch on the administration part of Elasticsearch by discussing the discovery and recovery modules and the human-friendly cat API, which allows us to very quickly get relevant administrative information in a form that most humans should be able to read without parsing JSON responses. We also talk about ingest nodes, which allow you to preprocess data within Elasticsearch before indexing takes place, and tribe nodes, which give you the ability to create federated searches across many clusters.

Because of the title of the book, we couldn't omit performance-related topics, and we decided to dedicate a whole chapter to it.

Just as with the second edition of the book, we decided to include a chapter dedicated to the development of Elasticsearch plugins, showing you how to set up the Apache Maven project and develop two types of plugins: a custom REST action and a custom analysis plugin.

At the end, we have included one chapter discussing the components of the complete Elastic Stack, and you should get a great overview of how to start with tools such as Logstash, Kibana, and Beats after reading the chapter.

If you think that you are interested in these topics after reading about them, we think this is a book for you, and hopefully, you will like the book after reading the last words of the summary in Chapter 12, Introducing Elastic Stack 5.0.

What this book covers

Chapter 1, Revisiting Elasticsearch and the Changes, guides you through how Apache Lucene works and introduces you to Elasticsearch 5.x, describing the basic concepts and showing you the important changes in Elasticsearch from version 1.x to 5.x.

Chapter 2, The Improved Query DSL, describes the new default scoring algorithm, BM25, and how it improves on the previous TF-IDF algorithm. In addition to that, it explains various Elasticsearch features, such as query rewriting, query templates, changes in the query modules, and which of the various queries to choose in a given scenario.

Chapter 3, Beyond Full Text Search, describes query rescoring, multimatch control, and function score queries. In addition to that, this chapter covers the scripting module of Elasticsearch.

Chapter 4, Data Modeling and Analytics, discusses different approaches of data modeling in Elasticsearch and also covers how to handle relationships among documents using parent-child and nested data types, along with focusing on practical considerations. It further discusses the aggregation module of Elasticsearch for the purpose of data analytics.

Chapter 5, Improving the User Search Experience, focuses on improving the user search experience using suggesters, which allow you to correct spelling mistakes in user queries and build efficient autocomplete mechanisms. In addition to that, it covers how to improve query relevance and how to use synonyms in search.

Chapter 6, The Index Distribution Architecture, covers techniques for choosing the right amount of shards and replicas, how routing works, how shard allocation works, and how to alter its behavior. In addition to that, we discuss what query execution preference is and how it allows us to choose where the queries are going to be executed.

Chapter 7, Low-Level Index Control, describes how to alter Apache Lucene scoring and how to choose an alternative scoring algorithm. It also covers near real-time searching and indexing, transaction log usage, and segment merging, helping you understand merging and tune it for your use case, along with details about the merge policies removed in Elasticsearch 5.x. At the end of the chapter, you will also find information about I/O throttling and Elasticsearch caching.

Chapter 8, Elasticsearch Administration, focuses on concepts related to administering Elasticsearch. It describes what the discovery, gateway, and recovery modules are, how to configure them, and why you should bother. We also describe what the cat API is and how to back up and restore your data to different cloud services (such as Amazon AWS and Microsoft Azure).

Chapter 9, Data Transformation and Federated Search, covers one of the latest features of Elasticsearch 5, the ingest node, which allows us to preprocess data within the Elasticsearch cluster itself before indexing. It also explains how federated search works across different clusters using tribe nodes.

Chapter 10, Improving Performance, discusses improving Elasticsearch performance under different loads and the right way of scaling production clusters, along with insights into garbage collection and hot threads issues and how to deal with them. It further covers query profiling and query validation. In the end, it explains general Elasticsearch cluster tuning advice for high query rate scenarios versus high indexing throughput scenarios.

Chapter 11, Developing Elasticsearch Plugins, covers Elasticsearch plugins' development by showing and describing in depth how to write your own REST action and language analysis plugin.

Chapter 12, Introducing Elastic Stack 5.0, introduces you to the components of Elastic Stack 5.0, covering Elasticsearch, Logstash, Kibana, and Beats.

What you need for this book

This book was written using Elasticsearch 5.0.x, and all the examples and functions should work with it. In addition to that, you'll need a command-line tool that allows you to send HTTP requests, such as curl, which is available for most operating systems. Please note that all examples in this book use the mentioned curl tool. If you want to use another tool, please remember to format the request in an appropriate way that is understood by the tool of your choice.

In addition to that, to run examples in Chapter 11, Developing Elasticsearch Plugins, you will need a Java Development Kit (JDK) Version 1.8.0_73 and above installed and an editor that will allow you to develop your code (or a Java IDE such as Eclipse). To build the code and manage dependencies in Chapter 11, Developing Elasticsearch Plugins, we are using Apache Maven.

The last chapter of this book has been written using Elastic Stack 5.0.0, so you will need to have Logstash, Kibana, and Metricbeat, all of the same version.

Who this book is for

This book was written for Elasticsearch users and enthusiasts who are already familiar with the basic concepts of this great search server and want to extend their knowledge of Elasticsearch. It covers topics such as how Apache Lucene and Elasticsearch work, along with the changes from Elasticsearch 1.x to 5.x. In addition to that, readers who want to see how to improve their query relevance and learn how to extend Elasticsearch with their own plugins may find this book interesting and useful.

If you are new to Elasticsearch and you are not familiar with basic concepts, such as querying and data indexing, you may find it a little difficult to use this book as most of the chapters assume that you have this knowledge already.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text are shown as follows: "but not the Elasticsearch term in the document field"

A block of code is set as follows:

public class CustomRestActionPlugin extends Plugin implements ActionPlugin {

  @Override
  public List<Class<? extends RestHandler>> getRestHandlers() {
    return Collections.singletonList(CustomRestAction.class);
  }
}

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

curl -XGET 'localhost:9200/clients/_search?pretty' -d '{
 "query" : {
  "prefix" : {
   "name" : {
    "prefix" : "j",
    "rewrite" : "constant_score_boolean"
   }
  }
 }
}'

Any command-line input or output is written as follows:

curl -XPUT 'localhost:9200/mastering_meta/_settings' -d '{
 "index" : {
  "auto_expand_replicas" : "0-all"
 }
}'

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "field and hit the Create button"

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail [email protected], and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

Log in or register to our website using your e-mail address and password.
Hover the mouse pointer on the SUPPORT tab at the top.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box.
Select the book for which you're looking to download the code files.
Choose from the drop-down menu where you purchased this book from.
Click on Code Download.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Mastering-ElasticSearch-5.x-Third-Edition. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/MasteringElasticSearch5dotxThirdEdition_ColorImages.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books, maybe a mistake in the text or the code, we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to the list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at [email protected] with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.

Chapter 1. Revisiting Elasticsearch and the Changes

Welcome to Mastering Elasticsearch 5.x, Third Edition. Elasticsearch has progressed rapidly from version 1.x, released in 2014, to version 5.x, released in 2016. During the two-and-a-half-year period since 1.0.0, adoption has skyrocketed, and both the vendor and the community have contributed bug fixes, interoperability enhancements, and rich feature upgrades to ensure that Elasticsearch remains the most popular NoSQL storage, indexing, and search utility for both structured and unstructured documents, as well as gaining popularity as a log analysis tool as part of the Elastic Stack.

We treat Mastering Elasticsearch as a book that will systematize your knowledge about Elasticsearch, and extend it by showing some examples of how to leverage your knowledge in certain situations. If you are looking for a book that will help you start your journey into the world of Elasticsearch, please take a look at Elasticsearch Essentials, also published by Packt.

Before going further into the book, we assume that you already know the basic concepts of Elasticsearch: how to index documents, how to send queries to get the documents you are interested in, how to narrow down the results of your queries by using filters, and how to calculate statistics for your data with the use of the aggregation mechanism. However, before getting to the exciting functionality that Elasticsearch offers, we think we should start with a quick overview of Apache Lucene, the full text search library that Elasticsearch uses to build and search its indices. We also need to make sure that we understand Lucene correctly, as Mastering Elasticsearch requires this understanding. By the end of this chapter, we will have covered the following topics:

An overview of Lucene and Elasticsearch
Introducing Elasticsearch 5.x
The latest features introduced in Elasticsearch
The changes in Elasticsearch after 1.x

An overview of Lucene

In order to fully understand how Elasticsearch works, especially when it comes to indexing and query processing, it is crucial to understand how the Apache Lucene library works. Under the hood, Elasticsearch uses Lucene to handle document indexing. The same library is also used to perform a search against the indexed documents. In the next few pages, we will try to show you the basics of Apache Lucene, just in case you've never used it.

Lucene is a mature, open source, high-performance, scalable, light, and yet very powerful library written in Java. Its core comes as a single Java library file with no dependencies, and allows you to index documents and search them with its out-of-the-box full text search capabilities. Of course, there are extensions to Apache Lucene that allow different language handling and enable spellchecking, highlighting, and much more, but if you don't need those features, you can download a single file and use it in your application.

Getting deeper into the Lucene index

In order to fully understand Lucene, the following terminologies need to be understood first:

Document: This is the main data carrier used during indexing and searching, containing one or more fields, which hold the data we put into and get from Lucene.
Field: This is a section of the document, which is built of two parts: the name and the value.
Term: This is a unit of search representing a word from the text.
Token: This is an occurrence of a term in the text of a field. It consists of the term text, the start and end offsets, and a type.

Inverted index

Apache Lucene writes all the information to a structure called the inverted index. It is a data structure that maps the terms in the index to the documents, not the other way round, as a relational database does. You can think of an inverted index as a data structure where data is term oriented rather than document oriented.

Let's see what a simple inverted index can look like. For example, let's assume that we have documents with only a title field to be indexed, and they look like the following:

Elasticsearch Server (document 1)
Mastering Elasticsearch (document 2)
Elasticsearch Essentials (document 3)

So, the index (in a very simple way) could be visualized as shown in the following table:

Term            Count   Document : Position
Elasticsearch   3       1:1, 2:2, 3:1
Essentials      1       3:2
Mastering       1       2:1
Server          1       1:2

As you can see, each term points to the number of documents it is present in, along with its positions in those documents. This allows for very efficient and fast searching, such as term-based queries. In addition to this, each term has a number connected to it: the count, telling Lucene how often the term occurs.

Segments

Each index is divided into multiple write-once, read-many segments. When indexing, after a single segment is written to disk, it can't be updated; for example, the information about deleted documents is stored in a separate file, but the segment itself is not updated.

However, multiple segments can be merged together in a process called segment merging. Segments are merged either when merging is forced or when Lucene decides it is time for a merge to be performed, and larger segments are created as a result. This can be I/O demanding; however, it is needed to clean up some information, because during that time information that is not needed anymore is deleted, for example, the deleted documents. In addition to this, searching with the use of one larger segment is faster than searching against multiple smaller ones holding the same data.
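To make the forced variant concrete, Elasticsearch exposes forced segment merging through the _forcemerge endpoint. The following is a minimal sketch, assuming a local Elasticsearch 5.x node and a hypothetical books index:

# Merge the segments of the books index down to a single segment
# (the index name is an example; pick a small, read-mostly index)
curl -XPOST 'localhost:9200/books/_forcemerge?max_num_segments=1'

# Inspect the resulting segments per shard
curl -XGET 'localhost:9200/_cat/segments/books?v'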

Of course, the actual index created by Lucene is much more complicated and advanced, and consists of more than the terms, their counts, and documents, in which they are present. We would like to tell you about a few of these additional index pieces because even though they are internal, it is usually good to know about them, as they can be very useful.

Norms

A norm is a factor associated with each indexed document and stores the normalization factors used to compute the score relative to the query. Norms are computed on the basis of index time boosts and are indexed along with the documents. With the use of norms, Lucene is able to provide an index time boosting functionality at the cost of a certain amount of additional space needed for norm indexation and some amount of additional memory.

Term vectors

Term vectors are small inverted indices per document. They consist of pairs of a term and its frequency, and can optionally include information about term positions. By default, Lucene and Elasticsearch don't enable term vector indexing, but some functionalities, such as fast vector highlighting, require them to be present.
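Term vectors are enabled per field in the index mappings. As a hedged sketch for Elasticsearch 5.x, the following enables term vectors with positions and offsets (what the fast vector highlighter needs) on a description field and, tying back to the previous section, disables norms on a title field; the books index and field names are just examples:

curl -XPUT 'localhost:9200/books' -d '{
 "mappings": {
  "book": {
   "properties": {
    "title": { "type": "text", "norms": false },
    "description": { "type": "text", "term_vector": "with_positions_offsets" }
   }
  }
 }
}'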

Posting formats

With the release of Lucene 4.0, the library introduced the so-called codec architecture, giving developers control over how the index files are written to disk. One part of the index is the posting format, which stores fields, terms, documents, term positions and offsets, and, finally, the payloads (a byte array stored at an arbitrary position in the Lucene index, which can contain any information we want). Lucene contains different posting formats for different purposes, for example, one that is optimized for high-cardinality fields such as unique identifiers.

Doc values

As we have already mentioned, the Lucene index is a so-called inverted index. However, for certain features, such as aggregations, such an architecture is not the best one. The mentioned functionality operates on the document level and not the term level, so Elasticsearch needs to uninvert the index before calculations can be done. Because of that, doc values were introduced: an additional structure used for sorting and aggregations. Doc values store the uninverted data for the fields they are turned on for. Both Lucene and Elasticsearch allow us to configure the implementation used to store them, giving us the possibility of memory-based doc values, disk-based doc values, and a combination of the two. Doc values have been enabled by default in Elasticsearch since the 2.x release.
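Since doc values are on by default for non-analyzed fields (such as keyword and numeric types), the usual mapping decision is only whether to switch them off for fields you will never sort or aggregate on. A hedged sketch, with hypothetical index and field names:

# internal_id will never be sorted or aggregated on,
# so we save disk space by disabling doc values for it
curl -XPUT 'localhost:9200/books_meta' -d '{
 "mappings": {
  "book": {
   "properties": {
    "internal_id": { "type": "keyword", "doc_values": false }
   }
  }
 }
}'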

Document analysis

When we index a document into Elasticsearch, it goes through an analysis phase, which is necessary in order to create the inverted indices. It is a series of steps performed by Lucene, which are depicted in the following image:

Analysis is done by the analyzer, which is built of a tokenizer and zero or more filters, and can also have zero or more character filters.

A tokenizer in Lucene is used to divide the text into tokens, which are basically terms with additional information, such as its position in the original text and its length. The result of the tokenizer work is a so-called token stream, where the tokens are put one by one and are ready to be processed by filters.

Apart from the tokenizer, the Lucene analyzer is built of zero or more filters that are used to process tokens in the token stream. For example, it can remove tokens from the stream, change them, or even produce new ones. There are numerous filters and you can easily create new ones. Some examples of filters are as follows:

Lowercase filter: This makes all the tokens lowercase
ASCII folding filter: This removes non-ASCII parts from tokens
Synonyms filter: This is responsible for changing one token to another on the basis of synonym rules
Multiple language stemming filters: These are responsible for reducing tokens (actually, the text part that they provide) into their root or base forms, the stem

Filters are processed one after another, so we have almost unlimited analysis possibilities with adding multiple filters one after another.

The last thing is character filtering, which is used before the tokenizer and is responsible for processing text before any analysis is done. One example of a character filter is the HTML tags removal process.
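Putting the three pieces together, the following is a hedged sketch of defining a custom analyzer in Elasticsearch 5.x that strips HTML tags with a character filter, splits text with the standard tokenizer, and then applies the lowercase and ASCII folding filters; the index name my_index and the analyzer name my_html_analyzer are just examples:

curl -XPUT 'localhost:9200/my_index' -d '{
 "settings": {
  "analysis": {
   "analyzer": {
    "my_html_analyzer": {
     "type": "custom",
     "char_filter": ["html_strip"],
     "tokenizer": "standard",
     "filter": ["lowercase", "asciifolding"]
    }
   }
  }
 }
}'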

This analysis phase is also applied at query time. However, you can also choose the other path and not analyze your queries. This is crucial to remember, because some Elasticsearch queries are analyzed and some are not. For example, the prefix query is not analyzed, while the match query is analyzed.

What you should remember about indexing and querying analysis is that the terms produced at index time must be matched by the terms produced from the query. If they don't match, Lucene won't return the desired documents. For example, if you are using stemming and lowercasing during indexing, you need to be sure that the terms in the query are also lowercased and stemmed, or your queries will return no results at all.
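The _analyze API is a convenient way to verify what terms will actually end up in the index or be produced from a query string. A minimal sketch against a local 5.x node, using the built-in standard analyzer:

curl -XGET 'localhost:9200/_analyze?pretty' -d '{
 "analyzer": "standard",
 "text": "Mastering Elasticsearch"
}'

The response lists the produced tokens, so you can run it once with your index analyzer and once with your search analyzer and compare the output.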

Basics of the Lucene query language

Some of the query types provided by Elasticsearch support Apache Lucene query parser syntax. Because of this, it is crucial to understand the Lucene query language.

A query is divided by Apache Lucene into terms and operators. A term, in Lucene, can be a single word or a phrase (a group of words surrounded by double quote characters). If the query is set to be analyzed, the defined analyzer will be used on each of the terms that form the query.

A query can also contain Boolean operators that connect terms to each other forming clauses. The list of Boolean operators is as follows:

AND: This means that the given two terms (left and right operand) need to match in order for the clause to be matched. For example, we would run a query, such as apache AND lucene, to match documents with both the apache and lucene terms in a document field.
OR: This means that any of the given terms may match in order for the clause to be matched. For example, we would run a query, such as apache OR lucene, to match documents with the apache or lucene (or both) terms in a document field.
NOT: This means that in order for the document to be considered a match, the term appearing after the NOT operator must not match. For example, we would run a query, lucene NOT Elasticsearch, to match documents that contain the lucene term, but not the Elasticsearch term, in the document field.

In addition to these, we may use the following operators:

+: This means that the given term needs to be matched in order for the document to be considered as a match. For example, in order to find documents that match the lucene term and may match the apache term, we would run a query such as +lucene apache.
-: This means that the given term can't be matched in order for the document to be considered a match. For example, in order to find a document with the lucene term, but not the Elasticsearch term, we would run a query such as +lucene -Elasticsearch.

When not specifying any of the previous operators, the default OR operator will be used.

In addition to all these, there is one more thing: you can use parentheses to group clauses together; for example, with something like the following query:

Elasticsearch AND (mastering OR book)
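In Elasticsearch, the Lucene query parser syntax is exposed through the query_string query. As a hedged sketch, the grouped query above could be run against a hypothetical books index like this:

curl -XGET 'localhost:9200/books/_search?pretty' -d '{
 "query": {
  "query_string": {
   "default_field": "title",
   "query": "Elasticsearch AND (mastering OR book)"
  }
 }
}'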

Querying fields

Of course, just like in Elasticsearch, in Lucene all your data is stored in fields that build the document. In order to run a query against a field, you need to provide the field name, add the colon character, and provide the clause that should be run against that field. For example, if you would like to match documents with the term Elasticsearch in the title field, you would run the following query:

title:Elasticsearch

You can also group multiple clauses. For example, if you would like your query to match all the documents having the Elasticsearch term and the mastering book phrase in the title field, you could run a query like the following code:

title:(+Elasticsearch +"mastering book")

The previous query can also be expressed in the following way:

+title:Elasticsearch +title:"mastering book"

Term modifiers

In addition to the standard field query with a simple term or clause, Lucene allows us to modify the terms we pass in the query with modifiers. The most common modifiers, which you will surely be familiar with, are wildcards. There are two wildcards supported by Lucene, ? and *. The first one will match any single character and the second one will match zero or more characters.

In addition to this, Lucene supports fuzzy and proximity searches with the use of the ~ character and an integer following it. When used with a single word term, it means that we want to search for terms that are similar to the one we've modified (the so-called fuzzy search). The integer after the ~ character specifies the maximum number of edits that can be done to consider the terms similar. For example, if we ran a query such as writer~2, both the terms writer and writers would be considered a match.

When the ~ character is used on a phrase, the integer number we provide is telling Lucene how much distance between the words is acceptable. For example, let's take the following query:

title:"mastering Elasticsearch"

It would match the document with the title field containing mastering Elasticsearch, but not mastering book Elasticsearch. However, if we ran a query, such as title:"mastering Elasticsearch"~2, it would result in both example documents being matched.
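Both the fuzzy and proximity modifiers pass straight through the query_string query as well. A hedged sketch combining the two examples above (the books index and the author field are hypothetical):

curl -XGET 'localhost:9200/books/_search?pretty' -d '{
 "query": {
  "query_string": {
   "query": "author:writer~2 AND title:\"mastering Elasticsearch\"~2"
  }
 }
}'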

We can also use boosting to increase our term importance by using the ^ character and providing a float number. Boosts lower than 1 would result in decreasing the document importance. Boosts higher than 1 would result in increasing the importance. The default boost value is 1. Please refer to The changed default text scoring in Lucene - BM25 section in Chapter 2, The Improved Query DSL, for further information on what boosting is and how it is taken into consideration during document scoring.

In addition to all these, we can use square and curly brackets to allow range searching. For example, if we would like to run a range search on a numeric field, we could run the following query:

price:[10.00 TO 15.00]

The preceding query would match all the documents with the price field between 10.00 and 15.00, inclusive.

In the case of string-based fields, we can also run a range query; for example, name:[Adam TO Adria].

The preceding query would match all the documents containing terms between Adam and Adria in the name field, including the bounds.

If you would like your range bound or bounds to be exclusive, use curly brackets instead of the square ones. For example, in order to find documents with the price field between 10.00 inclusive and 15.00 exclusive, we would run the following query:

price:[10.00 TO 15.00}

If you would like your range to be bounded on one side only, for example, querying for documents with a price of 10.00 or higher, we would run the following query:

price:[10.00 TO *]
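Range expressions also work unchanged through the query_string query; the structured alternative in the Query DSL is the range query. A hedged sketch of the open-ended example above, against a hypothetical products index:

curl -XGET 'localhost:9200/products/_search?pretty' -d '{
 "query": {
  "query_string": {
   "query": "price:[10.00 TO *]"
  }
 }
}'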

Handling special characters

In case you want to search for one of the special characters (which are +, -, &&, ||, !, (, ), { }, [ ], ^, ", ~, *, ?, :, \, /), you need to escape it with the use of the backslash (\) character. For example, to search for the abc"efg term, you need to write it as abc\"efg.

An overview of Elasticsearch

Although we've said that we expect the reader to be familiar with Elasticsearch, we would really like to give you a short introduction to the concepts of this great search engine.

As you probably know, Elasticsearch is a distributed full text search and analytics engine that is built on top of Lucene for building search and analysis-oriented applications. It was originally started by Shay Banon and published in February 2010. Since then, it has rapidly gained popularity and has become an important alternative to other open source and commercial solutions. It is one of the most downloaded open source projects.

The key concepts

There are a few concepts that come with Elasticsearch, and their understanding is crucial to fully understand how Elasticsearch works and operates:

Index: A logical namespace under which Elasticsearch stores data; it may be built of more than one Lucene index using shards and replicas.
Document: A document is a JSON object that contains the actual data in key-value pairs. It is very important to understand that when a field is indexed for the first time into the index, Elasticsearch creates a data type for that field. Starting from version 2.x, very strict type checking is done.
Type: A document type in Elasticsearch represents a class of similar documents. A type consists of a name, such as user or blog post, and a mapping, including the data types and the Lucene configuration for each field.
Mapping: As already mentioned in the An overview of Lucene section, all documents are analyzed before being indexed. We can configure how the input text is divided into tokens, which tokens should be filtered out, or what additional processing, such as removing HTML tags, is needed. This is where mapping comes into play: it holds all the information about the analysis chain. Besides the fact that Elasticsearch can automatically discover a field type by looking at its value, in most cases we will want to configure the mappings ourselves to avoid unpleasant surprises.
Node: A single instance of Elasticsearch running on a machine. Elasticsearch nodes can serve different purposes. Of course, Elasticsearch is designed to index and search our data, so the first type of node is the data node. Such nodes hold the data and execute searches on it. The second type is the master node: a node that works as a supervisor of the cluster, controlling the other nodes' work. The third type is the client node, which is used as a query router. The fourth type is the tribe node, which was introduced in Elasticsearch 1.0; it can join multiple clusters and thus act as a bridge between them, allowing us to execute almost all Elasticsearch functionalities on multiple clusters just as we would using a single cluster. Elasticsearch 5.0 has also introduced a new type of node called the ingest node, which can be used for data transformation before the data gets indexed.
Cluster: A cluster is a single name under which one or more nodes/instances of Elasticsearch are connected to each other.
Shard: Shards are containers that can be stored on a single node or on multiple nodes, and are composed of Lucene segments. An index is divided into one or more shards to make the data distributable. The number of shards of an index cannot be increased or decreased once the index has been created.

Note

A shard can be either primary or secondary. A primary shard is the one to which all operations that change the index are directed. A secondary shard contains a duplicate of the primary shard's data; it helps in searching data quickly as well as in high availability, because if the machine holding the primary shard goes down, the secondary shard automatically becomes the primary shard.

Replica: A duplicate copy of the data living in a shard, kept for high availability. Having replicas also provides a faster search experience.
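To make these concepts concrete, here is a minimal sketch (the index, type, and field names are hypothetical examples of our own) of creating an index with explicit shard, replica, and mapping settings through the Elasticsearch 5.x REST API:

    # Create the blog index with 3 primary shards, 1 replica per shard,
    # and an explicit mapping for documents of type post
    curl -XPUT 'localhost:9200/blog' -d '
    {
      "settings": {
        "number_of_shards": 3,
        "number_of_replicas": 1
      },
      "mappings": {
        "post": {
          "properties": {
            "title":        { "type": "text" },
            "published_on": { "type": "date" }
          }
        }
      }
    }'

Here, blog is the index, post is the type, and the properties section is the mapping. The three primary shards are fixed for the lifetime of the index, whereas the number of replicas can be changed at any time.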

Working of Elasticsearch

Elasticsearch uses the zen discovery module for cluster formation. In 1.x, multicast was the default discovery type in Elasticsearch, but in 2.x unicast became the default, with multicast still available as a plugin. Multicast support has been completely removed from Elasticsearch 5.0.

When an Elasticsearch node starts, it performs discovery and searches for the list of unicast hosts (master-eligible nodes), which are configured in the elasticsearch.yml configuration file using the discovery.zen.ping.unicast.hosts parameter. The default list of unicast hosts is ["127.0.0.1", "[::1]"], which means that, out of the box, a node only looks for peers running on the same machine and otherwise forms a cluster with itself alone. We will have a detailed section on zen discovery and node configurations in Chapter 8, Elasticsearch Administration.
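As a brief sketch (the cluster and host names here are hypothetical), a multi-node cluster would typically override this default in elasticsearch.yml on every node so that the nodes can find their master-eligible peers:

    # elasticsearch.yml - zen discovery settings (Elasticsearch 5.x)
    cluster.name: my-production-cluster
    discovery.zen.ping.unicast.hosts: ["es-master-1", "es-master-2", "es-master-3"]
    # With three master-eligible nodes, require a majority of two to avoid split brain
    discovery.zen.minimum_master_nodes: 2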

Introducing Elasticsearch 5.x

In 2015, the Elasticsearch company, having brought Kibana, Logstash, Beats, and Found under one roof, re-branded itself as Elastic. According to Shay Banon, the name change was part of an initiative to better align the company with the broad range of solutions it provides: future products, and new innovations created by Elastic's massive community of developers and enterprises, who use the ELK stack for everything from real-time search, to sophisticated analytics, to building modern data applications.

However, having several products under one roof led to mismatched release cycles and started creating confusion for users. As a result, the ELK stack was renamed the Elastic Stack, and the company decided to release all of its components together, sharing a single version number across all products. This keeps deployments in step, simplifies compatibility testing, and makes it even easier for developers to add new functionality across the stack.

The very first GA release under the Elastic Stack is 5.0.0, which is covered throughout this book. Furthermore, Elasticsearch keeps pace with Lucene releases in order to incorporate bug fixes and the latest features. Elasticsearch 5.0 is based on Lucene 6, a major Lucene release with some great new features and a focus on improving search speed. We will discuss Lucene 6 in upcoming chapters to show how Elasticsearch benefits from it, both from the search and the storage points of view.

Introducing new features in Elasticsearch

Elasticsearch 5.x contains many improvements and has gone through a major refactoring, which caused the removal or deprecation of some features. We will keep discussing the removed, improved, and new features in upcoming chapters, but for now let's take an overview of what is new and improved in Elasticsearch.

New features in Elasticsearch 5.x

Following are some of the most important features introduced in Elasticsearch version 5.0:

Ingest node: A new type of node in Elasticsearch that can be used for simple data transformation and enrichment before the actual indexing takes place. The best part is that any node can be configured to act as an ingest node, and the functionality is very lightweight. You can avoid Logstash for such tasks, because the ingest node is a Java-based implementation of Logstash-style filters and ships with Elasticsearch itself.
Index shrinking: By design, once an index is created there is no way to reduce its number of shards, which creates challenges, since each shard consumes resources. Although this design remains the same, to make life easier for users Elasticsearch has introduced the new _shrink API to work around the problem. This API allows you to shrink an existing index into a new index with a smaller number of shards (see the sketch after this feature list).

Note

We will cover the ingest node and shrink API in detail under Chapter 9, Data Transformation and Federated Search.

Painless scripting language: Scripting in Elasticsearch has always been a matter of concern, both because of its slowness and for security reasons. Elasticsearch 5.0 includes a new scripting language called Painless, which has been designed to be fast and secure. Painless is still going through many improvements to make it better and more easily adoptable. We will cover it in Chapter 3, Beyond Full Text Search.
Instant aggregations: Queries have been completely refactored in 5.0; they are now parsed on the coordinating node and serialized to the other nodes in a binary format. This allows Elasticsearch to be much more efficient, with more cacheable queries, especially on data separated into time-based indices, and results in a significant speed-up for aggregations.
A new completion suggester: The completion suggester has undergone a complete rewrite. This means that the syntax and data structure for fields of type completion have changed, as have the syntax and response of completion suggester requests. The completion suggester is now built on top of the first iteration of Lucene's new suggest API.
Multi-dimensional points: One of the most exciting features of Lucene 6, which powers Elasticsearch 5.0. It is built on the k-d tree data structure and offers fast single- and multi-dimensional numeric range filtering as well as geospatial point-in-shape filtering. Multi-dimensional points reduce disk storage and memory utilization, and make searches faster.
Delete by Query API: After much demand from the community, Elasticsearch finally provides the ability to delete documents based on a matching query, using the _delete_by_query REST endpoint (see the example after this list).
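The following sketch (the index and node names are hypothetical) shows what the shrink and delete-by-query endpoints mentioned above look like in practice. A shrink requires the source index to be made read-only and all of its shards to be collected on a single node first:

    # Make the source index read-only and move all its shards to one node
    curl -XPUT 'localhost:9200/logs-2017.01/_settings' -d '
    {
      "index.blocks.write": true,
      "index.routing.allocation.require._name": "shrink-node-1"
    }'

    # Shrink the index into a new single-shard index
    curl -XPOST 'localhost:9200/logs-2017.01/_shrink/logs-2017.01-small' -d '
    {
      "settings": { "index.number_of_shards": 1 }
    }'

    # Delete every document whose level field matches debug
    curl -XPOST 'localhost:9200/logs-2017.01/_delete_by_query' -d '
    {
      "query": { "match": { "level": "debug" } }
    }'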

New features in Elasticsearch 2.x

Apart from the features just discussed, you can also benefit from all of the new features that came with Elasticsearch version 2.x. For those who have not had a look at the 2.x series, let's have a quick recap of the new features that Elasticsearch introduced in this series:

Reindex API: In Elasticsearch, re-indexing of documents is needed by almost every user in one scenario or another. The _reindex