Elasticsearch Essentials - Bharvi Dixit - E-Book

Bharvi Dixit

Description

Harness the power of Elasticsearch to build and manage scalable search and analytics solutions with this fast-paced guide

About This Book

  • New to Elasticsearch? Here's what you need: a highly practical guide that gives you a quick start with Elasticsearch using easy-to-follow examples, so you can get up and running with the Elasticsearch APIs in no time
  • Get the latest guide on Elasticsearch 2.0.0, with concise information on the issues a developer needs to understand when handling data in bulk and working with search relevancy
  • Learn to create large-scale Elasticsearch clusters using best practices
  • Learn from our experts: the book is written by Bharvi Dixit, who has extensive experience in working with search servers (especially Elasticsearch)

Who This Book Is For

Anyone who wants to build efficient search and analytics applications can choose this book. This book is also beneficial for skilled developers, especially ones experienced with Lucene or Solr, who now want to learn Elasticsearch quickly.

What You Will Learn

  • Get to know about advanced Elasticsearch concepts and its REST APIs
  • Write CRUD operations and other search functionalities using the Elasticsearch Python and Java clients
  • Dig into a wide range of queries and find out how to use them correctly
  • Design schema and mappings with built-in and custom analyzers
  • Excel in data modeling concepts and query optimization
  • Master document relationships and geospatial data
  • Build analytics using aggregations
  • Set up and scale Elasticsearch clusters using best practices
  • Learn to take data backups and secure Elasticsearch clusters

In Detail

With constantly evolving and growing datasets, organizations need to find actionable insights for their business. Elasticsearch, which is the world's most advanced search and analytics engine, brings the ability to make massive amounts of data usable in a matter of milliseconds. It not only gives you the power to build blazingly fast search solutions over a massive amount of data, but can also serve as a NoSQL data store.

This guide will take you on a tour to become a competent developer quickly, with a solid knowledge and understanding of the Elasticsearch core concepts. Starting from the beginning, this book will cover these core concepts, setting up Elasticsearch and various plugins, working with analyzers, and creating mappings. This book provides complete coverage of working with Elasticsearch using Python and performing CRUD operations and aggregation-based analytics, handling document relationships in the NoSQL world, working with geospatial data, and taking data backups. Finally, we'll show you how to set up and scale Elasticsearch clusters in production environments, and provide some best practices.

Style and approach

This is an easy-to-follow guide with practical examples and clear explanations of the concepts. This fast-paced book provides rich content that focuses mainly on practical implementation. It offers step-by-step practical examples and points out common errors and their solutions, along with ample screenshots and code to ensure your success.

Page count: 247

Year of publication: 2016



Table of Contents

Elasticsearch Essentials
Credits
About the Author
Acknowledgments
About the Reviewer
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Getting Started with Elasticsearch
Introducing Elasticsearch
The primary features of Elasticsearch
Understanding REST and JSON
What is REST?
What is JSON?
Elasticsearch common terms
Understanding Elasticsearch structure with respect to relational databases
Installing and configuring Elasticsearch
Installing Elasticsearch on Ubuntu through Debian package
Installing Elasticsearch on CentOS through the RPM package
Understanding the Elasticsearch installation directory layout
Configuring basic parameters
Adding another node to the cluster
Installing Elasticsearch plugins
Checking for installed plugins
Installing the Head plugin for Elasticsearch
Installing Sense for Elasticsearch
Basic operations with Elasticsearch
Creating an Index
Indexing a document in Elasticsearch
Fetching documents
Get a complete document
Getting part of a document
Updating documents
Updating a whole document
Updating documents partially
Deleting documents
Checking documents' existence
Summary
2. Understanding Document Analysis and Creating Mappings
Text search
TF-IDF
Inverted indexes
Document analysis
Introducing Lucene analyzers
Creating custom analyzers
Changing a default analyzer
Putting custom analyzers into action
Elasticsearch mapping
Document metadata fields
Data types and index analysis options
Configuring data types
String
Number
Date
Boolean
Arrays
Objects
Indexing the same field in different ways
Putting mappings in an index
Viewing mappings
Updating mappings
Summary
3. Putting Elasticsearch into Action
CRUD operations using elasticsearch-py
Setting up the environment
Installing Pip
Installing virtualenv
Installing elasticsearch-py
Performing CRUD operations
Request timeouts
Creating indexes with settings and mappings
Indexing documents
Retrieving documents
Updating documents
Replacing the value of a field completely
Appending a value in an array
Updates using doc
Checking document existence
Deleting a document
CRUD operations using Java
Connecting with Elasticsearch
Indexing a document
Fetching a document
Updating a document
Updating a document using doc
Updating a document using script
Deleting documents
Creating a search database
Elasticsearch Query-DSL
Understanding Query-DSL parameters
Query types
Full-text search queries
match_all
match query
Phrase search
multi match
query_string
Term-based search queries
Term query
Terms query
Range queries
Exists queries
Missing queries
Compound queries
Bool queries
Not queries
Search requests using Python
Search requests using Java
Parsing search responses
Sorting your data
Sorting documents by field values
Sorting on more than one field
Sorting multivalued fields
Sorting on string fields
Document routing
Summary
4. Aggregations for Analytics
Introducing the aggregation framework
Aggregation syntax
Extracting values
Returning only aggregation results
Metric aggregations
Computing basic stats
Combined stats
Computing stats separately
Computing extended stats
Finding distinct counts
Bucket aggregations
Terms aggregation
Range aggregation
Date range aggregation
Histogram aggregation
Date histogram aggregation
Filter-based aggregation
Combining search, buckets, and metrics
Memory pressure and implications
Summary
5. Data Looks Better on Maps: Master Geo-Spatiality
Introducing geo-spatial data
Working with geo-point data
Mapping geo-point fields
Indexing geo-point data
Querying geo-point data
Geo distance query
Geo distance range query
Geo bounding box query
Understanding bounding boxes
Sorting by distance
Geo-aggregations
Geo distance aggregation
Using bounding boxes with geo distance aggregation
Geo-shapes
Point
Linestring
Circles
Polygons
Envelopes
Mapping geo-shape fields
Indexing geo-shape data
Querying geo-shape data
Summary
6. Document Relationships in NoSQL World
Relational data in the document-oriented NoSQL world
Managing relational data in Elasticsearch
Working with nested objects
Creating nested mappings
Indexing nested data
Querying nested type data
Nested aggregations
Nested aggregation
Understanding nested aggregation syntax
Reverse nested aggregation
Parent-child relationships
Creating parent-child mappings
Indexing parent-child documents
Querying parent-child documents
has_child query
has_parent query
Considerations for using document relationships
Summary
7. Different Methods of Search and Bulk Operations
Introducing search types in Elasticsearch
Cheaper bulk operations
Bulk create
Bulk indexing
Bulk updating
Bulk deleting
Multi get and multi search APIs
Multi get
Multi searches
Data pagination
Pagination with scoring
Pagination without scoring
Scrolling and re-indexing documents using scan-scroll
Practical considerations for bulk processing
Summary
8. Controlling Relevancy
Introducing relevant searches
The Elasticsearch out-of-the-box tools
An example: why defaults are not enough
Controlling relevancy with custom scoring
The function_score query
weight
field_value_factor
script_score
Decay functions - linear, exp, and gauss
Summary
9. Cluster Scaling in Production Deployments
Node types in Elasticsearch
Client node
Data node
Master node
Introducing Zen-Discovery
Multicasting discovery
Unicasting discovery
Configuring unicasting discovery
Minimum number of master nodes: preventing split-brain
An initial list of hosts to ping
Ping timeout
Node upgrades without downtime
Upgrading Elasticsearch version
Best Elasticsearch practices in production
Creating a cluster
Scaling your clusters
When to scale
Metrics to watch
CPU utilization
Memory utilization
Disk I/O utilization
Disk low watermark
How to scale
Summary
10. Backups and Security
Introducing backup and restore mechanisms
Backup using snapshot API
Creating an NFS drive
Configuring the NFS host server
Configuring client machines
Creating a snapshot
Registering the repository path
Registering the shared file system repository in Elasticsearch
Create your first snapshot
Getting snapshot information
Deleting snapshots
Restoring snapshots
Restoring multiple indices
Renaming indices
Partial restore
Changing index settings during restore
Restoring to a different cluster
Manual backups
Manual restoration
Securing Elasticsearch
Setting up basic HTTP authentication
Setting up Nginx
Securing critical access
Restricting DELETE requests
Restricting endpoints
Load balancing using Nginx
Summary
Index

Elasticsearch Essentials

Copyright © 2016 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: January 2016

Production reference: 1250116

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78439-101-0

www.packtpub.com

Credits

Author

Bharvi Dixit

Reviewer

Alberto Paro

Commissioning Editor

Pramila Balan

Acquisition Editor

Sonali Vernekar

Content Development Editor

Kirti Patil

Technical Editor

Ryan Kochery

Copy Editor

Kausambhi Majumdar

Project Coordinator

Nidhi Joshi

Proofreader

Safis Editing

Indexer

Tejal Daruwale Soni

Graphics

Abhinash Sahu

Production Coordinator

Manu Joseph

Cover Work

Manu Joseph

About the Author

Bharvi Dixit is an IT professional with extensive experience of working on search servers (especially Elasticsearch) and NoSQL databases. He is currently working as a technology and search expert with GrownOut, a SaaS-based referral hiring solution provider. He is the organizer of, and a speaker at, Delhi's Elasticsearch Meetup Group, which is one of the fastest growing Elasticsearch communities in India.

He also works as a freelance Elasticsearch consultant and has helped many small to medium-sized organizations adopt Elasticsearch for different use cases, such as creating search solutions for big data-automated intelligence platforms in the area of counter-terrorism and risk management, as well as in other domains such as recruitment, e-commerce, finance, and log monitoring.

He holds a master's degree in computer science from LBSIM - Delhi, India, and has a keen interest in creating scalable backend platforms. His other areas of interest are data analytics, distributed computing, automation, and DevOps. Java and Python are the primary languages in which he loves to write code, and he has already built proprietary software for consultancy firms.

In his spare time, he loves writing blogs and reading the latest technology books. He can be connected through LinkedIn at https://in.linkedin.com/in/bharvidixit.

Acknowledgments

I would like to thank my family for their continuous support, especially my brother, Patanjali Dixit, who always guided me at each step throughout my career. I would also like to give a big thanks to Lavleen for the support, patience, and encouragement she gave during all those days when I was busy writing this book.

I would like to extend my thanks to all of the Packt team working on this book and our technical reviewer, Alberto Paro. Without them, the book wouldn't have been as great as it is now. It was one of the best teams I have worked with.

Finally, special thanks to Shay Banon for creating Elasticsearch and to all the people who contributed to the libraries and modules published around this project.

Once again, thank you.

About the Reviewer

Alberto Paro is an engineer, project manager, and software developer. He currently works as a CTO at Big Data Technologies and as a freelance international consultant on software engineering for big data and NoSQL solutions. He loves to study emerging solutions and applications mainly related to Big Data processing, NoSQL, natural language processing, and neural networks. He began programming in BASIC on a Sinclair Spectrum when he was eight years old, and he has a lot of experience of using different operating systems, applications, and programming languages.

In 2000, he graduated in computer science engineering from Politecnico di Milano with a thesis on designing multiuser and multidevice web applications. He assisted the professors at the university for about a year. Then, he came in contact with The Net Planet Company and loved their innovative ideas; he started working on knowledge management solutions and advanced data mining products. In the summer of 2014, his company was acquired by Big Data technologies, where he currently works and uses mainly Scala and Python on state-of-the-art Big Data software (Spark, Akka, Cassandra, and YARN). In 2013, he started freelancing as a consultant for Big Data technologies, machine learning, and Elasticsearch.

In his spare time, when he is not playing with his children, he likes to work on open source projects. When he was in high school, he started contributing to projects related to the GNOME environment (gtkmm). One of his preferred programming languages is Python, and he wrote one of the first NoSQL backends on Django for MongoDB (Django-MongoDB-engine). He is also a fan of the Scala language and enjoys spreading his love of technology: he was a presenter of Big Data concepts at Scala Day Italy 2015 on Scala.JS and Big Data Tech Italian Conference in Florence.

In 2010, he began using Elasticsearch to provide search capabilities to some Django e-commerce sites and developed PyES (a Pythonic client for Elasticsearch), as well as the initial part of the Elasticsearch MongoDB driver. He is the author of ElasticSearch Cookbook and ElasticSearch Cookbook Second Edition as well as a technical reviewer of Elasticsearch Server, Second Edition, and the video course, Building a Search Server with ElasticSearch, all of which have been published by Packt Publishing.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

  • Fully searchable across every book published by Packt
  • Copy and paste, print, and bookmark content
  • On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Preface

With constantly evolving and growing datasets, organizations have the need to find actionable insights for their business. Elasticsearch, which is the world's most advanced search and analytics engine, brings the ability to make massive amounts of data usable in a matter of milliseconds. It not only gives you the power to build blazingly fast search solutions over a massive amount of data, but can also serve as a NoSQL data store.

Elasticsearch Essentials will guide you to become a competent developer quickly with a solid knowledge and understanding of the Elasticsearch core concepts. In the beginning, this book will cover the fundamental concepts required to start working with Elasticsearch and then it will take you through more advanced concepts of search techniques and data analytics.

This book provides complete coverage of working with Elasticsearch using Python and Java APIs to perform CRUD operations, aggregation-based analytics, handling document relationships, working with geospatial data, and controlling search relevancy.

In the end, you will not only learn about scaling Elasticsearch clusters in production, but also how to secure Elasticsearch clusters and take data backups using best practices.

What this book covers

Chapter 1, Getting Started with Elasticsearch, provides an introduction to Elasticsearch and how it works. After going through the basic concepts and terminologies, you will learn how to install and configure Elasticsearch and perform basic operations with Elasticsearch.

Chapter 2, Understanding Document Analysis and Creating Mappings, covers the details of the built-in analyzers, tokenizers, and filters provided by Lucene. It also covers how to create custom analyzers and mapping with different data types.

Chapter 3, Putting Elasticsearch into Action, introduces Elasticsearch Query-DSL, various queries, and the data sorting techniques. You will also learn how to perform CRUD operations with Elasticsearch using Elasticsearch Python and Java clients.

Chapter 4, Aggregations for Analytics, is all about the Elasticsearch aggregation framework for building analytics on data. It provides many fundamental as well as complex examples of data analytics that can be built using a combination of full-text search, term-based search, and multilevel aggregations. You will master the aggregation module of Elasticsearch by working through a complete set of practical code examples that are covered using the Python and Java clients.

Chapter 5, Data Looks Better on Maps: Master Geo-Spatiality, discusses geo-data concepts and covers the rich geo-search functionalities offered by Elasticsearch including how to create mappings for geo-points and geo-shapes data, indexing documents, geo-aggregations, and sorting data based on geo-distance. It includes code examples for the most widely used geo-queries in both Python and Java.

Chapter 6, Document Relationships in NoSQL World, focuses on the techniques offered by Elasticsearch to handle relational data using nested and parent-child relationships and creating a schema for the same using real-world examples. The reader will also learn how to create mappings based on relational data and write code for indexing and querying data using Python and Java APIs.

Chapter 7, Different Methods of Search and Bulk Operations, covers the different types of search and bulk APIs that every programmer needs to know while developing applications and working with large data sets. You will learn examples of bulk processing, multi-searches, and faster data reindexing using both Python and Java, which will help you throughout your journey with Elasticsearch.

Chapter 8, Controlling Relevancy, discusses the most important aspect of search engines—relevancy. It covers the powerful scoring capabilities available in Elasticsearch and practical examples that show how you can control the scoring process according to your needs.

Chapter 9, Cluster Scaling in Production Deployments, shows how to create Elasticsearch clusters and configure different types of nodes with the right resource allocations. It also focuses on cluster scalability using best practices in production environments.

Chapter 10, Backups and Security, focuses on the different mechanisms for creating data backups of an Elasticsearch cluster and restoring them into the same or another cluster. A step-by-step guide to setting up NFS (Network File System) is also provided. Finally, you will learn about setting up Nginx to secure Elasticsearch and load balance requests.

What you need for this book

This book was written using Elasticsearch version 2.0.0, and all the examples and functions should work with it. Oracle Java 1.7 u55 or above is recommended for creating Elasticsearch clusters. You will also need a command-line tool that can send HTTP requests, such as curl, which is available for most operating systems. All the examples in this book are covered using Python and Java.

For the Java examples, you will need to have the Java JDK (Java Development Kit) installed and an editor in which you can develop your code (such as Eclipse). Apache Maven has been used to build the Java code.

To run the Python examples, you will need Python 2.7 or above, and you will also need to install Elasticsearch-Py, the official Python client for Elasticsearch.

Some chapters may also require additional software, such as Elasticsearch plugins; this is mentioned explicitly wherever such software is needed.

Who this book is for

Anyone who wants to build efficient search and analytics applications can choose this book. It is also beneficial for skilled developers, especially ones experienced with Lucene or Solr, who now want to learn Elasticsearch quickly. A basic knowledge of Python or Java and Linux is expected.

In addition to this, readers who want to see how to improve their query relevancy, and how to use Elasticsearch Java and Python API, may find this book interesting and useful.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "REST endpoints also enable users to make changes in clusters and indices settings dynamically rather than manually pushing configuration updates to all the nodes in a cluster by editing the elasticsearch.yml file and restarting the node."

A block of code is set as follows:

{
  "int_array": [1, 2, 3],
  "string_array": ["Lucene", "Elasticsearch", "NoSQL"],
  "boolean": true,
  "null": null,
  "number": 123,
  "object": {
    "a": "b",
    "c": "d",
    "e": "f"
  },
  "string": "Learning Elasticsearch"
}

Any command-line input or output is written as follows:

wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-2.0.0.deb
sudo dpkg -i elasticsearch-2.0.0.deb

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from http://www.packtpub.com/sites/default/files/downloads/B03461_ColorImages.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.

Chapter 1. Getting Started with Elasticsearch

Nowadays, search is one of the primary functionalities needed in every application; it can be fulfilled by Elasticsearch, which also has many other extra features. Elasticsearch, which is built on top of Apache Lucene, is an open source, distributable, and highly scalable search engine. It provides extremely fast searches and makes data discovery easy.

In this chapter, we will cover the following topics:

  • Concepts and terminologies related to Elasticsearch
  • REST API and the JSON data structure
  • Installing and configuring Elasticsearch
  • Installing the Elasticsearch plugins
  • Basic operations with Elasticsearch

Introducing Elasticsearch

Elasticsearch is a distributed, full-text search and analytics engine that is built on top of Lucene, a search engine library written in Java that is also the base for Solr. Since its first release in 2010, Elasticsearch has been widely adopted by large as well as small organizations, including NASA, Wikipedia, and GitHub, for different use cases. The latest releases of Elasticsearch focus more on resiliency, which gives users the confidence to use Elasticsearch as a data storage tool, apart from using it as a full-text search engine. Elasticsearch ships with sensible default configurations and settings, and also hides all the complexities from beginners, which lets everyone become productive very quickly by just learning the basics.

The primary features of Elasticsearch

Lucene is a blazing fast search library but it is tough to use directly and has very limited features to scale beyond a single machine. Elasticsearch comes to the rescue to overcome all the limitations of Lucene. Apart from providing a simple HTTP/JSON API, which enables language interoperability in comparison to Lucene's bare Java API, it has the following main features:

  • Distributed: Elasticsearch has been distributed in nature from day one, and is designed for scaling horizontally rather than vertically. You can start with a single-node Elasticsearch cluster on your laptop and scale that cluster to hundreds or thousands of nodes without worrying about the internal complexities that come with distributed computing, distributed document storage, and searches.
  • High availability: Data replication means having multiple copies of data in your cluster. This feature enables users to create highly available clusters by keeping more than one copy of data. You just need to issue a simple command, and Elasticsearch automatically creates redundant copies of the data to provide higher availability and avoid data loss in the case of machine failure.
  • REST-based: Elasticsearch is based on REST architecture and provides API endpoints not only to perform CRUD operations over HTTP API calls, but also to enable users to perform cluster monitoring tasks using REST APIs. REST endpoints also enable users to make changes to cluster and index settings dynamically, rather than manually pushing configuration updates to all the nodes in a cluster by editing the elasticsearch.yml file and restarting the node. This is possible because each resource (index, document, node, and so on) in Elasticsearch is accessible via a simple URI.
  • Powerful Query DSL: Query DSL (domain-specific language) is a JSON interface provided by Elasticsearch to expose the power of Lucene and to write and read queries in a very easy way. Thanks to the Query DSL, developers who are not aware of Lucene query syntax can also start writing complex queries in Elasticsearch.
  • Schemaless: Being schemaless means that you do not have to create a schema with field names and data types before indexing the data in Elasticsearch. Though it is one of the most misunderstood concepts, this is one of the biggest advantages we have seen in many organizations, especially in e-commerce sectors, where it is sometimes difficult to define the schema in advance. When you send your first document to Elasticsearch, it tries its best to parse every field in the document and creates a schema itself. If you later send another document with a different data type for the same field, it will reject that document. So, Elasticsearch is not completely schemaless, but its dynamic behavior of creating a schema is very useful.
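
The schemaless behavior described above can be pictured with a small toy model. The following Python sketch is only an illustration: the TinyIndex class and its type-by-Python-name inference are hypothetical, and real Elasticsearch uses its own field types and parsing rules (and may convert some values rather than reject them). It shows the general idea that the first document fixes the type of each field and that a later, conflicting document is refused.

```python
# Toy model of dynamic schema creation; NOT Elasticsearch's real implementation.
# TinyIndex and its inference rule are hypothetical, for illustration only.
class TinyIndex:
    def __init__(self):
        self.mapping = {}    # field name -> type name inferred on first sight
        self.documents = []  # documents accepted so far

    def index(self, doc):
        # Validate every field first, so a rejected document
        # leaves both the mapping and the stored documents unchanged.
        new_fields = {}
        for field, value in doc.items():
            seen = type(value).__name__
            expected = self.mapping.get(field)
            if expected is None:
                new_fields[field] = seen
            elif expected != seen:
                raise ValueError(
                    "field %r expects %s, got %s" % (field, expected, seen)
                )
        self.mapping.update(new_fields)
        self.documents.append(doc)


index = TinyIndex()
index.index({"name": "Elasticsearch Essentials", "pages": 247})

try:
    # "pages" was first seen as an integer, so a string value conflicts.
    index.index({"name": "Another book", "pages": "unknown"})
    rejected = False
except ValueError:
    rejected = True

print(index.mapping, rejected)
```

The point of the sketch is the order of events: no schema is declared up front, yet one exists as soon as the first document has been indexed.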

Note

There are many more features available in Elasticsearch, such as multitenancy and percolation, which will be discussed in detail in the next chapters.

Understanding REST and JSON

Elasticsearch is based on a REST design pattern and all the operations, for example, document insertion, deletion, updating, searching, and various monitoring and management tasks, can be performed using the REST endpoints provided by Elasticsearch.

What is REST?

In a REST-based web API, data and services are exposed as resources with URLs. All the requests are routed to a resource that is represented by a path. Each resource has a resource identifier, which is called a URI. All the potential actions on this resource can be performed using the simple request types provided by the HTTP protocol. The following examples describe how CRUD operations are done with a REST API:

To create the user, use the following:
POST /user
fname=Bharvi&lname=Dixit&age=28&id=123
The following command is used for retrieval:
GET /user/123
Use the following to update the user information:
PUT /user/123
fname=Lavleen
To delete the user, use this:
DELETE /user/123
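The four requests above can be sketched with Python's standard library. This is a minimal illustration that only builds the requests and does not send them; the BASE_URL value and the build_request helper are assumptions introduced for the example, and the /user resource is the generic one from the text, not an Elasticsearch endpoint.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical server address; nothing is actually contacted in this sketch.
BASE_URL = "http://localhost:8080"


def build_request(method, path, params=None):
    """Build (but do not send) an HTTP request for one CRUD action."""
    # Form-encode the parameters into the request body when there are any.
    body = urlencode(params).encode("ascii") if params else None
    return Request(BASE_URL + path, data=body, method=method)


create = build_request(
    "POST", "/user", {"fname": "Bharvi", "lname": "Dixit", "age": 28, "id": 123}
)
retrieve = build_request("GET", "/user/123")
update = build_request("PUT", "/user/123", {"fname": "Lavleen"})
delete = build_request("DELETE", "/user/123")

# Show the HTTP method, the resource URI, and the body of each request.
for req in (create, retrieve, update, delete):
    print(req.get_method(), req.full_url, req.data)
```

Each CRUD action maps to one HTTP method applied to a resource URI; only the create and update requests carry a body.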

Note

Many Elasticsearch users get confused between the POST and PUT request types. The difference is simple: POST is used to create a new resource, while PUT is used to update an existing resource. A PUT request can also be used to create a resource in some cases, but only when the complete URI of that resource, including its identifier, is available.
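The distinction in this note can be illustrated with a small in-memory model. The ToyResourceStore class below is purely hypothetical and is not an Elasticsearch API; it only assumes that a POST lets the server choose the identifier, while a PUT targets a complete URI supplied by the client.

```python
import itertools


class ToyResourceStore:
    """In-memory stand-in for a REST collection such as /user (hypothetical)."""

    def __init__(self):
        self._items = {}                   # complete URI -> stored resource
        self._next_id = itertools.count(1)

    def post(self, collection, data):
        """Create a resource under a collection; the server picks the identifier."""
        while True:
            uri = f"{collection}/{next(self._next_id)}"
            if uri not in self._items:     # skip identifiers already taken via PUT
                break
        self._items[uri] = dict(data)
        return uri

    def put(self, uri, data):
        """Create or replace the resource at a URI the client already knows."""
        created = uri not in self._items
        self._items[uri] = dict(data)
        return "created" if created else "updated"

    def get(self, uri):
        return self._items[uri]


store = ToyResourceStore()
print(store.post("/user", {"fname": "Bharvi"}))       # the server assigns the URI
print(store.put("/user/123", {"fname": "Bharvi"}))    # complete URI given: creates
print(store.put("/user/123", {"fname": "Lavleen"}))   # same URI again: updates
```

Sending the same PUT twice leaves a single resource at /user/123, whereas every POST to /user produces a new resource with a new identifier.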

What is JSON?

All the real-world data comes in object form. Every entity (object) has some properties. These properties can be in the form of simple key value pairs or they can be in the form of complex data structures. One property can have properties nested into it, and so on.

Elasticsearch is a document-oriented data store where objects, which are called documents, are stored and retrieved in the form of JSON. These documents are not only stored; their content is also indexed to make them searchable.

JavaScript Object Notation (JSON) is a lightweight data interchange format and, in the NoSQL world, it has become a standard data serialization format. The primary reasons for using it as a standard format are its language independence and its support for complex nested data structures. JSON supports the following data types:

Array, Boolean, Null, Number, Object, and String

The following is an example of a JSON object, which is self-explanatory about how these data types are stored in key value pairs:

{
  "int_array": [1, 2, 3],
  "string_array": ["Lucene", "Elasticsearch", "NoSQL"],
  "boolean": true,
  "null": null,
  "number": 123,
  "object": {
    "a": "b",
    "c": "d",
    "e": "f"
  },
  "string": "Learning Elasticsearch"
}
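As a quick check of how these data types behave in practice, the JSON object above can be parsed with Python's standard json module. This is a minimal sketch; the variable names are chosen only for illustration.

```python
import json

# The JSON object from the text, kept as a raw string.
raw = """
{
  "int_array": [1, 2, 3],
  "string_array": ["Lucene", "Elasticsearch", "NoSQL"],
  "boolean": true,
  "null": null,
  "number": 123,
  "object": {"a": "b", "c": "d", "e": "f"},
  "string": "Learning Elasticsearch"
}
"""

doc = json.loads(raw)  # parse the JSON text into Python objects

# Print the Python type that each JSON value was converted to.
for key, value in doc.items():
    print(f"{key}: {type(value).__name__}")

# Serializing and parsing again yields an equal structure.
round_trip = json.loads(json.dumps(doc))
print(round_trip == doc)
```

JSON arrays become Python lists, objects become dictionaries, and null becomes None, which is what makes the format easy to consume from any language.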

Elasticsearch common terms

The following are the most common terms that are very important to know when starting with Elasticsearch:

Node: A single instance of Elasticsearch running on a machine.
Cluster: A cluster is the single name under which one or more nodes/instances of Elasticsearch are connected to each other.
Document: A document is a JSON object that contains the actual data in key value pairs.
Index: A logical namespace under which Elasticsearch stores data, and may be built with more than one Lucene index using shards and replicas.
Doc types: A doc type in Elasticsearch represents a class of similar documents. A type consists of a name, such as a user or a blog post, and a mapping, including data types and the Lucene configurations for each field. (An index can contain more than one type.)
Shard: Shards are containers that can be stored on a single node or multiple nodes and are composed of Lucene segments. An index is divided into one or more shards to make the data distributable.

Note

A shard can be either primary or secondary. A primary shard is the one to which all the operations that change the index are directed. A secondary shard contains a duplicate of the primary shard's data; it helps in searching the data quickly and provides high availability. If the machine that holds the primary shard goes down, the secondary shard automatically becomes the primary.

Replica: A duplicate copy of the data living in a shard for high availability.
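To make the idea of dividing an index into shards more concrete, the following sketch distributes document IDs across a fixed number of primary shards by hashing each ID and taking the remainder. Elasticsearch routes documents in a broadly similar hash-and-modulo fashion, but the pick_shard function and the use of zlib.crc32 here are assumptions for illustration only; Elasticsearch uses its own hash function internally.

```python
import zlib
from collections import Counter


def pick_shard(doc_id, number_of_primary_shards):
    """Choose a shard for a document ID (illustrative stand-in hash)."""
    # crc32 is used only because it is deterministic across runs;
    # it is not the hash function Elasticsearch uses.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


# Spread 1,000 hypothetical document IDs over 5 primary shards.
shard_count = 5
placement = Counter(pick_shard(str(i), shard_count) for i in range(1000))

for shard in sorted(placement):
    print(f"shard {shard}: {placement[shard]} documents")
```

Because the shard is derived from the document ID alone, the same document always lands on the same shard, so a later lookup by ID knows where to go without searching every shard.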

Understanding Elasticsearch structure with respect to relational databases

Elasticsearch is a search