Elasticsearch Essentials E-Book

Bharvi Dixit

Publisher: Packt Publishing
Category: Science and New Technologies
Language: English

Description

Harness the power of Elasticsearch to build and manage scalable search and analytics solutions with this fast-paced guide

About This Book

New to Elasticsearch? This highly practical guide gives you a quick start with easy-to-follow examples and gets you up and running with the Elasticsearch APIs in no time
Get the latest guide on Elasticsearch 2.0.0, with concise coverage of what a developer needs to know about handling bulk data and search relevancy
Learn to create large-scale Elasticsearch clusters using best practices
Learn from our experts: written by Bharvi Dixit, who has extensive experience of working with search servers (especially Elasticsearch)

Who This Book Is For

Anyone who wants to build efficient search and analytics applications can choose this book. This book is also beneficial for skilled developers, especially ones experienced with Lucene or Solr, who now want to learn Elasticsearch quickly.

What You Will Learn

Get to know about advanced Elasticsearch concepts and its REST APIs
Write CRUD operations and other search functionalities using the Elasticsearch Python and Java clients
Dig into a wide range of queries and find out how to use them correctly
Design schema and mappings with built-in and custom analyzers
Excel in data modeling concepts and query optimization
Master document relationships and geospatial data
Build analytics using aggregations
Set up and scale Elasticsearch clusters using best practices
Learn to take data backups and secure Elasticsearch clusters

In Detail

With constantly evolving and growing datasets, organizations need to find actionable insights for their business. Elasticsearch, which is the world's most advanced search and analytics engine, brings the ability to make massive amounts of data usable in a matter of milliseconds. It not only gives you the power to build blazing fast search solutions over a massive amount of data, but can also serve as a NoSQL data store.

This guide will help you quickly become a competent developer with a solid understanding of the Elasticsearch core concepts. Starting from the beginning, this book will cover these core concepts, setting up Elasticsearch and various plugins, working with analyzers, and creating mappings. It provides complete coverage of working with Elasticsearch using Python, including performing CRUD operations and aggregation-based analytics, handling document relationships in the NoSQL world, working with geospatial data, and taking data backups. Finally, we'll show you how to set up and scale Elasticsearch clusters in production environments and share some best practices.

Style and approach

This is an easy-to-follow guide with practical examples and clear explanations of the concepts. This fast-paced book focuses on rich, practical content. It provides step-by-step practical examples, points out common errors and their solutions, and includes ample screenshots and code to ensure your success.

Details

Page count: 247

Publication year: 2016

Sample

Elasticsearch Essentials

Credits

About the Author

Acknowledgments

About the Reviewer

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Getting Started with Elasticsearch

Introducing Elasticsearch

The primary features of Elasticsearch

Understanding REST and JSON

What is REST?

What is JSON?

Elasticsearch common terms

Understanding Elasticsearch structure with respect to relational databases

Installing and configuring Elasticsearch

Installing Elasticsearch on Ubuntu through the Debian package

Installing Elasticsearch on CentOS through the RPM package

Understanding the Elasticsearch installation directory layout

Configuring basic parameters

Adding another node to the cluster

Installing Elasticsearch plugins

Checking for installed plugins

Installing the Head plugin for Elasticsearch

Installing Sense for Elasticsearch

Basic operations with Elasticsearch

Creating an Index

Indexing a document in Elasticsearch

Fetching documents

Get a complete document

Getting part of a document

Updating documents

Updating a whole document

Updating documents partially

Deleting documents

Checking documents' existence

Summary

2. Understanding Document Analysis and Creating Mappings

Text search

TF-IDF

Inverted indexes

Document analysis

Introducing Lucene analyzers

Creating custom analyzers

Changing a default analyzer

Putting custom analyzers into action

Elasticsearch mapping

Document metadata fields

Data types and index analysis options

Configuring data types

String

Number

Date

Boolean

Arrays

Objects

Indexing the same field in different ways

Putting mappings in an index

Viewing mappings

Updating mappings

Summary

3. Putting Elasticsearch into Action

CRUD operations using elasticsearch-py

Setting up the environment

Installing Pip

Installing virtualenv

Installing elasticsearch-py

Performing CRUD operations

Request timeouts

Creating indexes with settings and mappings

Indexing documents

Retrieving documents

Updating documents

Replacing the value of a field completely

Appending a value in an array

Updates using doc

Checking document existence

Deleting a document

CRUD operations using Java

Connecting with Elasticsearch

Indexing a document

Fetching a document

Updating a document

Updating a document using doc

Updating a document using script

Deleting documents

Creating a search database

Elasticsearch Query-DSL

Understanding Query-DSL parameters

Query types

Full-text search queries

match_all

match query

Phrase search

multi match

query_string

Term-based search queries

Term query

Terms query

Range queries

Exists queries

Missing queries

Compound queries

Bool queries

Not queries

Search requests using Python

Search requests using Java

Parsing search responses

Sorting your data

Sorting documents by field values

Sorting on more than one field

Sorting multivalued fields

Sorting on string fields

Document routing

Summary

4. Aggregations for Analytics

Introducing the aggregation framework

Aggregation syntax

Extracting values

Returning only aggregation results

Metric aggregations

Computing basic stats

Combined stats

Computing stats separately

Computing extended stats

Finding distinct counts

Bucket aggregations

Terms aggregation

Range aggregation

Date range aggregation

Histogram aggregation

Date histogram aggregation

Filter-based aggregation

Combining search, buckets, and metrics

Memory pressure and implications

Summary

5. Data Looks Better on Maps: Master Geo-Spatiality

Introducing geo-spatial data

Working with geo-point data

Mapping geo-point fields

Indexing geo-point data

Querying geo-point data

Geo distance query

Geo distance range query

Geo bounding box query

Understanding bounding boxes

Sorting by distance

Geo-aggregations

Geo distance aggregation

Using bounding boxes with geo distance aggregation

Geo-shapes

Point

Linestring

Circles

Polygons

Envelopes

Mapping geo-shape fields

Indexing geo-shape data

Querying geo-shape data

Summary

6. Document Relationships in NoSQL World

Relational data in the document-oriented NoSQL world

Managing relational data in Elasticsearch

Working with nested objects

Creating nested mappings

Indexing nested data

Querying nested type data

Nested aggregations

Nested aggregation

Understanding nested aggregation syntax

Reverse nested aggregation

Parent-child relationships

Creating parent-child mappings

Indexing parent-child documents

Querying parent-child documents

has_child query

has_parent query

Considerations for using document relationships

Summary

7. Different Methods of Search and Bulk Operations

Introducing search types in Elasticsearch

Cheaper bulk operations

Bulk create

Bulk indexing

Bulk updating

Bulk deleting

Multi get and multi search APIs

Multi get

Multi searches

Data pagination

Pagination with scoring

Pagination without scoring

Scrolling and re-indexing documents using scan-scroll

Practical considerations for bulk processing

Summary

8. Controlling Relevancy

Introducing relevant searches

The Elasticsearch out-of-the-box tools

An example: why defaults are not enough

Controlling relevancy with custom scoring

The function_score query

weight

field_value_factor

script_score

Decay functions - linear, exp, and gauss

Summary

9. Cluster Scaling in Production Deployments

Node types in Elasticsearch

Client node

Data node

Master node

Introducing Zen-Discovery

Multicasting discovery

Unicasting discovery

Configuring unicasting discovery

Minimum number of master nodes: preventing split-brain

An initial list of hosts to ping

Ping timeout

Node upgrades without downtime

Upgrading Elasticsearch version

Best Elasticsearch practices in production

Creating a cluster

Scaling your clusters

When to scale

Metrics to watch

CPU utilization

Memory utilization

Disk I/O utilization

Disk low watermark

How to scale

Summary

10. Backups and Security

Introducing backup and restore mechanisms

Backup using snapshot API

Creating an NFS drive

Configuring the NFS host server

Configuring client machines

Creating a snapshot

Registering the repository path

Registering the shared file system repository in Elasticsearch

Create your first snapshot

Getting snapshot information

Deleting snapshots

Restoring snapshots

Restoring multiple indices

Renaming indices

Partial restore

Changing index settings during restore

Restoring to a different cluster

Manual backups

Manual restoration

Securing Elasticsearch

Setting up basic HTTP authentication

Setting up Nginx

Securing critical access

Restricting DELETE requests

Restricting endpoints

Load balancing using Nginx

Summary

Index

Elasticsearch Essentials

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: January 2016

Production reference: 1250116

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78439-101-0

www.packtpub.com

Credits

Author

Bharvi Dixit

Reviewer

Alberto Paro

Commissioning Editor

Pramila Balan

Acquisition Editor

Sonali Vernekar

Content Development Editor

Kirti Patil

Technical Editor

Ryan Kochery

Copy Editor

Kausambhi Majumdar

Project Coordinator

Nidhi Joshi

Proofreader

Safis Editing

Indexer

Tejal Daruwale Soni

Graphics

Abhinash Sahu

Production Coordinator

Manu Joseph

Cover Work

Manu Joseph

About the Author

Bharvi Dixit is an IT professional with extensive experience of working on search servers (especially Elasticsearch) and NoSQL databases. He is currently working as a technology and search expert with GrownOut, a SaaS-based referral hiring solution provider. He is the organizer of, and a speaker at, Delhi's Elasticsearch Meetup Group, which is one of the fastest growing Elasticsearch communities in India.

He also works as a freelance Elasticsearch consultant and has helped many small to medium-sized organizations adopt Elasticsearch for different use cases, such as creating search solutions for big data-automated intelligence platforms in the area of counter-terrorism and risk management, as well as in other domains such as recruitment, e-commerce, finance, and log monitoring.

He holds a master's degree in computer science from LBSIM - Delhi, India, and has a keen interest in creating scalable backend platforms. His other areas of interest are data analytics, distributed computing, automation, and DevOps. Java and Python are the primary languages in which he loves to write code, and he has already built proprietary software for consultancy firms.

In his spare time, he loves writing blogs and reading the latest technology books. He can be reached through LinkedIn at https://in.linkedin.com/in/bharvidixit.

Acknowledgments

I would like to thank my family for their continuous support, especially my brother, Patanjali Dixit, who always guided me at each step throughout my career. I would also like to give a big thanks to Lavleen for the support, patience, and encouragement she gave during all those days when I was busy writing this book.

I would like to extend my thanks to all of the Packt team working on this book and our technical reviewer, Alberto Paro. Without them, the book wouldn't have been as great as it is now. It was one of the best teams I have worked with.

Finally, special thanks to Shay Banon for creating Elasticsearch and to all the people who contributed to the libraries and modules published around this project.

Once again, thank you.

About the Reviewer

Alberto Paro is an engineer, project manager, and software developer. He currently works as a CTO at Big Data Technologies and as a freelance international consultant on software engineering for big data and NoSQL solutions. He loves to study emerging solutions and applications mainly related to Big Data processing, NoSQL, natural language processing, and neural networks. He began programming in BASIC on a Sinclair Spectrum when he was eight years old, and he has a lot of experience of using different operating systems, applications, and programming languages.

In 2000, he graduated in computer science engineering from Politecnico di Milano with a thesis on designing multiuser and multidevice web applications. He assisted the professors at the university for about a year. Then, he came in contact with The Net Planet Company and loved their innovative ideas; he started working on knowledge management solutions and advanced data mining products. In the summer of 2014, his company was acquired by Big Data Technologies, where he currently works and uses mainly Scala and Python on state-of-the-art Big Data software (Spark, Akka, Cassandra, and YARN). In 2013, he started freelancing as a consultant for Big Data technologies, machine learning, and Elasticsearch.

In his spare time, when he is not playing with his children, he likes to work on open source projects. When he was in high school, he started contributing to projects related to the GNOME environment (gtkmm). One of his preferred programming languages is Python, and he wrote one of the first NoSQL backends on Django for MongoDB (Django-MongoDB-engine). He is also a fan of the Scala language and enjoys spreading his love of technology: he was a presenter of Big Data concepts at Scala Day Italy 2015 on Scala.JS and Big Data Tech Italian Conference in Florence.

In 2010, he began using Elasticsearch to provide search capabilities to some Django e-commerce sites and developed PyES (a Pythonic client for Elasticsearch), as well as the initial part of the Elasticsearch MongoDB driver. He is the author of ElasticSearch Cookbook and ElasticSearch Cookbook Second Edition as well as a technical reviewer of Elasticsearch Server, Second Edition, and the video course, Building a Search Server with ElasticSearch, all of which have been published by Packt Publishing.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Preface

With constantly evolving and growing datasets, organizations have the need to find actionable insights for their business. Elasticsearch, which is the world's most advanced search and analytics engine, brings the ability to make massive amounts of data usable in a matter of milliseconds. It not only gives you the power to build blazingly fast search solutions over a massive amount of data, but can also serve as a NoSQL data store.

Elasticsearch Essentials will guide you to become a competent developer quickly with a solid knowledge and understanding of the Elasticsearch core concepts. In the beginning, this book will cover the fundamental concepts required to start working with Elasticsearch and then it will take you through more advanced concepts of search techniques and data analytics.

This book provides complete coverage of working with Elasticsearch using Python and Java APIs to perform CRUD operations, aggregation-based analytics, handling document relationships, working with geospatial data, and controlling search relevancy.

In the end, you will not only learn about scaling Elasticsearch clusters in production, but also how to secure Elasticsearch clusters and take data backups using best practices.

What this book covers

Chapter 1, Getting Started with Elasticsearch, provides an introduction to Elasticsearch and how it works. After going through the basic concepts and terminologies, you will learn how to install and configure Elasticsearch and perform basic operations with Elasticsearch.

Chapter 2, Understanding Document Analysis and Creating Mappings, covers the details of the built-in analyzers, tokenizers, and filters provided by Lucene. It also covers how to create custom analyzers and mapping with different data types.

Chapter 3, Putting Elasticsearch into Action, introduces Elasticsearch Query-DSL, various queries, and the data sorting techniques. You will also learn how to perform CRUD operations with Elasticsearch using Elasticsearch Python and Java clients.

Chapter 4, Aggregations for Analytics, is all about the Elasticsearch aggregation framework for building analytics on data. It provides many fundamental as well as complex examples of data analytics that can be built using a combination of full-text search, term-based search, and multilevel aggregations. You will master the aggregation module of Elasticsearch by working through a complete set of practical code examples that are covered using Python and Java clients.

Chapter 5, Data Looks Better on Maps: Master Geo-Spatiality, discusses geo-data concepts and covers the rich geo-search functionalities offered by Elasticsearch including how to create mappings for geo-points and geo-shapes data, indexing documents, geo-aggregations, and sorting data based on geo-distance. It includes code examples for the most widely used geo-queries in both Python and Java.

Chapter 6, Document Relationships in NoSQL World, focuses on the techniques offered by Elasticsearch to handle relational data using nested and parent-child relationships and creating a schema for the same using real-world examples. The reader will also learn how to create mappings based on relational data and write code for indexing and querying data using Python and Java APIs.

Chapter 7, Different Methods of Search and Bulk Operations, covers the different types of search and bulk APIs that every programmer needs to know while developing applications and working with large data sets. You will learn examples of bulk processing, multi-searches, and faster data reindexing using both Python and Java, which will help you throughout your journey with Elasticsearch.

Chapter 8, Controlling Relevancy, discusses the most important aspect of search engines—relevancy. It covers the powerful scoring capabilities available in Elasticsearch and practical examples that show how you can control the scoring process according to your needs.

Chapter 9, Cluster Scaling in Production Deployments, shows how to create Elasticsearch clusters and configure different types of nodes with the right resource allocations. It also focuses on cluster scalability using the best practices in production environments.

Chapter 10, Backups and Security, focuses on the different mechanisms of creating data backups of an Elasticsearch cluster and restoring them into the same or another cluster. A step-by-step guide to setting up NFS (Network File System) is also provided. Finally, you will learn about setting up Nginx to secure Elasticsearch and load balance requests.

What you need for this book

This book was written using Elasticsearch version 2.0.0, and all the examples and functions should work with it. Using Oracle Java 1.7 u55 or above is recommended for creating Elasticsearch clusters. You'll also need a command-line tool that allows you to send HTTP requests, such as curl, which is available for most operating systems. All the examples in this book are covered using Python and Java.

For the Java examples, you will need to have the Java JDK (Java Development Kit) installed and an editor that will allow you to develop your code (such as Eclipse). Apache Maven has been used to build the Java code.

To run the Python examples, you will need Python 2.7 or above and will also need to install Elasticsearch-Py, the official Python client for Elasticsearch.

Some chapters may require additional software, such as Elasticsearch plugins; this is explicitly mentioned wherever such software is needed.

Who this book is for

Anyone who wants to build efficient search and analytics applications can choose this book. It is also beneficial for skilled developers, especially ones experienced with Lucene or Solr, who now want to learn Elasticsearch quickly. A basic knowledge of Python or Java and Linux is expected.

In addition, readers who want to see how to improve their query relevancy, and how to use the Elasticsearch Java and Python APIs, may find this book interesting and useful.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "REST endpoints also enable users to make changes in clusters and indices settings dynamically rather than manually pushing configuration updates to all the nodes in a cluster by editing the elasticsearch.yml file and restarting the node."

A block of code is set as follows:

{
  "int_array": [1, 2, 3],
  "string_array": ["Lucene", "Elasticsearch", "NoSQL"],
  "boolean": true,
  "null": null,
  "number": 123,
  "object": {
    "a": "b",
    "c": "d",
    "e": "f"
  },
  "string": "Learning Elasticsearch"
}

Any command-line input or output is written as follows:

wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-2.0.0.deb
sudo dpkg -i elasticsearch-2.0.0.deb

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from http://www.packtpub.com/sites/default/files/downloads/B03461_ColorImages.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.

Chapter 1. Getting Started with Elasticsearch

Nowadays, search is one of the primary functionalities needed in almost every application; Elasticsearch fulfills this need and offers many additional features. Elasticsearch, which is built on top of Apache Lucene, is an open source, distributed, and highly scalable search engine. It provides extremely fast searches and makes data discovery easy.

In this chapter, we will cover the following topics:

Concepts and terminologies related to Elasticsearch
REST API and the JSON data structure
Installing and configuring Elasticsearch
Installing the Elasticsearch plugins
Basic operations with Elasticsearch

Introducing Elasticsearch

Elasticsearch is a distributed, full-text search and analytics engine that is built on top of Lucene, a search engine library written in Java that also serves as the base for Solr. Since its first release in 2010, Elasticsearch has been widely adopted by large as well as small organizations, including NASA, Wikipedia, and GitHub, for different use cases. The latest releases of Elasticsearch focus more on resiliency, which gives users the confidence to use Elasticsearch as a data storage tool, apart from using it as a full-text search engine. Elasticsearch ships with sensible default configurations and settings, and also hides all the complexities from beginners, which lets everyone become productive very quickly by just learning the basics.

The primary features of Elasticsearch

Lucene is a blazing fast search library but it is tough to use directly and has very limited features to scale beyond a single machine. Elasticsearch comes to the rescue to overcome all the limitations of Lucene. Apart from providing a simple HTTP/JSON API, which enables language interoperability in comparison to Lucene's bare Java API, it has the following main features:

Distributed: Elasticsearch has been distributed in nature from day one, and is designed to scale horizontally rather than vertically. You can start with a single-node Elasticsearch cluster on your laptop and scale that cluster to hundreds or thousands of nodes without worrying about the internal complexities that come with distributed computing, distributed document storage, and searches.
High Availability: Data replication means having multiple copies of data in your cluster. This feature enables users to create highly available clusters by keeping more than one copy of data. You just need to issue a simple command, and Elasticsearch automatically creates redundant copies of the data to provide higher availability and avoid data loss in the case of machine failure.
REST-based: Elasticsearch is based on REST architecture and provides API endpoints to not only perform CRUD operations over HTTP API calls, but also to enable users to perform cluster monitoring tasks using REST APIs. REST endpoints also enable users to make changes to cluster and index settings dynamically, rather than manually pushing configuration updates to all the nodes in a cluster by editing the elasticsearch.yml file and restarting the node. This is possible because each resource (index, document, node, and so on) in Elasticsearch is accessible via a simple URI.
Powerful Query DSL: Query DSL (domain-specific language) is a JSON interface provided by Elasticsearch to expose the power of Lucene to write and read queries in a very easy way. Thanks to the Query DSL, developers who are not aware of Lucene query syntaxes can also start writing complex queries in Elasticsearch.
Schemaless: Being schemaless means that you do not have to create a schema with field names and data types before indexing the data in Elasticsearch. Though it is one of the most misunderstood concepts, this is one of the biggest advantages we have seen in many organizations, especially in e-commerce sectors where it's difficult to define the schema in advance in some cases. When you send your first document to Elasticsearch, it tries its best to parse every field in the document and creates a schema itself. If you later send another document with a different data type for the same field, it will reject the document. So, Elasticsearch is not completely schemaless but its dynamic behavior of creating a schema is very useful.
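The schema-on-first-document behavior described under Schemaless can be illustrated with a small toy model. This is only a sketch written for Python 3, not Elasticsearch's actual implementation: the ToyIndex class, the MappingConflict exception, and the simplified type names are all hypothetical, and the real engine supports many more types and may coerce some values instead of rejecting them.

```python
# Toy model of dynamic mapping (illustration only, not Elasticsearch code).


class MappingConflict(Exception):
    """Raised when a field value does not match the type recorded earlier."""


def infer_type(value):
    """Map a Python value to a simplified field type name."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError("unsupported value type: %r" % type(value))


class ToyIndex:
    """Hypothetical index that builds its schema from the documents it sees."""

    def __init__(self):
        self.mapping = {}    # field name -> inferred type
        self.documents = []

    def index(self, doc):
        # Validate the whole document first, so that a rejected document
        # leaves both the mapping and the stored documents untouched.
        new_fields = {}
        for field, value in doc.items():
            field_type = infer_type(value)
            known = self.mapping.get(field)
            if known is None:
                new_fields[field] = field_type
            elif known != field_type:
                raise MappingConflict(
                    "field %r is mapped as %s, got %s" % (field, known, field_type)
                )
        self.mapping.update(new_fields)
        self.documents.append(doc)


if __name__ == "__main__":
    idx = ToyIndex()
    idx.index({"name": "Bharvi", "age": 28})   # first document creates the schema
    print(idx.mapping)
    try:
        idx.index({"name": "Lavleen", "age": "unknown"})  # conflicting type for "age"
    except MappingConflict as err:
        print("rejected:", err)
```

The first document fixes the type of each field; the second is refused because "age" was already recorded as a number.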

Note

There are many more features available in Elasticsearch, such as multitenancy and percolation, which will be discussed in detail in the next chapters.

Understanding REST and JSON

Elasticsearch is based on a REST design pattern and all the operations, for example, document insertion, deletion, updating, searching, and various monitoring and management tasks, can be performed using the REST endpoints provided by Elasticsearch.

What is REST?

In a REST-based web API, data and services are exposed as resources with URLs. Every request is routed to a resource that is represented by a path. Each resource has a resource identifier, which is called a URI. All the potential actions on a resource can be performed using the simple request types provided by the HTTP protocol. The following examples describe how CRUD operations are done with a REST API:

To create the user, use the following:

POST /user
fname=Bharvi&lname=Dixit&age=28&id=123

The following command is used for retrieval:

GET /user/123

Use the following to update the user information:

PUT /user/123
fname=Lavleen

To delete the user, use this:

DELETE /user/123
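The four requests above can be sketched with Python's standard library. This is a minimal illustration, not a client implementation: the base URL http://localhost:8080 is a hypothetical server, and the requests are only built, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8080"  # hypothetical server; nothing is sent


def build_request(method, path, fields=None):
    """Build (but do not send) an HTTP request for a REST resource."""
    # Form-encode the fields into the request body when any are given.
    body = urlencode(fields).encode("utf-8") if fields else None
    return Request(BASE_URL + path, data=body, method=method)


create = build_request("POST", "/user",
                       {"fname": "Bharvi", "lname": "Dixit", "age": 28, "id": 123})
retrieve = build_request("GET", "/user/123")
update = build_request("PUT", "/user/123", {"fname": "Lavleen"})
delete = build_request("DELETE", "/user/123")

for req in (create, retrieve, update, delete):
    print(req.get_method(), req.full_url, req.data)
```

Each operation differs only in the HTTP method, the resource path, and the optional body. The resource itself is always identified by its URI.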

Note

Many Elasticsearch users confuse the POST and PUT request types. The difference is simple: POST is used to create a new resource, while PUT is used to update an existing resource. PUT can also be used to create a resource in some cases, but only when the complete URI of that resource is available.
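The distinction in the note can be sketched for document indexing, assuming the Elasticsearch 2.x URL layout of /index/type/id. When no ID is supplied, the document is POSTed and the server generates the ID. When the ID is known, the complete URI is available and PUT is used. The index name users, the type name user, and the node address are assumptions made for illustration, and the requests are only built, not sent.

```python
import json
from urllib.request import Request

ES_URL = "http://localhost:9200"  # assumed address of a local node; nothing is sent


def index_request(index, doc_type, document, doc_id=None):
    """Build (but do not send) a request that indexes a document."""
    path = "/{}/{}".format(index, doc_type)
    method = "POST"  # no ID: the server is expected to generate one
    if doc_id is not None:
        path += "/{}".format(doc_id)
        method = "PUT"  # complete URI known: create or replace at that URI
    body = json.dumps(document).encode("utf-8")
    return Request(ES_URL + path, data=body, method=method,
                   headers={"Content-Type": "application/json"})


doc = {"fname": "Bharvi", "lname": "Dixit"}
auto_id = index_request("users", "user", doc)             # hypothetical names
explicit_id = index_request("users", "user", doc, doc_id=123)
print(auto_id.get_method(), auto_id.full_url)
print(explicit_id.get_method(), explicit_id.full_url)
```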

What is JSON?

Real-world data comes in object form. Every entity (object) has some properties. These properties can be simple key-value pairs or complex data structures. One property can have other properties nested inside it, and so on.

Elasticsearch is a document-oriented data store where objects, which are called documents, are stored and retrieved in the form of JSON. These objects are not only stored; their content is also indexed to make them searchable.

JavaScript Object Notation (JSON) is a lightweight data interchange format and, in the NoSQL world, it has become a standard data serialization format. The primary reasons for using it as a standard format are its language independence and its support for complex nested data structures. JSON supports the following data types:

Array, Boolean, Null, Number, Object, and String

The following is an example of a JSON object that shows how these data types are stored as key-value pairs:
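As an illustration, a document covering all six types can be built and serialized with Python's standard json module. The name and age values reuse the REST example above. Every other field (active, nickname, interests, address) is hypothetical and exists only to show a data type.

```python
import json

document = {
    "fname": "Bharvi",                     # String
    "age": 28,                             # Number
    "active": True,                        # Boolean (hypothetical field)
    "nickname": None,                      # Null (hypothetical field)
    "interests": ["search", "analytics"],  # Array (hypothetical field)
    "address": {"city": "Example City"},   # Object nested inside the document (hypothetical field)
}

# Serialize the Python dictionary into JSON text.
encoded = json.dumps(document, indent=2)
print(encoded)
```

In the serialized text, Python's True becomes true and None becomes null, while the nested dictionary becomes a nested JSON object.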

Elasticsearch common terms

The following are the most common terms that are very important to know when starting with Elasticsearch:

Node: A single instance of Elasticsearch running on a machine.

Cluster: A cluster is the single name under which one or more nodes/instances of Elasticsearch are connected to each other.

Document: A document is a JSON object that contains the actual data in key-value pairs.

Index: A logical namespace under which Elasticsearch stores data, and which may be built with more than one Lucene index using shards and replicas.

Doc types: A doc type in Elasticsearch represents a class of similar documents. A type consists of a name, such as a user or a blog post, and a mapping, including data types and the Lucene configurations for each field. (An index can contain more than one type.)

Shard: Shards are containers that can be stored on a single node or multiple nodes and are composed of Lucene segments. An index is divided into one or more shards to make the data distributable.

Note

A shard can be either primary or secondary. A primary shard is the one to which all the operations that change the index are directed. A secondary (replica) shard contains a duplicate of the primary shard's data; it helps speed up searches and provides high availability. If the machine that holds the primary shard goes down, the secondary shard automatically becomes the primary.

Replica: A duplicate copy of the data living in a shard for high availability.
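The relationship between an index, its primary shards, and its replicas can be sketched with a little arithmetic. The settings keys number_of_shards and number_of_replicas are the ones Elasticsearch uses when an index is created. The helper function names and the values 5 and 1 are assumptions made for illustration.

```python
def index_settings(primary_shards, replicas):
    """Return a request body for creating an index with the given layout."""
    return {
        "settings": {
            "number_of_shards": primary_shards,  # primary shards per index
            "number_of_replicas": replicas,      # copies of each primary shard
        }
    }


def total_shard_copies(primary_shards, replicas):
    """Each primary shard exists once, plus once per replica."""
    return primary_shards * (1 + replicas)


body = index_settings(5, 1)
print(body)
print(total_shard_copies(5, 1))  # 5 primaries + 5 replica shards = 10
```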

Understanding Elasticsearch structure with respect to relational databases

Elasticsearch is a search
