Accelerate your enterprise search engine and bring relevance to your search analytics
Apache Solr is the only standalone enterprise search server with a REST-like application interface, providing highly scalable, distributed search and index replication for many of the world's largest internet sites.
To begin with, you will be introduced to performing full-text search, applying multiple filters, dynamic clustering, and more, helping you brush up on the basics of Apache Solr. You will also explore the new features and advanced options released in Apache Solr 7.x, which improve numerous performance aspects and make data investigation simpler, easier, and more powerful. You will learn to build complex queries and extensive filters, and understand how they are compiled in your system to bring relevance to your search tools. You will learn how Solr scoring works, the elements affecting a document's score, and how you can optimize or tune the score for the application at hand. You will learn to extract features of documents and write complex queries to re-rank documents. You will also learn advanced options that help you control what content is indexed and how the extracted content is indexed. Throughout the book, you will work through complex problems, with solutions and varied approaches, to tackle your business needs.
By the end of this book, you will have gained the advanced proficiency to build out-of-the-box smart search solutions for your enterprise demands.
The book will appeal to developers, software engineers, data engineers, and database architects who are building, or seeking to build, enterprise-wide effective search engines for business intelligence. Prior experience of Apache Solr or Java programming is a must to get the best out of this book.
Page count: 275
Year of publication: 2018
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Pravin Dhandre
Acquisition Editor: Aman Singh
Content Development Editor: Aishwarya Pandere
Technical Editor: Dinesh Pawar
Copy Editor: Vikrant Phadkay
Project Coordinator: Nidhi Joshi
Proofreader: Safis Editing
Indexer: Aishwarya Gangawane
Graphics: Tania Dutta
Production Coordinator: Arvindkumar Gupta
First published: February 2018
Production reference: 1160218
Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.
ISBN 978-1-78883-738-5
www.packtpub.com
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Sandeep Nair has more than 11 years of experience of Java and Java EE technologies. His keen interest is in developing enterprise solutions using the Liferay platform, and he has been doing so for the past 9 years. He has executed projects using Liferay across various verticals, providing solutions for collaboration, enterprise content management, and web content management systems. He is also experienced with Java and Java EE.
He has authored Liferay Beginner’s Guide and Instant Liferay Portal 6 Starter.
Travel, food, and books are his passions, besides coding.
Chintan Mehta is a cofounder of KNOWARTH Technologies and heads the cloud/RIMS/DevOps team. He has rich, progressive experience in server administration of Linux, AWS Cloud, DevOps, and RIMS, and server administration on open source technologies. He is also an AWS Certified Solutions Architect.
Chintan has authored MySQL 8 for BigData, MySQL 8 Administrator's Guide, and Hadoop Backup and Recovery Solutions, and has reviewed Liferay Portal Performance Best Practices and Building Serverless Web Applications.
Dharmesh Vasoya is a Liferay 6.2 certified developer. He has 5.5 years of experience in application development with technologies such as Java, Liferay, Spring, Hibernate, Portlet, and JSF. He has successfully delivered projects in various domains, such as healthcare, collaboration, communication, and enterprise CMS, using Liferay.
Dharmesh has good command of the configuration setup of servers such as Solr, Tomcat, JBOSS, and Apache Web Server. He has good experience in clustering, load balancing, and performance tuning. He completed his MCA at Ahmedabad University.
Marcelo Ochoa works at the system laboratory of Facultad de Ciencias Exactas of Universidad Nacional del Centro de la Provincia de Buenos Aires and is the CTO at Scotas. He has worked on several Oracle-related projects, such as translating Oracle manuals and multimedia CBTs. Since 2006, he has been part of an Oracle ACE program and was recently inducted into a Docker mentor program.
He has coauthored Oracle Database Programming using Java and Web Services by Digital Press and Professional XML Databases by Wrox Press. He has been a technical reviewer on several Packt Publishing books.
Krunal Patel has been working on the Liferay portal for 5+ years and has 9+ years of experience in enterprise application development using Java and Java EE. He has also executed enterprise CMS projects using Solr, Apache web server, and Apache Lucene. He has good experience in setup and configuration of servers (Solr, Tomcat, JBOSS, and Jenkins (CI)), performance tuning, LDAP integration, and so on. He has an ITIL Foundation certification in IT service management, Liferay 6.1 Developer certification, Brainbench Java 6 certification, and MongoDB for Java Developers certification.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Mastering Apache Solr 7.x
Packt Upsell
Why subscribe?
PacktPub.com
Contributors
About the authors
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Introduction to Solr 7
Introduction to Solr
History of Solr
Lucene – the backbone of Solr
Why choose Solr?
Benefits of keyword search
Benefits of ranked results
Solr use cases
Social media
Science and research
Search engine
E-commerce
Media and entertainment
Government
Education
What's new in Solr 7?
Replication for SolrCloud
TLOG replicas
PULL replicas
Schemaless improvements
Autoscaling
Default numeric types
Spatial fields
SolrJ
JMX and MBeans
Other changes
Summary
Getting Started
Solr installation
Understanding various files and the folder structure
bin
Solr script
Post script
contrib
DataImportHandler
ContentExtractionLibrary
LanguageIdentifier
Clustering
VelocityIntegration
dist and docs
example
core.properties
zoo.cfg
solr.xml
server
Running Solr
Running basic Solr commands
Production Solr setup
Loading sample data
Loading data from MySQL
Understanding the browse interface
Using the Solr admin interface
Dashboard
Logging
Cloud screens
Tree view
Graph view
Collections or core admin
Java properties
Thread dump
Collection-specific tools
Overview
Analysis
DataImport
Documents
Files
Query
Stream
Schema
Core-specific tools
Summary
Designing Schemas
How Solr works
Getting started with Solr's basics
The schema file of Solr
Understanding field types
Definitions and properties of field types
Field type properties
Field types available in Solr
Understanding date fields
Understanding currencies and exchange rates 
Understanding enum fields
Field management
Field properties
Copying fields
Dynamic fields
Mastering Schema API
Schema API in detail
Schema operations
Listing fields, field types, DynamicFields, and CopyField rules
Deciphering schemaless mode
Creating a schemaless example
Schemaless mode configuration
Managed schema
Field guessing
Summary
Mastering Text Analysis Methodologies
Understanding text analysis
What is text analysis?
How text analysis works
Understanding analyzer
What is an analyzer?
How an analyzer works
Understanding tokenizers
What is a tokenizer?
Available tokenizers in Solr
Standard tokenizer
White space tokenizer
Classic tokenizer
Keyword tokenizer
Lower case tokenizer
Letter tokenizer
N-gram tokenizer
Edge n-gram tokenizer
Understanding filters
What is a filter?
Available filters in Solr
Stop filter
Classic filter
Synonym filter
Synonym graph filter
ASCII folding filter
Keep word filter
KStem filter
KeywordMarkerFilterFactory
Word delimiter graph filter 
Understanding CharFilter
Understanding PatternReplaceCharFilterFactory
Understanding multilingual analysis
Language identification
Configuring Solr for multiple language search
Creating separate fields per language
Creating separate indexes per language
Understanding phonetic matching
Understanding Beider-Morse phonetic matching
Summary
Data Indexing and Operations
Basics of Solr indexing
Installing Postman
Exploring the post tool
Understanding index handlers
Working with an index handler with the XML format
Index handler with JSON
Apache Tika and indexing
Solr Cell basics
Indexing a binary using Tika
Language detection 
Language detection configuration
Client APIs 
Summary
Advanced Queries – Part I
Search relevance
Velocity search UI
Query parsing and syntax
Common query parameters
Standard query parser
Advantage
Disadvantage
Searching terms for standard query parser
Term modifiers
Wildcard searches
Fuzzy searches
Proximity searching 
Range searches
Boolean operators
Escaping special characters
Grouping terms
Dates and times in query strings
Adding comments to the query string
The DisMax Query Parser
Advantages
DisMax query parser parameters
eDisMax Query Parser
Response writer
JSON
Standard XML
CSV
Velocity
Faceting
Common parameters
Field-value faceting parameters
Range faceting
Pivot faceting
Interval faceting
Highlighting
Highlighting parameters
Highlighter
Unified highlighter (hl.method=unified)
Original highlighter (hl.method=original) 
FastVector highlighter (hl.method=fastVector)
Boundary scanners
The breakIterator boundary scanner
The simple boundary scanner
Summary
Advanced Queries – Part II
Spellchecking
Spellcheck parameters
Implementation approaches
IndexBasedSpellChecker
DirectSolrSpellChecker
FileBasedSpellChecker
WordBreakSolrSpellChecker
Distributed spellcheck
Suggester
Suggester parameters
Running suggestions
Pagination
How to implement pagination
Cursor pagination
Result grouping
Result grouping parameters
Running result grouping
Result clustering
Result clustering parameters
Result clustering implementation
Install the clustering contrib
Declare the cluster search component
Declare the request handler and include the cluster search component
Spatial search
Spatial search implementation
Field types
Query parser
Spatial search query parser parameters
Function queries
Summary
Managing and Fine-Tuning Solr
JVM configuration
Managing the memory heap 
Managing solrconfig.xml
User-defined properties
Implicit Solr core properties
Managing backups
Backup in SolrCloud
Standalone mode backups
Backup API
Backup status
API to restore
Restore status API
Snapshot API
JMX with Solr
JMX configuration
Logging configuration
Log settings using the admin web interface
Log level at startup
Setting the environment variable
Passing parameters in the startup script
Configuring Log4J for logging
SolrCloud overview
SolrCloud in interactive mode
SolrCloud – core concepts
Routing documents
Splitting shards
Setting up ignore commits from client applications
Enabling SSL – Solr security
Prerequisites
Generating a key and self-signed certificate
Starting Solr with SSL system properties
Performance statistics
Statistics for request handlers
Summary
Client APIs – An Overview
Client API overview
JavaScript Client API
SolrJ Client API
Ruby Client API
Python Client API
Summary
In today's digital enterprise world, every business has complex search requirements. With big data coming into the picture, the volume of data on which search filters have to be applied has massively increased. It becomes absolutely crucial to have an enterprise search platform that caters to your enterprise application.
Solr is a leading open source Java-based enterprise search platform that has been adopted by many organizations. It offers a plethora of features, such as handling rich documents, faceted search, and full-text searching, to name a few.
With the recent release of Solr 7, the arsenal of features that Solr provides has widened. We hope that this book will provide you with everything you need to not only learn but also master the various features and functionalities that Solr provides. We believe you will enjoy reading this as much as we did writing it. Happy learning!
This book is for anyone who wants to not only learn Solr 7.0 but also understand its various advanced concepts. By the time you finish this book, you'll understand why you should build your search on an enterprise search platform like Solr.
Chapter 1, Introduction to Solr 7, gets you acquainted with what Solr is all about and explains why you should use Solr.
Chapter 2, Getting Started, shows you how to set up Solr and how everything is laid out under the Solr umbrella.
Chapter 3, Designing Schemas, takes us through schema design using the schema API and gives an understanding of schemaless mode.
Chapter 4, Mastering Text Analysis Methodologies, shows us features related to text analysis, tokenizers, filters, and analyzers.
Chapter 5, Data Indexing and Operations, teaches us how to use the client API to do indexing. We also learn about index handlers.
Chapter 6, Advanced Queries – Part I, looks at querying Solr, velocity search UI, relevance, query parsing, faceting, and highlighting.
Chapter 7, Advanced Queries – Part II, continues where the last chapter ended. We go through suggester, pagination, result grouping, clustering, and spatial search.
Chapter 8, Managing and Fine-Tuning Solr, shows how to make Solr ready for production.
Chapter 9, Client APIs – An Overview, gives an overview of the various APIs that are available for JavaScript, Ruby, Python, and Java to interact with Solr.
It would be great if you know a bit of Java, but it is not mandatory, as this book will teach you from the ground up.
You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
1. Log in or register at www.packtpub.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Mastering-Apache-Solr-7x. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/MasteringApacheSolr7x_ColorImages.pdf.
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Also, the PATH variable should point to JRE 1.8."
A block of code is set as follows:
<requestHandler name="/dataimport" class="solr.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
<field column="category_id" name="category_id" />
<field column="category_name" name="category_name" />
<field column="remarks" name="remarks" />
Any command-line input or output is written as follows:
brew install solr
solr start
Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Go to the Query screen; at the bottom, click on facet."
Feedback from our readers is always welcome.
General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packtpub.com.
Today we are in the age of digitization. People are generating data in different ways: they take pictures, upload images, write blogs, comment on someone's blog or picture, change their status on social networking sites, tweet on Twitter, update details on LinkedIn, do financial transactions, write emails, store data on the cloud, and so on. Data size has grown not only in the personal space but also in professional services, where people have to deal with a humongous amount of data. Think of the data managed by players such as Google, Facebook, the New York Stock Exchange, Amazon, and many others. For this data tsunami, we need the appropriate tools to fetch data, in an organized way, that can be used in various fields, such as scientific research, real-time traffic, fighting crime, fraud detection, digital personalization, and so on. All of this data needs to be captured, stored, searched, shared, transferred, analyzed, and visualized.
Analyzing structured, unstructured, or semi-structured ubiquitous data helps us discover hidden patterns, market trends, correlations, and personal preferences. With the help of the right tools to process and analyze data, organizations can expect much better marketing plans, additional revenue opportunities, improved customer services, healthier operational efficiency, competitive benefits, and much more. It is important to not only store data but also process it in order to generate the information that is needed. Every company collects data and uses it; however, to flourish more effectively, a company needs to be able to search for relevant data. Every company can benefit from well-directed search over its data, which can improve its business either directly or indirectly.
Okay, now you have Solr, which is generally referred to as a search server, and you are doing searches. Is that all you need? Hold on! Solr allows a lot more than a simple search. So get ready and hold your breath to take a deep dive into Solr, a scalable, flexible, enterprise NoSQL search platform!
We will go through the following topics in this chapter:
Introduction to Solr
Why Solr?
Solr use cases
What's new in Solr 7
Solr is one of the most popular enterprise search servers and is widely used across the world. It is written in Java and uses the Lucene Java search library. Solr is an open source project from the Apache Software Foundation (ASF) and is amazingly fast, scalable, and ideal for searching relevant data. Some of the major Solr users are Netflix, SourceForge, Instagram, CNET, and Flipkart. You can check out more such use cases at https://wiki.apache.org/solr/PublicServers.
Some of the features included are as follows:
Full-text search
Faceted search
Dynamic clustering
GEO search
Hit highlighting
Near-real-time indexing
Rich document handling
Geospatial search
Structured Query Language
(
SQL
) support
Textual search
Rest API
JSON, XML, PHP, Ruby, Python, XSLT, velocity, and custom Java binary output formats over HTTP
GUI admin interface
Replication
Distributed search
Caching of queries, documents, and filters
Auto-suggest
Streaming
Many more features
Solr powers many such Internet sites, government sites, and intranet sites, providing solutions for e-commerce, blogs, science, research, and so on. Solr can index billions of documents/rows via XML, JSON, CSV, or HTTP APIs. It can secure your data with the help of authentication, which can be drilled down to role-based authorization. Solr is now an integral part of many big data solutions too.
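As an illustration of the HTTP/JSON route, here is a minimal Python sketch of the payload an application would POST to Solr's JSON update handler. The collection name products and the field names are hypothetical, and the curl command in the comment assumes a local Solr on the default port 8983:

```python
import json

# Hypothetical collection and fields -- adjust for your own setup.
solr_update_url = "http://localhost:8983/solr/products/update?commit=true"

docs = [
    {"id": "1", "name": "Solr in Action", "category": "books"},
    {"id": "2", "name": "Lucene primer", "category": "books"},
]

# Solr's JSON update handler accepts a bare array of documents.
payload = json.dumps(docs)

# The equivalent command-line call would be roughly:
#   curl -X POST -H 'Content-Type: application/json' \
#        'http://localhost:8983/solr/products/update?commit=true' -d @docs.json
print(payload)
```

The commit=true parameter makes the documents immediately searchable; in production you would usually rely on Solr's autocommit settings instead.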
Doug Cutting created Lucene in 2000, which is the core technology behind Solr.
Solr was created in 2004 by Yonik Seeley at CNET Networks as a homegrown project to provide search capability for the CNET Networks website.
Later in 2006, CNET Networks published the Solr source code to ASF. By early 2007, Solr had found its place in some of the top projects. It was then that Solr kept on adding new features to attract customers and contributors.
Solr 1.3 was released in September 2008. It included major performance enhancements and features such as distributed search.
In January 2009, Yonik Seeley, Grant Ingersoll, and Erik Hatcher joined Lucidworks; they are the prime faces of Solr and enterprise search. Lucidworks started providing commercial support and training for Solr.
Solr 1.4 was released in November 2009. Solr had never stopped providing enhancements; 1.4 was no exception, with indexing, searching, faceting, rich document processing, database integration, plugins, and more.
In 2011, Solr versioning was revised to match that of Lucene. Sometime in 2010, the Lucene and Solr projects were merged; Solr then became an integral subproject of Lucene. Solr downloads were still available separately; however, it was developed by the same set of contributors. Solr was then marked as version 3.1.
Solr 4.0 was released in October 2012, which introduced the SolrCloud feature. There were a number of follow-ups released over a couple of years in the 4.x line. Solr kept on adding new features, becoming more scalable and further focusing on reliability.
Solr 5.0 was released in February 2015. It was with this release that official support for the WAR bundle package ended. It was packaged as a standalone application. And later, in version 5.3, it also included an authentication and authorization framework.
Solr 6.0 was released in April 2016. It included support for executing parallel SQL queries across SolrCloud. It also included stream expression support and JDBC driver for the SQL interface.
Finally, Solr 7.0 was released in September 2017, followed by 7.1.0 in October 2017, as shown in the following diagram. We will discuss the new features as we move ahead in this chapter, in the What's new in Solr 7? section.
We have depicted the history of Solr in the preceding image for a much better view and understanding.
So by now, we have a brief understanding of Solr, along with its history. We must also have a good understanding of why we should be using Solr. Let's get the answer to this question too.
Lucene is an open source project that provides text search engine libraries. It is widely adopted for many search engine technologies. It has strong community contributions, which makes it much stronger as a technology backend. Lucene is a simple code library that you can use to write your own code by using the API available for searching, indexing, and much more.
For Lucene, a document consists of a collection of fields; these are name-value pairs containing either text or numbers. Lucene can be configured with a text analyzer that tokenizes a field's text into a series of words; it can also do further processing, such as substituting synonyms. Lucene stores its index on the disk of the server, covering each of the documents. The index is an inverted index that maps each term to the relevant documents, along with the positions of the word within each document's text. Once you have the index in place, you can search for documents by submitting a query string, which is parsed according to Lucene's query syntax. Lucene computes a score for each matching document, and the highest-scoring documents are returned first.
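The inverted index described above can be sketched in a few lines of Python. This is purely conceptual, to show the term-to-documents mapping with positions; it bears no resemblance to Lucene's actual on-disk format:

```python
# A toy inverted index illustrating Lucene's core data structure:
# each term maps to the documents (and word positions) it occurs in.
from collections import defaultdict

def build_inverted_index(docs):
    index = defaultdict(list)  # term -> [(doc_id, position), ...]
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.lower().split()):
            index[token].append((doc_id, pos))
    return index

def search(index, term):
    """Return the ids of documents containing the term."""
    return sorted({doc_id for doc_id, _ in index.get(term.lower(), [])})

docs = {
    1: "Solr is a search server",
    2: "Lucene is the library behind Solr",
}
index = build_inverted_index(docs)
print(search(index, "Solr"))  # both documents contain "solr"
```

Because the lookup starts from the term rather than scanning each document, a query touches only the postings for the words it contains, which is the key reason inverted indexes outperform full table scans.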
If we already have a relational database, then why should we use Solr? It's simple; if there is a use case that needs you to search, you need a search engine platform like Solr. There are various use cases that we will be discussing further in the chapter.
Databases and Solr each have their own pros and cons. In a database, SQL supports limited wildcard-based text search with some basic normalization, such as matching uppercase to lowercase; such a query can be costly, as it performs full table scans. In Solr, the searchable words are stored in an inverted index, which makes searches much faster than traditional database searches.
Let's look at the following diagram to understand this better:
Having an enterprise search engine solution is a must for an organization nowadays; it plays a prominent role in retrieving information quickly through search. Not having such a search engine platform can result in insufficient information, lost productivity, and duplicated effort, simply because the right information isn't available quickly without a search; that is something we can't even imagine today. Most such use cases comprise the following key requirements:
Data collected should be parsed and indexed. So, parsing and indexing is one of the important requirements of any enterprise search engine platform.
A search should provide the required results almost in real time on the required datasets. Performance and relevance are two more key requirements.
The search engine platform should be able to crawl or collect all of the data that it would require to perform the search.
Integration of the search engine along with administration, monitoring, log management, and customization is something that we would be expecting.
Solr has been designed to have a powerful and flexible search that can be used by applications; whenever you want to serve data based on search patterns, Solr is the right fit.
Here is a high-level diagram that shows how Solr is integrated with an application:
The majority of popular websites, including many Intranet websites, have integrated search solutions to help users find relevant information quickly. User experience is a key element for any solution that we develop; and searching is one of the major features that cannot be ignored when we talk about user experience.
One of the basic needs a search engine should support is a keyword search, as that's the primary goal behind the search engine platform. In fact it is the first thing a user will start with. Keyword search is the most common technique used for a search engine and also for end users on our websites. It is a pretty common expectation nowadays to punch in a few keywords and quickly retrieve the relevant results. But what happens in the backend is something we need to take care of to ensure that the user experience doesn't deteriorate. Let's look at a few areas that we must consider in order to provide better outcomes for search engine platforms using Solr:
Relevant search with quick turnaround
Auto-correct spelling
Auto-suggestions
Synonyms
Multilingual support
Phrase handling—an option to search for a specific keyword or all keywords in a phrase provided
Expanded results if the user wants to view something beyond the top-ranked results
These features can be easily managed by Solr; so our next challenge is to provide relevant results with improved user experience.
Solr is not limited to finding relevant results for a user's search; presenting the end user with the most relevant results, properly sorted, is just as important. In SQL, we would find matching results and sort them by columns in ascending or descending order. Similarly, Solr sorts the retrieved result set by a score that reflects each document's relevance to the search pattern.
Ranked results are very important, primarily because the volume of data that search engine platforms have to dig through is huge. Without ranking, the result set would be filled with irrelevant matches, and there would be far too much data to display. The other important aspect is user experience: we have all come to expect a search engine to provide relevant results from just a few keywords. We are getting restless, aren't we? But we expect a search engine platform to tolerate this and still provide relevant, ranked results from those few keywords; and we are not even talking about Google search here! For users like us, Solr can help address such situations by ranking results higher based on various criteria: fields, terms, document name, and a few more. The ranking of the dataset can vary based on many factors, but a higher rank would generally reflect the relevance to the search pattern. We can also add criteria such as gender, pushing certain documents to the top.
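To make the idea of scored, ranked results concrete, here is a sketch of naive TF-IDF ranking in Python. It is only an illustration of the principle; Solr's actual relevance model (BM25 by default in recent Lucene/Solr versions) also weighs field norms, boosts, and more:

```python
import math

def rank(query_terms, docs):
    """Rank documents by a naive TF-IDF score: frequent occurrences of
    rare query terms push a document up the result list."""
    n = len(docs)
    scores = {}
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        score = 0.0
        for term in query_terms:
            tf = tokens.count(term)            # term frequency in this doc
            df = sum(1 for t in docs.values()  # docs containing the term
                     if term in t.lower().split())
            if tf:
                score += tf * math.log(1 + n / df)  # rarer terms weigh more
        scores[doc_id] = score
    # Highest score first, like Solr's default sort by score desc.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

docs = {
    "a": "solr solr search platform",
    "b": "solr search",
    "c": "unrelated text",
}
ranked = rank(["solr"], docs)
print(ranked[0][0])  # document "a" ranks first: "solr" appears twice
```

Document "a" outranks "b" because the query term occurs twice, while "c", which never mentions it, scores zero and falls to the bottom.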
Solr is widely accepted and used by big companies such as Netflix, Disney, Instagram, The Guardian, and many more. Let us look at a few use cases to see the real-life impact that Solr has made.
For an extended but incomplete list of use cases and sites that leverage Solr, you can refer to the official web page of Solr at https://wiki.apache.org/solr/PublicServers:
This diagram helps us understand Solr as a solution serving various industries. Though it's not an exhaustive list of industries where Solr has been playing a prominent role in business decisions, let's discuss a few of the industries.
