Apache Solr Interview Questions

1. What is Apache Solr?

Apache Solr is an enterprise search server that runs standalone. It accepts documents in JSON, CSV, XML, or binary formats over HTTP and provides an interface to query that data. Solr offers rich matching capabilities, including joins, wildcards, and grouping. It can scale up and down and provides distributed, fault-tolerant operation through Apache ZooKeeper. Solr supports near-real-time indexing by using Lucene's near-real-time feature, and it provides an extensive plugin API for handling customized search requests.
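Because Solr is queried over plain HTTP, any language can talk to it. The sketch below only builds a query URL for the standard /select handler and asks for a JSON response with the wt parameter. It assumes a local server on the default port 8983, a core named Sample_core, and a hypothetical title field; nothing is sent over the network.

```python
from urllib.parse import urlencode

def build_select_url(base_url: str, core: str, query: str, rows: int = 10) -> str:
    """Build a URL for Solr's /select request handler that asks for JSON output."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url.rstrip('/')}/solr/{core}/select?{urlencode(params)}"

# Hypothetical core and field names; adjust them to your own setup.
url = build_select_url("http://localhost:8983", "Sample_core", "title:solr")
print(url)
```

Opening this URL in a browser or with an HTTP client against a running Solr instance returns the matching documents as JSON.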

2. What are the features of Apache Solr?

Apache Solr provides the following list of important features.

Apache Solr provides advanced full-text search capabilities, including wildcard, phrase, grouping, and join queries over many data types.
Apache Solr is optimized to handle high-volume traffic.
Solr provides a comprehensive administrative interface for querying, managing, and controlling its instances.
Solr exposes its internals through JMX, so Solr instances can be monitored easily.
Solr can be easily scaled up and down and provides distributed, fault-tolerant operation using Apache ZooKeeper.
With simple configuration, Solr can be deployed to match user requirements.
Solr leverages Lucene's near-real-time indexing capability, so newly indexed documents become searchable almost immediately.
Solr has built-in support for location-based searching.
Solr provides advanced spell checking and autocomplete. For example, if we type one or two characters into a search box, it suggests words that match them, and if there is a spelling mistake, it suggests the correct spelling.
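To make the autocomplete and spell-suggestion ideas above concrete, here is a small conceptual sketch. It is not how Solr's suggester or spellcheck components are implemented: it uses a hypothetical in-memory vocabulary, a plain prefix match for autocomplete, and Python's difflib similarity matching as a stand-in for Solr's spelling suggestions.

```python
from difflib import get_close_matches

# Hypothetical vocabulary; in Solr, suggestions come from indexed terms or a dictionary.
VOCABULARY = ["solr", "solrcloud", "search", "schema", "shard", "spellcheck"]

def autocomplete(prefix, vocabulary=VOCABULARY, limit=5):
    """Return vocabulary terms that start with the characters typed so far."""
    prefix = prefix.lower()
    return sorted(term for term in vocabulary if term.startswith(prefix))[:limit]

def suggest_spelling(word, vocabulary=VOCABULARY):
    """Return the closest vocabulary terms for a possibly misspelled word."""
    return get_close_matches(word.lower(), vocabulary, n=3, cutoff=0.75)

print(autocomplete("so"))
print(suggest_spelling("serch"))
```

A real deployment would build the vocabulary from the index and rank suggestions by frequency or weight rather than alphabetically.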

3. What are the use cases of Apache Solr?

The following are some of the common use cases of Apache Solr.

Instagram: Uses Apache Solr to power its geo-search API.
Comcast / Xfinity: Uses Apache Solr for faceted navigation and site search.
AT&T: Uses Apache Solr for local search at yp.com.
Sears: Uses Solr for faceted navigation, search, and autosuggestion.
eBay: Uses Solr to power its German classifieds sites.
Panasonic Europe: Uses Solr for faceted navigation and search across 30 countries.
Flipkart: Uses Apache Solr to power its e-commerce search.
Wego Travel: Uses Apache Solr to index data and speed up retrieval.
Well.ca: Uses Apache Solr for product search.
Avvo: Uses Apache Solr for search and faceted browsing.

4. What is the name of the file that contains the data directory configuration?

The location of the data directory is configured in the solrconfig.xml configuration file.

5. What are the differences between Apache Solr and Elasticsearch?

The following are some differences between Apache Solr and Elasticsearch.

Parameter	Solr	Elasticsearch
Community and Developers	Apache Solr is developed under the Apache Software Foundation and supported by its community.	Elasticsearch is developed by Elastic NV.
Node Discovery	Apache Solr uses Apache ZooKeeper for node discovery.	Elasticsearch uses Zen, its own built-in discovery module, for node discovery.
Shard Placement	Shard placement in Apache Solr is static; migrating shards requires manual work.	Shard placement in Elasticsearch is dynamic; shards are moved automatically as required.
Caches	Solr uses global caches that are invalidated as a whole when the index changes, rather than per segment.	Elasticsearch uses per-segment caches, which suit frequently changing data.
Analytics Engine	Solr uses facets and streaming aggregations.	Elasticsearch uses advanced, flexible aggregations.
Optimized Query Execution	Solr currently has no equivalent optimized query execution.	Elasticsearch can pick a faster execution strategy for range queries depending on the context.
Search Speed	Solr is best suited for static data.	Elasticsearch is best suited for dynamic, frequently changing data.
DevOps Friendliness	Solr is less DevOps-friendly, although its very active community is working to close the gap.	Elasticsearch provides good API support for automation.
Query DSL	Queries can be expressed in JSON and XML.	The Query DSL is limited to JSON.

6. What is the task of the Solr request handler?

An Apache Solr request handler processes the query requests sent by users. It receives a request, determines the logic that must be executed to process it, and returns the response. Request handlers are configured in the solrconfig.xml file, which can declare several request handler instances.
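The dispatching idea can be pictured as a registry that maps request paths to handler logic. This is only a conceptual sketch: the registry, the handler functions, and the returned dictionaries are hypothetical stand-ins, whereas real Solr handlers are Java classes declared with requestHandler entries in solrconfig.xml.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]

# Hypothetical registry of path -> handler, loosely mirroring <requestHandler name="..."> entries.
HANDLERS: Dict[str, Handler] = {}

def register(path: str):
    """Decorator that registers a handler function under a request path."""
    def decorator(fn: Handler) -> Handler:
        HANDLERS[path] = fn
        return fn
    return decorator

@register("/select")
def select_handler(params: dict) -> dict:
    # A real search handler would parse q and run the search components.
    return {"handler": "select", "q": params.get("q", "*:*")}

@register("/update")
def update_handler(params: dict) -> dict:
    # A real update handler would add, delete, or commit documents.
    return {"handler": "update", "commit": params.get("commit", "false")}

def dispatch(path: str, params: dict) -> dict:
    """Route a request to the handler registered for its path."""
    handler = HANDLERS.get(path)
    if handler is None:
        return {"error": f"no request handler registered for {path}"}
    return handler(params)

print(dispatch("/select", {"q": "title:solr"}))
```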

7. What is Apache Lucene?

Apache Lucene is a high-performance text search engine library that was originally written by Doug Cutting and later donated to the Apache Software Foundation. It is written in the Java programming language. Text extracted from various document formats, such as HTML, MS Office documents, plain text, and PDF, can be indexed and searched with Lucene.

8. How to start the Apache Solr Web interface?

Once Solr is running, its web interface (the Admin UI) is available at http://localhost:8983/solr/, since 8983 is the default port. From the web interface we can run queries, create cores, and perform many other operations.

9. What are the field types used in Apache Solr?

The following are the different types of fields that we can use in Apache Solr.

date
double
float
long
text

10. What is the command to start Apache Solr?

We can use the below command to start the Apache Solr. Please refer to the "Apache Solr Basic Commands" section to check some basic commands.

$ ./bin/solr start

11. What is Apache Solr Faceting?

Apache Solr faceting arranges search results into categories, with a count for each category, so that users can narrow down their results quickly and easily. For example, query faceting returns the number of documents in the current result set that also match a given query, whereas date faceting groups the results into date ranges.
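The counting behind faceting can be sketched in a few lines. The documents and field names below are hypothetical, and the functions only mimic what Solr's facet.field and facet.query parameters report; Solr computes these counts from the index rather than from Python dictionaries.

```python
from collections import Counter

# Hypothetical documents returned by the current query.
results = [
    {"id": "1", "category": "book", "year": 2019},
    {"id": "2", "category": "ebook", "year": 2021},
    {"id": "3", "category": "book", "year": 2021},
]

def field_facet(docs, field):
    """Count how many matching documents have each value of `field` (like facet.field)."""
    return Counter(doc[field] for doc in docs if field in doc)

def query_facet(docs, predicate):
    """Count matching documents that also satisfy an extra condition (like facet.query)."""
    return sum(1 for doc in docs if predicate(doc))

print(field_facet(results, "category"))
print(query_facet(results, lambda doc: doc["year"] >= 2020))
```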

12. What is a tokenizer in Apache Solr?

An Apache Solr tokenizer breaks a stream of text into tokens, where each token represents a subsequence of the characters in the text. These tokens are then processed by token filters, and the resulting token stream is indexed.
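As an illustration, the sketch below splits text into word tokens and keeps each token's character offsets, which is the kind of information a tokenizer emits. It assumes a simple regular expression (runs of word characters) and is not Solr's StandardTokenizer, which follows Unicode word-break rules.

```python
import re
from dataclasses import dataclass
from typing import List

@dataclass
class Token:
    text: str
    start: int  # offset of the first character in the original text
    end: int    # offset just past the last character

def tokenize(text: str) -> List[Token]:
    """Split text into word tokens, keeping their character offsets."""
    return [Token(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]

tokens = tokenize("Solr is fast!")
print([(t.text, t.start, t.end) for t in tokens])
```

The offsets are what later allow features such as highlighting to point back into the original text.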

13. What are the important configuration files in Apache Solr?

The following are the two most important configuration files in Apache Solr:

solrconfig.xml
schema.xml

14. What is the command to stop Apache Solr?

The following command stops the Apache Solr instance running on port 8983.

$ bin/solr stop -p 8983

15. How to start Apache Solr in the foreground?

The below command is used to start the Apache Solr in the foreground.

$ bin/solr start -f

16. What are the important elements of solrconfig.xml?

The important elements of solrconfig.xml are as below.

The Request Handlers
The Search components
The Location of the Data Directory
The Cache Parameters
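To see where these elements live, the sketch below parses a trimmed solrconfig.xml fragment and pulls out the data directory, request handlers, search components, and cache settings. The fragment is a hypothetical, heavily shortened example; a real solrconfig.xml contains many more settings, and the cache class and sizes here are illustrative values only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed solrconfig.xml fragment.
SOLRCONFIG = """
<config>
  <dataDir>${solr.data.dir:}</dataDir>
  <query>
    <filterCache class="solr.CaffeineCache" size="512" initialSize="512" autowarmCount="0"/>
  </query>
  <requestHandler name="/select" class="solr.SearchHandler"/>
  <searchComponent name="spellcheck" class="solr.SpellCheckComponent"/>
</config>
"""

root = ET.fromstring(SOLRCONFIG)
summary = {
    "dataDir": root.findtext("dataDir"),                                       # data directory location
    "requestHandlers": [h.get("name") for h in root.iter("requestHandler")],   # request handlers
    "searchComponents": [c.get("name") for c in root.iter("searchComponent")], # search components
    "caches": {c.tag: c.get("size") for c in root.find("query")},              # cache parameters
}
print(summary)
```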

17. What are the different types of Apache Solr Highlighters?

Apache Solr provides the following three highlighters:

Standard Highlighter: It provides the most precise matches, including for advanced query parsers.
FastVector Highlighter: It relies on term vectors and supports Unicode break iterators.
Postings Highlighter: It uses offsets stored in the postings, which are more compact and efficient than term vectors, but it is not suited to queries with a large number of terms.

18. What is the command to create a Core in Apache Solr?

The following command creates an Apache Solr core. In this example, Sample_core is the core name.

$ ./bin/solr create -c Sample_core

19. What is the command to delete Apache Solr Core?

$ ./bin/solr delete -c Sample_core

20. What is the difference between Apache Solr and Apache Drill?

Below are some of the differences between Apache Solr and Apache Drill.

Apache Solr	Apache Drill
Apache Solr is a search engine.	Apache Drill is a schema-free SQL query engine.
Solr is an open-source project managed by the Apache Software Foundation.	Drill is also an open-source project managed by the Apache Software Foundation.
The languages supported by Apache Solr include JavaScript, .NET, Java, PHP, Python, Perl, and Ruby.	Drill supports the C++ language.
Apache Solr's search engine is based on Apache Lucene.	Apache Drill uses its SQL query engine to process SQL queries.

21. How to check the status of Apache Solr?

The status command is used to show the status of Apache Solr.

$ bin/solr status

22. How to bring down Apache Solr?

To bring down Apache Solr when it is running in the foreground, press Ctrl+C in the same terminal from which it was launched.

23. What is the use of Phonetic Filter in Apache Solr?

The Apache Solr Phonetic Filter uses phonetic encoding algorithms to create tokens, so that words that sound alike produce the same token. The algorithms are provided in the org.apache.commons.codec.language package.
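Soundex is one of the encoders in that package (org.apache.commons.codec.language.Soundex). The sketch below is an independent Python rendering of the classic American Soundex rules, written to show how phonetic encoding maps similar-sounding names to the same token; it is not the commons-codec code and may differ from it in edge cases.

```python
# Letter-to-digit groups of the American Soundex algorithm.
_CODES = {
    ch: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in letters
}

def soundex(name: str) -> str:
    """Encode a name so that similar-sounding names share the same 4-character code."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    result = [letters[0].upper()]        # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue                     # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")        # vowels (and y) have no code and reset `prev`
        if code and code != prev:
            result.append(code)
        prev = code
        if len(result) == 4:
            break
    return "".join(result).ljust(4, "0")  # pad short codes with zeros

print(soundex("Robert"), soundex("Rupert"))
```

Because "Robert" and "Rupert" encode to the same token, a query for one would match documents containing the other.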

24. Which companies use Apache Solr?

The following are some of the companies that use Apache Solr.

Immonet
Instagram
FCC.gov
Comcast
AOL
eHarmony
Sears
Ticketmaster
GameSpot
The
Chegg
CNET
Zappos.com
Pink
Miinto.com
SourceForge
http://news.com
Panasonic
eBay
digg
Buy.com's
Internet
StubHub!
Smithsonian
Homeland
NASA

25. What is the operation Apache Solr performs to search a document?

Apache Solr performs the following tasks to search for documents.

Indexing: The user's documents are first converted into a machine-readable format. This step is called indexing.
Querying: The user sends a query, which can include text search requests, image search requests, or keywords.
Mapping: Solr analyzes the user query and finds the matching documents.
Ranking: Each matching document is scored, and the results are ordered by rank.
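The four steps can be sketched end to end with an inverted index and a simple TF-IDF score. This is a simplified illustration, not Solr's implementation: Lucene's data structures are far more elaborate, Solr's default similarity in current versions is BM25 rather than this TF-IDF variant, and the documents are hypothetical.

```python
import math
import re
from collections import defaultdict

def analyze(text):
    """Lowercase the text and split it into word terms."""
    return re.findall(r"\w+", text.lower())

def build_index(docs):
    """Indexing: map each term to the documents (and term counts) that contain it."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def search(index, n_docs, query):
    """Querying and mapping: find matching documents. Ranking: order them by score."""
    scores = defaultdict(float)
    for term in analyze(query):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + n_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores, key=lambda d: (-scores[d], d))  # highest score first, ties by id

docs = {
    "d1": "Solr search server",
    "d2": "Lucene search library",
    "d3": "Solr uses Lucene; Solr scales",
}
index = build_index(docs)
print(search(index, len(docs), "solr"))
```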

26. What are Apache Solr Field Type Definitions and Properties?

A Solr field type defines the analysis that is applied to a field when documents are indexed and when a search query is run against those documents.

A field type definition can include the following four pieces of information.

The name of the field type (mandatory).
The implementation class name (mandatory).
If the field type is a TextField, a description of the field analysis for the field type.
Field type properties, which depend on the implementation class.
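The sketch below reads these pieces from a field type definition written in the style of a Solr schema. The text_en fragment is a hypothetical example that uses real Solr class names (solr.TextField, solr.StandardTokenizerFactory, solr.LowerCaseFilterFactory); the parsing itself is only an illustration with Python's ElementTree, not how Solr loads its schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical field type definition in the style of a Solr schema.
FIELD_TYPE = """
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
"""

def describe_field_type(xml_text):
    """Extract the name, class, analysis chain, and extra properties of a field type."""
    elem = ET.fromstring(xml_text)
    analyzer = elem.find("analyzer")
    return {
        "name": elem.get("name"),    # mandatory
        "class": elem.get("class"),  # mandatory implementation class
        "tokenizer": analyzer.find("tokenizer").get("class") if analyzer is not None else None,
        "filters": [f.get("class") for f in analyzer.findall("filter")] if analyzer is not None else [],
        "properties": {k: v for k, v in elem.attrib.items() if k not in ("name", "class")},
    }

info = describe_field_type(FIELD_TYPE)
print(info)
```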

27. What are the most common ways to index documents in Apache Solr?

The following are the three most common ways of indexing documents in Apache Solr.

Writing a client application that uses Solr's Java API (SolrJ) to ingest data.
Uploading XML files by sending HTTP requests to the Solr server.
Using the Solr Cell framework to ingest structured or binary files, such as Word or PDF documents.
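The HTTP route works for JSON as well as XML. The sketch below builds, but does not send, a POST request to a core's /update handler with a JSON array of documents and commit=true. The server address, the Sample_core name, and the title field are assumptions for illustration.

```python
import json
import urllib.request

def build_update_request(base_url, core, docs, commit=True):
    """Build (but do not send) an HTTP POST that indexes JSON documents via /update."""
    url = f"{base_url.rstrip('/')}/solr/{core}/update"
    if commit:
        url += "?commit=true"  # make the documents searchable right away
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

# Hypothetical core and document; send with urllib.request.urlopen(req) against a running Solr.
req = build_update_request("http://localhost:8983", "Sample_core", [{"id": "1", "title": "Hello Solr"}])
print(req.get_method(), req.full_url)
```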

28. What is SolrCloud?

SolrCloud is a highly available, distributed, and fault-tolerant cluster of Solr nodes across which the index contents are distributed. When a search request is triggered, it is processed by multiple servers, which in turn improves performance. Apache ZooKeeper manages the overall cluster so that document indexing and users' search requests are routed to the appropriate nodes.
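The distribution idea can be sketched as hashing each document id to a shard and fanning a query out to every shard before merging the hits. This is a deliberate simplification: Solr's default compositeId router uses MurmurHash3 and assigns hash ranges to shards, whereas the sketch uses MD5 modulo the shard count, and ZooKeeper's coordination role is not modeled at all. The documents are hypothetical.

```python
import hashlib

def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard by hashing the document id (simplified routing, not Solr's router)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards

def distributed_search(shards, term):
    """Send the query to every shard and merge the matching document ids."""
    hits = []
    for shard in shards:
        hits.extend(doc_id for doc_id, text in shard.items() if term in text.lower().split())
    return sorted(hits)

num_shards = 2
shards = [dict() for _ in range(num_shards)]
docs = {"a": "solr cloud", "b": "zookeeper coordinates nodes", "c": "solr shards"}
for doc_id, text in docs.items():
    shards[shard_for(doc_id, num_shards)][doc_id] = text

print(distributed_search(shards, "solr"))
```

Whichever shard each document lands on, the merged result is the same, which is what lets the cluster spread both storage and query work across nodes.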

29. What is the use of Apache Solr Field analyzers?

Apache Solr field analyzers examine the text of a field and generate a token stream. Analyzers run both when a search query is fired by the end user and when data is ingested into Solr for indexing.

30. What is the task of Apache Solr Field Analysis?

Apache Solr field analysis processes user documents using the three main parts of a field type: analyzers, tokenizers, and filters. An analyzer examines the field text and produces a token stream. A tokenizer reads data from a Reader and generates Token objects. Filters examine a stream of tokens and keep, transform, or discard them, producing a new token stream.
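The analyzer, tokenizer, and filter relationship can be sketched as a pipeline of generators: the tokenizer produces a stream, and each filter wraps the previous stream. The whitespace tokenizer, lowercase filter, and stop-word filter below are hypothetical Python stand-ins for Solr's factory classes, chosen only to show how the chain is composed.

```python
from typing import Callable, Iterable, Iterator, List

TokenFilter = Callable[[Iterable[str]], Iterator[str]]

def whitespace_tokenizer(text: str) -> Iterator[str]:
    """Tokenizer: split the raw text into tokens."""
    yield from text.split()

def lowercase_filter(tokens: Iterable[str]) -> Iterator[str]:
    """Filter: transform each token to lowercase."""
    for token in tokens:
        yield token.lower()

def stop_filter(stopwords: set) -> TokenFilter:
    """Filter factory: discard tokens that appear in the stop-word set."""
    def apply(tokens: Iterable[str]) -> Iterator[str]:
        for token in tokens:
            if token not in stopwords:
                yield token
    return apply

def run_analyzer(text: str, tokenizer, filters: List[TokenFilter]) -> List[str]:
    """Analyzer: run the tokenizer, then each filter in order, and collect the final stream."""
    stream = tokenizer(text)
    for token_filter in filters:
        stream = token_filter(stream)
    return list(stream)

print(run_analyzer("The Quick Brown Fox", whitespace_tokenizer, [lowercase_filter, stop_filter({"the"})]))
```

Filter order matters: lowercasing runs before stop-word removal here, so "The" is caught by the lowercase stop word "the".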