What is Indexing in Apache Solr?
In Apache Solr, indexing organizes documents systematically so that the document a user needs can be found easily. Indexing is what makes a fast search on the user query possible. It collects documents, parses them, and then stores the result on a storage medium. Many kinds of sources can be indexed, such as PDF, CSV, XML, and database table data.
Indexing involves three operations: adding data to the index, deleting data from the index, and updating data in the index.
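As a rough illustration of these three operations, the sketch below builds the JSON bodies that Solr's update handler accepts for an add, a delete by ID, and an atomic "set" update. The helper names are hypothetical, and the unique key field name `id` is an assumption; your schema may use a different unique key.

```python
import json

# Hypothetical helpers: each one only builds a request body, nothing is sent.
# Assumption: the core's unique key field is called "id".

def add_command(doc, commit_within_ms=None):
    """Body that adds (or replaces) one document."""
    body = {"doc": doc}
    if commit_within_ms is not None:
        body["commitWithin"] = commit_within_ms
    return {"add": body}

def delete_command(doc_id):
    """Body that deletes one document by its unique key."""
    return {"delete": {"id": doc_id}}

def atomic_set(doc_id, field, value):
    """Body that updates a single field of an existing document."""
    return [{"id": doc_id, field: {"set": value}}]

print(json.dumps(add_command({"id": "0553293354", "BOOK_NAME": "Foundation"})))
print(json.dumps(delete_command("0553293354")))
print(json.dumps(atomic_set("0553293354", "PRICE", 9.49)))
```

Each body would be sent with an HTTP POST to the core's /update endpoint; atomic updates additionally depend on how the fields are defined in the schema.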
In this tutorial, we will go through adding data to the Apache Solr index. The three most common ways to do it are:
- We can use the post tool to add data to the Apache Solr index.
- We can add data to the Apache Solr index using the Apache Solr web interface.
- We can use the client APIs, for example from Python or Java, to add content to Apache Solr.
We will go through these ways of adding data to the Apache Solr index.
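For the client API route, a minimal sketch using only the Python standard library could look like the following. It prepares a POST of JSON documents to the core's /update endpoint. The base URL, the function names, and the choice to commit immediately with `commit=true` are assumptions of this sketch, not something the tutorial prescribes.

```python
import json
import urllib.request

# Assumption: Solr runs locally on the default port.
SOLR_URL = "http://localhost:8983/solr"

def build_update_request(core, docs, commit=True):
    """Prepare (but do not send) a POST that indexes a list of documents."""
    url = f"{SOLR_URL}/{core}/update"
    if commit:
        url += "?commit=true"
    data = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send(request):
    """Send the request to a running Solr server and return the parsed reply."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)

request = build_update_request("Solr_sample_core", [{"id": "1001", "Emp_name": "Deepak"}])
print(request.full_url)
```

Calling `send(request)` requires a running Solr instance with the Solr_sample_core core; building the request does not.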
Document addition using Apache Solr Post Command
The Apache Solr Post command adds documents in formats such as XML, CSV, and JSON to the index. We run the Post command from the bin directory of Apache Solr.
Let us see the Apache Solr Post command with the below example.
We have a book.csv file that holds the details of each book, such as its name, price, author, and availability. The file is saved in the ~/hadoop/solr-8.8.1/bin directory.
BOOK_ID, CAT, BOOK_NAME, PRICE, BOOK_AVAILABILITY, BOOK_AUTHOR
0453573403, book, A Game of Thrones, 8.99, true, George R.R. Martin
0353579908, book, A Clash of Kings, 6.99, true, George R.R. Martin
0643573429, book, A Storm of Swords, 9.99, true, George R.R. Martin
0553293354, book, Foundation, 8.99, true, Isaac Asimov
0812521390, book, The Black Company, 7.99, false, Glen Cook
0814564706, book, Ender's Game, 8.99, true, Orson Scott Card
0443458532, book, Jhereg, 9.95, false, Steven Brust
0383459300, book, Nine Princes In Amber, 7.99, true, Roger Zelazny
0805875481, book, The Book of Three, 8.99, true, Lloyd Alexander
0805677499, book, The Black Cauldron, 7.99, true, Lloyd Alexander
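To picture how a file like this turns into documents, here is a small sketch that reads a CSV with a header row and produces one dictionary per row, with the header names as field names. This mirrors the general idea of CSV indexing and is not Solr's own code. The `skipinitialspace=True` option is a choice of this sketch so that the space after each comma is not kept in the names and values.

```python
import csv
import io

# Two rows copied from book.csv, kept short for the example.
SAMPLE = """BOOK_ID, CAT, BOOK_NAME, PRICE, BOOK_AVAILABILITY, BOOK_AUTHOR
0553293354, book, Foundation, 8.99, true, Isaac Asimov
0812521390, book, The Black Company, 7.99, false, Glen Cook
"""

def csv_to_docs(text):
    """Turn CSV text with a header row into a list of field/value dicts."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [dict(row) for row in reader]

docs = csv_to_docs(SAMPLE)
for doc in docs:
    print(doc["BOOK_ID"], doc["BOOK_NAME"], doc["PRICE"])
```

All values come back as strings here; in Solr the field types are decided by the schema of the core.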
We can use the nano editor to create the book.csv file; press CTRL+O to save the file and CTRL+X to exit from the editor.
cloudduggu@ubuntu:~/hadoop/solr-8.8.1/bin$ nano book.csv
Now let us see the Apache Solr Post command to index the book.csv file in the Solr_sample_core core.
Command:
cloudduggu@ubuntu:~/hadoop/solr-8.8.1/bin$ ./post -c Solr_sample_core book.csv
Output:
Once the Post command is executed the below output will be generated.
We can verify that the documents were indexed by visiting the Apache Solr web interface at http://localhost:8983/. Select the core name (Solr_sample_core) and click on the Query option. Leave every field at its default and click on the Execute Query button. Once the query is executed, we will see the indexed data, which is shown in JSON format by default.
Note: We have used the IP of the server on which Solr is installed, "http://192.168.216.131:8983/". You can use your own server IP or localhost to open the Solr web interface.
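The same check can be sketched from code. The example below builds the URL of a match-all query (`*:*`) against the core's /select endpoint and reads the document count from a response. The function names and the `rows` value are assumptions of this sketch, and the sample response is a hand-written illustration of the response shape, not real output.

```python
from urllib.parse import urlencode

def build_select_url(core, query="*:*", rows=10,
                     base_url="http://localhost:8983/solr"):
    """Build the URL of a query against the given core."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url}/{core}/select?{params}"

def count_matches(response):
    """Return how many documents matched, given a parsed JSON response."""
    return response["response"]["numFound"]

print(build_select_url("Solr_sample_core"))

# Hypothetical, hand-written response used only to show the shape.
sample_response = {"responseHeader": {"status": 0},
                   "response": {"numFound": 10, "start": 0, "docs": []}}
print(count_matches(sample_response))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, returns the matching documents from a running Solr server.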
Document addition using Apache Solr Web Interface
We can add documents to the Apache Solr index by opening the Solr web interface.
Let us see this in the following example.
Open http://localhost:8983/, choose the core that we have already created, "Solr_sample_core", and click on Documents. The below window will be opened.
We have the below JSON records that we want to add to Solr for indexing.
[
  {
    "Emp_id" : "1001",
    "Emp_name" : "Deepak",
    "EMP_age" : 24,
    "Emp_Designation" : "System Engineer",
    "Emp_Work_Location" : "Delhi"
  },
  {
    "Emp_id" : "1002",
    "Emp_name" : "Ankit",
    "EMP_age" : 28,
    "Emp_Designation" : "Consultant",
    "Emp_Work_Location" : "Deoria"
  },
  {
    "Emp_id" : "1003",
    "Emp_name" : "Kanheya",
    "EMP_age" : 38,
    "Emp_Designation" : "Manager",
    "Emp_Work_Location" : "Odisha"
  },
  {
    "Emp_id" : "1004",
    "Emp_name" : "Sarvesh",
    "EMP_age" : 32,
    "Emp_Designation" : "Developer",
    "Emp_Work_Location" : "Lucknow"
  },
  {
    "Emp_id" : "1005",
    "Emp_name" : "Thousif",
    "EMP_age" : 30,
    "Emp_Designation" : "DBA",
    "Emp_Work_Location" : "Hyderabad"
  }
]
Now select the document type JSON in the Solr web interface and paste the JSON records into the document box on the Documents screen. Leave the other options, such as Commit Within and Overwrite, as they are, and click on the Submit Document button.
After the document is submitted successfully, we will see a success status on the right side.
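Before pasting records into the web interface, it can help to check that they are valid JSON, since a stray trailing comma is enough to make a payload unparsable. The sketch below does that check and also shows how a success reply can be recognized from code, where a `status` of 0 in `responseHeader` means the update was accepted. Both function names are hypothetical.

```python
import json

def validate_documents(text):
    """Parse the payload and return it as a list of documents.

    Raises json.JSONDecodeError (a ValueError) if the text is not valid JSON.
    """
    docs = json.loads(text)
    if isinstance(docs, dict):
        docs = [docs]  # a single object is treated as one document
    if not isinstance(docs, list) or not all(isinstance(d, dict) for d in docs):
        raise ValueError("expected a JSON object or an array of objects")
    return docs

def is_success(response):
    """True if a parsed Solr update reply reports status 0."""
    return response.get("responseHeader", {}).get("status") == 0

good = '[{"Emp_id": "1001", "Emp_name": "Deepak"}]'
print(validate_documents(good))

try:
    validate_documents('{"Emp_id": "1001",}')  # trailing comma
except ValueError as err:
    print("invalid payload:", err)
```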