Why Solr Search?
Apache Solr is a free, open source search platform built on top of the Lucene indexing and search library. It can run standalone on a computer as small as a Raspberry Pi, yet it can also scale up to enterprise search across a cluster of computers. It is written in Java, so it runs on most operating systems, and some sources estimate that more than two-thirds of enterprise search deployments currently run on Solr.
Included here are step-by-step instructions for installing and configuring a single-node Solr server. Although there is no shortage of articles on the Internet that will tell you how to install Solr, I am including a few additional steps that are usually scattered across separate articles, to provide one-stop shopping for building and testing a fully functional Solr installation, including:
- Configuring SolrCloud (out of the box, Solr installs in standalone mode)
- Basic Authentication (out of the box, Solr provides no user-level security)
- Indexing Rich Documents (e.g., Word, PDF, HTML) using Solr Cell & Tika
While these instructions were created on a Raspberry Pi 4B with 8 GB RAM, running Bullseye / Debian 11 64-bit, there are many reports of running Solr on less powerful Raspberry Pis (RPi 3Bs and even 2Bs) with less memory (1 GB), as well as (of course) on much more powerful computers. These instructions should apply as-is to any computer running Debian Linux on supported hardware. They have also been used on CentOS / Rocky Linux 8, with the only difference being the package manager (yum vs. apt-get) used to install Java.
While these instructions configure Solr for SolrCloud rather than standalone mode (which unlocks additional features and permits expansion to a cluster of Solr computers), the result here is still a single instance of Solr. When you are ready to expand to multiple Solr Search nodes, please see Part 2 for creating multiple nodes with fail-over using ZooKeeper.
Install Java
As of this writing, the latest version of Solr (9.2) requires Java 11 or later. Although there are later versions of Java, I suggest Java 11 because it has been around for a while and has Long Term Support (LTS).
$ sudo apt-get install openjdk-11-jdk (Debian / Raspberry Pi OS)
$ sudo yum install java-11-openjdk (CentOS / Rocky)
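To confirm that a suitable JDK is now on the path, check the reported version; the build string will vary by distribution, but the major version should be 11 or later:
$ java -version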
Install Solr 9.2 (with Lucene 9.4)
1. Download the latest version of Solr from the Apache Solr downloads page, e.g.:
https://solr.apache.org/downloads.html
$ curl -L https://www.apache.org/dyn/closer.lua/solr/solr/9.2.1/solr-9.2.1.tgz?action=download -o solr-9.2.1.tgz
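The downloads page also publishes a SHA-512 checksum for each release. As an optional integrity check before installing, compute the checksum of the archive and compare it to the published value (this assumes the sha512sum utility from coreutils is available):
$ sha512sum solr-9.2.1.tgz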
2. Extract the Solr installation script from the download file to the current directory, e.g.:
$ tar xvzf solr-9.2.1.tgz solr-9.2.1/bin/install_solr_service.sh --strip-components=2
3. Run the installation script as root, and enable the Solr service at startup
$ sudo ./install_solr_service.sh solr-9.2.1.tgz
id: ‘solr’: no such user
Creating new user: solr
Adding system user `solr' (UID 117) ...
Adding new group `solr' (GID 126) ...
Adding new user `solr' (UID 117) with group `solr' ...
Creating home directory `/var/solr' ...
Extracting solr-9.2.1.tgz to /opt
Installing symlink /opt/solr -> /opt/solr-9.2.1 ...
Installing /etc/init.d/solr script ...
Installing /etc/default/solr.in.sh ...
Service solr installed.
Customize Solr startup configuration in /etc/default/solr.in.sh
● solr.service - LSB: Controls Apache Solr as a Service
Loaded: loaded (/etc/init.d/solr; generated)
Active: active (exited) since Fri 2023-07-07 08:41:17 EDT; 5s ago
Docs: man:systemd-sysv-generator(8)
Process: 8130 ExecStart=/etc/init.d/solr start (code=exited, status=0/SUCCESS)
CPU: 23ms
Jul 07 08:41:07 dmz systemd[1]: Starting LSB: Controls Apache Solr as a Service...
Jul 07 08:41:07 dmz su[8132]: (to solr) root on none
Jul 07 08:41:07 dmz su[8132]: pam_unix(su-l:session): session opened for user solr(uid=117) by (uid=0)
Jul 07 08:41:17 dmz systemd[1]: Started LSB: Controls Apache Solr as a Service.
$ sudo systemctl enable solr
4. If you are running headless or will be accessing Solr from another computer on your network, configure Solr to listen on all network interfaces, and then restart Solr
Note: This step is unnecessary if you use and manage Solr solely from the computer on which it is installed; otherwise, be aware that until security is enabled, this opens up your Solr installation to anyone on your network.
$ sudo vi /etc/default/solr.in.sh
find the line:
SOLR_JETTY_HOST="127.0.0.1"
and change it to:
SOLR_JETTY_HOST="0.0.0.0"
$ sudo systemctl restart solr
5. Test Solr by connecting to the administrative console (replace localhost with the server name if using a browser on a different computer)
http://localhost:8983
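If you are working from the command line (for example on a headless server), Solr's system information API provides an equivalent check; a JSON response reporting the Solr and Lucene versions indicates the service is up:
$ curl http://localhost:8983/solr/admin/info/system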
Enable SolrCloud Mode
Although the default installation starts Solr in standalone mode, enabling SolrCloud provides greater capabilities: indexes can be distributed across multiple servers, configuration information is shared across servers (via ZooKeeper), and some additional management APIs become available.
6. To enable SolrCloud mode:
$ sudo vi /etc/init.d/solr
find the line:
SOLR_CMD="$1"
and change it to:
SOLR_CMD="$1 -c"
$ sudo systemctl restart solr
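To confirm that Solr is now running in SolrCloud mode with its embedded ZooKeeper (which listens on the Solr port + 1000, i.e., 9983), check the status output; in cloud mode it includes a cloud section referencing the ZooKeeper host:
$ /opt/solr/bin/solr status
The Admin UI will also now show a Cloud entry in its left-hand menu.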
Create a New Configuration Set with Solr Cell + Tika
Current versions of Solr ship with Solr Cell (originally the Solr Content Extraction Library) and Apache Tika, which extract text and metadata from rich document formats and prepare it for indexing. This currently includes the following formats:
xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
The default (_default) configuration set does not support this out of the box, but a new configuration set that supports it can be created. In SolrCloud mode, the new configuration must then also be uploaded to ZooKeeper (vs. standalone mode, where configuration sets are stored on the filesystem).
7. Copy the default configuration set (_default) to a new configuration set (solr_cell_config), and upload the new configuration set to ZooKeeper
$ cd /opt/solr/server/solr/configsets
$ sudo cp -r _default solr_cell_config
$ sudo vi solr_cell_config/conf/solrconfig.xml
and add to the end of the file (before the closing </config>)
<lib dir="${solr.install.dir:../../..}/modules/extraction/lib" regex=".*\.jar" />
<requestHandler name="/update/extract"
startup="lazy"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<str name="lowernames">true</str>
<str name="fmap.content">_text_</str>
</lst>
</requestHandler>
$ /opt/solr/bin/solr zk upconfig -n solr_cell_config -z localhost:9983 -d /opt/solr/server/solr/configsets/solr_cell_config
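To verify that the upload succeeded, list the configuration sets stored in ZooKeeper (Solr keeps them under the /configs znode); solr_cell_config should appear in the output:
$ /opt/solr/bin/solr zk ls /configs -z localhost:9983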
Create a Solr Collection (Index) with Example Rich Format Documents
Although most applications will use the Solr API (or the Lucene API) to load documents into a collection (index), Solr includes an easy-to-use post program for indexing documents from the command line.
This section verifies that SolrCloud + Solr Cell + Tika are all working correctly.
8. Create a new example collection, with two (2) shards, using the new configuration set
$ sudo su - solr
$ cd /opt/solr
$ bin/solr create -c example -n solr_cell_config -s 2
WARN - 2023-07-11 13:38:44.283; org.apache.solr.common.cloud.SolrZkClient; Using default ...
WARN - 2023-07-11 13:38:44.548; org.apache.solr.common.cloud.SolrZkClient; Using default ....
Re-using existing configuration directory solr_cell_config
Created collection 'example' with 2 shard(s), 1 replica(s) with config-set 'solr_cell_config'
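As an additional check, the Collections API will confirm that the collection exists (no credentials are needed yet, since authentication is enabled later):
$ curl "http://localhost:8983/solr/admin/collections?action=LIST"
The JSON response should list example among the collections.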
9. While in the same directory, index the example rich documents that have been provided
$ bin/post -c example example/exampledocs/*
java -classpath /opt/solr/server/solr-webapp/webapp/WEB-INF/lib/solr-core-9.2.1.jar ...
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/example/update...
Entering auto mode. File endings considered are xml,json,...,rtf,htm,html,txt,log
POSTing file books.csv (text/csv) to [base]
POSTing file books.json (application/json) to [base]/json/docs
POSTing file gb18030-example.xml (application/xml) to [base]
... etc ...
POSTing file sample.html (text/html) to [base]/extract
POSTing file solr-word.pdf (application/pdf) to [base]/extract
... etc ...
21 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/example/update...
Time spent: 0:00:31.648
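The post tool is a convenience wrapper around Solr's HTTP APIs; rich documents can also be sent directly to the /update/extract handler configured earlier. The file path and literal.id value below are placeholders for illustration:
$ curl "http://localhost:8983/solr/example/update/extract?literal.id=mydoc1&commit=true" -F "myfile=@/path/to/document.pdf"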
10. Search the collection for a hardcover book:
$ curl http://localhost:8983/solr/example/select?q="cat:hardcover"
{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":41,
"params":{
"q":"cat:hardcover"}},
"response":{"numFound":1,"start":0,"maxScore":0.9945897,"numFoundExact":true,"docs":[
{
"id":"978-0641723445",
"cat":["book",
"hardcover"],
"name":["The Lightning Thief"],
"author":["Rick Riordan"],
"series_t":"Percy Jackson and the Olympians",
"sequence_i":1,
"genre_s":"fantasy",
"inStock":[true],
"price":[12.5],
"pages_i":384,
"_version_":1771148950850502656}]
}}
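Because the fmap.content setting maps the text Tika extracts into the _text_ field, a field-presence query against _text_ is a quick way to confirm that extraction populated the index (fl=id keeps the output short; for files posted through /update/extract, the post tool uses the file path as the document id):
$ curl "http://localhost:8983/solr/example/select?q=_text_:*&fl=id"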
Enable Basic Authentication
11. Create and upload a custom security profile with Basic Authentication and Rule Based Authorization:
Admin Username: solr
Admin Password: SolrRocksIn2023!
Create Roles: admin (assigned to solr) & reader (to assign to read-only users)
$ cd /opt/solr/server/solr/configsets/solr_cell_config/conf
$ sudo vi security.json
{
"authentication":{
"blockUnknown":true,
"class":"solr.BasicAuthPlugin",
"credentials":{
"solr":"2SCRGOEHbHa3RSeUk0/YsK9zzHoDNZQ0E4YJmWgQxqA= GAZ0iKzIaHkAgGsIaKFb9AAc6w13gSShh+5dgnCOyBw="},
"":{"v":262}},
"authorization":{
"class":"solr.RuleBasedAuthorizationPlugin",
"permissions":[
{
"name":"security-edit",
"role":["admin"],
"index":1},
{
"name":"security-read",
"role":[
"admin",
"reader"],
"index":2},
{
"name":"config-edit",
"role":["admin"],
"index":3},
{
"name":"config-read",
"role":[
"admin",
"reader"],
"index":4},
{
"name":"collection-admin-edit",
"role":["admin"],
"index":5},
{
"name":"collection-admin-read",
"role":[
"admin",
"reader"],
"index":6},
{
"name":"core-admin-edit",
"role":["admin"],
"index":7},
{
"name":"core-admin-read",
"role":[
"admin",
"reader"],
"index":8},
{
"name":"read",
"role":["reader"],
"index":9},
{
"name":"schema-read",
"role":["reader"],
"index":10},
{
"name":"metrics-read",
"role":["reader"],
"index":11},
{
"name":"filestore-read",
"role":["reader"],
"index":12},
{
"name":"package-read",
"role":["reader"],
"index":13},
{
"name":"health",
"role":["reader"],
"index":14}],
"user-role":{
"solr":[
"admin",
"reader"]},
"":{"v":264}}}
$ /opt/solr/bin/solr zk cp file:security.json zk:/security.json -z localhost:9983
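Solr watches the security.json znode, so the new settings should take effect without a restart (if they do not appear to, restart the solr service). Once active, anonymous requests are rejected with an HTTP 401, while authenticated requests succeed:
$ curl "http://localhost:8983/solr/example/select?q=*:*&rows=0"
$ curl -u solr:'SolrRocksIn2023!' "http://localhost:8983/solr/example/select?q=*:*&rows=0"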
For instructions on creating multiple Solr nodes with configuration management and fail-over using ZooKeeper, see Part 2: