Why a ZooKeeper Cluster?
Apache ZooKeeper is a free, open-source solution for managing a cluster of similar computer nodes. It manages both the availability and the configuration of the applications it coordinates. Although it has many uses (it was originally a sub-project of the Apache Hadoop framework for distributed computing), the use case of interest here is solely managing a cluster of Solr Search nodes.
Although later versions of Solr come bundled with a ZooKeeper instance, that is primarily for testing and experimenting with SolrCloud on a single node. For a reliable production implementation, including load balancing and fail-over, it is recommended that an external (not bundled) installation of ZooKeeper be used.
Managing multiple Solr Search nodes with ZooKeeper keeps their configurations consistent (ZooKeeper maintains the configuration files) and ensures that one or more nodes (depending on the configuration) can fail while the remaining nodes continue to function.
Solr + ZooKeeper Nodes
For reliability, a minimum of three (3) nodes is required. That configuration allows any one node to become unavailable while the other two continue to function. An odd number of nodes is typically recommended, because a functional ZooKeeper cluster requires more than 50% of the configured nodes to be operational; adding a fourth node, for example, gains no failure tolerance over three. So, for example (see the arithmetic sketch after this list):
- 2 nodes can sustain zero failures
- 3 nodes can sustain one failure
- 4 nodes can sustain one failure
- 5 nodes can sustain two failures
- 6 nodes can sustain two failures
- 7 nodes can sustain three failures
... etc ...
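These tolerances are simple majority-quorum arithmetic: an ensemble of N nodes needs floor(N/2) + 1 nodes up to function, so it can lose ceil(N/2) - 1. A quick shell sketch (illustrative only) that reproduces the list above:
$ for N in 2 3 4 5 6 7; do echo "$N nodes: quorum=$(( N/2 + 1 )), tolerates=$(( (N-1)/2 ))"; done
2 nodes: quorum=2, tolerates=0
3 nodes: quorum=2, tolerates=1
4 nodes: quorum=3, tolerates=1
5 nodes: quorum=3, tolerates=2
6 nodes: quorum=4, tolerates=2
7 nodes: quorum=4, tolerates=3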
For maximum reliability, every Solr Search node should have one ZooKeeper node. For the example configuration provided herein, it is assumed that three (3) Solr Search nodes have been installed, with the following IP addresses:
10.0.2.125 <= Solr + ZooKeeper Node 1
10.0.2.126 <= Solr + ZooKeeper Node 2
10.0.2.127 <= Solr + ZooKeeper Node 3
These should be adjusted to match your network's IP addresses. It is also assumed that each node runs on a separate computer, each with a Solr Search installation and a ZooKeeper installation. While it is possible to install more than one Solr node on a single computer, that type of configuration is not explicitly covered here.
While the rest of this document covers installing and configuring ZooKeeper in a three (3) node Solr Search cluster, please follow these instructions to install Solr Search on each node before continuing:
Install ZooKeeper 3.8 (repeat on each Solr Search node)
1. Download the latest stable Apache ZooKeeper binary release (apache-zookeeper-3.8.3-bin.tar.gz at the time of writing)
https://zookeeper.apache.org/releases.html
2. Copy to /opt and install
$ cd /opt
$ sudo tar xvf apache-zookeeper-3.8.3-bin.tar.gz
$ sudo ln -s apache-zookeeper-3.8.3-bin zookeeper
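If you prefer to fetch the archive from the command line, it is worth verifying it against the published checksum; a sketch, assuming wget and the standard Apache download path (older releases move to archive.apache.org, so confirm the URL on the releases page):
$ cd /opt
$ sudo wget https://downloads.apache.org/zookeeper/zookeeper-3.8.3/apache-zookeeper-3.8.3-bin.tar.gz
$ wget -qO- https://downloads.apache.org/zookeeper/zookeeper-3.8.3/apache-zookeeper-3.8.3-bin.tar.gz.sha512 | sha512sum -c -
apache-zookeeper-3.8.3-bin.tar.gz: OK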
3. Create a ZooKeeper user and change ownership
$ sudo useradd zk -m
$ sudo passwd zk
$ sudo chown -R zk:zk /opt/apache-zookeeper-3.8.3-bin
4. Create the ZooKeeper configuration (initially leaving the other servers commented out)
$ sudo su - zk
$ cd /opt/zookeeper/conf
$ cp zoo_sample.cfg zoo.cfg
$ vi zoo.cfg and change or verify:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/zk/data
clientPort=2181
maxClientCnxns=60
#server.1=10.0.2.125:2888:3888
#server.2=10.0.2.126:2888:3888
#server.3=10.0.2.127:2888:3888
4lw.commands.whitelist=mntr,conf,ruok
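For reference: tickTime is ZooKeeper's base time unit in milliseconds, and initLimit and syncLimit are measured in ticks (how long followers may take to connect to, and stay in sync with, the leader). In each server.N line the first port (2888) is the quorum port used for follower-to-leader traffic and the second (3888) is the leader-election port. The 4lw.commands.whitelist line enables the mntr, conf, and ruok "four letter word" monitoring commands used later in this document.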
5. Start ZooKeeper
$ sudo su - zk
$ cd /opt/zookeeper
$ bin/zkServer.sh start
/usr/bin/java
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
6. Verify ZooKeeper (by connecting to it; enter the quit command to exit)
$ bin/zkCli.sh -server localhost:2181
/usr/bin/java
Connecting to localhost:2181
...
[zk: localhost:2181(CONNECTED) 1] quit
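Because ruok is whitelisted in zoo.cfg, the node can also be health-checked without the CLI; for example, assuming netcat (nc) is installed:
$ echo ruok | nc localhost 2181
imok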
7. Stop ZooKeeper
$ bin/zkServer.sh stop
/usr/bin/java
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
8. Create a ZooKeeper id file (while still logged in as the zk user from step #5)
$ mkdir ~zk/data
$ echo N > ~zk/data/myid
where N is replaced with:
1 on node 10.0.2.125
2 on node 10.0.2.126
3 on node 10.0.2.127
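For example, on node 10.0.2.125:
$ echo 1 > ~zk/data/myid
$ cat ~zk/data/myid
1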
9. Create a ZooKeeper systemd service file (exit the zk shell first and work as your regular, sudo-capable user)
$ vi zookeeper.service
[Unit]
Description=Zookeeper
Requires=network.target
After=network.target
[Service]
Type=forking
WorkingDirectory=/opt/zookeeper
User=zk
Group=zk
ExecStart=/opt/zookeeper/bin/zkServer.sh start /opt/zookeeper/conf/zoo.cfg
ExecStop=/opt/zookeeper/bin/zkServer.sh stop /opt/zookeeper/conf/zoo.cfg
ExecReload=/opt/zookeeper/bin/zkServer.sh restart /opt/zookeeper/conf/zoo.cfg
TimeoutSec=30
Restart=on-failure
[Install]
WantedBy=default.target
10. Install the ZooKeeper service file
$ sudo cp zookeeper.service /usr/lib/systemd/system
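Then have systemd reload its unit definitions so the new service is recognized:
$ sudo systemctl daemon-reload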
11. Open the network ports between the servers (example shown for RedHat / CentOS firewalld)
Note: Not required if you are not running a firewall (e.g., on a Raspberry Pi)
$ sudo firewall-cmd --zone=trusted --add-source=10.0.2.125/32 --permanent
$ sudo firewall-cmd --zone=trusted --add-source=10.0.2.126/32 --permanent
$ sudo firewall-cmd --zone=trusted --add-source=10.0.2.127/32 --permanent
$ sudo firewall-cmd --reload
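If your distribution uses ufw instead of firewalld (e.g., Ubuntu), the equivalent would be along these lines (a sketch; adjust to your own firewall policy):
$ sudo ufw allow proto tcp from 10.0.2.125 to any port 2181,2888,3888
$ sudo ufw allow proto tcp from 10.0.2.126 to any port 2181,2888,3888
$ sudo ufw allow proto tcp from 10.0.2.127 to any port 2181,2888,3888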
Configure ZooKeeper (repeat on each Solr Search node, now that ZooKeeper is installed)
12. Edit the ZooKeeper configuration file and uncomment the servers by removing the leading #
$ vi /opt/zookeeper/conf/zoo.cfg and change:
#server.1=10.0.2.125:2888:3888
#server.2=10.0.2.126:2888:3888
#server.3=10.0.2.127:2888:3888
to:
server.1=10.0.2.125:2888:3888
server.2=10.0.2.126:2888:3888
server.3=10.0.2.127:2888:3888
13. Edit the Solr startup file to add the ZooKeeper node IP addresses
$ sudo vi /etc/init.d/solr
and change:
start|stop|restart|status)
SOLR_CMD="$1 -c"
;;
to:
start|stop|restart|status)
SOLR_CMD="$1 -c -p 8983 -z 10.0.2.125:2181,10.0.2.126:2181,10.0.2.127:2181"
;;
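Alternatively, depending on how Solr was installed, the connection string can usually be set in the Solr include file instead of the init script; assuming the service installer's default path of /etc/default/solr.in.sh:
ZK_HOST="10.0.2.125:2181,10.0.2.126:2181,10.0.2.127:2181"
When ZK_HOST is set, Solr starts in SolrCloud mode without needing the -c flag.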
14. Start ZooKeeper (and enable ZooKeeper on startup)
$ sudo systemctl start zookeeper
$ sudo systemctl enable zookeeper
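Once ZooKeeper is running on all three nodes, each one should report itself as a leader or follower (exactly one node should report Mode: leader):
$ /opt/zookeeper/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Mode: follower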
15. Restart Solr with the new ZooKeeper configuration
$ sudo systemctl restart solr
Confirm the ZooKeeper Cluster
16. Verify Solr + ZooKeeper Cluster by connecting to each administrative console
http://10.0.2.125:8983
http://10.0.2.126:8983
http://10.0.2.127:8983
and navigating to Cloud => ZK Status to verify the status, for example:
Status: green
ZK connection string: 10.0.2.125:2181,10.0.2.126:2181,10.0.2.127:2181
Ensemble size: 3
Ensemble mode: ensemble
Dynamic reconfig enabled: true
|                          | 10.0.2.125:2181 | 10.0.2.126:2181 | 10.0.2.127:2181 |
| ok                       | true            | true            | true            |
| clientPort               | 2181            | 2181            | 2181            |
| secureClientPort         | -1              | -1              | -1              |
| zk_server_state          | follower        | leader          | follower        |
| zk_version               | 3.8.3           | 3.8.3           | 3.8.3           |
| zk_approximate_data_size | 620756          | 620756          | 620756          |
| zk_znode_count           | 228             | 228             | 228             |
| zk_num_alive_connections | 2               | 2               | 2               |
| serverId                 | 1               | 2               | 3               |
| electionPort             | 3888            | 3888            | 3888            |
| quorumPort               | 2888            | 2888            | 2888            |
| role                     | follower        | leader          | follower        |
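The same metrics shown in the console table can also be pulled from the command line with the whitelisted mntr command (again assuming nc is installed), for example:
$ echo mntr | nc 10.0.2.125 2181
zk_version	3.8.3
zk_server_state	follower
zk_znode_count	228
...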