Setting up an Apache ZooKeeper cluster

What is ZooKeeper (very rough)

Apache ZooKeeper stores its data in a hierarchical structure, much like a file system tree. Each node in that tree is called a znode.

https://de.slideshare.net/sauravhaloi/introduction-to-apache-zookeeper

Install ZooKeeper

My environment

  • Ubuntu 20.04

Pre-installation

Install Java first.

sudo apt update
sudo apt upgrade -y
sudo apt install -y openjdk-11-jdk

Create a ZooKeeper user.

sudo useradd -r -s /bin/bash zk

Install ZooKeeper on nodes

Download link: https://zookeeper.apache.org/releases.html

sudo su
cd /opt
curl https://mirror.netcologne.de/apache.org/zookeeper/zookeeper-3.6.2/apache-zookeeper-3.6.2-bin.tar.gz -O
tar xvf apache-zookeeper-3.6.2-bin.tar.gz
mkdir /var/lib/zookeeper
cd apache-zookeeper-3.6.2-bin
cp conf/zoo_sample.cfg conf/zoo.cfg

Edit conf/zoo.cfg as follows.

# Set dataDir as follows
dataDir=/var/lib/zookeeper

# Add the following line
4lw.commands.whitelist=mntr,conf,ruok

Open TCP port 2181 for ZooKeeper clients (e.g. SolrCloud, Kafka).

Start ZooKeeper with ./bin/zkServer.sh start. If it works fine, stop it with ./bin/zkServer.sh stop.

## Start
/opt/apache-zookeeper-3.6.2-bin# ./bin/zkServer.sh start

## Check log
/opt/apache-zookeeper-3.6.2-bin# tail -f logs/zookeeper-root-server-{{ your_servers_hostname }}.out

## Stop
/opt/apache-zookeeper-3.6.2-bin# ./bin/zkServer.sh stop

Change ownership

chown -R zk:zk /opt/apache-zookeeper-3.6.2-bin/
chown -R zk:zk /var/lib/zookeeper

Register as a systemd service

# cat /etc/systemd/system/zk.service

[Unit]
Description=Zookeeper Daemon
Documentation=http://zookeeper.apache.org
Requires=network.target
After=network.target

[Service]
Type=forking
WorkingDirectory=/opt/apache-zookeeper-3.6.2-bin
User=zk
Group=zk
ExecStart=/opt/apache-zookeeper-3.6.2-bin/bin/zkServer.sh start /opt/apache-zookeeper-3.6.2-bin/conf/zoo.cfg
ExecStop=/opt/apache-zookeeper-3.6.2-bin/bin/zkServer.sh stop /opt/apache-zookeeper-3.6.2-bin/conf/zoo.cfg
ExecReload=/opt/apache-zookeeper-3.6.2-bin/bin/zkServer.sh restart /opt/apache-zookeeper-3.6.2-bin/conf/zoo.cfg
TimeoutSec=30
Restart=on-failure
SuccessExitStatus=143

[Install]
WantedBy=default.target

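After creating the unit file, run systemctl daemon-reload so that systemd picks it up. ZooKeeper can then be managed with systemctl, e.g. systemctl start zk.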

ZooKeeper Cluster

Add the following lines to the conf/zoo.cfg file.

maxClientCnxns=60
initLimit=10
syncLimit=5
server.1=your_zookeeper_node_1:2888:3888
server.2=your_zookeeper_node_2:2888:3888
server.3=your_zookeeper_node_3:2888:3888

Quote from the reference:

ZooKeeper nodes use a pair of ports, :2888 and :3888, for follower nodes to connect to the leader node and for leader election, respectively.

Open TCP ports 2888 and 3888 between all ZooKeeper cluster nodes (both inbound and outbound).
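If the nodes use ufw (the default firewall frontend on Ubuntu, but an assumption here since the post does not say which firewall is in use), the rules could look roughly like the sketch below. The helper name print_zk_peer_rules and the peer addresses are made up for illustration. The function only prints the commands so they can be reviewed before being run as root.

```shell
# Print (do not run) one ufw rule per peer that allows the peer to reach
# port 2888 (follower -> leader) and port 3888 (leader election) on this node.
# The addresses passed in are placeholders, not values from this post.
print_zk_peer_rules() {
  local peer
  for peer in "$@"; do
    echo "ufw allow proto tcp from ${peer} to any port 2888,3888"
  done
}

# Example: on node 1, list the addresses of node 2 and node 3.
print_zk_peer_rules 10.0.0.2 10.0.0.3
```

Run the printed lines on each node, listing the other two nodes as peers every time.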

Create a myid file under dataDir, in my case /var/lib/zookeeper.

$ cat /var/lib/zookeeper/myid
1

echo "1" > /var/lib/zookeeper/myid && chown zk:zk /var/lib/zookeeper/myid

On the second and third nodes, set the value in /var/lib/zookeeper/myid to 2 and 3 respectively. The value must match the N of that node's server.N line in conf/zoo.cfg.
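To avoid typing the wrong id on a node, the value can also be looked up from the server.N lines. This is only a sketch: myid_for_host is a hypothetical helper, and it assumes the host name you pass in is spelled exactly as it appears in zoo.cfg.

```shell
# Print the id N from the "server.N=host:2888:3888" line whose host matches.
# Usage: myid_for_host <path to zoo.cfg> <host name as written in zoo.cfg>
# Prints nothing if no server.N line matches.
myid_for_host() {
  awk -v host="$2" '
    /^server\.[0-9]+=/ {
      split($0, kv, "=")        # kv[1] = "server.N", kv[2] = "host:2888:3888"
      split(kv[2], addr, ":")   # addr[1] = host
      if (addr[1] == host) {
        sub(/^server\./, "", kv[1])
        print kv[1]
        exit
      }
    }' "$1"
}

# Example (as root on the second node):
#   myid_for_host /opt/apache-zookeeper-3.6.2-bin/conf/zoo.cfg your_zookeeper_node_2 > /var/lib/zookeeper/myid
```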

Start ZooKeeper on all nodes.

systemctl start zk

Check it.

@server3
bin/zkCli.sh -server {{ your_zookeeper_node_1 }}:2181

[zk: node1:2181(CONNECTED) 0] create /zk_znode_1 sample_data
Created /zk_znode_1
[zk: node1:2181(CONNECTED) 1] ls /
[zk_znode_1, zookeeper]
[zk: node1:2181(CONNECTED) 2] get /zk_znode_1
sample_data

You can run the same check from the other nodes.
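Because ruok and mntr were whitelisted in zoo.cfg, each node can also be probed without zkCli.sh. The sketch below assumes nc (netcat) is installed; zk_4lw and zk_server_state are hypothetical helper names, not part of ZooKeeper.

```shell
# Send one four-letter-word command to a ZooKeeper server and print the reply.
# $1 = command (ruok, mntr, conf), $2 = host, $3 = client port (default 2181).
# Assumes nc is available; -w 2 gives up after 2 seconds.
zk_4lw() {
  printf '%s' "$1" | nc -w 2 "$2" "${3:-2181}"
}

# Print the role the server reports in its mntr output,
# e.g. "leader", "follower" or "standalone".
zk_server_state() {
  zk_4lw mntr "$1" "${2:-2181}" | awk '$1 == "zk_server_state" { print $2 }'
}

# Example:
#   zk_4lw ruok your_zookeeper_node_1          # a healthy server answers "imok"
#   zk_server_state your_zookeeper_node_1
#   zk_server_state your_zookeeper_node_2
#   zk_server_state your_zookeeper_node_3
```

In a healthy three-node ensemble, one node should report leader and the other two follower.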

Notes

Good reference.

https://www.digitalocean.com/community/tutorials/how-to-install-and-configure-an-apache-zookeeper-cluster-on-ubuntu-18-04

memo

https://www.corejavaguru.com/blog/bigdata/why-zookeeper-on-odd-number-nodes

  • ZooKeeper follows a client-server model.
  • server:client = 1:many
  • server: a single ZooKeeper server (ZK) -> a group of servers (ensemble)
  • One of the servers is the leader.
    • The purpose of the leader is to order client requests that change the ZooKeeper state: create, setData, and delete.
  • A client sends pings to the server it is connected to -> if it gets no ack, it connects to another server.
  • By default the leader also accepts client connections; setting leaderServes=no makes it stop serving clients and focus on coordinating updates.
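On the odd-number question from the link above: an ensemble stays available only while a majority of its servers is up, so the arithmetic can be sketched as follows. The function names are made up for this example.

```shell
# Majority (quorum) needed in an ensemble of N servers: floor(N / 2) + 1.
zk_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of servers that can fail while a majority is still up.
zk_tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 4 5 6; do
  echo "servers=$n quorum=$(zk_quorum "$n") tolerated_failures=$(zk_tolerated_failures "$n")"
done
```

Going from 3 to 4 servers (or from 5 to 6) does not increase the number of tolerated failures, which is why odd ensemble sizes are the usual choice.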