SolrCloud - set up

Install Solr

Install environment

  • Ubuntu 20.04

Pre-installation

Linux settings

Create a user named solr.

sudo useradd -r -s /bin/bash solr

Raise the resource limits (open files and processes) for the solr user in /etc/security/limits.conf.

solr hard nofile 65535
solr soft nofile 65535
solr hard nproc 65535
solr soft nproc 65535

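Turn off swap. This lasts only until the next reboot; to make it permanent, also remove or comment out the swap entries in /etc/fstab.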

sudo swapoff -a
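
To confirm that swap is really off, the active swap areas listed in /proc/swaps can be counted. This is only a sketch: count_swap_devices is a hypothetical helper name, and it assumes the usual /proc/swaps layout with one header line.

```shell
#!/usr/bin/env bash
# count_swap_devices [FILE]: count active swap areas listed in FILE
# (default /proc/swaps). The first line of that file is a header.
count_swap_devices() {
  local file="${1:-/proc/swaps}"
  tail -n +2 "$file" | grep -c .
}

# Only report when the file exists (it does on Linux).
if [ -r /proc/swaps ]; then
  echo "active swap devices: $(count_swap_devices)"
fi
```

A count of 0 means no swap is in use.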

Log out and log back in so that the new limits take effect.
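
After logging back in as the solr user, the limits can be compared with the values written to limits.conf. A minimal sketch, assuming bash; check_limit is a hypothetical helper, not part of Solr, and 65535 is the value configured above.

```shell
#!/usr/bin/env bash
# check_limit NAME CURRENT WANTED: report whether a limit is high enough.
check_limit() {
  local name="$1" current="$2" wanted="$3"
  if [ "$current" = "unlimited" ] || [ "$current" -ge "$wanted" ] 2>/dev/null; then
    echo "$name ok ($current)"
  else
    echo "$name too low ($current < $wanted)"
  fi
}

# ulimit -n: open files (nofile); ulimit -u: max user processes (nproc).
check_limit "nofile" "$(ulimit -n)" 65535
check_limit "nproc" "$(ulimit -u)" 65535
```

If either line reports "too low", Solr will print the open file or max processes warning at startup.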

Install Java

Install Java.

sudo apt install -y openjdk-11-jdk

Install ZooKeeper

Install Apache ZooKeeper.

Install Solr 8.8.0

sudo su
apt update
apt upgrade -y
apt install lsof
cd /opt
curl https://apache.mirror.digionline.de/lucene/solr/8.8.0/solr-8.8.0.tgz -O
tar xzf solr-8.8.0.tgz solr-8.8.0/bin/install_solr_service.sh --strip-components=2
bash ./install_solr_service.sh solr-8.8.0.tgz
systemctl stop solr
chown -R solr /opt/solr-8.8.0
chgrp -R solr /opt/solr-8.8.0

Open port 8983 to the other nodes in the SolrCloud cluster and to your local client.

Run Solr (command line test)

You can start and stop Solr with systemctl (start|stop) solr, but it is worth checking that the command-line scripts work as well. As the solr user:

# Start
/opt/solr$ ./bin/solr start
# Stop
/opt/solr$ ./bin/solr stop

Now you can check Solr in a browser at http://{{ your_solr_server_domain }}:8983/solr/. So far, Solr is running as a single instance.
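
The same check can be done from a terminal with curl against Solr's system info endpoint (/solr/admin/info/system). A small sketch; the function names are hypothetical, and the host name is whatever your server is called.

```shell
#!/usr/bin/env bash
# system_info_url HOST [PORT]: URL of Solr's system info endpoint.
system_info_url() {
  local host="$1" port="${2:-8983}"
  echo "http://${host}:${port}/solr/admin/info/system?wt=json"
}

# solr_is_up HOST [PORT]: succeeds only if the endpoint answers without an HTTP error.
solr_is_up() {
  curl -sf -o /dev/null "$(system_info_url "$@")"
}
```

For example, `solr_is_up solrnode.com && echo "Solr is up"` prints the message only when the server responds.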

(Optional) Run SolrCloud demo

cd /opt/solr-8.8.0
bin/solr -e cloud 

Two Java processes are now running on different ports on the same server.

Start SolrCloud

Set up

On node1:

sudo su - solr
cd /opt/solr-8.8.0
mkdir -p example/cloud/node1/solr
cp server/solr/solr.xml example/cloud/node1/solr
./bin/solr start -cloud -s example/cloud/node1/solr -p 8983 -z localhost:2181

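example/cloud/node1/solr is an arbitrarily chosen directory that serves as the Solr home directory for this node. The -z option is the ZooKeeper address; localhost:2181 assumes ZooKeeper is reachable on the same host.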

On node2:

sudo su - solr
cd /opt/solr-8.8.0
mkdir -p example/cloud/node2/solr
cp server/solr/solr.xml example/cloud/node2/solr
./bin/solr start -cloud -s example/cloud/node2/solr -p 8983 -z localhost:2181

On node3:

sudo su - solr
cd /opt/solr-8.8.0
mkdir -p example/cloud/node3/solr
cp server/solr/solr.xml example/cloud/node3/solr
./bin/solr start -cloud -s example/cloud/node3/solr -p 8983 -z localhost:2181
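
Once all three nodes are started, the Collections API action CLUSTERSTATUS shows which nodes have joined the cluster. The sketch below only builds the URL and fetches it; the function names are hypothetical and solrnode1.com stands in for any node of the cluster.

```shell
#!/usr/bin/env bash
# cluster_status_url HOST [PORT]: Collections API URL for CLUSTERSTATUS.
cluster_status_url() {
  local host="$1" port="${2:-8983}"
  echo "http://${host}:${port}/solr/admin/collections?action=CLUSTERSTATUS&wt=json"
}

# show_cluster_status HOST [PORT]: print the raw JSON response.
show_cluster_status() {
  curl -s "$(cluster_status_url "$@")"
}
```

Running `show_cluster_status solrnode1.com` and looking at the live_nodes list in the response should show one entry per node if all of them registered with the same ZooKeeper.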

Upload a config set (not yet verified)

cd /opt/solr-8.8.0
chmod u+x server/scripts/cloud-scripts/zkcli.sh
./server/scripts/cloud-scripts/zkcli.sh -zkhost solrnode1.com -cmd upconfig -confname _default -confdir server/solr/configsets/_default/conf

Learn about Solr

doc.lucidworks.com/lucidworks-hdpsearch/2.5/Guide-Solr.html

Terminology: Cores, Collections & Nodes

There are several terms that are used to describe parts of a SolrCloud implementation, and it’s helpful to try to understand them early:

Core (my words: a kind of index with its own schema): A single Solr instance, which represents a single Solr index. A core has a different set of configuration files and schema definitions than other cores.

Document (my words): A kind of record. A document contains fields.

Collection (my words: the logical index of a SolrCloud cluster): A group of cores that together form a single logical index. A collection has a different set of configuration files and schema definitions than other collections. (My words: a single collection can be distributed.)

Shard: A logical section of a single collection.

Node: A Java Virtual Machine instance running Solr, commonly known as a server. Multiple cores can run on a node if you wish.

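To create a collection from the browser: open the Admin UI, go to Collections, and add a collection named sample_collection with config set _default, numShards 2 and replicationFactor 2; then open "Show advanced" and enter 2 there as well (probably maxShardsPerNode).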
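
The same collection could presumably be created without the browser through the Collections API action CREATE. This sketch assumes the memo's values mean numShards=2, replicationFactor=2 and maxShardsPerNode=2; create_collection_url is a hypothetical helper and solrnode1.com stands in for any node.

```shell
#!/usr/bin/env bash
# create_collection_url HOST NAME CONFIG SHARDS REPLICAS MAX_PER_NODE
# Builds the Collections API CREATE URL (parameter names as in Solr 8).
create_collection_url() {
  local host="$1" name="$2" config="$3" shards="$4" replicas="$5" max_per_node="$6"
  printf 'http://%s:8983/solr/admin/collections?action=CREATE' "$host"
  printf '&name=%s&collection.configName=%s' "$name" "$config"
  printf '&numShards=%s&replicationFactor=%s&maxShardsPerNode=%s\n' \
    "$shards" "$replicas" "$max_per_node"
}
```

For example, `curl -s "$(create_collection_url solrnode1.com sample_collection _default 2 2 2)"` sends the request.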

https://mkyong.com/solr/apache-solr-hello-world-example/

3.2 What is a Solr Core?
In Apache Solr, a Solr Core is also known as simply “Core”. A Core is an Index of texts and fields available in all documents. One Solr Instance can contain one or more Solr Cores. In other words, a Solr Core = an instance of Apache Lucene Index + Solr Configuration (solr.xml, solrconfig.xml etc.)

3.3 What is Indexing?
In Apache Lucene or Solr, Indexing is a technique of adding a Document’s content to the Solr Index so that we can search it easily. Apache Solr uses the Apache Lucene Inverted Index technique to index its documents. That’s why Solr provides a very fast searching feature.

3.4 What is a Document?
In Apache Solr, a Document is a group of fields and their values. Documents are the basic unit of data we store in Solr Cores. One core can contain one or more Documents.

3.5 What is a Field?
In Apache Solr, a Field is actual data stored in a Document. It is a key & value pair. Key indicates the field name and value contains that Field data. One Document can contain one or more Fields. Apache Solr uses this Field data to index the Document Content.

Good terminology reference: https://cwiki.apache.org/confluence/display/SOLR/SolrTerminology

Config Set: A set of config files necessary for a core to function properly. Each config set has a name. At minimum this will consist of solrconfig.xml (SolrConfigXml) and schema.xml (SchemaXml),

My memo: A document is returned as JSON, and its key-value pairs are fields.
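
To see documents and fields as JSON, a document can be posted to the /update handler and read back through /select. A rough sketch only: the function names and the field names id and title are made up for illustration, and make_doc does no JSON escaping.

```shell
#!/usr/bin/env bash
# make_doc ID TITLE: build a one-document JSON array for the /update handler.
# The values are inserted as-is, so they must not contain quotes.
make_doc() {
  printf '[{"id":"%s","title":"%s"}]\n' "$1" "$2"
}

# index_doc HOST COLLECTION JSON: send the documents and commit immediately.
index_doc() {
  curl -s -H 'Content-Type: application/json' \
    "http://$1:8983/solr/$2/update?commit=true" --data-binary "$3"
}

# select_all HOST COLLECTION: query for every document (q=*:*).
select_all() {
  curl -s -G "http://$1:8983/solr/$2/select" --data-urlencode 'q=*:*'
}
```

For example, `index_doc solrnode.com sample_collection "$(make_doc 1 hello)"` followed by `select_all solrnode.com sample_collection` should return the document with its fields as key-value pairs.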

Doesn’t work

Create a core:

./bin/solr create -c Solr_sample

  • https://cwiki.apache.org/confluence/display/SOLR/SolrTerminology
  • https://subscription.packtpub.com/book/big_data_and_business_intelligence/9781783553235/1/ch01lvl1sec10/the-solr-architecture-and-directory-structure
  • https://www.intra-mart.jp/document/library/iap/public/im_contents_search/solr_administrator_guide/texts/about/index.html

Memo: error

*** [WARN] *** Your open file limit is currently 1024.
 It should be set to 65000 to avoid operational disruption.
 If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
*** [WARN] ***  Your Max Processes Limit is currently 15537.
 It should be set to 65000 to avoid operational disruption.
 If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
Error: Could not find or load main class org.apache.solr.util.SolrCLI
Caused by: java.lang.ClassNotFoundException: org.apache.solr.util.SolrCLI

Backup collection (To be updated)

For standalone Solr (not SolrCloud), a core can be backed up by opening the following URL in any browser.

http://solrnode.com:8983/solr/{{ name_of_a_core }}/replication?command=backup

You should get a response like the one below.

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1093</int>
  </lst>
  <str name="status">OK</str>
</response>

The backup is stored at a path like {{ path_to_solr_instance }}/{{ name_of_a_core }}/data/snapshot.{{ datetime_info }}
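
For SolrCloud, the Collections API has a BACKUP action that takes the collection, a backup name and a location. I have not verified this on the cluster above, so treat it as a sketch: collection_backup_url is a hypothetical helper, and the location is assumed to be a directory that every node can reach (for example a shared mount).

```shell
#!/usr/bin/env bash
# collection_backup_url HOST COLLECTION BACKUP_NAME LOCATION
# Builds the Collections API BACKUP URL; LOCATION is a directory path.
collection_backup_url() {
  printf 'http://%s:8983/solr/admin/collections?action=BACKUP' "$1"
  printf '&collection=%s&name=%s&location=%s\n' "$2" "$3" "$4"
}
```

For example, `curl -s "$(collection_backup_url solrnode.com sample_collection backup1 /mnt/solr_backup)"` requests a backup; /mnt/solr_backup is a placeholder path.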

Memo: Use Solr

https://www.youtube.com/watch?v=Zw4M4NGv-Rw

  • JSON document store, like Elasticsearch (ES)
  • The schema is defined in schema.xml
    • in <field> tag