Elasticsearch - Getting started (Draft)
Install (for test)
I used the Docker image for testing.
https://www.elastic.co/guide/en/elasticsearch/reference/current/starting-elasticsearch.html
docker run \
-d --rm \
-p 9200:9200 \
-p 9300:9300 \
-e "discovery.type=single-node" \
docker.elastic.co/elasticsearch/elasticsearch:7.12.1
Check the status
$ curl localhost:9200
{
"name" : "cfd8f0ec08e2",
"cluster_name" : "docker-cluster",
"cluster_uuid" : "f5fi-57iSVWngx0slmijPA",
"version" : {
"number" : "7.12.1",
"build_flavor" : "default",
"build_type" : "docker",
"build_hash" : "3186837139b9c6b6d23c3200870651f10d3343b7",
"build_date" : "2021-04-20T20:56:39.040728659Z",
"build_snapshot" : false,
"lucene_version" : "8.8.0",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
Before setup - concepts
https://www.elastic.co/guide/en/elasticsearch/reference/current/add-elasticsearch-nodes.html
Cluster
When you start an instance of Elasticsearch, you are starting a node. An Elasticsearch cluster is a group of nodes that have the same cluster.name attribute. As nodes join or leave a cluster, the cluster automatically reorganizes itself to evenly distribute the data across the available nodes.
Yes, Elasticsearch stores data! And the data is distributed across the nodes.
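To see the cluster and its nodes on the test instance, the cluster health and cat nodes APIs can be queried. This is a minimal sketch, assuming the single-node container from the install step listens on localhost:9200. ES_URL and the function names are hypothetical helpers for this note, not part of Elasticsearch.

```shell
# Base URL of the test node; ES_URL is a hypothetical variable for this sketch.
ES_URL="${ES_URL:-http://localhost:9200}"

# Cluster name, status (green/yellow/red), and number of nodes.
es_cluster_health() {
  curl -s "$ES_URL/_cluster/health?pretty"
}

# One line per node that has joined the cluster.
es_list_nodes() {
  curl -s "$ES_URL/_cat/nodes?v"
}

# Usage (requires a running node):
#   es_cluster_health
#   es_list_nodes
```

On a single node the status is often yellow rather than green, because replica shards have no second node to be assigned to.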
X-Pack
https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-xpack.html
X-Pack is an Elastic Stack extension that provides security, alerting, monitoring, reporting, machine learning, and many other capabilities. By default, when you install Elasticsearch, X-Pack is installed.
Compliance with FIPS 140-2
If you know FIPS 140-2, this is a cool option.
Elasticsearch offers a FIPS 140-2 compliant mode and as such can run in a FIPS 140-2 enabled JVM. In order to set Elasticsearch in fips mode, you must set the xpack.security.fips_mode.enabled to true in elasticsearch.yml
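As a rough sketch of the setting quoted above, the line can be appended to elasticsearch.yml like this. The function name and the path in the usage line are assumptions; the path is where the official Docker image keeps its config. This only sets the flag: the JVM and the other security settings still have to meet the FIPS requirements described in the Elastic documentation.

```shell
# Append the FIPS setting to a given elasticsearch.yml.
# enable_fips_mode is a hypothetical helper; pass the config path as $1.
enable_fips_mode() {
  config_file="$1"
  printf '%s\n' 'xpack.security.fips_mode.enabled: true' >> "$config_file"
}

# Usage (then restart the node):
#   enable_fips_mode /usr/share/elasticsearch/config/elasticsearch.yml
```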
Elasticsearch doesn't have its own web UI
That is the role of Kibana.
Use Elasticsearch
Store data
A data item is called a “document” in Elasticsearch.
A document is in JSON format.
The following REST request, sent with cURL, stores a document with id 1 in the index customer with type _doc.
curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
"name": "John Doe"
}
'
## From file
$ cat es.txt
{
"name": "John Doe"
}
$ curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H "Content-Type: application/json" -d@es.txt
{
"_index" : "customer",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
## We can POST data without id
curl -X POST "localhost:9200/customer/_doc/" -H 'Content-Type: application/json' -d'
{
"time": "hoo2",
"value": "var2"
}
'
{"_index":"customer","_type":"_doc","_id":"Z-zkIHgBzujlMjfkth6e","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":1,"_primary_term":1}
Elasticsearch handles JSON data.
The pretty parameter makes Elasticsearch return pretty-formatted JSON.
Query data
By the document id (in this example, 1)
$ curl -X GET "localhost:9200/customer/_doc/1?pretty"
{
"_index" : "customer",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true,
"_source" : {
"name" : "John Doe"
}
}
Where to store?
https://www.elastic.co/de/blog/found-dive-into-elasticsearch-storage#elasticsearch-paths
Kibana
Visualize the data in Elasticsearch.
Run
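This should be done in a smarter way with a Docker network; --link is a legacy Docker feature. In the command below, 6328c3ae0c34 is the ID of the running Elasticsearch container.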
docker pull docker.elastic.co/kibana/kibana:7.17.0
docker run --link 6328c3ae0c34:elasticsearch -p 5601:5601 docker.elastic.co/kibana/kibana:7.17.0
Check localhost:5601/status.
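A user-defined Docker network lets Kibana reach Elasticsearch by container name instead of by container ID. The following is a minimal sketch, not tested here. The network name elastic, the container names, and the function name are arbitrary choices. Both images use the 7.17.0 tag, because Kibana expects a matching Elasticsearch version.

```shell
# Start Elasticsearch and Kibana on a shared user-defined network.
# Network and container names are arbitrary choices for this sketch.
start_es_and_kibana() {
  docker network create elastic

  docker run -d --rm --name elasticsearch --net elastic \
    -p 9200:9200 \
    -e "discovery.type=single-node" \
    docker.elastic.co/elasticsearch/elasticsearch:7.17.0

  # Kibana resolves the host name "elasticsearch" over the shared network.
  docker run -d --rm --name kibana --net elastic \
    -p 5601:5601 \
    -e "ELASTICSEARCH_HOSTS=http://elasticsearch:9200" \
    docker.elastic.co/kibana/kibana:7.17.0
}

# Usage (requires Docker):
#   start_es_and_kibana
```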
Use (memo)
Go to Discover:
- index pattern -> index (customer)
- create index pattern
- fields are returned.
Nginx log integration
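On the client side, Filebeat is required. On the Elasticsearch server side, Logstash is typically used, although Filebeat can also send events directly to Elasticsearch.
Configure this in the Filebeat config file.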
Instead of Logstash
Fluentd may work. This combination is called the EFK stack. Fluentd doesn't store logs.
Multiple Elasticsearch nodes, one Kibana?
Sure. https://www.elastic.co/guide/en/kibana/current/production.html#high-availability
Index, type, document, key
https://de.slideshare.net/NeilBaker18/elasticsearch-for-beginners
- Indices ~ Database
- Documents ~ Rows
- Keys ~ Columns
Operation snippets
Get a list of indices:
curl http://localhost:9200/_cat/indices
Check the size of all indices (listed per shard):
curl "http://localhost:9200/_cat/shards?v"
Get the first 2 documents from the index customer:
curl "http://localhost:9200/customer/_search?size=2"
Delete the index customer:
curl -X DELETE http://localhost:9200/customer
Delete the document with id=1 in the index customer:
curl -X DELETE "localhost:9200/customer/_doc/1?pretty"
Get documents from the index customer (change the value of the size parameter; the default is 10):
curl -X GET "localhost:9200/customer/_search/?size=10&pretty"
Count the number of documents in the index customer:
curl -X GET "localhost:9200/customer/_count"
Delete the whole index customer, including its data:
$ curl -X DELETE "localhost:9200/customer"
Search
Set data
curl -X PUT "localhost:9200/customer/_doc/2?pretty" -H 'Content-Type: application/json' -d'
{
"name": "John Doe2"
}
'
curl -X PUT "localhost:9200/customer/_doc/3?pretty" -H 'Content-Type: application/json' -d'
{
"name": "John Doe3"
}
'
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
$ curl -X GET "localhost:9200/customer/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"match": {
"name" : "John Doe2"
}
}
}
'
The result:
{
"took" : 887,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.1143606,
"hits" : [
{
"_index" : "customer",
"_type" : "mytype",
"_id" : "2",
"_score" : 1.1143606,
"_source" : {
"name" : "John Doe2"
}
},
{
"_index" : "customer",
"_type" : "mytype",
"_id" : "3",
"_score" : 0.13353139,
"_source" : {
"name" : "John Doe3"
}
},
{
"_index" : "customer",
"_type" : "mytype",
"_id" : "1",
"_score" : 0.13353139,
"_source" : {
"name" : "John Doe"
}
}
]
}
}
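Elasticsearch returns not only “John Doe2” but also “John Doe” and “John Doe3”, because match is a full-text query: the query text is analyzed into the terms john and doe2, and by default a document matches if it contains any of them.
The documents “John Doe3” and “John Doe” match only on john, so you can see much lower scores for them.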
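If only documents that contain every term should match, the match query accepts an operator parameter. This is a minimal sketch, not run against the data above. ES_URL and the function names are hypothetical helpers. With "operator": "and", both john and doe2 must be present, so only “John Doe2” would be expected to match.

```shell
# Base URL of the test node; ES_URL is a hypothetical variable for this sketch.
ES_URL="${ES_URL:-http://localhost:9200}"

# Request body: every analyzed term of the query text must match.
match_and_body() {
  cat <<'EOF'
{
  "query": {
    "match": {
      "name": {
        "query": "John Doe2",
        "operator": "and"
      }
    }
  }
}
EOF
}

# Send the body to the _search endpoint of the index customer.
search_match_and() {
  match_and_body |
    curl -s -X GET "$ES_URL/customer/_search?pretty" \
      -H 'Content-Type: application/json' -d @-
}

# Usage (requires a running node with the documents above):
#   search_match_and
```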
You can see how a text is split into terms with the _analyze API.
$ curl -X GET "localhost:9200/customer/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
"field": "name",
"text": "3"
}
'
{
"tokens" : [
{
"token" : "3",
"start_offset" : 0,
"end_offset" : 1,
"type" : "<NUM>",
"position" : 0
}
]
}
$ curl -X GET "localhost:9200/customer/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
"field": "name",
"text": "This is a test"
}
'
{
"tokens" : [
{
"token" : "this",
"start_offset" : 0,
"end_offset" : 4,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "is",
"start_offset" : 5,
"end_offset" : 7,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "a",
"start_offset" : 8,
"end_offset" : 9,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "test",
"start_offset" : 10,
"end_offset" : 14,
"type" : "<ALPHANUM>",
"position" : 3
}
]
}
Untested snippets
Match Berlin in the field address.city and return only the fields name and address.
curl -X GET "localhost:9200/customer/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"match": {
"address.city" : "Berlin"
}
},
"_source" : ["name","address"]
}
'
Input data with custom id?
I tried to put a document with an _id field in the body, as below:
{
"name": "John Doe",
"_id": "myid"
}
And Elasticsearch returned the error:
{
"error" : {
"root_cause" : [
{
"type" : "mapper_parsing_exception",
"reason" : "Field [_id] is a metadata field and cannot be added inside a document. Use the index API request parameters."
}
],
"type" : "mapper_parsing_exception",
"reason" : "failed to parse field [_id] of type [_id] in document with id '1'. Preview of field's value: 'myid'",
"caused_by" : {
"type" : "mapper_parsing_exception",
"reason" : "Field [_id] is a metadata field and cannot be added inside a document. Use the index API request parameters."
}
},
"status" : 400
}
As the error message says, _id is a metadata field and cannot be added inside a document. Pass the id in the request URL instead, for example PUT localhost:9200/customer/_doc/myid.
If your custom id contains special characters, such as forward slashes /, URL-encode them.
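Passing a custom id in the URL, with escaping, could look like the sketch below. es_escape_id and es_put_doc are hypothetical helpers, and the encoder is deliberately incomplete: it only handles the characters listed in its comment.

```shell
# Base URL of the test node; ES_URL is a hypothetical variable for this sketch.
ES_URL="${ES_URL:-http://localhost:9200}"

# Percent-encode a few characters that are unsafe in a URL path segment.
# Not a complete encoder: only %, /, space, ?, # and & are handled.
# The % rule comes first so that later replacements are not encoded twice.
es_escape_id() {
  printf '%s' "$1" | sed \
    -e 's/%/%25/g' \
    -e 's|/|%2F|g' \
    -e 's/ /%20/g' \
    -e 's/?/%3F/g' \
    -e 's/#/%23/g' \
    -e 's/&/%26/g'
}

# Index a document under a custom id passed in the URL, not in the body.
# $1 is the id, $2 is the JSON document.
es_put_doc() {
  curl -s -X PUT "$ES_URL/customer/_doc/$(es_escape_id "$1")?pretty" \
    -H 'Content-Type: application/json' -d "$2"
}

# Usage (requires a running node):
#   es_put_doc 'my/id' '{"name": "John Doe"}'
```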
Terminology
https://www.elastic.co/guide/en/elastic-stack-glossary/current/terms.html