Elasticsearch - Getting started (Draft)

Install (for test)

I used the Docker image for testing purposes.

https://www.elastic.co/guide/en/elasticsearch/reference/current/starting-elasticsearch.html

docker run \
  -d --rm \
  -p 9200:9200 \
  -p 9300:9300 \
  -e "discovery.type=single-node" \
  docker.elastic.co/elasticsearch/elasticsearch:7.12.1

Check the status

$ curl localhost:9200
{
  "name" : "cfd8f0ec08e2",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "f5fi-57iSVWngx0slmijPA",
  "version" : {
    "number" : "7.12.1",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "3186837139b9c6b6d23c3200870651f10d3343b7",
    "build_date" : "2021-04-20T20:56:39.040728659Z",
    "build_snapshot" : false,
    "lucene_version" : "8.8.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Before setup - concepts

https://www.elastic.co/guide/en/elasticsearch/reference/current/add-elasticsearch-nodes.html

Cluster

When you start an instance of Elasticsearch, you are starting a node. An Elasticsearch cluster is a group of nodes that have the same cluster.name attribute. As nodes join or leave a cluster, the cluster automatically reorganizes itself to evenly distribute the data across the available nodes.

Yes, Elasticsearch stores data! And the data is distributed across the nodes.

X-Pack

https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-xpack.html

X-Pack is an Elastic Stack extension that provides security, alerting, monitoring, reporting, machine learning, and many other capabilities. By default, when you install Elasticsearch, X-Pack is installed.

Compliance with FIPS 140-2

If you need FIPS 140-2 compliance, this is a useful option.

Elasticsearch offers a FIPS 140-2 compliant mode and as such can run in a FIPS 140-2 enabled JVM. In order to set Elasticsearch in FIPS mode, you must set xpack.security.fips_mode.enabled to true in elasticsearch.yml.
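The corresponding elasticsearch.yml fragment (minimal, just the setting named above):

```yaml
# Enable FIPS 140-2 mode (requires running on a FIPS-enabled JVM)
xpack.security.fips_mode.enabled: true
```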

Elasticsearch doesn't have its own web UI

That is the role of Kibana.

Use Elasticsearch

Store data

In Elasticsearch, a data item is called a “document”. A document is in JSON format. The following REST request via cURL stores a document with id 1 in the index customer with type _doc.

curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
  "name": "John Doe"
}
'

## From file 
$ cat es.txt
{
   "name": "John Doe"
}
$ curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H "Content-Type: application/json" -d@es.txt
{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

## We can POST data without an id (Elasticsearch generates one)
curl -X POST "localhost:9200/customer/_doc/" -H 'Content-Type: application/json' -d'
{
  "time":  "hoo2",
  "value": "var2"
}
'
{"_index":"customer","_type":"_doc","_id":"Z-zkIHgBzujlMjfkth6e","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":1,"_primary_term":1}

Elasticsearch handles JSON data. The pretty query parameter means “return pretty-formatted JSON.”

Query data

By document id (in this example, 1):

$ curl -X GET "localhost:9200/customer/_doc/1?pretty"
{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "John Doe"
  }
}

Where is the data stored?

https://www.elastic.co/de/blog/found-dive-into-elasticsearch-storage#elasticsearch-paths

Kibana

Kibana visualizes the data in Elasticsearch.

Run

This should be done in a smarter way with a Docker network, since --link is deprecated.

docker pull docker.elastic.co/kibana/kibana:7.17.0
docker run --link 6328c3ae0c34:elasticsearch -p 5601:5601 docker.elastic.co/kibana/kibana:7.17.0
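A sketch of the Docker-network approach mentioned above (the network and container names here are my own choices, and I pin both images to the same version, which Elastic recommends):

```shell
# Create a user-defined network so containers resolve each other by name
docker network create elastic

# Start Elasticsearch on that network with a known container name
docker run -d --rm --name elasticsearch --net elastic \
  -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  docker.elastic.co/elasticsearch/elasticsearch:7.17.0

# Kibana finds Elasticsearch by container name instead of --link
docker run -d --rm --name kibana --net elastic \
  -p 5601:5601 \
  -e "ELASTICSEARCH_HOSTS=http://elasticsearch:9200" \
  docker.elastic.co/kibana/kibana:7.17.0
```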

Check localhost:5601/status.

Use (memo)

Go to Discover:

  • index pattern -> index (customer)
  • create an index pattern
  • the fields are returned

Nginx log integration

On the client side, Filebeat is required. On the Elasticsearch server side, Logstash is required.

Configure the Filebeat nginx module as described here:

https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-module-nginx.html#configuring-nginx-module

Instead of Logstash

Fluentd may work instead; this combination is called the EFK stack. Fluentd itself doesn't store logs.

Multiple Elasticsearch nodes, one Kibana?

Sure. https://www.elastic.co/guide/en/kibana/current/production.html#high-availability

Index, type, document, key

https://de.slideshare.net/NeilBaker18/elasticsearch-for-beginners

  • Indices ~ Database
  • Documents ~ Rows
  • Keys ~ Columns

Operation snippets

Get a list of indices:

curl http://localhost:9200/_cat/indices

Check the shards of all indices (including their sizes):

curl http://localhost:9200/_cat/shards?v

Get the first 2 documents from the index customer:

curl "http://localhost:9200/customer/_search?size=2"

Delete the index customer:

curl -X DELETE http://localhost:9200/customer

Delete data (id=1) in customer index:

curl -X DELETE "localhost:9200/customer/_doc/1?pretty"

Get documents from the index customer (change the size parameter; default is 10):

curl -X GET "localhost:9200/customer/_search/?size=10&pretty" 

Count the number of documents in the index customer:

curl -X GET "localhost:9200/customer/_count"


Set data

curl -X PUT "localhost:9200/customer/_doc/2?pretty" -H 'Content-Type: application/json' -d'
{
  "name": "John Doe2"
}
'

curl -X PUT "localhost:9200/customer/_doc/3?pretty" -H 'Content-Type: application/json' -d'
{
  "name": "John Doe3"
}
'
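The two PUT requests above can also be combined into a single _bulk request. A minimal Python sketch that only builds the newline-delimited payload (index name and ids taken from the examples above):

```python
import json

# Documents to index, keyed by their ids (same data as the PUTs above)
docs = {"2": {"name": "John Doe2"}, "3": {"name": "John Doe3"}}

# The bulk format is NDJSON: an action line followed by a source line
lines = []
for doc_id, source in docs.items():
    lines.append(json.dumps({"index": {"_index": "customer", "_id": doc_id}}))
    lines.append(json.dumps(source))

# The _bulk body must end with a trailing newline
payload = "\n".join(lines) + "\n"
print(payload)
```

POST this payload to localhost:9200/_bulk with Content-Type: application/x-ndjson.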

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html

$ curl -X GET "localhost:9200/customer/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "name" : "John Doe2"
    }
  }
}
'

The result:

{
  "took" : 887,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.1143606,
    "hits" : [
      {
        "_index" : "customer",
        "_type" : "mytype",
        "_id" : "2",
        "_score" : 1.1143606,
        "_source" : {
          "name" : "John Doe2"
        }
      },
      {
        "_index" : "customer",
        "_type" : "mytype",
        "_id" : "3",
        "_score" : 0.13353139,
        "_source" : {
          "name" : "John Doe3"
        }
      },
      {
        "_index" : "customer",
        "_type" : "mytype",
        "_id" : "1",
        "_score" : 0.13353139,
        "_source" : {
          "name" : "John Doe"
        }
      }
    ]
  }
}

Elasticsearch returns not only “John Doe2” but also “John Doe” and “John Doe3”, because this is a full-text search. You can see the lower scores of the documents “John Doe” and “John Doe3”. You can inspect the details with _analyze.
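Why all three documents match can be sketched with the way the standard analyzer tokenizes text. This is a simplified imitation in Python, not the real Lucene analyzer:

```python
import re

def tokenize(text):
    # Rough approximation of the standard analyzer:
    # lowercase, then split on non-alphanumeric characters
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

query_tokens = tokenize("John Doe2")  # ['john', 'doe2']
for name in ["John Doe", "John Doe2", "John Doe3"]:
    shared = set(query_tokens) & set(tokenize(name))
    print(name, "->", tokenize(name), "shared:", sorted(shared))
```

All three documents share the token “john” with the query, so all of them are hits; only “John Doe2” also matches the token “doe2”, which is why it gets the highest score.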

$ curl -X GET "localhost:9200/customer/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "field": "name",
  "text": "3"
}
'
{
  "tokens" : [
    {
      "token" : "3",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<NUM>",
      "position" : 0
    }
  ]
}
$ curl -X GET "localhost:9200/customer/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "field": "name",
  "text": "This is a test"
}
'
{
  "tokens" : [
    {
      "token" : "this",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "is",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "a",
      "start_offset" : 8,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "test",
      "start_offset" : 10,
      "end_offset" : 14,
      "type" : "<ALPHANUM>",
      "position" : 3
    }
  ]
}

Not tested, but a snippet: match Berlin in the address.city field, and return only the fields name and address.

curl -X GET "localhost:9200/customer/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": { 
      "address.city" : "Berlin"
    }
  },
  "_source" : ["name","address"]
}
'

Input data with a custom id?

I tried to put the data as below:

{
  "name": "John Doe",
  "_id": "myid"
}

And Elasticsearch returned the error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "Field [_id] is a metadata field and cannot be added inside a document. Use the index API request parameters."
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "failed to parse field [_id] of type [_id] in document with id '1'. Preview of field's value: 'myid'",
    "caused_by" : {
      "type" : "mapper_parsing_exception",
      "reason" : "Field [_id] is a metadata field and cannot be added inside a document. Use the index API request parameters."
    }
  },
  "status" : 400
}

As the error message says, the field [_id] is a metadata field and cannot be added inside a document; use the index API request parameters instead, i.e., put the id in the URL, as in /customer/_doc/myid. If your custom id contains special characters, such as forward slashes /, URL-escape them.
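URL-escaping an id can be done like this in Python (the doc id here is my own illustrative example):

```python
from urllib.parse import quote

# A document id containing a forward slash must be percent-encoded,
# otherwise the slash is interpreted as a URL path separator.
doc_id = "logs/2021-04"            # hypothetical id for illustration
escaped = quote(doc_id, safe="")   # encodes '/' as %2F
url = f"localhost:9200/customer/_doc/{escaped}"
print(url)
```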

Terminology

https://www.elastic.co/guide/en/elastic-stack-glossary/current/terms.html