Elasticsearch Optimization or Elasticsearch Performance Tuning

How to make Elasticsearch Efficient

Running Elasticsearch can become costly when it consumes more of your infrastructure's resources than it needs. Tuning it lets you run it efficiently in terms of both cost and performance. This post walks you through the basic optimization of Elasticsearch. The following points are worth understanding first.

  • Indexes are split horizontally into shards.
  • Elasticsearch distributes the shards across different nodes.
  • A node is a Java process.
  • More than one node can run on the same machine.
  • Nodes join together automatically to form a cluster.
  • A shard's replica is kept on a different node than the shard itself, so that a single node failure does not lose data.

The following diagram illustrates these points.

Example of a cluster with 2 nodes holding one index with 4 shards and 1 replica shard
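To make the layout in the caption concrete, here is a minimal Python sketch that spreads 4 shards and 1 replica of each across 2 nodes. It is a toy round-robin placement written for illustration, not Elasticsearch's actual allocation algorithm; the function name place_shards and the node names are made up for this example. The only rule it enforces is the one from the list above: a replica never lands on the same node as its primary.

```python
from collections import defaultdict

def place_shards(num_primaries, num_replicas, nodes):
    """Toy round-robin placement; NOT Elasticsearch's real allocator."""
    if num_replicas >= len(nodes):
        raise ValueError("need more nodes than replicas to keep copies apart")
    layout = defaultdict(list)
    for shard in range(num_primaries):
        # The primary copy of each shard goes to one node...
        layout[nodes[shard % len(nodes)]].append(f"P{shard}")
        # ...and every replica copy goes to a different node.
        for r in range(1, num_replicas + 1):
            layout[nodes[(shard + r) % len(nodes)]].append(f"R{shard}")
    return dict(layout)

# Hypothetical node names matching the 2-node example in the caption.
layout = place_shards(4, 1, ["node-1", "node-2"])
for node, shards in layout.items():
    print(node, shards)
```

If either node is lost, the other one still holds one copy (primary or replica) of all four shards.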

There are three roles a node can play:
1. Master – A master node is responsible for managing the state of the cluster and the distribution of shards across it. If the master goes down, the cluster automatically starts an election process and chooses a new master.

2. Data – A data node is responsible for holding the indexes in shards. It performs indexing and executes search queries.

3. Client – A client node acts as the communication interface for incoming requests and distributes them to the data nodes. It then aggregates the results from all the shards and sends back the response.

Scalable Architecture

 Performance Tuning

# Set Heap Size

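ES_HEAP_SIZE should be set to half of the system's available memory, and no more than that: the other half is left to the operating system's file system cache, which Elasticsearch relies on heavily.
To set ES_HEAP_SIZE, edit the file /etc/default/elasticsearch, look for the text below, and set the value to half of your RAM (with 4GB of RAM, set it to 2GB).
The minimum and maximum heap sizes (ES_MIN_MEM and ES_MAX_MEM) should also be equal. These are environment variables rather than elasticsearch.yml settings, and setting ES_HEAP_SIZE assigns the same value to both.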
# Heap size defaults to 256m min, 1g max
# Set ES_HEAP_SIZE to 50% of available RAM, but no more than 31g
ES_HEAP_SIZE=2g
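The rule in the comments above (half of the RAM, but never more than 31g) can be written down as a small calculation. This is only a sketch in Python: the helper names are invented for this post, and it assumes you pass in the machine's total RAM in whole gigabytes.

```python
def recommended_heap_gb(total_ram_gb, cap_gb=31):
    """Half of the RAM, capped at cap_gb (31 comes from the config comment above)."""
    return min(int(total_ram_gb // 2), cap_gb)

def heap_setting_line(total_ram_gb):
    """Build the line to put in /etc/default/elasticsearch."""
    return f"ES_HEAP_SIZE={recommended_heap_gb(total_ram_gb)}g"

for ram in (4, 16, 128):
    print(ram, "GB RAM ->", heap_setting_line(ram))
```

For a 4GB machine this gives the same ES_HEAP_SIZE=2g as the example above, while a 128GB machine is capped at 31g instead of getting 64g.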

# Process should not get swapped

The Elasticsearch process should not be swapped out to disk, because swapping degrades the performance of reading and writing the indexes. To prevent it, set bootstrap.mlockall to true in /etc/elasticsearch/elasticsearch.yml.
Also set MAX_LOCKED_MEMORY=unlimited in the /etc/default/elasticsearch file.
To check whether mlockall is enabled, run the following command:
curl http://localhost:9200/_nodes/process?pretty
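The command returns JSON with one entry per node, and the mlockall value sits in each node's process section. As a rough sketch, the Python below picks that flag out for every node. The sample response is hypothetical and heavily trimmed (real output carries many more fields, and the node IDs and names here are made up), so treat the exact shape as an assumption and compare it with what your own cluster returns.

```python
import json

# Hypothetical, heavily trimmed response body; real output has more fields.
SAMPLE_RESPONSE = """
{
  "cluster_name": "elasticsearch",
  "nodes": {
    "node-id-1": {"name": "node-1", "process": {"mlockall": true}},
    "node-id-2": {"name": "node-2", "process": {"mlockall": false}}
  }
}
"""

def mlockall_by_node(response_text):
    """Map each node name to the mlockall value it reports."""
    nodes = json.loads(response_text).get("nodes", {})
    return {
        info.get("name", node_id): info.get("process", {}).get("mlockall", False)
        for node_id, info in nodes.items()
    }

status = mlockall_by_node(SAMPLE_RESPONSE)
unlocked = sorted(name for name, locked in status.items() if not locked)
print(status)
print("nodes without mlockall:", unlocked)
```

Any node listed as "without mlockall" is one where the memory lock did not take effect and the settings above should be rechecked.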

# Adjusting swappiness to 1

The process may be swapped to disk when memory runs short, so it is often recommended to turn swap off entirely. To disable swap temporarily, run
sudo swapoff -a
To disable it permanently, find the line in /etc/fstab that refers to swap and comment it out. Turning swap off completely is, however, not always a good idea: when memory runs out, the OOM killer may kill the Java process and shut Elasticsearch down. It is therefore advisable to set swappiness (which ranges from 0 to 100) to 1 instead. This makes the OS swap the process out to disk only in an emergency.
To set this to 1, add the following line to /etc/sysctl.conf
vm.swappiness = 1
To reload this file after changing it run the command
sudo sysctl -p
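If you want to double-check what a sysctl.conf file actually sets, a small parser is enough. The Python sketch below is only an illustration (the function name and the sample text are made up); it reads sysctl.conf-style text, skips comments, and returns the last vm.swappiness value it finds.

```python
def parse_swappiness(conf_text):
    """Return the last vm.swappiness value in sysctl.conf-style text, or None."""
    value = None
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments (sysctl.conf allows '#' and ';').
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "vm.swappiness":
            value = int(val.strip())
    return value

# Made-up sample content for illustration.
SAMPLE_CONF = """
# kernel tuning
net.ipv4.ip_forward = 0
#vm.swappiness = 60
vm.swappiness = 1
"""

print(parse_swappiness(SAMPLE_CONF))
```

To run it against the real file, pass in the contents of /etc/sysctl.conf instead of the sample text.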

# Java Pointers

On a 64-bit system, ordinary object pointers (OOPs) take 8 bytes each so that they can address the complete memory. The JVM can compress them to only 4 bytes, but only when the heap size is under 32GB.
Compressed OOPs are enabled by default from the JDK 6u23 release onward. If Elasticsearch runs on an older JDK, you should pass the -XX:+UseCompressedOops flag whenever the heap size is less than 32GB.
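The decision described above is easy to express as a couple of small functions. This Python sketch simply encodes the rule from the text; the function names are invented for illustration, and the 32GB threshold is taken from this post (the exact cutoff on a given JVM can be slightly lower, so treat it as an approximation).

```python
def pointer_bytes(heap_gb, threshold_gb=32):
    """Pointer size the text describes: 4 bytes under the threshold, else 8."""
    return 4 if heap_gb < threshold_gb else 8

def extra_jvm_flags(heap_gb, jdk_before_6u23, threshold_gb=32):
    """Flags to add by hand; newer JDKs enable compressed OOPs on their own."""
    if jdk_before_6u23 and heap_gb < threshold_gb:
        return ["-XX:+UseCompressedOops"]
    return []

print(pointer_bytes(31), extra_jvm_flags(31, jdk_before_6u23=True))
print(pointer_bytes(48), extra_jvm_flags(48, jdk_before_6u23=True))
```

This is also why the heap size section caps the heap at 31g: going past the threshold doubles the size of every pointer.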

# Files

The Elasticsearch process can use up all of its available file descriptors, so make sure the file descriptor limit for the user running Elasticsearch is high enough. Set it to 32K or 64K.
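One quick way to see the current limit is Python's standard resource module (Unix only). The sketch below reads the limit for whichever user runs the script, so run it as the Elasticsearch user to get a meaningful answer. Reading "32K" as 32768 is an assumption made here, and the helper name is made up for this example.

```python
import resource

MIN_FDS = 32 * 1024  # "32K" from the text, read here as 32768 (an assumption)

def fd_limit_ok(soft_limit, minimum=MIN_FDS):
    """True if the soft file descriptor limit is at least `minimum`."""
    if soft_limit == resource.RLIM_INFINITY:
        return True  # no limit at all
    return soft_limit >= minimum

# Soft and hard limits on open files for the current process and user.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft={soft} hard={hard} ok={fd_limit_ok(soft)}")
```

A common default soft limit such as 1024 fails this check, which is exactly the situation the advice above is meant to avoid.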

 

