Replace Data Volumes of Elasticsearch Cluster with Encrypted Volumes
You may encounter a situation where you need to remove or replace a node in an Elasticsearch cluster. Let's consider a use case where all the data volumes (the volumes containing Elasticsearch data) need to be encrypted.
The following process will be used to achieve this:
- Select a node, say N2, and take an AMI of its root volume.
- Launch an instance from that AMI with the number of volumes needed to store data, checking the "Encrypted" option for each.
- After launching it, format and mount the volumes at the required mount points.
- Change the node.name and network.host parameters in the /etc/elasticsearch/elasticsearch.yml file.
- Since we have taken an image of the root volume, the Elasticsearch configuration should already contain the master node IP. But since the new instance's IP has changed, we need to update node.name and network.host in elasticsearch.yml.
- Start the Elasticsearch process. It will automatically join the cluster, and the cluster will start relocating data to this node.
- Now that we have N+1 nodes in the cluster, we can decommission the existing node N2, which holds the unencrypted volumes, using a single command.
- After decommissioning the instance, wait for all the data to be relocated from N2 to the other nodes.
- When no data is left on N2, stop the Elasticsearch process on it.
Let’s go deeper and see how to do this:
Considerations:
Consider a cluster with two instances, N1 and N2.
Master node: N1
Data nodes: N1, N2
Volumes attached to N1: V1N1 with mount point /vol/es1 and V2N1 with mount point /vol/es2
Volumes attached to N2: V1N2 with mount point /vol/es1 and V2N2 with mount point /vol/es2
All volumes are unencrypted.
Take an AMI of the root volume of node N2 with the "No reboot" option selected, to make sure the running instance does not go down.
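If you prefer the AWS CLI to the console, the same step looks roughly like this (the instance ID and image name below are placeholders):
# Create an AMI of N2's root volume without rebooting the instance
# (i-0123456789abcdef0 and the image name are placeholders)
aws ec2 create-image --instance-id i-0123456789abcdef0 --name "es-n2-root" --no-reboot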
Launch a new instance from the AMI, adding two additional volumes of the required size, and check the encryption check-box for each of them.
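From the CLI, the encrypted data volumes can be declared as block device mappings at launch time. A minimal sketch, with the AMI ID, instance type, and volume sizes as placeholders (networking and key pair options omitted):
# Launch the replacement node with two encrypted EBS data volumes
aws ec2 run-instances --image-id ami-0123456789abcdef0 --instance-type m4.large \
  --block-device-mappings '[
    {"DeviceName": "/dev/xvdb", "Ebs": {"VolumeSize": 200, "VolumeType": "gp2", "Encrypted": true}},
    {"DeviceName": "/dev/xvdc", "Ebs": {"VolumeSize": 200, "VolumeType": "gp2", "Encrypted": true}}
  ]'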
Now we have the new instance with two volumes that will serve as the data volumes for Elasticsearch. These volumes are not yet mounted, so go ahead and mount them.
Check the volumes first with the command:
fdisk -l
The available volumes may be /dev/xvdb and /dev/xvdc.
Now format the volumes with the required file system:
mkfs.xfs /dev/xvdb
mkfs.xfs /dev/xvdc
Now mount the volumes:
mount /dev/xvdb /vol/es1
mount /dev/xvdc /vol/es2
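These mounts will not survive a reboot on their own. Assuming the same device names as above, you can persist them in /etc/fstab:
# Persist the mounts across reboots (assumes the device names used above)
echo '/dev/xvdb /vol/es1 xfs defaults,nofail 0 2' >> /etc/fstab
echo '/dev/xvdc /vol/es2 xfs defaults,nofail 0 2' >> /etc/fstab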
As we have taken the AMI of the existing node, the mount points should already exist with the required permissions; if not, create them and set the required permissions.
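For example, assuming Elasticsearch runs as the elasticsearch user and group (adjust to your installation):
# Create the mount points and hand them to the Elasticsearch user
# (assumes Elasticsearch runs as user/group "elasticsearch")
mkdir -p /vol/es1 /vol/es2
chown -R elasticsearch:elasticsearch /vol/es1 /vol/es2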
Edit the /etc/elasticsearch/elasticsearch.yml file and change the values of the following options:
node.name: give_unique_name_of_node
network.host: private_ip_address_of_node
We do not need to configure the master node's IP address: since we took the AMI of an existing instance, it is already configured. If you created a fresh instance instead, do configure it.
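For reference, on the Elasticsearch 2.x versions these logs come from, the relevant lines of elasticsearch.yml would look roughly like this; the node name and IPs are placeholders:
# elasticsearch.yml sketch; node name and IPs are placeholders
node.name: es-data-3
network.host: 10.0.1.25
# Only needed on a freshly created instance; an AMI copy already carries it
discovery.zen.ping.unicast.hosts: ["10.0.1.10"]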
Now we are done with the required configuration. Simply start the Elasticsearch process and observe the logs.
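Assuming a package-based installation with init scripts (paths and service manager vary by setup), that would be:
# Start Elasticsearch and follow its log; the log file is named
# after your cluster ("your_cluster_name" is a placeholder)
sudo service elasticsearch start
tail -f /var/log/elasticsearch/your_cluster_name.log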
Logs after adding the new Elasticsearch instance:
[2016-09-20 13:10:04,718][INFO ][cluster.service] [X.X.X.X]
detected_master {ip-X-X-X-X.ec2.internal}{UfGNZVZ0SayYjZW50NO_FA}{X.X.X.X}{X.X.X.X:9300}{data=1, master=1},
added {{ip-X-X-X-X.ec2.internal}{UTZRgnw4TfiswzBTV2gEXw}{X.X.X.X}{X.X.X.X:9300}{data=1, master=0},{ip-X-X-X-X.ec2.internal}{UfGNZVZ0SayYjZW50NO_FA}{X.X.X.X}{X.X.X.X:9300}{data=1, master=1},},
reason: zen-disco-receive(from master [{ip-X-X-X-X.ec2.internal}{UfGNZVZ0SayYjZW50NO_FA}{X.X.X.X}{X.X.X.X:9300}{data=1, master=1}])
Here X.X.X.X are the IPs. The important things to note are detected_master and added {ip-X-X-X-X.ec2.internal}, which signify that our new instance has detected the master node and has been added to the cluster.
Run the following command to check whether data relocation (the transfer of data to the newly added node) has started:
curl 'P.P.P.P:9200/_cat/shards?v'
where P.P.P.P is the private IP of the node. The output should look like the following:
collection_name 1 p RELOCATING 1392274 1.2gb X.X.X.X ip-X-X-X-X.ec2.internal -> Y.Y.Y.Y Mvv7ZxnCQqCapOhALzOAgA Y.Y.Y.Y
You may observe several lines like this, which signify that data relocation is in progress.
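You can also watch the overall relocation counter through the cluster health API:
# relocating_shards returns to 0 once rebalancing has finished
curl 'P.P.P.P:9200/_cluster/health?pretty'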
The data will get distributed evenly across the nodes. After this, we will deregister node N2 from the cluster. To deregister it, run the following command on master node N1.
curl -XPUT M.M.M.M:9200/_cluster/settings -d '{"transient" :
{"cluster.routing.allocation.exclude._ip" :"P.P.P.P"}}';echo
Here replace M.M.M.M with the private IP address of the master node and P.P.P.P with the private IP address of node N2. This command will acknowledge the removal of N2 from the cluster with the following output:
{"acknowledged":true,"persistent":{},"transient":{"cluster":{"routing":
{"allocation":{"exclude":{"_ip":"P.P.P.P"}}}}}}
After deregistering N2, the data should start moving from N2 to the other nodes, which can be seen by running the following command:
curl 'http://M.M.M.M:9200/_cat/allocation?v'
The output should be as follows:
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
0 0b 64.4mb 199.8gb 199.9gb 0 P1.P1.P1.P1 P1.P1.P1.P1 ip-P1-P1-P1-P1.ec2.internal
41 50.1gb 51.2gb 148.6gb 199.9gb 25 P2.P2.P2.P2 P2.P2.P2.P2 ip-P2-P2-P2-P2.ec2.internal
41 47.4gb 47.9gb 151.9gb 199.9gb 23 P3.P3.P3.P3 P3.P3.P3.P3 ip-P3-P3-P3-P3.ec2.internal
When shards and disk.indices for node N2 reach 0, all data has been transferred to the other nodes and N2 is safe to remove from the cluster.
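A convenient way to keep an eye on this until N2's counters reach 0, assuming watch is available on the machine you run it from:
# Re-run the allocation check every 10 seconds
watch -n 10 "curl -s 'http://M.M.M.M:9200/_cat/allocation?v'"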
Now just stop the Elasticsearch process, and you may terminate the N2 instance.
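As a follow-up, once N2 is terminated you may want to clear the transient exclusion, so that a future node reusing that IP is not kept empty; on this Elasticsearch version, setting the value to an empty string does that:
# Clear the transient exclusion rule after N2 is gone
curl -XPUT M.M.M.M:9200/_cluster/settings -d '{"transient" :
{"cluster.routing.allocation.exclude._ip" : ""}}';echo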
Replacing the Master Node
We are done replacing the data node. Now we will replace the master node. In our case the master is also a data node, which has two unencrypted volumes, so we will follow the same process:
- Create a new node from the AMI of the root volume of node N1, adding two volumes with the encrypted option checked.
- Format and mount the newly added volumes.
- Change the node.name and network.host parameters in the /etc/elasticsearch/elasticsearch.yml file.
- Make sure node.master is set to true in the /etc/elasticsearch/elasticsearch.yml file.
This setting makes the node master-eligible, so if we remove the current master node, this node can automatically become the master.
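As a sketch, the relevant elasticsearch.yml lines for the replacement node might look like this, with the node name and IP as placeholders:
# elasticsearch.yml sketch for the replacement node; name and IP are placeholders
node.name: es-master-2
network.host: 10.0.1.30
node.master: true
node.data: true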
When this instance is ready, just start the Elasticsearch process; it will automatically join the cluster, and data relocation will start automatically, which can be observed with the commands mentioned above for the data node.
To check the current master node and the new master-eligible node, run the following command:
curl http://M.M.M.M:9200/_cat/nodes
This will show all the nodes in the cluster. An "*" in front of a node shows that it is the current master, "m" signifies a master-eligible node (the one we just created), and "-" shows a data-only node.
P1.P1.P1.P1 P1.P1.P1.P1 41 99 0.02 d - P1.P1.P1.P1
P2.P2.P2.P2 P2.P2.P2.P2 25 99 0.30 d * ip-P2-P2-P2-P2.ec2.internal
P3.P3.P3.P3 P3.P3.P3.P3 0 19 0.34 d m P3.P3.P3.P3
We can now deregister node N1 with the same command we previously used to deregister the data node, putting in N1's IP instead. After deregistering it, check that the acknowledgement is true in the output and that data relocation has started.
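For example, with Q.Q.Q.Q standing in for N1's private IP (the request can be sent to any node in the cluster):
# Q.Q.Q.Q is a placeholder for N1's private IP
curl -XPUT M.M.M.M:9200/_cluster/settings -d '{"transient" :
{"cluster.routing.allocation.exclude._ip" : "Q.Q.Q.Q"}}';echo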
Once the data relocation is done and no shards are left on N1, stop the Elasticsearch process, and we can then terminate/stop the N1 instance.
After stopping the Elasticsearch process, the master-eligible node will automatically become the master, and there will be only two nodes left in the cluster.