Elasticsearch Cluster Tutorial
Video Lecture
Description
This is a small tutorial about creating a cluster of Elasticsearch servers with Metricbeat instances.
I will create 3 identical Ubuntu 20.04 servers in different regions of the world.
I will install Elasticsearch and Metricbeat on them and configure them with identical settings. Note that I am using Metricbeat as an example collector. You can install other beats, such as Filebeat and other collectors instead or in addition to Metricbeat. There are many possibilities.
My servers will be hosted on DigitalOcean.
I will select the basic $10-a-month droplets (Ubuntu 20.04, 2GB RAM, 1 CPU, 50GB SSD) and start them in New York, Amsterdam and Singapore.
I will give them hostnames of ES1, ES2 and ES3.
They all have unique IP addresses which I will need to use in the Elasticsearch and Metricbeat configurations.
I will also name the nodes in the cluster `node-1`, `node-2` and `node-3`.
Hostname | Node Name | IP Address |
---|---|---|
ES1 | node-1 | 203.0.113.1 |
ES2 | node-2 | 203.0.113.2 |
ES3 | node-3 | 203.0.113.3 |
Note
The IP addresses used in the above example table are for demonstration only. Replace them with the IP addresses or domain names of your own Elasticsearch servers.
Install Elasticsearch
SSH onto all 3 servers and enter the following commands.
Download and install the Elasticsearch public signing key.
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
Install dependencies
sudo apt-get install apt-transport-https
Save the repository definition
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-7.x.list
Update and install the Elasticsearch package
sudo apt-get update && sudo apt-get install elasticsearch
Edit the Elasticsearch configuration.
sudo nano /etc/elasticsearch/elasticsearch.yml
Modify the properties in each `elasticsearch.yml` by adding your node names and IP addresses.
ES1
cluster.name: mycluster
node.name: node-1
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["203.0.113.1", "203.0.113.2", "203.0.113.3"]
cluster.initial_master_nodes: ["203.0.113.1"]
ES2
cluster.name: mycluster
node.name: node-2
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["203.0.113.1", "203.0.113.2", "203.0.113.3"]
cluster.initial_master_nodes: ["203.0.113.1"]
ES3
cluster.name: mycluster
node.name: node-3
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["203.0.113.1", "203.0.113.2", "203.0.113.3"]
cluster.initial_master_nodes: ["203.0.113.1"]
Note that I named the cluster `mycluster`. You can name it anything you want containing the letters `a-z`, `-` or `.`.

Also, in the above settings, I have chosen `node-1` to be the initial master node. This is only important when starting the servers for the first time. I will start `node-1` first and confirm it has started before starting `node-2` and `node-3`. This ensures that all nodes register using the same cluster UUID. After the cluster has started and all nodes are connected, any of the nodes can be elected master if the current master goes offline for any period of time. In poor network conditions your master node may change regularly, and the other nodes will resynchronise with the newly agreed master.
Start Elasticsearch Master Node
Start Elasticsearch on ES1 first, then wait and confirm that its status is active.
sudo service elasticsearch start
sudo service elasticsearch status
Check its default response and cluster health.
curl -XGET 'http://localhost:9200/?pretty'
curl -XGET 'http://localhost:9200/_cluster/health?pretty'
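For reference, a healthy response from the health endpoint looks something like the following. This is abridged and illustrative only; your cluster name, counts and values will differ:

```json
{
  "cluster_name" : "mycluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "unassigned_shards" : 0,
  "active_shards_percent_as_number" : 100.0
}
```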
There should be no errors.
Take note of the `cluster_uuid` of the master node.
curl -XGET 'http://localhost:9200/_cluster/state/master_node?pretty'
Start Elasticsearch Data Nodes
Start the other nodes and confirm statuses are active
sudo service elasticsearch start
sudo service elasticsearch status
Check Health
Now on any of the nodes (master or data), check the cluster health.
curl -XGET 'http://localhost:9200/_cluster/health?pretty'
It should show that `number_of_nodes` is greater than 1, and if you have 3 nodes in total, it should say `"number_of_nodes" : 3`.

If not, then the other nodes have probably not detected the master and have created their own cluster UUIDs.
On each of the other nodes, run

curl -XGET 'http://localhost:9200/_cluster/state/master_node?pretty'

and check that the `cluster_uuid` matches the `cluster_uuid` on the master node that you started first, `node-1` in my case.
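If you want to compare the values without eyeballing the JSON, the `cluster_uuid` can be pulled out with `sed`. This is just a sketch; the sample response below is illustrative, and in practice you would capture the output of the `curl` command above instead:

```shell
# Illustrative _cluster/state/master_node response (your UUIDs will differ).
response='{
  "cluster_name" : "mycluster",
  "cluster_uuid" : "Cf1_4kRUQMuvCmV9C1IPTw",
  "master_node" : "sBGbJV7gS9WtIkGxBfjYwQ"
}'
# In practice, fetch it live from each node instead:
# response=$(curl -s 'http://localhost:9200/_cluster/state/master_node?pretty')

# Extract just the cluster_uuid value from the JSON.
uuid=$(printf '%s\n' "$response" | sed -n 's/.*"cluster_uuid" *: *"\([^"]*\)".*/\1/p')
echo "$uuid"
```

Run this on each node and compare the printed UUIDs; they should all be identical.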
If the `cluster_uuid` doesn't match, stop Elasticsearch on that data node, delete its `nodes` folder and start it again so that it rejoins the master's cluster.

sudo service elasticsearch stop

sudo rm -rf /var/lib/elasticsearch/nodes

sudo service elasticsearch start
Check again the cluster health for the correct value
curl -XGET 'http://localhost:9200/_cluster/health?pretty'
In the end, when all nodes are running, they should all agree on the same `cluster_uuid` and all have chosen the same master node.
curl -XGET 'http://localhost:9200/_cluster/state/master_node?pretty'
To see a list of node UUIDs that are active in the cluster,
curl -XGET 'http://localhost:9200/_cluster/state/nodes?pretty'
IP Rules
If your Elasticsearch servers are all public on the internet, then you should create some IP rules to block outside access.

In my example the IP addresses of the ES nodes are `203.0.113.1`, `203.0.113.2` and `203.0.113.3`, so I will create IP rules that allow only them to communicate with each other.

Elasticsearch uses ports `9200` (HTTP) and `9300` (node-to-node transport) by default.
On all 3 ES nodes execute,
#allow 9200 for certain ips and drop everything else
iptables -A INPUT -p tcp -s localhost --dport 9200 -j ACCEPT
iptables -A INPUT -p tcp -s 203.0.113.1 --dport 9200 -j ACCEPT
iptables -A INPUT -p tcp -s 203.0.113.2 --dport 9200 -j ACCEPT
iptables -A INPUT -p tcp -s 203.0.113.3 --dport 9200 -j ACCEPT
iptables -A INPUT -p tcp --dport 9200 -j DROP
#allow 9300 for certain ips and drop everything else
iptables -A INPUT -p tcp -s localhost --dport 9300 -j ACCEPT
iptables -A INPUT -p tcp -s 203.0.113.1 --dport 9300 -j ACCEPT
iptables -A INPUT -p tcp -s 203.0.113.2 --dport 9300 -j ACCEPT
iptables -A INPUT -p tcp -s 203.0.113.3 --dport 9300 -j ACCEPT
iptables -A INPUT -p tcp --dport 9300 -j DROP
#view rules
iptables -L
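The repeated rules above can also be generated with a small shell loop, which makes it easier to keep both ports and all node IPs in sync. This is a sketch, not part of the original commands; it only echoes the rules so you can review them before applying:

```shell
gen_rules() {
  # Emit ACCEPT rules for localhost and each node IP passed as an argument,
  # followed by a final DROP, for both the HTTP (9200) and transport (9300) ports.
  for port in 9200 9300; do
    echo "iptables -A INPUT -p tcp -s localhost --dport $port -j ACCEPT"
    for ip in "$@"; do
      echo "iptables -A INPUT -p tcp -s $ip --dport $port -j ACCEPT"
    done
    echo "iptables -A INPUT -p tcp --dport $port -j DROP"
  done
}

# Replace these example addresses with your real node IPs.
gen_rules 203.0.113.1 203.0.113.2 203.0.113.3
# To apply the rules: gen_rules <your ips> | sudo sh
```

Note that iptables rules do not survive a reboot by default; on Ubuntu the `iptables-persistent` package is commonly used to save and restore them.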
Note
Replace my example IP addresses above with your real IP addresses or domain names.
Install Metricbeat
Now that the cluster is confirmed running, it's time to start ingesting data into it. I will use Metricbeat since it is a very popular solution and quick to set up.
On each of the master and data nodes, install the Metricbeat service.
curl -L -O https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-7.10.0-amd64.deb
sudo dpkg -i metricbeat-7.10.0-amd64.deb
Edit the configuration to point to all of the Elasticsearch nodes.
sudo nano /etc/metricbeat/metricbeat.yml
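As a minimal sketch of what to change, the relevant part is the `output.elasticsearch` section. Listing every node lets Metricbeat fail over if one is unreachable; the addresses below are the example IPs from the table above:

```yaml
# /etc/metricbeat/metricbeat.yml (relevant excerpt only)
output.elasticsearch:
  # List all nodes so Metricbeat can fail over if one is down.
  hosts: ["203.0.113.1:9200", "203.0.113.2:9200", "203.0.113.3:9200"]
```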
Confirm that the system module is enabled (it is enabled by default).
cd /etc/metricbeat
metricbeat modules list
Start and test status
sudo service metricbeat start
sudo service metricbeat status
Check for indices
curl http://localhost:9200/_cat/indices
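Once Metricbeat has shipped some data, the output should include a metricbeat index, something like the line below. This is illustrative only; the exact index name, UUID, document counts and sizes will differ:

```
green open metricbeat-7.10.0-2020.11.28-000001 aBcDeFgHiJkLmNoPqRsTuQ 1 1 4520 0 3.1mb 1.6mb
```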
Check for cluster health
curl -XGET 'http://localhost:9200/_cluster/health?pretty'
Check who is the master node
curl -XGET 'http://localhost:9200/_cluster/state/master_node?pretty'
Check the ids of each node
curl -XGET 'http://localhost:9200/_cluster/state/nodes?pretty'
Add An Elasticsearch Datasource in Grafana
Key | Value | Notes |
---|---|---|
Name | Elasticsearch | Or whatever name you want to use |
URL | http://203.0.113.1:9200 | IP address or domain name of your ES server |
Index name | metricbeat-7.10.* | Check the correct index name using `curl http://localhost:9200/_cat/indices` on the ES server |
Version | 7.0 | Elasticsearch version 7.10 was used in this tutorial |
Save and Test
Do you have problems connecting?

It is probably IP/firewall rules, or the particular ES server is not running.
Add a new rule to each ES server to allow your Grafana server to access port 9200.
Get IP rule line numbers
iptables -L --line-numbers
Insert a rule for your Grafana server, at line 5 for example. Your Grafana server's IP address or domain name will be different from mine.
iptables -I INPUT 5 -p tcp -s 203.0.113.123 --dport 9200 -j ACCEPT