Elasticsearch setup guide
This guide will walk you through setting up a production Elasticsearch instance on Linux (we assume that you have some Linux experience). We use this guide to set up and configure our own nodes on Azure.
We'd love your feedback on how we can improve our setup and configuration. It would be greatly appreciated if a docker guru could create some docker images based on the following tutorial :).
Let's start by creating a new virtual machine and selecting the latest 64-bit Ubuntu operating system. After you're up and running, let's ensure it's running the latest software:
sudo apt-get update
sudo apt-get upgrade
Please note that this guide will install Elasticsearch 1.7.x and not the recent 2.x release.
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
java -version
wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb http://packages.elastic.co/elasticsearch/1.7/debian stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch-1.7.list
sudo apt-get update
sudo apt-get install elasticsearch
For more information on running Elasticsearch as a service (using SystemD) please read this.
You will want to attach a secondary hard disk to your virtual machine before continuing. We use this disk to store the Elasticsearch indexes. We create the largest disk Azure allows, since you only pay for the space that is actually allocated on disk.
Get a list of the attached SCSI devices.
dmesg | grep SCSI
Make sure the new disk is sdc so that we are formatting the correct one.
sudo fdisk /dev/sdc
Enter the command n, then p, accept all the defaults, then enter w to write the partition table.
sudo mkfs -t ext4 /dev/sdc1
Mount the new drive to /mnt/data
sudo mkdir /mnt/data
sudo mount /dev/sdc1 /mnt/data
Auto mount the drive on reboot.
sudo -i blkid
Grab the UUID for /dev/sdc1 and open fstab.
sudo nano /etc/fstab
Paste in under the existing UUID:
UUID=YOUR_GUID /mnt/data ext4 defaults 0 0
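If you'd rather not copy the UUID by hand, the fstab entry can be generated. This is only a sketch: the fstab_line helper is a name we made up for illustration, and it assumes the partition is /dev/sdc1 and the mount point is /mnt/data as above.

```shell
#!/bin/bash
# Sketch: build the /etc/fstab entry for the data disk from its UUID.
# fstab_line is a hypothetical helper, not part of any tool.

# Print an fstab line that mounts the filesystem with UUID $1 at /mnt/data.
fstab_line() {
  echo "UUID=$1 /mnt/data ext4 defaults 0 0"
}

# Usage (run on the node; /dev/sdc1 is the partition created above):
#   uuid=$(sudo blkid -s UUID -o value /dev/sdc1)
#   fstab_line "$uuid" | sudo tee -a /etc/fstab
#   sudo mount -a   # mounts everything in fstab, surfacing errors before a reboot
```

Check /etc/fstab afterwards to make sure the line was appended only once.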
Create the storage folders by creating a db, log and work directory in /mnt/data
cd /mnt/data
sudo mkdir db
sudo mkdir log
sudo mkdir work
Make elasticsearch user the owner of the folders
sudo chown -R elasticsearch:elasticsearch /mnt/data/
sudo chown -R elasticsearch:elasticsearch /mnt/data/log
sudo chown -R elasticsearch:elasticsearch /mnt/data/work
sudo chown -R elasticsearch:elasticsearch /mnt/data/db
Let's install the Cloud Azure, HQ and Marvel plugins
cd /usr/share/elasticsearch
sudo bin/plugin -i elasticsearch/elasticsearch-cloud-azure/2.8.2
sudo bin/plugin -i royrusso/elasticsearch-HQ
sudo bin/plugin -i elasticsearch/marvel/latest
It's important to decide early on roughly how many nodes you will run and how much RAM each node will have so you can configure the cluster properly. It's recommended that you have at least three nodes with two master nodes. Having lots of RAM and faster storage will help greatly.
Update the Elasticsearch configuration. We have our configuration file located here:
sudo nano /etc/elasticsearch/elasticsearch.yml
Edit the environment config and set ES_HEAP_SIZE to half of the RAM size:
sudo nano /etc/default/elasticsearch
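To work out the value, you could compute half of the machine's RAM from /proc/meminfo. This is a minimal sketch: heap_size_from_kb is a hypothetical helper name, and it assumes a Linux host where MemTotal is reported in kB.

```shell
#!/bin/bash
# Sketch: derive ES_HEAP_SIZE as half of physical RAM.
# heap_size_from_kb is a hypothetical helper, not part of Elasticsearch.

# Print half of the given memory total (in kB) as whole megabytes
# with the "m" suffix that the JVM heap options accept.
heap_size_from_kb() {
  echo "$(( $1 / 2 / 1024 ))m"
}

# /proc/meminfo reports MemTotal in kB on Linux.
if [ -r /proc/meminfo ]; then
  total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  echo "ES_HEAP_SIZE=$(heap_size_from_kb "$total_kb")"
fi
```

The printed line is the one to place in the environment config file.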
Set MAX_LOCKED_MEMORY=unlimited
sudo nano /etc/init.d/elasticsearch
Update system limits
sudo nano /etc/security/limits.conf
With these values
elasticsearch - nofile 65535
elasticsearch - memlock unlimited
Update SystemD configuration settings
sudo nano /usr/lib/systemd/system/elasticsearch.service
With these values
LimitMEMLOCK=infinity
Reload SystemD and restart the service to ensure the configuration is picked up
sudo /bin/systemctl daemon-reload
sudo /bin/systemctl restart elasticsearch
Finally, let's verify that mlockall is true and that the maximum number of open files (max_file_descriptors) is 65535.
curl "http://localhost:9200/_nodes/process?pretty"
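The check can also be scripted rather than read by eye. The sketch below assumes the response contains the fields mlockall and max_file_descriptors (the names we expect from a 1.x node, so verify them against your own output). limits_ok is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: check the node's process info for the expected limits.
# Assumption: the JSON contains "mlockall" and "max_file_descriptors".

# Read JSON from stdin; succeed only if mlockall is true and
# max_file_descriptors equals the number passed as $1.
limits_ok() {
  json=$(cat)
  echo "$json" | grep -Eq '"mlockall"[[:space:]]*:[[:space:]]*true' &&
  echo "$json" | grep -Eq "\"max_file_descriptors\"[[:space:]]*:[[:space:]]*$1([^0-9]|\$)"
}

# Usage:
#   curl -s "http://localhost:9200/_nodes/process?pretty" | limits_ok 65535 && echo OK
```

With several nodes in the cluster this only tells you that at least one node reports the expected values, so inspect the full output if it fails.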
Ensure Elasticsearch starts after reboot via SystemD:
sudo /bin/systemctl daemon-reload
sudo /bin/systemctl enable elasticsearch.service
This section assumes that you've configured the Cloud-Azure plugin in the previous configuration step with your Azure blob storage access keys. The cleanup scripts require you to install curator.
We'll create a new snapshot repository. You'll need to follow this step as well if you wish to restore production data to a secondary cluster.
PUT _snapshot/ex_stacks
{
"type": "azure",
"settings": {
"base_path": "stacks"
}
}
PUT _snapshot/ex_organizations
{
"type": "azure",
"settings": {
"base_path": "organizations"
}
}
PUT _snapshot/ex_events
{
"type": "azure",
"settings": {
"base_path": "events"
}
}
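The requests above are written in console style. If you prefer the command line, the same three repositories could be registered with curl, roughly as sketched here. ES_URL, repo_body and create_repos are names we made up for illustration. The sketch assumes a node listening on localhost:9200.

```shell
#!/bin/bash
# Sketch: register the three Azure snapshot repositories with curl.
# ES_URL is an assumption; adjust it to match your node.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the JSON body for a repository whose base_path is $1.
repo_body() {
  printf '{"type":"azure","settings":{"base_path":"%s"}}' "$1"
}

# Register repository ex_<name> for each name passed as an argument.
create_repos() {
  for name in "$@"; do
    curl -sS -XPUT "$ES_URL/_snapshot/ex_$name" -d "$(repo_body "$name")"
    echo
  done
}

# Usage (needs a running node with the Cloud-Azure plugin configured):
#   create_repos stacks organizations events
```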
To create a backup and view the status of a snapshot:
GET _snapshot/ex_stacks/_status
PUT /_snapshot/ex_stacks/2015-12-01-12-00
{
"indices": "stacks*",
"ignore_unavailable": "true"
}
GET _snapshot/ex_events/_status
PUT /_snapshot/ex_events/2015-12-01-12-00
{
"indices": "events*",
"ignore_unavailable": "true"
}
GET _snapshot/ex_organizations/_status
PUT /_snapshot/ex_organizations/2015-12-01-12-00
{
"indices": "organizations*",
"ignore_unavailable": "true"
}
We recommend creating these files on one of your elastic nodes:
Let's navigate to the data directory:
cd /mnt/data
Create the events snapshot script
touch events_snapshot
chmod +x events_snapshot
nano events_snapshot
With the content:
#!/bin/bash
DATE=`date +%Y-%m-%d-%H-%M`
curl -XPUT "localhost:9200/_snapshot/ex_events/$DATE?wait_for_completion=true" -d '{
"indices": "events*",
"ignore_unavailable": "true"
}'
Create the stacks snapshot script
touch stacks_snapshot
chmod +x stacks_snapshot
nano stacks_snapshot
With the content:
#!/bin/bash
DATE=`date +%Y-%m-%d-%H-%M`
curl -XPUT "localhost:9200/_snapshot/ex_stacks/$DATE?wait_for_completion=true" -d '{
"indices": "stacks*",
"ignore_unavailable": "true"
}'
Create the organizations snapshot script
touch organizations_snapshot
chmod +x organizations_snapshot
nano organizations_snapshot
With the content:
#!/bin/bash
DATE=`date +%Y-%m-%d-%H-%M`
curl -XPUT "localhost:9200/_snapshot/ex_organizations/$DATE?wait_for_completion=true" -d '{
"indices": "organizations*",
"ignore_unavailable": "true"
}'
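The three snapshot scripts differ only in the repository name and index pattern, so they could be folded into one parameterized script. This is a sketch rather than what we run ourselves. The function names and the ES_URL variable are hypothetical.

```shell
#!/bin/bash
# Sketch: one parameterized snapshot script in place of three copies.
ES_URL="${ES_URL:-localhost:9200}"   # assumption: same local node as above

# Snapshot name in the same format the scripts above use.
snapshot_name() {
  date +%Y-%m-%d-%H-%M
}

# URL for snapshot $2 of index family $1 (events, stacks, organizations).
snapshot_url() {
  printf '%s/_snapshot/ex_%s/%s?wait_for_completion=true' "$ES_URL" "$1" "$2"
}

# Request body selecting every index whose name starts with $1.
snapshot_body() {
  printf '{"indices":"%s*","ignore_unavailable":"true"}' "$1"
}

# Take a snapshot of index family $1.
take_snapshot() {
  curl -XPUT "$(snapshot_url "$1" "$(snapshot_name)")" -d "$(snapshot_body "$1")"
}

# Usage: take_snapshot events
```

Keeping the three separate files is perfectly fine too. The single script just means there is one place to change if the naming scheme ever changes.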
Create the snapshot cleanup script
touch cleanup_snapshots
chmod +x cleanup_snapshots
nano cleanup_snapshots
With the content:
#!/bin/bash
/usr/local/bin/curator delete snapshots --older-than 7 --time-unit days --timestring %Y-%m-%d --repository ex_events
/usr/local/bin/curator delete snapshots --older-than 7 --time-unit days --timestring %Y-%m-%d --repository ex_stacks
/usr/local/bin/curator delete snapshots --older-than 7 --time-unit days --timestring %Y-%m-%d --repository ex_organizations
Create the index cleanup script
touch cleanup_indexes
chmod +x cleanup_indexes
nano cleanup_indexes
With the content:
#!/bin/bash
/usr/local/bin/curator delete indices --older-than 7 --time-unit days --timestring %Y.%m.%d
Edit the Cron job
crontab -e # choose option 2
Add the following Cron jobs
0 */12 * * * /mnt/data/events_snapshot >/dev/null 2>&1
40 * * * * /mnt/data/stacks_snapshot >/dev/null 2>&1
40 * * * * /mnt/data/organizations_snapshot >/dev/null 2>&1
10 23 * * * /mnt/data/cleanup_snapshots >/dev/null 2>&1
0 23 * * * /mnt/data/cleanup_indexes >/dev/null 2>&1
You can verify that your cron job has run by running: tail -n 20 /var/log/syslog
You'll first want to set up the snapshot repositories as well as install and configure the Cloud-Azure plugin before restoring to a new cluster.
List of all snapshots:
GET _snapshot/ex_stacks/_all
GET _snapshot/ex_events/_all
GET _snapshot/ex_organizations/_all
To do a restore of all indices run the following command (please take a look at the Elasticsearch documentation on how to restore a single index):
POST _snapshot/ex_organizations/2015-12-01-12-30/_restore
{
"include_global_state": false
}
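For a single index, the restore request takes an indices field in its body. Here is a rough curl sketch. ES_URL, restore_body and restore_index are names we made up, and the index name in the usage line is hypothetical.

```shell
#!/bin/bash
# Sketch: restore a single index from a snapshot instead of all indices.
ES_URL="${ES_URL:-localhost:9200}"   # assumption: local node

# Request body restoring only index $1, without the global cluster state.
restore_body() {
  printf '{"indices":"%s","include_global_state":false}' "$1"
}

# Restore index $3 from snapshot $2 in repository $1.
restore_index() {
  curl -XPOST "$ES_URL/_snapshot/$1/$2/_restore" -d "$(restore_body "$3")"
}

# Usage (the index name is hypothetical; list the snapshot to see real names):
#   restore_index ex_organizations 2015-12-01-12-30 organizations-v1
```

Note that an index that already exists in the cluster has to be closed before it can be restored over.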
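Before upgrading a node, move its shards to the other nodes by excluding its IP address from shard allocation:
PUT _cluster/settings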
{
"transient": {
"cluster.routing.allocation.exclude._ip": "<IP ADDRESS OF A NODE>"
}
}
Add the latest repository and update packages:
sudo add-apt-repository "deb http://packages.elasticsearch.org/elasticsearch/1.7/debian stable main"
sudo apt-get update
sudo apt-get upgrade
Update the plugins by removing and reinstalling each one:
cd /usr/share/elasticsearch
sudo bin/plugin -r cloud-azure && sudo bin/plugin -i elasticsearch/elasticsearch-cloud-azure/2.8.2
sudo bin/plugin -r HQ && sudo bin/plugin -i royrusso/elasticsearch-HQ
sudo bin/plugin -r marvel && sudo bin/plugin -i elasticsearch/marvel/latest
Restart the service
sudo /bin/systemctl restart elasticsearch
Some good tips for making sure you set up Elasticsearch on Azure correctly:
- http://www.elastic.co/guide/en/elasticsearch/reference/1.7/setup-repositories.html
- http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.7/setup-configuration.html
- http://www.elasticsearch.org/blog/performance-considerations-elasticsearch-indexing/
- http://www.elasticsearch.org/guide/en/elasticsearch/guide/1.7/hardware.html
- http://svops.com/blog/elasticsearch-best-practices/
- http://blogs.endjin.com/2014/08/gotchas-when-installing-an-elasticsearch-cluster-on-azure/
- https://blog.codecentric.de/en/2014/05/elasticsearch-indexing-performance-cheatsheet/
- http://asquera.de/opensource/2012/11/25/elasticsearch-pre-flight-checklist/
- https://www.loggly.com/blog/nine-tips-configuring-elasticsearch-for-high-performance/