Elasticsearch backup and retention was hard to manage before version 7.5. We had to write a script to delete old backups from storage, and it was difficult to tell which backups were old unless the snapshot name included its date. From version 7.5 onward, Elasticsearch supports snapshot lifecycle management and retention.

In this post I will explain how to back up Elasticsearch using snapshots and how to set up snapshot lifecycle management and retention.

The first step is to download the latest version of Elasticsearch. The macOS archive for Elasticsearch v7.5.0 can be downloaded and installed as follows:

Follow this link if you want to install Elasticsearch in another format: https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html. For production, installing Elasticsearch with a package manager is recommended.

Elasticsearch can be started from the command line by running ./bin/elasticsearch from the installation directory.

By default, Elasticsearch runs in the foreground, prints its logs to the standard output (stdout), and can be stopped by pressing Ctrl-C.

You can test that your Elasticsearch node is running by sending an HTTP request to port 9200 on localhost:
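As a rough sketch, the same check can be scripted in Python with only the standard library. The helper names here (`build_node_check_request`, `describe_node`, `check_node`) are made up for this example, and the host and port assume the default local setup described above.

```python
import json
import urllib.request


def build_node_check_request(host="localhost", port=9200):
    """Build (but do not send) a GET request for the node's root endpoint."""
    return urllib.request.Request(f"http://{host}:{port}/", method="GET")


def describe_node(body):
    """Summarise the JSON document returned by the root endpoint."""
    info = json.loads(body)
    return f"{info['name']} is running Elasticsearch {info['version']['number']}"


def check_node(host="localhost", port=9200):
    """Send the request; raises urllib.error.URLError if the node is unreachable."""
    request = build_node_check_request(host, port)
    with urllib.request.urlopen(request, timeout=5) as response:
        return describe_node(response.read())
```

Calling `check_node()` while the node is up should return a one-line summary; if the node is not running, it raises an error instead.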

The response should be a JSON document containing the node name, the cluster name, and version information.

Follow this link if you want to run Elasticsearch as a daemon or learn more about configuration: https://www.elastic.co/guide/en/elasticsearch/reference/current/targz.html

The next step is getting started with snapshot lifecycle management (SLM).

This demonstration automatically backs up Elasticsearch indices by taking a snapshot every day at a set time. Once the snapshots have been created, they are kept for a configured amount of time and then deleted according to a configured retention policy.

Before starting, it’s important to understand the privileges that are needed to configure SLM if you are using the security plugin. For demo purposes you can disable security by setting xpack.security.enabled to false in config/elasticsearch.yml, but it is recommended to keep it enabled on production servers.

xpack.security.enabled: false

Before setting up an SLM policy, we need to set up a snapshot repository where the snapshots will be stored. For this example we’ll use a shared file system repository. A cloud storage service such as Amazon S3 is recommended for production use.
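A minimal sketch of the repository registration, written in Python with the standard library. The location `/tmp/es-backups` is a placeholder chosen for this example; whichever directory you use must also be listed under `path.repo` in config/elasticsearch.yml before Elasticsearch will accept it. The helper name and `ES_URL` are likewise assumptions for a local, unsecured demo node.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local demo node without security


def build_repository_request(name="my_repository", location="/tmp/es-backups"):
    """Build a PUT _snapshot/<name> request for a shared file system repository.

    `location` is a hypothetical path; it has to appear in `path.repo`
    in elasticsearch.yml on every node.
    """
    body = {"type": "fs", "settings": {"location": location}}
    return urllib.request.Request(
        f"{ES_URL}/_snapshot/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the returned request to `urllib.request.urlopen` sends it to the node.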

Now that we have a repository in place, we can create an SLM policy to manage the snapshots automatically. Policies are defined in JSON, and retention can optionally be configured as part of the policy. We use the PUT API to create the policy.

The JSON policy defines:

When the snapshot should be taken, using cron syntax.

The name of each snapshot, using date math to include the current date in the snapshot name.

The repository the snapshot should be stored in, which is the one we already created.

The configuration used for the snapshot requests, including which indices should be in the snapshots (in this case, everything).

An optional retention configuration: how many days until a snapshot expires, a minimum number of snapshots to always keep, and a maximum number of snapshots to keep even if they have not yet expired.

Here we schedule the snapshot to run automatically every day at 12:12 AM UTC and name each snapshot with the prefix “my-snap” followed by the current date. The repository is “my_repository”, which we already created, and the backup is configured to include all indices. The retention policy expires snapshots after 3 days while keeping a minimum of 1 and a maximum of 30 snapshots. The policy is named “my-snapshots”, and we will use that name for further validation.
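The policy described above could be expressed roughly as follows. This is a sketch that builds the PUT _slm/policy request in Python. The field values come from the description above, while the helper name and `ES_URL` are assumptions for a local demo node. Elasticsearch cron expressions begin with a seconds field, so 12:12 AM is written as `0 12 0 * * ?`.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local demo node without security


def build_policy_request(policy_id="my-snapshots"):
    """Build a PUT _slm/policy/<policy_id> request for the daily snapshot policy."""
    body = {
        "schedule": "0 12 0 * * ?",          # every day at 00:12 UTC
        "name": "<my-snap-{now/d}>",         # date math adds the current date
        "repository": "my_repository",       # repository registered earlier
        "config": {"indices": ["*"]},        # include every index
        "retention": {
            "expire_after": "3d",            # delete snapshots older than 3 days
            "min_count": 1,                  # but always keep at least one
            "max_count": 30,                 # and never keep more than 30
        },
    }
    return urllib.request.Request(
        f"{ES_URL}/_slm/policy/{policy_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```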

While snapshots taken by SLM policies can be viewed through the standard snapshot API, SLM also keeps track of policy successes and failures in ways that are a bit easier to use to make sure the policy is working.

Instead of waiting until 12:12 AM for our policy to run, let’s tell SLM to take a snapshot right now using the configuration from our policy.
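One way to trigger the policy from Python is sketched below. It assumes the `_execute` endpoint is called with POST on a local demo node; the helper names are invented for this example.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local demo node without security


def build_execute_request(policy_id="my-snapshots"):
    """Build a POST _slm/policy/<policy_id>/_execute request (no body needed)."""
    return urllib.request.Request(
        f"{ES_URL}/_slm/policy/{policy_id}/_execute", method="POST"
    )


def snapshot_name_from_response(body):
    """Extract the name of the snapshot that SLM started from the JSON response."""
    return json.loads(body)["snapshot_name"]
```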

If the request succeeds, the response contains the name of the snapshot that was started.

The policy will continue to run on its configured schedule after this execution of the policy.

Retrieving the policy with the GET API returns a response that includes the policy itself, the last time the policy succeeded and failed, the next time it will be executed, and statistics such as the number of snapshots taken.
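A small sketch of that check in Python follows. The field names read by `summarize_policy` (`last_success`, `last_failure`, `next_execution_millis`, `stats.snapshots_taken`) reflect my understanding of the 7.5 response format and are looked up defensively in case some are absent; the helper names are hypothetical.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local demo node without security


def build_get_policy_request(policy_id="my-snapshots"):
    """Build a GET _slm/policy/<policy_id>?human request."""
    return urllib.request.Request(
        f"{ES_URL}/_slm/policy/{policy_id}?human", method="GET"
    )


def summarize_policy(body, policy_id="my-snapshots"):
    """Pick the status fields out of the GET policy response.

    Missing fields come back as None instead of raising an error.
    """
    entry = json.loads(body)[policy_id]
    return {
        "last_success": entry.get("last_success", {}).get("snapshot_name"),
        "last_failure": entry.get("last_failure", {}).get("snapshot_name"),
        "next_execution_millis": entry.get("next_execution_millis"),
        "snapshots_taken": entry.get("stats", {}).get("snapshots_taken"),
    }
```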

NOTE: Use cloud storage instead of local file system storage in production. In the next post I will explain how to set up and configure Elasticsearch backups on AWS S3.

Thanks for reading!