UPDATE: This article refers to our hosted Elasticsearch offering by an older name, Found. Please note that Found is now known as Elastic Cloud.

In this article we'll investigate the files written to the data directory by various parts of Elasticsearch. We will look at node, index and shard level files and give a short explanation of their contents in order to establish an understanding of the data written to disk by Elasticsearch.

Elasticsearch Paths

Elasticsearch is configured with several paths:

path.home: The home directory for the Elasticsearch process. Defaults to the Java system property user.dir, which is the working directory the process was started from.

path.conf: A directory containing the configuration files. This is usually set through the Java system property es.config, as it naturally has to be resolved before the configuration file is found.

path.plugins: A directory whose sub-folders are Elasticsearch plugins. Symlinks are supported here, which can be used to selectively enable or disable a set of plugins for a certain Elasticsearch instance when multiple Elasticsearch instances are run from the same executable.

path.work: A directory that was used to store working/temporary files for Elasticsearch. It’s no longer used.

path.logs: Where the generated logs are stored. It might make sense to have this on a separate volume from the data directory in case one of the volumes runs out of disk space.

path.data: Path to a folder containing the data stored by Elasticsearch.

In this article, we’ll have a closer look at the actual contents of the data directory (path.data) and try to gain an understanding of what all the files are used for.

Where Do the Files Come from?

Since Elasticsearch uses Lucene under the hood to handle indexing and querying at the shard level, the files in the data directory are written by both Elasticsearch and Lucene. The responsibilities of each are quite clear: Lucene is responsible for writing and maintaining the Lucene index files, while Elasticsearch writes metadata related to features on top of Lucene, such as field mappings, index settings and other cluster metadata. These are end-user and supporting features that do not exist in low-level Lucene but are provided by Elasticsearch. Let’s look at the outer levels of data written by Elasticsearch before we dive deeper and eventually find the Lucene index files.

Node Data

Simply starting Elasticsearch from an empty data directory yields the following directory tree:

    $ tree data
    data
    └── elasticsearch
        └── nodes
            └── 0
                ├── _state
                │   └── global-0.st
                └── node.lock

The node.lock file is there to ensure that only one Elasticsearch installation is reading from and writing to a single data directory at a time.

More interesting is the global-0.st file. The global- prefix indicates that this is a global state file, while the .st extension indicates that this is a state file that contains metadata. As you might have guessed, this binary file contains global metadata about your cluster, and the number after the prefix indicates the cluster metadata version, a strictly increasing versioning scheme that follows your cluster. While it is technically possible to edit these files with a hex editor in an emergency, it is strongly discouraged because it can quickly lead to data loss.
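The naming scheme described above (a prefix, a dash, a version number and the .st extension) can be picked apart with a few lines of Python. This is only a sketch based on the file names shown in this article: the helper names parse_state_file_name and latest_state_file are hypothetical, and the sketch looks at file names only, not at the binary contents.

```python
import re
from typing import List, NamedTuple, Optional

# Names such as "global-0.st" or "state-12.st": prefix, dash, version, ".st".
# The pattern is inferred from the file names shown in the article.
STATE_FILE_RE = re.compile(r"^(?P<prefix>[a-z]+)-(?P<version>\d+)\.st$")


class StateFile(NamedTuple):
    prefix: str
    version: int


def parse_state_file_name(name: str) -> Optional[StateFile]:
    """Split a state file name into prefix and version, or return None."""
    match = STATE_FILE_RE.match(name)
    if match is None:
        return None
    return StateFile(match.group("prefix"), int(match.group("version")))


def latest_state_file(names: List[str], prefix: str) -> Optional[str]:
    """Return the file name with the highest version for the given prefix."""
    candidates = []
    for name in names:
        parsed = parse_state_file_name(name)
        if parsed is not None and parsed.prefix == prefix:
            candidates.append((parsed.version, name))
    # Compare versions numerically so that 10 sorts after 2.
    return max(candidates)[1] if candidates else None
```

Versions are compared as integers rather than strings, so global-10.st is correctly treated as newer than global-2.st.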

Index Data

Let’s create a single-shard index and look at the files changed by Elasticsearch:

    $ curl localhost:9200/foo -XPOST -H 'Content-Type: application/json' -d '{"settings":{"index.number_of_shards": 1}}'
    {"acknowledged":true}

    $ tree -h data
    data
    └── [ 102]  elasticsearch
        └── [ 102]  nodes
            └── [ 170]  0
                ├── [ 102]  _state
                │   └── [ 109]  global-0.st
                ├── [ 102]  indices
                │   └── [ 136]  foo
                │       ├── [ 170]  0
                │       │   ├── .....
                │       └── [ 102]  _state
                │           └── [ 256]  state-0.st
                └── [   0]  node.lock

We see that a new directory has been created corresponding to the index name. This directory has two sub-folders: _state and 0. The former contains what’s called an index state file (indices/{index-name}/_state/state-{version}.st), which holds metadata about the index, such as its creation timestamp. It also contains a unique identifier as well as the settings and the mappings for the index. The latter contains data relevant to the first (and only) shard of the index (shard 0). Next up, we’ll have a closer look at this.
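To get an overview of which indices and shards a node holds, the layout above can be walked programmatically. The following is a minimal sketch that assumes the directory layout shown in this article (an indices folder containing one folder per index, each with numerically named shard folders next to _state). The function name list_shards is hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def list_shards(node_dir: Path) -> Dict[str, List[int]]:
    """Map each index name under node_dir/indices to its sorted shard ids."""
    shards: Dict[str, List[int]] = {}
    indices_dir = node_dir / "indices"
    if not indices_dir.is_dir():
        # A freshly started node has no indices directory yet.
        return shards
    for index_dir in sorted(indices_dir.iterdir()):
        if not index_dir.is_dir():
            continue
        # Shard directories are named after their numeric shard id;
        # the per-index "_state" directory is skipped by this check.
        shards[index_dir.name] = sorted(
            int(child.name)
            for child in index_dir.iterdir()
            if child.is_dir() and child.name.isdecimal()
        )
    return shards
```

For the tree above, calling list_shards(Path("data/elasticsearch/nodes/0")) would map the index foo to its single shard 0.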

Shard Data

The shard data directory contains a state file for the shard that includes versioning as well as information about whether the shard is considered a primary shard or a replica.

    $ tree -h data/elasticsearch/nodes/0/indices/foo/0
    data/elasticsearch/nodes/0/indices/foo/0
    ├── [ 102]  _state
    │   └── [  81]  state-0.st
    ├── [ 170]  index
    │   ├── [  36]  segments.gen
    │   ├── [  79]  segments_1
    │   └── [   0]  write.lock
    └── [ 102]  translog
        └── [  17]  translog-1429697028120

In earlier Elasticsearch versions, separate {shard_id}/index/_checksums- files (and .cks files) were also found in the shard data directory. In current versions these checksums are found in the footers of the Lucene files instead, as Lucene has added end-to-end checksumming for all its index files.

The {shard_id}/index directory contains files owned by Lucene. Elasticsearch generally does not write directly to this folder (except for the older checksum implementation found in earlier versions). The files in these directories constitute the bulk of the size of any Elasticsearch data directory.

Before we enter the world of Lucene, we’ll have a look at the Elasticsearch transaction log, which is unsurprisingly found in the per-shard translog directory with the prefix translog-. The transaction log is very important for the functionality and performance of Elasticsearch, so we’ll explain its use in more detail in the next section.

Per-Shard Transaction Log

The Elasticsearch transaction log makes sure that data can safely be indexed into Elasticsearch without having to perform a low-level Lucene commit for every document. Committing a Lucene index creates a new segment at the Lucene level, which is fsync()-ed. This results in a significant amount of disk I/O, which affects performance.

In order to accept a document for indexing and make it searchable without requiring a full Lucene commit, Elasticsearch adds it to the Lucene IndexWriter and appends it to the transaction log. After each refresh_interval it will call reopen() on the Lucene indexes, which will make the data searchable without requiring a commit. This is part of the Lucene Near Real Time API. When the IndexWriter eventually commits, due to either an automatic flush of the transaction log or an explicit flush operation, the previous transaction log is discarded and a new one takes its place.

Should recovery be required, the segments written to disk in Lucene will be recovered first, then the transaction log will be replayed in order to prevent the loss of operations not yet fully committed to disk.
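The interplay between indexing, refresh, flush and recovery can be illustrated with a toy model. The sketch below is not how Elasticsearch or Lucene are implemented: the class ToyShard and all of its attributes are hypothetical, plain dictionaries stand in for Lucene segments and the IndexWriter, and the transaction log is simply assumed to survive a crash.

```python
from typing import Dict, List, Tuple


class ToyShard:
    """Toy model of the write path described above; not Elasticsearch code."""

    def __init__(self) -> None:
        self.committed: Dict[str, str] = {}   # stands in for fsync()-ed segments
        self.buffer: Dict[str, str] = {}      # stands in for uncommitted IndexWriter state
        self.searchable: Dict[str, str] = {}  # what searches currently see
        self.translog: List[Tuple[str, str]] = []  # operations since the last commit

    def index(self, doc_id: str, source: str) -> None:
        # Accept the document without a commit: buffer it and log the operation.
        self.buffer[doc_id] = source
        self.translog.append((doc_id, source))

    def refresh(self) -> None:
        # Make buffered documents visible to search; nothing is committed.
        self.searchable = {**self.committed, **self.buffer}

    def flush(self) -> None:
        # Commit buffered documents and start a new, empty transaction log.
        self.committed.update(self.buffer)
        self.buffer.clear()
        self.translog = []

    def recover(self) -> None:
        # The in-memory buffer is assumed lost: start from the committed data,
        # then replay the operations recorded in the transaction log.
        self.buffer = {}
        for doc_id, source in self.translog:
            self.buffer[doc_id] = source
        self.refresh()
```

In this model a document indexed after the last flush survives a simulated crash only because recover() replays the transaction log, which mirrors the recovery order described above.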

Fixing Problematic Shards

Since an Elasticsearch shard contains a Lucene index, we can use Lucene’s wonderful CheckIndex tool, which enables us to scan and fix problematic segments, usually with minimal data loss. We generally recommend that Elasticsearch users simply re-index the data, but if for some reason that’s not possible and the data is very important, this route is available, even if it requires quite a bit of manual work and time, depending on the number of shards and their sizes.

The Lucene CheckIndex tool is included in the default Elasticsearch distribution and requires no additional downloads.

    # change this to reflect your shard path, the format is
    # {path.data}/{cluster_name}/nodes/{node_id}/indices/{index_name}/{shard_id}/index/
    $ export SHARD_PATH=data/elasticsearch/nodes/0/indices/foo/0/index/
    $ java -cp lib/elasticsearch-*.jar:lib/*:lib/sigar/* -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex $SHARD_PATH

If CheckIndex detects a problem and its suggestion to fix it looks sensible, you can tell CheckIndex to apply the fix(es) by adding the -fix command line parameter.
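When many shards need checking, it can help to generate the shard paths rather than typing them by hand. The following sketch builds a path in the {path.data}/{cluster_name}/nodes/{node_id}/indices/{index_name}/{shard_id}/index/ format quoted above. The function name shard_index_path is hypothetical, and POSIX-style separators are assumed.

```python
from pathlib import PurePosixPath


def shard_index_path(path_data: str, cluster_name: str, node_id: int,
                     index_name: str, shard_id: int) -> str:
    """Build the Lucene index directory path for one shard."""
    path = PurePosixPath(path_data, cluster_name, "nodes", str(node_id),
                         "indices", index_name, str(shard_id), "index")
    # Keep the trailing slash used in the path format quoted in the article.
    return str(path) + "/"
```

For example, shard_index_path("data", "elasticsearch", 0, "foo", 0) produces the same value as the SHARD_PATH used in the command above.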

Storing Snapshots

You might wonder how all these files translate into the storage used by the snapshot repositories. Wonder no more: taking this cluster, snapshotting it as my-snapshot to a filesystem-based repository and inspecting the files in the repository, we’ll find these files (some files omitted for brevity):

    $ tree -h snapshots
    snapshots
    ├── [  31]  index
    ├── [ 102]  indices
    │   └── [ 136]  foo
    │       ├── [1.2K]  0
    │       │   ├── [ 350]  __0
    │       │   ├── [1.8K]  __1
    ...
    │       │   ├── [ 350]  __w
    │       │   ├── [ 380]  __x
    │       │   └── [8.2K]  snapshot-my-snapshot
    │       └── [ 249]  snapshot-my-snapshot
    ├── [  79]  metadata-my-snapshot
    └── [ 171]  snapshot-my-snapshot

At the root we have an index file that contains information about all the snapshots in this repository, and each snapshot has an associated snapshot- and metadata- file. The snapshot- file at the root contains information about the state of the snapshot, which indexes it contains and so on. The metadata- file at the root contains the cluster metadata at the time of the snapshot.

When compress: true is set, metadata- and snapshot- files are compressed using LZF, which focuses on compression and decompression speed and is therefore a great fit for Elasticsearch. The data is stored with a header: ZV + 1 byte indicating whether the data is compressed. After the header there will be one or more compressed 64K blocks in the format: 2 byte block length + 2 byte uncompressed size + compressed data. Using this information you can use any LibLZF-compatible decompressor. If you want to learn more about LZF, check out this great description of the format.

At the index level there is another file, indices/{index_name}/snapshot-{snapshot_name}, that contains the index metadata, such as settings and mappings for the index at the time of the snapshot.

At the shard level you’ll find two kinds of files: renamed Lucene index files and the shard snapshot file, indices/{index_name}/{shard_id}/snapshot-{snapshot_name}. This file contains information about which of the files in the shard directory are used in the snapshot and a mapping from the logical file names in the snapshot to the concrete file names they should be stored as on disk when being restored. It also contains the checksum, Lucene versioning and size information for all relevant files, which can be used to detect and prevent data corruption.

You might wonder why these files have been renamed instead of just keeping their original file names, which potentially would have been easier to work with directly on disk. The reason is simple: it’s possible to snapshot an index, then delete and re-create it before snapshotting it again. In this case, several files would end up having the same names, but different contents.
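The renaming argument can be made concrete with a toy repository. This is a sketch under stated assumptions, not the actual snapshot implementation: the class ToyShardRepository is hypothetical, and the naming scheme (two underscores followed by a base-36 counter) is only inferred from the __0 ... __x names in the listing above.

```python
from typing import Dict

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(number: int) -> str:
    """Encode a non-negative integer in base 36 (assumed naming scheme)."""
    if number == 0:
        return "0"
    out = ""
    while number > 0:
        number, remainder = divmod(number, 36)
        out = DIGITS[remainder] + out
    return out


class ToyShardRepository:
    """Stores every snapshotted file under a fresh, never reused blob name."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}
        self.next_id = 0

    def snapshot(self, files: Dict[str, bytes]) -> Dict[str, str]:
        """Store the files; return a mapping of blob name -> original name."""
        mapping: Dict[str, str] = {}
        for original_name, content in sorted(files.items()):
            blob_name = "__" + to_base36(self.next_id)
            self.next_id += 1
            self.blobs[blob_name] = content
            mapping[blob_name] = original_name
        return mapping

    def restore(self, mapping: Dict[str, str]) -> Dict[str, bytes]:
        """Rebuild the original files from a snapshot's mapping."""
        return {original: self.blobs[blob] for blob, original in mapping.items()}
```

Snapshotting a file named segments_1, re-creating the index and snapshotting a different segments_1 yields two distinct blobs here, so both snapshots can still be restored. The returned mapping plays the role that the shard-level snapshot file has in the description above.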