At GitHub, we use Elasticsearch as the main technology backing our search services. In order to administer our clusters, we use ChatOps via Hubot. As of 2017, those commands were a collection of Bash and Ruby-based scripts.

Although this served our needs for a time, it was becoming increasingly apparent that these scripts lacked composability and reusability. It was also difficult to contribute back to the community by open sourcing any of them, because they were specific to GitHub's bespoke infrastructure.

Why build something new?

There are plenty of excellent Elasticsearch libraries, both official and community driven. For Ruby, GitHub has already released the Elastomer library and for Go we make use of the Elastic library by user olivere. However, these libraries focus primarily on indexing and querying data. This is exactly what an application needs to use Elasticsearch, but it’s not the same set of tools that operators of an Elasticsearch cluster need. We wanted a high-level API that corresponded to the common operations we took on a cluster, such as disabling allocation or draining the shards from a node. Our goal was a library that focused on these administrative operations and that our existing tooling could easily use.

Full speed ahead with Go…

We started looking into Go and were inspired by GitHub’s success with freno and orchestrator.

Go’s structure encourages the construction of composable software: self-contained, stateless components that can be selected and assembled. We saw it as a good fit for this application.

… Into a wall

We initially scoped the project out to be a packaged chat app and planned to open source only what we were using internally. During implementation, however, we ran into a few problems:

GitHub uses a simple protocol based on JSON-RPC over HTTPS called ChatOps RPC. However, ChatOps RPC is not widely adopted outside of GitHub. This would make integration of our application into ChatOps infrastructure difficult for most parties.

The internal REST library our ChatOps commands relied on was not open sourced. Some of the dependencies of this REST library would also need to be open sourced. We’ve started the process of open sourcing this library and its dependencies, but it will take some time.

We relied on Consul for service discovery, which not everyone uses.

Based on these factors we decided to break out the core of our library into a separate package that we could open source. This would decouple the package from our internal libraries, Consul, and ChatOps RPC.

The package would only have a few goals:

Access the REST endpoints on a single host.

Perform an action.

Provide results of the action.

This module could then be open sourced without being tied to our internal infrastructure, so that anyone could use it with the ChatOps infrastructure, service discovery, or tooling they choose.

To that end, we wrote vulcanizer.

Vulcanizer

Vulcanizer is a Go library for interacting with an Elasticsearch cluster. It is not meant to be a full-fledged Elasticsearch client. Its goal is to provide a high-level API to help with common tasks that are associated with operating an Elasticsearch cluster such as querying health status of the cluster, migrating data off of nodes, updating cluster settings, and more.

Examples of the Go API

Elasticsearch is great in that almost everything you’d want to accomplish can be done via its HTTP interface, but you don’t want to write JSON by hand, especially during an incident. Below are a few examples of how we use Vulcanizer for common tasks, alongside the equivalent curl commands. The Go examples are simplified and don’t show error handling.

Getting nodes of a cluster

You’ll often want to list the nodes in your cluster to pick out a specific node or to see how many nodes of each type you have in the cluster.

```shell
$ curl 'localhost:9200/_cat/nodes?h=master,role,name,ip,id,jdk'
- mdi vulcanizer-node-123 172.0.0.1 xGIs 1.8.0_191
* mdi vulcanizer-node-456 172.0.0.2 RCVG 1.8.0_191
```

Vulcanizer exposes typed structs for these types of objects.

```go
v := vulcanizer.NewClient("localhost", 9200)

nodes, err := v.GetNodes()

fmt.Printf("Node information: %#v\n", nodes[0])
// Node information: vulcanizer.Node{Name:"vulcanizer-node-123", Ip:"172.0.0.1", Id:"xGIs", Role:"mdi", Master:"-", Jdk:"1.8.0_191"}
```

Updating cluster settings

The index recovery speed is a common setting to update when you want to balance time to recovery against I/O pressure across your cluster. The curl version has a lot of JSON to write.

```shell
$ curl -XPUT localhost:9200/_cluster/settings -d '{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "1000mb"
  }
}'
{
  "acknowledged": true,
  "persistent": {},
  "transient": {
    "indices": {
      "recovery": {
        "max_bytes_per_sec": "1000mb"
      }
    }
  }
}
```

The Vulcanizer API is fairly simple and will also retrieve and return any existing setting for that key so that you can record the previous value.

```go
v := vulcanizer.NewClient("localhost", 9200)

oldSetting, newSetting, err := v.SetSetting("indices.recovery.max_bytes_per_sec", "1000mb")
// "50mb", "1000mb", nil
```

Move shards on to and off of a node

To safely update a node, you can set allocation rules so that data is migrated off a specific node. In the Elasticsearch settings, this is a comma-separated list of node names, so you’ll need to be careful not to overwrite an existing value when updating it.
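The danger here is the read-modify-write: a blind `PUT` replaces the whole comma-separated list. The helper below is a hypothetical sketch of the merge logic such a safe update needs; it is not Vulcanizer's actual implementation, just an illustration of why appending matters.

```go
package main

import (
	"fmt"
	"strings"
)

// addExclusion appends a node name to a comma-separated exclusion
// list without clobbering the names already present. Hypothetical
// helper for illustration; vulcanizer's real code differs.
func addExclusion(existing, node string) string {
	if existing == "" {
		return node
	}
	for _, n := range strings.Split(existing, ",") {
		if n == node {
			return existing // already excluded, nothing to do
		}
	}
	return existing + "," + node
}

func main() {
	current := "vulcanizer-node-123,vulcanizer-node-456"
	fmt.Println(addExclusion(current, "vulcanizer-node-789"))
	// vulcanizer-node-123,vulcanizer-node-456,vulcanizer-node-789
}
```

The merged value, not the bare node name, is what gets written back to `cluster.routing.allocation.exclude._name`.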

```shell
$ curl -XPUT localhost:9200/_cluster/settings -d '{
  "transient": {
    "cluster.routing.allocation.exclude._name": "vulcanizer-node-123,vulcanizer-node-456"
  }
}'
```

The Vulcanizer API will safely add or remove nodes from the exclude settings so that shards won’t allocate on to a node unexpectedly.

```go
v := vulcanizer.NewClient("localhost", 9200)

// Existing exclusion settings:
// vulcanizer-node-123,vulcanizer-node-456

exclusionSettings1, err := v.DrainServer("vulcanizer-node-789")
// vulcanizer-node-123,vulcanizer-node-456,vulcanizer-node-789

exclusionSettings2, err := v.FillOneServer("vulcanizer-node-456")
// vulcanizer-node-123,vulcanizer-node-789
```

Command-line application

Included is a small CLI application that leverages the library:

```
$ vulcanizer -h
Usage:
  vulcanizer [command]

Available Commands:
  allocation  Set shard allocation on the cluster.
  drain       Drain a server or see what servers are draining.
  fill        Fill servers with data, removing shard allocation exclusion rules.
  health      Display the health of the cluster.
  help        Help about any command
  indices     Display the indices of the cluster.
  nodes       Display the nodes of the cluster.
  setting     Interact with cluster settings.
  settings    Display all the settings of the cluster.
  snapshot    Interact with a specific snapshot.
  snapshots   Display the snapshots of the cluster.

Flags:
  -c, --cluster string      Cluster to connect to defined in config file
  -f, --configFile string   Configuration file to read in (default to "~/.vulcanizer.yaml")
  -h, --help                help for vulcanizer
      --host string         Host to connect to (default "localhost")
  -p, --port int            Port to connect to (default 9200)

Use "vulcanizer [command] --help" for more information about a command.
```

ChatOps

ChatOps is important for GitHub and our geographically distributed workforce. Vulcanizer enables us to build ChatOps tooling around Elasticsearch quickly and easily for common tasks:



jessbreckenridge .vulcan-go health testbed

Hubot

```
The cluster is green: All primary and replica shards are allocated. This does NOT mean the cluster is otherwise healthy.
+--------+------------+------+------------+----------+
| STATUS | RELOCATING | INIT | UNASSIGNED | ACTIVE % |
+--------+------------+------+------------+----------+
| green  |          0 |    0 |          0 |      100 |
+--------+------------+------+------------+----------+
```

jessbreckenridge .vulcan-go nodes testbed

Hubot

```
testbed nodes
+--------+------+------------------------+-----------+-----------+
| MASTER | ROLE | NAME                   | JDK       | IP        |
+--------+------+------------------------+-----------+-----------+
| *      | mi   | search-arbiter-aaaaaaa | 1.8.0_111 | 172.0.0.1 |
| -      | mi   | search-arbiter-bbbbbbb | 1.8.0_111 | 172.0.0.2 |
| -      | mi   | search-arbiter-ccccccc | 1.8.0_111 | 172.0.0.3 |
| -      | di   | search-storage-ddddddd | 1.8.0_111 | 172.0.0.4 |
| -      | di   | search-storage-eeeeeee | 1.8.0_111 | 172.0.0.5 |
| -      | di   | search-storage-eeeeeee | 1.8.0_111 | 172.0.0.6 |
+--------+------+------------------------+-----------+-----------+
```

nickcanz .vulcan-go drain status testbed

Hubot

```
IP's filtered: None
Name Excluded: None
Host Excluded: None
```

nickcanz .vulcan-go drain server search-storage-zzzzzzz testbed

Hubot

```
Server "search-storage-zzzzzzz" added to list of servers to drain.
Complete list servers draining:
  Ips: None
  Names: search-storage-zzzzzzz
  Hosts: None
```

nickcanz .vulcan-go settings testbed

Hubot

```
testbed settings
No persistent settings are set.
+------------------------------------------+------------------------+
| TRANSIENT SETTINGS                       | VALUE                  |
+------------------------------------------+------------------------+
| cluster.routing.allocation.exclude._name | search-storage-zzzzzzz |
+------------------------------------------+------------------------+
```

Closing

We stumbled a bit when we first started down this path, but the end result is best for everyone:

Because we had to regroup and decide exactly what functionality we wanted to open source, we made sure we were providing value to ourselves and the community instead of just shipping something.

Internal tooling doesn’t always follow engineering best practices like proper release management, so developing Vulcanizer in the open provides an external pressure to make sure we follow all of the best practices.

Having all of the Elasticsearch functionality in its own library allows our internal applications to be very slim and isolated. Our different internal applications have a clear dependency on Vulcanizer instead of having different internal applications depend on each other or worse, trying to get ChatOps to talk to other ChatOps.

Visit the Vulcanizer repository to clone or contribute to the project. We have ideas for future development in the Vulcanizer roadmap.
