Update (May 2019) – Since this post was published, we have increased the maximum number of nodes per cluster from 15 to 250 (read Amazon ElastiCache for Redis Now Supports Up To 250 Nodes Per Cluster to learn more). We also updated this post to indicate that online resizing works with Redis engine version 3.2.10 and newer.

Amazon ElastiCache makes it easy for you to set up a fast, in-memory data store and cache. With support for the two most popular open source offerings (Redis and Memcached), ElastiCache supports the demanding needs of game leaderboards, in-memory analytics, and large-scale messaging.

Today I would like to tell you about an important addition to Amazon ElastiCache for Redis. You can already create clusters with up to 15 shards, each responsible for storing keys and values for a specific set of slots (each cluster has exactly 16,384 slots). A single cluster can expand to store 3.55 terabytes of in-memory data while supporting up to 20 million reads and 4.5 million writes per second.
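To make the slot model concrete, here is a minimal sketch of how a key maps to one of the 16,384 slots. It follows the open source Redis Cluster specification (CRC16 of the key, modulo 16384, with optional {hash tags}) rather than anything specific to ElastiCache; the function names are illustrative.

```python
HASH_SLOTS = 16384  # every Redis cluster has exactly this many slots


def crc16_xmodem(data: bytes) -> int:
    """CRC16, XMODEM variant: polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Return the hash slot for a key, honoring {hash tags}."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end > start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % HASH_SLOTS
```

Keys that share a hash tag, such as {user1000}.following and {user1000}.followers, land in the same slot and therefore move together when that slot is migrated.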

Now with Online Resizing

You can now adjust the number of shards in a running ElastiCache for Redis cluster while the cluster remains online and responding to requests. This gives you the power to respond to changes in traffic and data volume without having to take the cluster offline or to start with an empty cache. You can also rebalance a running cluster to uniformly redistribute slot space without changing the number of shards.

When you initiate a resharding or rebalancing operation, ElastiCache for Redis starts by preparing a plan that will result in an even distribution of slots across the shards in the cluster. Then it transfers slots across shards, moving many in parallel for efficiency. This all happens while the cluster continues to respond to requests, with a modest impact on write throughput for writes to a slot that is in motion. The migration rate depends on the instance type, the network speed, and the read/write traffic to the slots; it is generally about 1 gigabyte per minute.
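The idea of an even plan and the rough migration rate can be sketched in a few lines. This is not the planner that ElastiCache actually runs; it only illustrates what "even distribution" means for 16,384 slots and turns the "about 1 gigabyte per minute" figure into a back-of-the-envelope estimate. Both function names are hypothetical.

```python
from typing import List, Tuple

HASH_SLOTS = 16384


def even_slot_ranges(num_shards: int) -> List[Tuple[int, int]]:
    """Split the slot space into contiguous, near-equal (start, end) ranges."""
    if num_shards < 1:
        raise ValueError("a cluster needs at least one shard")
    base, extra = divmod(HASH_SLOTS, num_shards)
    ranges = []
    start = 0
    for shard in range(num_shards):
        # The first `extra` shards take one additional slot each.
        size = base + (1 if shard < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def estimated_minutes(gigabytes_to_move: float, gb_per_minute: float = 1.0) -> float:
    """Rough duration estimate; the default rate is the typical figure, not a guarantee."""
    return gigabytes_to_move / gb_per_minute
```

For example, even_slot_ranges(3) yields three ranges whose sizes differ by at most one slot, and moving roughly 90 gigabytes at the typical rate would take on the order of 90 minutes.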

The resharding and rebalancing operations apply to Redis clusters that were created with Cluster Mode enabled.

Resharding a Cluster

In general, you will know that it is time to expand a cluster via resharding when it starts to face significant memory pressure or when individual nodes are becoming bottlenecks. You can watch the cluster’s CloudWatch metrics to identify each situation:

Memory Pressure – FreeableMemory, SwapUsage, BytesUsedForCache.

CPU Bottleneck – CPUUtilization, CurrConnections, NewConnections.

Network Bottleneck – NetworkBytesIn, NetworkBytesOut.

You can use CloudWatch Dashboards to monitor these metrics, and CloudWatch Alarms to automate the resharding process.
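As a starting point for the alarms, here is a minimal sketch that builds the arguments for the CloudWatch PutMetricAlarm call. The thresholds, period, and alarm naming scheme are assumptions chosen for illustration, and the helper name build_alarm_request is hypothetical; tune the values to your own workload.

```python
from typing import Dict, List, Optional


def build_alarm_request(
    cache_cluster_id: str,
    metric_name: str,
    threshold: float,
    comparison: str = "GreaterThanThreshold",
    alarm_actions: Optional[List[str]] = None,
) -> Dict[str, object]:
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": f"{cache_cluster_id}-{metric_name}",  # assumed naming scheme
        "Namespace": "AWS/ElastiCache",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "CacheClusterId", "Value": cache_cluster_id}],
        "Statistic": "Average",
        "Period": 300,           # seconds; assumed value
        "EvaluationPeriods": 3,  # assumed value
        "Threshold": threshold,
        "ComparisonOperator": comparison,
        "AlarmActions": list(alarm_actions or []),
    }
```

With boto3 installed, the result can be passed along as boto3.client("cloudwatch").put_metric_alarm(**request). A metric such as FreeableMemory signals trouble when it drops, so it would use comparison="LessThanThreshold".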

To reshard a Redis cluster from the ElastiCache Dashboard, click on the cluster to visit the detail page, and then click on the Add shards button:

Enter the number of shards to add and (optionally) the desired Availability Zones, then click on Add:
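The same request can be made through the API. The sketch below assumes a boto3 ElastiCache client is passed in as elasticache and uses its describe_replication_groups and modify_replication_group_shard_configuration calls; the helper name add_shards is hypothetical, and the optional Availability Zone preferences are left out for brevity.

```python
def add_shards(elasticache, replication_group_id: str, shards_to_add: int) -> int:
    """Grow a cluster-mode-enabled replication group; returns the new shard count."""
    if shards_to_add < 1:
        raise ValueError("shards_to_add must be positive")
    groups = elasticache.describe_replication_groups(
        ReplicationGroupId=replication_group_id
    )["ReplicationGroups"]
    current = len(groups[0]["NodeGroups"])  # one node group per shard
    target = current + shards_to_add
    elasticache.modify_replication_group_shard_configuration(
        ReplicationGroupId=replication_group_id,
        NodeGroupCount=target,
        ApplyImmediately=True,  # the operation starts as soon as it is accepted
    )
    return target
```

A typical call would look like add_shards(boto3.client("elasticache"), "my-cluster", 2), where "my-cluster" is a placeholder replication group ID.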

The status of the cluster will change to modifying and the resharding process will begin. It can take anywhere from a few minutes to several hours, depending on how much data has to move and on the migration rate described earlier. You can track the progress on the detail page for the cluster:
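Progress can also be polled programmatically. This sketch assumes a boto3 ElastiCache client and reads the status and the slot migration percentage from the describe_replication_groups response; the helper names and the 30-second polling interval are illustrative choices.

```python
import time
from typing import Callable, List, Optional, Tuple


def resharding_progress(elasticache, replication_group_id: str) -> Tuple[str, Optional[float]]:
    """Return (status, slot migration percentage or None if nothing is pending)."""
    group = elasticache.describe_replication_groups(
        ReplicationGroupId=replication_group_id
    )["ReplicationGroups"][0]
    pending = group.get("PendingModifiedValues", {})
    migration = pending.get("Resharding", {}).get("SlotMigration", {})
    return group["Status"], migration.get("ProgressPercentage")


def wait_until_available(
    elasticache,
    replication_group_id: str,
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[str, Optional[float]]]:
    """Poll until the status is 'available'; returns every snapshot observed."""
    history = []
    while True:
        snapshot = resharding_progress(elasticache, replication_group_id)
        history.append(snapshot)
        if snapshot[0] == "available":
            return history
        sleep(poll_seconds)
```

The sleep parameter exists only so that the loop can be exercised without real delays; in normal use the default is fine.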

You can see the slots moving from shard to shard:

You can also watch the Events for the cluster:

During the resharding you should avoid the KEYS and SMEMBERS commands, as well as compute-intensive Lua scripts, in order to moderate the load on the cluster shards. You should avoid the FLUSHDB and FLUSHALL commands entirely; using them will interrupt and then abort the resharding process.
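One way to enforce this guidance in application code is a small pre-flight check in front of your Redis client. The sketch below is only an illustration: the function name, the return values, and the idea of tracking a resharding_in_progress flag are all assumptions, and it cannot tell whether a given Lua script is compute-intensive.

```python
ABORTS_RESHARDING = frozenset({"FLUSHDB", "FLUSHALL"})
HEAVY_DURING_RESHARDING = frozenset({"KEYS", "SMEMBERS"})


def check_command(command: str, resharding_in_progress: bool) -> str:
    """Classify a raw command line as 'ok', 'avoid', or 'block'."""
    parts = command.strip().split()
    if not parts or not resharding_in_progress:
        return "ok"
    name = parts[0].upper()
    if name in ABORTS_RESHARDING:
        return "block"  # would interrupt and abort the resharding
    if name in HEAVY_DURING_RESHARDING:
        return "avoid"  # adds load to shards that are busy migrating slots
    return "ok"
```

A caller might refuse to send anything classified as "block" and log a warning for "avoid".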

The status of each shard will return to available when the process is complete:

The same process takes place when you delete shards.
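Through the API, removing shards uses the same modify_replication_group_shard_configuration call, with the difference that you must say which node groups (shards) to remove. The sketch assumes a boto3 ElastiCache client; the helper name remove_shards and the example IDs are hypothetical.

```python
from typing import Sequence


def remove_shards(elasticache, replication_group_id: str, node_group_ids: Sequence[str]) -> int:
    """Shrink a replication group by removing the named shards; returns the new count."""
    groups = elasticache.describe_replication_groups(
        ReplicationGroupId=replication_group_id
    )["ReplicationGroups"]
    current_ids = [ng["NodeGroupId"] for ng in groups[0]["NodeGroups"]]
    unknown = set(node_group_ids) - set(current_ids)
    if unknown:
        raise ValueError(f"unknown node groups: {sorted(unknown)}")
    target = len(current_ids) - len(set(node_group_ids))
    if target < 1:
        raise ValueError("at least one shard must remain")
    elasticache.modify_replication_group_shard_configuration(
        ReplicationGroupId=replication_group_id,
        NodeGroupCount=target,
        ApplyImmediately=True,
        NodeGroupsToRemove=sorted(set(node_group_ids)),
    )
    return target
```

For example, remove_shards(client, "my-cluster", ["0003"]) would take a three-shard cluster down to two, with the slots of shard 0003 migrated to the remaining shards first.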

Rebalancing Slots

You can perform this operation by heading to the cluster’s detail page and clicking on Rebalance Slot Distribution:
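As far as I know, the API equivalent of this button is a shard configuration request that keeps the shard count unchanged; treat that as an assumption to verify against the current ElastiCache documentation. The sketch assumes a boto3 ElastiCache client, and the helper name rebalance_slots is hypothetical.

```python
def rebalance_slots(elasticache, replication_group_id: str) -> int:
    """Request a rebalance by resubmitting the current shard count; returns that count."""
    groups = elasticache.describe_replication_groups(
        ReplicationGroupId=replication_group_id
    )["ReplicationGroups"]
    current = len(groups[0]["NodeGroups"])
    elasticache.modify_replication_group_shard_configuration(
        ReplicationGroupId=replication_group_id,
        NodeGroupCount=current,  # unchanged count: redistribute slots only
        ApplyImmediately=True,
    )
    return current
```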

Things to Know

Here are a few things to keep in mind about this new feature:

Engine Version – Your cluster must be running version 3.2.10 (or newer) of the Redis engine.

Migration Size – Slots that contain items that are larger than 256 megabytes after serialization are not migrated.

Cluster Endpoint – The cluster endpoint does not change as a result of a resharding or rebalancing.

Available Now

This feature is available now and you can start using it today.

— Jeff;