The Netflix member experience is offered to 83+ million global members, and delivered using thousands of microservices. These services are owned by multiple teams, each having their own build and release lifecycles, generating a variety of data that is stored in different types of data store systems. The Cloud Database Engineering (CDE) team manages those data store systems, so we run benchmarks to validate updates to these systems, perform capacity planning, and test our cloud instances with multiple workloads and under different failure scenarios. We were also interested in a tool that could evaluate and compare new data store systems as they appear in the market or in the open source domain, determine their performance characteristics and limitations, and gauge whether they could be used in production for relevant use cases. For these purposes, we wrote Netflix Data Benchmark (NDBench), a pluggable cloud-enabled benchmarking tool that can be used across any data store system. NDBench provides plugin support for the major data store systems that we use — Cassandra (Thrift and CQL), Dynomite (Redis), and Elasticsearch. It can also be extended to other client APIs.

Introduction

As Netflix runs thousands of microservices, we are not always aware of the traffic that bundled microservices may generate on our backend systems. Understanding the performance implications of new microservices on our backend systems was also a difficult task. We needed a framework that could assist us in determining the behavior of our data store systems under various workloads, maintenance operations and instance types. We wanted to be mindful of provisioning our clusters, scaling them either horizontally (by adding nodes) or vertically (by upgrading the instance types), and operating under different workloads and conditions, such as node failures, network partitions, etc.

As new data store systems appear in the market, their published performance numbers tend to reflect the “sweet spot” and are usually based on optimized hardware and benchmark configurations. Being a cloud-native database team, we want to make sure that our systems can provide high availability under multiple failure scenarios, and that we are utilizing our instance resources optimally. Many other factors affect the performance of a database deployed in the cloud, such as instance types, workload patterns, and types of deployments (island vs. global). NDBench helps us benchmark under these conditions by mimicking several production use cases.

There were also some additional requirements. For example, when we upgrade our data store systems (such as Cassandra), we want to test the new versions before deploying them in production. For systems that we develop in-house, such as Dynomite, we wanted to automate the functional test pipelines and understand the performance of Dynomite under various conditions and with different storage engines. Hence, we wanted a workload generator that could be integrated into our pipelines prior to promoting an AWS AMI to a production-ready AMI.

We looked into various benchmark tools as well as REST-based performance tools. While some tools covered a subset of our requirements, we were interested in a tool that could achieve the following:

Dynamically change the benchmark configurations while the test is running, so that tests can run alongside our production microservices.

Be able to integrate with platform cloud services such as dynamic configurations, discovery, metrics, etc.

Run indefinitely, so that we can introduce failure scenarios and test long-running maintenance operations such as database repairs.

Provide pluggable patterns and loads.

Support different client APIs.

Deploy, manage and monitor multiple instances from a single entry point.
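
To make the first requirement above concrete, here is a minimal Java sketch of a workload whose rate can be changed while it is running. It is not NDBench code: the class `DynamicRateWorkload`, its `opsPerTick` setting, and the `tick()` method are hypothetical names chosen for illustration. The point is that the setting is re-read on every scheduling interval, so a new value takes effect without restarting the test.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a workload whose rate is re-read on every interval.
final class DynamicRateWorkload {
    private final AtomicInteger opsPerTick;          // adjustable while running
    private final AtomicLong totalOps = new AtomicLong();
    private final Runnable operation;                // one read or write against the data store

    DynamicRateWorkload(int initialOpsPerTick, Runnable operation) {
        this.opsPerTick = new AtomicInteger(initialOpsPerTick);
        this.operation = operation;
    }

    // May be called from another thread (for example, a configuration listener).
    void setOpsPerTick(int newValue) {
        if (newValue < 0) throw new IllegalArgumentException("rate must be >= 0");
        opsPerTick.set(newValue);
    }

    // One scheduling interval: read the current setting, then issue that many operations.
    void tick() {
        int n = opsPerTick.get();
        for (int i = 0; i < n; i++) {
            operation.run();
            totalOps.incrementAndGet();
        }
    }

    long totalOps() {
        return totalOps.get();
    }
}

final class DynamicRateDemo {
    public static void main(String[] args) {
        DynamicRateWorkload workload = new DynamicRateWorkload(5, () -> { /* no-op operation */ });
        workload.tick();             // 5 operations
        workload.tick();             // 5 operations
        workload.setOpsPerTick(20);  // reconfigure mid-test
        workload.tick();             // 20 operations
        System.out.println("total ops = " + workload.totalOps()); // prints "total ops = 30"
    }
}
```

In a real deployment the setter would presumably be driven by a dynamic configuration service and `tick()` by a scheduler. Both are left out here to keep the sketch deterministic.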

For these reasons, we created Netflix Data Benchmark (NDBench). We incorporated NDBench into the Netflix Open Source ecosystem by integrating it with components such as Archaius for configuration, Spectator for metrics, and Eureka for service discovery. However, we designed NDBench so that these libraries are injected, allowing the tool to be ported to other cloud environments or run locally, while still serving our Netflix OSS ecosystem users.

NDBench Architecture

The following diagram shows the architecture of NDBench. The framework consists of three components:

Core: The workload generator

API: Allowing multiple plugins to be developed against NDBench

Web: The UI and the servlet context listener

We currently provide the following client plugins — Datastax Java Driver (CQL), C* Astyanax (Thrift), Elasticsearch API, and Dyno (Jedis support). Additional plugins can be added, or a user can use dynamic scripts in Groovy to add new workloads. Each driver is just an implementation of the Driver plugin interface.
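
As a rough illustration of what "an implementation of the plugin interface" can look like, the following Java sketch defines a small plugin contract and a toy in-memory implementation. The names `DataStorePlugin`, `InMemoryPlugin`, and `PluginDemo`, and the method signatures, are assumptions made for this example and are not taken from the NDBench API. A real plugin would wrap a client such as the Datastax Java Driver or Dyno.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical plugin contract: these names are illustrative and are
// not taken from the NDBench source.
interface DataStorePlugin {
    void init();                                  // open connections to the data store
    String readSingle(String key);                // returns the value, or null on a miss
    void writeSingle(String key, String value);   // store one key/value pair
    void shutdown();                              // release connections
}

// Toy in-memory "driver" standing in for a real data store client.
final class InMemoryPlugin implements DataStorePlugin {
    private final Map<String, String> store = new ConcurrentHashMap<>();
    private volatile boolean open;

    @Override public void init() { open = true; }

    @Override public String readSingle(String key) {
        ensureOpen();
        return store.get(key);
    }

    @Override public void writeSingle(String key, String value) {
        ensureOpen();
        store.put(key, value);
    }

    @Override public void shutdown() {
        open = false;
        store.clear();
    }

    private void ensureOpen() {
        if (!open) throw new IllegalStateException("plugin not initialized");
    }
}

// The generator depends only on the interface, so plugins are interchangeable.
final class PluginDemo {
    static int writeThenRead(DataStorePlugin plugin, int keyCount) {
        plugin.init();
        try {
            for (int i = 0; i < keyCount; i++) {
                plugin.writeSingle("key-" + i, "value-" + i);
            }
            int hits = 0;
            for (int i = 0; i < keyCount; i++) {
                if (plugin.readSingle("key-" + i) != null) hits++;
            }
            return hits;
        } finally {
            plugin.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("hits=" + writeThenRead(new InMemoryPlugin(), 100));
    }
}
```

Swapping in a different data store then means supplying another implementation of the same interface. The workload loop itself does not change.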

NDBench-core is the core component of NDBench, where one can further tune workload settings.

Fig. 1: NDBench Architecture

NDBench can be used from either the command line (using REST calls), or from a web-based user interface (UI).

NDBench Runner UI