Amazon RDS Performance Insights brings an intuitive tuning interface to Amazon RDS to help you discover and investigate performance issues on your RDS databases. The look and feel of Performance Insights is the same across all database engine types, such as RDS for MySQL, RDS for PostgreSQL, and Amazon Aurora. However, every engine has a slightly different implementation.

Performance Insights can always display the database load and the top SQL regardless of engine. Every engine also supports Performance Insights features like extended data retention and integration with Amazon CloudWatch. However, depending on the database engine type, Performance Insights displays slightly different information based on the engine’s native performance instrumentation.

For example, in Aurora with PostgreSQL compatibility, you find database load subdivided by PostgreSQL 10 wait events. For Aurora with MySQL compatibility and RDS MySQL, you see the load subdivided by MySQL Performance Schema wait events. By consuming each engine type’s native wait event instrumentation for wait events, Performance Insights is can keep a similar look and feel across engine types while remaining faithful to each engine’s native performance instrumentation.

We released Performance Insights for RDS MySQL on August 28, 2018, for version 5.7.22 and above, and all DB instance classes except db.t2. Because the native performance instrumentation for Aurora MySQL and RDS MySQL is provided by the MySQL Performance Schema, Performance Insights works best when the MySQL Performance Schema is enabled.

You might not have enabled the MySQL Performance Schema by using the RDS parameter group for a DB instance. In this case, enabling Performance Insights automatically enables and configures the MySQL Performance Schema for you. For more information on using the MySQL Performance Schema with Performance Insights, see Enabling Performance Insights in the Amazon RDS User Guide.

Database load

The central metric of Performance Insights is called database load. DB load condenses the overall performance health of a database system into a single metric that reflects the level of stress on a system. For a detailed description of how we compute DB load, read Using Amazon RDS Performance Insights in the Amazon RDS User Guide.

The DB load graph that forms the top portion of Performance Insights can show the load subdivided by any one of four dimensions: waits, SQL, hosts, and users. The most popular dimension for the graph is waits, which shows you what kind of activities in the database engine are giving rise to the load. The thicker color bands in the load chart indicate the wait types that contribute the most.

To select a timeframe, select one of the trailing timeframes at the upper-right of the chart, or click and drag to zoom in to any part of the graph. Because the MySQL wait event names are not particularly descriptive, we document the most common ones in the Amazon Aurora MySQL Reference. Although this documentation is for Aurora MySQL, the descriptions of these events also apply to RDS MySQL. For those wait events not covered by AWS documentation, see Performance Schema Instrument Naming Conventions in the MySQL documentation.

As an illustration, following is a trailing one-hour DB load graph from Performance Insights for a sysbench OLTP workload running on RDS MySQL. The workload is a steady-state read-heavy workload (90:10 read:write). Thus, it’s fairly typical for an OLTP application and illustrates what normal workloads look like in Performance Insights.

From the graph, you can see that the dominant contributor to DB load is the wait event io/table/sql/handler . The AWS documentation states that this is a table I/O wait event. Observing this workload, you can see that normal healthy workloads produce load by consuming basic host resources like CPU and I/O. No bottleneck is standing in the way of these resources, and the system can do its work.

What if you don’t use the MySQL Performance Schema?

If you intentionally disable the MySQL Performance Schema, then Performance Insights still runs, but in a degraded mode. You can still see the DB load and the top SQL. However, instead of displaying MySQL Performance Schema wait events, Performance Insights displays MySQL process states. These are the values you normally find in the state column of the show processlist command.

Top 10 SQL statement digests

The bottom portion of the Performance Insights page shows a list of the top 10 SQL statement digests by DB load for the time period displayed in the load graph. This list helps identify the SQL statements that have the most impact on your system, so you know how to focus your tuning efforts. Using Performance Insights, you can also select and display the top 10 of any of the other supported dimensions (waits, hosts, or users). However, the most popular dimension for the top-10 list is SQL.

MySQL statement digests are simplified versions of the SQL running in the database, with the constantly changing literal values replaced with a question mark. Digests allow Performance Insights to aggregate the impact of hundreds or thousands nearly identical SQL statements. Digests can show these statements’ true impact as a single logical SQL statement in the top 10 list.

Diagnosing problems

The DB load metric makes it obvious when performance conditions on a database are degraded. Consider the following load graph.

The graph is showing a steady-state workload that is interrupted by a period of elevated load in the io/table/sql/handler wait event. As shown before, this event is an I/O event. During this time, between four and eight database connections were waiting, mainly on I/O. The question is why. The first step in the answer is to click and drag on the graph to zoom in on the period in question.

Having zoomed in, you can see the SQL statements that were most heavily involved in the period of elevated I/O waits.

During the problem period, the top SQL is an update on a large table with no where clause. Such operations are likely to use large amounts of I/O, resulting in contention for the regular workload. Knowing that this SQL is the culprit allows us to begin investigating who is responsible. To do this, click on the Slice by drop-down menu, choose Users.

With the DB load subdivided by users, you can immediately see that one specific user was responsible for running the offending query. The user only shows up during the load spike. In the top SQL list, you can also see that the offending query is being run exclusively by that user.

Conclusion

Performance Insights brings to RDS MySQL an approach to visualizing and subdividing load that is common in commercial database engines like Oracle, but less common in open-source databases like MySQL and PostgreSQL. Although this approach might be new to users of open-source engines, it provides a simple and intuitive way to discover the sources of performance regressions.

Performance Insights has a long list of enhancements that the team is eager to release in the coming months and years. If you want to provide feedback, ask questions, or request enhancements, email us.

About the Authors

Jeremiah Wilton is a principal database engineer with the Relational Database Services (RDS) team at Amazon Web Services.

Szymon Komendera is a database engineer with the Relational Database Services (RDS) team at Amazon Web Services.