USENIX LISA 2012: Performance Analysis Methodology

At USENIX LISA 2012, I gave a talk titled "Performance Analysis Methodology". It covered ten performance analysis anti-methodologies and methodologies, including the USE Method. I wrote about these in the ACMQ article "Thinking Methodically about Performance", which is worth reading for more detail. I’ve also posted USE Method-derived checklists for Solaris- and Linux-based systems.

The video of the talk is on the LISA site, and the slides are below, also available as a PDF.

I’ve summarized the methodologies in the talk below.

Methodology Summaries

Blame-Someone-Else Anti-Method:

1. Find a system or environment component you are not responsible for
2. Hypothesize that the issue is with that component
3. Redirect the issue to the responsible team
4. When proven wrong, go to 1

Streetlight Anti-Method:

1. Pick observability tools that are:
   - familiar
   - found on the Internet
   - found at random
2. Run tools
3. Look for obvious issues

Ad Hoc Checklist Method:

1..N. Run A, if B, do C
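The "Run A, if B, do C" form maps naturally onto a table of checks plus a small loop. Here is a minimal sketch in Python. The checklist entries, metric names, and thresholds are all hypothetical, and the "run" step is replaced by a lookup into a pre-collected sample so that the example is self-contained.

```python
from typing import Callable, NamedTuple

class ChecklistItem(NamedTuple):
    run: Callable[[dict], float]        # A: obtain a measurement
    condition: Callable[[float], bool]  # B: does the value look wrong?
    action: str                         # C: what to do next

# Hypothetical entries; the thresholds are illustrative only.
CHECKLIST = [
    ChecklistItem(lambda m: m["cpu_idle_pct"], lambda v: v < 10,
                  "look for CPU-heavy processes"),
    ChecklistItem(lambda m: m["disk_busy_pct"], lambda v: v > 60,
                  "check disk I/O latency"),
    ChecklistItem(lambda m: m["net_errors"], lambda v: v > 0,
                  "inspect the network interface"),
]

def run_checklist(metrics, checklist=CHECKLIST):
    """Return the actions whose conditions fired, in checklist order."""
    actions = []
    for item in checklist:
        value = item.run(metrics)
        if item.condition(value):
            actions.append(item.action)
    return actions

# Hypothetical pre-collected sample standing in for real tool output.
sample = {"cpu_idle_pct": 4.0, "disk_busy_pct": 35.0, "net_errors": 2}
print(run_checklist(sample))
```

Keeping the checklist as data makes it easy to review and extend, but it still only finds the issues someone thought to list.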

Problem Statement Method:

1. What makes you think there is a performance problem?
2. Has this system ever performed well?
3. What has changed recently? (Software? Hardware? Load?)
4. Can the performance degradation be expressed in terms of latency or run time?
5. Does the problem affect other people or applications (or is it just you)?
6. What is the environment? What software and hardware is used? Versions? Configuration?

Scientific Method:

1. Question
2. Hypothesis
3. Prediction
4. Test
5. Analysis

Workload Characterization Method:

1. Who is causing the load? PID, UID, IP addr, …
2. Why is the load called? code path
3. What is the load? IOPS, tput, type
4. How is the load changing over time?
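Three of these questions (who, what, and how the load changes over time) can be answered by aggregating an event trace. The sketch below assumes a hypothetical list of I/O events as (timestamp, PID, operation, bytes) tuples. The event format and values are invented for illustration. The "why" question needs stack traces or code paths, which this sketch does not attempt.

```python
from collections import Counter, defaultdict

# Hypothetical I/O events: (timestamp_sec, pid, operation, size_bytes)
EVENTS = [
    (0.2, 101, "read", 4096),
    (0.7, 101, "read", 4096),
    (1.1, 202, "write", 65536),
    (1.4, 101, "read", 8192),
    (2.3, 202, "write", 65536),
    (2.9, 202, "write", 131072),
]

def characterize(events):
    """Summarize who issues the load, what it is, and how it changes over time."""
    who = Counter()                    # operation count per PID
    what = Counter()                   # operation count per type
    bytes_per_sec = defaultdict(int)   # throughput per one-second bucket
    for ts, pid, op, size in events:
        who[pid] += 1
        what[op] += 1
        bytes_per_sec[int(ts)] += size
    return who, what, dict(bytes_per_sec)

who, what, over_time = characterize(EVENTS)
print("who:", dict(who))
print("what:", dict(what))
print("bytes per second:", over_time)
```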

Drill-Down Analysis Method:

1. Start at highest level
2. Examine next-level details
3. Pick most interesting breakdown
4. If problem unsolved, go to 2
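The drill-down loop can be sketched as a walk down a tree of measurements. This is a simplified illustration rather than a complete procedure: the breakdown tree and its numbers are hypothetical, and "most interesting" is assumed to mean "largest time", which will not always be the right choice in practice.

```python
# Hypothetical time breakdown (milliseconds) as a nested tree.
BREAKDOWN = {
    "name": "request", "ms": 100.0, "children": [
        {"name": "application", "ms": 20.0, "children": []},
        {"name": "file system", "ms": 75.0, "children": [
            {"name": "cache lookup", "ms": 5.0, "children": []},
            {"name": "disk I/O", "ms": 70.0, "children": []},
        ]},
        {"name": "network", "ms": 5.0, "children": []},
    ],
}

def drill_down(node):
    """Follow the largest child at each level; return the path taken."""
    path = [node["name"]]            # start at the highest level
    while node["children"]:          # examine the next-level details
        # Assumption: the most interesting breakdown is the largest one.
        node = max(node["children"], key=lambda child: child["ms"])
        path.append(node["name"])    # not yet at a leaf: repeat one level down
    return path

print(" -> ".join(drill_down(BREAKDOWN)))
```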

Latency Analysis Method:

1. Measure operation time (latency)
2. Divide into logical synchronous components
3. Continue division until latency origin is identified
4. Quantify: estimate speedup if problem fixed
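The final "quantify" step is simple arithmetic once the latency has been divided into components. The sketch below assumes the components are synchronous and add up to the total operation time. The component names and timings are hypothetical, as is the `remaining_fraction` parameter, which stands for how much of the component's time is expected to remain after a fix.

```python
def estimate_speedup(components, target, remaining_fraction=0.0):
    """Estimate overall speedup if `target` shrinks to remaining_fraction of its time.

    components: mapping of component name -> latency, summing to the total.
    """
    total = sum(components.values())
    new_total = total - components[target] * (1.0 - remaining_fraction)
    if new_total <= 0:
        return float("inf")  # the whole latency would be eliminated
    return total / new_total

# Hypothetical breakdown of one operation's latency, in milliseconds.
components = {"application": 10.0, "database query": 80.0, "network": 10.0}

# Treat the largest component as the latency origin.
origin = max(components, key=components.get)
print(origin, estimate_speedup(components, origin, remaining_fraction=0.25))
```

With these numbers, cutting the 80 ms component to a quarter of its time reduces the total from 100 ms to 40 ms, a 2.5x speedup. Fixing either 10 ms component completely would yield only about 1.1x.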

USE Method:

For every resource, check:

- Utilization
- Saturation
- Errors
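As a rough illustration, the USE Method amounts to iterating over every resource and asking the same three questions of each. The sketch below runs against a hypothetical metrics snapshot. The resource names, metric values, and the 80% utilization threshold are assumptions made for the example, and collecting real metrics is platform-specific and not shown.

```python
# Hypothetical snapshot: utilization as a fraction of capacity,
# saturation as queued work, errors as an event count.
SNAPSHOT = {
    "cpu":    {"utilization": 0.95, "saturation": 4, "errors": 0},
    "memory": {"utilization": 0.60, "saturation": 0, "errors": 0},
    "disk":   {"utilization": 0.30, "saturation": 0, "errors": 3},
}

def use_check(snapshot, util_threshold=0.8):
    """For every resource, check utilization, saturation, and errors."""
    findings = []
    for resource, metrics in snapshot.items():
        if metrics["utilization"] >= util_threshold:  # assumed threshold
            findings.append((resource, "utilization"))
        if metrics["saturation"] > 0:
            findings.append((resource, "saturation"))
        if metrics["errors"] > 0:
            findings.append((resource, "errors"))
    return findings

for resource, metric in use_check(SNAPSHOT):
    print(f"{resource}: investigate {metric}")
```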

Stack Profile Method: