It has been 6 months since the last Stockwell update. New priorities took up many of those months and our efforts on Stockwell were reduced, so I overlooked sending updates. While we have been spending a reasonable amount of time hacking on Stockwell, the work has been less transparent.

I want to cover where we were a year ago, and where we are today.

One year ago today I posted on my blog about defining “intermittent”. We were just starting to focus on learning about failures. We collected data, read bugs, interviewed many influential people across Mozilla, and came up with a plan, Stockwell, which we presented at the Hawaii all-hands. Our plan was to do a few things:

Triage all failures >=30 instances/week

Build tools to make triage easier and collect more data

Adjust policy for triaging, disabling, and managing intermittents

Make our tests better with linting and test-verification

Invest time into auto-classification

Define test ownership and triage models that are scalable

While we haven’t focused 100% on intermittent failures over the last 52 weeks, we did spend about half our time on them and have achieved a few things:

Triaged all failures >= 30 instances/week (most weeks, never more than 3 weeks off)

Many improvements to our tools, including adjustments to the Orange Factor robot and intermittent-bug-filer, and the addition of |mach test-info|

Experimented with turning policies on and off, and have settled on a needinfo to the “owner” at 30+ failures/week, and disabling the test at 200 failures in 30 days.

Added eslint to our tests and pylint to our tools, and the new TV (test verification) job is now tier-2.

Added in-tree mappings from source files to Bugzilla components to define ownership.

31 Bugzilla components now triage their own intermittents
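The triage policy above comes down to two thresholds. The Python below is only a rough sketch of how those thresholds could be applied to a single bug's failure counts. The FailureCounts type and the triage_action function are hypothetical names made up for this illustration. The choices to treat both thresholds as inclusive and to check the disable threshold first are also assumptions. None of this describes how Orange Factor or our actual triage tooling works.

```python
from dataclasses import dataclass

# Thresholds from the policy described above.
NEEDINFO_PER_WEEK = 30      # needinfo the owner at 30+ failures/week
DISABLE_PER_30_DAYS = 200   # disable the test at 200 failures in 30 days


@dataclass
class FailureCounts:
    """Hypothetical per-bug failure counts (e.g. gathered from Orange Factor data)."""
    last_7_days: int
    last_30_days: int


def triage_action(counts: FailureCounts) -> str:
    """Return the action the policy suggests for one intermittent bug.

    Assumptions: both thresholds are inclusive, and the disable threshold
    is checked first because it is the stronger escalation.
    """
    if counts.last_30_days >= DISABLE_PER_30_DAYS:
        return "disable"
    if counts.last_7_days >= NEEDINFO_PER_WEEK:
        return "needinfo owner"
    # Below the weekly threshold the bug is not triaged under this policy.
    return "no action"


if __name__ == "__main__":
    for c in (FailureCounts(12, 40), FailureCounts(35, 90), FailureCounts(60, 210)):
        print(c, "->", triage_action(c))
```

In this sketch, a bug with 35 failures in the last week but only 90 in the last 30 days would get a needinfo. A bug that crosses 200 failures in 30 days would be flagged for disabling.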

While that is a lot of change, it has been incremental yet effective. We started with an Orange Factor of 24+, and now we often see <12 (although last week it was closer to 14). Over the same period we have added many tests, almost doubling our test load, and the Orange Factor has remained low. We still don’t consider that a success: we often have 50+ bugs in a “needswork” state, and it would be better to have <20 in progress at any one time. We are also still ignoring half the problem: all the failures that do not cross our threshold of 30 failures/week.

Some statistics about bugs over the last 9 months (since January 1st):

Category     # Bugs
Fixed        511
Disabled     262
Infra        62
Needswork    49
Unknown      209
Total        1093

As you can see, that is a lot of disabled tests. Note that we usually disable a test only on a subset of configurations, not across the board. Also note that “unknown” bugs are ones that were failing frequently and, for some undocumented reason, have since dropped in frequency.

One other interesting piece of data: for many of the fixed bugs we have tried to identify a root cause. We have done this for 265 bugs, and 90 of them (roughly a third) are actual product fixes 🙂 The rest are harness, tooling, or infra fixes, or, most commonly, test case fixes.

I will be writing some follow-up posts detailing the changes we have made over the year, including:

Triage process for component owners and others who want to participate

Test verification and the future

Workflow of an intermittent, from first failure to resolution

Future of Orange Factor and Autoclassification

Vision for the future in 6 months

Please note that the 511 fixed bugs were fixed by the many great developers we have at Mozilla. These were often unplanned requests landing in very busy schedules, so if you are reading this and you fixed an intermittent, thank you!