Android Studio Project Marble: Lint Performance

Details on recent Lint performance fixes and a tool we made to pinpoint bottlenecks.

This is the fourth in a series of blog posts by the Android Studio team diving into some of the details behind the scenes of Project Marble. Beginning with the release of Android Studio 3.3, Project Marble is a multi-release, focused effort on making fundamental features of the IDE rock-solid and polished. In case you missed it, check out our earlier posts in the Project Marble series.

This blog post was written by Matthew Gharrity, an engineer who works on Android Lint.

Android Lint is our static analysis framework for finding potential code issues. Many of the warnings you see in the IDE editor come from Lint, and developers often set up Lint to run on a continuous integration server in order to enforce those warnings on each new code change. Lint has now grown to hundreds of detectors, each looking for a different set of potential issues.

We received strong feedback, though, that Lint could be slow — especially when analyzing large codebases on a CI server. So, in the spirit of Project Marble, we did an investigation into Lint performance and how it might be improved. In this blog post we’ll explain the technical details of how we fixed some of the top performance issues — achieving a roughly 2x speedup in Studio 3.3 — and we’ll even provide an open source tool that you can use to pinpoint Lint performance bottlenecks in your own builds.

The low-hanging fruit

First, some of the performance fixes for Studio 3.3 were the result of ordinary digging through CPU profiles and heap dumps. We’d like to thank César Puerta at Twitter for providing a reproducible instance where Lint experienced extreme slowdowns, and for letting us dig around with a profiler onsite. That collaboration resulted in a few critical fixes:

We fixed a memory leak in one of our Lint checks, where an anonymous inner class accidentally captured a reference to some Lint data structures.

We fixed a memory leak when running Lint from Gradle, in which the custom class loader that is used to run Lint was retained through thread local variables and JNI global references.

We fixed excessive class loading done by Lint across multiple runs in the same Gradle daemon.
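The anonymous-inner-class leak above is a classic Java pitfall: an anonymous class declared in an instance method implicitly captures its enclosing instance, so handing it to a long-lived collection retains everything the outer object references. Here is a minimal sketch of the general pattern (all names are hypothetical, not Lint's actual code):

```java
import java.util.ArrayList;
import java.util.List;

public class LeakSketch {
    // A long-lived collection, e.g. listeners registered for the whole session.
    static final List<Runnable> LISTENERS = new ArrayList<>();

    static class Detector {
        byte[] largeAnalysisState = new byte[1024 * 1024];

        void register() {
            // Leaks: referencing a field forces the anonymous class to
            // capture 'Detector.this', so the listener list keeps the
            // whole Detector (and its state) alive.
            LISTENERS.add(new Runnable() {
                @Override public void run() {
                    System.out.println(largeAnalysisState.length);
                }
            });
        }

        void registerSafely() {
            // Fix: a static nested class (or a lambda capturing nothing
            // from the instance) holds no reference to the Detector.
            LISTENERS.add(new NoCaptureRunnable());
        }
    }

    static class NoCaptureRunnable implements Runnable {
        @Override public void run() { }
    }

    public static void main(String[] args) {
        new Detector().register();
        Runnable leaked = LISTENERS.get(0);
        // The compiler gives the anonymous class a synthetic field of
        // type Detector (the captured enclosing instance).
        boolean holdsOuterRef = false;
        for (java.lang.reflect.Field f : leaked.getClass().getDeclaredFields()) {
            if (f.getType() == Detector.class) holdsOuterRef = true;
        }
        System.out.println("holds outer reference: " + holdsOuterRef);
    }
}
```

A static nested class, or a lambda that captures nothing from the instance, breaks the chain; that is the general shape of this kind of fix.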

These fixes alone gave us a 3x speedup on our internal Lint performance benchmark test, and somewhere around a 2x speedup on Twitter’s codebase. Memory leak regression tests were added to prevent similar bugs in the future.

A tool for pinpointing bottlenecks

Next, we started to look for bottlenecks in individual Lint detectors. There were a few challenges.

There is no per-detector performance attribution in Lint, and adding it could be tricky. For example, Lint multiplexes between hundreds of different detectors in a single pass through each source file, so manually instrumenting all of the many possible call paths into detector code would be error-prone and would pollute our source code.

Using a conventional profiler on Lint works just fine, but digging through the results is time consuming and hard to automate. Engineering hours are limited, so in practice this limits how many sample projects we can test with. Yet testing on a large variety of sample projects is critical because some performance bottlenecks only manifest on particular project topologies.

Even when there is time to dig through profiler results, there can be overhead issues with CPU tracing, and even safe point bias concerns with CPU sampling.

Developers can write their own custom Lint checks, so any manual work that goes into finding bottlenecks in our internal checks would not be helpful in finding bottlenecks in third party checks.

With those considerations in mind we created a tool to help pinpoint performance issues in individual Lint checks automatically. The tool relies on Java byte code instrumentation to inject code before and after methods of interest. The idea is simple: we use a regular expression to find all methods associated with a Lint detector, instrument just those methods to collect timing information, and then attribute the timing information back to the associated Lint detector. The results can then be printed to the console.
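To make the attribution idea concrete, here is a minimal sketch of the bookkeeping side: match instrumented method names against a detector-class regex and fold per-call timings into per-detector totals. This is illustrative only, assuming a naming convention where detector classes end in "Detector"; it is not YourKit's API or Lint's internals:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DetectorTimings {
    // Hypothetical convention: detector class names end in "Detector".
    static final Pattern DETECTOR_METHOD =
        Pattern.compile("(\\w+Detector)\\.\\w+");

    final Map<String, Long> totals = new LinkedHashMap<>();

    // Imagine this called from the injected method-exit hook with the
    // qualified method name and its elapsed time.
    void record(String qualifiedMethod, long elapsedMs) {
        Matcher m = DETECTOR_METHOD.matcher(qualifiedMethod);
        if (m.matches()) {
            // Attribute the time back to the owning detector class.
            totals.merge(m.group(1), elapsedMs, Long::sum);
        }
    }

    public static void main(String[] args) {
        DetectorTimings t = new DetectorTimings();
        t.record("InvalidPackageDetector.visitClass", 120);
        t.record("InvalidPackageDetector.checkJar", 80);
        t.record("MergeMarkerDetector.run", 5);
        t.record("LintDriver.analyze", 999); // not a detector; ignored
        t.totals.forEach((k, v) -> System.out.println(k + " " + v + "ms"));
    }
}
```

The real tool injects the timing hooks via bytecode instrumentation; only the aggregation logic is sketched here.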

To instrument Java byte code we used YourKit probes, though there are likely other instrumentation agents that could be used just as well.

Applying the instrumentation is as simple as adding a JVM flag to Gradle, so we were able to use the tool on a large number of projects quite rapidly. The results for most projects looked normal. However, one notable exception was an open source stress test project called android-studio-gradle-test. Here were the raw results:

Number of probe hits: 4720354

Total time in LintDriver.analyze(): 221482ms

Total time inside detectors: 177446ms

InvalidPackageDetector 176853ms

MergeMarkerDetector 363ms

GradleDetector 38ms

PrivateKeyDetector 27ms

TrustAllX509TrustManagerDetector 25ms

CordovaVersionDetector 18ms

UnusedResourceDetector 13ms

ManifestDetector 12ms

MissingClassDetector 10ms

[...]

In this project, InvalidPackageDetector was taking the vast majority of Lint analysis time! For context, this detector checks for calls to Java language APIs which are not supported on Android. After further investigation we found that in some cases InvalidPackageDetector was scanning through jar files multiple times and could become a bottleneck on projects with many modules and binary dependencies. The fix was simple, and for this project resulted in a 4x speedup. On Twitter’s codebase the fix resulted in a more modest 16% speedup.
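One natural way to avoid rescanning the same jar across many modules is to memoize the scan result per jar path. The sketch below illustrates that idea under the assumption that scan results are stable within a run; the names are hypothetical and this is not the actual Lint fix:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class JarScanCache {
    final Map<String, Boolean> cache = new HashMap<>();
    final AtomicInteger scans = new AtomicInteger(); // counts real scans

    boolean containsInvalidPackage(String jarPath) {
        // Scan each jar at most once; later modules reuse the answer.
        return cache.computeIfAbsent(jarPath, p -> {
            scans.incrementAndGet();
            return expensiveScan(p);
        });
    }

    boolean expensiveScan(String jarPath) {
        // Stand-in for actually reading the jar's entries.
        return jarPath.contains("bad");
    }

    public static void main(String[] args) {
        JarScanCache c = new JarScanCache();
        // Simulate 100 modules that each depend on the same two jars.
        for (int i = 0; i < 100; i++) {
            c.containsInvalidPackage("libs/ok.jar");
            c.containsInvalidPackage("libs/bad.jar");
        }
        System.out.println("scans: " + c.scans.get());
    }
}
```

Without the cache this loop would perform 200 scans; with it, only 2, which is the kind of savings that matters on projects with many modules and binary dependencies.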

That bottleneck would have been visible in a conventional profiler, too. However, without the help of an automated tool, we may never have had the time to investigate enough projects to find one that made the bottleneck obvious.

Profiling memory allocations to find redundant computation

Performance debugging is all about collecting data and looking for surprising results. In our case, the data collected was timing information for each detector, and the surprise was that some detectors took more time than expected. However, another good metric to track is memory allocations. If a detector allocates more memory than expected, it could be a sign that the detector is doing redundant computation — even if the detector happens to run relatively quickly.

So, we combined our YourKit probe with memory allocation instrumentation in order to attribute memory allocations to individual Lint detectors. Here are the raw results of doing this on the same test project from above.

Total allocations within detectors: 262 MB

MergeMarkerDetector 240 MB

PrivateKeyDetector 7 MB

GradleDetector 6 MB

AndroidTvDetector 2 MB

[...]

The results immediately made MergeMarkerDetector suspect. For context, this detector looks for git-style merge markers such as <<<<<< accidentally left behind in the source code. After further investigation we found that MergeMarkerDetector was occasionally looking at non-source files too, which for some projects could include arbitrarily large binary files. The fix was simple, and the best part is that we didn't even need to find a project for which this bug became a noticeable performance problem: the allocation information alone was sufficient to get us on the right trail.
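A merge-marker scan of this shape is cheap as long as it is restricted to source files. The sketch below (hypothetical names, not Lint's implementation) illustrates the general fix: filter by file type before ever reading contents, so large binary artifacts are never touched:

```java
public class MergeMarkerSketch {
    // Assumed whitelist of source extensions; the real set would be larger.
    static boolean isSourceFile(String name) {
        return name.endsWith(".java") || name.endsWith(".kt")
            || name.endsWith(".xml") || name.endsWith(".gradle");
    }

    static boolean hasMergeMarker(String name, String contents) {
        if (!isSourceFile(name)) return false; // skip binaries up front
        for (String line : contents.split("\n")) {
            // git writes 7-character conflict markers at line start.
            if (line.startsWith("<<<<<<<") || line.startsWith(">>>>>>>")
                || line.startsWith("=======")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String conflicted =
            "<<<<<<< HEAD\nint x = 1;\n=======\nint x = 2;\n>>>>>>> branch";
        System.out.println(hasMergeMarker("A.java", conflicted));
        System.out.println(hasMergeMarker("A.java", "int x = 1;"));
        System.out.println(hasMergeMarker("big.bin", conflicted));
    }
}
```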

Details on the YourKit probe

The YourKit probe we wrote is open source on GitHub; feel free to play around with it and add coverage for your own custom Lint checks! The README explains how to point to your local YourKit installation, how to generate the JVM arguments needed to instrument a Lint invocation, and how to interpret the results printed out by the tool.

Note that there are some limitations of the probe in finding Lint performance issues:

The probe currently depends on the Java byte code instrumentation agent bundled with YourKit, which is not a free profiler (although there is a free trial version available). In principle the tool could be adapted to use an alternative instrumentation agent.

There is a fairly large amount of upfront computation, such as parsing and type attribution, which cannot be attributed to any individual Lint detector. If there is a performance issue in this precomputation phase, the probe will not be useful in finding it.

Caching effects may distort the performance numbers for individual detectors. For example, the first Lint check to run might get the blame for the initial cache misses that occur when resolving calls for the first time.

Wrapping up

The Lint performance improvements described above have already landed in Studio 3.3, and we’ll continue to monitor Lint performance to catch regressions. If you run into major performance issues when running Lint on your own project, first check out our Lint performance tips. If that doesn’t help, please feel free to file a bug, and consider attaching the results of running Lint with our YourKit probe. If you have general questions or suggestions for Lint, we also have the lint-dev mailing list.