Our lint rules use LibCST’s syntax tree to match patterns in code. As a high-level representation, the syntax tree is easy to inspect and removes the problem of dealing with a non-regular language.

Let’s say that you’ve got a circular dependency in your module due to a type-only import. Python lets you solve this by putting type-only imports behind an if TYPE_CHECKING: guard to avoid actually importing anything at runtime.

# value imports
from typing import TYPE_CHECKING
from util import helper_fn

# type-only imports
if TYPE_CHECKING:
    from circular_dependency import CircularType

Later, someone adds another type-only dependency to the code and guards it. However, they might fail to realize that there was already a type guard in the module.

# value imports
from typing import TYPE_CHECKING
from util import helper_fn

# type-only imports
if TYPE_CHECKING:
    from circular_dependency import CircularType

if TYPE_CHECKING:  # Whoops! Duplicate type guard!
    from other_package import OtherType

We can prevent this redundancy with a lint rule!

We’ll start by initializing a counter of type checking blocks we’ve encountered.

class OnlyOneTypeCheckingIfBlockLintRule(CstLintRule):
    def __init__(self, context: Context) -> None:
        super().__init__(context)
        self.__type_checking_blocks = 0

Then, when we encounter a type-checking conditional, we’ll increment the counter and verify that there is no more than one type-checking block. If there is more than one, we’ll generate a warning at that location by calling our report helper.

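    def visit_If(self, node: cst.If) -> None:
        # Only a bare `TYPE_CHECKING` name counts; other tests may lack `.value`.
        if isinstance(node.test, cst.Name) and node.test.value == "TYPE_CHECKING":
            self.__type_checking_blocks += 1
            if self.__type_checking_blocks > 1:
                self.context.report(
                    node,
                    "More than one 'if TYPE_CHECKING' section!",
                )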

These lint rules work by traversing LibCST’s tree and collecting information. In our linter, this is done using the visitor pattern. As you may have noticed, rules override the visit and leave methods associated with a node’s type. Those visitors get called in a specific order.

class MyNewLintRule(CstLintRule):
    def visit_Assign(self, node):
        ...  # called first

    def visit_Name(self, node):
        ...  # called for each child

    def leave_Assign(self, node):
        ...  # called after all children

Visit methods are called before any of the node’s children are visited. Leave methods are called after all of the node’s children are visited.

Our philosophy at Instagram is to “do the simple thing first”. Our first custom lint rules were implemented in a single file, with a single visitor, using shared state.

The single visitor class had to be aware of the state and logic of all of our unrelated lint rules, and it wasn’t always clear what state corresponded with what rule. This works fine when you’ve got a few custom lint rules, but we have nearly a hundred custom lint rules, which made the single-visitor pattern unmaintainable.

It’s difficult to figure out what state and logic is associated with each check being performed.

Of course, one possible solution to this problem is to define multiple visitors and to have each visitor re-traverse the entire tree each time. However, this would incur a large performance penalty, and the linter must remain fast.

Each lint rule can re-traverse the tree, executing in sequence over the file. However, this frequent re-traversal would incur a large performance penalty.

Instead, we took inspiration from linters in other language ecosystems, like JavaScript’s ESLint, and developed a centralized visitor registry.

A centralized visitor registry. We can efficiently determine which nodes each lint rule cares about, saving time on nodes they don’t care about.

When a lint rule is initialized, all of the rule’s method overrides are stored in the registry. When we traverse the tree, we look up all of the registered visitors and call them. If a method is not implemented, we don’t need to call it.
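
The registry idea can be sketched in a few lines. This is not our actual implementation: to stay dependency-free it walks the standard library’s ast tree instead of LibCST’s, the traversal is hand-written because ast has no leave hook, and the NoPrintRule, NestedFunctionRule, and VisitorRegistry names are invented for the example.

```python
import ast
from collections import defaultdict


class NoPrintRule:
    """Hypothetical rule: flags calls to print()."""

    def __init__(self):
        self.reports = []

    def visit_Call(self, node):
        if isinstance(node.func, ast.Name) and node.func.id == "print":
            self.reports.append(node.lineno)


class NestedFunctionRule:
    """Hypothetical rule: flags functions defined inside other functions."""

    def __init__(self):
        self.depth = 0
        self.reports = []

    def visit_FunctionDef(self, node):
        self.depth += 1
        if self.depth > 1:
            self.reports.append(node.lineno)

    def leave_FunctionDef(self, node):
        self.depth -= 1


class VisitorRegistry:
    """Maps method names such as 'visit_Call' to the rules implementing them."""

    def __init__(self, rules):
        self.handlers = defaultdict(list)
        for rule in rules:
            for name in dir(rule):
                if name.startswith(("visit_", "leave_")):
                    self.handlers[name].append(getattr(rule, name))

    def traverse(self, node):
        kind = type(node).__name__
        # Only rules that registered for this node type are called.
        for handler in self.handlers.get("visit_" + kind, ()):
            handler(node)
        for child in ast.iter_child_nodes(node):
            self.traverse(child)
        for handler in self.handlers.get("leave_" + kind, ()):
            handler(node)


source = '''\
def outer():
    def inner():
        print("hi")
    return inner
'''
no_print = NoPrintRule()
nesting = NestedFunctionRule()
# A single pass over the tree serves both rules.
VisitorRegistry([no_print, nesting]).traverse(ast.parse(source))
print(no_print.reports, nesting.reports)
```

Each rule keeps its own state, yet the tree is walked only once, and a node type that no rule registered for costs just a dictionary lookup.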

This reduces the computation cost of every new lint rule. Though we usually run the linter over the small number of recently changed files, we can run all of our new lint rules in parallel over Instagram’s whole server codebase in just 26 seconds.

Once we had a performant framework in place, we built a testing framework that enforces best practices by requiring tests for both false positives and false negatives.

class MyCustomLintRuleTest(CstLintRuleTest):
    RULE = MyCustomLintRule

    VALID = [
        Valid("good_function('this should not generate a report')"),
        Valid("foo.bad_function('nor should this')"),
    ]

    INVALID = [
        Invalid("bad_function('but this should')", "IG00"),
    ]

Lint Fatigue

With nearly a hundred custom rules, pedantic lints can quickly turn into a waste of time for developers. Spending time fixing style nits and deprecated coding patterns gets in the way of more important progress.

We’ve found that with too many nags, engineers start to ignore all lints, even the important ones. At a certain point, it doesn’t matter what good advice we present; it just gets ignored.

Let’s say we needed to deprecate a function named ‘fn’ in favor of a better-named function called ‘add’. Unless developers are made aware of the fact that ‘fn’ is deprecated, they won’t know not to use it. Even worse, they won’t know what to use instead. So, we can create a lint. But any sufficiently large codebase is bound to have plenty of other lints already. Chances are that this important lint will get lost in the noise.

With too many nitpicks, the signal can get lost in the noise.

So, what can we do about it?

We can automatically fix many issues found by lint. Much like lint itself is documentation on demand, auto-fixers can provide fixes on demand. Given the sheer number of developers at Instagram, it’s not feasible to teach each developer all of our best practices. Adding auto-fixers allows us to educate and onboard developers to new best practices on demand. Auto-fixers also allow us to preserve developer focus by removing monotonous changes. Essentially, auto-fixers are more actionable and educational than simple lint warnings.

So, how do you build an auto-fixer? Syntax tree based linting gives us the offending node. There’s no need to duplicate discovery logic since the lint rule itself already exists! Since we know the node we want to replace, and we know its location in source, we can replace it with the updated function name safely! This works great for fixing individual lint violations as they are introduced, but when we introduce a new lint rule we may have hundreds of existing violations of it. Can we proactively resolve all the existing cases at once?

Codemods

A codemod is just a scriptable way to find issues and make changes to source code. Think of a codemod as a refactor on steroids: It can be as simple as renaming a variable in a function or as complex as rewriting a function to take a new argument. It uses the exact same concept as a lint, but instead of alerting the developer, it can take action automatically.

So, why would you write a codemod instead of a lint? In this example, we want to deprecate the use of get_global. We could use a lint, but the fix time is unbounded and spread across multiple developers. Even with an auto-fixer in place, it can be some time before all code is upgraded.

We want to deprecate the use of get_global, using instance variables instead.

To fix this, in addition to linting against get_global, we can also write a codemod! At Instagram, we believe that deprecated patterns and APIs left to slowly wither away take focus away from developers and reduce code readability. We’d rather proactively remove deprecated code than let it disappear over time. Given the sheer size of the code and number of active developers, this often means automating deprecations. If we can remove deprecated patterns from our code quickly, we keep all of Instagram productive.

Okay, so how do you actually codemod? How do you replace just the text you care about, while preserving comments, spacing, and everything else? You can use a concrete syntax tree like LibCST to surgically modify code while preserving comments and spacing. So, if I wanted to rename “fn” to “add” in the below tree, I would update the Name node to have the value “add” instead of “fn” and then write the tree back to disk!