The idea of using machine learning to teach programs to write or modify code automatically has long tempted computer scientists. The feat would not only greatly reduce engineering time and effort, but could also lead to the creation of novel and advanced intelligent agents. In a new paper, Google Brain researchers propose using neural networks to model how humans edit source code. In effect, this means treating code editing as a sequence and having a machine learn to “write code” much as a language model learns natural language: by analysing a short sequence of edits, the model can infer the intent behind them and use it to generate subsequent edits.

The main challenge in understanding the intent behind developers’ edits is learning from earlier edit sequences in order to predict upcoming ones. The researchers explain that the models need to understand “the relationship of the change to the state” rather than “the content of the edits” or “the result of the edit.”

To develop a representation that captures the desired intent information and scales with sequence length, the Google Brain researchers proposed two representations of edits: explicit and implicit. The explicit representation instantiates the state that results from each edit in the sequence, while the implicit representation instantiates a complete initial state and then records subsequent edits in a more compact, diff-like form.
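The difference between the explicit and implicit forms described above can be sketched in a few lines of Python. This is only an illustration, not the paper's code: the `(position, content)` edit tuples, the `DELETE` marker, and the function names are assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical edit format: (position, content).
# A content of DELETE removes the token at `position`;
# any other content is inserted before `position`.
Edit = Tuple[int, str]
DELETE = "<DEL>"


def apply_edit(state: List[str], edit: Edit) -> List[str]:
    """Return a new token list with a single edit applied."""
    pos, content = edit
    if content == DELETE:
        return state[:pos] + state[pos + 1:]
    return state[:pos] + [content] + state[pos:]


def explicit_representation(initial: List[str], edits: List[Edit]) -> List[List[str]]:
    """Store the full state after every edit (grows with state size times edit count)."""
    states = [list(initial)]
    for edit in edits:
        states.append(apply_edit(states[-1], edit))
    return states


def implicit_representation(initial: List[str], edits: List[Edit]) -> Tuple[List[str], List[Edit]]:
    """Store the initial state once, followed by the compact, diff-like edits."""
    return list(initial), list(edits)


initial = ["A", "B", "C"]
edits = [(1, "X"), (0, DELETE)]

explicit = explicit_representation(initial, edits)
implicit = implicit_representation(initial, edits)

# Number of stored items in each form for this toy sequence.
explicit_size = sum(len(state) for state in explicit)
implicit_size = len(implicit[0]) + len(implicit[1])
```

Even on this toy sequence the explicit form stores 10 tokens against 5 items for the implicit form, and the gap widens as the file and the edit history grow.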

For the implicit representation, the researchers built a simple sequence-to-sequence model and a two-headed attention-based model that has a pointer network head for generating edit positions and a content head for generating edit content. The models illustrate the tradeoffs among different problem formulations and inform the design of future edit-sequence models.
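The two-headed idea can be illustrated with a toy sketch in plain Python: one head scores existing positions to decide where the next edit goes, and the other scores vocabulary entries to decide what the edit contains. Everything here is an assumption for illustration, including the hand-picked vectors, the shared query, and the function names; the model in the paper is a trained neural network with learned encodings, not this code.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def pointer_head(query: Sequence[float], position_encodings: Sequence[Sequence[float]]) -> List[float]:
    """Pointer-style head: a distribution over positions in the current state."""
    return softmax([dot(query, enc) for enc in position_encodings])


def content_head(query: Sequence[float], vocab_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Content head: a distribution over vocabulary tokens."""
    return softmax([dot(query, emb) for emb in vocab_embeddings])


def predict_next_edit(
    query: Sequence[float],
    position_encodings: Sequence[Sequence[float]],
    vocab: Sequence[str],
    vocab_embeddings: Sequence[Sequence[float]],
) -> Tuple[int, str]:
    """Pick the most likely (position, content) pair, one choice per head."""
    pos_probs = pointer_head(query, position_encodings)
    tok_probs = content_head(query, vocab_embeddings)
    position = max(range(len(pos_probs)), key=pos_probs.__getitem__)
    token = vocab[max(range(len(tok_probs)), key=tok_probs.__getitem__)]
    return position, token


# Toy, untrained vectors chosen by hand (assumptions, not learned values).
query = [1.0, 0.0]  # stands in for a summary of the edit history
position_encodings = [[0.1, 0.9], [2.0, 0.0], [0.5, 0.5]]
vocab = ["x", "y", "<DEL>"]
vocab_embeddings = [[0.0, 1.0], [0.2, 0.3], [1.5, 0.0]]

next_edit = predict_next_edit(query, position_encodings, vocab, vocab_embeddings)
```

Splitting the prediction this way keeps the position output tied to the length of the current state rather than to a fixed vocabulary, which is the usual motivation for a pointer network.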

The researchers first used synthetic data to test the models’ ability to learn editing patterns, then trained them on a large dataset of Google source code edits, which contains millions of fine-grained edits from thousands of Python developers.

From a modeling perspective, researchers concluded that the new combination of attention and pointer network components provides the best overall performance and scalability. The results also open up various potential AI-powered opportunities for future machine-human collaboration on coding.

Read the paper Neural Networks for Modeling Source Code Edits on arXiv.