Program synthesis is a powerful technique for automatically generating programs from high-level specifications, such as input-output examples. Due to its myriad use cases across a wide range of application domains (e.g., spreadsheet automation [1, 2, 3], data science [4, 5, 6], cryptography [7, 8], improving programming productivity [9, 10, 11]), program synthesis has received widespread attention from the research community in recent years.

Because program synthesis is, in essence, a very difficult search problem, many recent solutions prune the search space by utilizing program abstractions [4, 12, 13, 14, 15, 16]. For example, state-of-the-art synthesis tools, such as Blaze [14], Morpheus [4] and Scythe [16], symbolically execute (partial) programs over some abstract domain and reject those programs whose abstract behavior is inconsistent with the given specification. Because many programs share the same behavior in terms of their abstract semantics, the use of abstractions allows these synthesis tools to significantly reduce the search space.
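To make the pruning idea concrete, the following is a minimal sketch over a hypothetical toy DSL of unary integer operators, using parity as the abstract domain. The operator names, the `alpha` abstraction function, and the `synthesize` helper are illustrative assumptions and are not taken from Blaze, Morpheus, or Scythe. A candidate program is executed concretely only if its abstract output agrees with the abstraction of the expected output.

```python
from itertools import product

# Hypothetical toy DSL: unary integer operators applied left to right.
CONCRETE = {
    "inc": lambda x: x + 1,
    "double": lambda x: x * 2,
    "square": lambda x: x * x,
}

# Abstract transformers over the parity domain {"even", "odd"}.
ABSTRACT = {
    "inc": lambda p: "odd" if p == "even" else "even",
    "double": lambda p: "even",
    "square": lambda p: p,
}


def alpha(x):
    """Abstraction function: map a concrete integer to its parity."""
    return "even" if x % 2 == 0 else "odd"


def run(ops, x, semantics):
    """Execute a sequence of operators under the given semantics."""
    for op in ops:
        x = semantics[op](x)
    return x


def synthesize(inp, out, max_len=3):
    """Enumerate programs, rejecting those refuted by the abstract semantics."""
    solutions = []
    candidates_checked = 0  # programs that survive abstract pruning
    for n in range(1, max_len + 1):
        for ops in product(CONCRETE, repeat=n):
            # Prune: abstract behavior is inconsistent with the example.
            if run(ops, alpha(inp), ABSTRACT) != alpha(out):
                continue
            candidates_checked += 1
            if run(ops, inp, CONCRETE) == out:
                solutions.append(ops)
    return solutions, candidates_checked


solutions, candidates_checked = synthesize(3, 8)
```

Because the parity transformers are sound, no correct program is ever pruned; the abstraction only reduces the number of candidates that must be checked concretely.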

While the abstraction-guided synthesis paradigm has proven to be quite powerful, a downside of such techniques is that they require a domain expert to manually come up with a suitable abstract domain and write abstract transformers for each DSL construct. For instance, the Blaze synthesis framework [14] expects a domain expert to manually specify a universe of predicate templates, together with sound abstract transformers for every DSL construct. Unfortunately, this process is not only time-consuming but also requires significant insight about the application domain as well as the internal workings of the synthesizer.

In this paper, we propose a novel technique for automatically learning domain-specific abstractions that are useful for instantiating an example-guided synthesis framework in a new domain. Given a DSL and a training set of synthesis problems (i.e., input-output examples), our method learns a useful abstract domain in the form of predicate templates and infers sound abstract transformers for each DSL construct. In addition to eliminating the significant manual effort required from a domain expert, the abstractions learned by our method often outperform manually-crafted ones in terms of their benefit to synthesizer performance.

The workflow of our approach, henceforth called Atlas, is shown schematically in Fig. 1. Since Atlas is meant to be used as an off-line training step for a general-purpose programming-by-example (PBE) system, it takes as input a DSL as well as a set of synthesis problems \({\varvec{\mathcal {E}}}\) that can be used for training purposes. Given these inputs, our method enters a refinement loop where an Abstraction Learner component discovers a sequence of increasingly precise abstract domains \(\mathcal {A}_1, \ldots , \mathcal {A}_n\), and their corresponding abstract transformers \(\mathcal {T}_1, \ldots , \mathcal {T}_n\), in order to help the Abstraction-Guided Synthesizer (AGS) solve all training problems. While the AGS can reject many incorrect solutions using an abstract domain \(\mathcal {A}_i\), it might still return some incorrect solutions due to the insufficiency of \(\mathcal {A}_i\). Thus, whenever the AGS returns an incorrect solution to any training problem, the Abstraction Learner discovers a more precise abstract domain and automatically synthesizes the corresponding abstract transformers. Upon termination of the algorithm, the final abstract domain \(\mathcal {A}_n\) and transformers \(\mathcal {T}_n\) are sufficient for the AGS to correctly solve all training problems. Furthermore, because our method learns general abstractions in the form of predicate templates, the learned abstractions are expected to be useful for solving many other synthesis problems beyond those in the training set.
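The shape of this refinement loop can be illustrated with a small sketch. Everything here is a simplifying assumption made for illustration: the toy DSL, the use of residues modulo a growing set of moduli as the abstract domain, and the `ags`, `refine`, and `learn` helpers are hypothetical stand-ins and do not reproduce how Atlas represents or learns abstractions. The sketch only shows the control flow: synthesize with the current abstraction, detect a spurious solution, and refine the abstraction until every training problem is solved correctly.

```python
from itertools import product

# Hypothetical toy DSL: unary integer operators applied left to right.
OPS = {
    "inc": lambda x: x + 1,
    "double": lambda x: x * 2,
    "square": lambda x: x * x,
}


def run(ops, x, modulus=None):
    """Run a program concretely, or abstractly on residues mod `modulus`."""
    for op in ops:
        x = OPS[op](x)
        if modulus is not None:
            x %= modulus
    return x


def ags(inp, out, moduli, max_len=3):
    """Stand-in synthesizer: first program the abstraction cannot refute."""
    for n in range(1, max_len + 1):
        for ops in product(OPS, repeat=n):
            if all(run(ops, inp % m, m) == out % m for m in moduli):
                return ops
    return None


def refine(ops, inp, out, moduli):
    """Stand-in learner: add the smallest modulus that refutes `ops`."""
    actual = run(ops, inp)
    m = 2
    while actual % m == out % m:  # terminates because actual != out
        m += 1
    return moduli | {m}


def learn(training, max_len=3):
    """Refine the abstraction until the AGS solves every training problem."""
    moduli = set()
    changed = True
    while changed:
        changed = False
        for inp, out in training:
            ops = ags(inp, out, moduli, max_len)
            if ops is not None and run(ops, inp) != out:
                moduli = refine(ops, inp, out, moduli)  # spurious solution
                changed = True
    return moduli


moduli = learn([(3, 8)])
```

In this toy setting each refinement permanently rules out at least one spurious program, so the loop terminates because the bounded search space is finite.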

From a technical perspective, the Abstraction Learner uses two key ideas, namely tree interpolation and data-driven constraint solving, for learning useful abstract domains and transformers respectively. Specifically, given an incorrect program \(\mathcal {P}\) that cannot be refuted by the AGS using the current abstract domain \(\mathcal {A}_i\), the Abstraction Learner generates a tree interpolant \(\mathcal {I}_i\) that serves as a proof of \(\mathcal {P}\)’s incorrectness and constructs a new abstract domain \(\mathcal {A}_{i+1}\) by extracting templates from the predicates used in \(\mathcal {I}_i\). The Abstraction Learner also synthesizes the corresponding abstract transformers for \(\mathcal {A}_{i+1}\) by setting up a second-order constraint solving problem where the goal is to find the unknown relationship between symbolic constants used in the predicate templates. Our method solves this problem in a data-driven way by sampling input-output examples for DSL operators and ultimately reduces the transformer learning problem to solving a system of linear equations.
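As a rough illustration of the data-driven step, consider a hypothetical predicate template \(len(x) = c\) and a string concatenation operator. Assuming, for the purpose of this sketch, that the output constant is an affine function \(c = a \cdot c_1 + b \cdot c_2 + d\) of the input constants, the unknown coefficients can be recovered by sampling input-output examples of the operator and solving the resulting linear system. The template, the affine form, and the helper names `solve` and `learn_affine_transformer` are assumptions made here; the sketch fits the sampled data only and does not check that the resulting transformer is sound.

```python
from fractions import Fraction


def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination.

    Uses exact rational arithmetic; assumes A is non-singular.
    """
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and swap it up.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        # Normalize the pivot row, then eliminate the column elsewhere.
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [row[n] for row in M]


def learn_affine_transformer(op, samples):
    """Fit len(op(x1, x2)) = a*len(x1) + b*len(x2) + d from samples."""
    A = [[len(x1), len(x2), 1] for x1, x2 in samples]
    b = [len(op(x1, x2)) for x1, x2 in samples]
    return solve(A, b)


def concat(x1, x2):
    """Concrete semantics of the hypothetical DSL operator."""
    return x1 + x2


# Three sampled inputs give three equations in the unknowns (a, b, d).
samples = [("a", "bc"), ("abc", "d"), ("ab", "abcd")]
a, b, d = learn_affine_transformer(concat, samples)
```

For these samples the recovered coefficients correspond to the transformer "if \(len(x_1) = c_1\) and \(len(x_2) = c_2\), then \(len(concat(x_1, x_2)) = c_1 + c_2\)".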

We have implemented these ideas in a tool called Atlas and evaluated it in the context of the Blaze program synthesis framework [14]. Our evaluation shows that the proposed technique eliminates the manual effort involved in designing useful abstractions. More surprisingly, our evaluation also shows that the abstractions generated by Atlas outperform manually-crafted ones in terms of the performance of the Blaze synthesizer in two different application domains.