This is a guest blog post by Fabio Niephaus from the Software Architecture Group, headed by Robert Hirschfeld, at the Hasso Plattner Institute at the University of Potsdam, Germany.

In the summer term 2019, we ran a seminar on Polyglot Programming as part of our graduate program at the Hasso Plattner Institute in Potsdam, Germany. The main goal of the seminar was to explore the domain of polyglot programming using various GraalVM technologies and GraalSqueak, our Squeak/Smalltalk implementation for GraalVM. This helped us to better understand both the benefits and the problems of polyglot programming. Since Programming Experience is one of our key research topics, we focused on language interoperability and the tools that GraalVM provides. To gain more insight into both, we encouraged our students to discuss how they were using GraalVM and to share the lessons they learned throughout the seminar. Seven student teams participated in the seminar and worked on their projects over the course of four months. In this blog post, we summarize the results and highlight some of those projects.

Polyglot Jupyter Kernel

Jupyter notebooks are very popular tools for data analysis and machine learning because they combine code, explanatory text, and (intermediate) results in the form of tables, plots, and other rich media in a single document.

In previous work, we built PolyJuS, a polyglot notebook system written in Squeak/Smalltalk. This system not only makes use of GraalVM’s Polyglot API but, more importantly, also helps the user understand the interaction between objects and data from different programming languages.

The goal of the student project was twofold: build a GraalVM polyglot kernel for Jupyter and offer a similar experience to PolyJuS in conventional Jupyter notebooks. Here’s what our students came up with:

Instead of writing a Jupyter kernel from scratch, our students decided to fork IJavaScript into IPolyglot, which runs on GraalVM’s NodeJS. This allowed them to focus on the integration of the Polyglot API. One aspect with a user-facing impact is the way objects are shared between code cells. The Polyglot API provides calls for exporting values from one language and for importing these values into others. However, having to use this API explicitly is a burden for the user. Instead, we worked out a mechanism that automates the sharing of variables between languages. For this, IPolyglot collects the variables predefined in the global namespace of each language. Based on this information, it can identify newly defined variables and export them automatically. Before each code cell is executed, all of these variables are automatically imported into the cell’s language. While this approach has some limitations, it works well enough in common cases. Nonetheless, the user can always fall back to sharing objects explicitly if necessary.
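To make the sharing mechanism more concrete, here is a minimal Python sketch of the same idea. It is not IPolyglot's code: we assume that each language's global namespace can be modelled as a plain dict and the polyglot bindings as another dict, and the class name `AutoSharingKernel` is hypothetical. The sketch snapshots the predefined globals, imports all shared variables before a cell runs, and exports every non-predefined variable afterwards.

```python
# Sketch of IPolyglot-style automatic variable sharing (hypothetical names).
# Assumption: each language's global namespace is modelled as a plain dict and
# the polyglot bindings as another dict; the real kernel uses the Polyglot API.


class AutoSharingKernel:
    def __init__(self, namespaces):
        # namespaces: language id -> dict standing in for that language's globals
        self.namespaces = namespaces
        self.bindings = {}  # stand-in for GraalVM's polyglot bindings
        # Snapshot the predefined globals so they are never exported.
        self.predefined = {lang: set(ns) for lang, ns in namespaces.items()}

    def run_cell(self, lang, cell):
        ns = self.namespaces[lang]
        # 1. Before execution: import every shared variable into the cell's language.
        ns.update(self.bindings)
        # 2. Execute the cell (here simply a Python callable that mutates ns).
        cell(ns)
        # 3. After execution: export every variable that was not predefined.
        for name, value in ns.items():
            if name not in self.predefined[lang]:
                self.bindings[name] = value


kernel = AutoSharingKernel({"js": {"console": object()}, "python": {"print": print}})
kernel.run_cell("js", lambda ns: ns.update(answer=42))
kernel.run_cell("python", lambda ns: ns.update(double=ns["answer"] * 2))
print(kernel.bindings)  # {'answer': 42, 'double': 84}
```

The sketch also shows one of the limitations mentioned above: a variable that reuses a predefined name (for example, reassigning `print`) is never exported, which is where explicit sharing is still needed.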

To help users keep track of the objects and data shared between languages, our students also forked the Variable Inspector from the Jupyter notebook extensions and turned it into the Polyglot Inspector. Similar to PolyJuS’ explorer for polyglot bindings, this inspector can be used to explore which variables are (automatically) shared and are therefore available in all code cells. We believe this makes it easier to understand what kinds of objects are shared and which messages they respond to.
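The core of such an inspector is a listing of each shared variable together with its type and the operations it supports. The following Python analogue is only an illustration under our own assumptions: the helper `inspect_bindings` is hypothetical, shared variables are again modelled as a dict, and "messages an object responds to" is approximated by its public callable attributes.

```python
# Python analogue of a polyglot variable inspector (hypothetical helper).
# For each shared variable it reports the type and the public methods,
# i.e. roughly "which messages the object responds to".


def inspect_bindings(bindings):
    rows = []
    for name, value in sorted(bindings.items()):
        methods = sorted(
            attr
            for attr in dir(value)
            if not attr.startswith("_") and callable(getattr(value, attr))
        )
        rows.append((name, type(value).__name__, methods))
    return rows


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def norm(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5


for name, type_name, methods in inspect_bindings({"p": Point(3, 4)}):
    print(name, type_name, methods)  # p Point ['norm']
```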

If you would like to give IPolyglot a try, here’s how you can run it on Docker.

Code Editor for Polyglot Programming

Code editors are essential tools for programming. They support developers with useful features such as code completion, syntax highlighting, and sometimes integrations of specific APIs.

As part of this project, the students built a code editor in GraalSqueak, which integrates the Polyglot API. Here is what they came up with:

Switching to another programming language within the same code base means switching to a different syntax and also to different semantics. We decided to use colors to visualize this kind of change in the Polyglot Code Editor. In addition, the editor prompts for user input whenever the user invokes functionality provided by the Polyglot API. Although this API is available in all officially supported GraalVM languages, the actual ways to interact with it differ slightly (for example, due to language constraints). The editor, however, always prompts the user in the same way for the same functionality. It then automatically generates the API call in the corresponding language and may even update imports if necessary. It also keeps track of which variables have been exported to GraalVM’s polyglot bindings and lists them when the user requests an import.
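The following sketch illustrates the idea of one language-neutral prompt producing language-specific calls. It is not the editor's implementation (which lives in GraalSqueak); the class `PolyglotCallGenerator` and the template tables are hypothetical. We only include templates for JavaScript (`Polyglot.export`/`Polyglot.import`) and Ruby (`Polyglot.export`/`Polyglot.import`); other languages would need their own entries and possibly an extra import line.

```python
# Hypothetical sketch: one language-neutral request ("export this value as ...")
# turned into language-specific Polyglot API calls. Only JavaScript and Ruby
# templates are included; other languages would need their own entries.

EXPORT_TEMPLATES = {
    "js": 'Polyglot.export("{name}", {expr});',
    "ruby": "Polyglot.export('{name}', {expr})",
}
IMPORT_TEMPLATES = {
    "js": 'const {name} = Polyglot.import("{name}");',
    "ruby": "{name} = Polyglot.import('{name}')",
}


class PolyglotCallGenerator:
    def __init__(self):
        self.exported = []  # names exported so far, offered when importing

    def export_call(self, lang, name, expr):
        if name not in self.exported:
            self.exported.append(name)
        return EXPORT_TEMPLATES[lang].format(name=name, expr=expr)

    def import_call(self, lang, name):
        if name not in self.exported:
            raise KeyError(f"{name!r} has not been exported yet")
        return IMPORT_TEMPLATES[lang].format(name=name)


gen = PolyglotCallGenerator()
print(gen.export_call("js", "total", "a + b"))  # Polyglot.export("total", a + b);
print(gen.exported)                             # ['total']
print(gen.import_call("ruby", "total"))         # total = Polyglot.import('total')
```

Keeping the list of exported names in the generator is what allows the editor to offer only names that actually exist in the bindings when the user asks for an import.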

Inspired by language boxes, the code editor also supports Code Boxes. Code boxes are our editor’s way of integrating API calls for code evaluation. Normally, to evaluate code in another language, the user would need to create and switch to a new file in the editor. Code boxes, on the other hand, are managed entirely by the editor: it creates a new file for each code box in the background and inserts the corresponding Polyglot API calls. Code boxes support multiple levels of nesting. For example, you could write a JavaScript file with a nested Python code box, which in turn contains a JavaScript code box.
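Nested code boxes map naturally onto a recursive transformation: every box becomes its own file, and its parent gets an evaluation call in the parent's language. The sketch below is our own illustration, not the editor's code. The `CodeBox` class, the `box_N` file names, and the `materialize` function are hypothetical. To stay with calls we are sure of, it only covers JavaScript (`Polyglot.evalFile`) and Ruby (`Polyglot.eval_file`), so the example nests JavaScript, Ruby, and JavaScript instead of the JavaScript/Python nesting described above.

```python
# Hypothetical sketch of flattening nested code boxes into separate files.
# Each nested box becomes its own file; the parent gets an eval-file call
# written in the parent's language (JavaScript and Ruby only).
from dataclasses import dataclass

EXTENSIONS = {"js": "js", "ruby": "rb"}
EVAL_FILE = {
    "js": 'Polyglot.evalFile("{lang}", "{path}")',
    "ruby": "Polyglot.eval_file('{lang}', '{path}')",
}


@dataclass
class CodeBox:
    lang: str
    parts: list  # source lines (str) or nested CodeBox instances


def materialize(box, files, prefix="box"):
    """Return the source of `box`; nested boxes are written into `files`."""
    lines = []
    for part in box.parts:
        if isinstance(part, CodeBox):
            path = f"{prefix}_{len(files) + 1}.{EXTENSIONS[part.lang]}"
            files[path] = None  # reserve the name before recursing
            files[path] = materialize(part, files, prefix)
            lines.append(EVAL_FILE[box.lang].format(lang=part.lang, path=path))
        else:
            lines.append(part)
    return "\n".join(lines)


inner = CodeBox("js", ["console.log('innermost');"])
middle = CodeBox("ruby", ["puts 'middle'", inner])
outer = CodeBox("js", ["console.log('outer');", middle])
files = {}
main = materialize(outer, files)
print(main)
for path, source in files.items():
    print(f"--- {path}\n{source}")
```

Reserving the file name before recursing guarantees that deeper boxes get distinct names, no matter how deeply the boxes are nested.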

The implementation of our editor is polyglot as well: For example, it uses a Ruby library for syntax highlighting and Python for managing line endings.

Helping Developers Find the Right Code

One key idea of polyglot programming is the ability to always use the right language, framework, library, or tool for the job. However, that assumes that the developer already knows the right thing for the job. To address this problem, a student team worked on a tool that helps developers find code more efficiently on StackOverflow. Here is a demo of what that looks like:

The Polyglot Code Finder allows the user to search StackOverflow for code snippets written in languages supported by the underlying GraalVM. It is integrated into both the code editor and our PolyJuS notebook system: in the code editor, the tool automatically creates new code boxes; in the notebook system, it creates new code cells. Like the code editor, the code finder is polyglot: its UI is written in Squeak/Smalltalk, while the backend uses Python and Ruby for searching, validating, and cleaning code snippets. Imagine having a tool like this in the IDE or editor of your choice, with support for StackOverflow, GitHub, and language documentation!
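As an illustration of the cleaning and validating steps (the search itself is not shown), here is a small Python sketch for Python snippets only. The cleaning rule, which strips `>>>`/`...` prompts from pasted interactive sessions and drops their printed output, is our own assumption about what "cleaning" might involve; validation simply checks that the result parses with the standard `ast` module. The function names are hypothetical.

```python
# Hypothetical sketch of cleaning and validating Python snippets found online.
# Cleaning rule (assumption): pasted REPL sessions lose their prompts and
# printed output; validation only checks that the code parses.
import ast
import re
import textwrap

PROMPT = re.compile(r"^(>>>|\.\.\.)( |$)")


def clean_python_snippet(raw):
    lines = raw.strip("\n").splitlines()
    if any(PROMPT.match(line) for line in lines):
        # REPL transcript: keep prompted lines only, without the prompt itself.
        lines = [PROMPT.sub("", line, count=1) for line in lines if PROMPT.match(line)]
    return textwrap.dedent("\n".join(lines))


def is_valid_python(source):
    try:
        ast.parse(source)
        return True
    except (SyntaxError, ValueError):
        return False


def usable_snippets(raw_snippets):
    """Clean every snippet and keep only those that parse."""
    cleaned = (clean_python_snippet(raw) for raw in raw_snippets)
    return [code for code in cleaned if is_valid_python(code)]


print(usable_snippets([">>> xs = [1, 2, 3]\n>>> sum(xs)\n6", "for x in"]))
# ['xs = [1, 2, 3]\nsum(xs)']
```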

Benchmarking the Oracle Database MLE

Oracle Labs is working on an integration of GraalVM into the Oracle Database. This allows stored procedures and user-defined functions to be written in high-level languages such as JavaScript or Python.

As part of our seminar, a student team benchmarked the Oracle Database Multilingual Engine (MLE) using two different algorithms taken from real-world applications. Elo is a rating system for chess that calculates players’ scores based on their previous scores and a random variable. When used in a database setup, this benchmark therefore performs many reads and writes but is relatively light on CPU usage. Available-to-promise (ATP) is a business algorithm for determining quantities and delivery dates for order enquiries. For this, the algorithm needs to read all required data only once to calculate the result. The calculation itself, however, is based on backtracking. This benchmark therefore has relatively low database usage but places high demands on the CPU.
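To give a feel for the two workloads, the sketch below shows simplified versions of their core computations in Python. These are not the students' benchmark implementations (which were written in JavaScript). The Elo part uses the standard update rule, expected score 1 / (1 + 10^((R_B − R_A)/400)) with K = 32, and assumes that the benchmark's random variable is a randomly drawn game result. Each update reads and writes two ratings, which hints at why this workload is dominated by database access. The ATP part is a deliberately simplified cumulative check that finds the earliest day on which a new order can be promised without breaking existing commitments; it does not reproduce the backtracking search of the benchmarked algorithm. All function names are hypothetical.

```python
# Simplified, hypothetical versions of the two benchmark workloads.
import random


def expected_score(r_a, r_b):
    """Standard Elo expected score of player A against player B."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))


def elo_update(r_a, r_b, score_a, k=32):
    """Return both new ratings; score_a is 1 (win), 0.5 (draw) or 0 (loss)."""
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


def simulate_round(ratings, rng):
    """Pick two players and apply a randomly drawn result (assumption)."""
    a, b = rng.sample(sorted(ratings), 2)
    outcome = rng.choice([1.0, 0.5, 0.0])
    ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], outcome)


def earliest_promise_day(supply, committed, qty):
    """Earliest day d on which qty can be promised, or None.

    supply[t] and committed[t] are quantities per day. Promising qty on day d
    is feasible if cumulative supply minus cumulative commitments stays >= qty
    on every day from d onwards (existing commitments are assumed feasible).
    """
    slack, running = [], 0
    for s, c in zip(supply, committed):
        running += s - c
        slack.append(running)
    best, suffix_min = None, float("inf")
    for d in range(len(slack) - 1, -1, -1):
        suffix_min = min(suffix_min, slack[d])  # non-increasing as d decreases
        if suffix_min >= qty:
            best = d
    return best


print(elo_update(1500, 1500, 1.0))  # (1516.0, 1484.0)
print(earliest_promise_day([10, 0, 20, 0], [5, 5, 10, 0], 8))  # 2
ratings = {"ann": 1500.0, "bob": 1600.0, "cid": 1400.0}
rng = random.Random(0)
for _ in range(100):
    simulate_round(ratings, rng)
print(ratings)
```

In the ATP example, day 0 has five spare units, but day 1's commitments consume them, so the earliest possible promise for eight units is day 2.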

Figure 1: The MLE Benchmark Setup.

Our students decided to implement both benchmarks in JavaScript so that they could compare MLE with a NodeJS setup. Figure 1 shows their benchmark setup: the benchmarks ran on a Compute Instance on Oracle Cloud under the control of a benchmark supervisor. This supervisor ran both benchmarks with different problem sizes, separately in the MLE Docker image and in a standard NodeJS installation connected to the same Docker-hosted Oracle Database.
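A supervisor like this boils down to a loop over setups, benchmarks, and problem sizes. The Python sketch below is a hypothetical harness, not the students' supervisor: a "target" is any callable that takes a benchmark name and a problem size, whereas the real supervisor drove JavaScript code in MLE and in NodeJS. The warm-up runs and the use of the median are our own choices; warm-up matters for JIT-compiled runtimes such as GraalVM, but we do not claim the original supervisor worked this way.

```python
# Hypothetical benchmark-supervisor sketch: runs every benchmark at every
# problem size on every target and records the median wall-clock time.
import statistics
import time


def run_benchmarks(targets, benchmarks, sizes, warmup=2, repetitions=5):
    results = {}
    for target_name, run in targets.items():
        for bench in benchmarks:
            for size in sizes:
                for _ in range(warmup):  # let JIT compilers warm up first
                    run(bench, size)
                timings = []
                for _ in range(repetitions):
                    start = time.perf_counter()
                    run(bench, size)
                    timings.append(time.perf_counter() - start)
                results[(target_name, bench, size)] = statistics.median(timings)
    return results


def local_python(bench, size):
    """Stand-in target; a real one would submit `bench` to MLE or NodeJS."""
    return sum(range(size))


results = run_benchmarks({"local": local_python}, ["elo", "atp"], [1_000, 10_000])
for (target, bench, size), seconds in sorted(results.items()):
    print(f"{target:6} {bench:4} {size:>6} {seconds * 1e6:10.1f} µs")
```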

Here is a summary of what our students measured: