Last week, Netflix announced the open source launch of Polynote, a polyglot notebook. It comes with first-class Scala support, Apache Spark integration, and multi-language interoperability across Scala, Python, and SQL. It also provides IDE-like features such as interactive autocomplete, a rich text editor with LaTeX support, and more.

Polynote provides seamless integration between Netflix’s JVM-based ML platform, which makes heavy use of Scala, and Python’s machine learning and visualization libraries. It is currently used by Netflix’s personalization and recommendation teams and is also being integrated with the rest of the Netflix research platform.

The Netflix team says, “Polynote originated from a frustration with the shortcomings of existing notebook tools, especially with respect to their support of Scala.” Also, “we found that our users were also frustrated with the code editing experience within notebooks, especially those accustomed to using IntelliJ IDEA or Eclipse.”

Key features supported by Polynote

Reproducibility

A traditional notebook generally relies on a read–eval–print loop (REPL) to provide an interactive environment. According to Netflix, the expressions and results of a REPL evaluation are quite rigid. Netflix therefore built Polynote’s code interpretation from scratch instead of relying on a REPL.

This lets Polynote keep track of the variables defined in each cell: it constructs the input state for a given cell from the cells that have run above it. Because a cell’s position matters to its execution semantics, users can read the notebook from top to bottom. This improves reproducibility by making it more likely that running the notebook sequentially will work.
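
The position-based state described above can be illustrated with a small sketch. This is not Polynote's implementation: the names `input_state` and `run_cell` and the model of a cell as a Python function are assumptions made purely for illustration. It shows how a cell's inputs can be derived only from the cells above it, so that re-running a cell is unaffected by definitions made further down the notebook.

```python
from typing import Any, Callable, Dict, List

# Hypothetical model: a cell maps its input state (name -> value)
# to the dictionary of names it defines.
Cell = Callable[[Dict[str, Any]], Dict[str, Any]]


def input_state(outputs: List[Dict[str, Any]], index: int) -> Dict[str, Any]:
    """Build the state visible to cell `index` from the cells above it only."""
    state: Dict[str, Any] = {}
    for defined in outputs[:index]:
        state.update(defined)  # a lower cell shadows names from a higher one
    return state


def run_cell(cells: List[Cell], outputs: List[Dict[str, Any]], index: int) -> Dict[str, Any]:
    """Run one cell against its position-based input state and record its output."""
    outputs[index] = cells[index](input_state(outputs, index))
    return outputs[index]


# Three illustrative cells: the last one redefines x.
cells: List[Cell] = [
    lambda s: {"x": 2},
    lambda s: {"y": s["x"] * 10},
    lambda s: {"x": 100},
]
outputs: List[Dict[str, Any]] = [{} for _ in cells]

for i in range(len(cells)):
    run_cell(cells, outputs, i)

# Re-running the middle cell still sees x = 2 from the cell above it,
# not x = 100 from the cell below it.
rerun = run_cell(cells, outputs, 1)
```

In a REPL-style notebook with one global namespace, re-running the middle cell after the last one would pick up `x = 100` instead, which is the kind of hidden-state problem the positional model avoids.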

Editing Improvements

Polynote provides editing enhancements such as:

Code editing integrated with the Monaco editor for interactive autocomplete.

Inline error highlighting to help users fix errors quickly.

A rich text editor for text cells, which allows users to easily insert LaTeX equations.

Visibility

One of Polynote’s major guiding principles is visibility. It provides a live view of what the kernel is doing at any given time, without requiring users to dig through logs. A single glance at the user interface conveys a lot of information:

The notebook view and task list display the currently running cell and also show the queue of cells waiting to run.

The exact statement running in the system is highlighted in colour.

Job and stage level Spark progress information is shown in the task list.

The kernel status area provides information about the execution status of the kernel.

Polyglot

Currently, Polynote supports Scala, Python, and SQL cell types and enables users to move seamlessly from one language to another within the same notebook. When a cell runs, the kernel hands the typed input values over to the cell’s language interpreter. In turn, the interpreter passes the resulting typed output values back to the kernel. This lets every cell in a Polynote notebook run with the same context and the same shared state, irrespective of its language.
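
The handoff between the kernel and the language interpreters can be sketched as follows. This is a toy model under stated assumptions, not Polynote's API: `TypedValue`, `Kernel`, `python_interpreter`, and the made-up `toy-query` language are all hypothetical names introduced for illustration. The point is only that the kernel owns the shared state and each interpreter receives typed inputs and returns typed outputs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class TypedValue:
    """Hypothetical record for a named value and the name of its type."""
    name: str
    type_name: str
    value: Any


# An interpreter takes cell source plus typed inputs and returns typed outputs.
Interpreter = Callable[[str, Dict[str, TypedValue]], Dict[str, TypedValue]]


class Kernel:
    """Toy kernel that owns the shared state and dispatches cells by language."""

    def __init__(self) -> None:
        self.interpreters: Dict[str, Interpreter] = {}
        self.state: Dict[str, TypedValue] = {}

    def register(self, language: str, interpreter: Interpreter) -> None:
        self.interpreters[language] = interpreter

    def run_cell(self, language: str, source: str) -> Dict[str, TypedValue]:
        # Hand a copy of the typed inputs to the interpreter, then merge its outputs.
        outputs = self.interpreters[language](source, dict(self.state))
        self.state.update(outputs)
        return outputs


def python_interpreter(source: str, inputs: Dict[str, TypedValue]) -> Dict[str, TypedValue]:
    """Stand-in Python interpreter: runs the source with the inputs as variables."""
    namespace = {name: tv.value for name, tv in inputs.items()}
    exec(source, {}, namespace)
    # Report only names that are new or were rebound by this cell.
    return {
        name: TypedValue(name, type(value).__name__, value)
        for name, value in namespace.items()
        if name not in inputs or inputs[name].value is not value
    }


def toy_query_interpreter(source: str, inputs: Dict[str, TypedValue]) -> Dict[str, TypedValue]:
    """Made-up query language that only understands 'COUNT <table> INTO <name>'."""
    _count, table, _into, out = source.split()
    return {out: TypedValue(out, "int", len(inputs[table].value))}


kernel = Kernel()
kernel.register("python", python_interpreter)
kernel.register("toy-query", toy_query_interpreter)

kernel.run_cell("python", "rows = [1, 2, 3]")
kernel.run_cell("toy-query", "COUNT rows INTO n")
kernel.run_cell("python", "doubled = n * 2")
```

Here the value `n` is produced by a cell in one language and consumed by a cell in another, because both interpreters read from and write to the state held by the kernel.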

Dependency and Configuration Management

To make notebooks easier to reproduce, Polynote stores configuration and dependency setup within the notebook itself. It also provides a user-friendly Configuration section where users can set dependencies for each notebook.

This allows Polynote to fetch the dependencies locally and load the Scala dependencies into an isolated ClassLoader, which reduces the chances of a class conflict between Polynote and the Spark libraries. When Polynote is used in Spark mode, it creates a Spark Session for the notebook, and the Python and Scala dependencies are automatically added to that Spark Session.

Data Visualization

One of the most important use cases of a notebook is exploring and visualizing data. Polynote integrates with two open source visualization libraries: Vega and Matplotlib. It also has native support for data exploration, including a data schema view, a table inspector, and a plot constructor. These features help users learn about their data without cluttering their notebooks.

Users have appreciated Netflix’s efforts in open sourcing Polynote and have praised its features.

This is so cool. + open source. @netflix 👏 "Polynote: a new, polyglot notebook with first-class Scala support, Apache Spark integration, multi-language interoperability including Scala, Python, and SQL, as-you-type autocomplete, and more." 🤯 https://t.co/Oihzp1PL5p pic.twitter.com/PaQgvOMmAH — Suzana Ilić (@suzatweet) October 25, 2019

2019 has seen a lot of growth in integrating data science with teams and to production. Netflix's solution looks very promising. "Polynote allows multiple languages to be combined in a single program." by @jrdothoughts https://t.co/PAi0x3qpSO — Julian Harris (@julianharris) October 26, 2019

Visit the Netflix Techblog for more information on Polynote. You can also check out the Polynote website for more details.
