The ACM Symposium on Operating Systems Principles (SOSP) has a long history and a great reputation in Operating Systems (OS) research. This year SOSP was held in Huntsville, a charming town located in lake country, some 200km north of Toronto. On a rainy Sunday, Synced visited Huntsville to check out the SOSP AI System Workshop.

At First Glance

The growing and widespread deployment of AI has motivated OS researchers to develop novel system engineering for AI. The SOSP AI System Workshop explored these efforts to advance research in AI and operating systems.

The agenda of the AI Systems Workshop:

Invited Talks

1. What are the Unique Challenges and Opportunities in Systems for ML?

- Matei Zaharia, Stanford University and Databricks

2. A View of Programming Languages & Software Engineering for ML Software

- Caroline Lemieux, UC Berkeley

Poster Session

Contributed Talks

1. Video Event Specification using Programmatic Composition

2. Adaptive Distributed Training of Deep Learning Models

3. Standardizing Evaluation of Neural Network Pruning

4. AliGraph: An Industrial Graph Neural Network Platform

Invited Talks

1. Asynchrony and Quantization for Efficient and Scalable Learning

- Christopher De Sa, Cornell University

2. Learning Based Coded-Computation: A Novel Approach for Resilient Computation in ML Inference Systems

- Rashmi K. Vinayak, Carnegie Mellon University

Poster Session

Invited Talks

1. Building Scalable Systems for Reinforcement Learning and Using Reinforcement Learning for Better Systems

- Yuandong Tian, Facebook

2. Challenges and Progress in Scaling ML Fairness

- Alex Beutel, Google Brain

Invited Talks

What are the Unique Challenges and Opportunities in Systems for ML?

An alumnus of St. Joseph’s College School in Toronto and the University of Waterloo, Matei Zaharia (known in China as “马铁”) of Stanford University and the San Francisco-based data processing platform Databricks returned to Ontario to give a talk on the unique challenges facing systems for machine learning. As a PhD student, Zaharia created a high-efficiency distributed computation engine that evolved into Apache Spark, which now comprises a wide spectrum of libraries and components supporting SQL queries and ML. Zaharia’s views on systems research and machine learning are undoubtedly significant for researchers in both fields. In his talk, Zaharia reviewed his Stanford team’s work over the past two years and explained how they tackle these challenges with a systematic approach.

Before identifying the unique challenges in systems for ML, Zaharia introduced the difference between traditional software development and ML development.

He identified three major directions for system researchers to improve ML:

Data-oriented model training and inference

ML application quality assurance and debugging tools

ML platforms improvement

For data-oriented model training and inference, Zaharia introduced two Stanford projects: NoScope (VLDB’17) and BlazeIt (CIDR’19).

ML inference is expensive; Zaharia explained that processing video streams in real time with CNNs requires GPUs costing US$1,000.

NoScope was proposed to optimize the execution of ML models.

BlazeIt was proposed to jointly optimize SQL queries and machine learning tasks via query optimization and model specialization.
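The core trick behind such systems can be illustrated with a model cascade: a cheap specialized model handles easy frames, and the expensive reference CNN runs only on ambiguous ones. The sketch below is purely illustrative; the models, scores, and thresholds are stand-ins, not NoScope's actual components.

```python
# Illustrative model cascade: a cheap model filters frames, and the
# expensive reference CNN is only invoked when the cheap model is
# not confident. Both "models" here are hypothetical stand-ins.

def cheap_model(frame):
    # Hypothetical specialized model: fast, approximate score in [0, 1].
    return frame["cheap_score"]

def expensive_model(frame):
    # Hypothetical reference CNN: slow but accurate.
    return frame["true_label"]

def cascade(frame, low=0.2, high=0.8):
    """Return (label, used_expensive_model)."""
    score = cheap_model(frame)
    if score <= low:
        return False, False               # confidently absent
    if score >= high:
        return True, False                # confidently present
    return expensive_model(frame), True   # fall back to the full CNN

frames = [
    {"cheap_score": 0.05, "true_label": False},
    {"cheap_score": 0.95, "true_label": True},
    {"cheap_score": 0.50, "true_label": True},  # ambiguous frame
]
results = [cascade(f) for f in frames]
expensive_calls = sum(used for _, used in results)
print(expensive_calls)  # 1: only the ambiguous frame hits the expensive model
```

The savings come from the thresholds: the wider the confident regions, the fewer frames reach the reference model, at the cost of some accuracy.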

Regarding quality assurance (QA) and debugging tools, Zaharia discussed the work on model assertions that his team presented at NeurIPS 2018.

ML applications might fail because of complex, hard-to-debug development procedures. For example, human gender classification accuracy can vary depending on a person’s race.

Assertions can be used to improve the quality of ML applications, for example in video analytics on cars.

Assertions can also be used with active learning to help select data to label and train on, and to reduce the number of failed assertions.
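A model assertion can be as simple as a consistency check over consecutive video frames. The sketch below assumes a hypothetical "no flicker" rule for car detections (a car detected in the frames before and after should not vanish in between); it is an illustration of the idea, not the team's actual assertion code.

```python
# Sketch of a model assertion for video analytics on cars (assumed
# example): a car present in frame t-1 and t+1 should not vanish in
# frame t. Violating frames could then be prioritized for labeling.

def flicker_assertion(detections):
    """detections[t] is True if a car was detected in frame t.
    Return the frame indices that violate the no-flicker assertion."""
    violations = []
    for t in range(1, len(detections) - 1):
        if detections[t - 1] and not detections[t] and detections[t + 1]:
            violations.append(t)
    return violations

detections = [True, True, False, True, True]  # car vanishes at frame 2
print(flicker_assertion(detections))  # [2]
```

In an active-learning loop, the flagged frames would be sent for human labeling, since they are exactly the inputs where the model is likely wrong.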

Wrapping up his talk, Zaharia addressed challenges in ML platforms at an industrial scale. Since ML development is ad-hoc, a number of pain points can emerge when developing ML applications for industry:

Zaharia suggested developing a new class of systems for standardizing the ML data preparation, training and deployment cycle, and introduced the MLflow platform developed by Databricks.

Other ML specific opportunities mentioned by Zaharia:


A View of Programming Languages & Software Engineering for ML Software

Caroline Lemieux is a fourth-year PhD candidate at the University of California, Berkeley, and a great public speaker.

Lemieux’s talk followed Berkeley’s semi-tradition of beginning with “A Berkeley view of…” and focused on efforts to improve ML systems from the perspective of programming languages and software engineering.

She first referenced an ML systems roadmap:

Although there are now many tools and platforms available for developing ML applications, Lemieux emphasized that building deep learning applications remains difficult, prompting one bold audience member to disagree. Lemieux proceeded to explain her opinion.

Lemieux identified three pillars of programming systems that support conventional software development, and proposed that deep learning development could be supported in a similar way.

First, the performance of deep learning applications can be improved by compilers, which can adaptively generate binary code for different hardware.

Because users may know more than a general-purpose compiler about optimizing for particular hardware, Lemieux suggested that high-level platforms let users express their own performance optimizations to the compiler.

She said using a language like Halide could help improve performance.

She also suggested the different APIs of various ML platforms are a headache for developers.

Lemieux proposed adding a code recommendation system to ML development, similar to what an IDE does for Java developers.

She explained, however, that code recommendation requires a type system, which languages like Python lack.

“What should we do then? We try to infer the types of tensors. In fact, type information is already embedded in ML platforms.”

Based on this insight, Lemieux and her team proposed AutoPandas, a system of neural-backed generators that can synthesize code for complex APIs.
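The inference Lemieux describes can be sketched by treating tensor shapes as types and propagating them through operations. The rule below covers only 2-D matrix multiplication and is a minimal illustration, not her team's implementation.

```python
# Sketch of inferring tensor "types" (here, just shapes) from the
# information already implicit in ML platforms. Illustrative only:
# a single propagation rule for 2-D matrix multiplication.

def matmul_shape(a, b):
    """Infer the output shape of a 2-D matrix multiply from input shapes."""
    if len(a) != 2 or len(b) != 2:
        raise TypeError("matmul expects 2-D shapes")
    if a[1] != b[0]:
        raise TypeError(f"inner dimensions differ: {a[1]} vs {b[0]}")
    return (a[0], b[1])

print(matmul_shape((32, 128), (128, 10)))  # (32, 10)
```

With rules like this for each operator, a recommender can rank API calls by whether their inferred types match the tensors in scope.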

Lemieux questioned whether conventional software debugging tools could be used for deep learning, and concluded they could not, because there is no trace available to reproduce a bug. Since a single input is rarely informative, debugging machine learning applications relies on massive amounts of input, which means developers have to know their data well.

A variety of voices were heard in Lemieux’s Q&A session, with an audience member from Google suggesting ML development is no harder than traditional software development, which is also “swimming in an ocean of APIs.”

Asynchrony and Quantization for Efficient and Scalable Learning

The third invited talk was given by Prof. Chris De Sa, a Stanford University graduate, an assistant professor at Cornell, and a very energetic speaker. The talk was impressive and original; even his slides used a seldom-seen serif font.

In this talk, De Sa introduced how to improve ML efficiency, particularly on distributed systems, by applying different numerical precisions.

De Sa wants to make ML faster and more computation-efficient.

Ways to make ML systems more efficient:

Low-precision arithmetic

Asynchronous parallel/distributed learning

A few existing works apply these two methods to provide efficient ML services; however, both methods also have drawbacks.

Drawbacks of low-precision:

Drawbacks of asynchronous parallelism:

To overcome these drawbacks and further improve ML systems, De Sa proposed a three-step approach:

Think about these errors as noise in an already-noisy ML system

Prove theory that bounds the effect of this noise on the algorithm

Use theory to build more reliable algorithms and systems
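Stochastic rounding is a simple instance of this recipe: the rounding error becomes unbiased noise with zero mean, which is exactly the property that makes the theory tractable. A minimal sketch:

```python
# Sketch of stochastic rounding: round down or up at random, with the
# probability of rounding up equal to the fractional part, so that
# the rounding error is unbiased noise (E[result] == x).

import math
import random

def stochastic_round(x):
    floor = math.floor(x)
    return floor + (1 if random.random() < (x - floor) else 0)

random.seed(0)
mean = sum(stochastic_round(2.3) for _ in range(100_000)) / 100_000
print(mean)  # close to 2.3: the quantization error averages out
```

Because the error is zero-mean, it can be analyzed like the stochastic noise already present in SGD, rather than as a systematic bias.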

De Sa used DNN training as an example to introduce the effects of numeric precision in ML training.

He gave a few examples of different floating-point precision formats:

Half-precision floating point (FP16)

bfloat16, introduced by Google

Earlier this year, Synced evaluated the RTX Titan GPU and observed a performance increase when using half-precision.

Fixed-point numbers may also work, but are limited to a much narrower range than floating-point numbers. So De Sa introduced a hybrid approach: block floating point.
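Block floating point can be sketched as a block of values sharing one exponent, taken from the largest magnitude in the block, while each value keeps only a short fixed-point mantissa. The 8-bit mantissa width below is an assumption for illustration, not a parameter from the talk.

```python
# Sketch of block floating point: one shared exponent per block,
# short fixed-point mantissas per value. Mantissa width is assumed.

import math

def bfp_quantize(block, mantissa_bits=8):
    max_mag = max(abs(v) for v in block)
    if max_mag == 0:
        return list(block)
    shared_exp = math.floor(math.log2(max_mag))
    # Step size implied by the shared exponent and mantissa width.
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [round(v / scale) * scale for v in block]

block = [0.52, -0.13, 0.0037, 0.25]
print(bfp_quantize(block))  # each value snapped to the shared grid
```

The trade-off is visible in the smallest entry: values far below the block's maximum lose most of their relative precision, which is the price paid for storing only one exponent.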

He summarized low-precision formats and said the current problem is a lack of hardware support.

To make it easier to simulate low-precision training, De Sa and his team proposed a framework called QPyTorch that supports multiple numeric precision formats.

In addition to the simulation framework for different low-precision formats, De Sa’s team also presented an ICML 2019 paper with theoretical guarantees on the accuracy lost to low-precision numbers.

The other research topic introduced by De Sa was improving asynchronous parallel/distributed learning. One common solution is communication compression.

Traditional communication compression, however, is limited to compressing along the dimensions of the model parameters, due to the nature of gradient communication.
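One common compression scheme, shown here only as a generic illustration (not the specific method from the talk), is 1-bit sign compression with error feedback: each worker sends only the signs of its gradient, scaled by the mean magnitude, and carries the compression error into the next step so nothing is lost on average.

```python
# Generic sketch of 1-bit gradient compression with error feedback.
# Illustrative only; not De Sa's specific method.

def compress(grad, residual):
    """Send only signs scaled by the mean magnitude, keeping the
    compression error in `residual` to add back at the next step."""
    corrected = [g + r for g, r in zip(grad, residual)]
    scale = sum(abs(g) for g in corrected) / len(corrected)
    sent = [scale if g >= 0 else -scale for g in corrected]
    new_residual = [g - s for g, s in zip(corrected, sent)]
    return sent, new_residual

grad = [0.5, -0.125, 0.25, -0.125]   # exact binary fractions, for clarity
sent, residual = compress(grad, [0.0] * 4)
print(sent)      # [0.25, -0.25, 0.25, -0.25]
print(residual)  # [0.25, 0.125, 0.0, 0.125], carried to the next step
```

Each worker now transmits one bit per parameter plus a single scale, a 32x reduction over FP32 gradients, while the residual keeps the scheme from accumulating bias.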

Another parallelism mechanism that has been proposed is pipeline parallelism.

De Sa and his team developed a new asynchronous DNN training scheme based on pipeline parallelism, PipeMare, which can train models with performance comparable to synchronous training.

Finally, De Sa summarized his team’s work on low-precision arithmetic and distributed ML training.

Learning Based Coded-Computation: A Novel Approach for Resilient Computation in ML Inference Systems

Rashmi K. Vinayak is an assistant professor at CMU, leading the TheSys group, which is part of the renowned Parallel Data Lab (PDL).

Vinayak and her group are applying today’s leading theories to improve computer systems.

Distributed ML systems have multiple instances that might fail or occasionally become stragglers. Efficiently recovering instances from failures is a significant problem.

The high-level idea of Vinayak’s talk was to exploit the recovery capability of erasure coding to balance recovery delay against resource overhead in distributed ML systems.

Erasure coding is widely used in data storage and transmission, because it provides an efficient way to recover data without introducing too much redundancy.

However, coding for storage and coding for computation are very different. Vinayak highlighted the differences.

She classified erasure coding for functions into two categories:

Linear functions, which are easy to handle

Non-linear functions, which are challenging

Vinayak and her team proposed a learning-based approach to generalize the coding for different computations.

The idea is to use an encoder-decoder neural network to recover the computation.

By learning a code, any numerically differentiable computation can be recovered.
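For the easy linear case, recovery works exactly, which also shows why non-linear functions need a learned code. A toy sketch with a linear "model" f(x) = Wx: because f is linear, running f on a parity input x1 + x2 lets us reconstruct a lost output.

```python
# Sketch of erasure-coded computation for a *linear* function, the
# easy case: with f(x) = Wx and parity p = x1 + x2, a lost output
# f(x2) is recovered exactly as f(p) - f(x1). Toy values throughout.

def f(x, W=((2.0, 1.0), (0.0, 3.0))):
    # A toy linear "model": matrix-vector product Wx.
    return tuple(sum(w * xi for w, xi in zip(row, x)) for row in W)

x1, x2 = (1.0, 2.0), (4.0, -1.0)
parity = tuple(a + b for a, b in zip(x1, x2))

# Suppose the worker computing f(x2) straggles; we still have:
y1, y_parity = f(x1), f(parity)
recovered = tuple(p - a for p, a in zip(y_parity, y1))
print(recovered == f(x2))  # True: linearity makes recovery exact
```

For a non-linear f (say, a DNN), f(x1 + x2) is no longer f(x1) + f(x2), so the encoder and decoder must instead be learned networks that approximate this recovery, which is the approach Vinayak's team takes.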

In summary, Vinayak provided an overview of her team’s papers and code repos:

Building Scalable Systems for Reinforcement Learning and Using Reinforcement Learning for Better Systems

Yuandong Tian is a Facebook research scientist and manager who graduated from CMU and SJTU.

Tian is a cool guy who can’t stop talking about reinforcement learning. At SOSP he discussed how to improve RL training systems and how to build RL algorithms that solve problems in systems research.

He likened his work to a loop: building scalable systems for RL, and applying RL to improve systems.

Tian began his talk by identifying three distributed systems for training RL agents.

He also summarized the challenges in building large-scale RL systems.

To tackle such challenges, Facebook researchers have proposed an RL framework for game research, ELF, which will appear at NeurIPS this year.

Before Tian introduced the impressive OpenGo model trained on the ELF system, he took a swipe at Google’s AlphaZero: “Impressive results, no code, no model”.

He then proudly introduced Facebook’s open-sourced ELF OpenGo.

Tian delivered an in-depth introduction to the distributed ELF system’s Version 1 and Version 2, the latter of which adds a number of tricks for accelerating RL training.

He then introduced the NP-hard combinatorial optimization problems that his team is attempting to solve with RL.

A number of system researchers have recently started applying RL to solve real-world problems, such as DeepRM (H. Mao et al., Resource Management with Deep Reinforcement Learning, ACM Workshop on Hot Topics in Networks, 2016).

Tian also gave a few examples of how RL can be used to improve systems, including online job scheduling and expression simplification.

In these two applications, Tian’s team’s solutions outperformed most existing solutions.

Lastly, Tian summarized his Facebook Research team’s work on RL and large-scale systems.

Challenges and Progress in Scaling ML Fairness

Alex Beutel of Google Brain did his PhD in computer science at Carnegie Mellon University, advised by Christos Faloutsos and Alex Smola.

It’s a risky proposition to talk about ML fairness in a room filled with system people who might not even buy ML, but Beutel did a fine job considering the circumstances.

Beutel used a “Gender Shades” example to discuss unfairness in ML, showing how recognition models scored significantly lower accuracy for dark-skinned females.

He followed up by exploring the concept of “algorithmic fairness.”

Presenting additional examples in the application of comment moderation, Beutel emphasized such biases and unfairness are not uncommon in ML.

To address this problem, he proposed jointly considering both majority and minority groups in the data in attempts to bring more fairness into trained models.

Beutel also identified future work he believes is needed across the whole ML lifecycle, including data preparation and model design, to improve fairness.

More Fresh Ideas — Selected Contributed Talks

In addition to the invited talks, the AI System Workshop also included a session of contributed talks given by authors whose posters were accepted by the workshop. Two which we found particularly impressive were:

KungFu: Supporting Adaptive Deep Learning

This work is interesting because it monitors SGD metrics at runtime and uses them to change training settings accordingly. The idea is intuitive; many have likely considered such an approach but failed to implement it.

Now researchers from Imperial College London have succeeded, with “KungFu,” which has been open-sourced on GitHub.

The KungFu workflow:

Researchers demonstrated KungFu’s capability by adjusting batch size based on the gradient noise scale.
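The adaptation idea can be sketched as a rule that tracks the measured gradient noise scale with the batch size. The rule and constants below are illustrative assumptions, not KungFu's actual policy.

```python
# Sketch of adapting batch size to the gradient noise scale (the
# rough guideline is that batches much larger than the noise scale
# waste computation). Rule and constants are illustrative only.

def adapt_batch_size(noise_scale, min_bs=32, max_bs=4096):
    """Pick the next batch size: track the noise scale, rounded
    down to a power of two and clipped to a sane range."""
    target = max(min_bs, min(max_bs, noise_scale))
    power = 1
    while power * 2 <= target:
        power *= 2
    return power

print(adapt_batch_size(100.0))   # 64
print(adapt_batch_size(3000.0))  # 2048
```

In a system like KungFu, the noise scale would be measured from running gradient statistics during training, and resizing the batch would trigger a coordinated reconfiguration across workers.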

It’s possible more than a few ML system researchers may want to use KungFu to test their new ideas.

AliGraph: An Industrial Graph Neural Network Platform

This talk was presented by Dr. Wencong Xiao, who has been working on graph computation and ML systems since he was a PhD intern at MSRA.

Alibaba Cloud provides powerful computation support across Alibaba Group businesses, and AliGraph looks like a giant engine processing all kinds of graph data.

Xiao presented an overview of AliGraph architecture.

Graphs are widely used in Alibaba to represent connections.

AliGraph provides Pythonic interfaces to perform graph computation, for example, sampling on a heterogeneous graph.
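A generic sketch of such heterogeneous neighbor sampling, using plain Python structures rather than AliGraph's actual API (the data layout and function names here are assumptions for illustration):

```python
# Generic sketch of neighbor sampling on a heterogeneous graph.
# The edge layout and names are illustrative, not AliGraph's API.

import random

# edges[(src_type, edge_type, dst_type)][node] -> list of neighbors
edges = {
    ("user", "buys", "item"): {"u1": ["i1", "i2", "i3"]},
    ("item", "similar", "item"): {"i1": ["i2"], "i2": ["i1", "i3"]},
}

def sample_neighbors(node, relation, k, seed=None):
    """Uniformly sample up to k neighbors of `node` along `relation`."""
    rng = random.Random(seed)
    neighbors = edges.get(relation, {}).get(node, [])
    return rng.sample(neighbors, min(k, len(neighbors)))

print(sample_neighbors("u1", ("user", "buys", "item"), 2, seed=0))
```

Typing the edges by (source, edge, destination) is what makes the graph heterogeneous: a GNN layer can then aggregate "buys" neighbors and "similar" neighbors with different weights.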

AliGraph has been integrated into many Alibaba services.

SOSP 2019 AI Workshop

The SOSP AI Workshop ran from 9 a.m. through 5 p.m. The hyperconcentration of expert information was both exhilarating and exhausting. It’s always inspiring when excellent minds get together and share their novel ideas, and it was also very interesting to hear different and even dissenting voices from the well-informed audience members.

The SOSP 2019 workshop slides are available here.