Faust: Stream Processing for Python


Introducing Faust

We at Robinhood believe the financial system should work for everyone, not just the wealthy. Over the last few years, we’ve brought commission-free trading of U.S. stocks, ETFs, options, and cryptocurrencies to investors on Robinhood.

As a financial institution, we run many complex systems that need to scale and perform well as our user base grows. In the past year alone we’ve seen explosive growth, welcoming more than a million new investors to our platform. To keep expanding our product offering while maintaining the highest quality, we have adopted a variety of open source and in-house technologies that improve our productivity as a team.

Today we’re excited to open source one of the technologies we’ve developed to help us build scalable and reliable distributed systems much faster than before.

What is Faust?

Faust is a distributed stream processing library built to process large amounts of data in real time. It is inspired by Kafka Streams but takes a slightly different approach to stream processing. We use it in several production systems at Robinhood to process billions of events, spanning terabytes of data, every day.

Faust is a Python 3 library: it takes advantage of recent performance improvements in the language and integrates with the asyncio module for high-performance asynchronous I/O.

The source code is short enough to be easy to understand and serves as an excellent resource to learn more about how Kafka Streams and similar systems work.

Why Faust?

Unlike most stream processing frameworks, Faust does not use a DSL (domain-specific language). Instead, it provides stream processing as a Python library, so you can keep using the tools you already know. Anyone comfortable with Python will find it intuitive.

We built Faust as a library that you can drop into any existing Python code, with support for all the libraries and frameworks you like to use. Faust also needs no resource manager such as YARN or Mesos: deploy your application the way you already prefer.
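To illustrate what “stream processing as a Python library” means in practice, here is a minimal sketch that uses only the standard library’s asyncio, not Faust’s own API. The names `event_source` and `total_per_account` and the `(account_id, amount)` event shape are hypothetical: `event_source` stands in for a Kafka topic, and `total_per_account` plays the role of a consumer, an ordinary async function iterating over a stream with `async for`.

```python
import asyncio
from typing import AsyncIterator, Dict, List, Tuple

# Hypothetical event shape for illustration only: (account_id, amount).
Event = Tuple[str, float]


async def event_source(events: List[Event]) -> AsyncIterator[Event]:
    """Stand-in for a Kafka topic: yields in-memory events asynchronously."""
    for event in events:
        await asyncio.sleep(0)  # hand control back to the event loop, as awaiting real I/O would
        yield event


async def total_per_account(stream: AsyncIterator[Event]) -> Dict[str, float]:
    """Consumer written as plain Python: sums amounts per account as events arrive."""
    totals: Dict[str, float] = {}
    async for account_id, amount in stream:
        totals[account_id] = totals.get(account_id, 0.0) + amount
    return totals


if __name__ == "__main__":
    sample = [("alice", 10.0), ("bob", 5.0), ("alice", 2.5)]
    print(asyncio.run(total_per_account(event_source(sample))))
    # prints {'alice': 12.5, 'bob': 5.0}
```

In Faust itself, a consumer like this would be declared with the `@app.agent(...)` decorator and fed from a Kafka topic rather than an in-memory list. The sketch only shows that the processing logic stays ordinary Python, with no separate DSL to learn.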

Use Cases

For these reasons, Faust has seen rapid adoption across engineering teams at Robinhood. It makes traditionally complex streaming architectures simple to design, and we continue to discover new ways to use it.

We use Faust for:

Risk and fraud detection

Ad tracking

Order execution quality monitoring

Distributed streaming of databases across Robinhood

Robinhood Feed (chat feed on the cryptocurrency pages)

Event logging pipelines

News aggregation and tagging

Faust is available to download and use today: read the documentation and fork the repository on GitHub.

Concluding Thoughts

We plan to add many more features. The most interesting one planned is support for the “exactly once” semantics recently introduced in Kafka.

We started working on Faust a year ago and have already received many contributions, both from people at Robinhood and from outside contributors. We want to give special thanks to:

Arpan Shah and Shrey Shahi on the Data Team for their help and support during the design and development phase.

Daniel Ko, Sanyam Satia, Ruby Wang, Jerry Li, Grace Lu, and Allison Wang on the Data Team for being early adopters.

Tom Linford, Henry Tay, and Elizabeth Hong on the backend team.

Jaren Glover, Marco Morales, Aravind Gottipati, Dennis Ordanov, and others on the DevOps team for setting up the infrastructure to enable development and testing.

Archit Shah and Christine Hung on the legal team.

Zane Bevan on the design team and Lavinia Chirico on the Communications team.

Taras Voinarovskyi for his extensive work on the aiokafka client for Python.

We’re excited to see what people build with Faust. Please get in touch if you have comments or suggestions, or if you want to chat about stream processing architectures. We’re also hiring!