On this week’s podcast, Danny Yuan, Uber’s Real-time Streaming/Forecasting Lead, lays out a thorough recipe book for building a real-time streaming platform with a major focus on forecasting. Danny discusses everything from the scale at which Uber operates to the major steps for training and deploying models in an iterative (almost Darwinian) fashion, and wraps up with his advice for software engineers who want to begin applying machine learning in their day-to-day jobs.

Key Takeaways

Uber processes 850,000 to 1.3 million messages per second in its streaming platform, with about 12 TB of data growth per day. The system’s queries scan 100 million to 4 billion documents per second.
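To put the throughput figures above in perspective, here is some back-of-the-envelope arithmetic. It assumes (these are not from the podcast) that 1 TB means 10^12 bytes and that the 12 TB of daily growth is spread evenly over the day and consists entirely of message payload.

```python
# Rough arithmetic on the quoted throughput figures.
# Assumptions: 1 TB = 10**12 bytes; growth is uniform over 24 hours and
# made up only of message payload (no indexes or replicas).
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

daily_growth_bytes = 12 * 10**12
growth_bytes_per_second = daily_growth_bytes / SECONDS_PER_DAY  # ~139 MB/s


def avg_bytes_per_message(messages_per_second: float) -> float:
    """Average stored bytes per message if all daily growth came from messages."""
    return growth_bytes_per_second / messages_per_second


for rate in (850_000, 1_300_000):
    print(f"{rate:,} msg/s -> ~{avg_bytes_per_message(rate):.0f} bytes/message")
```

Under these assumptions the platform writes roughly 139 MB/s, or about 107 to 163 bytes per message. If much of the 12 TB is index overhead or replication, the real per-message payload would be smaller.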

Uber’s frontend is mobile, and it talks to an API layer. All services generate events that are shuffled into Kafka. The real-time forecasting pipeline taps into Kafka to process events and stores the data in Elasticsearch. A federated query layer in front of Elasticsearch provides OLAP query capabilities.
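The podcast does not describe how the federated query layer works internally. A minimal sketch, assuming a simple scatter-gather design: the same filter-and-count query is sent in parallel to several backends (plain Python lists standing in for Elasticsearch clusters), and the partial aggregates are merged. `partial_count_by` and `federated_count_by` are hypothetical names, not Elasticsearch APIs.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List

Event = Dict[str, object]


def partial_count_by(events: Iterable[Event], field: str,
                     predicate: Callable[[Event], bool]) -> Counter:
    """Runs on one backend: filter events, then count them per value of `field`."""
    return Counter(e[field] for e in events if predicate(e))


def federated_count_by(backends: List[List[Event]], field: str,
                       predicate: Callable[[Event], bool]) -> Dict[object, int]:
    """Fan the same query out to every backend in parallel and merge the partial counts."""
    merged: Counter = Counter()
    with ThreadPoolExecutor(max_workers=max(1, len(backends))) as pool:
        for partial in pool.map(lambda b: partial_count_by(b, field, predicate), backends):
            merged.update(partial)  # Counter.update adds counts key by key
    return dict(merged)


# Usage: two hypothetical clusters, counting trip requests per city.
cluster_a = [{"city": "sf", "type": "trip_request"}, {"city": "nyc", "type": "trip_request"}]
cluster_b = [{"city": "sf", "type": "trip_request"}, {"city": "sf", "type": "driver_online"}]
result = federated_count_by([cluster_a, cluster_b], "city",
                            lambda e: e["type"] == "trip_request")
print(result)  # {'sf': 2, 'nyc': 1}
```

Merging works here because counts are additive; non-decomposable aggregates such as exact medians would need a different merge strategy.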

Apache Flink’s advanced windowing features, programming model, and checkpointing convinced Uber to move away from Apache Samza, despite Samza’s simplicity.
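Flink offers tumbling, sliding, and session windows, along with watermarks for late events and checkpointed state. As a rough illustration of only the simplest case, the plain-Python sketch below (not Flink’s API) counts events per key in fixed, non-overlapping event-time windows. It deliberately ignores out-of-order data, watermarks, and fault tolerance, which are exactly the concerns Flink handles for you.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(events: Iterable[Tuple[int, str]],
                           window_ms: int) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, key) using fixed, non-overlapping windows.

    Each event is (event_timestamp_ms, key). The window start is the timestamp
    rounded down to a multiple of window_ms.
    """
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for timestamp_ms, key in events:
        window_start = timestamp_ms - (timestamp_ms % window_ms)
        counts[(window_start, key)] += 1
    return dict(counts)


# Usage: one-minute windows over hypothetical per-city events.
events = [(1_000, "sf"), (59_999, "sf"), (60_000, "sf"), (61_500, "nyc")]
counts = tumbling_window_counts(events, window_ms=60_000)
print(counts)  # {(0, 'sf'): 2, (60000, 'sf'): 1, (60000, 'nyc'): 1}
```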

The forecasting system lets Uber remove the notion of delay: it combines recent signals with historical data to project what is happening now and what will happen in the future.
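Uber’s actual forecasting models are not described here. As a hedged illustration of combining recent signals with historical data, the sketch below blends the mean of the latest observations with the historical mean for the same time slot, weighted by `alpha`. The function name, the blending rule, and the default weight are all assumptions.

```python
from statistics import mean
from typing import Sequence


def blended_forecast(recent: Sequence[float], historical_same_slot: Sequence[float],
                     alpha: float = 0.6) -> float:
    """Hypothetical nowcast: weight recent data by alpha and history by (1 - alpha).

    recent: latest observations (e.g. demand in the last few minutes).
    historical_same_slot: past values for the same time of day/week.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * mean(recent) + (1 - alpha) * mean(historical_same_slot)


# Usage: recent demand is running above its historical level for this slot.
forecast = blended_forecast([120, 130, 140], [100, 110, 90], alpha=0.6)
print(forecast)
```

With these inputs the result is 0.6 × 130 + 0.4 × 100, roughly 118. A larger `alpha` reacts faster to fresh signals; a smaller one leans on history when recent data is sparse or noisy.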

Uber’s pipeline for deploying ML models starts with data in HDFS, applies feature engineering, organizes the data into structures similar to data frames, trains models (mostly offline), and stores them in a container-based model manager.
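A toy version of those stages, with every name hypothetical: raw `(hour_of_day, trips)` tuples stand in for data read from HDFS, `build_frame` performs feature engineering into named columns (a minimal data-frame substitute), `fit_simple_linear` trains an ordinary-least-squares model offline, and `ModelRegistry` is an in-memory stand-in for the container-based model manager.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

RawRecord = Tuple[int, float]  # hypothetical (hour_of_day, trips) record


def build_frame(records: Sequence[RawRecord]) -> Dict[str, List[float]]:
    """Feature engineering: turn raw records into named columns, like a tiny data frame."""
    return {
        "hour": [float(h) for h, _ in records],
        # 1.0 during assumed commute peaks (7-9 and 17-19), else 0.0
        "is_peak": [1.0 if 7 <= h <= 9 or 17 <= h <= 19 else 0.0 for h, _ in records],
        "trips": [t for _, t in records],
    }


def fit_simple_linear(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ~ intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


@dataclass
class ModelRegistry:
    """Hypothetical stand-in for a model manager: keeps every trained version."""
    versions: List[Tuple[float, float]] = field(default_factory=list)

    def store(self, params: Tuple[float, float]) -> int:
        self.versions.append(params)
        return len(self.versions)  # 1-based version number


# Usage: feature engineering -> offline training -> storage.
records = [(6, 50.0), (8, 200.0), (12, 80.0), (18, 220.0), (22, 60.0)]
frame = build_frame(records)
intercept, slope = fit_simple_linear(frame["is_peak"], frame["trips"])
registry = ModelRegistry()
version = registry.store((intercept, slope))
print(version, round(intercept, 2), round(slope, 2))
```

Because `is_peak` is a 0/1 feature, the fitted intercept is the mean off-peak demand and the slope is the extra demand at peak hours.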

A model serving layer picks which model to use. Forecasting results are stored in an OLAP data store, a validation layer compares real results against forecast results to verify that the model is working as desired, and a rollback feature automatically replaces a poorly performing model with the previous one.
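The podcast does not name the validation metric or the rollback policy. The sketch below assumes mean absolute percentage error (MAPE) compared against a fixed threshold; `ServingLayer` and its methods are hypothetical. It serves the most recently deployed model, compares that model’s forecasts with real results, and reverts to the previous model when the error is too high.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

Model = Callable[[float], float]


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error; assumes no actual value is zero."""
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


@dataclass
class ServingLayer:
    """Hypothetical serving layer: serves the newest model, keeps older ones for rollback."""
    history: List[Model] = field(default_factory=list)

    def deploy(self, model: Model) -> None:
        self.history.append(model)

    @property
    def active(self) -> Model:
        return self.history[-1]

    def validate_and_maybe_rollback(self, inputs: Sequence[float],
                                    actual: Sequence[float], max_error: float) -> bool:
        """Compare the active model's forecasts with real results.

        If MAPE exceeds max_error and an older model exists, discard the active
        model so the previous one is served again. Returns True on rollback.
        """
        forecast = [self.active(x) for x in inputs]
        if mape(actual, forecast) > max_error and len(self.history) > 1:
            self.history.pop()
            return True
        return False


# Usage: version 2 overshoots badly, so validation rolls back to version 1.
serving = ServingLayer()
serving.deploy(lambda x: 2 * x)  # version 1
serving.deploy(lambda x: 5 * x)  # version 2
rolled_back = serving.validate_and_maybe_rollback([1, 2, 3], [2.1, 3.9, 6.2], max_error=0.10)
print(rolled_back, serving.active(3))
```

Keeping every deployed model in `history` is what makes the rollback cheap: reverting is just serving the previous entry again.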

“Without output, you don’t have input.” Developers who want to start leveraging machine learning just need to start doing. Start with intuition and practice. Over time, ask questions to learn what you need, then apply a laser focus to gain that knowledge.


