Unbounded data streams pose hard challenges for our application architectures. The data never stops arriving, so we must assume we will never know if or when we have seen all of it. Some streaming systems give us tools that only partially address unbounded data, and we end up complementing them with batch processing, in a technique known as the Lambda Architecture.

Apache Beam is a unified model for defining and executing data processing workflows, and Frances Perry joins the show to explain how Beam provides a way for us to model our data processing, agnostic of whether we choose to run those workflows on Spark, Flink, or Google’s Dataflow.

Links

Sponsors

Alooma is your data pipeline as a service. Alooma is a fully managed tool for pulling from different data sources: MySQL, Postgres, Elasticsearch, Salesforce, and many others. Go to alooma.com/sedaily for more information.