By the time you are done reading this blog post, Google Cloud Platform customers will have processed hundreds of millions of messages and analyzed thousands of terabytes of data using Cloud Dataflow, Cloud Pub/Sub, and BigQuery. These fully managed services remove the operational burden found in traditional data processing systems. They enable you to build applications on a platform that can scale with the growth of your business and drive down data processing latency, all while processing your data efficiently and reliably.

Every day, customers use Google Cloud Platform to execute business-critical big data processing workloads, including financial fraud detection, genomics analysis, inventory management, click-stream analysis, A/B user interaction testing, and cloud-scale ETL.

Today we are removing our “beta” label and making Cloud Dataflow generally available. Cloud Dataflow is specifically designed to remove the complexity of developing separate systems for batch and streaming data sources by providing a unified programming model. Based on more than a decade of Google innovation, including MapReduce, FlumeJava, and MillWheel, Cloud Dataflow is built to free you from the operational overhead related to large-scale cluster management and optimization.

Cloud Dataflow provides a unified computation model for batch and streaming processing

With Cloud Dataflow GA you get:

A fully managed, fault-tolerant, highly available, SLA-backed service for batch and stream processing.



"We are utilizing Cloud Dataflow to overcome elasticity challenges with our current Hadoop cluster. Starting with some basic ETL workflow for BigQuery ingestion, we transitioned into full blown clickstream processing and analysis. This has helped us significantly improve performance of our overall system and reduce cost." Sudhir Hasbe, Director of Software Engineering, Zulily.com

“The current iteration of Qubit’s real-time data supply chain was heavily inspired by the ground-breaking stream processing concepts described in Google’s MillWheel paper. Today we are happy to come full circle and build streaming pipelines on top of Cloud Dataflow - which has delivered on the promise of a highly-available and fault-tolerant data processing system with an incredibly powerful and expressive API.” Jibran Saithi, Lead Architect, Qubit

A comprehensive model for balancing correctness, latency, and cost when dealing with unordered data at massive scale. These concepts power key elements of the Cloud Dataflow programming model.













"Streaming Google Cloud Dataflow perfectly fits requirements of time series analytics platform at Wix.com, in particular, its scalability, low latency data processing and fault-tolerant computing. Wide range of data collection transformations and grouping operations allow to implement complex stream data processing algorithms." Gregory Bondar, Ph.D., Sr. Director of Data Services Platform, Wix.com

Great performance. Cloud Dataflow is 2-3x faster and cheaper than Hadoop when evaluating classic MapReduce-based pipelines, such as PageRank and WordCount. And with dynamic work rebalancing, Cloud Dataflow optimizes resource utilization, which provides additional performance gains without requiring manual intervention.



"We're excited to collaborate with Google Cloud Platform on integrations with Salesforce Wave. The integrations with Google Cloud Dataflow further enable Wave to deliver insights to business users. Businesses can now use vast, diverse datasets like machine-generated data to derive customer insights in near-real-time." Olivier Pin, VP of Product Management, Wave Analytics, Salesforce.com



"Tamr and Google Cloud Dataflow are simplifying how people access and use crucial data and distributed computing assets in the enterprise. The combination of Cloud Dataflow and Tamr running on Google Cloud Platform enables organizations to connect and enrich their enterprise data at internet scale." Andy Palmer, co-founder and CEO of Tamr, Inc.

Cloud Dataflow seamlessly integrates with Google Cloud Platform, third-party services, and data stores

Native Google Cloud Platform integration for Cloud Storage, Cloud Datastore, BigQuery, and Cloud Pub/Sub. You now get full query support for our BigQuery source. Our integration with Cloud Pub/Sub now provides source timestamp processing in addition to arrival time processing. Source timestamps, when combined with flexible Windowing and Triggering primitives, enable developers to produce more accurate windows of data output.
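To see why source timestamps matter for window accuracy, here is a minimal sketch in plain Python. It is not the Cloud Dataflow SDK: the `Message` type, the `window_by` helper, and the sample data are all hypothetical, and triggering is left out entirely. It only shows how the same late-arriving message lands in a different fixed window depending on which timestamp you group by.

```python
from collections import defaultdict
from typing import NamedTuple


class Message(NamedTuple):
    # Hypothetical message shape, for illustration only.
    payload: str
    source_ts: int   # seconds; when the event happened at the source
    arrival_ts: int  # seconds; when the message reached the pipeline


def fixed_window_start(ts: int, size: int) -> int:
    """Start of the fixed window of `size` seconds that contains `ts`."""
    return ts - (ts % size)


def window_by(messages, size, use_source_ts):
    """Group payloads into fixed windows keyed by window start time."""
    windows = defaultdict(list)
    for m in messages:
        ts = m.source_ts if use_source_ts else m.arrival_ts
        windows[fixed_window_start(ts, size)].append(m.payload)
    return dict(windows)


if __name__ == "__main__":
    # Made-up data: "b" happened at t=59 but was delayed until t=65.
    msgs = [
        Message("a", source_ts=10, arrival_ts=11),
        Message("c", source_ts=61, arrival_ts=62),
        Message("b", source_ts=59, arrival_ts=65),
    ]
    print(window_by(msgs, 60, use_source_ts=False))  # grouped by arrival time
    print(window_by(msgs, 60, use_source_ts=True))   # grouped by source time
```

Grouped by arrival time, the delayed message "b" is counted in the second minute; grouped by source timestamp, it is counted in the first minute, where the event actually occurred.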



"We are very excited about the productivity benefits offered by Cloud Dataflow and Cloud Pub/Sub. It took half a day to rewrite something that had previously taken over six months to build using Spark" Paul Clarke, Director of Technology, Ocado

A decade of internal innovation also stands behind today’s general availability of Google Cloud Pub/Sub. Delivering over a trillion messages for our alpha and beta customers has helped tune our performance, refine our v1 API, and ensure a stable foundation for Cloud Dataflow’s streaming ingestion, Cloud Logging’s streaming export, Gmail’s Push API, and Cloud Platform customers streaming their own production workloads, at rates up to 1 million message operations per second.

Such diverse scenarios demonstrate how Cloud Pub/Sub is designed to deliver real-time and reliable messaging — in one global, managed service that helps you create simpler, more robust, and more flexible applications.

Cloud Pub/Sub connects your services to each other, to other Google APIs, and third parties.

Cloud Pub/Sub can help integrate applications and services reliably, as well as analyze big data streams in real time. Traditional approaches require separate queueing, notification, and logging systems, each with its own APIs and tradeoffs between durability, availability, and scalability. Cloud Pub/Sub addresses a broad range of scenarios with a single API and a managed service that eliminates those tradeoffs, and it remains cost-effective as you grow, with pricing as low as 5¢ per million message operations for sustained usage.
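As a rough back-of-the-envelope check on that figure, the sketch below computes a lower-bound cost in Python. It assumes every operation is billed at the lowest quoted rate of 5¢ per million; the actual pricing tiers are not given in this post, so treat the result as a floor, not a quote. The function name and the sample workload are hypothetical.

```python
from decimal import Decimal

# Lowest rate quoted in this post: 5 cents per million message operations.
FLOOR_RATE_USD_PER_MILLION = Decimal("0.05")


def floor_cost_usd(message_operations: int) -> Decimal:
    """Lower-bound cost, assuming every operation is billed at the floor rate."""
    millions = Decimal(message_operations) / Decimal(1_000_000)
    return (millions * FLOOR_RATE_USD_PER_MILLION).quantize(Decimal("0.01"))


if __name__ == "__main__":
    # Hypothetical workload: 2 billion message operations.
    print(floor_cost_usd(2_000_000_000))
```

Under that assumption, 2 billion message operations would cost no less than $100.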

General availability is a key milestone, though hardly the end of the road. We are continuing to innovate with the alpha release of the gcloud pubsub tool and today’s beta release of our new Identity and Access Management (IAM) APIs and Permissions Editor in the Google Developers Console. These improvements allow users to control access down to the level of particular operations on specific topics and subscriptions. IAM ACLs make it easier to connect multiple Cloud Platform projects, either within the same organization or to third-party services.

Get Started

We’re looking forward to this next step for Google Cloud Platform as we continue to help developers and businesses everywhere benefit from Google’s technical and operational expertise in big data. Please visit Cloud Dataflow and Cloud Pub/Sub to learn more and contact us with your feedback, ideas for new connectors, or even new public data feeds we can help you share.

- Posted by Eric Schmidt (not that Eric), PM Cloud Dataflow & Rohit Khare, PM Cloud Pub/Sub