Yahoo! has open-sourced Pulsar, the publish-subscribe messaging platform that several of its services use in production.

According to Yahoo!, Pulsar is a low-latency pub/sub messaging system that scales horizontally across multiple hosts and datacenters. Yahoo! has been running Pulsar in production for Mail, Finance, Gemini Ads, Sherpa, and Sports since Q2 2015. By open-sourcing it, the company hopes Pulsar will see wide adoption and be integrated with other open source products. Yahoo! has deployed Pulsar in more than ten datacenters, where it handles over 100 billion messages per day across 1.4 million topics, with an average publish latency of less than 5 ms. Pulsar provides guaranteed message delivery with two persisted copies, automatic cursor management for message readers, and cross-datacenter replication.
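To put those deployment figures in perspective, the short Java sketch below converts them into average rates. It is only back-of-the-envelope arithmetic: both inputs are quoted as lower bounds ("over"), the class and method names are invented for this illustration, and real traffic is unlikely to be spread evenly across topics or across the day.

```java
public class PulsarScale {
    // Figures quoted by Yahoo!; both are lower bounds, so results are rough averages.
    static final long MESSAGES_PER_DAY = 100_000_000_000L; // "over 100 billion"
    static final long TOPICS = 1_400_000L;                 // "1.4 million topics"
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    /** Average publish rate across the whole deployment, in messages per second. */
    static double averageMessagesPerSecond() {
        return (double) MESSAGES_PER_DAY / SECONDS_PER_DAY;
    }

    /** Average number of messages a single topic receives per day. */
    static double averageMessagesPerTopicPerDay() {
        return (double) MESSAGES_PER_DAY / TOPICS;
    }

    public static void main(String[] args) {
        System.out.printf("Average msgs/sec overall: %.0f%n", averageMessagesPerSecond());
        System.out.printf("Average msgs/day per topic: %.0f%n", averageMessagesPerTopicPerDay());
    }
}
```

Under these assumptions the deployment averages a little over 1.1 million messages per second, while a typical topic sees fewer than one message per second. This suggests the load comes from a very large number of mostly quiet topics rather than a handful of busy ones.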

Pulsar can be set up to provide messaging as a service on one or more clusters, and it can be managed through an API that covers adding and removing users, adding computing and storage capacity, accounting, monitoring, and so on. Clients, both producers and consumers, are set up as tenants and access the service through a Java library that takes care of service discovery, message delivery, and other related tasks.

Pulsar uses topics as the intermediary between message producers and consumers. Producers publish messages to topics either synchronously or asynchronously. Messages can be batched and compressed (LZ4, ZLIB). Clients consume those messages through subscriptions, which can be exclusive, shared (round-robin), or failover.
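The difference between the three subscription types is easiest to see in a toy model. The Java sketch below is not the Pulsar client API; `SubscriptionModes`, `Subscription`, `attach`, `detach`, and `dispatch` are hypothetical names made up for this illustration. It assumes that an exclusive subscription accepts a single consumer, a shared subscription hands messages to its consumers in round-robin order, and a failover subscription delivers everything to one active consumer until that consumer goes away.

```java
import java.util.ArrayList;
import java.util.List;

public class SubscriptionModes {
    enum Mode { EXCLUSIVE, SHARED, FAILOVER }

    /** Toy subscription: decides which attached consumer receives each message. */
    static class Subscription {
        private final Mode mode;
        private final List<String> consumers = new ArrayList<>();
        private int next = 0; // round-robin position, used only in SHARED mode

        Subscription(Mode mode) {
            this.mode = mode;
        }

        void attach(String consumer) {
            // An exclusive subscription rejects a second consumer.
            if (mode == Mode.EXCLUSIVE && !consumers.isEmpty()) {
                throw new IllegalStateException("exclusive subscription already has a consumer");
            }
            consumers.add(consumer);
        }

        void detach(String consumer) {
            consumers.remove(consumer);
            next = 0; // restart the rotation after membership changes
        }

        /** Returns the name of the consumer that receives the next message. */
        String dispatch() {
            if (consumers.isEmpty()) {
                throw new IllegalStateException("no consumers attached");
            }
            if (mode == Mode.SHARED) {
                String target = consumers.get(next % consumers.size());
                next++;
                return target;
            }
            // EXCLUSIVE and FAILOVER: the first attached consumer receives everything;
            // in FAILOVER mode the next one takes over once the first detaches.
            return consumers.get(0);
        }
    }

    public static void main(String[] args) {
        Subscription shared = new Subscription(Mode.SHARED);
        shared.attach("c1");
        shared.attach("c2");
        System.out.println("shared:   " + shared.dispatch() + ", " + shared.dispatch() + ", " + shared.dispatch());

        Subscription failover = new Subscription(Mode.FAILOVER);
        failover.attach("c1");
        failover.attach("c2");
        System.out.println("failover: " + failover.dispatch() + ", " + failover.dispatch());
        failover.detach("c1"); // simulate the active consumer disconnecting
        System.out.println("failover after c1 leaves: " + failover.dispatch());
    }
}
```

In this model, shared mode trades per-consumer ordering for throughput, while exclusive and failover modes keep a single reader so that messages are processed in order.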

To provide guaranteed delivery, Pulsar persists messages to durable storage via Apache BookKeeper ledgers. Reads and writes are directed to separate physical disks to keep publish latency as low as possible. Yahoo! stated that, with an SSD as the bookie journal device, Pulsar can achieve “99 percentile latencies of 5ms with two guaranteed copies and total ordering.”
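For readers unfamiliar with percentile latencies, the following Java sketch shows one common way to compute them, the nearest-rank method. It is an illustration only: the sample data is synthetic, the class and method names are invented here, and nothing in it reflects how Yahoo! actually measures Pulsar.

```java
import java.util.Arrays;

public class LatencyPercentile {
    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * p percent of all samples are less than or equal to it.
     */
    static double percentile(double[] samples, double p) {
        if (samples.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        double[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Synthetic data: 995 fast publishes at 2 ms and 5 slow ones at 40 ms.
        double[] latenciesMs = new double[1000];
        Arrays.fill(latenciesMs, 2.0);
        for (int i = 995; i < 1000; i++) {
            latenciesMs[i] = 40.0;
        }
        System.out.println("p99   = " + percentile(latenciesMs, 99.0) + " ms");
        System.out.println("p99.9 = " + percentile(latenciesMs, 99.9) + " ms");
    }
}
```

With this synthetic data the 99th percentile stays at the fast value while the 99.9th percentile lands on a slow outlier, which is why a latency target at the 99.9th percentile is considerably harder to meet than the same target at the 99th.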

Going forward, Yahoo! plans to support non-persistent messaging, to cut topic migration time between message brokers from the current 10 seconds to under 1 second, to keep publish latency under 5 ms at the 99.9th percentile (up from the 99th), and to support client languages other than Java.