In the open source space, especially in the field of “big data,” distributed systems are being driven more and more by data streaming message brokers such as Apache Kafka, which bring into the enterprise rivers of data larger than any single machine’s storage capacity.

In so doing, these architectures introduce a new and valuable characteristic to their streaming data: persistence, essentially by tripping operating systems into utilizing large pools of fast memory for their disk caches.

Now, the Apcera NATS messaging platform has been extended to handle such torrents of live data, as well as to provide the needed persistence.

Until this release, “the thing that NATS lacks, for all of its performance and simplicity and ease of use, is persistence,” admitted Larry McQueary, Apcera NATS product manager. “If you look at what we learned in enterprise messaging, one thing is that you definitely need to persist data in order to do transactional types of processing, but also to maintain the history of events for recapturing and re-creating states of applications for troubleshooting, as well as recovery purposes, or perhaps for analytics purposes — which, as we see, is very prevalent today with big data, fast data, and event processing.”

McQueary says Apcera decided that, for NATS to join this movement, it did not need to reinvent the wheel or create another new, complex system. Rather, company engineers aimed to facilitate replays of streams of events in a variety of manners, while maintaining NATS’ native pub/sub model.

The NATS Streaming is the culmination of their work. Introduced in early July, it is a NATS client that overlays the basic NATS client. From there, it overlays NATS transport with its own protocol, adding streaming and replay capabilities in the process. Messages continue to use text-based primitives with binary payloads, though the streaming supplements end up being more structured than NATS’ typically “freeform” payload, as McQueary described it:

“We’ve overlaid a simple, but structured, protobuf message on top of the basic NATS client protocol. Basic NATS is an on/off stream of real-time messages using publish/subscribe. Clients publish a blob of data, and that blob might be text, part of a file, or some sort of status information, using a very simple publish API mechanism. It’s published to the NATS server — the Go-based NATS daemon, the message router within NATS. It can be clustered, and made highly available and fault-tolerant. But the stream of messages that’s flowing through is all very real-time. We call that a ‘fire-and-forget’ mode; some folks call it ‘spray-and-pray.’”

Apcera and NATS

Before Apcera founder Derek Collison built the container staging platform around which he built his company, he designed and constructed messaging systems for interconnected, distributed systems. Metaphorically speaking, Collison invented much of the glue that holds cloud platforms together — most memorably, Cloud Foundry. In a very real sense, Apcera’s CEO is responsible for inventing a good portion of the infrastructure that supports what we call “the new stack” in modern data centers.

Earlier, for TIBCO, Collison created a middleware platform called Rendezvous —a way for developers to build event-driven applications using common languages, using a simplified publish/subscribe (“pub/sub”) model. It’s a loose, but adequately reliable, connectivity, where a message is broadcast, and the various subscribers may expect to receive event messages at some point down the road. Originally, he used UDP as the system’s transport protocol for multicast messages, sent from one host to a series of recipients — triggering a staggeringly intricate process of routing and re-routing for all those recipients (or at least as many as possible) to get their messages.

For NATS to join this movement, Apcera decided it did not need to reinvent the wheel or create another new, complex system. Rather, company engineers aimed to facilitate replays of streams of events in a variety of manners, while maintaining NATS’ native pub/sub model.

Then it occurred to him, as more networks began fast-tracking packets with the highest traffic levels, they were prioritizing TCP packets over UDP datagrams, simply because TCP was the backbone of the Web. It ended up making more sense to send messages by brute force — for example, to repeat 30 messages to individual recipients over TCP, than to send one multicast message to the same group of 30 over UDP.

All that work culminated in NATS, a message broker for distributed workloads to communicate data between each other. Originally written in Ruby, Collison later rewrote the messaging core in Go, to speed performance.

Offsets and Durables

Typically a subscriber must be connected to NATS at the time messages are published. Clients filter through only those message streams to whose topics they subscribe. This is unlike a message queue model, such as RabbitMQ, where the publisher posts messages in a sequence and clients retrieve them whenever they get around to it, in the same sequence. As McQueary points out, this makes NATS better suited to tasks such as signaling and real-time event processing (RTEP).

But adding persistence gives NATS one of the benefits that RabbitMQ and its counterparts originally counted as advantages: for instance, being able to pull up histories, looking back in time, as a subscription option. Like Apache Kafka, NATS Streaming introduces the concept of an offset, which is a simple pointer that can either initiate the lookback or can hold one’s last-viewed position for later use — this latter use case, with the help of a feature called a durable.

“In most messaging systems, you’ve got the ability to do anything you want in your own code. Your client can keep track of lots of different things. That can sometimes be cumbersome, though, when what you’re trying to keep track of is something that you have no control over, like a data stream,” McQueary explained.

“So rather than force a client to keep track of the last message that it has received or acknowledged, durables cause the NATS Streaming server to keep track of whatever the latest acknowledged message was in the data stream. In that way, if a client happens to become disconnected for some period of time — and perhaps that’s an extended outage, or a network outage over a WAN — when the client reconnects, it really needs to pick up where it left off immediately, without having to keep track of that information itself. Durables allow the client to subscribe to that topic, and using the same durable ID that it used the last time it subscribed, the server will begin sending messages to that client, starting at the last sequence number after the last acknowledgment.”

Originally, NATS did not recognize acknowledgments (or what digital communications folks call “ACKs,” with one syllable). Although TCP/IP does use ACK in its handshaking sequence to establish an interface connection, HTTP — the Web protocol layered atop TCP/IP — is a stateless protocol that doesn’t rely on acknowledgments from Web browsers. It assumes TCP has taken care of that underlying connection. As a result, it’s hard to build an acknowledgment system atop a protocol that’s essentially leveraging HTTP.

NATS Streaming adds the acknowledgment protocol that NATS had been missing, at the messaging service layer. This gives NATS something else it didn’t have before: delivery guarantees. Whereas NATS by itself was only capable of what Derek Collison conceded to be “at-most-once” delivery (in other words, certainly not duplicated), NATS Streaming now enables “at-least-once” delivery.

“For any given message on a subject, and for any eligible subscriber to that subject,” described Apcera’s McQueary, “each message will be transmitted to that client at least once, and in some cases twice. But because we mark those as duplicates, most applications will be happier with that, in the sense that they’re able to receive all messages and ignore duplicates, yet never miss a message on a subscription.”

With What to Bundle and How

For now, Apcera’s plan is to refrain from embedding NATS Streaming into the principal NATS server. “We really want to keep NATS as simple as possible,” McQueary said, “and fight the urge to add every feature in the book to it, and make it part of the core platform.” He cited Apache ActiveMQ (after paying due respects to friends of his who work on that project) as an example of a project that began with a noble purpose, but that in his view took on too many goals, and as a result became overloaded with features.

For that reason, he explained, NATS Streaming will not be bundled or integrated with NATS, although NATS Streaming is now being shipped with an embedded NATS server that loads directly into the Streaming process.

McQueary sees the optimum use case candidates for NATS Streaming as “any application that needs to process events that are potentially happening in real-time, or being replayed from a data source, to reconstruct the state of a particular series of events or do things like time-series analysis,” and, “anything that needs to do control plane type processing” — the latter being a reference to Netflix microservices architecture, which uses Kafka.

Today, Kafka is referred to almost synonymously with real-time message brokering. Whether NATS Streaming makes a dent in this field and gets mentioned in the same sentences as Kafka, may depend in large part upon the same factor which Apcera’s containerization platform leverages in conversations that involve Docker: the quality of its own craftsmanship.

Apcera and the Cloud Foundry Foundation are sponsors of The New Stack

Title image: A kayaker traveling the St. Francis River in Missouri by Kbh3rd, licensed under Creative Commons.