Search & Experimental Design

While gathering requirements for this new system, it became evident that not all stream processing is created equal. Some streaming jobs were simple transformations that put messages back onto a stream; others were complex, memory- and CPU-intensive processes designed for the complex event processing use case. These are cases where data may arrive out of order within a time window, or where certain events may trigger new stream processing pipelines to spawn. For the latter case, tools like Spark, Heron, and Flink seemed like a no-brainer. For the simple case, though, it was questionable whether to adopt a complex topology with distributed state just to perform small computations on streams where ordering does not matter. I decided to narrow down my list and research tools that would enable a simple stream processing topology for these cases.

Outside of some managed offerings, Apache Pulsar (the distributed pub-sub and queuing system) with Pulsar Functions offered the simplest topology. Additional benefits of Pulsar were its ease of operation on Kubernetes and its flexibility in storing data long term via the tiered storage API. Pulsar Functions are lightweight functions that run on the Pulsar nodes themselves. They consume Pulsar topics and execute predefined logic on each message or batch of pub/sub messages. They are ideologically similar to AWS Lambda + Kinesis; however, the functions share a resource pool with the Pulsar nodes. An additional benefit of this setup is reduced network latency, since the data is streamed and processed on the same hardware. My only hesitancy at the time was around the scalability of Pulsar Functions. In my tests, I proved Pulsar could handle the required message volume, but Pulsar Functions was still in beta, and it was unclear how processing data on Pulsar nodes would affect the system as a whole, and whether I would run into backpressure or CPU constraints.
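To give a sense of how lightweight these functions are, here is a minimal sketch of a "native" Python Pulsar Function: just a plain function that Pulsar invokes once per consumed message, publishing the return value to the configured output topic. The uppercase transformation is a hypothetical example, not anything from the system described here:

```python
# A "native" Python Pulsar Function: no SDK import required.
# Pulsar calls process() for every message consumed from the input
# topic and publishes the return value to the output topic.
def process(input):
    # Hypothetical transformation: normalize each message to uppercase.
    return input.upper()
```

Pulsar's SDK variant (subclassing `pulsar.Function`) additionally exposes a context object for logging, state, and publishing to arbitrary topics, but for simple per-message transformations the native form above is all that is needed.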