If you are a modern app developer planning some near-real-time processing of streaming data from your app — think tracking geolocation, collecting analytics, monitoring telemetry, etc. — one of your foundational decisions is choosing a cloud service capable of ingesting and streaming all that data. Several options exist, so which should you choose?

First, let’s take a look at the requirements.

Requirements for Data Streaming

Streaming data processing involves a storage layer and a processing layer. The storage layer needs to support strong consistency (at-least-once delivery), fast reads and writes, and "infinite" scalability, and it must allow very high throughput for handling truckloads of streaming data. The processing layer is responsible for consuming data from the storage layer, running analysis on that data, and overseeing the "processed" data by notifying the storage layer of handled data and removing it from the queue. Both layers need to be highly scalable, durable, and fault tolerant.

Data Streaming Services/Platforms

To satisfy the above requirements, several solutions have emerged in recent years to handle these types of data streaming workloads. These include managed AWS services like Amazon Kinesis Data Firehose and Amazon Kinesis Data Streams, as well as open-source streaming data platforms that can run on AWS, such as Apache Kafka, Apache Flume, and Apache Storm.

Each service or platform has its own specific use case, pros/cons, and pricing structure. However, there's one workhorse cloud service that has been around longer than all of these options AND satisfies ALL the requirements for successfully streaming data at the storage layer and the processing layer. It's an unsexy beast that lacks a modern sparkle but unfailingly delivers on its promise — cost-effective, "infinitely" scalable, secure, guaranteed at-least-once delivery, fault tolerant, and with no limits on the level of throughput. Enter Amazon Simple Queue Service.

Handle All Your Data Streaming Needs

Amazon Simple Queue Service (SQS) was introduced back in 2004 as a messaging queue service supporting programmatic sending of messages at any scale. It remains the same reliable service today and can handle most of the same common workloads as Amazon Kinesis and the other streaming data platforms, with less overhead, lower cost, and a simpler architecture when paired with AWS Lambda as an event source.

How it Works — Amazon SQS Producer & Consumer

When you create a queue, the queue becomes part of a distributed system spanning several Amazon SQS servers, and each message you send to the queue is redundantly stored across multiple SQS servers. The applications sending messages to the queue are called producers — any client using the AWS SDK in your language of choice. The applications retrieving the messages from the queue are called consumers. The consumer can also be any service or client using the AWS SDK to retrieve, process, and remove "handled" messages from the queue.
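The producer/consumer loop above can be sketched with the AWS SDK for Python (boto3). This is a minimal sketch, not a production implementation: the queue URL is a hypothetical placeholder, and the code assumes AWS credentials are already configured in your environment.

```python
import json

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/stream-queue"  # hypothetical queue URL

def build_message(payload: dict, sender: str) -> dict:
    """Package a payload as keyword arguments for SQS send_message."""
    return {
        "MessageBody": json.dumps(payload),  # raw text/JSON body
        "MessageAttributes": {
            "Sender": {"DataType": "String", "StringValue": sender},
        },
    }

def main():
    import boto3  # AWS SDK for Python; assumes configured credentials
    sqs = boto3.client("sqs")

    # Producer: send one geolocation reading to the queue.
    sqs.send_message(QueueUrl=QUEUE_URL,
                     **build_message({"lat": 47.61, "lon": -122.33}, "ios-app"))

    # Consumer: long-poll for up to 10 messages, process, then delete each one.
    resp = sqs.receive_message(QueueUrl=QUEUE_URL,
                               MaxNumberOfMessages=10, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        print(json.loads(msg["Body"]))  # stand-in for real processing
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])

if __name__ == "__main__":
    main()
```

Note that the consumer explicitly deletes each message after processing — that delete call is how you tell SQS a message has been "handled."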

A Queue’s Message — Persistent Data Storage

Amazon SQS as the storage layer

So, we have a queue that is distributed among several SQS servers, and we have messages that are stored in the queue, also redundantly. A message has two main components: message attributes and a message body. The message attributes are optional and can specify anything from identifying the sender to providing message details for later processing. The message body is the actual message in raw text or JSON format. Each message can be up to 256 KB in size, and a queue can store an unlimited number of messages. Your messages are delivered, stored, and waiting for processing for up to 14 days in a standard queue. All customers can make 1 million SQS requests for free each month, at $0.40 per million requests thereafter, making this service a simple, cost-effective data source and streaming data service!
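To see how that pricing plays out, here is a quick back-of-the-envelope cost estimate based on the free tier and per-request rate quoted above (a rough sketch — it ignores data transfer and any FIFO-queue pricing differences):

```python
FREE_REQUESTS = 1_000_000    # free tier: first 1M SQS requests per month
PRICE_PER_MILLION = 0.40     # USD per million requests thereafter (standard queue)

def monthly_sqs_cost(requests: int) -> float:
    """Estimated monthly SQS request cost in USD."""
    billable = max(0, requests - FREE_REQUESTS)
    return round(billable / 1_000_000 * PRICE_PER_MILLION, 2)

print(monthly_sqs_cost(500_000))     # 0.0  -- entirely inside the free tier
print(monthly_sqs_cost(11_000_000))  # 4.0  -- 10M billable requests at $0.40/M
```

Even at eleven million requests a month, the queue itself costs a few dollars — which is the cost argument for SQS over heavier streaming services.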

Amazon SQS as an Event Source to AWS Lambda

AWS Lambda service as the processing layer

We know all the benefits of streaming data to SQS as a storage layer, but what about the processor or consumer of the streaming data stored in a queue? As I hinted before, AWS Lambda is the processing layer of this streaming data architecture. As of mid-2018, an Amazon SQS queue is a first-class event-driven data source for AWS Lambda. How does the Lambda integration work?

The AWS Lambda service (not your Lambda function, but a Lambda-managed poller running on your behalf) continually polls your SQS queue for incoming messages. When messages arrive, it receives them and invokes your Lambda function, passing the messages as a parameter. Take a look at some of the Kinesis diagrams, and this SQS event-driven architecture (with Lambda as a consumer) looks almost identical to what is provided by Kinesis Streams or Firehose, but without the complexity of sharding, partitions, delivery systems, pipelines, etc.
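On the function side, the integration looks like this: Lambda hands your handler an event containing a batch of SQS records, and when the handler returns without raising, Lambda deletes those messages from the queue for you. A minimal sketch of such a handler (the payload fields are hypothetical):

```python
import json

def handler(event, context):
    """AWS Lambda handler invoked with a batch of SQS messages.

    The SQS event source deletes the batch from the queue automatically
    when this function returns without raising an exception.
    """
    processed = []
    for record in event["Records"]:           # one record per SQS message
        payload = json.loads(record["body"])  # body as sent by the producer
        processed.append(payload)
    return {"processed": len(processed)}

# A minimal event shaped like what the SQS event source delivers:
sample_event = {"Records": [{"messageId": "1", "body": json.dumps({"lat": 47.61})}]}
print(handler(sample_event, None))  # {'processed': 1}
```

Raising an exception instead of returning leaves the messages in the queue for redelivery, which is how you get at-least-once processing semantics.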

If you want to dive deeper into Amazon SQS as an event source, check out my DZone article here.

Go. Start. Streaming

I created a GitHub repo solution that contains a one-click Amazon CloudFormation template for creating the event-driven storage and processing backend layers. The CloudFormation stack creates an Amazon SQS queue, an event source mapping to Lambda, and a Lambda function. For the client (data stream producer), I created an iOS Swift mobile app that acts as a producer of streaming data by sending single or batched (up to 10) messages at a time, directly to the source SQS queue.
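The 10-message cap on batched sends comes from SQS itself: send_message_batch accepts at most 10 entries per call, so a producer streaming many readings has to chunk them. A minimal sketch of that chunking (queue URL and payload shape are hypothetical; assumes configured AWS credentials when actually sending):

```python
import json

def to_batch_entries(payloads):
    """Split payloads into send_message_batch entry lists, max 10 per call."""
    entries = [{"Id": str(i), "MessageBody": json.dumps(p)}
               for i, p in enumerate(payloads)]
    return [entries[i:i + 10] for i in range(0, len(entries), 10)]

def send_all(queue_url, payloads):
    import boto3  # AWS SDK for Python
    sqs = boto3.client("sqs")
    for batch in to_batch_entries(payloads):
        sqs.send_message_batch(QueueUrl=queue_url, Entries=batch)
```

Batching matters for cost as well as throughput: each batch call counts as one request against the pricing described earlier, regardless of how many of the 10 entry slots it fills.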

Get started here.

Closing Thoughts

SQS has minimal overhead, is cost-effective, scales infinitely, and has no limits on the level of throughput or the number of messages or queues. It's an always-on managed storage layer and processing layer service (with the help of AWS Lambda) that just works for nearly any application. Give it a try and let me know what you think!

Typical Use Cases for Streaming Data to SQS

Geolocation: Device-based GPS data collection

Analytics: User activity and device logs

Telemetry Logging: Connected device state, messages, and event logs