TCP/IP over Amazon CloudWatch Logs

Running network services inside AWS Lambda Functions

You can’t ping an AWS Lambda function. You can’t SSH into a running function without reverse tunneling. You would never run nginx or Rails inside of a function because there’s little point: function-as-a-service (FaaS) platforms don’t let a function accept inbound TCP connections on a port the way a Docker container can. It is possible, however, to write your own userspace networking stack and do what you want with it, including using Amazon CloudWatch Logs (or even AWS Lambda tags) as the data link layer.

This post is about implementing TCP/IP over Amazon CloudWatch Logs using Go, which enables you to access network services running inside of AWS Lambda functions.

It’s slow and not very useful, but it was a fun way to learn more about Linux networking and to use AWS services in a way that might horrify some AWS engineering teams. I’m calling this proof-of-concept “Richard Linklayer”, and I published the full code on GitHub today.

A Serverless Architecture Darkly

This experiment was the escalation of:

Trying to run progressively stranger things in AWS Lambda.

Interest in exploring netstack, an IPv4/IPv6 userland networking stack from Google.

Hearing contradictory and confusing things from some clients and engineering teams around different limitations of serverless functions.

I attempted to check off all three of those things using Go. With netstack, it’s possible to do some exotic things with networking in unexpected places, no kernel hacking (i.e. writing C code) required. Any Go program with read and write access to a reliable-ish bi-directional communication channel (like logs, tags or carrier pigeon) can implement full in-process TCP/IP networking. Many AWS services meet these criteria.
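To make the "any bi-directional channel will do" idea concrete, here is a minimal sketch of what such a channel needs to offer. The `Link` interface, `chanLink` type and `NewPipe` function are hypothetical names invented for illustration, not netstack's or Richard Linklayer's actual API; the in-memory Go channels stand in for a log stream, a tag or a pigeon.

```go
package main

import (
	"errors"
	"fmt"
)

// Link is a hypothetical minimal contract for a data link layer:
// anything that can send and receive whole frames.
type Link interface {
	WriteFrame(frame []byte) error
	ReadFrame() ([]byte, error)
}

// chanLink is an in-memory stand-in for a real transport.
type chanLink struct {
	out chan<- []byte
	in  <-chan []byte
}

var errClosed = errors.New("link closed")

// WriteFrame copies the frame so the caller can reuse its buffer.
func (l *chanLink) WriteFrame(frame []byte) error {
	l.out <- append([]byte(nil), frame...)
	return nil
}

// ReadFrame blocks until the other end has written a frame.
func (l *chanLink) ReadFrame() ([]byte, error) {
	frame, ok := <-l.in
	if !ok {
		return nil, errClosed
	}
	return frame, nil
}

// NewPipe returns two connected ends of an in-memory link.
func NewPipe() (Link, Link) {
	ab := make(chan []byte, 16)
	ba := make(chan []byte, 16)
	return &chanLink{out: ab, in: ba}, &chanLink{out: ba, in: ab}
}

func main() {
	a, b := NewPipe()
	if err := a.WriteFrame([]byte("hello")); err != nil {
		panic(err)
	}
	frame, err := b.ReadFrame()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", frame)
}
```

Swapping the Go channels for calls to a logging or tagging API would, in principle, give the same shape with far worse latency.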

Network tunneling is not a new idea, but inside the AWS Lambda execution environment a userspace solution is needed because it’s not possible to change routing tables, modify network interfaces, or change OS-level networking (unlike container-based network overlays or service meshes).

Running in a standard Go process, Richard Linklayer tunnels IP packets over Amazon CloudWatch log streams that follow a special naming convention: the stream and log group names are just MAC addresses. Using a tun or tap interface, I can bridge my AWS Lambda network endpoints to my local development machine:

High-level architecture of Richard Linklayer where a Linux host communicates with a TCP network service running inside of an AWS Lambda function. The process polls Amazon CloudWatch for new inbound packets.

This design ignores the normal event-driven AWS Lambda patterns for processing inbound packets and sending outbound packets. Instead, a single instance of a function starts, polls CloudWatch to read and write packets, and keeps processing them until it times out. Only one instance of a function hosting a given network service runs at any time, and it can only accept network connections for up to 15 minutes, the maximum timeout of an AWS Lambda function in early 2019.

Lifecycle of an AWS Lambda function that accepts inbound TCP requests via CloudWatch. Don’t do this.

When you put it all together, the result is that you can ping a Lambda function.

Pinging an AWS Lambda function on Mac OS X via a Linux container. It’s kind of slow, but remember that everything is running over Amazon CloudWatch.

The underlying implementation over Amazon CloudWatch works at layer 2, so it reads and writes Ethernet frames addressed to MAC addresses. MAC address discovery happens using ARP with a special-purpose log stream that acts as a “broadcast address” for the entire network.
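For readers less familiar with layer 2, here is a small sketch of the decision described above: parse the 14-byte Ethernet header and check whether the frame is addressed to the broadcast MAC, in which case it would go to the shared stream. The header layout and EtherType values are standard Ethernet; `parseHeader` and `isBroadcast` are hypothetical helpers, not the project's functions.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"net"
)

// Standard EtherType values.
const (
	etherTypeIPv4 = 0x0800
	etherTypeARP  = 0x0806
)

// The Ethernet broadcast address, used by ARP requests.
var broadcast = net.HardwareAddr{0xff, 0xff, 0xff, 0xff, 0xff, 0xff}

// header holds the fields of an Ethernet II header.
type header struct {
	Dst, Src  net.HardwareAddr
	EtherType uint16
}

// parseHeader reads destination, source and EtherType from the first 14 bytes.
func parseHeader(frame []byte) (header, error) {
	if len(frame) < 14 {
		return header{}, errors.New("frame shorter than an Ethernet header")
	}
	return header{
		Dst:       net.HardwareAddr(frame[0:6]),
		Src:       net.HardwareAddr(frame[6:12]),
		EtherType: binary.BigEndian.Uint16(frame[12:14]),
	}, nil
}

// isBroadcast reports whether the frame is addressed to every host,
// i.e. whether it belongs on the shared "broadcast" stream.
func isBroadcast(h header) bool {
	return bytes.Equal(h.Dst, broadcast)
}

func main() {
	// A hand-built header for an ARP request from a made-up source MAC.
	frame := []byte{
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
		0x42, 0x42, 0x42, 0x42, 0x42, 0x42,
		0x08, 0x06,
	}
	h, err := parseHeader(frame)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s -> %s type=%04x broadcast=%v\n", h.Src, h.Dst, h.EtherType, isBroadcast(h))
}
```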

Since it supports anything that runs over TCP, you can also curl a Node.js HTTP server running inside of a function, or anything else that listens on a TCP port. This server returns some text after a few seconds.

Making an HTTP request to a Node.js server running inside of an AWS Lambda function.

The latency means that this would likely never work for anything production-level, but TCP is resilient enough to handle the slow connection.

We have full transparency into the network traffic since it’s just data inside of log streams. Here is the stream of IPv4 traffic from 74:74:74:74:74 to 42:42:42:42:42:

Layer 2 Observability? Traffic from one MAC address to another that uses CloudWatch as the channel.
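Log messages are text, so raw frames need some text-safe encoding before they can sit in a log stream. The sketch below uses base64 purely as an assumption for illustration; the project's actual message format may differ, and `encodeFrame` and `decodeFrame` are hypothetical helpers.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeFrame turns raw frame bytes into a text-safe log message.
// Base64 is an assumption here, not necessarily the real wire format.
func encodeFrame(frame []byte) string {
	return base64.StdEncoding.EncodeToString(frame)
}

// decodeFrame reverses encodeFrame.
func decodeFrame(message string) ([]byte, error) {
	return base64.StdEncoding.DecodeString(message)
}

func main() {
	frame := []byte{0x45, 0x00, 0x00, 0x54} // first bytes of a made-up IPv4 packet
	message := encodeFrame(frame)
	fmt.Println("log message:", message)

	decoded, err := decodeFrame(message)
	if err != nil {
		panic(err)
	}
	fmt.Printf("decoded: % x\n", decoded)
}
```

One side effect of a text encoding like this is that every packet stays readable with ordinary log tooling, which is where the "observability" comes from.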

An example of how to get this working with a simple network running an HTTP server is on GitHub. While not described in this post, there is also an additional link layer that uses AWS Lambda function tags as a transport and performs slightly better.

Dazed and Lambfused

There are a lot of interesting open-source networking projects in the container space right now: Envoy, Linkerd, and Cilium to name a few. With support for custom AWS Lambda runtimes released at re:Invent in 2018, I’m curious how emerging control or data planes are going to impact serverless functions beyond additional language support and monitoring and security solutions.

There are also a few interesting ideas to explore beyond this proof-of-concept, including:

UDP services (DNS over AWS Lambda, anyone?)

Any kind of performance optimizations (everything is based on polling APIs and could be tuned)

Exploration of other link layer transports in AWS (DynamoDB streams? SNS? SQS? S3?)

Pull requests and ideas welcome.