In this test, I have set a boundary square around the Manhattan area, using a custom function in Artillery to randomly select a latitude and longitude within these bounds. I chose this area due to the large number of Starbucks stores in New York City — I’m likely to get multiple results regardless of exactly where the test lands.
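The bounding-box selection can be sketched as an Artillery custom processor function. The bounds, variable names, and function name below are illustrative assumptions, not the actual values used in the test:

```javascript
// Artillery "processor" module sketch: picks a random point inside a
// bounding box around Manhattan before each request.
// Referenced from the test config with:  processor: "./functions.js"
// and in the scenario flow with:         beforeRequest: "setRandomLocation"
const BOUNDS = {
  minLat: 40.70, maxLat: 40.80,   // approximate Manhattan latitudes
  minLon: -74.02, maxLon: -73.93  // approximate Manhattan longitudes
};

function randomInRange(min, max) {
  return min + Math.random() * (max - min);
}

// Artillery calls this hook before each request; the variables set on
// context.vars can be referenced as {{ lat }} and {{ lon }} in the scenario.
function setRandomLocation(requestParams, context, ee, next) {
  context.vars.lat = randomInRange(BOUNDS.minLat, BOUNDS.maxLat);
  context.vars.lon = randomInRange(BOUNDS.minLon, BOUNDS.maxLon);
  return next();
}

module.exports = { setRandomLocation };
```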

Artillery load test results.

This test runs for 2 minutes (with 20 users a second), completing 2400 requests in total. During this time DynamoDB RCUs increase to 86 and latency drops to 7–10ms:

In this initial configuration with the code shown above and with no caching in place, we can easily serve 1.7 million API hits daily (at 20 requests a second evenly spread throughout a day). But what are the costs?

DynamoDB: requires under 100 RCUs, so less than $10 a month.

API Gateway: charged at $3.50 per million requests on the lowest tier, so 52.3 million monthly requests cost around $183.

Lambda: the average function execution was 382ms, so billing at 400ms with 1,024MB of memory, 52 million executions cost around $357.

The total estimated monthly cost to serve 52 million look-ups in this setup is around $550.

More locations, more writes

The Starbucks location list is relatively static and the activity on the database table is almost exclusively ‘reads’. Based on DynamoDB’s performance characteristics, the number of items in the table isn’t likely to affect the read performance, and the table could be substantially larger with no noticeable difference in query latency.

However, the NPM library uses the geohash as the partition key, so it’s not possible to update the location of an item already in the table (you must delete and recreate the item). This isn’t an issue for physical places like Starbucks stores, which rarely move, but if you have a list of objects that change location frequently, this probably isn’t the right library to use.

Global tables

If a large retail company like Starbucks used this service to find their nearest store globally, DynamoDB Global Tables would be a great option for spreading the load across multiple regions and reducing the latency of the lookup. In this case, if 50% of the searches occurred in Europe, replicating the table to this region effectively offloads 50% of the traffic from the main table in us-east-1.
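Adding a replica is a single API call. As a sketch, these are the parameters for DynamoDB’s `UpdateTable` action under Global Tables (version 2019.11.21); the table name is hypothetical, and with the AWS SDK for JavaScript you would pass them to `new AWS.DynamoDB().updateTable(params).promise()`:

```javascript
// Sketch: add a eu-west-1 replica to an existing table using
// DynamoDB Global Tables. The table name is an assumption.
const params = {
  TableName: 'StarbucksLocations',
  ReplicaUpdates: [
    { Create: { RegionName: 'eu-west-1' } } // replicate to Europe
  ]
};
```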

This approach may not be suitable for all business cases (for example, where lookups are coming from a single geographic region or where data may not be moved out of specific regions) but given the simplicity of using this feature, it’s another tool for achieving greater scale with no change to our code.

API Gateway caching

Our current approach has the user supplying their latitude and longitude coordinates, while our Lambda function computes the geohash (z-index) and finishes the lookup. This is great for our users because they don’t need to know anything about geohashing to send a request, but it’s bad for caching. Why?

Unless two users are standing in exactly the same spot, they will have slightly different coordinates, even though their positions compute to the same geohash. As a result, we currently can’t leverage request caching at the API level.

However, if we pull the hashing computation out of the Lambda and put it in the client-side library, a user’s request goes from POST /starbucks body={lat:45.345, lon:45.123} to GET /starbucks?geohash=876543. What’s the difference?

Users at many different coordinates ‘collapse’, or ‘bucket’, into a single geohash index, so we can reuse their requests in a cache to reduce the load on our database. For example, if (45.345, 45.123) geohashes to 876543, it is very likely that (45.145, 45.323) geohashes to the same bucket. Both requests would then query the same bucket, and we could reuse the response from the first request to satisfy the second.
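To make the bucketing concrete, here is a minimal standard base32 geohash encoder. This is a sketch for illustration only: the NPM geo library computes its own numeric hash internally (hence indexes like 876543), but the collapsing behavior is the same, since truncating a geohash always yields the hash of the enclosing cell.

```javascript
// Minimal standard geohash encoder (base32 alphabet, interleaved
// longitude/latitude bisection). Shorter precision = bigger bucket.
const BASE32 = '0123456789bcdefghjkmnpqrstuvwxyz';

function geohashEncode(lat, lon, precision = 6) {
  let latRange = [-90, 90];
  let lonRange = [-180, 180];
  let hash = '';
  let bits = 0;
  let ch = 0;
  let evenBit = true; // alternate bits, starting with longitude
  while (hash.length < precision) {
    const range = evenBit ? lonRange : latRange;
    const value = evenBit ? lon : lat;
    const mid = (range[0] + range[1]) / 2;
    if (value >= mid) {
      ch = (ch << 1) | 1;
      range[0] = mid;
    } else {
      ch = ch << 1;
      range[1] = mid;
    }
    evenBit = !evenBit;
    if (++bits === 5) { // every 5 bits becomes one base32 character
      hash += BASE32[ch];
      bits = 0;
      ch = 0;
    }
  }
  return hash;
}
```

Because the encoding bisects the same ranges in the same order, nearby points share a common prefix, which is exactly what makes cached responses reusable across users.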

Because the opening of a new Starbucks is a relatively slow-moving process, we can set a long time-to-live (TTL) on the cache, up to API Gateway’s maximum of 3,600 seconds (one hour), so that repeated requests are served from the cache. When you enable caching, API Gateway caches all GET requests by default, and the owner of the API can further specify which querystring parameters determine whether a request “hits” or “misses” the cache.
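Caching is configured on the API’s stage. As a sketch, these are the patch operations you could pass to the API Gateway `updateStage` call (via the AWS SDK or CLI) to enable caching for all methods with the maximum TTL; the wildcard path applies the setting to every resource and method:

```javascript
// Sketch: stage-level method settings enabling the response cache.
// These would be passed as the patchOperations parameter of
// apigateway.updateStage() along with the restApiId and stageName.
const patchOperations = [
  { op: 'replace', path: '/*/*/caching/enabled', value: 'true' },
  { op: 'replace', path: '/*/*/caching/ttlInSeconds', value: '3600' }
];
```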

Is there a better (serverless) way?

Richard had an interesting point-of-view on this question: “Now that we have pulled most of the logic out of the Lambda function and pushed it to the client-side of our application, our Lambda function is a cold dead husk of its former self. The most respectful thing we can do now is to thank it for the joy (and caffeine) it once provided, hug it deeply, and promptly delete it!”

It’s an interesting idea because it gets to the heart of the tradeoffs in all of these decisions. With AWS Service Integration on our API Gateway resource, we can connect API Gateway directly to our DynamoDB table to query geohashes without the added complexity of managing a Lambda function.

The major drawback to this approach is that our Lambda function could have made several query calls to the location table and aggregated the responses into a single response. We will lose this capability with Service Integration. Richard concludes, “I would argue that if you’re getting more than 1MB worth of location data for Starbucks near you, the second megabyte is like the second page of a Google search — it’s there, but if you haven’t decided which link to click on by now, the second page doesn’t have the answers you seek.”
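As a sketch, the service integration’s request mapping template could translate the querystring straight into the body of a DynamoDB `Query` call. The table and attribute names below are assumptions for illustration; the geo library’s actual key schema may differ:

```json
{
  "TableName": "StarbucksLocations",
  "KeyConditionExpression": "hashKey = :hash",
  "ExpressionAttributeValues": {
    ":hash": { "N": "$input.params('geohash')" }
  }
}
```

With this template on the integration request (and a corresponding response template to strip DynamoDB’s attribute-type wrappers), API Gateway calls DynamoDB directly and no function code runs at all.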

Richard’s idea here dramatically reduces the cost by eliminating Lambda completely, but also improves latency and scale by creating a direct line between API Gateway and DynamoDB. It may not be suitable for all use cases but shows how some creative thinking provides other serverless solutions for this problem.

Want to learn more? Check out Richard’s cloud blog at https://rboyd.dev.