The Path Towards Macroservices

It seems like microservices are the new hotness these days. How can organizations move away from monolithic application architectures to build a more distributed and maintainable service architecture?

Here at Adwerx, we’ve been considering this question quite a bit lately. Like any other mature startup, our app has grown to epic proportions. Last year, we started taking steps towards breaking apart the app into separate services. With that effort, we had to start asking ourselves some questions:

What logically makes sense as a separate service? What criteria should we use to evaluate opportunities? How large (or small) should a microservice be? What are the easy wins? How can we rework our architecture without disrupting normal business operations and continue to build new features to meet business demands?

There are many ways to answer the first question. Some might argue that a separate service should follow the single responsibility principle as closely as possible. A service should do one thing and one thing only. One should be able to describe what that service does in just a few words without any “oh, and also” qualifiers.

One of my favorite ways to look at how complex an app should be came from one of my computer science instructors. He always said that a stand-alone app should only get so large and complex that a single developer can hold in her mind what that app does and be able to easily describe it to another developer. Once that single developer cannot grasp the full complexity of the app, you’ve crossed over into dangerous territory. One must also consider that as a system becomes more distributed, availability may improve but consistency may take a hit, so the trade-offs must be weighed carefully.

While we were working through these questions, one obvious opportunity presented itself. Our monolith was responsible for running not only our entire Adwerx product, but also all features related to our product marketing. These marketing features didn’t make sense as part of the main architecture. For one, the marketing team’s needs required a quick development turnaround time. Deploying the entire monolith required a lot of code review and QA overhead just to add these small (occasionally one-off) features that weren’t part of the concerns of the rest of the app. What’s more, the marketing infrastructure relied on a large number of Sidekiq jobs, a number of which performed database-intensive queries, and the volume of which was far greater than the pool could reasonably handle.

In addition to these pain points, this summer, we were gearing up to build out a large new marketing feature. This seemed like the perfect time to branch out and build a standalone service — the Marketing Automation app. The MVP, called Meteor, would first meet the new project’s requirements, and then, once proven stable, we would begin to move some of the existing features from the monolith to the new infrastructure.

Meteor seemed to meet all our criteria for a new microservice:

- It logically made sense as a separate service. This app had a single responsibility: providing tools for our product marketing team. Furthermore, the MVP required very little knowledge of the monolithic app’s data and inner workings.
- It seemed like an easy win. We estimated we could build the basic MVP in about four weeks, with ongoing work to build more features.
- Because this was a new feature request, it allowed us to satisfy business needs while ever-so-slightly moving away from the monolith. Once we had the base MVP in place, it was easy to incrementally move the old infrastructure over.
- The finished product is much more reliable and user-friendly than what we had previously. We could isolate performance issues and scale each app as needed. Our main app is no longer saddled with the same volume of workers and database queries, and we significantly increased developer velocity, deploying code more quickly (and confidently).

While we liked to think of Meteor as one of our first services, based on its scale and that it’s one of our only apps external to the monolith, I’ve lovingly dubbed it our first macroservice. It moved us closer to a service-oriented architecture and doesn’t require the maintenance that would be needed if we were to move towards many small microservices.

Okay, enough background. Now you know why we built Meteor. Let’s talk about how we did it.

The Meteor Macroservice

Meteor is a relatively simple app. It’s a Rails 4 app hosted on AWS. We used a bootstrap template for the design. Meteor has its own database, and maintains its own record of each marketing activity. The admin interface of Meteor allows our marketing team to manage the majority of their marketing efforts, and a host of Sidekiq workers and third-party API integrations allow us to execute those efforts. Building that infrastructure was a fun challenge, and there’s a lot I could say about the design patterns we used and architecture we settled on. One of the more interesting pieces is the Pub/Sub pattern we used to send messages from our monolith to the new Meteor app, which is what this post will cover.

Pub/Sub FTW

One of our marketing efforts is centered around keeping customers up-to-date when new products are available to them. One of our cornerstone products, the ads by zip code product, allows realtors to purchase ads targeted to specific zip codes. When a specific zip code is sold out, realtors can join a “waitlist” for that slot. When that slot opens up again, we’ll notify realtors on the waitlist so they can purchase that slot before it gets snatched up again.

In order to send these notifications to customers from the marketing app, we needed to immediately notify Meteor when an ad campaign in the monolith was completed. We needed a way to publish events from Adwerx, and have Meteor subscribe to those events and process them as necessary.

For publishing, we turned to the RabbitMQ message broker, which is written in Erlang and uses the Advanced Message Queuing Protocol. It’s an extremely robust tool that can handle a large number of concurrent messages. For a more detailed explanation of how RabbitMQ works, head on over to the tutorials section, which gives a great rundown. For now, I’ll keep the explanation pretty light.

We found that CloudAMQP met our needs for hosting, and the Bunny gem made integration with our Rails app (mostly) a breeze.

The first step in publishing messages is establishing a connection, and then opening a channel within the connection. A channel may assert the existence of or create many different types of exchanges, which work similarly to a mail room for these messages. Messages can be published to the exchange, and the exchange may route that message to any number of queues. Once the message is in the queue, a listener can connect to the queue and pop the message off and acknowledge that it’s been received.

The beauty of this model is that multiple queues can bind to the same exchange. A message only need be published once, and any queues bound to that exchange will receive its own copy of the message and can do with it what it will. Neither the publisher nor the subscribers need any knowledge of what anyone else is doing with the message, or whether another subscriber has successfully received and acknowledged the message.
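The flow described above can be sketched with a toy in-memory model (illustrative only; the real system uses Bunny against a RabbitMQ broker, and the class below is purely hypothetical). The key property is that a message published once to the exchange is copied to every bound queue:

```ruby
require "json"

# Toy stand-in for an exchange with bound queues.
class ToyExchange
  def initialize
    @queues = {}
  end

  # A queue binds to the exchange to receive copies of published messages.
  def bind(queue_name)
    @queues[queue_name] = []
  end

  # Publishing once delivers an independent copy to every bound queue.
  def publish(payload)
    @queues.each_value { |q| q << payload.dup }
  end

  # A subscriber pops messages off its own queue, independently of others.
  def pop(queue_name)
    @queues[queue_name].shift
  end
end

exchange = ToyExchange.new
exchange.bind("meteor_queue")
exchange.bind("reporting_queue")

exchange.publish({ event: "ended" }.to_json)

# Each queue received its own copy of the single published message.
puts exchange.pop("meteor_queue")     # {"event":"ended"}
puts exchange.pop("reporting_queue")  # {"event":"ended"}
```

Neither subscriber knows the other exists, which is exactly the decoupling the real broker gives us.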

Publishing Messages

Let’s look at a concrete example of some messages we’re publishing from our Adwerx app. Code may be edited for brevity. :)

We’ve defined a module that provides the publish method, as well as the channel_pool for our Bunny connection.

module AMQPConnectionModule
  def publish
    channel_pool.with do |channel|
      configure_channel(channel).publish(payload.to_json)
    end
  end

  def channel_pool
    @channel_pool ||= AMQPConnectionManager.channel_pool
  end
end

We have a number of different publishers in the app, which all include this module. In the case of our waitlist event, we publish what’s called a CampaignEvent. Our CampaignEventPublisher looks something like this:

class CampaignEventPublisher
  include AMQPConnectionModule

  attr_reader :campaign, :event

  EXCHANGE = :campaign_events

  def initialize(campaign, event)
    @campaign = campaign
    @event = "#{campaign.short_type}.#{event}"
  end

  def publish
    channel_pool.with do |channel|
      configure_channel(channel).publish(
        payload.to_json,
        persistent: true,
        routing_key: event
      )
    end
  end

  protected

  def configure_channel(channel)
    channel.topic(EXCHANGE, durable: true)
  end

  def payload
    Messaging::V1::CampaignEventSerializer.new(campaign).attributes
  end
end

A few things to note about this class:

- We’ve defined an exchange for this specific publisher. All messages published from this publisher will go into the durable campaign_events exchange. Any queues that want to receive those messages must bind to that specific exchange.
- We’ve defined the exchange as a topic exchange, as seen in the configure_channel method. Topic exchanges allow us to add some additional metadata (called a routing key) to the message, and allow subscribers to receive only messages with a specific routing key. When we publish the message, we add `routing_key: event` to the message.

RabbitMQ’s docs visualize topic exchanges in the following way:

[Image: RabbitMQ topic exchange diagram]

Where “X” is the exchange, and “Q1” and “Q2” are queues bound to different routing_keys.
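The matching rules behind that diagram can be sketched in a few lines. Per RabbitMQ’s documented semantics, routing keys are dot-separated words, `*` in a binding pattern matches exactly one word, and `#` matches zero or more. This is a simplified sketch of the rules, not RabbitMQ’s implementation (one known shortcut: a trailing `#` here still requires the preceding dot):

```ruby
# Simplified topic-exchange matching: does a binding pattern accept a routing key?
def topic_match?(pattern, routing_key)
  regex_body = pattern.split(".").map do |word|
    case word
    when "*" then "[^.]+"          # exactly one word
    when "#" then ".*"             # zero or more words (simplified)
    else Regexp.escape(word)       # literal word
    end
  end.join("\\.")
  !!(routing_key =~ /\A#{regex_body}\z/)
end

puts topic_match?("ended", "ended")          # true  (exact match)
puts topic_match?("*.ended", "zip.ended")    # true  (* matches one word)
puts topic_match?("#", "zip.ended")          # true  (# matches everything)
puts topic_match?("zip.*", "zip.ended.now")  # false (* won't span two words)
```

A queue bound with pattern `ended` receives only our campaign-ended events, while a queue bound with `#` would receive every message in the exchange.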

Then, when a campaign ends, we can publish an event by simply instantiating our publisher and calling the publish method:

CampaignEventPublisher.new(campaign, 'ended').publish

This publishes our message to the campaign_events exchange, with the routing_key of “ended.”

In addition to publishing these campaign ended events, we can also publish an event when a realtor subscribes to the waitlist. Cool. Now what?

Subscribing to Messages

We’ve got messages publishing to our RabbitMQ exchange, but now Meteor needs a way to receive those messages and process them.

We decided to lean on Sneakers to subscribe to our queues and process messages. We needed something that was durable and performant. Sneakers is “always on” and always listening for messages. For more background on how Sneakers is different from other solutions (like Sidekiq), check out Dotan’s great post, Why I built it.

Once your Sneakers configuration is all set up, creating a subscriber is a piece of cake. To process the messages we mentioned above, we need a way to get the campaign ended events. Here’s what the subscriber in the Meteor app looks like:

class AdwerxCampaignEndedSubscriber
  include Sneakers::Worker

  QUEUE_NAME = :adwerx_campaign_ended_subscriber

  from_queue QUEUE_NAME,
             durable: true,
             ack: true,
             prefetch: 100,
             exchange: :campaign_events,
             exchange_type: :topic,
             routing_key: 'ended',
             arguments: { :'x-dead-letter-exchange' => "#{QUEUE_NAME}-retry" }

  def work(message)
    response = JSON.parse(message, symbolize_names: true)
    Workers::AdwerxCampaigns::ProcessEndedEvent.
      perform_async(response)
    ack!
  end
end

When we first deployed the Meteor app, Upstart created three new queues in our RabbitMQ interface: adwerx_campaign_ended_subscriber, adwerx_campaign_ended_subscriber-retry, and adwerx_campaign_ended_subscriber-error. The error and retry queues are created based on our Maxretry handler configuration. Sneakers will receive messages and, in the event of an error, will retry the message until the maximum number of retries is reached. Once the max is reached, the message is placed in the error queue. More details on maxretry can be found in the Sneakers documentation. These queues, once created, are durable and remain bound to the exchange, so if messages come in during a deploy, they will remain in the queue until the app has restarted and is ready to process them.
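The retry/error flow can be illustrated with a toy simulation (illustrative only; in production this behavior comes from Sneakers’ Maxretry handler and RabbitMQ’s dead-letter exchanges, not application code like this). A message that keeps failing is retried up to a maximum, then parked in the error queue:

```ruby
MAX_RETRIES = 3

# Attempt to process a message, retrying on error; after MAX_RETRIES
# failed retries, park the message in the error queue instead of acking.
def process_with_retries(message, error_queue, &handler)
  attempts = 0
  begin
    attempts += 1
    handler.call(message)
    :acked
  rescue => e
    retry if attempts <= MAX_RETRIES
    error_queue << { message: message, error: e.message }
    :errored
  end
end

error_queue = []
result = process_with_retries("bad payload", error_queue) { |_msg| raise "boom" }

puts result            # :errored
puts error_queue.size  # 1
```

In the real setup, a human can inspect the error queue and requeue messages once the underlying bug is fixed, so no event is silently lost.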

We’ve also defined the exchange we want to bind our queues to, campaign_events , and the routing_key we’re interested in. In this case, we specifically want ended events, so that’s the topic we specified. You can also specify a wildcard topic if you want all events.

This subscriber is dead simple. All it does is receive messages from its queue, parse them, send them off to another worker (in this case, a Sidekiq worker), and acknowledge that the message was received. This all happens in the work method. Sneakers workers can also implement a work_with_params method that provides access to the message’s params and headers. Routing keys are contained within the headers, so if you want to route messages based on the topic, you can do so.

The Sidekiq worker is then responsible for processing the message as needed. In this case, because we’re looking specifically at ended campaigns, the worker will look to see if any customers are on the waitlist for that campaign’s zip codes, and notify them. Of course, in order to do so, Meteor also needs knowledge of who is on the waitlist. Similar to processing the campaign ended events, Meteor also subscribes to the waitlist messages and stores that information in its database. Finding waitlist users then becomes a simple query.

Some of our other Sneakers subscribers do a bit more heavy lifting before passing the work off to Sidekiq, but in most cases, we wanted to limit the responsibility of these subscribers. We found Sidekiq’s logging and error handling more satisfactory than what Sneakers provides, so we’re mainly relying on our Sneakers workers to grab the message from the exchange and route it to the correct Sidekiq worker. And, because multiple queues can bind to the same exchange, we can have multiple subscribers that each kick off a different worker with the messages they receive, either based on the routing key or other information contained within the message.
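The thin-subscriber pattern described above boils down to a small dispatch step: parse the message, then hand it to the right worker based on its routing key. Here is a hedged sketch of that idea; the handlers and event names are hypothetical stand-ins, not the actual Meteor classes:

```ruby
require "json"

# Hypothetical routing table: routing key => handler.
# In the real app, each handler would enqueue a specific Sidekiq worker.
ROUTES = {
  "ended"      => ->(payload) { "notify waitlist for campaign #{payload[:id]}" },
  "waitlisted" => ->(payload) { "store waitlist entry #{payload[:id]}" }
}.freeze

# Parse the raw message and dispatch based on routing key,
# ignoring events we haven't registered a handler for.
def dispatch(routing_key, raw_message)
  payload = JSON.parse(raw_message, symbolize_names: true)
  handler = ROUTES.fetch(routing_key) { ->(_p) { "ignored" } }
  handler.call(payload)
end

puts dispatch("ended", { id: 42 }.to_json)  # notify waitlist for campaign 42
```

Keeping the subscriber this thin means all retry-worthy business logic (and its logging) lives in Sidekiq, where we found the tooling stronger.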

This architecture is fast and reliable. Now, instead of having to run a worker on a specific schedule to find whether any waitlist customers have available slots, we get the events as they happen and can process them in near real-time. There’s no need for expensive API calls or database queries to keep the two databases in sync, and any new micro/macroservices that we may build could also receive those messages and do whatever they need with that data.

And, of course, this is just one example of how we’re syncing data between the two applications. Once we were able to get the basic infrastructure working, we were able to publish many more types of messages from the monolith to Meteor as needed, and Meteor is able to process those messages in near real-time, instead of relying on hourly workers.

What’s more, we can now use this pub/sub infrastructure within the monolith. We can create queues that bind to the same `campaign_events` exchange and process the messages as needed. In the case of ended campaigns, our subscriber can handle various activities related to a campaign ending, whether that’s sending the customer a final performance report or changing the status of the campaign’s zip code slots. This allows us to handle these activities asynchronously, without having to explicitly schedule any workers to do the work.

One Gotcha

While we’ve had a lot of success with this architecture, it was not without some trials and tribulations. One of the biggest gotchas that took a while to track down was an intermittent timeout error with our AMQP connections.

Some of our messages were timing out when being published, resulting in the following (truncated) stack trace:

Completed 500 Internal Server Error in 16836ms
Timeout::Error:
  config/initializers/rabbitmq.rb:7:in `block in <top (required)>'
  app/publishers/event_publisher.rb:13:in `publish'

We believed these connections would be self-healing, and, of course, the issue did not happen locally; we were only able to reproduce it in our staging environment.

We originally thought this might be caused by a low connection pool size in our CloudAMQP instance. After upgrading our account, the issue persisted.

After continued research and debugging, we discovered the AMQP connection was occasionally being created before the web server thread was forked. Our original initializers/rabbitmq.rb file looked something like this:

amqp_connection = Bunny.new(AW.amqp).tap do |c|
  c.start
end

pool_size = ActiveRecord::Base.connection_config.fetch(:pool, 10)

AMQPConnectionModule.channel_pool = ConnectionPool.new(size: pool_size) do
  amqp_connection.create_channel
end

We use Phusion Passenger as our web server, and some digging revealed that establishing the AMQP connection after the worker processes were forked would do the trick. More details on that can be found in the official documentation.

Our new and improved initializers/rabbitmq.rb now looks like this:

module AMQPConnectionManager
  mattr_accessor :channel_pool

  def self.establish_connection
    amqp_connection = Bunny.new(AW.amqp).tap do |c|
      c.start
    end

    pool_size = ActiveRecord::Base.connection_config.fetch(:pool, 10)

    self.channel_pool = ConnectionPool.new(size: pool_size) do
      amqp_connection.create_channel
    end
  end
end

AMQPConnectionManager.establish_connection

if defined?(PhusionPassenger)
  PhusionPassenger.on_event(:starting_worker_process) do |forked|
    if forked
      AMQPConnectionManager.establish_connection
    end
  end
end

if defined?(Resque)
  Resque.after_fork do |job|
    AMQPConnectionManager.establish_connection
  end
end

We’ve defined a new establish_connection method, and will establish a connection immediately. Then, we re-establish a connection after both the Resque and PhusionPassenger workers are forked. Because the forking of these two workers doesn’t always happen in the same order, this protects us and ensures that a connection is always established. Once we made this change, we were good to go.
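The fix above registers explicit fork hooks for each server. A common alternative worth knowing (a hedged sketch of the general pattern, not what we shipped) is a pid-checking wrapper that lazily rebuilds the pool the first time it is used in a new process, which covers any forking server without per-server callbacks:

```ruby
# Wraps a pool builder and rebuilds the pool whenever the process id
# changes, i.e. the first time the pool is touched after a fork.
class ForkAwarePool
  def initialize(&builder)
    @builder = builder
    @pid = nil
    @pool = nil
  end

  def pool
    if @pool.nil? || @pid != Process.pid
      @pid = Process.pid
      @pool = @builder.call
    end
    @pool
  end
end

# Hypothetical usage: the builder would create the Bunny channel pool.
wrapper = ForkAwarePool.new { Object.new } # stand-in for the real pool
wrapper.pool # built on first use; rebuilt automatically after a fork
```

The trade-off is a pid check on every access, in exchange for not having to know which server (Passenger, Resque, or anything else) forked you.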

Wrap Up

The Meteor project was a great learning experience for the entire team. It gave us a clear use case for creating a distinct service outside of our monolithic app. Because our initial goal was to build a new feature, it was easy to start fresh and free ourselves from having to rewrite or copy existing code. We were able to maintain a strong velocity towards satisfying the business requirement, while at the same time laying a foundation to eventually transition existing features to the new infrastructure.

Implementing the Pub/Sub pattern satisfied our immediate need to keep the two applications’ databases in sync, but also opened the door for many more micro (or macro) services in the future. Now that we have a reliable system for publishing and subscribing to messages, we can easily build more standalone services that get the data they need — and can ignore irrelevant data.

I’d love to hear how other organizations have taken their first steps towards a microservice architecture, and am also happy to answer any questions anyone has about our approach as best I can.

I’ll leave you with some resources that we found particularly useful as we went down this path; I hope they’re helpful to others!