Just as with Rails controllers, it's easy to get comfortable dumping logic into Sidekiq workers. Everything sits in one place, in good ol' imperative style. But over time it gets messy, and you find yourself writing new Sidekiq workers instead of fixing or reusing old ones.

So, before your Sidekiq background tasks become even more chaotic, here are a few tips I have learned on our team.

1. Don't place logic in your worker.

You start with 10 lines and think, hey, maybe it's alright for this code to be here. Six months later, you find out you need to add more checks to that code, so you add 20 more lines. Then your teammates discover a bug and add ten more.

Then a newbie decides that your logic is great and wants to reuse it. But the logic is fitted to the worker's use case. Instead of risking a fuss in her first deployment, she just copy-pastes your code into her worker. Yuck. Now you have two copies of the same code evolving differently.

Point is, logic almost always grows. Be responsible. Take the time to think about where to place the logic - in an interactor, service object or model, where it belongs.
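As a minimal sketch of what that can look like, the logic for expiring a hypothetical invoice could live in a plain service object, and the worker would only delegate to it. The names `ExpireInvoice`, `FakeInvoice` and `ThinInvoiceExpirerWorker` are made up for illustration, and the in-memory invoice stands in for a real model so the example runs on its own.

```ruby
# Hypothetical stand-in for an ActiveRecord model, so the sketch runs on its own.
FakeInvoice = Struct.new(:id, :status) do
  def expirable?
    status == :issued
  end
end

# Service object: owns the business logic and can be reused from a
# controller, a rake task, a console session, or any worker.
class ExpireInvoice
  def initialize(invoice)
    @invoice = invoice
  end

  # Returns true if the invoice was expired, false if there was nothing to do.
  def call
    return false unless @invoice.expirable?

    @invoice.status = :expired
    true
  end
end

# Thin worker: looks up the record and delegates. No business rules here.
class ThinInvoiceExpirerWorker
  # In a real app this class would include Sidekiq::Worker; the guard only
  # keeps the sketch loadable when the gem is absent.
  include Sidekiq::Worker if defined?(Sidekiq::Worker)

  # Assumption: an in-memory lookup table instead of a database query.
  INVOICES = { 1 => FakeInvoice.new(1, :issued) }

  def perform(invoice_id)
    ExpireInvoice.new(INVOICES.fetch(invoice_id)).call
  end
end
```

When someone else needs the same behavior, they call `ExpireInvoice` instead of copying the worker.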

2. Don't make your workers too big.

Sidekiq is made for small, lightweight tasks. It isn't designed for long-running workers.

So, how do you know your workers are "too big"?

One telltale sign: there's a loop. For example, we have an Invoice model attached to an Order. Every time we issue an Order, we give an Invoice and reserve the stock for the items ordered. Invoices expire 5 days after being issued. So, it brings us to this worker...



    class InvoiceExpirerWorker
      include Sidekiq::Worker
      include Sidetiq::Schedulable

      def perform
        # Expires every expirable invoice inside this single job.
        expired_invoices = Invoice.expirable
        expired_invoices.each do |invoice|
          invoice.expire!
        end
      end
    end

Now, this code snippet may look harmless, but when you look at the Invoice model, there are 100 lines of code involved in expiring an invoice: cancelling the order, returning the stock of the items, emailing the customer, updating the records, etc.

Imagine doing that for 10,000 invoices expiring today. Imagine each invoice taking 20 seconds to fully expire, multiplied by 10,000 invoices.

Point is, the work is too much for a single worker.

On our team, we had around a dozen workers that never finished executing. Why? Precisely because each did WAY more than a single Sidekiq worker is designed for. The workers would run out of memory, Linux would kill the process, and we were left with hundreds of invoices that should have been expired. Great.

So, what do we do now?

Use a master worker to spawn smaller workers. The master worker is tasked with constructing a list of invoices to be expired. It goes through them one by one, and calls a separate worker on each one. If there are 10,000 invoices in the list, then 10,000 lightweight workers would be created.

That way, the whole run has a greater chance of completing, because each worker doesn't take too long.

Here's a better pattern:



    class BatchInvoiceExpirerWorker
      include Sidekiq::Worker
      include Sidetiq::Schedulable

      def perform
        # Only enqueue here; the real work happens in one small job per invoice.
        expired_invoices = Invoice.expirable
        expired_invoices.each do |invoice|
          InvoiceExpirerWorker.perform_async(invoice.id)
        end
      end
    end

    class InvoiceExpirerWorker
      include Sidekiq::Worker

      def perform(invoice_id)
        invoice = Invoice.find(invoice_id)
        invoice.expire!
      end
    end

Instead of one big worker taking 200,000 seconds to finish (that's around 55 hours, every day), we now have 10,000 small workers queued up to run for 20 seconds each. Better.
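The arithmetic behind those numbers, as a quick check. The concurrency of 25 is only an assumed figure for illustration; the real value depends on how many Sidekiq threads and processes you run, and the model ignores queueing overhead.

```ruby
INVOICE_COUNT = 10_000
SECONDS_PER_INVOICE = 20
ASSUMED_CONCURRENCY = 25 # hypothetical number of jobs running at once

# Total amount of work, no matter how it is split into jobs.
def total_work_seconds(job_count, seconds_per_job)
  job_count * seconds_per_job
end

# Idealized wall-clock time when the work is spread evenly over N slots.
def wall_clock_hours(total_seconds, concurrency)
  total_seconds / 3600.0 / concurrency
end

total = total_work_seconds(INVOICE_COUNT, SECONDS_PER_INVOICE)
puts "total work:    #{total} s"
puts "one big job:   #{wall_clock_hours(total, 1).round(1)} h"
puts "small jobs:    #{wall_clock_hours(total, ASSUMED_CONCURRENCY).round(1)} h with #{ASSUMED_CONCURRENCY} running at once"
```

The total work stays the same. What changes is that each job is short, can fail or be retried on its own, and can run alongside the others.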

3. Organize your code into directories.

We started with just 10 workers. Then it became 20, then 40, then 50. Now we have 100 workers in app/workers.

Do yourself a favor. Organize your code into directories, please.
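One possible layout, assuming Rails' usual convention that a file path under `app/workers` maps to a namespaced constant. The directory and class names are hypothetical, and the helper is a simplified check that ignores custom inflections such as acronyms.

```ruby
# Hypothetical layout:
#
#   app/workers/invoices/expirer_worker.rb        -> Invoices::ExpirerWorker
#   app/workers/invoices/batch_expirer_worker.rb  -> Invoices::BatchExpirerWorker
#   app/workers/orders/confirmation_worker.rb     -> Orders::ConfirmationWorker

# Derives the constant name expected for a file, which is handy for
# sanity-checking a planned layout. Plain Ruby, no Rails required.
def expected_constant(path, root: "app/workers/")
  path.delete_prefix(root)
      .delete_suffix(".rb")
      .split("/")
      .map { |segment| segment.split("_").map(&:capitalize).join }
      .join("::")
end

puts expected_constant("app/workers/invoices/expirer_worker.rb")
# prints "Invoices::ExpirerWorker"
```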

4. Plan your Sidekiq execution schedules and prioritization schemes.

If you think 3 AM is a safe time to add workers… Well, think again. Maybe three other developers thought the same thing and added their own workers at 3 AM.

Let’s go back to the 10,000 small invoice expirer workers from #2, and call that Worker 1. Say you have scheduled Worker 1 for 3 AM. Then another worker, let’s call it Worker 2, is queued at around the same time. Yikes. If Worker 2 is time-critical, you may have to wait more than an hour for it to run.

To avoid this, create a tracker. It could be as simple as a Google spreadsheet, or better yet, an automated system, so you don't have to clean up every time a developer adds or changes a Sidekiq schedule. The tracker gives you an updated view of which workers are scheduled when. This way, you can avoid Worker 1 getting in the way of the more time-critical Worker 2.
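An automated tracker can start very small. The sketch below assumes you can collect each scheduled worker's name and start hour into a list (the `SCHEDULES` data, the worker names other than the invoice expirer, and the helper name are all hypothetical) and simply flags the hours where more than one worker is scheduled.

```ruby
# Hypothetical schedule data; in practice this would be gathered from
# wherever your schedules are declared.
SCHEDULES = [
  { worker: "BatchInvoiceExpirerWorker", hour: 3 },
  { worker: "NightlyReportWorker",       hour: 3 },
  { worker: "StockSyncWorker",           hour: 5 }
].freeze

# Groups workers by start hour and keeps only the hours with a collision.
def schedule_collisions(schedules)
  schedules
    .group_by { |entry| entry[:hour] }
    .select { |_hour, entries| entries.size > 1 }
    .transform_values { |entries| entries.map { |entry| entry[:worker] } }
end

schedule_collisions(SCHEDULES).each do |hour, workers|
  puts format("%02d:00 -> %s", hour, workers.join(", "))
end
# prints "03:00 -> BatchInvoiceExpirerWorker, NightlyReportWorker"
```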

Now, don’t stop there. There will be cases where a certain worker has to be prioritized over another, and you must take that into account too.

Let’s say you plan Worker 2 to run on a higher priority. That way, when it runs at around 3:05 AM, it will run ahead of Worker 1.

However, what if, in the future, someone else adds another worker (e.g. Worker 3, Worker 4, etc.) of higher or equal priority on top of what is currently scheduled? This would add some delay before your worker gets done.

Prioritization in Sidekiq is done by organizing your workers in queues. Each queue has a corresponding priority level in Sidekiq. Read more about it here: https://github.com/mperham/sidekiq/wiki/Advanced-Options

Hence, it’s better to plan both prioritization and scheduled worker timings together in order to have a better chance of faster Sidekiq execution times, and lower under-utilization of your Sidekiq instance.
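To make the effect of queue priority concrete, here is a toy model of strictly ordered queues: the next job always comes from the first non-empty queue in the priority list. It is a simplified illustration in plain Ruby, not Sidekiq's implementation, and the class, queue, and job names are hypothetical. In Sidekiq itself, a worker is assigned to a queue with `sidekiq_options queue: ...`, and the queue order or weights are part of the process configuration described on the wiki page linked above.

```ruby
# Toy model of strict queue priority (not Sidekiq's actual code).
class PriorityQueues
  # priority_order: queue names, highest priority first.
  def initialize(priority_order)
    @priority_order = priority_order
    @queues = priority_order.to_h { |name| [name, []] }
  end

  def push(queue, job)
    @queues.fetch(queue) << job
  end

  # Returns the next job from the highest-priority non-empty queue,
  # or nil when every queue is empty.
  def pop
    @priority_order.each do |name|
      job = @queues[name].shift
      return job if job
    end
    nil
  end
end

queues = PriorityQueues.new(%i[critical default])

# 03:00 - Worker 1 fans out its invoice jobs on the default queue.
3.times { |i| queues.push(:default, "expire_invoice_#{i}") }
# 03:05 - Worker 2 arrives on the critical queue.
queues.push(:critical, "worker_2")

order = []
while (job = queues.pop)
  order << job
end
puts order.inspect
# prints ["worker_2", "expire_invoice_0", "expire_invoice_1", "expire_invoice_2"]
```

Priority only decides which queued job is picked next. Jobs that are already running are not interrupted, which is one more reason to keep each job small.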

5. Periodically review your Sidekiq tasks.

Maybe a task used to take just a few seconds. Now that there are 10x more orders, it may be taking 5 minutes. Fast forward to this year, and it may not even finish executing. It is now time to refactor the worker based on the principles above.
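One way to notice workers that are slowly outgrowing their budget is to time every job. The sketch below follows the shape of a Sidekiq server middleware: an object with a `call` method that receives the worker, the job payload, and the queue name, and yields to run the job. The class name, the 60-second default threshold, and the log format are assumptions for illustration.

```ruby
# Logs a warning for any job that runs longer than a threshold.
class SlowJobLogger
  def initialize(threshold_seconds: 60, output: $stderr)
    @threshold_seconds = threshold_seconds
    @output = output
  end

  # Same call shape as a Sidekiq server middleware: yield runs the job.
  def call(worker, _job, queue)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  ensure
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    if elapsed > @threshold_seconds
      @output.puts "SLOW #{worker.class.name} on #{queue}: #{elapsed.round(2)}s"
    end
  end
end

# Registration in a real app (commented out so the sketch runs without the gem):
#
#   Sidekiq.configure_server do |config|
#     config.server_middleware { |chain| chain.add SlowJobLogger }
#   end
```

Reviewing that log every few weeks shows which workers are drifting from seconds toward minutes, before they stop finishing at all.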

Conclusion

I hope this blog post saves you the time and effort of sorting through Sidekiq workers. By having a standard practice in place, you save yourself (and everyone!) the hassle of reviewing tangled code, or simply finding out why a worker failed.

Special thanks to Allen, my editor, for helping make this post more coherent.