systemd is a pretty great way to schedule cron-like tasks on Linux. In the past, I’ve used everything from cron to full-blown distributed task scheduling systems. For my latest project (kickdb.com), I had been using a pretty hacky combo of tmux, while, and sleep on a tiny DigitalOcean server, which eventually became cumbersome. I needed a better alternative.

My requirements for a task scheduler were:

Run some quick tasks every 15 minutes

Run some long-running tasks every 12 hours (sometimes these tasks take over 30 hours)

Don’t run a task if the same type of task is already running. For example, don’t start a new instance of the adidas.com scraper if an old instance is still running.

After reading up on systemd, I found that it met all my requirements and had some pretty awesome benefits:

Automatic log rotation

Random delays: add RandomizedDelaySec to a timer

Memory and CPU limits: add CPUQuota or MemoryLimit to a slice

Easy to debug: systemctl lets you view logs, exit codes, and timer history

I’m still a novice at using systemd, so please keep in mind that the configuration and commands below may be incorrect.

My systemd setup

KickDB.com scrapes around 60 sneaker stores. My first attempt at using systemd involved setting up a recent import service and a full import service for each of the stores. Unfortunately, whenever I installed and enabled these 120 services, the server would run out of memory and CPU and need to be power cycled. I spent a few days trying out different configurations, but at the time I couldn’t avoid having too many node processes starting at once. So, instead I grouped the sites by platform and now run only 10 services.

To set up my cron-like system, I used these systemd features:

Services describe the work to do. I have five recent import services and five full import services. The only real difference between each service is the command run in ExecStart.

Timers describe when to run the work. I have five recent import timers and five full import timers. The recent scrapers run every 15 minutes, while the full imports run every 12 to 24 hours.

Slices describe resource limits. I have one slice that limits the resources used by all my services as a whole.

shopify-recent.service

Notice that ExecStart uses the full path to node. WorkingDirectory launches the script in the same folder as my code.

# /etc/systemd/system/shopify-recent.service
[Unit]
Description=Runs shopify recent scraper
Wants=shopify-recent.timer

[Service]
ExecStart=/usr/local/bin/node /app/shoes-scraper/src/scraper --recent --platform shopify
WorkingDirectory=/app/shoes-scraper
Slice=shoes-scraper.slice

[Install]
WantedBy=multi-user.target

shopify-recent.timer

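This timer starts the service 15 minutes after its previous run finishes, plus a random delay of up to 15 minutes. Because OnUnitInactiveSec counts from the moment the service last became inactive, the next run is only scheduled once the old one has stopped. For example, if a run finishes at 12:00, the next one starts somewhere between 12:15 and 12:30.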

# /etc/systemd/system/shopify-recent.timer
[Unit]
Description=Run shopify-recent every 15-30 minutes
Requires=shopify-recent.service

[Timer]
Unit=shopify-recent.service
OnUnitInactiveSec=15m
RandomizedDelaySec=15m
AccuracySec=1s

[Install]
WantedBy=timers.target

shoes-scraper.slice

Because all my scrapers run inside shoes-scraper.slice, I can make sure they collectively never use more than 80% of the CPU or more than 2.7G of RAM on my server. Doing something like this with cron would be pretty tough.

# /etc/systemd/system/shoes-scraper.slice
[Unit]
Description=Limited resources Slice
DefaultDependencies=no
Before=slices.target

[Slice]
CPUQuota=80%
MemoryLimit=2.7G

Installing the services

To install and start these services I run:

systemctl stop shopify-recent shopify-full ...

systemctl daemon-reload

systemctl enable shopify-recent.timer shopify-full.timer ...

systemctl start shopify-recent shopify-full ...

If something went wrong when setting up a service, you can usually find out what by running systemctl status SERVICE.

I don’t like running these commands every time I make a change, so I wrote a script to generate and install my systemd configuration.
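A generator like that can be sketched in a few lines of shell. This is only a minimal sketch under stated assumptions, not the actual script: the --full flag, the 12h interval for full imports, and the OUT_DIR and PLATFORMS variables are all made up for illustration. It writes the unit files to a local folder instead of touching /etc/systemd/system.

```shell
#!/bin/sh
# Sketch: generate a service and a timer unit for each platform and mode.
# Assumptions (not from the setup described above): the --full flag,
# the 12h interval, and the OUT_DIR / PLATFORMS variables.
set -eu

OUT_DIR="${OUT_DIR:-./generated-units}"   # review, then copy to /etc/systemd/system
PLATFORMS="${PLATFORMS:-shopify}"         # space-separated list of platforms

# write_units PLATFORM MODE INTERVAL
# Writes PLATFORM-MODE.service and PLATFORM-MODE.timer into OUT_DIR.
write_units() {
  platform="$1"
  mode="$2"
  interval="$3"
  name="$platform-$mode"

  cat > "$OUT_DIR/$name.service" <<EOF
[Unit]
Description=Runs $platform $mode scraper
Wants=$name.timer

[Service]
ExecStart=/usr/local/bin/node /app/shoes-scraper/src/scraper --$mode --platform $platform
WorkingDirectory=/app/shoes-scraper
Slice=shoes-scraper.slice

[Install]
WantedBy=multi-user.target
EOF

  cat > "$OUT_DIR/$name.timer" <<EOF
[Unit]
Description=Run $name every $interval plus up to $interval of random delay
Requires=$name.service

[Timer]
Unit=$name.service
OnUnitInactiveSec=$interval
RandomizedDelaySec=$interval
AccuracySec=1s

[Install]
WantedBy=timers.target
EOF
}

mkdir -p "$OUT_DIR"
for platform in $PLATFORMS; do
  write_units "$platform" recent 15m
  write_units "$platform" full 12h
done
echo "Wrote units to $OUT_DIR"
```

After checking the generated files, copy them to /etc/systemd/system and run the stop, daemon-reload, enable, and start commands from this section.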

Handy commands

systemctl start SERVICE

systemctl stop SERVICE

systemctl status SERVICE

systemctl list-timers # view the status of the timers

journalctl # view the full systemd logs in less

journalctl -u SERVICE # view the logs for a specific service

journalctl -f # tail the logs

journalctl -f -u SERVICE # tail the logs for a specific service
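The commands above can be combined into a small health check. The sketch below is a hypothetical helper (check_unit is a made-up name, not part of systemd): it uses systemctl is-failed to test whether a unit is in the failed state and, if so, prints its last 20 log lines with journalctl.

```shell
#!/bin/sh
# Sketch: report whether a unit has failed and show its recent logs.
# check_unit is a hypothetical helper name used only for illustration.
check_unit() {
  unit="$1"
  # "systemctl is-failed" exits with status 0 when the unit is in the failed state
  if systemctl is-failed --quiet "$unit"; then
    echo "$unit has failed; last 20 log lines:"
    journalctl -u "$unit" -n 20 --no-pager
    return 1
  fi
  echo "$unit is not in a failed state"
}

# Example: check_unit shopify-recent.service
```

Returning a non-zero status on failure makes the helper easy to chain, for example in a loop over all the scraper services.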

Conclusion

In conclusion, systemd can be a nicer alternative to cron. It can take a lot more time to configure than cron, but it makes other things easier, like debugging errors, setting CPU and memory limits, and randomizing schedules.

I shared this blog on Reddit and got some pretty great feedback. Here’s some I’d like to highlight: