Designing for Reliability — Using an Internet Door Buzzer as an Example
Tom Klinovsky · Jan 26 · 5 min read

A few months back, I moved into a new place in an old apartment building. Not too old, however, to have a door in the lobby, which posed a bit of a problem for me.

You see, every month I order my dogs’ food from the pet store, and their delivery times overlap with my working hours almost entirely. So usually they’ll leave it by my door and I’ll pull it in when I get back. But they can’t really do that if they can’t get past the lobby door, and if they can’t, I’ll get billed for the delivery and not get the dog food. Not good.

Lucky for me, that door in the lobby can be buzzed open with an intercom installed in my apartment, and I am the proud owner of two unused Raspberry Pi computers. Since the buzzer is just a button that closes a circuit, I could hook it up to the Raspberry Pi via a relay, connect the Raspberry Pi to the internet, and then buzz the door open from anywhere, at any time. All that’s left is to decide how to implement the program so I can control it from my phone.

Option #1 — The Naive Approach

Thinking of the problem in the simplest terms, I can have the Raspberry Pi trigger the buzzer in response to a POST request, and also host a small web page on the Raspberry Pi, with vanilla JavaScript and a button that sends that POST request. That would work, but let’s stop for a second to think about what could go wrong.
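To make option #1 concrete, here is a minimal sketch of what that could look like using only Python's standard library. The `/buzz` path, port 8000, and the `press_buzzer` callback are my own placeholders for illustration; the callback stands in for whatever code actually drives the relay, which isn't shown here.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# A one-button page; the button sends the POST request with vanilla JavaScript.
PAGE = b"""<!doctype html>
<button onclick="fetch('/buzz', {method: 'POST'})">Open door</button>
"""


def make_handler(press_buzzer):
    """Build a handler class that calls press_buzzer() on POST /buzz.

    press_buzzer is a hypothetical callback that closes the relay.
    """

    class BuzzerHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Serve the same page for every GET path.
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(PAGE)))
            self.end_headers()
            self.wfile.write(PAGE)

        def do_POST(self):
            if self.path != "/buzz":
                self.send_error(404)
                return
            press_buzzer()
            self.send_response(204)  # success, nothing to send back
            self.end_headers()

        def log_message(self, *args):
            pass  # keep the console quiet

    return BuzzerHandler


def serve(press_buzzer, port=8000):
    """Serve the page and the /buzz endpoint until interrupted."""
    HTTPServer(("0.0.0.0", port), make_handler(press_buzzer)).serve_forever()
```

Calling `serve(lambda: print("buzz"))` would be enough to try it from a browser on the same network. Note that this sketch has no authentication at all, which is one more thing that could go wrong.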

It’s important to me that my remote door buzzer not only works, but works when I need it to. There’s not much use for a cool IoT device if, when the delivery arrives and I press the button to let them in, I get a “Cannot connect to server” error because the public IP of my home router changed. Or if I later replace my router and can no longer forward a connection to the Raspberry Pi. Or if the power goes out for a minute and resets my Raspberry Pi.

My point being, when you are designing a project, before jumping into writing code, think of what could go wrong and see what you could do to prevent it. One way to do so is to mitigate each issue separately, but if you find many problems that are hard to tackle, try to step back and see whether you can change the design of your project, replacing the harder issues with trivial ones.

Option #2 — Complexity Is Not All Bad

In my case, many of the issues originated from the fact that the Raspberry Pi will always be behind my router. To mitigate those issues in option #1, I would need to set my router to forward packets to the Raspberry Pi’s internal IP, set a static or reserved IP for it in my router, and set up a service like no-ip to track my changing public IP. And if I ever change my router or internet provider, I’ll have to do all of this over again. Considering all that, let’s try something else.

Instead of having the Raspberry Pi exposed, I’ll have it connect on startup to a remote server in the cloud (which is always exposed), using a plain TCP connection. Since the Raspberry Pi is the one opening the connection, the router needs no port forwarding and my changing home IP no longer matters. The server will hang on to the connection and keep it alive by sending periodic heartbeats. On the other side, it will also serve a web page with vanilla JavaScript, similar to the one from option #1. The only difference is that upon receiving the POST request, the server will use its connection to the Raspberry Pi to send a command to open the door. And just like that, no router-related issues.
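The server side of this design can be sketched roughly as follows. This is only an illustration under my own assumptions: the `DeviceLink` class name and the newline-terminated `PING` and `OPEN` messages are made up for the example, and the accept loop, the heartbeat timer, and the web handler that would call these methods are left out.

```python
import socket
import threading

# Assumed wire format: short newline-terminated ASCII messages.
HEARTBEAT = b"PING\n"
OPEN_DOOR = b"OPEN\n"


class DeviceLink:
    """Holds the single TCP connection opened by the Raspberry Pi."""

    def __init__(self):
        self._lock = threading.Lock()
        self._conn = None

    def attach(self, conn):
        """Remember a freshly accepted connection, replacing any old one."""
        with self._lock:
            if self._conn is not None:
                self._conn.close()
            self._conn = conn

    def _send(self, message):
        """Send a message; return False if no usable connection exists."""
        with self._lock:
            if self._conn is None:
                return False
            try:
                self._conn.sendall(message)
                return True
            except OSError:
                # The connection is dead; forget it and wait for a reconnect.
                self._conn.close()
                self._conn = None
                return False

    def heartbeat(self):
        """Called periodically by a timer to keep the connection alive."""
        return self._send(HEARTBEAT)

    def open_door(self):
        """Called by the web handler when the POST request arrives."""
        return self._send(OPEN_DOOR)
```

The lock matters because the heartbeat timer and the web handler would run on different threads and share one socket. Returning `False` from `open_door()` also gives the web page a way to report that the Raspberry Pi is currently offline instead of failing silently.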

Now a critic might say that I’ve added a lot of complexity: I now have a server in the way that I didn’t have before, and it also needs to be maintained. And that’s true. I’ve exchanged my router problems for other problems, possibly more of them, and now I need to decide whether the deal is worthwhile before I take it.

Here are my new problems: the server could go down, the connection to the server could be lost, and the Raspberry Pi could still be reset by a power failure (it’s an old building, after all).

I’ve decided to take the deal and go with Option #2. Here’s why: a server in the cloud is a lot less likely to go down than any of the home internet issues is to occur. I can tackle the issue of losing the server connection by programming the Raspberry Pi to reconnect to the server if it misses too many heartbeats. As for the power outage resets, I’ll just register my program with systemd using a unit file, which will start it on boot and also restart it if the program crashes for any other reason. Most importantly, unlike option #1, where I’d have to solve everything again if I ever switched my internet provider or my router, once I solve these problems, they will remain solved.
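The reconnect logic on the Raspberry Pi could look something like the sketch below. The numbers (a heartbeat every 10 seconds, giving up after 3 missed ones, retrying after 5 seconds), the `PING` and `OPEN` messages, and the `press_buzzer` callback are all assumptions made for the sake of the example, not a description of a finished implementation.

```python
import socket
import time

HEARTBEAT_INTERVAL = 10  # seconds between server heartbeats (assumed)
MAX_MISSED = 3           # heartbeats we tolerate missing (assumed)


def run_session(sock, press_buzzer, timeout=HEARTBEAT_INTERVAL * MAX_MISSED):
    """Handle one connection; return the reason it should be dropped."""
    # Any received data resets the timer, so a timeout means the server
    # has been silent for MAX_MISSED heartbeat intervals in a row.
    sock.settimeout(timeout)
    buffer = b""
    while True:
        try:
            chunk = sock.recv(1024)
        except socket.timeout:
            return "missed heartbeats"
        except OSError:
            return "socket error"
        if not chunk:
            return "closed by server"
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line == b"OPEN":
                press_buzzer()  # hypothetical callback that closes the relay
            # b"PING" needs no action: receiving it already reset the timer.


def run_forever(host, port, press_buzzer, retry_delay=5):
    """Connect, serve one session, and reconnect whenever it ends."""
    while True:
        try:
            with socket.create_connection((host, port), timeout=10) as sock:
                reason = run_session(sock, press_buzzer)
        except OSError as exc:
            reason = f"connect failed: {exc}"
        print(f"reconnecting in {retry_delay}s ({reason})")
        time.sleep(retry_delay)
```

Keeping `run_session` separate from the retry loop means every failure mode (silence, a closed socket, a failed connect) ends up in the same place: back at the top of `run_forever`, trying again.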

What should you take from this?

That was a really fun project, but how does this apply to the real life of a Software Engineer or a Site Reliability Engineer?