On being a FleetOps Engineer at balena

At balena, we help our customers deploy and manage tens of thousands of IoT devices across the globe. The balena ‘fleet’ is extremely heterogeneous, with devices of many different types and architectures, and is constantly growing and evolving.

In keeping with our philosophy of support-driven development, our FleetOps engineers are the "special operations forces of support", often tackling the high-impact, high-complexity cases that affect the entire balena fleet.

A major focus of the FleetOps team is device reliability engineering: helping to make device management safer for our users, which includes building tools and automating where possible. Past FleetOps projects include resinhup, our solution for managing host OS updates, and configizer, a solution we developed to adjust on-device configuration remotely and more safely.

As a key member of the FleetOps team, you will be constantly alternating between reactive management practices (temporarily relieving customer friction) and preventative maintenance across the fleet. You won’t have just a single component to maintain, but instead you will work both on providing workarounds that can eventually be productized, and on making existing tools more robust and scalable. You will continuously seek new territory for what customers need in the short/medium term, and collaborate with product engineers to effectively handle the 'delta' between what the product is now and where it is heading.

You will actively contribute to product decisions with data from the field. Components like on-device metrics, monitoring, data visualization, and debugging are all common territory for the team. Things you work on today may become new capabilities in the balena platform tomorrow!

Responsibilities



Take customer interactions and issues, write scripts to address them, and turn those scripts into tools and products that enable our users to effectively manage the health of their own fleets



Convert reactive support into preventative maintenance — diving in to solve the problem now with whatever means necessary, but then building and automating tools/products for the entire fleet



Contribute to roadmap, development, and maintenance of key OS features such as remote host OS updates, brownfield migrations, etc.



Help define and educate users on best practices for going to production on balena; you will be a go-to resource for best practices, and will learn and teach the lessons of scaling



Be a key resource for other engineers on support; you’ll often be asked to lend your expertise and contribute to internal docs/cookbooks to extract your knowledge and educate others



Create tools to help monitor and understand the overall health of the balena device fleet



Requirements



Customer-facing skills; ability to understand the actual problem users are trying to solve and work together to find a solution



Dynamic and flexible demeanor, as user requirements and/or the product change frequently



Ability to both hold the big picture in mind and dive into the weeds; you’ll be transitioning between the two in real time



Patience to research and observe patterns, and a methodical, thorough approach



Continuous improvement mindset; you’re constantly thinking about how to automate your manual work and be more efficient



Ability to make tradeoff decisions independently and to know where your marginal time is most productively spent



Curiosity and willingness to constantly build on your product knowledge (through projects, tutorials, support shifts, etc.)



Excellent communication skills, and fluency in English



Bonus points



Interest in or familiarity with IoT, embedded software, the balena platform, etc.



Working knowledge of Linux, for example a proven ability to ‘drill into’ process, resource, and full-stack issues without fear and find failure points



Good scripting experience (shell, Python, Node.js/JS, Rust, etc.) and familiarity with tool building and automation



Comfort working in at least one higher-level language, preferably TypeScript or JavaScript



Enthusiasm for investigating ever-changing issues, then planning and implementing resolutions for them

Docker knowledge

Make sure to let us know if any of these items apply to you. If possible, please also share a sample of your work (URL or attachment).

To apply

We'd be delighted to hear from you! Along with your CV/Resume, please answer the questions in our application form to help us make an informed initial assessment.