
I recently moved from a Software Engineer (SWE) role to a Site Reliability Engineer (SRE) role. Before making the decision, I looked online for accounts from people who had made the same move, to help me understand whether it would be the right choice for me.

Surprisingly, I couldn’t find many, so I decided to write this post to answer some of my own questions for anyone who ends up in the same situation in the future.

What does SRE mean (to me)?

I’d like to start by pointing out that SRE means something different from company to company. In some cases it is used as a “fancy” title for Ops/DevOps roles; sometimes it’s closer to Google’s SRE model; and sometimes it’s a hybrid of software developer and Ops.

So I’d like to focus on the responsibilities and the skill set that you need to have, or be willing to learn, based on my experience so far.

Things that you should already like to do

First, here is a list of things that you should already enjoy doing, or at least be interested in learning more about.

Coding. This is an obvious one: you will not stop writing code by becoming an SRE. You will probably find yourself writing more code for CLI tools or internal libraries/services than before, and less code for product features. As I mentioned before, every company interprets the SRE role differently, so you need to check what is expected of you and find the right balance between coding and other tasks.

DevOps. You should already have a DevOps mindset and be interested not only in writing code but also in deploying your services more reliably and managing your infrastructure.

System Design. Architectural skills are important when you design how the various components of a system will interact. In an SRE-type role this skill becomes even more important, because you will have to think about the reliability of the system a lot more. You will often be invited to design discussions for systems you don’t have much context on, so you should be prepared to adapt fast.

Standardisation. As an SWE you usually focus on standardising things like API specs, common coding practices, etc. These are all still important when you move to an SRE role, but standardising infrastructure-related things will make your life (and the system’s reliability) easier. The product teams (having a DevOps mindset) should pick the right tool for the job, but your role now is to act as a consultant who provides “sane defaults” when a team doesn’t feel confident about which tool to use.

Automation. Everyone hates doing repetitive tasks. If you like automating and templating things, rather than always “postponing it until later” (been there, done that), then you will enjoy working as an SRE.

Communication + Collaboration

Being a member of an SRE team means that you will communicate and collaborate daily with product teams and various other stakeholders. You should work with them in an SRE context, focusing on the reliability of their systems.

I believe part of the SRE role is to bring teams together and get them sharing practices and tools with each other. This is more important, and easier to do, at small to mid-size companies. To be clear, you will not be there to give all the answers or to “force” how something should be done; you will be there to assist and enable the SWEs to do their jobs better.

You will find yourself acting as the advocate for things like application security, alerting, monitoring, etc. So you must be willing to organise forums, events, and documents to educate the teams on specific subjects.

Pairing/mobbing and mentoring will also be a regular part of your daily work.

Learn, learn, learn

As you’d expect with a new role, you will be exposed to some new concepts and workflows. Postmortems, SLOs/SLIs/SLAs, on-call, GitOps, chaos engineering, and disaster recovery are some of the things you will learn about and practise.

Again, it is really important that you are willing to learn and read about these things. I recommend the SRE book, but the one with more practical examples was The Site Reliability Workbook (both books are free).

Conclusion

I hope you found some of my thoughts useful in making your decision.

One last thing: I highly recommend asking colleagues and ex-colleagues whose opinions you value. They have already worked with you and know your strengths and weaknesses, so they can help you put things in perspective. I found these conversations very useful!

I will close with a tweet from Kelsey Hightower, whom I respect and admire.

So just do it! Until next time 👋