How do you determine when you need to replace a third-party tool with a custom solution built by your own engineering team?

This is a question that most software engineers ask themselves several times over their careers. To build software and iterate quickly, you need third-party services, libraries, and APIs to solve specific tasks; otherwise you would constantly reinvent the wheel and never get any work done. The trade-off is that you don’t have full control over the service, but your team gets to focus on larger, more important software and product issues.

Experienced engineers know that there is a point at which a third-party service won’t cut it anymore and a custom solution, designed specifically for their requirements, needs to be built internally. But how do engineers decide when a third-party service needs to be rebuilt?

It reduces to a question of performance. At some point the service isn’t capable of doing what your software needs it to do, so the only sustainable long-term option is to rebuild it internally with the necessary optimizations.

Observing Inefficiencies

Because software can get so complex and its dependencies so intertwined, it’s important to build fault-tolerant, resilient systems that can handle the occasional but inevitable bugs, errors, outages, and network delays.
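One common way to tolerate a flaky dependency is to retry the call with exponential backoff and fall back to a degraded result if it keeps failing. The sketch below is a minimal illustration under assumptions, not a production implementation: `call_with_retry`, its parameters, and the idea of a `fallback` callable are hypothetical names introduced here, and a real system would catch narrower error types and add timeouts.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    fallback: Optional[Callable[[], T]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying with exponential backoff when it raises.

    All names here are hypothetical; `operation` stands in for any call
    to a third-party service.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as error:  # assumption: every error is retryable
            last_error = error
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between tries.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))

    # Every attempt failed: degrade gracefully if a fallback was supplied.
    if fallback is not None:
        return fallback()
    assert last_error is not None
    raise last_error
```

Passing `sleep` as a parameter keeps the helper easy to test, and the optional `fallback` lets the caller decide what a degraded response looks like (for example, a cached value).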

At a certain point it becomes difficult to improve, or even maintain, the performance of your software if a service or library it relies on is being pushed to its limits. Eventually the service can’t handle the scale or volume of your software anymore, or it becomes a blocker when you try to build new features.

Often it’s beneficial to start with a third-party service for a given task, observe its performance in production, and only then, if needed, build it internally (or build on top of the service) with specific optimizations and features. A few examples of companies building tools internally:

Amazon building a key-value store because of past performance issues.

Uber building a mapping service because of issues with Google Maps.

Netflix building a caching service as an improvement over memcached.

Netflix building on top of React by creating React-Gibbon.

Facebook creating Yarn with improvements over npm.

LinkedIn building Kafka because ActiveMQ was difficult to scale.

Facebook improving parts of the PHP language by creating Hack.

Google building TensorFlow as an improvement over DistBelief.

Airbnb building their own payments platform after experimenting with other services.

By first using a popular third-party service as a component in your own software, you observe how it functions and learn what its positive and negative features are. In doing so, you’re able to identify exactly how the service should be optimized to perform better within your own software.
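In practice, observing a service usually starts with recording how long each call takes and how often it fails. The following is a rough sketch of that idea, assuming you can wrap every call to the service in one place. `ServiceStats` and `observe` are hypothetical names introduced for illustration, and most teams would send these numbers to an existing metrics system rather than keep them in memory.

```python
import math
import time
from dataclasses import dataclass, field
from typing import Callable, List, TypeVar

T = TypeVar("T")


@dataclass
class ServiceStats:
    """In-memory record of call latencies (seconds) and failures."""

    latencies: List[float] = field(default_factory=list)
    failures: int = 0

    @property
    def calls(self) -> int:
        return len(self.latencies)

    @property
    def failure_rate(self) -> float:
        return self.failures / self.calls if self.calls else 0.0

    def percentile(self, fraction: float) -> float:
        """Nearest-rank percentile; `fraction` is in (0, 1], e.g. 0.95."""
        if not self.latencies:
            raise ValueError("no calls recorded")
        ordered = sorted(self.latencies)
        rank = max(1, math.ceil(fraction * len(ordered)))
        return ordered[rank - 1]


def observe(
    stats: ServiceStats,
    operation: Callable[[], T],
    clock: Callable[[], float] = time.perf_counter,
) -> T:
    """Run `operation`, recording its duration and whether it raised."""
    start = clock()
    try:
        return operation()
    except Exception:
        stats.failures += 1
        raise
    finally:
        # Failed calls are timed too, so slow failures remain visible.
        stats.latencies.append(clock() - start)
```

Tracking a high percentile alongside the failure rate is a deliberate choice here: averages tend to hide the slow tail, which is often where a service pushed to its limits shows strain first.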

Rebuilding a Video Streaming Server

Haggai Weiser, VP of Engineering at Alpha, encountered exactly this type of issue when his engineering team was building a video streaming application that had to record a user’s screen and the audio from their microphone while they were taking a test online. Once the user was finished, the software would process the two media streams and create a final media file.

To get something up and running, the engineering team decided to use a leading third-party service that focused on WebRTC and video streaming. Incorporating the service into their software was fairly straightforward and only took a few days. A few weeks after pushing the entire tool to production, it was processing several hundred videos per week. But soon after, the team ran into an issue that rendered almost half of the videos unusable.

We found that over 40% of our videos were out of sync by several seconds. This rendered a lot of videos unusable because you would clearly see a delay when the user was interacting with the prototype and talking.

The server that was processing the media streams wasn’t syncing the video and audio together correctly, so the first thing the engineering team did was analyze the network statistics that were recorded on the third party’s servers.