Here at Algolia, we love SaaS companies (we’re not at all biased) and use a lot of the different services they offer. Before selecting a particular service, we go through the usual process of comparing the different features and prices, but we also dig a little deeper and think about other aspects such as the infrastructure of the service.

This is especially important to us when the proposed service will play a critical role in our own infrastructure, like enabling our cloud DNS or helping provide support to our users. Here are five criteria that are easy to overlook but that we seriously consider when selecting an infrastructure-critical SaaS provider.

1. Support

Having top-notch support that you can rely on is mandatory for all critical services you use. I recommend taking the time to specifically engage support and test their capabilities and capacity during the trial period. If support is not excellent when they’re trying to sell to you, there’s a high probability that it won’t blow you away later on. I also like to ask if they provide 24/7 support with SLA requirements on the response time. If an issue comes up in the middle of the night, will they wake someone up to fix it, or will you be the only one awake, pulling your hair out?

A good way to ensure a high level of support is to verify the tools used for incident management and the escalation policy they have in place. It’s always great to hear that someone is using a tool like PagerDuty or VictorOps and has someone on call around the clock. If they don’t seem to understand the concepts of incident management and escalation, run away! Fast!

2. High availability

The availability of a service’s infrastructure should also be part of your consideration process, but a lot of SaaS providers may not even bring this up as part of the conversation. Fortunately, there are a few questions you can ask to determine whether they maintain high availability.

Is the service distributed across multiple datacenters? The minimum you should accept is a service distributed across multiple availability zones (especially the data storage component). If the service is fully hosted in one datacenter, I would not recommend using it for critical purposes.

Is the service distributed across multiple providers? If high availability is important, you should check that everything is redundant, including the hosting providers.

Is the service distributed across multiple regions? Few services have this level of distribution. Multi-region distribution ensures not only high availability, but positively impacts performance as well.

Are there any SPoFs (single points of failure)? This is a tough question, but it’s definitely worth asking so you have a better idea of the service’s architecture.
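To put rough numbers on why redundancy and SPoFs matter, here is a minimal sketch. It assumes that failures are independent, which is optimistic in practice, and the 99.9% figures are made up for illustration rather than taken from any real provider.

```python
from math import prod


def serial_availability(components):
    """Availability of a chain where every component must be up (each is a SPoF)."""
    return prod(components)


def redundant_availability(replicas):
    """Availability of a group where at least one replica must be up."""
    return 1 - prod(1 - a for a in replicas)


# Hypothetical figures: one datacenter at 99.9% versus two independent ones.
single = 0.999
dual = redundant_availability([0.999, 0.999])

# The redundant datacenters still depend on one shared component at 99.9%.
with_spof = serial_availability([dual, 0.999])

print(f"single datacenter:         {single:.6f}")
print(f"two datacenters:           {dual:.6f}")
print(f"two datacenters + one SPoF: {with_spof:.6f}")
```

Under these assumptions, a single shared component brings the redundant setup back down to roughly the availability of that one component, which is why the SPoF question is worth asking.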

When it comes to availability, I don’t put much stock in what’s promised in the SLA. I have seen too many companies promise 100% uptime without having the architecture to back it up.
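When reading an SLA, it can help to translate the percentage into the downtime it actually allows. The sketch below is plain arithmetic; the function name and the list of percentages are illustrative choices, not figures from any particular provider.

```python
def allowed_downtime_minutes(availability_percent, period_days=365):
    """Minutes of downtime that a given availability percentage permits over the period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


for sla in (99.0, 99.9, 99.99, 100.0):
    print(f"{sla}% -> {allowed_downtime_minutes(sla):.1f} minutes per year")
```

A 99.9% commitment still allows close to nine hours of downtime per year, while 100% allows none at all, which is a promise that only a fully redundant architecture could hope to keep.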

3. Trust

One tricky question you should probably ask yourself before choosing a service is: can I trust this company, or are they completely nuts? The age of a company and a company’s certifications are a decent indicator of stability, but an older company doesn’t necessarily make for a more trustworthy company.

One of the first things I check is the company’s service status page and the level of detail it goes into, because it says a lot about how transparent the company is. I also search for at least one Post-Mortem Analysis (also called Root Cause Analysis) of a previous outage. I don’t know of any serious service that hasn’t experienced at least one outage. Looking at how the company communicated about the outage, and at the follow-up to ensure it won’t happen again, can tell you a lot about its technical skills and the transparency you will get as a customer.

4. Security

If you are sending important data to the service, you probably care a lot about the security of the solution. I personally don’t rely on security certifications anymore because I have seen in practice that they do not mean much. I prefer to ask the following basic questions to get a better idea of the security of a solution:

Who has access to your servers? Are they employees? I like to see that only a small number of people who are part of the company have access to the servers. I personally consider it a showstopper if servers are managed by an external company.

What is the process for accessing a production server? I like to hear an answer like “SSH via passphrase but requires a VPN connection that is established via two-factor authentication,” or at least something that shows there are several actions required in order to gain access to the server. You would be surprised to hear the number of stories where a stolen laptop resulted in someone gaining access to the entire infrastructure of a company.

How do you test the security of your software? There is no magical answer to this question, but I like to hear that there is at least a penetration test done by an external company or, even better, a public bug bounty program that involves the community, such as HackerOne or BugCrowd.

5. API Quality (Optional)

If you plan on using a service’s API, you need to make sure that it’s stable and scales well. We’ve had a few surprises with products that worked really well, but when it came down to it, the APIs were almost unusable because they either went down too often or had a restrictive rate limit and just couldn’t scale. I recommend testing the API in real conditions before committing to a service, especially if you plan on making a one-year commitment.
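One way to run such a test during a trial is a small harness that records error rate and latency percentiles. The sketch below is only an illustration: `measure` and `make_fake_call` are hypothetical names, the fake call stands in for a real client request, and the loop is sequential, so it does not exercise concurrency.

```python
import time
from statistics import quantiles


def measure(call, requests=100):
    """Invoke `call` repeatedly; return error rate and latency percentiles in seconds."""
    latencies, errors = [], 0
    for _ in range(requests):
        start = time.perf_counter()
        try:
            call()
        except Exception:
            # Counts anything the client raises, e.g. timeouts or rate-limit rejections.
            errors += 1
        latencies.append(time.perf_counter() - start)
    cuts = quantiles(latencies, n=100)  # 99 cut points: cuts[49] is p50, cuts[98] is p99
    return {"error_rate": errors / requests, "p50": cuts[49], "p99": cuts[98]}


def make_fake_call(fail_every=10):
    """Hypothetical stand-in for a real API call; fails on every Nth request."""
    count = 0

    def call():
        nonlocal count
        count += 1
        if count % fail_every == 0:
            raise RuntimeError("simulated 429 Too Many Requests")

    return call


report = measure(make_fake_call(), requests=100)
print(report)
```

In a real trial you would replace the fake call with a request that matches your expected usage, and run it at the volume you plan to send in production.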

In the end, infrastructure is key!

When an external service is critical, you need to understand the infrastructure of this service. I hope this checklist will help you make your next SaaS decision. If a solution still doesn’t convince you after you have gone through this list, using it isn’t necessarily a terrible idea, but you probably need to implement a fallback solution in case of failure. The price of the fallback service itself is usually not very high, but it can cost a lot of money to maintain the use of several services in your software. As usual, this is a tradeoff 🙂
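As a rough idea of what a fallback can look like in code, here is a minimal sketch. The two lookup functions are hypothetical stand-ins for a primary and a secondary provider; a real integration would also need timeouts, monitoring, and a way to keep both providers in sync, which is where the maintenance cost comes from.

```python
def with_fallback(primary, fallback, exceptions=(Exception,)):
    """Return a function that calls `primary` and uses `fallback` if it raises."""

    def call(*args, **kwargs):
        try:
            return primary(*args, **kwargs)
        except exceptions:
            return fallback(*args, **kwargs)

    return call


# Hypothetical providers: the primary is down, the secondary answers.
def primary_lookup(name):
    raise ConnectionError("primary provider unreachable")


def secondary_lookup(name):
    return f"{name} resolved by secondary"


lookup = with_fallback(primary_lookup, secondary_lookup, exceptions=(ConnectionError,))
print(lookup("example.com"))
```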

I would love to hear how you evaluate your mission-critical SaaS providers, especially if you think we’re forgetting something important! Feel free to leave us a comment on this post to add your top considerations.

Illustration by Justas Galaburda