Achieving high availability rests on having good health checks. HAProxy as an API gateway gives you several ways to do this.

Run your service on multiple servers.

Place your servers behind an HAProxy load balancer.

Enable health checking to quickly remove unresponsive servers.

These are all steps from a familiar and trusted playbook. Accomplishing them ensures that your applications stay highly availability. What you may not have considered, though, is the flexibility that HAProxy gives you in how you can monitor your servers for downtime. HAProxy offers more ways to track the health of your servers and react quicker than any other load balancer, API gateway, or appliance on the market.

In the previous two blog posts of this series, you were introduced to using HAProxy as an API gateway and implementing OAuth authorization. You’ve seen how placing HAProxy in front of your API services gives you a simplified interface for clients. It provides easy load balancing, rate limiting and security—combined into a centralized service.

In this blog post, you will learn several ways to create highly available services using HAProxy as an API gateway. You’ll get to know active and passive health checks, how to use an external agent, how to monitor queue length, and strategies for avoiding failure. The benefits of each health-checking method will become apparent, allowing you to make the best choice for your scenario.

Active Health Checks

The easiest way to check whether a server is up is with the server directive’s check parameter. This is known as an active health check. It means that HAProxy polls the server on a fixed interval by trying to make a TCP connection. If it can’t contact a server, the check fails and that server is removed from the load balancing rotation.

Consider the following example:

When you’ve enabled active checking, you can then add other, optional parameters, such as to change the polling interval and/or the number of allowed failed checks:

Parameter What it does inter The interval between checks (defaults to milliseconds, but you can set seconds with the s suffix). downinter The interval between checks when the server is already in the down state. fall The number of failed checks before marking the server as down. rise The number of successful checks before marking a server as up again. fastinter The interval between checks when up, but a check has failed; or the interval when down, but a check has passed. You are transitioning towards up or down. This allows you to speed up the interval.

The next example demonstrates these parameters:

If you’re load balancing web applications, then instead of monitoring a server based on whether you can make a TCP connection, you can send an HTTP request. Add option httpchk to a backend and HAProxy will continually send requests and expect to receive valid HTTP responses that have a status code in the 2xx to 3xx range.

The option httpchk directive also lets you choose the HTTP method (e.g. GET, HEAD, OPTIONS), as well as the URL to monitor. Having this flexibility means that you can dedicate a specific webpage to be the health-check endpoint. For example, if you’re using a tool like Prometheus, which exposes its metrics on a dedicated page within the application, you could target its URL. Or, you could point HAProxy at the homepage of your website, which is arguably the most important.

You can also set a different IP address and/or port to check by adding the addr and port parameters, respectively, to the server line. In the following snippet, we target port 80 for our health checks, even though normal web traffic is sent to port 443:

Some web servers will reject requests that don’t include certain headers, such as a Host header. You can pass HTTP headers with the health check, like so:

You can accept only specific responses from the server, such as a specific status code or string within the HTTP body. Use http-check expect with either the status or string keyword. In the following example, only health checks that return a 200 OK response are classified as successful:

Or, require that the response body contain a certain case-sensitive string of text:

You can also have one server delegate its own status to another server by adding a track parameter to a server line. It should be set to the name of a backend and server. If the tracked server is down, the server tracking it will also be down.

Passive Health Checks

Active checks are easy to configure and provide a simple monitoring strategy. They work well in a lot of cases, but you may also wish to monitor real traffic for errors, which is known as passive health checking. In HAProxy, a passive health check is configured by adding an observe parameter to a server line. You can monitor at the TCP or HTTP layer. In the following example, the observe parameter is set to layer4 to watch real-time TCP connections.

Here, we’re observing all TCP connections for problems. We’ve set error-limit to 10, which means that ten connections must fail before the on-error parameter is invoked and marks the server as down. When that happens, you’ll see a message like this in the HAProxy logs:

The only way to revive the server is with the regular health checks. In this example, we’ve used inter to set the polling health check interval to two minutes. When they do run and mark the server as healthy again, you’ll see a message telling you that the server is back up:

It’s a good idea to include option redispatch so that if a client runs into an error connecting, they’ll instantly be redirected to another healthy server within that backend. That way, they’ll never know that there was an issue. You can also add a retries parameter to the server so that HAProxy retries it the given number of times. The delay between retries is set with timeout connect .

As an alternative to watching TCP traffic, you can monitor live HTTP requests by setting observe layer7 . This will automatically remove servers if users experience HTTP errors.

If any webpage returns a status code other than 100-499, 501 or 505, it will count towards the error limit. Be aware, however, that if enough users are getting errors on even a single, misconfigured webpage, it could cause the entire server to be removed. If that same page affects all of your servers, you may see widespread loss of service, even if the rest of the site is functioning normally. Note that the active health checks will still run and eventually bring the server back online, as long as the server is reachable.

Gauging Health with an External Agent

Connecting to a service’s IP and port or sending an HTTP request will give you a good idea about whether that application is functioning. One downside, though, is that it doesn’t give you a rich sense of the server’s state, such as its CPU load, free disk space, and network throughput.

With HAProxy, you can query an external agent, which is a piece of software running on the server that’s separate from the application you’re load balancing. Since the agent has full access to the remote server, it has the ability to check its vitals more closely.

External agents have an edge over other types of health checks: they can send signals back to HAProxy to force some kind of change in state. For example, they can mark the server as up or down, put it into maintenance mode, change the percentage of traffic flowing to it, or increase and decrease the maximum number of concurrent connections allowed. The agent will trigger your chosen action when some condition occurs, such as when CPU usage spikes or disk space runs low.

Consider this example:

The server directive’s agent-check parameter tells HAProxy to connect to an external agent. The agent-addr and agent-port parameters set the agent’s IP and port. The interval between checks is set with agent-inter . Note that this communication is not HTTP, but rather a raw TCP connection over which the agent communicates back to HAProxy by sending ASCII text over the wire.

Here are a few things that it might send back. Note that an end-of-line character (e.g.

) is required after the message:

Text it sends back Result down

server is put into the down state up

server is put into the up state maint

server is put into maintenance mode ready

server is taken out of maintenance mode 50%

server’s weight is halved maxconn:10

server’s maxconn is set to 10

The agent can be any custom process that can return a string whenever HAProxy connects to it. The following Go code creates a TCP server that listens on port 9999 and measures the current CPU idle time. If that metric falls below 10, the code sends back the string, 50%

, setting the server’s weight in HAProxy to half of what it is currently.

You can then artificially spike the CPU with a tool like stress. Use the HAProxy Stats page to see the effect on the load balancer. Here, the server’s weight began at 100, but is set to 50 when there is high CPU usage.

You can enable the Stats page by adding a listen section with a stats enable directive to your HAProxy configuration file. This will start the HAProxy Stats page on port 8404:

Queue Length as a Health Indicator

When you’re thinking about how to keep tabs on your servers’ health, you might wonder, what constitutes healthy anyway? Certainly, loss of connectivity and returning errors fall into the unhealthy category. Those are failed states. However, before it gets to that point, maybe there are warning signs that the server is on a downward spiral, moving towards failure.

One early warning sign is queue length. Queue length is the number of sessions in HAProxy that are waiting for the next available connection slot to a server. Whereas other load balancers may simply fail when there are no servers ready to receive the connection, HAProxy queues clients until a server becomes available.

Following best practices, you should have a maxconn specified on each server line. Without one, there’s no limit to the number of connections HAProxy can open. While that might seem like a good thing, servers have finite capacity. The maxconn setting caps it so that the server isn’t overloaded with work. The following configuration limits the number of concurrent connections to 30:

Under heavy load, or if the server is sluggish, the connections may exceed maxconn and will then be queued in HAProxy. If the queue length grows past a certain number or if sessions in the queue have a long wait time, it’s a sign that the server isn’t able to keep up with demand. Maybe it’s processing a slow database query or perhaps the server isn’t powerful enough to handle the volume of requests.

When queue length grows, clients begin to experience increasingly long wait times. With a bottleneck, downstream client applications may exhaust their worker threads, run into their own timeouts, or fail in unexpected ways. You can see why monitoring queue length is a good way to gauge the health of the system. So, what can you do when you see this happening?

Applying Backpressure with a Timeout

By default, each backend queue in HAProxy is unbounded and there’s no timeout on it whatsoever. However, it’s easy to add a limit with the timeout queue directive. This ensures that clients don’t stay queued for too long before receiving a 503 Service Unavailable error.

You might think that sending back an error is bad news. If we didn’t queue at all, then I’d say that was true. However, under unusual circumstances, failure can be better than leaving clients waiting for a long time. Client applications can build logic around this. For example, they could retry after a few seconds, display a message to the user, or even scale out server resources if given access to the right APIs. Giving feedback to downstream components about the health of a server is known as backpressure. It’s a mechanism for letting the client react to early warning signs, such as by backing off.

If you do not set timeout queue , it defaults to the value of timeout connect . It can also be set in a defaults section.

Failing Over to a Remote Load Balancer

Another way to deal with a growing queue is to direct clients to another load balancer. Consider a scenario where you have geographically redundant data centers: one in North America and another in Europe. Ordinarily, you’d send clients to the data center nearest to them. However, if a backend has an avg_queue , which is the queue length of all servers within that backend divided by the number of servers, that grows past a certain point, you could send clients to the other data center’s load balancer.

Your North American load balancer would be configured like this:

In the frontend section, there are two ACLs. The first, northamerica_toobusy, checks if the average queue length of the northamerica backend is greater than 20. The second, europe_up, uses the srv_is_up fetch method to check whether the server in the europe backend is up.

The interesting thing about this is that the europe_up ACL is checking a remote address that’s in the European data center by pointing to a backend that contains the address of the European data center’s load balancer. This means that the North American load balancer is not checking the health of the European servers directly. Instead, it’s monitoring a URL that the European load balancer has provided. That’s because the North American load balancer doesn’t have any information about connections that it didn’t proxy and wouldn’t have the full picture of the state of those servers. Essentially, you’re avoiding the problem where each load balancer sends clients directly to origin servers, but without adequate knowledge of all of the connections in play, potentially swamping those servers with connections from multiple load balancers.

When the northamerica backend has too many sessions queued and the europe backend is up, clients get redirected to the Europe URL. It’s a way to offload work until things calm down. The European load balancer would be configured like this:

Here, you’re using the monitor-uri directive to intercept requests for /check on port 8080 and having HAProxy automatically send back a 200 OK response. However, if the europe backend becomes too busy, the monitor fail directive becomes true and begins sending back 503 Service Unavailable. In this way, the European data center can be monitored from the North American data center and requests will only be redirected there if it’s healthy.

Real-Time Dashboard

HAProxy Enterprise combines the stable codebase of HAProxy with an advanced suite of add-ons, expert support and professional services. One of these add-ons is the Real-Time Dashboard, which allows you to monitor the health of a cluster of load balancers.

Out-of-the-box, HAProxy provides the Stats page, which gives you observability over a single load balancer instance. The Real-Time Dashboard aggregate information across multiple load balancers for easier and faster administration. You can enable, disable, and drain connection from any of your servers and the list can be filtered by server name or by server status.

Conclusion

In the blog post, you learned various ways to health check your servers so that your APIs maintain a high level of reliability. This includes enabling active or passive health checks, communicating with an external agent, and using queue length as an early warning signal for failure.

HAProxy Enterprise offers expert support services and a variety of advanced security modules that are necessary in today’s threat landscape. Want to learn more? Sign up for a free trial or contact us. You can stay in the loop about topics like these by subscribing to our blog or following us on Twitter. You can also join the conversation on Slack.