Zero-downtime deployment with HAProxy as load balancer

I want to deploy new versions of an application with no downtime. It turns out to be a bit tricky. Here is one solution that sort of works.

The Problem

I am not in control of the deployment process. All I can do is monitor a URL and stop sending traffic to it if there are errors.

I want to deploy small changes often to reduce the risk associated with large deploys. This is not a distributed system with lots of small services; it is a monolith that is redeployed often.

The Solution

The solution is to have more than one server handling the load and to divide the traffic between these servers. The technique is called load balancing and is not new. All I have to do is set up a load balancer and configure it properly.

Two categories of load balancers

Load balancers work either on layer 4, the transport layer, or on layer 7, the application layer. The layers refer to the OSI model. I want to load balance a web application, so a layer 7 load balancer is what I need.

Using HAProxy as a layer 7 load balancer does the trick.

Installing HAProxy

The installation of HAProxy differs between systems. I installed it on Ubuntu 16.04 like this:

    apt-get install software-properties-common
    add-apt-repository ppa:vbernat/haproxy-1.7

    apt-get update
    apt-get install haproxy

I found the instructions at https://haproxy.debian.net/ and was able to install the latest version, 1.7 as of this writing.

Configure HAProxy

Installing HAProxy was the easy part; the real work was tuning its configuration. I ended up with this configuration in /etc/haproxy/haproxy.cfg:


    global
        log /dev/log local0
        log /dev/log local1 notice
        maxconn 2000
        chroot /var/lib/haproxy
        stats socket /run/haproxy/admin.sock mode 660 level admin
        stats timeout 30s
        user haproxy
        group haproxy
        daemon

    defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        timeout connect  5000
        timeout client  10000
        timeout server  10000

    frontend loadbalanser
        stats enable
        stats uri     /admin?stats
        bind *:80
        mode http
        default_backend gfr

    backend gfr
        stats enable
        stats uri /admin?stats
        mode http
        balance roundrobin
        option forwardfor
        http-request set-header X-Forwarded-Port %[dst_port]
        option httpchk GET /service/foretag/6.0/ws?wsdl
        server gfr1 l7700744.ata.ams.se:8580 check rise 8 downinter 30000ms observe layer7 on-error mark-down
        server gfr2 l7700745.ata.ams.se:8580 check rise 8 downinter 30000ms observe layer7 on-error mark-down

The most important part is the last two lines. They specify the two servers that should handle the load.

server - indicates that this line specifies a server
gfr1 - a logical name for the instance
l7700744.ata.ams.se:8580 - the host and port where the application is served
check - enables health checks for this server. The option httpchk line defines how the check is done
rise 8 - the number of consecutive successful health checks needed before the server is considered operational
downinter 30000ms - the time between health checks when the server is down. In this case, 30 seconds
observe layer7 - also monitor the HTTP response codes of regular traffic
on-error mark-down - mark the server as down when the observed traffic reaches the error limit (error-limit, 10 consecutive errors by default)
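
To get a feeling for how long a redeployed server can stay out of rotation with these values, the settings can be turned into a rough upper bound. This is a back-of-the-envelope sketch, not a measurement. It assumes HAProxy's default check interval of 2000 ms, since inter is not set on the server lines, and it assumes that checks run at that interval once the first one has succeeded, which is how the HAProxy documentation describes servers that are on their way up. The function name is made up for this example.

```shell
#!/bin/sh
# Rough upper bound, in milliseconds, for how long a server that is ready
# again stays marked as down.
#   $1 = downinter: check interval while the server is fully down
#   $2 = rise:      successful checks needed before the server is up
#   $3 = inter:     check interval while the server is on its way up
#                   (assumption: the HAProxy default of 2000 ms)
worst_case_recovery_ms() {
    downinter=$1
    rise=$2
    inter=${3:-2000}
    # Wait at most one downinter for the first successful check, then
    # rise - 1 further checks spaced inter apart.
    echo $(( downinter + (rise - 1) * inter ))
}

worst_case_recovery_ms 30000 8
```

With the values from the server lines this prints 44000, so under these assumptions a server can stay out of rotation for up to roughly 44 seconds after the application is ready.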

The real magic, and tuning, was finding values for the server specification that allow a deploy while the servers are in use. To simulate use, I put load on the servers with Gatling.

The health check is an HTTP call to a URL that serves the WSDL for a web service. If the WSDL isn't available, the application isn't up and running.

option httpchk - an HTTP check should be done to verify that the application is alive
GET - the HTTP verb to use for the check
/service/foretag/6.0/ws?wsdl - the URL that should respond properly
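
The same check can be approximated by hand with curl, which is handy when trying to understand why HAProxy considers a server up or down. This is only a sketch: the function names are made up, the URL is pieced together from the server line and the httpchk line above, and curl does not send exactly the same request as HAProxy's checker. HAProxy treats 2xx and 3xx responses to an HTTP check as healthy, and the sketch does the same.

```shell
#!/bin/sh
# Assumption: host, port and path are combined from the configuration above.
CHECK_URL="http://l7700744.ata.ams.se:8580/service/foretag/6.0/ws?wsdl"

# Succeeds for 2xx and 3xx status codes, fails for everything else.
is_healthy_status() {
    case "$1" in
        2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Fetch the URL and print only the HTTP status code.
# curl prints 000 when no HTTP response was received.
probe_status() {
    curl --silent --output /dev/null --max-time 5 --write-out '%{http_code}' "$1"
}

# Example:
#   if is_healthy_status "$(probe_status "$CHECK_URL")"; then
#       echo up
#   else
#       echo down
#   fi
```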

Result

The load balancing works. When a server responds with errors, that particular server is marked as down. It comes back when the deploy is done and the expected WSDL is available again.

I still lose a few calls during deployment. With constant load, about twice the production load, I lose approximately ten calls per server when they are reinstalled. That's not good, but given that I'm not able to alter the deploy process, I guess it will have to do.
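
Counting lost calls does not require a full load testing tool. The sketch below is a crude stand-in, not the Gatling setup used here: it tallies failed calls from a stream of HTTP status codes. The function name is made up, and the commented example assumes that HAProxy is reachable on localhost port 80, which is a guess.

```shell
#!/bin/sh
# Count failed calls in a stream of HTTP status codes, one per line.
# Anything outside 2xx/3xx counts as failed, including curl's 000
# (no HTTP response received).
count_failed_calls() {
    failed=0
    while read -r status; do
        case "$status" in
            2[0-9][0-9]|3[0-9][0-9]) ;;
            *) failed=$((failed + 1)) ;;
        esac
    done
    echo "$failed"
}

# Example (hypothetical address), 1000 sequential calls during a deploy:
#   i=0
#   while [ "$i" -lt 1000 ]; do
#       curl --silent --output /dev/null --write-out '%{http_code}\n' \
#           "http://localhost/service/foretag/6.0/ws?wsdl"
#       i=$((i + 1))
#   done | count_failed_calls
```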

I wish I could find a setting that resends a failed call once to another server, but I can't find one that works. The option redispatch seemed promising, but it didn't work well for me. With option redispatch and retries set, I lost more traffic than without them.

A better result

If I could change the deploy process, I would remove the server that is about to be redeployed from the load balancer before the deploy. HAProxy is really good at reloading its configuration. A script that removes a server, reloads HAProxy's configuration, performs the deployment, adds the server again, and finally reloads the configuration would not be too hard to write. This would give me a real zero-downtime deployment, not just the short-downtime deployment I am able to achieve with this setup.
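
Such a script could look roughly like the sketch below. It has not been run against the real deployment and rests on several assumptions: the server is "removed" by commenting out its line in the configuration file, HAProxy is managed by systemd, and the actual deploy step is a hypothetical command supplied by the caller. All function names are made up for this example.

```shell
#!/bin/sh
# Assumption: the configuration lives in the default location.
CFG="${CFG:-/etc/haproxy/haproxy.cfg}"

# Comment out the "server <name> ..." line in the configuration.
remove_server() {
    sed -i.bak -E "s/^([[:space:]]*)(server[[:space:]]+$1[[:space:]])/\1#\2/" "$CFG"
}

# Undo remove_server by dropping the leading "#".
add_server() {
    sed -i.bak -E "s/^([[:space:]]*)#(server[[:space:]]+$1[[:space:]])/\1\2/" "$CFG"
}

# Validate the configuration first and reload only if it is valid.
reload_haproxy() {
    haproxy -c -f "$CFG" && systemctl reload haproxy
}

# Usage: deploy_one <server name> <deploy command...>
# If the deploy command fails, the server stays out of the configuration.
deploy_one() {
    name=$1
    shift
    remove_server "$name" || return 1
    reload_haproxy || return 1
    "$@" || return 1    # hypothetical deploy step, e.g. ./deploy.sh gfr1
    add_server "$name" || return 1
    reload_haproxy
}
```

Calling deploy_one gfr1 followed by the deploy command, and then the same for gfr2, would redeploy one server at a time while the other keeps serving traffic.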

Conclusion

HAProxy works very well. It is possible to re-configure it during usage without losing traffic.
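
Besides reloading the configuration file, a running HAProxy can also be told to take a server in and out of rotation through its admin socket, which the configuration above already declares with stats socket and level admin. This is a different technique from the reload-based approach, shown only as a sketch. It assumes that socat is installed and that the socket path and backend name match the configuration above. The function names are made up.

```shell
#!/bin/sh
# Assumptions: socket path and backend name as in the configuration above.
SOCKET="${SOCKET:-/run/haproxy/admin.sock}"
BACKEND="${BACKEND:-gfr}"

# Send one command to HAProxy's admin socket.
# With DRY_RUN=1 the command is only printed, not sent.
haproxy_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$1"
    else
        echo "$1" | socat stdio "$SOCKET"
    fi
}

# Put a server into maintenance so it receives no traffic.
take_out() {
    haproxy_cmd "set server $BACKEND/$1 state maint"
}

# Take the server out of maintenance again.
put_back() {
    haproxy_cmd "set server $BACKEND/$1 state ready"
}

# Example:
#   take_out gfr1
#   ... deploy to gfr1 ...
#   put_back gfr1
```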

Acknowledgements

I would like to thank Malin Ekholm for proofreading.

Resources

HAProxy - an open source load balancer
Thomas Sundberg - author