Here's the quick way I set up zero-downtime deployment on our two production servers.
Most Ruby on Rails sites that use mongrel clusters will have entries along these lines in their Apache config files:
# Check for maintenance file and redirect all requests
# ( this is for use with Capistrano's disable_web task )
RewriteCond %{DOCUMENT_ROOT}/maintenance.html -f
RewriteRule ^.*$ /maintenance.html [L]
# ...
# more exceptions and caching stuff, followed by
# ...
# All remaining requests get sent to the cluster
RewriteRule ^/(.*)$ balancer://jamglue%{REQUEST_URI} [P,L]
# ...
# Configure the cluster proxy
<Proxy balancer://jamglue>
BalancerMember http://127.0.0.1:4000
BalancerMember http://127.0.0.1:4001
# ...
</Proxy>
So the old deployment technique was to put a maintenance message in public/maintenance.html, restart that server's mongrels, then remove the message. That way users who hit the server would see a message about momentary downtime rather than a 503 error or an exception from a partially rolled-out release.
We got tired of having these momentary downtimes dictate our deployment style. There's no reason to wait until 2am to push out the day's changes, especially with two servers.
So here's my solution, adding a redirect to the other server instead:
# this goes up near the top of the rewrite rules
RewriteCond %{DOCUMENT_ROOT}/maintenance_redirect -f
RewriteRule ^/(.*)$ balancer://maintredir%{REQUEST_URI} [P,L]
# ...
# Configure the maintenance redirect proxy
<Proxy balancer://maintredir>
BalancerMember http://otherserver.jamglue.com:80
</Proxy>
Now instead of putting an error message at public/maintenance.html, we just touch public/maintenance_redirect and the server being restarted will seamlessly proxy all its traffic over to the other server. If you've got more than two servers, just add as many BalancerMember lines as you'd like.
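For more than two servers, the maintenance balancer might look something like the sketch below. The second hostname is made up for illustration; only otherserver.jamglue.com appears in our real config.

```apache
# Configure the maintenance redirect proxy with more than one target
<Proxy balancer://maintredir>
BalancerMember http://otherserver.jamglue.com:80
# Hypothetical third machine, not part of the original setup
BalancerMember http://thirdserver.jamglue.com:80
</Proxy>
```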
There are two improvements on this I've been thinking about.
The first is obviously to catch loops in case both machines are redirecting at the same time (I haven't tried this). It's easy enough to have the maintredir RewriteRule add an extra query string parameter, then check for it on all the servers so that you never redirect an already-redirected request. For more than two servers, the query string could even contain the hostname, so selective redirection could be done.
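This hypothetical query string could be stripped out by a rewrite rule so it never reaches the Rails app or the user. Note that the QSA flag appends the original query string rather than removing it; ending the substitution with a bare ? (or using the QSD flag on Apache 2.4) is what discards it.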
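An untested sketch of that loop guard could look like the following. The marker name maintredir=1 is invented for this example. Because QSA puts the client's original query string after whatever the rule adds, the marker always ends up first, which keeps the patterns simple.

```apache
# 1. Near the top of the rewrite rules: proxy to the other server,
#    but only if this request has not already been proxied once.
RewriteCond %{DOCUMENT_ROOT}/maintenance_redirect -f
RewriteCond %{QUERY_STRING} !^maintredir=1(&|$)
# QSA appends the client's original query string after the marker
RewriteRule ^/(.*)$ balancer://maintredir%{REQUEST_URI}?maintredir=1 [P,QSA,L]

# 2. Just before the final "all remaining requests" rule: drop the
#    marker and hand the request to the local cluster. If nothing
#    follows the marker, the substitution ends in a bare "?", which
#    discards the query string entirely.
RewriteCond %{QUERY_STRING} ^maintredir=1&?(.*)$
RewriteRule ^/(.*)$ balancer://jamglue%{REQUEST_URI}?%1 [P,L]
```

Any exception or caching rules that sit between the two would still see the marker, so they may need to ignore it. Since the proxying happens inside Apache, the user's browser never sees the extra parameter.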
The second is adding another virtual host or cookie that overrides the redirect flag. That way you can roll your new code to one server, and be the only one to see it up and fully running on your production hardware. When you're happy, you flip the site over and upgrade the other server (which you could then also check).
I had a cookie doing this for testing purposes, but removed it for simplicity's sake. It's as simple as adding a line like:
RewriteCond %{HTTP_COOKIE} !dontredirectme
to the RewriteCond chain, and giving yourself a site cookie with the same name.
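Put together with the redirect rule from earlier, the whole chain would read roughly like this (a sketch; the cookie match is a plain substring test, so any cookie whose name or value contains dontredirectme will skip the redirect):

```apache
# Redirect only when the flag file exists AND the override cookie is absent
RewriteCond %{DOCUMENT_ROOT}/maintenance_redirect -f
RewriteCond %{HTTP_COOKIE} !dontredirectme
RewriteRule ^/(.*)$ balancer://maintredir%{REQUEST_URI} [P,L]
```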
And of course, these could be combined: the automatically added query string that overrides the redirect could double as the testing method.
All of this applies to a single machine too, and is even simpler there. Just set up two different balancer clusters and only use each if the appropriate redirect file isn't present. Separate code bases for each would be kind of messy, but certainly the safest.
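A rough single-machine version might look like this. The flag file names, the cluster names jamglue_a and jamglue_b, and ports 4100 and 4101 are all assumptions made up for the sketch. As written it sends everything to cluster A whenever A is enabled and only falls back to B otherwise, rather than spreading load across both.

```apache
# Use cluster A unless its flag file is present
RewriteCond %{DOCUMENT_ROOT}/maintenance_redirect_a !-f
RewriteRule ^/(.*)$ balancer://jamglue_a%{REQUEST_URI} [P,L]

# Otherwise use cluster B unless its flag file is present
RewriteCond %{DOCUMENT_ROOT}/maintenance_redirect_b !-f
RewriteRule ^/(.*)$ balancer://jamglue_b%{REQUEST_URI} [P,L]

<Proxy balancer://jamglue_a>
BalancerMember http://127.0.0.1:4000
BalancerMember http://127.0.0.1:4001
</Proxy>

# Hypothetical second set of mongrels on the same machine
<Proxy balancer://jamglue_b>
BalancerMember http://127.0.0.1:4100
BalancerMember http://127.0.0.1:4101
</Proxy>
```

To restart cluster A you would touch public/maintenance_redirect_a, restart its mongrels, and remove the file again; the same goes for B.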
Because the two servers share a DB, certain deployments may still require brief downtime. However, since we use multi-master replication between the two machines, it's certainly tempting to try to configure a temporary stop of replication from one DB for the duration of the deployment and then let the other catch up once everything looks good... it's probably not worth it though.