It’s interesting how much web-app downtime people will tolerate when it’s something non-critical. And these are big sites. Tonight Digg was down for a while. Digg. And Twitter has become almost beloved for its frequent downtime.
I take huge precautions to make sure that Tumblr never has any downtime. Hell, I don’t even let Marco.org have any downtime. Even with huge revisions like DB schema changes, I always code it such that there isn’t a single dropped connection or user-facing error. It doesn’t always work out that way, of course: I’ve made mistakes that caused a few minutes of downtime here and there. But I never consider downtime an acceptable option during development.
There are some MySQL configuration options that I’d like to change, but I’ve worked around them instead: the Tumblr database server’s mysqld process has never been intentionally restarted, and there’s no MySQL equivalent (that I know of) to Apache’s “graceful” restart (in which it staggers child process restarts such that no connections are dropped).
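To make the “graceful” idea concrete, here’s a toy sketch in Python. It is only a model of the staggering concept described above, not how Apache actually implements it: the `Worker` class, the `graceful_restart` function, and the config version numbers are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Worker:
    """Hypothetical stand-in for one server child process."""
    worker_id: int
    config_version: int
    active_requests: int = 0

    def drain(self) -> None:
        # Let in-flight requests finish before exiting, so nothing is cut off.
        self.active_requests = 0


def graceful_restart(pool: List[Worker], new_version: int) -> int:
    """Replace workers one at a time with the new config.

    Returns the lowest number of workers that were still available
    at any point during the restart.
    """
    min_available = len(pool)
    for i, old in enumerate(pool):
        # Only one worker is out of rotation at a time; the rest keep serving.
        available = len(pool) - 1
        min_available = min(min_available, available)
        old.drain()
        pool[i] = Worker(old.worker_id, new_version)
    return min_available


# Four workers on config version 1, each with two requests in flight.
pool = [Worker(n, config_version=1, active_requests=2) for n in range(4)]
lowest = graceful_restart(pool, new_version=2)
```

In this model the pool never drops below three available workers out of four, and every worker ends up on the new config. That’s the property I’d want from a database restart, and a single mysqld process doesn’t give you anything to stagger.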
I couldn’t even imagine telling our users, “We’re going to be down for the next 6 hours while we run a bunch of database maintenance and ALTER TABLEs.”
But looking at other sites, maybe I’m a bit too narrow-minded on that. Sure, Amazon has trouble if they lose a few minutes’ worth of sales. But if I can’t make a smartass comment about the latest Apple rumor to 105 of my friends and followers right this second, the world isn’t going to end.