Marco.org

I’m a programmer, writer, podcaster, geek, and coffee enthusiast.

The lesson of the Sidekick failure

Earlier this week, it looked almost certain that all data stored on T-Mobile Sidekick devices, including contacts, calendars, messages, and photos, had been lost in a major infrastructure disaster at Microsoft. Fortunately, Microsoft is now claiming that “most, if not all” of the user data will be restored. But the Sidekick products and T-Mobile have suffered irreparable reputation damage.

It’s easy to jump on Microsoft about this, but I can’t fault them entirely. They absolutely should have had offline, offsite backups. But major outages and data loss happen all the time. Our industry is based on incredibly complex and interwoven systems that shuffle massive amounts of data around, frequently hitting physical and practical storage, bandwidth, and performance limits that need to be worked around in ways that necessarily add complexity, dependencies, and potential failure scenarios.

The real problem is the Sidekick’s design, as I learned from TidBITS’ take on the disaster:

Unlike many smartphones, Danger-based phones store data in a cloud - servers located hither and yon that you don’t manage, but are imagined to be universally and continuously accessible. These phones retrieve information as necessary and cache a temporary copy on the phone, a copy that’s not intended to be a permanent set of stored records. The data also isn’t intended to be synced to a computer as with a BlackBerry, iPhone, Android, or other smartphone, but accessed via a Web site or a phone.

It sounds more like the Sidekick data lives on something akin to an IMAP server: you can have some local copies, but the server is master. With IMAP, the copies persist even after a computer or phone restarts, of course, just as you might expect. But in the Danger approach, locally cached data is erased on restart or if the battery runs out of power.

This design, not a failed SAN upgrade with no backups, is the most severe flaw and most negligent mistake here. It didn’t technically cause the Sidekick disaster, but it dramatically increased its severity from an inconvenient service outage to a complete loss of all customers’ data.

The “cloud” (hosted, centrally managed services) cannot be your only copy of data. Just as RAID is not its own backup, cloud services are not inherently backed up, although they usually make every effort to maintain data integrity and regular backups. But even when done well, that only covers a subset of loss scenarios. For example, if someone gains access to your account and “legitimately” (as far as the service is concerned) deletes your data, or a botched sync operation unhelpfully synchronizes a mass deletion across all sync clients, the cloud infrastructure probably can’t help you. Even if the service has offline backups, the chances of its staff retrieving them just to recover your old files after an isolated incident are slim.
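To make the botched-sync scenario concrete, here is a toy sketch. It is not any real service's sync protocol: `sync_all`, the sample contacts, and the three dictionaries are all hypothetical, and the "sync" is the most naive mirror imaginable. The point is only that a mirror faithfully copies a deletion, while a point-in-time copy kept outside the sync loop does not.

```python
import copy


def sync_all(replicas, source):
    """Naive mirror sync: every replica becomes an exact copy of `source`."""
    for replica in replicas:
        replica.clear()
        replica.update(source)


# Two devices plus the hosted service all mirror the same contacts.
contacts = {"alice": "555-0100", "bob": "555-0101"}
phone, laptop, cloud = dict(contacts), dict(contacts), dict(contacts)

# An independent backup: a point-in-time copy that sync never touches.
snapshot = copy.deepcopy(cloud)

# A buggy client (or an intruder) empties one copy, and sync propagates it.
phone.clear()
sync_all([laptop, cloud], phone)

print(len(cloud), len(laptop))  # prints "0 0": both mirrors are now empty
print(len(snapshot))            # prints "2": the snapshot is untouched
```

As far as the service is concerned, nothing went wrong here: every copy agrees with every other copy. Only the snapshot, which was never part of the sync, still has the data.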

You aren’t in control of your data if you can’t easily and frequently make useful backups onto your own computer and your own media.

I recognize that it’s hypocritical for me to say this as the lead developer of Tumblr, which does not yet offer an automated feature for users to download backups of their blog content. So I took some time this week and started to write one. I’m happy to announce that Tumblr will be releasing an easy backup tool in the coming weeks. (I will also make an easy backup feature for Instapaper shortly.)

All of my blog’s content, with images, is less than 200 MB. A list of my entire Instapaper reading history is less than 1 MB. The sum of my contacts and calendar data, synced by MobileMe, is probably less than 5 MB. That’s nothing, and given how much time I’ve put into the creation of all of this data, and that it would only consume a third of a $0.26 Taiyo Yuden CD-R (or less than 5% of a $0.45 TY DVD+R), it’s embarrassing that offline backups onto my own media haven’t become routine.
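As a rough check of that arithmetic, here is a small sketch. The data sizes are the upper bounds quoted above; the disc capacities are my assumptions, since the paragraph doesn't state them: a nominal 700 MB CD-R and a 4.7 GB (4,700 MB) single-layer DVD+R.

```python
# Upper bounds from the paragraph above, in MB.
DATA_MB = {"blog": 200, "instapaper": 1, "contacts_calendar": 5}

# Assumed nominal capacities (not stated in the post), in MB.
CD_R_MB = 700
DVD_R_MB = 4700


def fraction_used(total_mb, capacity_mb):
    """Return the share of a disc that `total_mb` of data would occupy."""
    return total_mb / capacity_mb


total_mb = sum(DATA_MB.values())
cd_fraction = fraction_used(total_mb, CD_R_MB)
dvd_fraction = fraction_used(total_mb, DVD_R_MB)

print(f"total: {total_mb} MB")              # total: 206 MB
print(f"CD-R used: {cd_fraction:.1%}")      # CD-R used: 29.4%
print(f"DVD+R used: {dvd_fraction:.1%}")    # DVD+R used: 4.4%
```

Under those assumptions, everything fits in a bit under a third of a CD-R and under 5% of a DVD+R, which matches the figures above.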

So I’m hereby starting the trend of backing up my hosted data just as carefully, completely, and frequently as my local files. I know this won’t spread to most people, because most people don’t care. But I certainly do, and if you’ve made it this far into this post, you probably do, too.
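One way to make that routine is to keep each export as its own dated file instead of overwriting the last one. The sketch below is only an illustration under a few assumptions: `save_backup` is a hypothetical helper, the directory layout and `.bak` extension are arbitrary choices, and actually fetching the export is left out because it differs for every service.

```python
import datetime
import pathlib


def save_backup(export_bytes, backup_dir, service, now=None):
    """Write one export to its own dated file and return the path.

    `export_bytes` is whatever the service's export feature produced;
    how it gets downloaded is deliberately outside this sketch.
    """
    now = now or datetime.datetime.now()
    directory = pathlib.Path(backup_dir) / service
    directory.mkdir(parents=True, exist_ok=True)
    # A new file per run, so a bad export never overwrites a good one.
    path = directory / f"{service}-{now:%Y%m%d-%H%M%S}.bak"
    path.write_bytes(export_bytes)
    return path
```

Calling `save_backup(data, "/some/backup/dir", "blog")` on a schedule would accumulate files such as `blog-20091014-090000.bak`, each a complete point-in-time copy that can then be burned to disc or copied to other media.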

If my data suddenly and permanently disappears from a hosted service, it should only be an inconvenience, not a loss.