Marco.org

I’m : a programmer, writer, podcaster, geek, and coffee enthusiast.

On database joins

@codinghorror:

basically, it’s like this: JOINS ARE EXPENSIVE. Period. Really, really expensive. I’m not kidding.

You’re right. A lot of people overuse joins because they mask a lot of the underlying complexity of what the database actually has to do to achieve what you just asked of it. And they work perfectly well if your tables are small, there isn’t much concurrent access, and performance doesn’t matter.

Unfortunately, for a popular website, your tables are huge, there’s tons of concurrent access, and performance really matters.

Joins also hurt scaling efforts later: what if you move one high-traffic table to a separate database server? No more joins against it.

Don’t underestimate the benefits of some denormalization and avoiding joins.

Fortunately, they’re really quite easy to avoid. For the common case of matching a relational table to its parent objects, we do a simple two-query substitution like this:

$user_ids = $following->query_return_column_array(
    'SELECT user_id FROM ?table WHERE following_id = ?i', $this->id
);
$followed_users = $user->find(
    'SELECT * FROM ?table WHERE id IN ?ai', $user_ids
);

I don’t think there’s a single join in all of Tumblr’s or Instapaper’s code.