Marco.org

I’m a programmer, writer, podcaster, geek, and coffee enthusiast.

Google’s decreasingly useful, spam-filled web search

Jeff Atwood, in Trouble In the House of Google:

People whose opinions I respect have all been echoing the same sentiment — Google, the once essential tool, is somehow losing its edge. The spammers, scrapers, and SEO’ed-to-the-hilt content farms are winning.

(via Anil Dash’s nice roundup on the issue)

I’ve been frustrated as well by Google’s apparent defeat by spam. It’s not a sudden issue — it’s been gradually worsening for a few years.

When I ask Google for something, it’s usually one of these types of queries: “address bar”, “reference”, “guide”, or “product research”.

Over the years, the impact of spam — mostly affiliate marketing and auto-generated splogs — has decimated the usefulness of the “product research” category. It’s impossible to do any meaningful product research with Google.

But recently, spam has taken over the “guide” query results, and even many “reference” queries. It wouldn’t surprise me if spam started defeating even the “address bar” queries, since Google’s ranking algorithms have recently had a lot of trouble detecting the canonical source of duplicated content.

In other words, it’s now nearly impossible to find good results for many commonly asked types of queries.

Part of what exacerbates this is the apparent recent explosion of cheap “content” sites that try to answer every search query ever asked. Like affiliate-marketing spam, much of it seems to be generated by humans (technically; I wouldn’t classify them as such), but it’s functionally useless: sites like About.com, eHow, and countless clones with .info domain names that promise to address every niche question and informational topic, but whose content lacks all quality and substance.

Google was designed to play the role of a passive observer of the internet: web content was created for people, not specific Google queries, and Google would look around, take inventory of what was available, and give it to people who asked. Google’s general, big-picture algorithms probably haven’t changed much since the days when this was relatively accurate.

But that’s no longer what web content looks like. Now, massive amounts of technically-not-spam sites are generated by penny-hungry affiliate marketers and sleazy web “content” startups to target long-tail Google queries en masse, scraping content from others or paying low-wage workers to churn out formulaic, minimally nutritious pages to answer them.

Searching Google is now like asking a question in a crowded flea market of hungry, desperate, sleazy salesmen who all claim to have the answer to every question you ask.

“Hey, anyone know how to wire an outlet?”

“Did you say ‘how to wire an outlet’?”
“I can help you with how to wire an outlet!”
“Here is info on how to wire an outlet!”
“Bargain prices on how to wire an outlet!”
“Guide to wiring outlets in New York, right here!”

And none of them actually know a damn thing about what you’re asking, of course — they’re just offering meaningless, valueless words that seem to form sentences until you actually try to make use of them.

They call this “content”. But it’s not, really — it’s filler. And by a more common-sense definition, it’s spam. But Google either doesn’t think so, or is so overwhelmed by its volume that it has seemingly stopped trying to keep it under control. (I’m betting on the former.)

One solution may be for Google to radically change its algorithms and policies for web search to de-emphasize phrase-matching and more strongly prioritize inbound links and credibility. It could also, in what’s probably a huge departure for the company, have human employees use their opinions of site quality to manually adjust the relevance of domains.
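To make that idea concrete, here’s a toy sketch of what such a re-weighting might look like. This is purely my illustration, not how Google actually ranks anything: the `Page` fields, the weights, the `MANUAL_ADJUSTMENTS` table, and the `contentfarm.example` domain are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    domain: str
    phrase_match: float  # 0..1: how closely the page text matches the query (assumed signal)
    inbound_links: int   # number of links from other sites (assumed signal)
    credibility: float   # 0..1: hypothetical trust score for the linking sites


# Hypothetical human-curated multipliers: an editor's opinion of a domain's quality.
MANUAL_ADJUSTMENTS = {"contentfarm.example": 0.1}


def score(page: Page, w_phrase: float = 0.2, w_links: float = 0.8) -> float:
    """Weight links and credibility well above raw phrase matching."""
    # log1p dampens huge link counts so they help without dominating completely.
    link_signal = math.log1p(page.inbound_links) * page.credibility
    base = w_phrase * page.phrase_match + w_links * link_signal
    # Apply the manual per-domain adjustment (1.0 means "no opinion").
    return base * MANUAL_ADJUSTMENTS.get(page.domain, 1.0)


def rank(pages: list[Page]) -> list[Page]:
    """Return pages ordered from highest to lowest score."""
    return sorted(pages, key=score, reverse=True)
```

With weights like these, a page that repeats “how to wire an outlet” perfectly but has few credible inbound links should rank below a well-linked page that matches the phrase less exactly, and a manually demoted domain falls further still.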

But I doubt we’ll see real progress. Instead, I expect Google’s unwillingness to address this issue to create critical-mass demand for (and then, hopefully, a supply of) good content, reference information, and product recommendations.

Much of this will be (or currently is) solved the old-fashioned way: personal recommendations and trusted authorities. But these can’t cover the breadth of available information that web searchers need. I don’t know what will, or when, but it’s desperately needed.