March 22, 2007

Top story on Digg states "75 per cent of Google's blogspot blogs are spam."

The dugg story is a post from infoniac reporting on a post on WebMasterWorld reporting on a wire story on Yahoo! News reporting on a joint project from Microsoft Research/UC-Davis to study web spam.

Unsurprisingly, some details got lost in translation ... literally, I think, as I'm gonna give the infoniac poster the benefit of the doubt and assume that no native English speaker would write "what would happen if certain World Wide Web structures, among which Google, Yahoo and other search engines, didn't fight spam?"

In any case, it turns out that the study didn't look at all URLs and count the spam. They took a bunch of commercial keywords like "prescription drugs" or "cheap tickets" and looked at what percentage of the sites they found on specific domains were spam. It's not terribly surprising that a high percentage of spammy sites would be found that way.

But that's far different from saying 75% of all blogspot blogs are spam.

Update: As Kevin points out in a comment, what the study actually found was that 75% of blogs likely to be spam actually were spam.
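
To make the difference concrete, here's a quick sketch of the arithmetic. Every number in it is made up purely for illustration (the study's real counts aren't in any of the posts above), and the function and variable names are my own. It just shows how a 75% spam rate inside a keyword-targeted sample can sit alongside a much lower rate across everything.

```python
# All numbers below are hypothetical, chosen only to illustrate the point.
# None of them come from the Microsoft Research/UC-Davis study.

def spam_rate(spam_count: int, total_count: int) -> float:
    """Return the fraction of items that are spam."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return spam_count / total_count

# Hypothetical population: every blog on the domain.
TOTAL_BLOGS = 1_000_000
TOTAL_SPAM = 40_000

# Hypothetical targeted sample: only blogs that turn up for
# commercial keywords like "prescription drugs" or "cheap tickets".
SAMPLED_BLOGS = 20_000
SAMPLED_SPAM = 15_000

sample_rate = spam_rate(SAMPLED_SPAM, SAMPLED_BLOGS)
overall_rate = spam_rate(TOTAL_SPAM, TOTAL_BLOGS)

print(f"spam rate in keyword sample: {sample_rate:.0%}")
print(f"spam rate across all blogs:  {overall_rate:.0%}")
```

With these invented counts the sample comes out at 75% spam while the domain as a whole is at 4%. The first number says nothing about the second unless you also know how the sample was picked.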

