
Research on new antispam measures

  April 9, 2009, by Francois Planque • Category: Research

One of our current areas of research is better comment spam filtering.

Comment spam is mainly the result of spammers trying to place one or more links to their own site on a blog powered by b2evolution. What they gain from such links is SEO value (which is outside the scope of this article). If comments are published with the links as posted, spammers get a very easy way to perform what is called "mass link building" at very low cost, especially through the use of automated processes ("spambots").

The standard protection against such comment spam is to mark these links with rel="nofollow". Most web publishing platforms do this nowadays. However, we found that spammers will:

  • Either not care about, or not notice, the nofollow attribute
  • Or still find value in the links...

Either way, they will continue to spam blogs as long as they see the links coming through.
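
As a rough illustration of that standard protection, the sketch below rewrites every link in a comment so that it carries rel="nofollow". It is written in Python for brevity and is not b2evolution's own code: the function name add_nofollow and the regex-based approach are assumptions made for this example, and a production filter would more likely rely on a real HTML parser.

```python
import re

# Hypothetical helper names; none of this is taken from b2evolution.
_A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
_REL_ATTR = re.compile(r"""\srel\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""", re.IGNORECASE)


def add_nofollow(comment_html: str) -> str:
    """Return comment_html with rel="nofollow" forced on every <a> tag."""

    def fix(match: re.Match) -> str:
        # Drop any rel attribute the commenter supplied, then add our own.
        attrs = _REL_ATTR.sub("", match.group(1))
        return f'<a{attrs} rel="nofollow">'

    return _A_TAG.sub(fix, comment_html)


if __name__ == "__main__":
    print(add_nofollow('Nice post! <a href="http://example.com">my site</a>'))
```

The link is still displayed to visitors, which is why spammers who ignore the attribute keep posting.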

One of the approaches the b2evolution team has experimented with over the past year is a central antispam database, where all affected blog owners can report spam and benefit from the reports of other users.

This database has proven effective up to a point and still manages to filter out the bulk of the spam.
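
To make the idea concrete, here is a minimal sketch of how a blog could check an incoming comment against such a shared list. It assumes the list is a plain set of lowercase keyword and URL fragments; the function name is_blacklisted and the sample entries are invented for illustration and do not describe the actual database format.

```python
def is_blacklisted(comment_text: str, blacklist: set[str]) -> bool:
    """Return True if any blacklisted fragment appears in the comment.

    Hypothetical helper: matching is a case-insensitive substring test.
    """
    lowered = comment_text.lower()
    return any(fragment in lowered for fragment in blacklist)


if __name__ == "__main__":
    # Invented sample entries, standing in for reports from other blog owners.
    shared_blacklist = {"cheap-pills.example", "casino"}

    print(is_blacklisted("Visit http://cheap-pills.example/buy", shared_blacklist))
    print(is_blacklisted("Great shoes at http://brand-new-domain.example", shared_blacklist))
```

A list like this can only catch fragments that somebody has already reported.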

However, as spam develops beyond the traditional high-revenue areas known as PPC ("Porn, Pills & Casino"), it gets harder to maintain a database of spam keywords and URLs targeting new spam areas such as shoes and travel. Spammers may also register several dozen new domains per day so as to bypass antispam blacklists.

This leads us to research new spam filtering technologies that rely more on telling human behaviour apart from automated behaviour.

We are now researching the following ideas:

  • CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart) that require the commenter to have read the article instead of just recognizing a few letters.
  • Discarding comments if the commenter has not spent enough time on the blog to have plausibly read the article first.
  • Making comment forms harder or impossible to find through search engines.

All these methods have in common that they aim to defeat automated comment posting, or at least to raise the cost of such activities significantly, mainly by requiring more time per spam comment from the spammer or spambot.

It is not clear how much success we may expect from these ideas, but we will publish more once we have experimented with them.


1 comment

Comment from: Steve [Visitor]

Spam is the scourge of the web and wish you luck in finding a solution. I doubt, unfortunately, your success though. I do quite like the idea of a time delay, but surely the bots can simply add a time delay to their software?

01/20/11 @ 16:15