Research on new antispam measures

Posted by Francois Planque on Apr 09, 2009 in Research

One of our current areas of research is better comment spam filtering.

Comment spam is mainly the result of spammers trying to place one or several links to their own site on a blog powered by b2evolution. The benefit they get out of such links is the SEO value of these links (which is outside the scope of this article). If comments are published with the links as posted by the spammers, it is indeed a very easy way for them to perform what is called "mass link building" at a very low cost, especially through the use of automated processes ("spambots").

The standard protection against such comment spam is to mark these links with rel="nofollow". Most web publishing platforms do this nowadays. However, we found that spammers will:

either not care about, or not notice, the nofollow attribute;
or still find value in the links.

Either way, they will continue to spam blogs as long as they see the links coming through.
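As a rough illustration of the nofollow protection described above, here is a minimal Python sketch. It is not b2evolution code: the function name add_nofollow and the regular expressions are assumptions made for this example, and a production filter would use a real HTML parser rather than regular expressions.

```python
import re

# Hypothetical helper (not part of b2evolution): forces rel="nofollow"
# on every <a> tag found in a comment body.
_A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
_REL_ATTR = re.compile(r"""\s+rel\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""", re.IGNORECASE)


def add_nofollow(comment_html: str) -> str:
    """Return comment_html with rel="nofollow" set on all links."""

    def fix(match: re.Match) -> str:
        # Drop any rel attribute the poster supplied, then add our own.
        attrs = _REL_ATTR.sub("", match.group(1))
        return f'<a{attrs} rel="nofollow">'

    return _A_TAG.sub(fix, comment_html)


if __name__ == "__main__":
    print(add_nofollow('Nice post! <a href="http://example.com">my site</a>'))
```

The link still appears in the published comment, which is consistent with the observation above: the attribute changes how search engines treat the link, not whether the spammer sees it coming through.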

One approach the b2evolution team has experimented with over the past year is a central antispam database, where affected blog owners can report spam and benefit from reports made by other users.

This database has proven effective up to a point, and it still filters out the bulk of the spam.

However, as spam develops beyond the traditional high revenue areas known as PPC ("Porn, Pills & Casino"), it gets harder to maintain a database of spam keywords and URLs targeting new spam areas such as shoes and travel. Spammers may also register several dozen new domains per day so as to bypass antispam blacklists.
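To make the limitation concrete, here is a minimal Python sketch of keyword and domain blacklist matching. It is only an illustration under stated assumptions: the names is_blacklisted and extract_domains, the sample keywords and the sample domain are all hypothetical and do not reflect the actual contents or API of the central antispam database.

```python
import re
from urllib.parse import urlparse

# Illustrative data only; a real shared database would be far larger.
BLACKLISTED_KEYWORDS = {"casino", "pills"}
BLACKLISTED_DOMAINS = {"spam-example.test"}

URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def extract_domains(text):
    """Collect the lowercased host names of all URLs found in text."""
    domains = set()
    for url in URL_PATTERN.findall(text):
        host = urlparse(url).hostname
        if host:
            domains.add(host.lower())
    return domains


def is_blacklisted(comment_text):
    """True if the comment contains a listed keyword or links to a listed domain."""
    lowered = comment_text.lower()
    if any(keyword in lowered for keyword in BLACKLISTED_KEYWORDS):
        return True
    return any(
        domain == bad or domain.endswith("." + bad)
        for domain in extract_domains(comment_text)
        for bad in BLACKLISTED_DOMAINS
    )


if __name__ == "__main__":
    print(is_blacklisted("Visit http://www.spam-example.test/page"))
    print(is_blacklisted("Cheap shoes at http://brand-new-domain.test"))
```

In this sketch the second comment passes the filter, because neither its topic nor its freshly registered domain has been reported yet.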

This leads us to research new spam filtering technologies based more on distinguishing human behaviour from automated behaviour.

We are now researching the following ideas:

CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart) that require the commenter to have read the article instead of just recognizing a few letters.
Discarding comments if the commenter has not spent enough time on the blog to plausibly have read the article first.
Making comment forms harder or impossible to find through search engines.

All these methods have in common that they aim to defeat automated comment posting, or at least to raise the cost of such activities significantly, mainly by requiring more time per spam comment from the spammer or spambot.
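The time-based idea could be sketched as follows in Python. This is an assumption-laden illustration, not an implemented b2evolution feature: the function names, the secret key and the 20 second threshold are all hypothetical. The page embeds a signed timestamp in the comment form, and the server rejects submissions that come back too quickly or with a forged token.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"replace-with-a-private-random-value"  # assumption: per-site secret
MIN_SECONDS_ON_PAGE = 20  # assumption: arbitrary threshold, not a measured value


def _sign(issued_at):
    """HMAC-SHA256 signature of the timestamp, as a hex string."""
    return hmac.new(SECRET_KEY, str(issued_at).encode(), hashlib.sha256).hexdigest()


def issue_form_token(now=None):
    """Return 'timestamp:signature' to embed in a hidden form field."""
    issued_at = int(time.time() if now is None else now)
    return f"{issued_at}:{_sign(issued_at)}"


def comment_allowed(token, now=None):
    """Accept the comment only if the token is genuine and old enough."""
    try:
        issued_text, signature = token.split(":", 1)
        issued_at = int(issued_text)
    except ValueError:
        return False  # malformed token
    # Compare as bytes so that non-ASCII input cannot raise an error.
    if not hmac.compare_digest(signature.encode(), _sign(issued_at).encode()):
        return False  # timestamp was tampered with
    current = time.time() if now is None else now
    return current - issued_at >= MIN_SECONDS_ON_PAGE
```

A spambot can of course wait before posting, but each wait adds to the time spent per spam comment, which is the cost increase these methods are aiming for.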

It is not clear how much success we may expect from these ideas but we will publish more after experimentation.

I am the founder and maintainer of the b2evolution project.

I started b2evolution in early 2003 after the "b2" project was abandoned by its original creator.

I tend to work almost full time on developing the next version of b2evolution, so please forgive me if I don't have a lot of time for individual support questions.

I will do my best though to make better documentation available for all frequently asked questions.

This entry was posted by Francois Planque and filed under Research.

1 comment

Comment from: Steve

Spam is the scourge of the web and wish you luck in finding a solution. I doubt, unfortunately, your success though. I do quite like the idea of a time delay, but surely the bots can simply add a time delay to their software?
