
Recognizing a crawler attack

Apart from comment spam (where you see a flood of incoming comments), your site can also come under heavy load because of a crawler attack.

When you look at the Analytics tab of your b2evolution back-office, you may see a huge increase in traffic like this:

[Figure: The global hits graph showing a huge increase in browser traffic]

If your site was not recently mentioned by a prominent source of traffic, a spike like this is suspicious.
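The "is this spike suspicious?" judgment can be made a little more systematic. The sketch below is not part of b2evolution: it assumes you have copied daily hit totals (from the Analytics tab or your server logs) into a list, and it flags any day whose count exceeds a hypothetical `factor` times the median of the preceding `window` days.

```python
from statistics import median


def traffic_spikes(daily_hits, window=7, factor=3.0):
    """Return the indexes of days whose hit count jumps well above the recent baseline.

    daily_hits: list of daily hit totals, oldest first (assumed input format).
    window and factor are illustrative thresholds, not b2evolution settings.
    """
    spikes = []
    for i in range(window, len(daily_hits)):
        # The median of the previous `window` days is not thrown off by one earlier outlier.
        baseline = median(daily_hits[i - window:i])
        if baseline > 0 and daily_hits[i] > factor * baseline:
            spikes.append(i)
    return spikes


if __name__ == "__main__":
    history = [100, 98, 103, 97, 101, 99, 102, 105, 900, 1200]
    print(traffic_spikes(history))  # prints [8, 9]
```

A spike only tells you that something changed; it is the breakdown by type of traffic that helps tell a crawler apart from genuine new visitors.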

Drilling further down into the Web browsers hit summary, we can see this:

[Figure: The browser hits graph showing a huge increase in self-referred traffic]

Most of the traffic is self-referred. Here, a small number of new users browse a huge number of pages on the site (or reload them madly). This is characteristic of a crawler robot pretending to be a human, but clicking through your site much faster than any human would.
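The pattern described above (few clients, many self-referred page views, arriving faster than anyone could read them) can be checked mechanically. The following sketch illustrates the idea and is not how b2evolution works internally; the `Hit` record, the site hostname, and the `max_pages_per_minute` and `min_hits` thresholds are all hypothetical assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Hit:
    """Hypothetical hit record: client IP, time in seconds, and the Referer header."""
    ip: str
    timestamp: float
    referer: str  # empty string if no referer was sent


def is_self_referred(hit: Hit, site_host: str) -> bool:
    """True when the referer points back to a page on our own site."""
    return urlparse(hit.referer).hostname == site_host


def suspicious_clients(hits, site_host, max_pages_per_minute=30.0, min_hits=20):
    """Map each IP whose self-referred hits arrive implausibly fast to its pages-per-minute rate.

    max_pages_per_minute and min_hits are illustrative values, not b2evolution settings.
    """
    times_by_ip = defaultdict(list)
    for hit in hits:
        if is_self_referred(hit, site_host):
            times_by_ip[hit.ip].append(hit.timestamp)

    flagged = {}
    for ip, times in times_by_ip.items():
        if len(times) < min_hits:
            continue  # too few hits to judge this client
        # Use at least one second as the time span, to avoid dividing by zero.
        span_minutes = max((max(times) - min(times)) / 60.0, 1 / 60.0)
        rate = len(times) / span_minutes
        if rate > max_pages_per_minute:
            flagged[ip] = rate
    return flagged


if __name__ == "__main__":
    crawler = [Hit("203.0.113.7", float(t), "https://example.com/page") for t in range(60)]
    human = [Hit("198.51.100.2", float(t * 60), "https://example.com/") for t in range(20)]
    print(sorted(suspicious_clients(crawler + human, "example.com")))  # prints ['203.0.113.7']
```

The crawler above requests a page every second, while the simulated human reads one page per minute, so only the first client is flagged.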

Note: b2evolution already detects all well-known robots that play nice and identify themselves (see /conf/_stats.php -> $user_agents = array( ... ) ). Such robots would appear clearly in light orange on the first screenshot above. They would also be easy to control, either by asking them to stay away in robots.txt or by blocking them with a rewrite rule in .htaccess. In this case, though, the crawling robot is *not* playing nice. It doesn't even advertise itself as a robot; it pretends to be a human. That would be almost fine, undetected and problem-free… if only it weren't "clicking around" so fast.
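To see why such a crawler slips through, consider how signature-based detection works in principle. The sketch below is *not* b2evolution's code, and its signature tuple is a small hypothetical sample (the real list is the $user_agents array in /conf/_stats.php). A user agent is labeled a robot only if it contains a known signature, so a bot that sends an ordinary browser string is counted as a browser.

```python
# Hypothetical sample of robot signatures, as lowercase substrings.
# b2evolution keeps its own, much longer list in /conf/_stats.php.
KNOWN_ROBOT_SIGNATURES = ("googlebot", "bingbot", "yandexbot", "baiduspider")


def classify_user_agent(user_agent: str) -> str:
    """Return "robot" if the user agent contains a known signature, else "browser"."""
    ua = user_agent.lower()
    if any(signature in ua for signature in KNOWN_ROBOT_SIGNATURES):
        return "robot"
    return "browser"


if __name__ == "__main__":
    honest_bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    disguised_bot = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0"
    print(classify_user_agent(honest_bot))     # prints robot
    print(classify_user_agent(disguised_bot))  # prints browser
```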

If you can isolate this traffic as coming from a single IP (through the "All hits" tab), you may block that IP in .htaccess.
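If you also have access to the raw web server log, the same question (does one IP dominate the traffic?) can be answered with a short script. This sketch assumes the Apache common/combined log format, where the client IP is the first space-separated field on each line; the `min_share` threshold is a hypothetical choice.

```python
from collections import Counter


def hits_per_ip(log_lines):
    """Count hits per client IP, taking the first field of each log line as the IP."""
    counts = Counter()
    for line in log_lines:
        fields = line.split(maxsplit=1)
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


def dominant_ip(counts, min_share=0.5):
    """Return the IP responsible for at least min_share of all hits, or None."""
    total = sum(counts.values())
    if total == 0:
        return None
    ip, hits = counts.most_common(1)[0]
    return ip if hits / total >= min_share else None


if __name__ == "__main__":
    log = (
        ['203.0.113.7 - - [17/Jan/2014:01:51:00 +0000] "GET /page HTTP/1.1" 200 512'] * 8
        + ['198.51.100.2 - - [17/Jan/2014:01:52:00 +0000] "GET / HTTP/1.1" 200 2048'] * 2
    )
    print(dominant_ip(hits_per_ip(log)))  # prints 203.0.113.7
```

An IP reported this way is the candidate you would then block in .htaccess.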

However, modern crawler robots use many different IPs at once. In that case the problem is much more complex. You may look at the Performance Optimization page for ideas on optimizing your site so that it better resists such attacks.
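When the hits are spread over many addresses, one partial heuristic (not taken from this page, and far from a complete answer) is to check whether those addresses cluster in the same network block. The sketch below uses Python's standard `ipaddress` module; the /24 (IPv4) and /48 (IPv6) prefix lengths are assumptions, and a crawler spread across unrelated providers will not show up this way.

```python
import ipaddress
from collections import Counter


def hits_per_subnet(ips, prefix_v4=24, prefix_v6=48):
    """Count hits per network block, grouping each IP into an assumed prefix length."""
    counts = Counter()
    for ip in ips:
        addr = ipaddress.ip_address(ip)
        prefix = prefix_v4 if addr.version == 4 else prefix_v6
        # strict=False zeroes the host bits, e.g. 203.0.113.7/24 -> 203.0.113.0/24
        network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        counts[str(network)] += 1
    return counts


if __name__ == "__main__":
    # 50 hits from 50 different addresses in one block, plus a few from elsewhere.
    ips = [f"203.0.113.{i}" for i in range(1, 51)] + ["198.51.100.2"] * 3
    print(hits_per_subnet(ips).most_common(1))  # prints [('203.0.113.0/24', 50)]
```

No single address above stands out, but the /24 block accounts for almost all of the hits.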

Created by fplanque • Last edit by fplanque on 2020-06-09 00:20

1 comment

Comment from: arncus


Hello!

Thanks sooo much @fplanque for creating this page! It's incredibly useful for b2evolution users. I work with the InMotion Community Support team, and we were looking into the case that resulted in you creating this page. Basically, as per the report, a user was getting highly escalated traffic that was resulting in high resource usage by a b2evolution website on one of our shared hosting servers. Unfortunately, in order to keep the site from adversely affecting other accounts on the server, this particular site was suspended.

There are many ways that this can happen, but the main focus of this tutorial was on recognizing a crawler attack. Check your traffic using your available analytics tools (including b2evolution's graph as shown above). If you are a customer of InMotion, a service ticket can also be submitted requesting an analysis of the website traffic. The question then is: how do you stop the crawler, or in this case what we believe to be bots hitting the site?

One of the best ways to help stem the tide is to use your .htaccess file. We have a tutorial that explains how this can be done, titled Block Unwanted Users on Your Site using .htaccess (http://www.inmotionhosting.com/support/website/security/block-unwanted-users-from-your-site-using-htaccess#block-by-user-agent). We are still in the process of investigating the issue, though the suggestion given to the user was to use caching. I will be posting on the forum concerning this issue shortly. The good news is that the site is not currently suspended. Taking steps to block these crawlers from hammering the site will help reduce further problems.

Thanks again for your time and help!

Arnel C.
InMotion Hosting Community Support Team

2014-01-17 @ 01:51
