We all want Google and other search engines to index our sites, but there are also bad robots out there that you have absolutely no use for on your site.
Recently, one of our sites had a problem: the server was periodically massively overloaded, and the website slowed to a crawl. The server had a load of over 75, where full load was 4 (yes, four), so it was overloaded by almost a factor of 20. The reason? A bad bot. One specific bot was trying to index the site at a ferocious speed. This was enough to completely overload the client’s otherwise properly sized web server.
Bots such as Semrush Bot, MJ12 Bot and DotBot just needlessly slow down your website. In the server’s access log, a request from one of them looks like this:
216.244.66.242 - - [01/Oct/2021:10:11:51 +0000] "GET / HTTP/1.1" 301 255 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, [email protected])"
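If you want to check whether these bots are hitting your own server, you can count their requests in the access log. The following is a minimal sketch, not a finished tool: it assumes the log uses the same format as the line above (user agent as the last quoted field), and the token list and function names are our own, chosen for illustration.

```python
import re
from collections import Counter

# Assumed user-agent tokens for the bots named above; adjust to what you see in your logs.
BAD_BOT_TOKENS = ("SemrushBot", "MJ12bot", "DotBot")

# Assumption: the user agent is the last double-quoted field on the log line.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def extract_user_agent(line):
    """Return the user-agent string of one log line, or None if none is found."""
    match = USER_AGENT_RE.search(line)
    return match.group(1) if match else None


def count_bad_bot_hits(lines):
    """Count requests per bad-bot token (case-insensitive substring match)."""
    hits = Counter()
    for line in lines:
        agent = extract_user_agent(line)
        if agent is None:
            continue
        lowered = agent.lower()
        for token in BAD_BOT_TOKENS:
            if token.lower() in lowered:
                hits[token] += 1
    return hits
```

To use it, open your access log (its location depends on your server setup) and pass the file object to count_bad_bot_hits. A bot with thousands of hits in a short time window is a likely culprit.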
But aren’t bots good? Yes, most are. We all want our sites to be included in search engines such as Google and Yahoo. However, some indexing services are NOT used for search indexes. Instead, they use the data internally to show comparative data and provide a service to their own customers. This means that unless YOU are their client, you get absolutely ZERO benefit from this. One might even argue that the benefit is negative, as it lets your competitors analyze your site.
So – why allow it? There is NO benefit to you in doing so, and you really should block them.
How to block them.
There are 2 simple methods for blocking them completely (we recommend using both). One is to block them on the server itself. To do that, you can change your .htaccess file for Apache to deny them access (we can help you do that).
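As a rough illustration of the .htaccess approach, the sketch below builds a small block of mod_rewrite rules that answers matching user agents with 403 Forbidden. It is only a starting point under a few assumptions: mod_rewrite is enabled on your server, .htaccess overrides are allowed, and the bot tokens (our own list, as is the function name) match what appears in your logs. Keep in mind that a bot can fake its user agent, so this only stops bots that identify themselves honestly.

```python
import re

# Assumed user-agent tokens for the bots named in this post.
BAD_BOT_TOKENS = ["SemrushBot", "MJ12bot", "DotBot"]


def build_htaccess_block(tokens):
    """Build mod_rewrite rules that return 403 Forbidden for matching user agents.

    Tokens are expected to be simple names without whitespace.
    """
    # Escape each token so it is matched literally, then join them with "|".
    pattern = "|".join(re.escape(token) for token in tokens)
    return "\n".join([
        "<IfModule mod_rewrite.c>",
        "RewriteEngine On",
        # [NC] makes the user-agent match case-insensitive.
        f"RewriteCond %{{HTTP_USER_AGENT}} ({pattern}) [NC]",
        # "-" means no rewrite; [F] sends 403 Forbidden, [L] stops further rules.
        "RewriteRule .* - [F,L]",
        "</IfModule>",
    ])


if __name__ == "__main__":
    print(build_htaccess_block(BAD_BOT_TOKENS))
```

Running the script prints the rules, which you can review and paste near the top of your .htaccess file. Test on a staging site first, since a mistake in .htaccess can take the whole site offline.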
The other method is to block the bots in Cloudflare (or whichever other CDN you’re using. You are using a CDN, right?). You do this from its Firewall settings.
Once you have done this, you’re no longer providing free data to another company, and you lower the chance of your website being overrun by bad robots.