Are You a Good Bot or a Bad Bot? Identifying and Blocking Unwanted Bot Traffic

There’s no place like the Internet. Your web analytics read like the plot of a scary sci-fi movie: some of your audience clicks and views might not be human! Don’t panic! It’s not the robot uprising, just some bot traffic.

While bot traffic is not world-ending, it’s important to know the different bot types and how to avoid the unwanted kind. If left unchecked, bots can damage your brand integrity, hurt your search rankings, and cost your company money by attacking your apps, websites, or ads.

What Is Bot Traffic?

Bot traffic is any non-human traffic to a website or app. It accounts for about 50% of all website traffic, with 30% of that traffic coming from malicious bots. Bots are automated software and can perform repetitive tasks much faster than a human user. Even if a malicious bot doesn’t achieve its programmed objective, it can strain your web servers and hurt your website’s performance.

3 Types of Bots

Good Bots

Good bots can help identify broken links, fetch and analyze information, give you better accuracy for search results, and help ensure website health and performance.

Good bots include search engine bots, partner/vendor bots, and website monitoring bots. Search engine bots are very desirable – they crawl the Internet to find the content that will be returned in search results. They can help your SEO rankings and get you to the top of the list for better visibility and client outreach.

Partner/vendor bots come from third-party service providers that crawl your site and give you feedback on things like SEO performance or accessibility. For example, these bots might identify how many broken links are on your site, or whether your images are missing alt text. Popular vendors with these bots include Yoast SEO, Semrush, and Ahrefs. However, you may want to limit their crawling on days you anticipate high traffic, like during a big sale, to avoid impacting site speed and performance.

Commercial Bots

Commercial bots are the tricky gray area – they might not have been programmed to be malicious, but their behavior could negatively impact your site or put a strain on your server resources. Usually, they are operated by legitimate companies that collect online content to use elsewhere.

Aggregation bots find relevant content to feature on aggregator sites and platforms, which can help promote your content and increase your customer outreach. However, you may want to control which commercial bots can access your content and at what rate. The same goes for scraping bots. A scraping bot could be collecting research for a weather app, or it could be collecting addresses to sell, or copying your content to publish under a different source. If you have public-facing data services like an API, limit the number of calls any one source can make in a specified time frame to keep your server resources from being strained. For example, you could set an API rate limit of no more than a hundred requests per minute.
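Here is a minimal sketch of what a hundred-requests-per-minute limit could look like in Python. The class name, the idea of a "client key" (an API key or IP address), and the sliding-window approach are all assumptions for illustration; most API gateways and web frameworks offer their own rate limiting that you would normally use instead.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `limit` calls per `window_seconds` for each client key."""

    def __init__(self, limit=100, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._calls = defaultdict(deque)  # client key -> timestamps of recent calls

    def allow(self, client_key):
        """Return True and record the call if the client is under its limit."""
        now = self.clock()
        calls = self._calls[client_key]
        # Forget calls that have aged out of the window.
        while calls and now - calls[0] >= self.window_seconds:
            calls.popleft()
        if len(calls) >= self.limit:
            return False  # over the limit: the caller should reject the request
        calls.append(now)
        return True
```

Your API handler would call `allow()` with whatever identifies the caller and return an HTTP 429 (Too Many Requests) response when it comes back `False`.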

Bad Bots

Bad bots are programmed primarily with malicious intent. Their actions can cause sudden spikes and abnormal increases in page views, higher bandwidth usage, skewed Google Analytics reports or other KPIs, and lower conversion rates. They can also slow your website’s response time, strain data centers, and drive up server costs because of the increased traffic.

Bad bots include web scraping bots that steal info and publish or sell it to other sites. Credential stuffing bots steal in a different way: they use stolen credentials to log in to user accounts and gain access to information. These attacks often succeed because many people reuse the same username and password across sites. There are also Denial of Service (DoS) bots that repeatedly request resource-heavy parts of a web app, such as large file downloads or form submissions, to slow performance or even take down the whole site if the traffic load overwhelms the servers.

Spam bots post spam content or send spam emails that usually include fraudulent links; they might target you by posting in the comments of your blogs, social media channels, and chat forums. Ad fraud bots click on PPC ads to generate extra revenue or skew the costs of an ad, so the advertiser is charged high ad fees for a campaign that isn’t getting any conversions. These bots can also drain your Google Ads (formerly AdWords) budget, cause Google to rate your ad’s performance as poor, stop your ad from being displayed, and render your analytics meaningless.

How to Identify Bot Traffic

Identifying the different bot types will help you decide how to block or limit them on your sites and services.

Good and commercial bots usually meet a few criteria. They come from legitimate sources like Google and are transparent about who owns them. Their tasks are beneficial for the most part – getting your content more visibility or giving you feedback to improve your site performance. They will also follow any rules or policies you specify in your robots.txt file.
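One way to check that a visitor claiming to be a search engine crawler really comes from that source is a reverse DNS lookup followed by a forward lookup. The sketch below assumes that approach; the function name and the hostname suffixes are illustrative, so check them against the crawler operator's own documentation before relying on them.

```python
import socket

# Assumed hostname suffixes for Google's crawlers; confirm against Google's docs.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_crawler(ip, suffixes=GOOGLE_CRAWLER_SUFFIXES,
                        reverse_lookup=socket.gethostbyaddr,
                        forward_lookup=socket.gethostbyname_ex):
    """Return True if `ip` resolves to a trusted hostname that resolves back to `ip`.

    The lookup functions are parameters so they can be swapped out in tests.
    Note that socket.gethostbyname_ex only handles IPv4 addresses.
    """
    try:
        hostname = reverse_lookup(ip)[0]
        if not hostname.lower().endswith(suffixes):
            return False  # the IP does not belong to the claimed owner
        # The forward lookup guards against forged reverse DNS records.
        return ip in forward_lookup(hostname)[2]
    except OSError:
        return False  # lookup failed, so the claim cannot be verified
```

DNS lookups are slow, so in practice you would cache the result per IP address instead of checking on every request.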

You can limit how much access they have to your data, or which ones have access to your site, so you get the benefits while preventing any negative behavior. As mentioned, they will follow the rules in your robots.txt file. There, you can specify which parts of your site you don’t want search engine bots to crawl or index. There may be content you don’t want to show up in search results, or links that lead to user functionality, like a form, that you don’t want counted as broken links.
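To see how a rule-following bot reads those rules, here is a small sketch using Python's standard `urllib.robotparser` module. The robots.txt contents, the paths, the `example.com` URLs, and the bot name `ExampleSEOBot` are all made up for illustration. `Crawl-delay` is a non-standard directive that some crawlers ignore, so treat it as a request rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: keep every bot out of the form and drafts,
# and shut out one (made-up) vendor bot entirely.
ROBOTS_TXT = """\
User-agent: *
Disallow: /contact-form/
Disallow: /drafts/

User-agent: ExampleSEOBot
Crawl-delay: 10
Disallow: /
"""


def build_parser(robots_txt):
    """Parse robots.txt text the way a well-behaved crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A general search engine bot may crawl the blog but not the form.
print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))
print(parser.can_fetch("Googlebot", "https://example.com/contact-form/"))

# The made-up vendor bot is disallowed everywhere.
print(parser.can_fetch("ExampleSEOBot", "https://example.com/blog/post"))
```

Remember that robots.txt only works on bots that choose to obey it, which is why it helps with good and commercial bots but not bad ones.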

Bad bots will hide their identity or source, try to pose as humans, and won’t follow the rules in the robots.txt file. They will cause a sudden spike in clicks or traffic, but you will not see the conversions or sales you would expect from the increase. You want to block these bots completely; they give you no benefits.

In your site’s analytics or web performance dashboard, look for the following signs that your traffic is coming from malicious bots:

Traffic volume and bounce rate rising at the same time, which is usually caused by bots
Sudden drops in page-load speed, which can mean bots are attacking some part of your site
Very short average session durations, which are typical of bots
Unusually high page views, which can mean bots rather than humans are scraping your data
A huge decrease in bounce rate, which can mean bots are scanning lots of pages to steal your content
A decreased SEO ranking, which could mean bots are stealing your content and publishing it in other places; the sites they post to could start outranking your site
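If you export daily numbers from your analytics tool, some of these checks can be automated. The sketch below is only a starting point: the `DailyStats` fields, the function name, and the "twice the baseline" threshold are assumptions, and you would tune them to your own traffic patterns.

```python
from dataclasses import dataclass


@dataclass
class DailyStats:
    sessions: int
    bounce_rate: float            # fraction between 0 and 1
    avg_session_seconds: float
    avg_page_load_seconds: float


def bot_warning_signs(baseline, today, ratio=2.0):
    """Compare today's stats with a normal day and list suspicious changes.

    `ratio` is an assumed threshold: a metric counts as a spike (or drop)
    when it is at least `ratio` times larger (or smaller) than the baseline.
    """
    signs = []
    traffic_spike = today.sessions >= baseline.sessions * ratio
    if traffic_spike and today.bounce_rate > baseline.bounce_rate:
        signs.append("traffic and bounce rate rose together")
    if today.avg_page_load_seconds >= baseline.avg_page_load_seconds * ratio:
        signs.append("page loads slowed sharply")
    if today.avg_session_seconds <= baseline.avg_session_seconds / ratio:
        signs.append("sessions became much shorter")
    if traffic_spike and today.bounce_rate <= baseline.bounce_rate / ratio:
        signs.append("bounce rate dropped sharply during a traffic spike")
    return signs
```

A non-empty list is a prompt to take a closer look, not proof of a bot attack; a successful campaign or a viral post can move the same numbers.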

How to Stop Bad Bot Traffic

There are several steps and resources to fight the bad bot uprising. Since there are so many types of bots, and they are constantly evolving, you will probably need to employ a combination of preventative measures. You can take most of these steps by changing your site directly or through software services like Adobe Analytics or Google Analytics.

To keep your SERP ranking from suffering if bots steal your content and publish it in other places, set up canonical tags on every blog post so search engines can tell that your article is the original even if the content is copied.

You can set up an allow list or a block list (previously called a whitelist or blacklist) of which bots are or are not allowed on your site. With these lists, you can block certain bots based on language, location, or the markets they serve.
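As a rough illustration, an allow list and a block list can be as simple as matching names in the User-Agent header of each request. The bot names `badscraper` and `spambot` and the function below are invented for this sketch; a real list would come from your own logs or your vendor.

```python
# Lowercase name fragments to look for in the User-Agent header.
ALLOWED_BOTS = ("googlebot", "bingbot")
BLOCKED_BOTS = ("badscraper", "spambot")  # hypothetical names


def classify_user_agent(user_agent):
    """Return 'block', 'allow', or 'unknown' for a User-Agent string."""
    ua = (user_agent or "").lower()
    # Check the block list first so it wins if both lists match.
    if any(name in ua for name in BLOCKED_BOTS):
        return "block"
    if any(name in ua for name in ALLOWED_BOTS):
        return "allow"
    return "unknown"  # humans and unlisted bots: apply your default policy
```

Keep in mind that a User-Agent header is easy to fake, so a bad bot posing as a search engine would pass this check. It works best combined with other measures.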

Also look into blocking IP addresses based on geolocation; be careful not to block legitimate users, and keep in mind that bots can have a large supply of different IP addresses to hit you from. Another way to block bots is with a Web Application Firewall (WAF), which works much like a reverse proxy server. The WAF sits between a web app or page and the client. It can protect applications and block unwanted traffic, but it can be defeated by some highly evolved bots.
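Blocking by IP usually means checking each visitor's address against a list of network ranges. Here is a minimal sketch with Python's standard `ipaddress` module. The ranges shown are reserved documentation ranges used as placeholders; a real setup would load ranges from a geolocation database or your firewall vendor.

```python
import ipaddress

# Placeholder ranges (reserved for documentation), not real bot networks.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked(client_ip):
    """Return True if the client's IP address falls in a blocked range."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # not a valid IP address, so it matches no range
    return any(address in network for network in BLOCKED_NETWORKS)
```

Blocking whole ranges is a blunt tool, which is why the warning about legitimate users matters: everyone sharing a blocked range gets turned away, human or not.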

If any of this seems overwhelming to figure out and manage, you may want to consider bot management solutions from third-party vendors. Popular services like Cloudflare and Akamai set up protection against bots and DDoS attacks with tech solutions like a Content Delivery Network. While those help protect your site, don’t forget about your ads! You can look into click fraud prevention software to protect them.

On the front end of your site or apps, use CAPTCHAs. They can help stop bots from posting comments, filling out forms, and downloading content, but be careful not to use so many that you bog down your human users’ experience. Another word of warning: some bots are now advanced enough to solve some CAPTCHAs. You also want to make sure your CAPTCHAs meet accessibility requirements. For example, when a CAPTCHA asks you to identify all pictures with a truck in them, is there an alternative option for a legally blind user?

Another way to check that a user is human is with stricter access control. Look at implementing two-factor authentication or creating different user roles with specific permissions to limit access to your site. Any time you ask for personal information, make the user verify it with either an email link or a code you send by text.
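The code-by-text step can be sketched in a few lines. This example assumes a six-digit code that expires after five minutes; the function names and those numbers are illustrative choices, and actually sending the text message is left to whatever SMS service you use.

```python
import hmac
import secrets
import time


def new_verification_code(digits=6):
    """Create a random numeric code, zero-padded to a fixed length."""
    return f"{secrets.randbelow(10 ** digits):0{digits}d}"


def code_matches(expected, submitted, issued_at, ttl_seconds=300, now=None):
    """Check a submitted code against the one that was sent.

    `issued_at` and `now` are Unix timestamps; `now` defaults to the
    current time and is a parameter only to make testing easier.
    """
    now = time.time() if now is None else now
    if now - issued_at > ttl_seconds:
        return False  # the code has expired
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

You would also want to limit how many guesses each user gets, since a bot can try a lot of six-digit codes very quickly.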

Use this knowledge to fight for your brand’s future against bad bot traffic. But maybe now you are questioning, was this content from a bot? How do I know the author is human? Rest assured, I took a reCAPTCHA quiz this morning and passed it with flying click this link here for designer t-shirts at prices you won’t believe! Don’t let the scalper bots beat you to it.

Need help identifying and blocking bot traffic? Contact us at expert@emfluence.com.