A web bot is a script designed to make life on the web easier: it automates repetitive tasks and does them much faster than a human could. This speed is often how you can tell who or what is interacting with your site: bot or human. And when it comes to securing your web apps and your system, that difference is important.
You don’t have to worry about the good bots, like those from Facebook or Google that crawl sites to build search indexes. It’s the bad bots you have to concern yourself with, because they may be scraping your site to steal content, scanning it to learn about your system in preparation for an attack, or actively attacking you.
The name of the game is to detect who is visiting your site—a human via a browser or a bot pretending to be human and accessing things it shouldn’t.
Bot or not?
There are four major techniques IT security uses to detect bots, depending on the sophistication of the attack.
- Captcha
A challenge consisting of scrambled text or pictures that humans can solve but automated scripts cannot. Captchas are really annoying to humans, however, so you want your system to show them only when there is a high probability that the request is coming from a bot and not a human. It’s a balance.
- Passive fingerprinting
A review of the metadata in the request. For example, particular browsers always send certain headers identifying themselves. Bots from unsophisticated attackers do not include the proper identifying headers; in fact, very unsophisticated attackers may use a bot with the name of the attack tool in the header. (A sketch of such header checks follows this list.)
- Rate limiting
Controls on the volume of traffic into a site within a certain time period, and protective action when that threshold is exceeded. Machines can browse a site faster than any human, so a massive amount of traffic in a short period of time indicates a bot. Sophisticated attackers slow their requests down to a more human speed, or fool your system into believing the requests are coming from different users by using a botnet of hundreds, thousands, or millions of different IP addresses. The good news is that many companies collect and aggregate data about suspicious IP addresses and sell the information as threat intelligence, so large blocks of IP addresses are already known to be bad. (A sliding-window limiter is sketched after the list.)
- Active fingerprinting
A review of specific actions taken by the browser making the request. Web browsers are complex, and it is hard for attackers to build a bot that truly replicates browser behavior. To confirm that the request is coming from a real browser, your system asks the browser to perform an action in the background, such as drawing a picture or requesting another page, and to send the results back to you. Real browsers perform these actions the same way every time, so a matching result means the request is coming from a real browser, and almost certainly a human using it. Sophisticated attackers try to emulate all the possible attributes of a browser, while you try to check as many of those attributes as you can. (A server-side sketch also follows the list.)
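Passive fingerprinting is straightforward to sketch in code. The Python below is a minimal, illustrative rule set, assuming the request headers arrive as a plain dict with canonical header names; the tool names and specific rules are assumptions, not a complete production filter.

```python
# A minimal sketch of passive fingerprinting: inspect request metadata for
# signs that the client is not a real browser. The rules and tool names are
# illustrative assumptions, not a complete filter.

SUSPICIOUS_AGENTS = ("curl", "python-requests", "sqlmap", "nikto")

def looks_like_bot(headers: dict) -> bool:
    """Return True if the request metadata suggests an automated client."""
    user_agent = headers.get("User-Agent", "").lower()

    # Real browsers always identify themselves; no User-Agent is a red flag.
    if not user_agent:
        return True

    # Very unsophisticated attackers leave the attack tool's name in place.
    if any(tool in user_agent for tool in SUSPICIOUS_AGENTS):
        return True

    # Real browsers also send Accept and Accept-Language; many bots do not.
    if "Accept" not in headers or "Accept-Language" not in headers:
        return True

    return False
```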
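Rate limiting can likewise be sketched with a sliding window of request timestamps per IP address. The window size and threshold below are invented for illustration; real deployments typically enforce limits at a proxy or load balancer and check source IPs against threat-intelligence feeds.

```python
import time
from collections import defaultdict, deque

# A minimal sliding-window rate limiter, keyed by client IP. The window and
# threshold are illustrative; tune them to your traffic.
WINDOW_SECONDS = 10
MAX_REQUESTS = 20  # sustained traffic above ~2 requests/sec looks automated

_recent = defaultdict(deque)  # ip -> timestamps of recent requests

def over_limit(ip: str) -> bool:
    """Record this request and report whether the IP exceeded the threshold."""
    now = time.time()
    window = _recent[ip]
    window.append(now)

    # Discard timestamps that have aged out of the window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()

    return len(window) > MAX_REQUESTS
```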
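Active fingerprinting needs two halves: a client-side challenge (for example, JavaScript that draws to a canvas and hashes the pixels) and server-side verification of the result. The sketch below covers only the server side, under the assumption that known-good challenge results have been collected from real browsers ahead of time; the table contents and function names are hypothetical.

```python
import secrets

# Server side of an active-fingerprinting challenge. Known-good results are
# assumed to have been collected from real browsers in advance; the entry
# below is a placeholder, not a real hash.
KNOWN_GOOD_RESULTS = {
    ("firefox", "placeholder-hash"): True,
}

_pending = {}  # challenge token -> browser family the client claims to be

def issue_challenge(claimed_browser: str) -> str:
    """Create a one-time token to embed in the page next to the challenge JS."""
    token = secrets.token_hex(16)
    _pending[token] = claimed_browser
    return token

def verify_challenge(token: str, result_hash: str) -> bool:
    """Check that the browser's computed result matches its claimed identity."""
    claimed = _pending.pop(token, None)  # tokens are single-use
    if claimed is None:
        return False
    return KNOWN_GOOD_RESULTS.get((claimed, result_hash), False)
```

A bot that never runs the JavaScript simply never calls back with a result, which is itself a strong signal.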
What good is fingerprinting a bot?
The purpose of bot fingerprinting is to decide whether or not to send a captcha to confirm the source of the request, human or bot. If it is a human, you honor the request. If it is a bad bot, you stop it. A newer challenge is that some hackers now outsource captcha solving for a small fee, which means captchas are not a perfect way to prevent a bot attack.
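One way to picture that decision is a simple risk score: combine the detection signals and serve a captcha only when the bot probability is high, since captchas annoy real users. The weights and threshold below are illustrative assumptions, and the three boolean inputs correspond to the checks sketched above.

```python
# A minimal sketch of the captcha decision: score each request from the
# detection signals and challenge only the high-risk ones. The weights and
# threshold are illustrative assumptions.
CAPTCHA_THRESHOLD = 0.5

def decide(headers_suspicious: bool, over_rate_limit: bool,
           known_bad_ip: bool) -> str:
    """Map detection signals to an action for this request."""
    score = 0.0
    if headers_suspicious:  # passive fingerprinting flagged the request
        score += 0.3
    if over_rate_limit:     # the client exceeded the rate limit
        score += 0.3
    if known_bad_ip:        # the IP appears in a threat-intelligence feed
        score += 0.4

    return "captcha" if score >= CAPTCHA_THRESHOLD else "allow"
```

A request with only one weak signal is still honored (`decide(True, False, False)` returns `"allow"`), while two signals together cross the threshold and trigger a challenge.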
Organizations can no longer survive without putting their information online. It’s an arms race: hackers versus IT security. IT security works to identify bots and prevent attacks, while hackers grow ever more sophisticated at accessing your site and causing harm.