Monday, February 11, 2019

What is reCAPTCHA by Google and how does it work?

What is reCAPTCHA and how does it work?

Since their inception, captchas have been widely used for preventing fraudsters from performing illicit actions. Nevertheless, economic incentives have resulted in an arms race, where fraudsters develop automated solvers and, in turn, captcha services tweak their design to break the solvers. Recent work, however, presented a generic attack that can be applied to any text-based captcha scheme.

In response, Google unveiled the latest version of reCAPTCHA. The goal of the new system is twofold: to minimize the effort for legitimate users, while requiring tasks that are more challenging for computers than text recognition. reCAPTCHA is driven by an “advanced risk analysis system” that evaluates requests and selects the difficulty of the captcha that will be returned. Users may be required to click a checkbox or solve a challenge by identifying images with similar content.


reCAPTCHA by Google - Purpose, and Working

The reCAPTCHA service offered by Google is the most widely used captcha service and has been adopted by many popular websites to prevent automated bots from conducting nefarious activities. Google announced the deployment of a new reCaptcha mechanism designed to be more human-friendly and secure.
  • Widget: When a user visits a webpage protected by reCAPTCHA, a widget is displayed. The widget’s JavaScript code is obfuscated to prevent analysis by third parties. When the widget loads, it collects information about the user’s browser, which is sent back to the server. It also performs a series of checks to verify the user’s browser.
  • Workflow: Once the user clicks the checkbox, a request is sent to Google containing:
  1. Referrer
  2. Website’s site key (obtained when registering for reCaptcha)
  3. Cookie for google.com
  4. Information generated by the widget’s browser checks (encrypted). 
The request is then analyzed by the advanced risk analysis system, which decides what type of captcha challenge will be presented to the user. Once the challenge has been presented to the user, it has to be answered within 55 seconds. Otherwise, the popup is closed and the user is required to click on the checkbox again to receive a new challenge. Once the user clicks, an HTML field called recaptcha-token is populated with a token. If the user is deemed legitimate and not required to solve a challenge, the token becomes valid on Google’s side.
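The request fields and the 55-second answer window described above can be pictured as a small data structure. This is a minimal sketch, not Google's implementation: the class name `CheckboxRequest`, its field names, and the helper `challenge_expired` are hypothetical names chosen for illustration. Only the four fields and the 55-second limit come from the description above.

```python
from dataclasses import dataclass

# Answer window described in the text; the constant name is our own.
CHALLENGE_TIMEOUT_SECONDS = 55


@dataclass(frozen=True)
class CheckboxRequest:
    """Hypothetical container for what the widget sends on a checkbox click."""
    referrer: str          # page that embeds the widget
    site_key: str          # key obtained when the site registered for reCaptcha
    google_cookie: str     # cookie for google.com
    browser_checks: bytes  # encrypted output of the widget's browser checks


def challenge_expired(presented_at: float, answered_at: float) -> bool:
    """Return True if the answer arrived after the 55-second window."""
    return (answered_at - presented_at) > CHALLENGE_TIMEOUT_SECONDS


# Example values are placeholders, not real keys or cookies.
request = CheckboxRequest(
    referrer="https://example.com/signup",
    site_key="example-site-key",
    google_cookie="example-cookie-value",
    browser_checks=b"\x00\x01",
)
```

In this sketch, an expired challenge would correspond to the popup closing, after which the user has to click the checkbox again.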
Based on the level of confidence assigned to the specific request, Google’s advanced risk analysis system selects which type of challenge to present to the user. The different versions vary in difficulty and nuisance: some are trivial to pass, while others are problematic even for humans. If a specific user requests multiple challenges or provides several wrong answers in a short amount of time, the system will return increasingly difficult challenges.
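The escalation behaviour described above can be illustrated with a toy selection function. The real risk analysis system is not public, so everything here is an assumption made for illustration: the challenge names, the confidence scale from 0 to 1, the thresholds 0.8 and 0.4, and the function name `select_challenge`.

```python
# Hypothetical challenge types, ordered from least to most demanding.
CHALLENGES = ["no challenge", "easy image challenge", "hard image challenge"]


def select_challenge(confidence: float, recent_failures: int) -> str:
    """Pick a challenge from a confidence score in [0, 1].

    The thresholds (0.8 and 0.4) are illustrative assumptions only.
    """
    if confidence >= 0.8:
        level = 0
    elif confidence >= 0.4:
        level = 1
    else:
        level = 2
    # Repeated requests or wrong answers push the user to harder challenges,
    # capped at the hardest level available.
    level = min(level + recent_failures, len(CHALLENGES) - 1)
    return CHALLENGES[level]
```

For example, a request with high confidence and no recent failures passes without a challenge, while the same request after one wrong answer is moved up one level.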

Threat Model and Cookie Manager:

The Google tracking cookie plays a crucial role in determining the difficulty of the challenge that is presented to the user. Furthermore, each cookie can receive up to 8 checkbox captchas in a day. As part of our attack, we develop functionality for automatically creating Google cookies. The goal is to create cookies which are subsequently “trained” to appear as originating from legitimate users and not automated bots. In each case, we create a cookie in a clean virtual machine, where our browser automation system imitates a user browsing the web.
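The daily limit of 8 checkbox captchas per cookie behaves like a simple quota. The sketch below shows one way such a quota could be tracked. It is not how Google implements it, and the class name `CookieQuota` and method `try_consume` are hypothetical.

```python
from collections import defaultdict
from datetime import date

MAX_CHECKBOX_CAPTCHAS_PER_DAY = 8  # limit stated in the text


class CookieQuota:
    """Hypothetical tracker of checkbox captchas served per cookie per day."""

    def __init__(self) -> None:
        # Maps (cookie, day) to the number of checkbox captchas served.
        self._counts = defaultdict(int)

    def try_consume(self, cookie: str, day: date) -> bool:
        """Record one checkbox captcha; return False once the quota is spent."""
        key = (cookie, day)
        if self._counts[key] >= MAX_CHECKBOX_CAPTCHAS_PER_DAY:
            return False
        self._counts[key] += 1
        return True
```

Under this model, an attacker who wants more than 8 checkbox captchas per day needs additional cookies, which is why automatic cookie creation matters for the attack.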

In practice, fraudsters may follow two distinct approaches for solving challenges. 
  1. They may employ an automated captcha breaking system, which will allow them to conduct nefarious actions unencumbered (e.g., create email accounts, post in forums). 
  2. They may employ humans to manually solve challenges, i.e., through an underground captcha-solving service.

GRIS, Tag Classifiers & CR:

  • Google Reverse Image Search (GRIS) offers the ability to conduct a search based on an image. If the search is successful it may return a “best guess” description of the image (which may differ for the same image across searches) along with a list of websites where the image is contained, and other available sizes of that image. While this is not part of Google’s public API, we identified the format of the search URL so our module can replicate the functionality.
  • Returned tags do not always exactly match the description (i.e., hint) given by reCaptcha for a challenge. To overcome this, we leverage machine learning to develop a classifier that can “guess” the content of an image based on a subset of the tags. Once the classifier has been trained, it can be used to predict the similarity of the captcha’s hint and the tags by computing the cosine similarity between their corresponding word vectors, with the goal of identifying subsets of tags from each image that have been associated with the hint during the training phase.
  • Canvas rendering is a known technique for fingerprinting users across machines and browsers. The captcha's JavaScript code creates a Canvas element and draws a predefined composition. After the rendering is complete, the element is encoded in base64 and sent back with the other data when the user clicks the checkbox. This information can be used to assess the browser's rendering ability and determine the browser version, which can then be compared with the reported user-agent to detect discrepancies.
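The tag-matching step above relies on the cosine similarity between word vectors. The sketch below shows only that computation, not the trained classifier. The 3-dimensional vectors are made-up toy values (a real system would use trained word embeddings), and the names `WORD_VECTORS` and `best_tag_similarity` are hypothetical.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
WORD_VECTORS = {
    "pasta": [0.9, 0.1, 0.0],
    "spaghetti": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # similarity is undefined for zero vectors; treat as no match
    return dot / (norm_a * norm_b)


def best_tag_similarity(hint, tags):
    """Return the highest similarity between the hint and any known tag."""
    hint_vec = WORD_VECTORS[hint]
    scores = [
        cosine_similarity(hint_vec, WORD_VECTORS[tag])
        for tag in tags
        if tag in WORD_VECTORS  # skip tags without a vector
    ]
    return max(scores, default=0.0)
```

With these toy vectors, an image tagged "spaghetti" scores much closer to the hint "pasta" than an image tagged "car", so it would be the more likely selection.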

CheckBox Captcha:

By leveraging proxy services and running multiple attacks in parallel, this amount could be significantly higher for a single machine. Since captcha-breaking is driven by monetary incentives, we evaluate our findings from an economic perspective and compare our attack to a captcha-solving service. In response, reCaptcha altered its safeguards and risk analysis process to mitigate our large-scale token harvesting attacks. It also removed the solution flexibility and sample image from the image captcha to reduce the attack’s accuracy.
