Limiting the Use of CAPTCHA

I used to blog periodically on my company's blog. Earlier this year, the blog was taken down, and I needed to reference something I posted in 2012. I've restored the post here using the Wayback Machine. The information is a little dated, but maybe it could be useful to someone. This post references the old reCAPTCHA system that Google used to verify book scans and street sign recognition results, not the more modern "I am not a robot" checkbox.

Recently our team worked on a project involving daily cash giveaways. As is always the case when free money is involved, some people try to game the system. And while our terms and conditions allow for multiple entries, they also reserve the right to disqualify anyone who is found tampering with the giveaway.

For some time now, we have used reCAPTCHA to discourage and slow down tampering. reCAPTCHA has worked great for us--until this particular campaign. About a week into the giveaway, our analytics showed that the abandonment rate on the entry form was significantly higher than it was for a similar campaign last year. We also started to get comments from our fans. Here are a few examples:

- Code thing not working when trying to register. Did it twice no problem. Then about 20 times won't work. I give up

- Is there a word for people who can't read CAPTCHAs? I can't ever get past them. :(

- Those CAPTCHA codes are a nightmare. I tried the audio code and wrote what she said and that didn't work either. Finally got some words I could decipher. But I'm in!

We tested the forms again and discovered that the CAPTCHAs had increased in difficulty since we deployed them. Many times we would see two scrambled nonsense words, as opposed to the usual semi-distorted control word paired with a distorted but legible nonsense word. We also saw the new Street View imagery that Google recently added, and occasionally even saw Hebrew and Arabic text.

To help our fans, we had to come up with a better solution. Due to time constraints, we stuck with reCAPTCHA, but instead of showing it to everyone, we chose to show it only to people who appeared eager to submit the form many, many times. We settled on the following plan.

  1. The visitor comes to our entry form. Initially there is a reCAPTCHA field, but it is hidden by JavaScript.
  2. After the visitor fills out some of their contact information, we fire off an AJAX call with this information.
  3. Our server compares this information with previous form submissions.
    1. If it looks like this person is new to the form or hasn't submitted it more than a certain number of times that day, the server returns a secure token, which the page stores in a hidden input field.
    2. If they have submitted the form more than a certain number of times, no token is returned. The reCAPTCHA field is shown and required.
    3. If at any point in this process there is an error or timeout, the default behavior is to require the reCAPTCHA.
  4. The visitor submits the actual form, which contains either our secure token or a reCAPTCHA token.
  5. The server validates whichever token it received. If our token doesn't validate, the visitor is taken back to enter a reCAPTCHA.
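
The server side of steps 3 and 5 might look roughly like the sketch below. This is not our original code; it is a minimal illustration that assumes the token is an HMAC signature over the visitor's email address plus an issue time. Every name and value in it (`SECRET_KEY`, `MAX_ENTRIES_PER_DAY`, `TOKEN_TTL_SECONDS`, `issue_token`, `validate_token`, `record_entry`) is hypothetical, the in-memory counter stands in for a real database of submissions, and the reCAPTCHA check itself is left out.

```python
import hashlib
import hmac
import time
from collections import defaultdict

# Hypothetical settings, chosen only for illustration.
SECRET_KEY = b"replace-with-a-real-secret"
MAX_ENTRIES_PER_DAY = 5
TOKEN_TTL_SECONDS = 15 * 60

# In-memory stand-in for a table of previous submissions,
# keyed by (normalized email, day number).
entries_today = defaultdict(int)


def _normalize(email):
    return email.strip().lower()


def _sign(email, issued_at):
    # The signature covers the contact data, so a token issued for one
    # address will not validate for another.
    message = f"{_normalize(email)}|{issued_at}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def issue_token(email, now=None):
    """Step 3: return a token, or None if the CAPTCHA should be required."""
    now = int(time.time()) if now is None else int(now)
    day = now // 86400
    if entries_today[(_normalize(email), day)] >= MAX_ENTRIES_PER_DAY:
        return None
    return f"{now}.{_sign(email, now)}"


def validate_token(token, email, now=None):
    """Step 5: accept only an unexpired token signed for this email."""
    now = int(time.time()) if now is None else int(now)
    try:
        issued_str, signature = token.split(".", 1)
        issued_at = int(issued_str)
    except (AttributeError, ValueError):
        return False  # missing or malformed token: fall back to the CAPTCHA
    if not 0 <= now - issued_at <= TOKEN_TTL_SECONDS:
        return False
    expected = _sign(email, issued_at)
    return hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8"))


def record_entry(email, now=None):
    """Count an accepted submission toward the visitor's daily total."""
    now = int(time.time()) if now is None else int(now)
    entries_today[(_normalize(email), now // 86400)] += 1
```

In this sketch, any failure path (no token, a malformed token, an expired token, or a mismatched email) returns `False`, which matches the plan's default of falling back to the reCAPTCHA.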

We were quite pleased at how simple and fast this solution turned out to be. There was an increased risk of tampering, but as I said earlier, our terms allow us to disqualify anyone attempting to tamper with the form. More importantly, the data that was fed into the token generator was the same data used to contact a winner, so feeding it random information wouldn't help the person tampering with it win any money.
