You may not know what they’re called, but you’ve seen and had to solve a CAPTCHA.
The word CAPTCHA stands for “Completely Automated Public Turing test to tell Computers and Humans Apart.”
You see, some people like to write programs to automatically access services across the Internet that the operators of those services would rather reserve for real human beings. For instance, if a company is running a website that is financially supported by advertising, they don’t want anyone to use a program to post messages to their site, bypassing the opportunity to show the end user the advertising. So, when you want to enter a message, they ask you to solve the CAPTCHA before continuing.
You might have noticed that there are all different sorts of CAPTCHA images.
The basic process is to come up with a couple of words or numbers, create an image of those words or numbers, and then disturb the image in some way that would make it hard for a computer to recognize the information. The image is usually warped by moving pixels around.
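To make “moving pixels around” a little more concrete, here is a minimal sketch in Python. It assumes the “image” is just a list of text rows with `#` as ink, and it slides each row sideways along a sine wave. The function name `warp_rows` and its parameters are made up for illustration; real CAPTCHA generators work on actual images and use more elaborate distortions.

```python
import math

def warp_rows(bitmap, amplitude=2, wavelength=6.0):
    """Shift each row sideways by a sine-wave offset (hypothetical helper)."""
    width = max(len(row) for row in bitmap)
    warped = []
    for y, row in enumerate(bitmap):
        # How far this row slides: between -amplitude and +amplitude.
        offset = round(amplitude * math.sin(2 * math.pi * y / wavelength))
        # Pad both sides so shifted pixels never fall off the edge
        # and every output row has the same length.
        left = amplitude + offset
        right = amplitude - offset
        warped.append(" " * left + row.ljust(width) + " " * right)
    return warped

# A tiny hand-drawn "HI" standing in for the rendered word.
glyph = [
    "#   #  ###",
    "#   #   # ",
    "#####   # ",
    "#   #   # ",
    "#   #  ###",
]

for line in warp_rows(glyph):
    print(line)
```

No pixels are added or removed here, only moved, which is why a person can still read the result while a naive character matcher that expects straight columns may not.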
Some companies use the CAPTCHA to solve another problem though.
You see, there is a big push to digitize old books. Rather than have someone sit down and type in all the words in the book, they use a computer to optically recognize the characters on the pages. But optical recognition isn’t perfect, especially on old books where the printing itself wasn’t perfect or where the copy has been damaged in some way.
What they do is have the computer software rate its confidence in each word’s recognition. They have the software ask itself, “Is this word made up of all letters?” and, “Do these letters make up a word I know?”
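Those two questions can be turned into a rough score. The sketch below is only an illustration: the tiny `KNOWN_WORDS` dictionary, the 0.5 weights, and the function names are all assumptions, and real OCR software rates its confidence in far more sophisticated ways.

```python
# Hypothetical stand-in for a real dictionary.
KNOWN_WORDS = {"the", "quick", "brown", "fox"}

def word_confidence(word, dictionary=KNOWN_WORDS):
    """Score a recognized word from 0.0 to 1.0 using the two questions."""
    score = 0.0
    # "Is this word made up of all letters?"
    if word.isalpha():
        score += 0.5
    # "Do these letters make up a word I know?"
    if word.lower() in dictionary:
        score += 0.5
    return score

def needs_human_review(word, threshold=1.0):
    """Low-confidence words would go into the CAPTCHA pile."""
    return word_confidence(word) < threshold

for candidate in ["quick", "qvick", "qu1ck"]:
    print(candidate, word_confidence(candidate), needs_human_review(candidate))
```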
If the confidence is low on a particular word, they throw it into the CAPTCHA pile and it is presented to people on a website. When a number of people have seen and transcribed the same image, they raise the confidence level and correct the digital version of the book.
So you may ask yourself, “Hey, self, how do they know I typed in what was on the scan if they don’t know what was on the scan?” Actually, I just asked myself that same question.
My guess is that they present you with one word they do know and one word they don’t know. They basically let you slide on the word they don’t know, but they record your answer and compare it to the next person’s answer. If the two answers match, then the word has been recognized and it goes into the bucket of words they do know.
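Here is how that guess might look in code. To be clear, this is a sketch of the scheme as I imagine it, not reCAPTCHA’s actual implementation; the class name, the `agreement_needed` setting, and the image IDs are all invented for the example.

```python
from collections import Counter, defaultdict

class TwoWordChallenge:
    """Grade the known word, collect votes on the unknown one (hypothetical)."""

    def __init__(self, agreement_needed=2):
        self.agreement_needed = agreement_needed
        self.votes = defaultdict(Counter)  # unknown image id -> answer counts
        self.resolved = {}                 # image id -> accepted text

    def submit(self, control_answer, control_truth, unknown_id, unknown_answer):
        # Only the word they already know decides whether you pass.
        if control_answer.strip().lower() != control_truth.lower():
            return False
        # You "slide" on the unknown word, but your answer is recorded.
        guess = unknown_answer.strip().lower()
        if guess and unknown_id not in self.resolved:
            self.votes[unknown_id][guess] += 1
            # Once enough people agree, the word moves to the known bucket.
            if self.votes[unknown_id][guess] >= self.agreement_needed:
                self.resolved[unknown_id] = guess
        return True

challenge = TwoWordChallenge()
challenge.submit("fox", "fox", "scan-17", "Harbour")
challenge.submit("fox", "fox", "scan-17", "harbour")
print(challenge.resolved)
```

Answers from people who miss the known word are thrown away, so a program typing gibberish never gets a vote on the unknown one.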
One group that’s doing this is reCAPTCHA. You can read about their process at http://recaptcha.net. They’ve made it easy to integrate their CAPTCHA process into your websites with very little complicated programming.
Here is a sample reCAPTCHA image. See how each word looks like it was scanned from a book, but the page was warped or put into the scanner wrong?
So, when you see these sorts of CAPTCHAs, perhaps instead of being frustrated by the process you can feel appreciated because you’re helping to get old books into digital form so more people can enjoy them.
Advertising Via CAPTCHAs
It blows my mind that people are not advertising through CAPTCHAs yet.
For instance, you could be asked to recognize the McDonald’s logo, or “I’m lovin’ it,” along with a traditional textual image. You could be played a jingle and asked to recognize the brand.
I’m sure this is coming.
Would you like fries with that?