Captcha Ranting and the Honeypot Captcha
The Lesser of Two Evils
For quite a while now, one of the greatest annoyances I’ve encountered on the net has been something we’ve come to accept as the lesser of two evils: a spambot roadblock known as “CAPTCHA.” (The acronym actually stands for something: “Completely Automated Public Turing test to tell Computers and Humans Apart.”)
Now, you might ask me, “So what? Would you prefer having every single form on the net exploited by spambots?” Of course not; there is nothing I despise more than countless well-meaning offers of masculine-organ-gargantuafication.
As much as I want to avoid those mails, I can’t help feeling great irritation every time an incomprehensible image pops up and declares me a fifty-line script for not realizing that the S was actually a 5. More than once, this frustration has cost a forum or blog a comment from yours truly, and probably many more from others.
CAPTCHA Pitfalls
When these matters are discussed in a corporate setting, the term used is “conversion ratio.” Simply put, it’s the percentage of visitors who actually follow through with the action that you, as the author, want them to take, whether that is filling out a form, signing up as a member, or purchasing a certain product or service. And, as you’ve probably figured out by now, CAPTCHAs can have a negative impact on this ratio.
At least, that was what a recent post on the SEOmoz.org blog was all about. The author put together some very clear and impressive statistics showing that using a CAPTCHA yielded an 88% reduction in spam, but at the same time the number of failed “conversions” rose drastically. And the amount of spam was not that great to begin with.
So, when the conversion ratio is the first priority, leaving out the CAPTCHA seems to yield more favorable results. But really, we do not want that spam!
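To make the arithmetic behind these two figures concrete, here is a minimal Python sketch. The function names and the example numbers are hypothetical, chosen only for illustration; they are not the statistics from the SEOmoz post.

```python
def conversion_ratio(visitors: int, conversions: int) -> float:
    """Percentage of visitors who completed the action the site wants them to take."""
    if visitors <= 0:
        raise ValueError("visitors must be a positive count")
    return 100.0 * conversions / visitors


def spam_reduction(spam_before: int, spam_after: int) -> float:
    """Relative drop in spam, as a percentage of the original amount."""
    if spam_before <= 0:
        raise ValueError("spam_before must be a positive count")
    return 100.0 * (spam_before - spam_after) / spam_before


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    print(conversion_ratio(visitors=1000, conversions=40))  # 4.0
    print(spam_reduction(spam_before=50, spam_after=6))     # 88.0
```

Note that an 88% relative cut says nothing about absolute numbers: if only a handful of spam posts arrived in the first place, a few lost conversions can easily outweigh it.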
Alternative solution?
The same post provided a link to an alternative solution to the problem, now nearly three years old, called the “Honeypot CAPTCHA.”
The general idea of this solution is that a spambot traversing your page looks for and attacks any tasty-looking form, but rarely pays any attention to user-oriented code, that is, the stylesheet. So what if we put a field in our form that, code-wise, appears to be a completely normal input field, but is hidden from the real user by the stylesheet? Get it? If that field, which a real user wouldn’t fill out, actually *is* filled out, we can deduce that it was the work of something less intelligent: a couple of dirty lines of code. In the final part of this post, I wrote a simple example piece of code. The blog where this solution, as well as two other interesting ones, was originally posted can be found at the haacked.com website.
Opinions voiced against this method primarily concern the very important matter of accessibility: a user reaching a form like this through a screen reader or text-based browser could be confused by the extra field, or even be unable to use the form at all. However, supplying proper commentary about the field, such as a label asking the user to leave it empty, should solve this. And besides, how *does* a screen reader or text browser deal with regular CAPTCHAs anyway?
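Here is a minimal server-side sketch of the check in Python. It assumes the honeypot input is named `website` and is hidden by a stylesheet rule such as `.hp-field { display: none; }`; both the field name and the class name are hypothetical, and the handler is a toy stand-in for whatever your framework provides.

```python
from typing import Mapping

# Hypothetical name of the honeypot input. Real users never see it because the
# stylesheet hides it, e.g. with `.hp-field { display: none; }`.
HONEYPOT_FIELD = "website"


def is_probably_spam(form_data: Mapping[str, str]) -> bool:
    """True if the hidden field came back with any non-whitespace content."""
    return form_data.get(HONEYPOT_FIELD, "").strip() != ""


def handle_comment(form_data: Mapping[str, str]) -> str:
    """Toy form handler: drop suspected bot submissions, accept the rest."""
    if is_probably_spam(form_data):
        return "rejected"  # a real site might fail silently so the bot learns nothing
    return "accepted"


if __name__ == "__main__":
    print(handle_comment({"comment": "Nice post!", "website": ""}))                # accepted
    print(handle_comment({"comment": "Buy now", "website": "http://x.example"}))   # rejected
```

A missing field is treated as clean here. Since browsers still submit inputs that are merely hidden with CSS, a stricter variant could also reject submissions where the field is absent entirely.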
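One way to supply that commentary, sketched in Python: render the honeypot with an ordinary label telling the reader to leave it empty, so a text browser or screen reader that ignores the stylesheet still explains the field. The `hp-field` class, the default `website` name and the label wording are assumptions. `tabindex="-1"` keeps keyboard users from tabbing into the field, and `autocomplete="off"` asks the browser not to autofill it, which would otherwise spring the trap on real users.

```python
from html import escape


def render_honeypot(field_name: str = "website") -> str:
    """Return HTML for a honeypot input that CSS hides but that still carries a label.

    Assumes the site's stylesheet contains a rule like `.hp-field { display: none; }`.
    """
    name = escape(field_name, quote=True)  # guard against quotes or markup in the name
    return (
        '<div class="hp-field">\n'
        f'  <label for="{name}">Leave this field empty (spam trap)</label>\n'
        f'  <input type="text" id="{name}" name="{name}" value=""'
        ' autocomplete="off" tabindex="-1">\n'
        "</div>"
    )


if __name__ == "__main__":
    print(render_honeypot())
```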
Reality check
But facing the cold, hard facts, we can’t fool ourselves into believing that spambots will stay silly forever. In fact, there are probably quite a few sophisticated ones out there already. The battle against spam has been raging since the olden days, and as an example I’d like to toss in a link to this very informative post by an anti-spam software developer, written in early ’06. He stopped working on his project, SpamKarma2, in mid-’08 and put the code up on Google Code under the standard GPLv2 licence, where it is still being developed today.
Back to the point: in the post I linked to above, he notes that he had already observed spambots becoming more efficient, making their access look more human-like, following links in a “common” manner, and even bypassing JavaScript filters. A programmer who can implement a JavaScript parser in his spambot would hardly be challenged to write one for stylesheets as well; the only reason there haven’t been any indications of one yet is that there hasn’t been any need for it. Thus, if the honeypot solution became widespread, it would probably be surmounted with relative ease.
If I haven’t frustrated you enough yet by breaking all the good parts of the “solution” before you’ve even had a chance to code it into your site, here’s one more: OCR. Using this technique, originally invented to turn scanned images into normal text, the rather famous XRumer bot was able to break Hotmail and Gmail CAPTCHAs in late ’08. So the race is, by all measures, a tight one. Obfuscated CAPTCHAs still seem to hold pretty high ground, however, and thus remain the most effective way to avoid spam. But (back to square one) they are user-unfriendly and may have a negative commercial impact.
Summary
- Using the honeypot CAPTCHA and common sense, a “low-value” target could probably avoid practically all spam without resorting to intrusive techniques such as regular, hard-to-OCR CAPTCHAs.
- For the best possible security, a hard-to-OCR CAPTCHA is the way to go, unfortunate but true. One nice system I’d like to promote is the reCAPTCHA service, which turns the pestering work into a good deed by using your human processing cycles to digitize old books and publications.
- The battle rages on. If you’ve got any information on this topic, I’d love to hear from you, especially if you know anything about the workings of more sophisticated spambots. Ignorance might be bliss, but living in the grey zone in between is pure hell.
Thanks for sticking with it; I hope you found this somewhat useful.
xkcd 810: Constructive
