In the war on spam, usability is the first casualty
I've posted earlier about spam, the measures taken to stop it, and the counter-measures taken by spammers. It's a war with civilian casualties just like any other war, and in this case the users are the civilians. The measures to prevent spam introduce captchas and other methods that make websites less and less usable and accessible, and that turn using many sites into a challenge for many people.
Spam is the environmental issue of the Internet
Spam is, as almost everyone knows these days, mass-sent and mass-posted messages whose content promotes a product or service. It doesn't have to be commercial; anything submitted on a mass scale without prior solicitation is considered spam. Unsolicited bulk email, as email spam is often called, has become one of the major technical problems on the Internet, alongside viruses, worms and trojan horses. Spam is comparable to environmental pollution: the destruction of something commonly owned without taking responsibility for the consequences. Spam is parasitic and uses other people's time, money and resources. Spam must be exterminated.
Spam and its many incarnations
Today spam doesn't always come in the form of emails; automated postings to forums and sites have become another way to spread unsolicited advertising. Websites like phpbuilder.com have had many of their articles polluted by spam comments made by so-called spambots, software designed to post spam at whatever sites it finds.
Spambots target forms that let users post content to the site. The content that is posted is seldom filtered; it's usually displayed exactly as it is. By posting links and keywords, spammers can get their sites to show up better in search engine results. In SEO, or search engine optimization, one of the determining factors for a site's position on the SERPs (search engine result pages) is the number of links leading to it and which sites are linking. Links from popular sites that have many backlinks themselves are considered more valuable than those from smaller, less known sites. A link is a kind of vote, and spammers do everything they can to get free votes by abusing systems put in place for people to post content.
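To make the "link as a vote" idea concrete, here is a minimal sketch of a simplified PageRank-style calculation. It is not the algorithm any real search engine runs; the function name `link_votes`, the damping value and the tiny link graph are all assumptions made up for illustration.

```python
def link_votes(links, damping=0.85, iterations=50):
    """Score pages by inbound links using a simplified PageRank iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its vote over all pages.
                share = damping * rank[page] / len(pages)
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical link graph: a well-linked guestbook, before and after a spam post.
before = {"blog": ["guestbook"], "forum": ["guestbook"],
          "guestbook": ["blog"], "spamsite": []}
after = dict(before, guestbook=["blog", "spamsite"])

print(link_votes(before)["spamsite"], link_votes(after)["spamsite"])
```

In this toy graph, a single link posted to the popular guestbook raises the score of the spammer's site, which is exactly the free vote the spammer is after.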
This is a major nuisance for anyone operating a site with many backlinks and a high pagerank (which measures a site's popularity), such as mine. I had to close the guestbook at the old version of my site due to the amount of spam being posted. I later turned to manual approval, but it became unfeasible since the time it took to go through every post soon amounted to half an hour every day. The current guestbook requires users to register first, which presents a few additional obstacles for spambots, measures I'll write about in more detail.
Measures to prevent spam
Anti-spam software developers, and designers of forums and of website content management and publishing systems, have relied on a set of measures to prevent spam. These are more or less successful, but all of them take away from the ease of use and add unnecessary steps to using the site.
Captchas
Every frequent user of the Internet and the web has seen these; they usually appear as an image with text or numbers which you are supposed to interpret and then type into a field. Captchas rely on the fact that computer programs are good at following programmed logic but poor at analyzing random, unexpected and entropic data, something humans excel at. We are extremely good at understanding each other on noisy phone lines, or seeing what's in a blurred photo. Hiring people to decode captchas would be pointless, as it would be too expensive to warrant the possible return.
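The server side of a captcha is simple compared to the image distortion. The following is only a sketch under assumptions: it leaves out the rendering of the text into a distorted image entirely, and the names `ALPHABET`, `new_challenge` and `check_response` are hypothetical, not taken from phpBB or Drupal.

```python
import hmac
import secrets

# Characters that are easily confused in a distorted image (0/O, 1/I/L) are left out.
ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789"


def new_challenge(length=6):
    """Return the random text that would be rendered into the captcha image."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def check_response(expected, typed):
    """Compare the typed text with the expected text, ignoring case and whitespace."""
    cleaned = "".join(typed.split()).upper()
    return hmac.compare_digest(cleaned.encode(), expected.encode())


challenge = new_challenge()
print(check_response(challenge, challenge.lower()))  # case does not matter
```

The expected text has to stay on the server, for example in the visitor's session; if it were sent along with the form, a spambot could simply read it.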
Usage
Most forum systems use captchas, including phpBB. A typical phpBB captcha is shown below. The captcha used by Drupal, which is somewhat more complex and is also used at this site, is shown next to it. You will probably also encounter a captcha if you use a service for looking up database records, such as DNS entries for domain names, as these services are prone to abuse by other sites that attempt to freeload on the resources of others.
Problems
Captchas aren't as effective as they once were; many captchas have been broken. Spammers have turned to novel ways of breaking them, such as letting people who sign up at other sites of theirs, usually porn sites, decode captchas for them. There is also software that can read captchas, often with extremely high or even complete success rates.
I wrote earlier that software is bad at analyzing entropic data, but this is a generalization. By applying methods from artificial intelligence (AI) research and using so-called "neural nets", which excel at pattern matching, computer software can be "taught" how to read captchas. The problem with neural nets is that they're generally slow and require much processing time, so with the computer hardware of today it is not yet feasible to use them to decode captchas at the rate spammers require.
Many of these captcha-breaking applications do not use neural nets but other, faster algorithms that are good enough to break most captchas available today. The phpBB 2 captcha has been broken, but the Drupal captcha seems to be a bit more complex and trickier to break, so my site remains secure, for now.
Usability issues
This is clearly a major problem. As visual captchas get more and more complex in order to make it harder for spammers to decode them, the workload is put on the user. The user is expected to read a series of jumbled letters and numbers from an image, and often fails, which leads to frustration. It adds unnecessary work and is a waste of time for the end users.
"Audible captchas"
This is an alternative to the visual captcha for people who are blind or visually impaired. It is a sound file with a number of spoken words, letters and numbers, and the person listening is required to write down what he or she hears.
Usage
PayPal, among others, offers this as an alternative to the visual captcha.
Problems
The audible captcha is subject to the same problems as the visual one, as both depend on data that is hard for machines to decode. We already have systems that can interpret human voices; booking systems for airline tickets have been using this technique for many years. What speaks in favor of the audible captcha is that it requires even more processing time than the visual captcha and is less common, which means the likelihood that spammers will focus their efforts on breaking it is small. On the other hand, playback of audio on the web is still not something that works perfectly, and many people use computers without a sound card, speakers or headphones, not to mention people who are deaf or hearing impaired.
Usability issues
Just like the visual captcha, the audible one risks frustrating the user. In both cases, additional work is put on the users, who are required to succeed at a challenging task.
Human questions
This method expects the user to answer a question that can only be answered by understanding its meaning. In computer science terms, the agent must be able to semantically decode the question and provide a correct answer. A question could be "What is the largest country on Earth by area?". The user would then be expected to type "Russia".
Usage
Human question-style spambot checks are used on a few sites and by a few blog systems but they aren't as popular as captchas as they cannot be as easily automated.
Problems
From a technical point of view, the "human question"-style protection has one major issue: it requires a list of question and answer pairs. If the spambot has access to this list, the protection is more or less worthless. For this to work, the list would have to be kept secret and would require constant editing to prevent spammers from building their own list of question-answer pairs.
Usability issues
This might be the worst solution presented in this article, and I'm sure you know exactly why. Obviously users may not know the answer, and even after repeated tries a user may not have been given a question he or she can answer.
Also, to prevent spambots from succeeding through "brute force" attacks, that is, repeated tries until one succeeds, the number of tries must be limited, which makes this an even lousier protection scheme.
While it's possible to let the user pick a category, just like when you play Trivial Pursuit, it would still be very complicated. And even if the user does know the answer, there's no telling whether the user will enter it in the correct format: capitalization, spaces, special characters and all.
Less specific questions, on the other hand, make the chance of a spambot breaking the system even greater. A question like "What is the square root of 4?" would be broken instantly; it is simple arithmetic.
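The format problem and the limit on tries can be sketched in a few lines. This is only an illustration under assumptions: the question list, the limit of three tries and the names `normalize` and `QuestionCheck` are made up, and a real system would keep the list out of reach of spambots, as discussed above.

```python
import unicodedata

MAX_ATTEMPTS = 3  # assumed limit, to make brute force impractical

# Hypothetical question list; each question maps to its accepted answers.
QUESTIONS = {
    "What is the largest country on Earth by area?": {"russia", "russian federation"},
}


def normalize(answer):
    """Ignore capitalization, punctuation and extra spaces in an answer."""
    text = unicodedata.normalize("NFKC", answer).casefold()
    kept = "".join(ch for ch in text if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


class QuestionCheck:
    """One question shown to one visitor, with a limited number of tries."""

    def __init__(self, question):
        self.question = question
        self.attempts_left = MAX_ATTEMPTS

    def answer(self, text):
        if self.attempts_left <= 0:
            return False  # out of tries, even a correct answer is rejected
        self.attempts_left -= 1
        return normalize(text) in QUESTIONS[self.question]


check = QuestionCheck("What is the largest country on Earth by area?")
print(check.answer("  RUSSIA! "))
```

Normalizing the input takes care of capitalization and stray characters, but it does nothing for the user who simply doesn't know the answer, and the limit on tries punishes humans and spambots alike.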
Email-based confirmation
This is a rather simple method that relies on the user having an email account to which the system sends an email with a link or a code. The user is expected to confirm the email account by clicking the link or providing the code to the system.
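A minimal sketch of the confirmation flow might look like this. The in-memory `pending` store, the `BASE_URL` and the function names are assumptions for illustration only; sending the email itself is left out.

```python
import hmac
import secrets
from urllib.parse import urlencode

BASE_URL = "https://example.com/confirm"  # hypothetical confirmation page
pending = {}  # stand-in for a database table of unconfirmed accounts


def start_confirmation(email):
    """Create a random token and return the link to put in the email."""
    token = secrets.token_urlsafe(32)
    pending[email] = token
    return BASE_URL + "?" + urlencode({"email": email, "token": token})


def confirm(email, token):
    """Accept the token once, and only if it matches the one that was sent."""
    expected = pending.get(email)
    if expected is None:
        return False
    if not hmac.compare_digest(expected.encode(), token.encode()):
        return False
    del pending[email]  # a token can only be used once
    return True


print(start_confirmation("user@example.com"))
```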
Usage
Most forum systems, content management systems and similar applications use this method to confirm user accounts. It's usually a way to dispatch a password to the user, and it requires some kind of effort on the user's end. Unfortunately it doesn't stop spambots.
Problems
Analyzing incoming email is relatively simple, and since every forum running a given forum system sends its email in the same format, writing a piece of code that extracts the required information is a piece of cake. Spambots get around this relatively easily.
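To show how little a fixed email format protects, here is a sketch with a made-up confirmation email. The email text, the URL pattern and the function name are all hypothetical and not taken from any real forum system.

```python
import re

# Hypothetical confirmation email in a fixed, predictable format.
SAMPLE_EMAIL = """\
Welcome to Example Forum!

Please visit the following link to activate your account:
https://forum.example.com/activate?u=1234&key=a1b2c3d4

Thank you for registering.
"""

# One pattern covers every site that sends its email in this same format.
ACTIVATION_LINK = re.compile(r"https?://\S+/activate\?\S+")


def find_activation_link(body):
    """Return the first activation link in the email body, or None."""
    match = ACTIVATION_LINK.search(body)
    return match.group(0) if match else None


print(find_activation_link(SAMPLE_EMAIL))
```

A single regular expression is enough here, which is why a confirmation email on its own is a hurdle for humans rather than for software.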
Usability issues
The greatest problem with this approach is that the system relies on other systems functioning. If email is used to send the confirmation code, how soon the message arrives depends on all the servers it is routed through. An email can take from one second to several hours to arrive. Once it arrives, provided the email server works, it may end up in the bulk mail or spam folder, mistaken for spam by the spam filter, requiring the user to actively search for it.
These problems and the delay add frustration and force the user to do something else, possibly forgetting entirely about what he or she intended to do in the first place. For sites with user-contributed content, such as forums, this is usually a minor problem since people are used to it, but for ecommerce sites it could stop a purchase because the customer either finds another merchant that doesn't require registration and email confirmation, or simply stops caring. Many impulse purchases would likely fail due to this.
Conclusion
It is clear that spambot protection and spam prevention methods, all weapons in the war on spam, make websites less accessible and result in poor usability. Unfortunately, under current circumstances we are required to take these measures to prevent spam. I am aware that my site uses several of these methods, which I've criticized for their impact on usability and accessibility, but it is a conscious decision on my part after weighing the pros and cons.
The only solution to spam is to stop it, by making spamming so costly and punishable by such sentences that people don't consider it worth the risk. It is also important to educate people, especially new Internet users, about spam and why they should never click links or follow instructions in spam emails. Any response, any income, regardless of how small, will encourage spammers to continue abusing the trust and resources of other people for their own selfish gain.
Links
This site lists several types of captchas and discusses the weaknesses and strengths of each type:
http://sam.zoy.org/pwntcha/
About using algorithms derived from AI to break captchas:
http://www.brains-n-brawn.com/default.aspx?vDir=aicaptcha
A scientific article about an algorithm to break the captchas used by Yahoo:
http://www.cs.sfu.ca/~mori/research/gimpy/
Wikipedia article on captchas and how to circumvent them:
http://en.wikipedia.org/wiki/Captcha
Another method to circumvent captchas:
http://www.puremango.co.uk/cm_breaking_captcha_115.php