In the war on spam, usability is the first casualty

Submitted by Jakob on 5 June, 2006 - 17:50. My Blog | Human-Computer Interaction | Internet | Usability

I've posted before about spam, the measures taken to stop it, and the counter-measures taken by spammers. It's a war that has civilian casualties just like any other war, and in this case the users are the civilians. The measures to prevent spam introduce captchas and other methods that make websites less and less usable and accessible, turning the use of many sites into a challenge for many people.

Spam is the environmental issue of the Internet

Spam is, as almost everyone knows these days, mass-sent and mass-posted messages that promote a product or service. It doesn't have to be commercial; anything submitted on a mass scale without prior solicitation is considered spam. Unsolicited bulk email, as email spam is often called, has become one of the major technical problems on the Internet, alongside viruses, worms and trojan horses. Spam is comparable to environmental pollution: the destruction of something commonly owned without taking responsibility for the consequences. Spam is parasitic and uses other people's time, money and resources. Spam must be exterminated.

Spam and its many incarnations

Today spam doesn't always come in the form of emails; automated postings to forums and sites have become another way to spread unsolicited advertising. Websites like phpbuilder.com have had many of their articles polluted by spam comments made by so-called spambots, software designed to post spam at whatever sites it finds.

Spambots target forms that let users post content to the site. The posted content is seldom filtered; it's usually displayed exactly as it is. By posting links and keywords, spammers can get their sites to show up better in search engine results. In SEO, or search engine optimization, one of the determining factors for a site's position on the SERPs (search engine result pages) is the number of links leading to it and which sites are linking. Links from popular sites that have many backlinks are considered more valuable than those from smaller, less known sites. A link is a kind of vote, and spammers do everything they can to get free votes by abusing systems put in place for people to post content.

This is a major nuisance for anyone operating a site with many backlinks and a high PageRank (which measures a site's popularity), such as my site. I had to close the guestbook on the old version of my site due to the amount of spam being posted. I later turned to manual approval, but it became unfeasible since going through every post soon took half an hour every day. The current guestbook requires users to register first, which presents a few additional obstacles for spambots, measures I'll write about in more detail below.
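
The idea that a link is a vote, and that a vote from a popular site counts for more, can be sketched with a simplified PageRank-style calculation. This is only an illustration and not the formula any search engine actually uses; the function name, the damping value and the tiny link graph are all assumptions made up for the example.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scoring.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    count = len(pages)
    score = {page: 1.0 / count for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new = {page: (1.0 - damping) / count for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its vote over all pages.
                for target in pages:
                    new[target] += damping * score[page] / count
        score = new
    return score


# Hypothetical graph: "forum" is popular (three backlinks), "tiny" has none.
# A spammer has planted one link on each of them.
graph = {
    "a": ["forum"],
    "b": ["forum"],
    "c": ["forum"],
    "forum": ["spam_a"],
    "tiny": ["spam_b"],
}
scores = link_scores(graph)
print(scores["spam_a"] > scores["spam_b"])
```

In this toy graph the spam page linked from the popular forum ends up with a higher score than the one linked from the unknown site, which is why sites with many backlinks are the more attractive targets.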

Measures to prevent spam

Anti-spam software developers, and designers of forums and website content management and publishing systems, have relied on a set of measures to prevent spam. These are more or less successful, but all of them take away from ease of use and add unnecessary steps to using the site.

  • Captchas

    Every frequent user of the Internet and the web has seen these. They usually appear as an image with text or numbers which you are supposed to interpret and then type into a field. Captchas rely on the fact that computer programs are good at following programmed logic but poor at analyzing random, unexpected and entropic data, something humans excel at. We are extremely good at understanding each other on noisy phone lines, or seeing what's in a blurred photo. Hiring people to decode captchas would be pointless, as the cost would exceed the possible return.

    Usage

    Most forum systems use captchas, including phpBB. A typical phpBB captcha is shown below. The captcha used by Drupal, which is somewhat more complex and is also used at this site, is shown next to it. You will probably also encounter a captcha if you use a service for looking up database records, such as DNS entries for domain names, as these services are prone to abuse by other sites that attempt to cadge on the resources of others.

    [Image: phpBB captcha]

    [Image: Drupal captcha]

    Problems

    Captchas aren't as effective as they once were; many have been broken. Spammers have turned to novel ways to break them, such as having people who sign up at other sites of theirs, usually porn sites, decode captchas for them. There is also software that can read captchas with very high, often complete, success rates.

    I wrote earlier that software is bad at analyzing entropic data, but this is a generalization. By applying methods from AI (artificial intelligence) research and using so-called "neural nets", which excel at pattern matching, computer software can be "taught" to read captchas. The problem with neural nets is that they're generally slow and require much processing time, so with today's computer hardware it is not yet feasible to use them to decode captchas at the rate spammers require.

    Many of these captcha-breaking applications do not use neural nets but other, faster algorithms that are good enough to break most captchas available today. The phpBB 2 captcha has been broken, but the Drupal captcha seems to be trickier and a bit more complex, so my site remains secure, for now.

    Usability issues

    This is clearly a major problem. As visual captchas get more and more complex in order to make them harder for spammers to decode, the workload is put on the user. The user is expected to read a series of jumbled letters and numbers from an image, and often fails, which leads to frustration. It adds unnecessary work and is a waste of time for the end users.
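
    The server side of a text captcha, leaving out the image rendering, can be sketched roughly as follows. This is not how phpBB or Drupal implement it; the function names, the in-memory `store` dictionary and the reduced alphabet are assumptions chosen for the example. Leaving out look-alike characters and ignoring case are small concessions to the usability problem described above.

```python
import secrets

# Assumption: characters that are easy to confuse (0/O, 1/I/L) are left out.
ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789"


def new_challenge(store, length=6):
    """Create a random code, remember it under a fresh id, and return both."""
    challenge_id = secrets.token_hex(8)
    code = "".join(secrets.choice(ALPHABET) for _ in range(length))
    store[challenge_id] = code
    # A real system would render `code` as a distorted image here.
    return challenge_id, code


def verify(store, challenge_id, typed):
    """Check the typed text. The challenge is used up whether or not it matches."""
    expected = store.pop(challenge_id, None)
    if expected is None:
        return False
    # Ignore case and surrounding whitespace to be lenient with the user.
    return typed.strip().upper() == expected


store = {}
challenge_id, code = new_challenge(store)
print(verify(store, challenge_id, code.lower()))  # first try with the right text
print(verify(store, challenge_id, code))          # same challenge cannot be reused
```

    Removing the challenge on the first attempt means a bot cannot keep guessing at the same image; every failure forces a new one.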

  • "Audible captchas"

    This is an alternative to the visual captcha for people who are blind or visually impaired. It is a sound file with a number of spoken letters and numbers, and the person listening is required to write down what he or she hears.

    Usage

    PayPal, among others, offers this as an alternative to the visual captcha.

    Problems

    The audible captcha is subject to the same problems as the visual one, as both depend on data that is hard for machines to decode. We already have systems that can interpret human voices; booking systems for airline tickets have been using this technique for many years. What speaks in favor of the audible captcha is that it requires even more processing time than the visual captcha and is less common, which means the likelihood that spammers will focus their efforts on breaking it is small. On the other hand, playback of audio on the web is still not something that works perfectly, and many people use computers without a sound card, speakers or headphones, not to mention people who are deaf or hearing impaired.

    Usability issues

    Just like the visual captcha, the audible one presents the risk of frustrating the user. In both cases, additional work is put on the users, who are required to succeed at a challenging task.

  • Human questions

    This method expects the user to answer a question that can only be answered by understanding its meaning. In computer science terms, the agent must be able to semantically decode the question and provide a correct answer. A question could be "What is the largest country on Earth by area?". The user would then be expected to type "Russia".

    Usage

    Human question-style spambot checks are used on a few sites and by a few blog systems but they aren't as popular as captchas as they cannot be as easily automated.

    Problems

    From a technical point of view, the "human question"-style protection has one major issue: it requires a list of question and answer pairs. If the spambot has access to this list, the protection is more or less worthless. For this to work, the list would have to be kept secret and would require constant editing to prevent spammers from building their own list of question-answer pairs.

    Usability issues

    This might be the worst solution presented in this article, and I'm sure you know exactly why. Obviously users may not know the answer, and even after repeated tries the user may not have been given a question he or she knows the answer to.

    Also, to prevent spambots from succeeding through "brute force" attacks, that is, repeated tries until success, the number of tries must be limited, which makes this an even lousier protection scheme.

    While it's possible to let the user pick a category, just like when you play Trivial Pursuit, it would still be very complicated. And even if the user does know the answer, there's no telling whether the user will enter it in the correct format, capitalization, spaces, special characters and all.

    Less specific questions, on the other hand, make the chance for a spambot to break the system even greater. A question like "What is the square root of 4?" would be broken instantly; it is simple arithmetic.
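
    Two of the issues above, the answer format and the limit on tries, can be illustrated with a small sketch. It only softens the format problem; it does nothing for a user who does not know the answer. The names `normalize`, `check_answer` and `MAX_ATTEMPTS`, the limit of three tries, and the list of accepted answers are assumptions made for the example.

```python
import unicodedata

MAX_ATTEMPTS = 3  # assumed limit on wrong answers per client


def normalize(answer):
    """Lower-case the answer, drop accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", answer)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text)
    return " ".join(text.lower().split())


def check_answer(attempts, client, typed, accepted):
    """Return True if `typed` matches an accepted answer and tries remain.

    `attempts` maps a client identifier to its number of wrong answers so far.
    """
    if attempts.get(client, 0) >= MAX_ATTEMPTS:
        return False  # locked out, even if this answer is right
    if normalize(typed) in {normalize(a) for a in accepted}:
        return True
    attempts[client] = attempts.get(client, 0) + 1
    return False


attempts = {}
accepted = ["Russia", "Russian Federation"]
print(check_answer(attempts, "client-1", "  RUSSIA. ", accepted))
```

    Accepting several spellings and ignoring case and punctuation helps the legitimate user, but each accepted variant also makes a lucky guess by a spambot slightly more likely.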

  • Email-based confirmation

    This is a rather simple method that relies on the user having an email account to which the system sends an email with a link or a code. The user is expected to confirm the email account by clicking the link or providing the code to the system. 

    Usage

    Most forum systems, content management systems and similar software use this method to confirm user accounts. It's usually a way to dispatch a password to the user and require some kind of effort on the user's end. Unfortunately it doesn't stop spambots.

    Problems

    Analyzing incoming email is relatively simple, and since every forum running a given forum system sends its email in the same format, writing a piece of code that extracts the required information is a piece of cake. Spambots get around this relatively easily.

    Usability issues

    The greatest problem with this approach is that the system relies on the functioning of other systems. If email is used to send the confirmation code, how soon the message arrives depends on all the servers the message is routed through. An email can take from one second to several hours to arrive. Once it arrives, provided the email servers work, it may end up in the bulk mail or spam folder, mistaken for spam by the spam filter, requiring the user to actively search for it.

    These problems and the delay add frustration and force the user to do something else, possibly forgetting entirely about what he or she intended to do in the first place. For sites with user-contributed content, such as forums, this is usually a minor problem since people are used to it, but for ecommerce sites it could potentially stop a purchase because the customer either finds another merchant that doesn't require registration and email confirmation or simply stops caring. Many impulse purchases would likely fail due to this.
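
    How little a fixed email format protects can be seen from how short the parsing code is. The sketch below assumes a made-up confirmation message and URL layout; the wording, the `activate` path and the `key` parameter are hypothetical and not taken from any particular forum system.

```python
import re

# Hypothetical confirmation email; wording and URL are invented for illustration.
SAMPLE_EMAIL = """\
Welcome to Example Forum!

To activate your account, click the link below:
http://forum.example.com/activate?user=4711&key=9f3a6c2e

If you did not register, ignore this message.
"""

# One pattern is enough when every installation sends the same kind of link.
ACTIVATION_LINK = re.compile(r"https?://\S+/activate\?\S*key=[0-9a-f]+")


def find_activation_link(body):
    """Return the first activation URL found in the message body, or None."""
    match = ACTIVATION_LINK.search(body)
    return match.group(0) if match else None


print(find_activation_link(SAMPLE_EMAIL))
```

    The confirmation step therefore mostly costs the human user time, while a program reading the mailbox handles it in a single pattern match.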

Conclusion

      It is clear that spambot protection and spam prevention methods, all weapons in the war on spam, make websites less accessible and result in poor usability. Unfortunately, under current circumstances we are required to take these measures to prevent spam. I am aware that my site uses several of these methods, which I've criticized for their impact on usability and accessibility, but it is a conscious decision on my part, made after weighing the pros and cons.

      The only solution to spam is to stop it, by making spamming so costly and the sentences so severe that people don't consider it worth the risk. It is also important to educate people, especially new Internet users, about spam and why they should never click links or follow instructions in spam emails. Any response, any income, regardless of how small, will encourage spammers to continue abusing the trust and resources of other people for their own selfish gain.

      Links

      This site lists several types of captchas and discusses the weaknesses and strengths of each type:
      http://sam.zoy.org/pwntcha/

      About using algorithms derived from AI to break captchas:
      http://www.brains-n-brawn.com/default.aspx?vDir=aicaptcha 

      A scientific article about an algorithm to break the captchas used by Yahoo:
      http://www.cs.sfu.ca/~mori/research/gimpy/

      Wikipedia article on captchas and how to circumvent them:
      http://en.wikipedia.org/wiki/Captcha

      Another method to circumvent captchas:
      http://www.puremango.co.uk/cm_breaking_captcha_115.php



Submitted by juanm on 7 June, 2006 - 18:56.

Hi ... you probably forgot that most of spammers are using open proxies to post their crap ... But even blocking some kinds of proxies will only reduce the load on your site ....
About not clicking on links ... well, you're right, but with certain sites is quite difficult understanding if they're spammy sites or legitimate ones at a first glance.

hope it could help

Submitted by Jakob on 7 June, 2006 - 21:41.

I use a combination of heuristic analyses of the user agent as well as blacklists of spammers and their proxies. The few that get through are stopped by the captchas. You're right, blacklisting isn't enough now.

And you're right about the links too, it can be very hard to tell, especially with all the phishing going on these days, mostly faked PayPal, CitiBank and eBay emails.

Submitted by Virtuality on 7 June, 2006 - 22:27.

That's usually easy to spot though. Phishing. Just hover over the link, and the address will look weird. Either it's just an IP, or some foreign top domain.

Sure, many don't know how domains work etc, I guess someone has to publish a brochure and send out to all internet users. :)

Submitted by Jakob on 8 June, 2006 - 00:30.

Yes I agree. Like I mention in the article, people need to get educated about these things. It's very important in order to win the fight against spam. I'm not sure if a brochure is the right way to go though, it would be wiser to strive to make email software better at identifying phishing emails.
