E-mail protocols and spam filtering are something of a hobby of mine, and I certainly have more expertise in that area than most folks. I receive close to 1,000 e-mails a day, of which 80-90% are spam, so it’s a problem that is near and dear to my heart. So, what do you do about it?
I like analogies (even bad ones), so here goes …
Spam is very much like some of the complicated human diseases we face today, such as cancer or HIV. Clearly, e-mail spam rarely has severe health consequences; what I’m talking about is the way these diseases mutate and are often very difficult to treat.
Spam is something that mutates almost as quickly as new ways of filtering are developed. The people who propagate spam are no fools, and many of them are highly skilled programmers with a deep understanding of how e-mail systems function. If you attack the problem with any one solution, you’ll find that they quickly modify their approach to work around that solution, much in the same way that a complex disease might react to a single medication. The key is to use a combination of techniques or “drugs” to combat the problem on many different levels all at the same time. Ok, this is as far as I’m going to take this analogy … I promise!
So, in a practical sense, how does this apply to eliminating spam from your inbox? Well, there are a number of different techniques that are available to filter spam and the trick is employing 2, 3, or even more of these techniques all at the same time. Below, I’m going to describe three broad categories of spam filtering techniques that you can explore on your own – just try Googling some of them and you’ll find a wealth of more detailed information.
Statistical analysis is probably one of the more commonly used spam filtering techniques in use today. You’ll also hear terms such as heuristic, Bayesian, and other types of related analysis, which I loosely group under the category “Statistical Analysis”. Essentially, this method of spam filtering relies on examining the headers and content of the message and assigning a score or some sort of probability that the message is spam. Based on that score, you can tell your e-mail software to accept the message, reject it, or file it in your junk mail folder. SpamAssassin and DSPAM are two commonly used software applications that enable this sort of filtering and are in widespread use by many ISPs and e-mail service providers.
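To make the scoring idea a little more concrete, here is a minimal sketch of a naive Bayes style scorer in Python. It is a toy written for illustration, not how SpamAssassin or DSPAM actually work: the function names (train, spam_probability, classify), the whitespace tokenizer, and the 0.6 and 0.95 thresholds are all my own assumptions, and it expects at least one spam and one non-spam training message.

```python
import math
from collections import Counter


def train(messages):
    """Count word occurrences per class from (text, is_spam) pairs."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in messages:
        counts[is_spam].update(text.lower().split())
        totals[is_spam] += 1
    return counts, totals


def spam_probability(text, counts, totals):
    """Estimate P(spam | text) with add-one smoothing (toy naive Bayes)."""
    vocab_size = len(set(counts[True]) | set(counts[False]))
    spam_words = sum(counts[True].values())
    ham_words = sum(counts[False].values())
    # Start from the prior odds, then add the evidence from each word.
    log_odds = math.log(totals[True] / totals[False])
    for word in text.lower().split():
        p_spam = (counts[True][word] + 1) / (spam_words + vocab_size)
        p_ham = (counts[False][word] + 1) / (ham_words + vocab_size)
        log_odds += math.log(p_spam / p_ham)
    return 1 / (1 + math.exp(-log_odds))


def classify(score, junk_at=0.6, reject_at=0.95):
    """Map a score to an action; the thresholds are arbitrary assumptions."""
    if score >= reject_at:
        return "reject"
    if score >= junk_at:
        return "junk"
    return "accept"


training = [
    ("cheap mortgage rates now", True),
    ("win cheap pills now", True),
    ("lunch meeting tomorrow", False),
    ("project meeting notes attached", False),
]
counts, totals = train(training)
print(classify(spam_probability("cheap mortgage offer", counts, totals)))
print(classify(spam_probability("lunch meeting tomorrow", counts, totals)))
```

Real filters also look at headers, use far larger training sets, and keep learning from the messages you mark as spam or not spam, which is where the improvement over time comes from.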
In terms of effectiveness, you’ll see claims of anywhere between 70% - 99% accuracy. Most of these systems are capable of learning over time and improving their effectiveness. My personal experience is that you’re doing pretty well if you are able to filter out 70% of the spam using this technique alone. Unfortunately, that means you could still be receiving a lot of spam that makes it into your inbox. However, it’s certainly a major step in the right direction for most people.
Filtering really isn’t a specific technology, but generally refers to a suite of tools that ISPs and users can employ to eliminate undesirable e-mail. In its simplest form, most e-mail clients (e.g., Outlook) support the ability to create rules that examine messages for specific content and then take some sort of action – typically to delete the message. For example, you can create a rule that looks for the word “mortgage” in the body of the message and, if found, moves the message to your junk mail folder.
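The “mortgage” rule above boils down to something like the following Python sketch. The function name route_message and the folder names are made up for illustration; a real mail client exposes this through its rules dialog rather than code.

```python
import re


def route_message(body, keywords=("mortgage",)):
    """Return 'Junk' if any keyword appears as a whole word, else 'Inbox'."""
    for keyword in keywords:
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, body, re.IGNORECASE):
            return "Junk"
    return "Inbox"


print(route_message("Lowest MORTGAGE rates, act now!"))
print(route_message("See you at lunch tomorrow."))
```

It also shows why this kind of rule is so easy to beat: a spammer only has to write “m0rtgage” and the message sails straight through.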
More sophisticated systems, generally used by an ISP, might allow users to filter out messages based on the originating geography. For example, I really don’t expect to receive any personal e-mail from Asia so I block any e-mail that originates from that part of the world – done by examining the originating IP address. This is a technique that I commonly refer to as Geo-IP filtering.
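As a rough sketch of what the IP check might look like, here is a small Python example using the standard ipaddress module. The blocked networks are reserved documentation ranges that I picked as placeholders, and the name is_blocked is my own; a real Geo-IP filter would look up the country for each address in a geolocation database rather than use a hard-coded list.

```python
import ipaddress

# Placeholder networks (reserved documentation ranges), standing in for
# the address blocks a Geo-IP database would map to a blocked region.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(ip, networks=BLOCKED_NETWORKS):
    """Return True if the originating IP falls inside a blocked network."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)


print(is_blocked("203.0.113.7"))
print(is_blocked("192.0.2.1"))
```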
Another common technique, and probably more widely available than Geo-IP filtering, is “Grey listing”. Grey listing relies on the fact that much of the world’s spam is not sent by mail servers but rather by viruses, Trojan programs, etc. that have infected people’s PCs. Grey listing filters messages by initially refusing to accept e-mail from a source that it has never seen before. If the sending mail server is legit, it should attempt to send the message again. If it doesn’t, then you can be pretty sure it wasn't something you needed to read.
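Here is a minimal Python sketch of that logic. The details are assumptions on my part rather than a description of any particular product: I key each “source” on the sending IP, sender, and recipient, require a retry at least five minutes after the first attempt, and keep everything in memory. The class name Greylist and the return values are also made up.

```python
import time


class Greylist:
    """Temporarily refuse mail from unknown sources until they retry."""

    def __init__(self, min_delay=300, clock=time.time):
        self.min_delay = min_delay  # seconds a sender must wait (assumed)
        self.clock = clock          # injectable so the example is testable
        self.first_seen = {}        # (ip, sender, recipient) -> timestamp

    def check(self, client_ip, sender, recipient):
        key = (client_ip, sender, recipient)
        now = self.clock()
        # Record the first attempt, or fetch the time we first saw it.
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "accept"
        return "tempfail"  # tell the sending server to try again later


# Simulate three delivery attempts with a fake clock.
fake_time = [0]
greylist = Greylist(clock=lambda: fake_time[0])
attempt = ("192.0.2.1", "alice@example.com", "me@example.org")
print(greylist.check(*attempt))  # first contact
fake_time[0] = 60
print(greylist.check(*attempt))  # retried too soon
fake_time[0] = 400
print(greylist.check(*attempt))  # retried after the delay
```

A legitimate mail server queues the message and retries on its own schedule, so the only cost to real senders is a short delay on first contact.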
The effectiveness of this sort of filtering ranges broadly. My personal experience is that using rules to do keyword searches in your e-mail will, at best, eliminate only 10% - 15% of the spam. Techniques such as grey listing or Geo-IP are closer to 30% - 50% effective. You'd never want to use these techniques by themselves, but they nicely complement other techniques such as statistical analysis.
Challenge/response anti-spam is a class of techniques that eliminate spam by automatically challenging the sender to prove their e-mail is legitimate. Typically this involves the receiving system putting a hold on the e-mail (so you never see it) and sending an e-mail back to the sender asking them to click on a link or take some other action to verify their authenticity. As soon as they respond, their original e-mail is released and you get to read it. Once a sender has authenticated themselves, their e-mail address is added to a trusted “white list” so that they are no longer challenged for any subsequent e-mails.
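The hold, challenge, and white list steps described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the class and method names are invented, held mail lives in an in-memory dictionary, and actually sending the challenge e-mail with its confirmation link is left out.

```python
import secrets


class ChallengeResponseFilter:
    """Hold mail from unknown senders until they answer a challenge."""

    def __init__(self):
        self.whitelist = set()  # senders who have already confirmed
        self.held = {}          # token -> (sender, message)

    def receive(self, sender, message):
        sender = sender.lower()
        if sender in self.whitelist:
            return ("deliver", None)
        # Unknown sender: hold the message and issue a one-time token
        # that would be embedded in the challenge link.
        token = secrets.token_urlsafe(16)
        self.held[token] = (sender, message)
        return ("challenge", token)

    def confirm(self, token):
        """Release the held message and trust the sender from now on."""
        entry = self.held.pop(token, None)
        if entry is None:
            return None  # unknown or already used token
        sender, message = entry
        self.whitelist.add(sender)
        return message


mail_filter = ChallengeResponseFilter()
action, token = mail_filter.receive("Alice@example.com", "Hello!")
print(action)                      # held, challenge issued
print(mail_filter.confirm(token))  # sender clicked the link
print(mail_filter.receive("alice@example.com", "Hello again")[0])
```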
Challenge/response systems are nearly 100% effective. So, why aren’t they more widely used? Quite frankly, many ISPs and e-mail service providers hate this solution. It’s a technique that often causes their systems to process a lot more e-mail and use more network bandwidth due to the added overhead of the challenge messages. In fact, some ISPs and e-mail providers have decided to treat challenges as a type of spam. The other drawback is that spammers often forge the sender’s e-mail address to obscure their identity. Unfortunately, if they happen to use your e-mail address, then any challenge messages end up being sent to you. Of course, you’ll see the challenge and have no idea why the heck it was sent, and if you get enough of them, you might find it downright annoying.
Personally, I think that if challenge/response systems were more widely adopted, they could be quite successful at reducing the overall volume of spam. It’s a highly effective solution and there are many variations that can make it very difficult for a spammer to get around without making a major economic investment. However, due to the drawbacks discussed above, you’re not likely to see this approach widely adopted anytime soon. Still, it’s one more tool you can put in your arsenal if you decide it's right for you.