# I've found a spam filter that actually works!

Discussion in 'Computers' started by Rob Gardiner, May 30, 2003.

Rob Gardiner

I finally got tired of the massive quantities of "spiced ham" in my inbox, so I did a bit of research and came upon this:

A Plan for Spam

This article describes an algorithm known as Bayesian filtering. Superficially it sounds very similar to regular spam filtering, but there are subtle differences. Other spam filtering methods use an arbitrary system of "points" to decide whether a piece of mail is spam; Bayesian filtering uses probability. [How many "points" is the word "viagra" worth on this spam scale? It is a meaningless question.] A Bayesian filter instead analyzes a collection of regular e-mail messages and a collection of spam to determine, for each word, the probability that a message containing that word is spam. Supposedly this method can achieve up to 99.9% accuracy. I have been using the following program:

POPFile

since this weekend and have achieved only 85% accuracy so far. That should go up as it is trained on more e-mails. Part of the reason my accuracy rate is so low, I think, is that I do a lot of business on eBay, and those legitimate messages in some ways resemble sales pitches.
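For anyone curious what "uses probability" means in practice, here is a simplified Python sketch in the spirit of the article. It is not POPFile's code. The tokenizer, counting each word once per message, doubling the good-mail counts, the 0.01 to 0.99 clamp, the 0.4 value for unseen words, the 15-word cutoff and the 0.9 threshold are all illustrative assumptions loosely modeled on the article, so check the article itself before relying on any of them.

```python
import math
import re
from collections import Counter

# Assumed tokenizer: runs of letters, digits, dollar signs, apostrophes and dashes.
TOKEN = re.compile(r"[a-z0-9$'-]+")

def tokens(text: str) -> set[str]:
    """Distinct lowercase tokens in one message."""
    return set(TOKEN.findall(text.lower()))

def train(good_msgs, spam_msgs, min_count=5):
    """Estimate P(spam | word) for every word seen often enough in training."""
    good = Counter(w for m in good_msgs for w in tokens(m))
    bad = Counter(w for m in spam_msgs for w in tokens(m))
    n_good, n_bad = max(len(good_msgs), 1), max(len(spam_msgs), 1)
    probs = {}
    for word in good.keys() | bad.keys():
        g = 2 * good[word]  # weight good mail double to reduce false positives
        b = bad[word]
        if g + b < min_count:
            continue  # too rare to say anything reliable
        pg = min(1.0, g / n_good)
        pb = min(1.0, b / n_bad)
        probs[word] = max(0.01, min(0.99, pb / (pg + pb)))
    return probs

def spam_probability(text, probs, unknown=0.4, top_n=15):
    """Combine the most 'interesting' word probabilities (farthest from 0.5)."""
    ps = sorted((probs.get(w, unknown) for w in tokens(text)),
                key=lambda p: abs(p - 0.5), reverse=True)[:top_n]
    spam = math.prod(ps)
    ham = math.prod(1.0 - p for p in ps)
    return spam / (spam + ham)

def is_spam(text, probs, threshold=0.9):
    return spam_probability(text, probs) >= threshold

# Tiny made-up training piles, just to show the flow.
good = ["are we still on for lunch tomorrow"] * 5
spam = ["buy cheap viagra now"] * 5
probs = train(good, spam)
print(is_spam("cheap viagra", probs))    # True
print(is_spam("lunch tomorrow", probs))  # False
```

The combining step is Bayes' rule with the "naive" assumption that words occur independently of one another, which is why the filter only needs per-word statistics from the two piles of mail.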

Interesting statistic: I've had the filter on my machine for only a few days. In that time I have received 24 personal e-mails, containing a total of 1700+ unique words. In the same period I have received over 200 spams (more than eight times as many), but despite the much larger volume, the spam messages contain only 1400+ unique words. So when a message is built mostly from this small vocabulary that recurs across so many spams, that is a pretty good indicator that it is spam.

Anyway, I was fascinated by this. With my previous filtering, I was happy to get rid of 50% of my spam, and I got false positives as well. I once filtered out any message containing 4 or more exclamation points in the subject line. A message like that *must* be spam, right? Well, then I asked my dad whether he had THE DAY THE EARTH STOOD STILL on DVD yet and, if not, to keep an eye on the mail. His enthusiastic reply came with the subject "NO!!!!!!!!!!", so into the spam bin it went. LOL.
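Dad's reply is a good example of the "points" problem the article describes. Here is a toy points-based filter in Python; the rules, weights and threshold are invented for illustration and don't come from any real product. One hand-written rule is enough to push an innocent message over the line.

```python
# Invented rules and weights: the kind of arbitrary "points" a rule-based filter uses.
RULES = [
    ("4+ exclamation points in subject", lambda subj, body: subj.count("!") >= 4, 5.0),
    ("mentions viagra", lambda subj, body: "viagra" in body.lower(), 3.0),
    ("FREE in subject", lambda subj, body: "free" in subj.lower(), 1.5),
]
THRESHOLD = 5.0

def score(subject: str, body: str) -> float:
    """Add up the points for every rule the message triggers."""
    return sum(points for _name, test, points in RULES if test(subject, body))

def is_spam(subject: str, body: str) -> bool:
    return score(subject, body) >= THRESHOLD

# A legitimate reply trips the exclamation-point rule and gets misfiled.
print(is_spam("NO!!!!!!!!!!", "Haven't got it yet, I'll watch the mail."))  # True
```

A trained filter, by contrast, weighs the subject against everything else in the message instead of letting a single rule decide.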

If anyone else has war stories, I'd love to hear them. Or if anyone wants to try POPFile, I'm curious to know what accuracy you can reach. Best wishes.

John_Berger

I don't even bother. When I get spam, I take a few minutes to trace it back to its ISP and report it to the ISP. If a web site is involved, I also identify the web hosting company and its ISP. I've had hundreds of e-mail accounts and dozens of web sites closed by doing this.
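For anyone who wants to try this, the trail starts in the message's Received headers: each server that relays the message adds one at the top. Here is a minimal Python sketch using the standard `email` module that lists those headers and pulls out bracketed IPv4 addresses, which is where many (not all) servers record the connecting host. The sample message and its addresses are made up. Keep in mind that spammers can forge the lower headers, so only the entries added by your own ISP's servers are really trustworthy; looking up who owns an address (e.g. with a whois query) is a separate step not shown here.

```python
import re
from email import message_from_string

# Dotted-quad IPv4 addresses in square brackets, e.g. [203.0.113.7].
BRACKETED_IP = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

def received_chain(raw_message: str) -> list[str]:
    """Return the Received headers, newest (topmost) first."""
    msg = message_from_string(raw_message)
    return msg.get_all("Received") or []

def relay_ips(raw_message: str) -> list[str]:
    """Bracketed IPv4 addresses found in the Received headers, newest first."""
    ips = []
    for header in received_chain(raw_message):
        ips.extend(BRACKETED_IP.findall(header))
    return ips

SAMPLE = """\
Received: from mail.example.net (mail.example.net [198.51.100.20])
    by mx.myisp.example with ESMTP; Fri, 30 May 2003 10:00:00 -0400
Received: from friendly.example ([203.0.113.7])
    by mail.example.net with SMTP; Fri, 30 May 2003 09:59:00 -0400
From: deals@spammer.example
To: me@myisp.example
Subject: Amazing offer

Buy now!
"""

print(relay_ips(SAMPLE))  # ['198.51.100.20', '203.0.113.7']
```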

What's really fun is when the advertisement is legitimate, but still unwanted. I actually reply to the company with a message like...

"Congratulations! Because you have spammed me, you now have permanently lost a potential customer as I will never purchase anything from you now. So, you might as well not bother sending me any more spam because I will never buy anything from you due to this incident."

I've had more than one situation where I did this after the second or third instance, and I coincidentally never got any more afterwards.

TAKING THE LAZY WAY OUT AND DELETING THE E-MAIL IS NOT THE ANSWER TO THE PROBLEM!!

Rob Gardiner

John,

My understanding is that responding to spam IN ANY WAY simply adds to the problem, because you have let the spammer know 2 things:

1. Your e-mail address is valid and active.
2. You read and respond to spam.

Even if you get removed from this one company's mailing list, they will simply sell your name to other spammers as a "good" target because of the 2 facts above. At one point I tried to deal with spam by "unsubscribing" as much as possible. The result was a huge increase in spam volume.

Going straight to the ISP sounds like it may be effective, but it is totally impractical for me considering the volume of spam I receive: over 200 messages in about five days. It is a big enough burden simply to delete them; tracking down ISPs would be a waste of time for me. May I ask what volume of spam you are dealing with? Maybe you could describe your method for tracking down an ISP and give an example.

I'd love to hear more about this method, or any other methods you folks have for dealing with spam.

Gabriel_Lam

I'm using Cloudmark's SpamNet. It integrates directly into Outlook and seems to work very well. I get at least 150 spam emails a day, and SpamNet filters all but maybe 10 or so of these.

Jeremy_Watson

I use SpamNet as well, and have been very pleased with the results. Obviously, it won't catch every bit of spam, but it helps.

John_Berger

Mark Zimmer

A word of advice: SpamAgent is terrible and doesn't work very well at all. It caught about 5% of the spam coming in. There's zero support from the manufacturer: the support e-mail address is dead, and the online help doesn't register your questions and requests for help. Stay far, far away.

RandyObert

As a hosting provider I can assure you that SPAM is taken pretty seriously by any upstream provider.

I suggest http://spamcop.net
I use this in conjunction with McAfee SpamKiller, which will forward the entire message to the designated address for you.

You merely forward the entire e-mail, headers and all, to an address issued to you by SpamCop. It will parse the headers and prepare UCE (unsolicited commercial e-mail) complaints to the actual providers as well as the upstream providers. This avoids confirming your e-mail address to the spammer.
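Forwarding "headers and all" matters because an ordinary forward usually drops or rewrites the original headers. One way to keep them intact is to attach the original as a message/rfc822 part. Here is a minimal Python sketch with the standard `email` package; the report address is a placeholder standing in for whatever address SpamCop issues you, and the sample spam is invented. It only builds the report; actually sending it (e.g. with `smtplib`) is left out.

```python
from email import message_from_string, policy
from email.message import EmailMessage

def build_report(raw_spam: str, reporter: str, report_address: str) -> EmailMessage:
    """Wrap the original spam, full headers included, as a message/rfc822 attachment."""
    original = message_from_string(raw_spam, policy=policy.default)
    report = EmailMessage()
    report["From"] = reporter
    report["To"] = report_address
    report["Subject"] = "Spam report"
    report.set_content("Original spam attached with full headers.")
    # Attaching a message object creates a message/rfc822 part; its headers are kept as-is.
    report.add_attachment(original)
    return report

SAMPLE = """\
Received: from unknown ([203.0.113.7]) by mail.example.net with SMTP
From: deals@spammer.example
To: me@myisp.example
Subject: Amazing offer

Buy now!
"""

# Placeholder address: use the one your reporting service gives you.
report = build_report(SAMPLE, "me@myisp.example", "reports@example.invalid")
print(report.get_payload()[1].get_content_type())  # message/rfc822
```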

You know the biggest contributing factor to spam?? People who actually buy from these merchants. Boycotting spammers is the best attack. You spam, you don't get my money, PERIOD...

Rob Gardiner

Thanks, John, for that rundown. It looks like a little more than I'm willing to do, but I hope others find your technique useful. Yes, I have thought about changing my e-mail address. I have had the same address since 1995, and I used to post to Usenet, sign up for mailing lists, etc. I'm a bit smarter now. On the other hand, long-lost friends have been able to contact me by remembering my address, and for this reason I am reluctant to change it. I also admit to a bit of loyalty to my very first ISP, which has never let me down.

------------

Gabriel & Jeremy,

I took a look at SpamNet, and it sounds like a very interesting idea, sort of like a distributed network. My fear is that this method could block legitimate messages. I'd like to point out that POPFile has two advantages over SpamNet. First, it is open source, so it is absolutely FREE. Second, since it is a mail proxy rather than an Outlook plug-in, it works with practically EVERY e-mail program (anything that collects mail over POP3).

-------------

Randy,

SpamCop sounds very intriguing. Perhaps I will set this up one day, in order to have a more meaningful impact. I like the idea of a complaint being filed against the spammer.

The reason I am so interested in simply filtering out the spam that reaches my inbox, rather than preventing it from being sent, is that it is a major inconvenience to check my e-mail for a reply I am waiting for and have to sort through 40 unrelated and unwanted messages to see whether my real mail is there.

Steve_Ch

I've been a Mailshell customer for a couple of years and have been happy with them. Most people do not want to pay for a filtering service, but I have my domain and POP mail with them, so the package deal is pretty reasonable.
The good thing is that with my own domain, I have unlimited mail addresses and mailboxes with the service. For spam filtering, they have a ton of settings one can configure, but being lazy, I didn't bother to look at them. I just set my default detection level to the lowest they offer (if it's not good enough, I can always raise it), and even at this base level I can't remember any spam getting through in the 2 years I've had the service. It occasionally over-filters, because of some of the "junk" I signed up for and because I also buy and sell a lot on eBay, but that is easily corrected by telling it to accept mail from the sender in question.
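Steve's setup boils down to two knobs: a sensitivity level and a list of trusted senders that bypass the filter. Here is a hypothetical Python sketch of that routing logic. The level numbers, thresholds and function name are invented and are not Mailshell's actual settings or API, and it assumes some filter has already produced a spam score between 0 and 1.

```python
from email.utils import parseaddr

# Invented mapping: a higher level flags mail at a lower spam score.
LEVEL_THRESHOLDS = {1: 0.99, 2: 0.95, 3: 0.90, 4: 0.80}

def route(from_header: str, spam_score: float, level: int, allowed: set[str]) -> str:
    """Return 'inbox' or 'spam' for a message with a precomputed spam score."""
    _name, address = parseaddr(from_header)
    if address.lower() in allowed:
        return "inbox"  # "accept mail from this sender" overrides the score
    return "spam" if spam_score >= LEVEL_THRESHOLDS[level] else "inbox"

allowed = {"seller@ebay.example"}
print(route('"eBay Member" <seller@ebay.example>', 0.97, 4, allowed))  # inbox
print(route("offers@spammer.example", 0.97, 1, allowed))               # inbox
print(route("offers@spammer.example", 0.97, 4, allowed))               # spam
```

The lowest level lets borderline mail through; raising it catches more spam at the cost of more over-filtering, which is exactly where the allow list earns its keep.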

BrianW


Rob Gardiner

Brian,

I am very happy with Windows 2000 Professional and do not anticipate upgrading in the immediate future.

Mark Paquette

Rob Gardiner

Mark,

You may want to give POPFile a try. It is absolutely FREE.

I understand that Mozilla's e-mail client (the one that comes with their browser) has a Bayesian spam filter built-in. This is also free software but I have not tried it.

John Watson

On dealing with spam: I don't get a lot (so far), but I'd say 40 "adult" spams (I've taken to calling it Sporn) have come to one of my addresses recently. When I right-click and look at Properties, I can see the originating e-mail address. There are very few duplicates, but most follow a similar pattern, incorporating my e-mail address into the source address. Which merely proves that e-mail addresses are abominably easy to create, and that Block Sender is of no use.

But it appears from the above posts that identifying or contacting the ISP, in order to complain about their customers who are sending the spam, is way beyond the average user's ability or computer savvy.

So I will change the e-mail address, though I'm intrigued by the comments in several posts that doing so doesn't end the problem.

When the spammer gets an "undeliverable" message back, does he do anything about it?

And if these measures fail to stem the rising tide of e-mail messages, will they spin as ever-increasing arrays of electrons, eternally, through the ether? Well, I don't want that question answered now (though I would like some day to know where a song goes when I'm not singing it).

So I still think the prospect of outlawing spam (apparently California is legislating on this now), or of making it cost the sender something (a tax or charge based on the originating volume), is worth exploring.

Steve_Ch

>>Which merely proves that e-mail addresses are abominably easy to create, and that Block Sender is of no use.

John_Berger

Jeff Savage

If you want to help combat spammers, please feel free to forward all spam you receive to the Federal Government at [email protected]. They are really interested in helping out. I do this all the time.

I also use this e-mail wording a lot, although I don't think the "un-remove" folks actually read these messages. I still like doing it:

Please remove this e-mail address from all of your SPAM databases. You are receiving this notification for removal in accordance with government regulations. Failure to comply with this request may result in legal action.

Like I said I just like saying it to someone.

Laters,
Jeff

MikeAlletto

