
[OT] Safe spam filtering methods (was: Is predictable spam filtering a vulnerability?)



On Fri, Jun 18, 2004 at 11:05:01PM +0100, Andrew Hunter wrote:
[...]
> Email "Dear Andiroo, I have found your pen, it was under my desk. Your PEN 
> IS now in the top drawer of your desk".
> 
> OK, I lost my special pen and my friend has found it, but look: "PEN IS" is 
> like "PENIS", so it's been taken by the spam filter.

Which is why I don't ever rely on single-factor characterization to
determine whether something is or is not spam. At work and at home I
use SpamAssassin. It applies thousands of heuristics (which can even
be tuned on a per-user basis depending on how you deploy it) to
score a message as to how likely it is to be spam. You can use it to
simply flag messages quietly in the header, alter the subject and
body, or just ditch the message completely (though this is always a
bad idea in my opinion). In my deployments it catches better than
99% of spam with nearly 0% false positives. The false-positive rate
is hard to characterize, since a false positive is pretty much always
a sender making multiple mistakes at once: getting their MTA
blacklisted, having their computer's clock set a year in the future
*and* sending HTML-only E-mail.
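
To make the difference between single-factor and scored filtering
concrete, here is a minimal sketch in Python. It is not SpamAssassin's
code or rule set: the rule names, weights and the 5.0 threshold are
assumptions made up for illustration. The point is only that the
"PEN IS" message trips one weak rule and stays well under the cutoff,
while a message that trips several rules at once does not.

```python
import re

# Hypothetical rules and weights, invented for this illustration only.
# Each rule is (name, weight, test), where test takes a message dict.
RULES = [
    # Naive substring check that ignores whitespace, so "PEN IS" matches.
    ("OBSCENE_WORD", 1.5,
     lambda m: "penis" in re.sub(r"\s+", "", m["body"].lower())),
    ("HTML_ONLY", 1.0, lambda m: m.get("html_only", False)),
    ("DATE_IN_FUTURE", 2.0, lambda m: m.get("date_in_future", False)),
    ("SENDER_BLACKLISTED", 3.0, lambda m: m.get("sender_blacklisted", False)),
]

THRESHOLD = 5.0  # assumed cutoff; tune per site or per user


def score(message):
    """Return (total score, names of the rules that fired)."""
    hits = [(name, weight) for name, weight, test in RULES if test(message)]
    return sum(weight for _, weight in hits), [name for name, _ in hits]


def is_spam(message):
    """A message is spam only if its combined score reaches the threshold."""
    total, _ = score(message)
    return total >= THRESHOLD


pen_mail = {"body": "Your PEN IS now in the top drawer of your desk"}
sloppy_mail = {
    "body": "Buy now",
    "html_only": True,
    "date_in_future": True,
    "sender_blacklisted": True,
}
```

With these made-up weights, `pen_mail` scores 1.5 and is delivered,
while `sloppy_mail` scores 6.0 and is flagged. A filter that acted on
the first rule alone would have eaten the pen message.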

And before anybody jumps on the "but that would cream my MTA" train,
last week I benchmarked a modestly-tuned Debian E-mail gateway at
work (running Exim v4) filtering messages through AMaViS,
SpamAssassin and Clam AntiVirus. It was able to handle more than
8000 random incoming messages in under an hour, 99%+ of which were
spam, on an old 2-way SMP Pentium III 600MHz server I pulled from
our scrap pile. A modern server or farm would be capable of much,
much more.
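
For anybody who wants that as a rate, the arithmetic is below. It
treats the figures as exactly 8000 messages in exactly one hour, so
the results are lower bounds, and the per-day number additionally
assumes the box could sustain that rate around the clock, which I did
not test.

```python
# Figures from the benchmark above, taken as lower bounds.
messages = 8000          # "more than 8000" messages
seconds = 60 * 60        # "in under an hour"

rate_per_second = messages / seconds   # sustained messages per second
per_day = messages * 24                # assumes the rate holds for 24 hours

print(f"at least {rate_per_second:.2f} messages/second")
print(f"at least {per_day} messages/day if sustained")
```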

> My solution for spam:
> I think there should be a huge database of spam emails, just like an 
> anti-virus scanner but for spam. I think it is that simple: have an 
> anti-virus but for spam. I am sure that if I get a spam email, someone else 
> will have exactly the same email, so if I can submit it to the database and 
> it's added quickly so everyone can get the updates, then there would be no 
> problem. But there is so much spam out there that we would forever have to 
> update databases that keep growing in size.
> 
> I think this would eliminate a lot of spam. I have run out of ideas for 
> preventing spam emails, so what other effective solutions are already out there?

This is far from a new idea. Both the first release of Pyzor and
version 2 of Vipul's Razor came out over two years ago. I use recent
versions of each as factors in my SpamAssassin scoring, along with multiple
RBL checks, Bayesian analysis with end-user feedback (a.k.a.
supervised training), automated sender whitelisting and much more.
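
The shared-database idea boils down to something like the sketch
below. To be clear, this is a toy and not how Pyzor or Razor are
actually implemented: the class name, the normalization step and the
use of SHA-1 are my assumptions for illustration, and the real tools
talk to networked servers rather than an in-memory dict. It stores
only a digest of each reported message, so the database grows by one
short hash per distinct spam rather than by whole messages.

```python
import hashlib
import re


class DigestDatabase:
    """Toy in-memory stand-in for a shared spam-digest service."""

    def __init__(self):
        self._reports = {}  # digest -> number of times reported

    @staticmethod
    def digest(body):
        # Assumed normalization: lowercase and collapse whitespace so that
        # trivial reformatting does not change the digest.
        normalized = re.sub(r"\s+", " ", body.strip().lower())
        return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

    def report(self, body):
        """Record that somebody reported this message body as spam."""
        key = self.digest(body)
        self._reports[key] = self._reports.get(key, 0) + 1

    def count(self, body):
        """Return how many times this body has been reported."""
        return self._reports.get(self.digest(body), 0)


db = DigestDatabase()
db.report("Cheap   pills, click HERE")
db.report("cheap pills, click here\n")
```

After those two reports, `db.count("Cheap pills, click here")` is 2
and an unrelated message counts 0. Note that a plain hash like this
only matches near-identical copies, so a spammer who varies each
message slightly slips past it. That is one more reason to treat the
lookup result as a single factor in the overall score rather than as
a verdict.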

URLs for the applications I've mentioned above (all of which are
free, open-source, community-supported projects):

   SpamAssassin: http://www.spamassassin.org/
   AMaViS: http://www.amavis.org/
   Pyzor: http://pyzor.sourceforge.net/
   Vipul's Razor: http://razor.sourceforge.net/
   Clam AntiVirus: http://www.clamav.net/
   Exim: http://www.exim.org/
   Debian GNU/Linux: http://www.debian.org/

Hope that helps.
-- 
{ IRL(Jeremy_Stanley); PGP(9E8DFF2E4F5995F8FEADDC5829ABF7441FB84657);
SMTP(fungi@xxxxxxxxxxx); IRC(fungi@xxxxxxxxxxxxxxx#ccl); ICQ(114362511);
AIM(dreadazathoth); YAHOO(crawlingchaoslabs); FINGER(fungi@xxxxxxxxxxx);
MUD(Nergel@xxxxxxxxx:2325); WWW(http://fungi.yuggoth.org/); }