Who is spamming me - a bit of statistics



Sent to fitug-debate (actually a nontechnical discussion list) and to
spamassassin-talk. Reply-To set to me personally.

Please adjust accordingly.

A corpus of spam, freshly collected:

$ ls -l ~/Mail/OLD
total 96988
-rw-------    1 kris     kiel      1676771 2003-09-24 23:59 spammed-probable.01.gz
-rw-------    1 kris     kiel      2510905 2003-09-23 23:48 spammed-probable.02.gz
-rw-------    1 kris     kiel      1863673 2003-09-22 23:57 spammed-probable.03.gz
-rw-------    1 kris     kiel      1014158 2003-09-21 23:54 spammed-probable.04.gz
-rw-------    1 kris     kiel       617841 2003-09-20 23:16 spammed-probable.05.gz
-rw-------    1 kris     kiel      2861005 2003-09-20 06:13 spammed-probable.06.gz
-rw-------    1 kris     kiel       108846 2003-09-17 21:07 spammed-probable.07.gz
-rw-------    1 kris     kiel        12130 2003-09-16 19:48 spammed-probable.08.gz
-rw-------    1 kris     kiel        14029 2003-09-15 21:09 spammed-probable.09.gz
-rw-------    1 kris     kiel        35414 2003-09-15 01:51 spammed-probable.10.gz
-rw-------    1 kris     kiel     10032896 2003-09-24 23:58 spammed-sure.01.gz
-rw-------    1 kris     kiel     18746508 2003-09-23 23:58 spammed-sure.02.gz
-rw-------    1 kris     kiel     17935355 2003-09-22 23:57 spammed-sure.03.gz
-rw-------    1 kris     kiel     13535730 2003-09-21 23:48 spammed-sure.04.gz
-rw-------    1 kris     kiel     11984834 2003-09-20 23:57 spammed-sure.05.gz
-rw-------    1 kris     kiel     13597743 2003-09-20 08:40 spammed-sure.06.gz
-rw-------    1 kris     kiel       474242 2003-09-17 23:56 spammed-sure.07.gz
-rw-------    1 kris     kiel       665272 2003-09-16 23:59 spammed-sure.08.gz
-rw-------    1 kris     kiel       719339 2003-09-15 23:48 spammed-sure.09.gz
-rw-------    1 kris     kiel       584819 2003-09-15 06:42 spammed-sure.10.gz

Who sent me spam? Find out with Perl:

$ cat ~/Mail/p.pl
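#! /usr/bin/perl --
# Print the IP address of each host that handed a message to
# $hostname, one per line, for every message in an mbox stream.
use strict;
use warnings;

my $hostname = "p15104972";
my $state    = "";
my $line     = "";

# Check one complete (unfolded) header line.
sub check_received {
        my ($hdr) = @_;
        return unless $state eq "newmail" and $hdr =~ /^Received:/;
        # Test the match itself: after a failed match, $1 still holds
        # the value from the previous successful one.
        if ($hdr =~ /\[(.*?)\].*by\s+$hostname/) {
                print "$1\n" if $1 ne "" and $1 ne "127.0.0.1";
        }
}

while (<>) {
        chomp;
        if (/^\s+/) {
                # Continuation of a folded header: append and read on.
                $line .= $_;
                next;
        }

        # The previous logical line is complete; check it exactly once.
        check_received($line);
        $line = $_;

        if ($line =~ /^From /) {
                $state = "newmail";
        }
        # SpamAssassin wraps the original spam in an attachment.
        if ($line =~ /Content-Description: original message before SpamAssassin/) {
                $state = "spammail";
        }
        # A blank line ends the headers of a message...
        if ($line =~ /^$/ and $state eq "newmail") {
                $state = "body";
        }
        # ...or, in a SpamAssassin report, starts the embedded original,
        # whose headers are the ones we want.
        if ($line =~ /^$/ and $state eq "spammail") {
                $state = "newmail";
        }
}
check_received($line);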
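Unfolding matters because sendmail typically writes the "by" clause of a Received: header on a continuation line, so neither physical line matches the pattern on its own. A minimal sketch of just that step; the function name relay_ip is hypothetical, and the header is made up using documentation addresses (192.0.2.0/24, example.com/.net/.org), not taken from the corpus:

```perl
use strict;
use warnings;

my $hostname = "p15104972";   # the receiving host, as in the script above

# Join the physical lines of one folded header and return the bracketed
# IP that precedes "by $hostname", or nothing if there is no match or
# the address is the loopback address.
sub relay_ip {
    my $logical = join "", @_;
    return unless $logical =~ /^Received:/;
    return unless $logical =~ /\[(.*?)\].*by\s+$hostname/;
    my $ip = $1;
    return if $ip eq "" or $ip eq "127.0.0.1";
    return $ip;
}

# Made-up header: the "by" clause is on the continuation line, so the
# first physical line alone does not match.
my @header = (
    "Received: from mail.example.com (mail.example.com [192.0.2.25])",
    "\tby p15104972.example.net with ESMTP id h8O1",
    "\tfor <user\@example.org>; Wed, 24 Sep 2003 23:58:01 +0200",
);
my $ip = relay_ip(@header);
print defined $ip ? "$ip\n" : "no match\n";
```

Run as is, this prints 192.0.2.25; passing only $header[0] returns nothing.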

Applied to my corpus above:

$ cd Mail/OLD
$ gzip -dc *gz | ~/Mail/p.pl > log
$ wc -l ~/Mail/OLD/log
   6614 /home/kris/Mail/OLD/log
$ sort ~/Mail/OLD/log | uniq -c | sort -rn > ~/Mail/OLD/log2
$ wc -l ~/Mail/OLD/log2
   1238 /home/kris/Mail/OLD/log2
$ head -10 ~/Mail/OLD/log2
    980 195.244.243.1
    532 193.98.110.1
    498 193.158.124.58
    196 193.110.157.89
     56 24.201.245.36
     40 209.225.8.34
     40 204.127.202.56
     34 216.148.227.85
     34 209.225.8.29
     32 204.127.202.55

The top entries are my secondary MX hosts, an old mail address
(kris@xxxxxxxxxxx) that I have not used for years, and the FreeS/WAN
mailing list, which I can easily live without.
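The sort | uniq -c | sort -rn pipeline that turns log into log2 is a plain frequency count. If you would rather stay inside Perl, a hash does the same job; this is a sketch, the sub name count_ips is hypothetical, and the input addresses are invented documentation addresses:

```perl
use strict;
use warnings;

# Count how often each IP occurs (one IP per element) and return
# "count ip" lines, highest count first, in the spirit of
# sort | uniq -c | sort -rn. Ties are broken by IP string so the
# order is deterministic.
sub count_ips {
    my %count;
    $count{$_}++ for @_;
    return map { sprintf("%7d %s", $count{$_}, $_) }
           sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;
}

# Hypothetical input standing in for the contents of "log".
my @log = qw(192.0.2.1 198.51.100.7 192.0.2.1 203.0.113.9 192.0.2.1 198.51.100.7);
print "$_\n" for count_ips(@log);
```

The first line printed is "      3 192.0.2.1". Holding every distinct IP in memory is no problem at the size of this corpus (1238 distinct addresses).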

$ awk '$1 > 8 { print $2 }' ~/Mail/OLD/log2 | xargs -i dig -x {} | grep PTR > ~/Mail/OLD/log3

This finds 64 machines that have sent me more than 8 spams, 58
of which have a reverse DNS (PTR) record.
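Note that grep PTR keeps two kinds of dig lines: the commented question line (";...IN PTR") and the answer line, whose fifth whitespace-separated field is the host name. A sketch of picking out that field and reversing the labels so a plain sort groups hosts by domain; the sub names are hypothetical and the dig output is made up for a documentation address:

```perl
use strict;
use warnings;

# Return the PTR target (fifth field) of a dig answer line, or an
# empty list for anything else, e.g. the ";...IN PTR" question line.
sub ptr_target {
    my @f = split ' ', $_[0];
    return unless @f >= 5 and $f[3] eq "PTR";
    return $f[4];
}

# "smtp01.syd.iprimus.net.au." -> "au.net.iprimus.syd.smtp01".
# split drops the empty trailing field left by the final root dot.
sub reverse_labels {
    my ($name) = @_;
    return join ".", reverse split /\./, $name;
}

# Made-up dig output (not from the corpus).
my @dig = (
    ";25.2.0.192.in-addr.arpa.\t\tIN\tPTR",
    "25.2.0.192.in-addr.arpa. 86400\tIN\tPTR\tmail.example.net.",
);
print reverse_labels($_), "\n" for map { ptr_target($_) } @dig;
```

Only the answer line survives, and the program prints net.example.mail.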

$ perl -lane 'print join(".", reverse split(/\./, $F[4])) if $F[3] eq "PTR"' ~/Mail/OLD/log3 | sort > ~/Mail/OLD/log4
$ cat ~/Mail/OLD/log4

au.net.iprimus.syd.smtp01
be.skynet.ferengi
be.skynet.gallantin
be.skynet.kira
be.skynet.sarek
be.skynet.sojef
ca.videotron.relais
com.btconnect.dswu26
com.btinternet.protactinium
com.cbeyond.atl.smtp
com.latinmail.smtp
com.ntlworld.mta02-svc
com.ntlworld.mta06-svc
com.rr.nyroc.ms-smtp-02
de.netuse.ns1
de.netuse.nuki
de.netzservice.hh.proxy
de.sczn.secondary
de.toppoint.archer
it.tin.vsmtp1
it.tuttopmi.fep01
lt.takas.mail-src
net.bellsouth.mail.imf16aec
net.bellsouth.mail.imf18aec
net.bellsouth.mail.imf19aec
net.bellsouth.mail.imf20aec
net.bellsouth.mail.imf22aec
net.bellsouth.mail.imf24aec
net.bellsouth.mail.imf25aec
net.charter.cluster1.remt19
net.charter.cluster1.remt20
net.charter.cluster1.remt21
net.charter.cluster1.remt22
net.charter.cluster1.remt23
net.charter.cluster1.remt24
net.charter.cluster1.remt25
net.charter.cluster1.remt26
net.charter.cluster1.remt27
net.charter.cluster1.remt28
net.charter.cluster1.remt29
net.comcast.rwcrmhc11
net.comcast.rwcrmhc12
net.comcast.rwcrmhc13
net.comcast.sccrmhc11
net.comcast.sccrmhc12
net.comcast.sccrmhc13
net.entelchile.ismtp5
net.entelchile.mail.real1.test_web_temp
net.libertysurf.mail
net.qwest.inet.mpls-qmqp-02
net.surewest.smtp2
net.telus.defout
net.telus.outbound02
net.telus.outbound04
org.freeswan.mj2
pt.telepac.mail.fep01-svc
pt.telepac.mail.fep02-svc
ro.rdsnet.mail3

The .de addresses are just my own secondaries and the
Toppoint.de address. The rest becomes a surprisingly short list
when you look only at the domains.
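Collapsing the reversed names in log4 to their leading labels makes that short list explicit. A sketch, assuming two labels approximate the registered domain; that assumption is wrong for names under a ccTLD second level such as au.net.iprimus. The sub name domain_counts is hypothetical, and the host names are a few entries copied from the list above:

```perl
use strict;
use warnings;

# Collapse reversed host names ("net.bellsouth.mail.imf16aec") to their
# first $depth labels ("net.bellsouth") and count hosts per domain.
sub domain_counts {
    my ($depth, @names) = @_;
    my %count;
    for my $name (@names) {
        my @labels = split /\./, $name;
        my $last = $depth < @labels ? $depth - 1 : $#labels;
        $count{ join ".", @labels[0 .. $last] }++;
    }
    return %count;
}

my @hosts = qw(
    net.bellsouth.mail.imf16aec net.bellsouth.mail.imf18aec
    net.comcast.rwcrmhc11 net.comcast.sccrmhc12 be.skynet.kira
);
my %by_domain = domain_counts(2, @hosts);
printf "%3d %s\n", $by_domain{$_}, $_
    for sort { $by_domain{$b} <=> $by_domain{$a} or $a cmp $b } keys %by_domain;
```

This prints net.bellsouth and net.comcast with 2 hosts each and be.skynet with 1.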

Perhaps SpamAssassin should really maintain a list of IP addresses
that have sent detected spam within the last n hours, and I
should build a sendmail access table from it every night.
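The nightly step could start from log2 directly. A sketch, assuming sendmail is built with FEATURE(`access_db'); it keeps only the count threshold used above and does not model the n-hour window. The sub name access_entries and the exemption hash are hypothetical; the exemption exists because the top senders in log2 include hosts you must keep accepting mail from, such as your own secondary MX:

```perl
use strict;
use warnings;

# Turn "count ip" lines (the log2 format) into sendmail access table
# entries that reject every IP seen more than $threshold times,
# skipping any IP listed in %$keep.
sub access_entries {
    my ($threshold, $keep, @lines) = @_;
    my @out;
    for (@lines) {
        my ($count, $ip) = split ' ';
        next unless defined $ip and $count > $threshold;
        next if $keep->{$ip};
        push @out, "$ip\tREJECT";
    }
    return @out;
}

# Hypothetical log2 excerpt and exemption list (documentation addresses).
my @log2 = ("    980 192.0.2.1", "     40 198.51.100.7", "      3 203.0.113.9");
my %keep = ("192.0.2.1" => 1);   # e.g. your own secondary MX
print "$_\n" for access_entries(8, \%keep, @log2);
```

The output has one line, 198.51.100.7 followed by a tab and REJECT. On a typical installation the text file is then merged with any existing entries and compiled with makemap hash /etc/mail/access < /etc/mail/access.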

If you repeat that analysis on your corpus, can you reproduce my
results?

Thought for improvement:

What happens if you take only the domain names of the hosts
above, look up their MX records and list their mail servers?
Would that give a more complete set of hosts to block?

Kristian
