
Re: Stripping Yahoo bottom ads?



On Thu, Oct  9, 2003, Seth Williamson wrote:
> Does anybody know if there's a way to strip out the bottom-posted ads
> that frequently come with list messages posted to Yahoo lists?

Attached is a Perl script that I use (the recipes call it as adfilter.pl,
so it has to be executable and on the PATH that procmail sees).  This is
what I have in my .procmailrc:


:0 fw
* ^Mailing-List:.*@(egroups|yahoogroups)
| adfilter.pl egroups

:0 fw
* ^List-Id:.*freshmeat-news
| adfilter.pl freshmeat

:0 fw
* ^Received:.*by [a-zA-Z0-9\.\-]*(hotmail|msn)\.com with HTTP
| adfilter.pl hotmail

:0 fw
* ^Received:.*by [a-zA-Z0-9\.\-]*mail[a-zA-Z0-9\.\-]*yahoo\.com
| adfilter.pl yahoo

:0 fw
* ^Received:.*by [a-zA-Z0-9\.\-]*yahoomail\.com
| adfilter.pl yahoo

#!/usr/local/bin/perl

# adfilter.pl by Mikko Hänninen <Mikko.Hanninen@xxxxxx>  20.09.2000
#
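# Used to filter adverts from emails (called from procmail, maildrop, etc.)
# Reads the email from stdin (or from a filename given after the ad type)
# and writes the filtered message to stdout.
# Current ad types filtered: all, egroups, freshmeat, hotmail, yahoo
# (yahoo1, yahoo2 and yahoo3 each select a single Yahoo pattern)
#
# usage: adfilter.pl <ad type> [filename]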
#
# eg. in procmail:
#
# :0 fw
# * ^Mailing-List:.*@(egroups|yahoogroups)
# | adfilter.pl egroups
#
# :0 fw
# * ^List-Id:.*freshmeat-news
# | adfilter.pl freshmeat
#
# :0 fw
# * ^Received:.*by [a-zA-Z0-9\.\-]*(hotmail|msn)\.com with HTTP
# | adfilter.pl hotmail
#
# # this matches *.mail.europe.yahoo.com as well as *.mail.yahoo.com
# :0 fw
# * ^Received:.*by [a-zA-Z0-9\.\-]*mail[a-zA-Z0-9\.\-]*yahoo\.com
# | adfilter.pl yahoo
#
# :0 fw
# * ^Received:.*by [a-zA-Z0-9\.\-]*yahoomail\.com
# | adfilter.pl yahoo
#
#
# or, if you're brave
#
# :0 fw
# | adfilter.pl all
#
# (I personally do not use this, so take that as a word of caution)
#
#
# Warning!  This script filters out information from emails -- as such
# it might also remove parts which you might have wanted to see.  The
# usual disclaimer of "Use at your own risk" applies.
#
#
# Source homepage: http://www.wizzu.com/mutt/
# Last change: 09.03.2001


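# The first argument selects which ad patterns to apply; exit with an
# error status if it is missing.
$adtype = lc(shift);
exit(1) unless $adtype;

# Slurp the whole message into $_ (from any remaining filename arguments,
# otherwise from stdin) so the patterns below can match across lines.
$_ = join("", <>);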

if ($adtype eq "egroups" || $adtype eq "all") {
  s/\n(> )*-{10,40} (Yahoo\! Groups|eGroups) Sponsor -{10,40}~-->\n(.{0,500}\n){1,5}(> )*.{60,70}->\n+/\n/m;
  #s/\n-{10,40} (Yahoo\! Groups|eGroups) Sponsor -{10,40}~-~>\n(.{0,500}\n){1,5}-{60,70}_->\n+/\n/m;
  #s/\n+-{68}<e\|-\n(.{0,140}\n){1,5}-{68}\|e>-\n\s*$/\n\n/m;
}
#if ($adtype eq "egroups2" || $adtype eq "egroups"  || $adtype eq "all") {
#  s/\n+-{72}\n.{0,50}begin eGroups banner.{0,50}\n(.{0,140}\n){1,10}.{0,50}end eGroups banner.{0,50}\n-{72}\n+/\n\n/m;
#}
if ($adtype eq "freshmeat" || $adtype eq "all") {
  s/\n *\n  \[ advertising \]\n[^[]*(\n  \[ [\w ]+ \]\n)/\n$1/m;
}
if ($adtype eq "hotmail" || $adtype eq "all") {
  s/\n_{72,100}\n.{0,80}(\.hotmail\.com|\.msn\.com).{0,80}\n+$/\n/m;
}
if ($adtype eq "yahoo1" || $adtype eq "yahoo" || $adtype eq "all") {
  s/\n_{50,72}\nDo You Yahoo!\?\n(.{0,100}\n){1,4}\n*$/\n/m;
}
if ($adtype eq "yahoo2" || $adtype eq "yahoo" || $adtype eq "all") {
  s/\n-{10,72}\nDo You Yahoo!\?\n(.{0,100}\n){1,4}--/\n--/m;
}
if ($adtype eq "yahoo3" || $adtype eq "yahoo" || $adtype eq "all") {
  s/\n_{10,72}\n.{0,40}Yahoo! Mail.{0,30}\n(.{0,100}\n){1,4}$/\n/m;
}

print $_;
exit(0);
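
If you want to try one of the patterns before trusting it with real mail, something like the following may help.  It is only a sketch: it copies the "yahoo1" substitution from the script above into a helper I have named strip_yahoo_footer (a made-up name, not part of adfilter.pl) and runs it on a made-up sample message, so the footer text here is an assumption about what such an ad looks like rather than a captured one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: applies the same substitution as the "yahoo1"
# branch of adfilter.pl to a message held in a string.
sub strip_yahoo_footer {
    my ($text) = @_;
    $text =~ s/\n_{50,72}\nDo You Yahoo!\?\n(.{0,100}\n){1,4}\n*$/\n/m;
    return $text;
}

# Made-up sample message; the footer wording is illustrative only.
my $msg = join("\n",
    "Subject: test",
    "",
    "Hello list,",
    "this is the body.",
    "",
    "_" x 50,
    "Do You Yahoo!?",
    "Some advertising line would go here",
    "",
);

# Prints the headers and body; the underscore rule and the two
# footer lines after it should be gone.
print strip_yahoo_footer($msg);
```

The same approach should work for the other patterns: paste the substitution into the helper, feed it a saved copy of a message that carries the ad, and compare the output with the original before enabling the recipe.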