
Re: Filtering html



On Sun, Jun 12, 2005 at 12:39:18PM +0100, James Mason wrote:
> How can I use mutt, in conjunction with fetchmail and procmail, to pipe
> emails containing html through something like html2text as they arrive,

Do you already use procmail? It is not clear from your message. If you
do, please skip the first section of my answer.

First of all, hand delivery over to procmail by adding the following line
    mda "procmail -d baruchel"
to your .fetchmailrc for each account. Of course, replace "baruchel"
with your login name (not the login name for the POP account, but your login
name on the local machine). Then make sure you have no ~/.procmailrc yet
and run some initial tests: nothing should have changed (your mail
should still arrive in your local mailbox as usual). OK?
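
For illustration, a complete .fetchmailrc entry could look roughly like
the sketch below. Only the mda line comes from what I wrote above; the
server name, remote login and password are made-up placeholders, and the
"keep" option is my own suggestion so that nothing is deleted from the
server while you are still testing.

```
# Hypothetical account: replace host, login and password with your own.
poll pop.example.org protocol pop3
    username "remote_login" password "remote_password"
    keep                          # leave messages on the server for now
    mda "procmail -d baruchel"    # local login name, not the POP one
```

Once everything works, you can remove "keep" if you want fetchmail to
delete the messages after fetching them.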

Then create or edit ~/.procmailrc. If you are starting from scratch,
put the following lines at the beginning:

VERBOSE=off
MAILDIR=$HOME/Mail
PMDIR=$HOME/.procmail
DEFAULT=/var/mail/baruchel
LOGFILE=$PMDIR/log

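First line: set it to 'on' for the initial tests, so that the log shows
what procmail is doing.
Second line: set it to the directory where you keep your mail folders.
Third line: leave it as it is, but create the directory ~/.procmail.
Fourth line: set it to your own system mailbox (replace "baruchel").
Last line: leave it as it is.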

Then add a recipe for filtering HTML.
Start it with the line:
:0 fbw
(f: the program is a filter, b: it receives the body only, w: wait for
it to finish), then add your regexp. I do not know the best test for
detecting HTML, but you may want to study this:
  http://www.mhonarc.org/~ehood/MIME/2045/rfc2045.html#5
It seems that the header relevant to your question is Content-Type,
probably something like "Content-Type: text/html", but you should
ask in a newsgroup to be sure.
The recipe should thus look like

:0 fbw
* ^Content-Type.*text/html
| my_program

where my_program is a filter for the body (probably html2text,
lynx -dump -stdin -force_html, or anything you want).

If you find several patterns that should match, combine them into one
condition:

:0 fbw
* ^Content-Type.*(text/html|another/regexp|still_another/regexp)
| html2text

where | means OR and ( ... ) makes a group.
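
If you want to try a pattern before putting it in ~/.procmailrc, a small
shell sketch like the following may help. It only approximates what
procmail does (by default procmail matches case-insensitively against
the header, which is what the sed and grep options imitate), and the
sample message is made up for the test.

```shell
# Write a tiny, made-up sample message to a temporary file.
msg=$(mktemp)
cat > "$msg" <<'EOF'
From: someone@example.org
Subject: test
Content-Type: text/html; charset=us-ascii

<html><body><p>Hello</p></body></html>
EOF

# Keep only the header (everything up to the first empty line), then
# apply the same pattern as in the recipe, ignoring case.
if sed '/^$/q' "$msg" | grep -Eiq '^Content-Type.*text/html'; then
    verdict="html"
else
    verdict="other"
fi

echo "$verdict"
rm -f "$msg"
```

Replace the sample message with a real one saved from your mailbox to
see whether your pattern would fire on it.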

Of course, be careful, because if you do something wrong, you can
lose mail (I suggest you work first on a junk mailbox with nothing
important in it).
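
One possible safety net, as a sketch only: a recipe placed before the
HTML filter that keeps an untouched copy of every incoming message. The
folder name "backup" is my own choice; it ends up under $MAILDIR.

```
# Must come BEFORE the filtering recipe.
# c: deliver a copy and carry on; trailing ":" uses a local lockfile.
:0 c:
backup
```

Remember that this folder grows without limit, so remove the recipe (or
empty the folder from time to time) once you trust your filter.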

Hope it will help,

-- 
Thomas Baruchel