Re: Filtering html
On Mon, 13 Jun, 2005, archaiesteron@xxxxxxxxxxx wrote:
> On Sun, Jun 12, 2005 at 12:39:18PM +0100, James Mason wrote:
> > How can I use mutt, in conjunction with fetchmail and procmail, to pipe
> > emails containing html through something like html2text as they arrive,
>
> Do you already use procmail? It is not clear from your message. If you
> do, please skip the first section of my answer.
>
> First of all, hook procmail into fetchmail by adding the following line
> mda "procmail -d baruchel"
> to each account's entry in your .fetchmailrc. Of course, replace "baruchel"
> with your login name (not your login name for the POP account, but your
> login name on the local machine). Then check that you have no ~/.procmailrc
> yet and run some initial tests: nothing should have changed (your mail
> should still arrive normally in your local mailbox). OK?
>
> Then create or edit ~/.procmailrc as follows. If you don't have a
> ~/.procmailrc yet, start it with these lines:
>
> VERBOSE=off
> MAILDIR=$HOME/Mail
> PMDIR=$HOME/.procmail
> DEFAULT=/var/mail/baruchel
> LOGFILE=$PMDIR/log
>
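> First line: set it to 'on' during the initial tests.
> Second line: adjust it to the directory that holds your mail folders.
> Third line: leave it as it is, but create the directory ~/.procmail.
> Fourth line: replace "baruchel" with your login name (and check the path
> of your system mailbox).
> Last line: leave it as it is.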
>
> Then add a recipe for filtering HTML. Start it with the line:
> :0 fbw
> then add your condition (a regular expression). I am not sure what the
> best test for detecting HTML is, but you may want to study this:
> http://www.mhonarc.org/~ehood/MIME/2045/rfc2045.html#5
> The relevant header seems to be Content-Type:, probably something like
> "Content-Type: text/html", but you should ask in a newsgroup to be sure.
> Thus it should look like
>
> :0 fbw
> * ^Content-Type:.*text/html
> | my_program
>
> where my_program is a filter for the body (probably lynx -stdin -dump, or
> anything you like).
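For reference, here is a sketch of the complete recipe with each flag explained. It assumes the message is a single text/html part whose body is not base64- or quoted-printable-encoded, and it uses html2text (which reads standard input) as the filter. After filtering, the header still says text/html, so a second recipe relabels it with formail so that mutt shows the result as plain text. The bare "text/plain" value drops any charset parameter, which is an assumption you should check against what your filter actually outputs.

```
# f: treat the pipe as a filter (its output replaces the message text)
# b: feed only the body to the filter, so only the body is replaced
# w: wait for the filter; if it fails, the original body is kept
:0 fbw
* ^Content-Type:.*text/html
| html2text

# A: run only if the previous recipe's condition matched
# h: feed (and replace) only the header
# formail -I replaces the existing Content-Type: field
:0 Afhw
| formail -I "Content-Type: text/plain"
```

Since neither recipe delivers the message, processing continues and the mail ends up in $DEFAULT as usual.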
>
> If you come up with several patterns, you can combine them in one condition:
> :0 fbw
> * ^Content-Type:.*(text/html|another/regexp|still_another/regexp)
> | html2text
>
> where | means OR and ( ... ) makes a group.
>
> Of course, be careful: if you get something wrong, you can lose the
> contents of your mailbox (I suggest you first experiment on a junk
> mailbox with nothing important in it).
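I think it is better to keep a copy of the original first
(original_mail_goes_here stands for the mailbox you want the copies in):
:0
* ^Content-Type:.*text/html
{
# c: save an untouched copy; the trailing ':' locks the mailbox
:0 c:
original_mail_goes_here
# fbw: replace only the body with html2text's output
:0 fbw
| html2text
}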
or like this:
:0 c:
* ^Content-Type:.*text/html
original_mail_goes_here
# A: only if the condition of the recipe above matched
:0 Afbw
| html2text
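While testing, a blunter safety net is a catch-all copy recipe at the very top of ~/.procmailrc, before any filtering, so that every incoming message is saved untouched. This is a minimal sketch that assumes a mailbox file named "backup" under $MAILDIR (the name is arbitrary). Remove it, or prune the file, once the filters behave.

```
# No condition lines, so this matches every message.
# c: deliver a copy and keep processing the original
# trailing ':' : use a lock file while appending to the mailbox
:0 c:
backup
```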
>
> Hope it will help,
>
> --
> Thomas Baruchel
--
(dogmaT
(icq 303140614)
(jabber dogmat_at_njs_dot_netlab_dot_cz)
(mail dogmat_at_dogmat_dot_us)
(web http://dogmat.us))