
Re: HTML email, was Re: reading color quoted replies



On Thu, Feb 08, 2007 at 06:34:19PM +0100, Rado S wrote:
> > Is there still considerable danger in dumping html via w3m or
> > some other html to text converter?

Well, theoretically, any time you operate on data provided by someone
who may not be trustworthy, you face a risk.  The magnitude of the
risk depends on the complexity of the program you're using to
process it.

I think most of the threat here is from JavaScript and other active
content, which has no analog in plain text and would be filtered out.
The only remaining problem would be a "data-directed attack" against
the HTML parser.  This would typically involve a buffer overflow of
some kind in the parser.  One thing you can try to do is sandbox it,
via chroot or jail or whatever you fancy.  The program isn't going to
need to access anything else, has simple I/O (HTML in, text out), and
probably doesn't invoke any external programs, so this shouldn't be
hard at all.
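
To make the "HTML in, text out" idea concrete, here is a rough Python
sketch.  Note that it is not a chroot or a jail (those need root); it
only shows the weaker step of running the converter as a child process
with an empty environment, a timeout and a few resource limits.  The
names convert_untrusted, limit_child and _lower are made up for
illustration, the limit values are arbitrary, and it assumes a POSIX
system.

```python
import resource
import subprocess


def _lower(which, value):
    # Lower both the soft and hard limit, never exceeding the current hard limit.
    _, hard = resource.getrlimit(which)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(which, (value, value))


def limit_child():
    # Runs in the child process just before the converter is exec'd.
    _lower(resource.RLIMIT_CPU, 5)    # at most 5 seconds of CPU time
    _lower(resource.RLIMIT_FSIZE, 0)  # may not write to regular files
    _lower(resource.RLIMIT_CORE, 0)   # no core dumps


def convert_untrusted(html_bytes, argv):
    """Feed html_bytes to the converter on stdin and return its stdout."""
    proc = subprocess.run(
        argv,                       # absolute path, since PATH is not passed
        input=html_bytes,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        env={},                     # pass no environment variables
        cwd="/",                    # arbitrary; an empty scratch dir is better
        preexec_fn=limit_child,
        timeout=10,                 # wall-clock limit in seconds
        check=True,                 # raise if the converter exits non-zero
    )
    return proc.stdout
```

Here argv would be whatever converter command line you already use,
given with an absolute path.  A converter that crashes or hangs then
shows up as an exception in the caller, and a real chroot or jail can
still be wrapped around the whole thing.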

In practical terms, shoot for a program written in a high-level
language like Python, Perl, Ruby or OCaml, if you can find one.  Such
programs are far less prone to memory-corruption bugs like buffer
overflows than C programs, and speed isn't really an issue.
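
As an illustration of how little code such a converter needs, here is
a minimal sketch using only html.parser from the Python standard
library.  The names TextDump and html_to_text and the set of tags
treated as line breaks are my own choices, not an existing tool, and
the layout it produces is much cruder than what w3m gives you.

```python
from html.parser import HTMLParser


class TextDump(HTMLParser):
    SKIP = {"script", "style"}  # content that has no plain-text analog
    BREAK = {"p", "br", "div", "li", "tr", "h1", "h2", "h3", "blockquote"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; and friends
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BREAK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>.
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    parser = TextDump()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

Calling html_to_text('<p>Hello <b>world</b></p><script>x()</script>')
returns 'Hello world'; the script body never reaches the output.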

You would probably be very safe even without any of these procedures,
unless someone with good exploitation skills took a personal interest
in you, knew you were doing this conversion, and could guess which
converter you use.  In any case, if there were a bug in HTML parsers,
it'd likely show up on some of the phishing websites before it showed
up in email.  There just aren't enough people converting HTML mail
this way to justify an attacker's time.
-- 
Good code works.  Great code can't fail. -><-
<URL:http://www.subspacefield.org/~travis/>
For a good time on my UBE blacklist, email john@xxxxxxxxxxxxxxxxxx
