On Thu, Feb 08, 2007 at 06:34:19PM +0100, Rado S wrote:
> > Is there still considerable danger in dumping html via w3m or
> > some other html to text converter?

Well, theoretically, any time you operate on data provided by someone who may not be trustworthy, you face a risk. The magnitude of the risk is dependent on the complexity of the program you're using to process it.

I think most of the threat here is from javascript and stuff like that which has no analog in plain text and would be filtered out. The only problem then would be a "data-directed attack" against the HTML parser. This would typically involve a buffer overflow of some kind in the parser.

One thing you can try to do is sandbox it, via chroot or jail or whatever you fancy. The program isn't going to need to access anything else, and has simple I/O (HTML in, text out), and probably doesn't invoke any external programs so this shouldn't be hard at all.

In practical terms, shoot for a program written in a HLL like python, perl, ruby or ocaml, if you can find one. They don't suffer from as many problems as C programs, and speed isn't really an issue.

You would probably be very safe even without any of these procedures, unless someone who knew you were doing this conversion, could guess which one, and with good exploitation skills took a personal interest in you.

In any case, if there were a bug in HTML parsers, it'd likely be discovered on some of the phishing websites before email. There just aren't enough people doing this to justify the time.

-- 
Good code works. Great code can't fail. -><-
<URL:http://www.subspacefield.org/~travis/>
For a good time on my UBE blacklist, email john@xxxxxxxxxxxxxxxxxx
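
To illustrate the "HTML in, text out" converter in a high-level language described above, here is a minimal sketch using only Python's standard-library html.parser. The names TextExtractor and html_to_text are made up for this example, and the short list of block-level tags is an assumption, not a complete one. It drops the contents of script and style elements, which is the filtering of javascript mentioned above; it does not do any sandboxing, so a chroot or jail would still have to wrap the process that runs it.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text and drop the contents of <script> and <style>."""

    SKIP = {"script", "style"}
    # Assumed (incomplete) set of tags that should start a new output line.
    BLOCK = {"p", "br", "div", "li", "tr", "h1", "h2", "h3", "pre"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace and discard empty lines.
        lines = (" ".join(line.split()) for line in "".join(self._chunks).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    """Hypothetical helper: return the plain text of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    import sys
    # Simple I/O as described: HTML on stdin, text on stdout.
    sys.stdout.write(html_to_text(sys.stdin.read()) + "\n")
```

Saved under a name of your choosing, it reads HTML on stdin and writes text to stdout, so it could stand in wherever w3m -dump is used as a filter. Nothing is fetched, executed, or written to disk, which is what keeps it easy to confine.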