
Re: [Full-disclosure] URI handling woes in Acrobat Reader, Netscape, Miranda, Skype



Geo. wrote:
> I don't agree. Whatever program takes input from an untrusted source, it's
> that program's duty to sanitize the input before passing it on to internal
> components. It's like a firewall, you filter before it gets inside the 
> system.

NO! Wrong! Stop the "input sanitization" fallacy! Input is perfectly
fine. Input goes into a parser, an abstract form comes out. You operate
internally on the abstract form, which is (hopefully!) designed to avoid
all the ambiguities of the text form you get from input. You take UTF-8
in? You'd better operate on its UTF-16 or UTF-32 equivalent then. You
take XML? Deserialize into the corresponding object from your data
model. And so on. The interesting issue is handing out your abstract
form to an outside component. Think about it. SQL injections? Outputting
to a database engine. XSS? Outputting to a DOM engine. Path traversal?
Outputting to an I/O model's path parser. This is not something you can
just shoehorn in your application at any later time. You need to know
beforehand if you have any problematic external components, which e.g.
won't take Unicode input, and devise strategies that don't cause loss of
fidelity: store the password's SHA-1 in hex maybe? Or pass the sort key
(see <http://blogs.msdn.com/michkap/archive/2007/09/24/5085893.aspx>) in
the invariant locale instead of the actual string, if the component only
needs to be able to compare strings for equality/collation order. No
excuses. Information is sacred and irreplaceable, and input filtering an
unacceptable blasphemy. If you filter input, you haven't thought hard enough.
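
To make the point concrete, here is a minimal sketch in Python (standard
library only). It is an illustration, not a definitive implementation: the
table name, the function names and the sample string are all made up for the
example. The same untouched string is handed to two different outside
components, and each hand-off uses that component's own quoting mechanism
instead of filtering anything on the way in.

```python
import hashlib
import html
import sqlite3


def store_comment(conn, author, body):
    # Output to a database engine: placeholders hand the values to the driver
    # separately from the SQL text, so nothing in `body` is parsed as SQL.
    conn.execute("INSERT INTO comments (author, body) VALUES (?, ?)", (author, body))


def render_comment(author, body):
    # Output to an HTML/DOM engine: quote the markup characters at this
    # boundary, and only here.
    return "<p><b>{}</b>: {}</p>".format(html.escape(author), html.escape(body))


def ascii_token(secret):
    # For a component that won't take Unicode: pass an ASCII-only stand-in
    # (the SHA-1 hex digest suggested above) instead of mangling the string.
    return hashlib.sha1(secret.encode("utf-8")).hexdigest()


# Hypothetical schema and hostile input, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (author TEXT, body TEXT)")

hostile = "x'); DROP TABLE comments;-- <script>alert(1)</script>"
store_comment(conn, "geo", hostile)

stored = conn.execute("SELECT body FROM comments").fetchone()[0]
page = render_comment("geo", stored)
token = ascii_token("pässwörd")
```

The stored value is byte-for-byte what came in; nothing was stripped, and
each external component still receives something it cannot misparse.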

> Example, an ftp server has to sanitize filenames to prevent usage of
> streams on NTFS, you don't blame the filesystem that the input gets passed 
> to, it's the job of the ftp server to do the sanitizing of untrusted input.

See what I mean? The server needs to deserialize the input (converting
it from whatever implied charset into an internal representation), and
then serialize it again into the native form of the target component
it's outputting to (UTF-16 Unicode in NT path syntax, in this case). If
the format is a delimited or otherwise marked-up string (as paths are),
special characters must be quoted. If a character cannot be quoted (as
the NTFS colon can't), you raise an error. The error bubbles up to the
client. That's the way you do it. Not any other way. Otherwise you are
getting it right only by chance, and you are messing up the hierarchy of
components and their responsibilities. You are forcing the layer that is
the furthest from the external component to deal with its subtleties.
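
A rough sketch of that flow in Python, under stated assumptions: the wire
charset is taken to be UTF-8, the names `parse_ftp_path`, `to_nt_path` and
`UnrepresentableName` are hypothetical, and the reserved-character set covers
only the characters NT path syntax gives meaning to (it does not try to
handle reserved device names or trailing dots and spaces).

```python
class UnrepresentableName(ValueError):
    """A name that cannot be expressed as a single NT path component."""


# Characters with special meaning in NT path syntax; none of them can be quoted.
NT_RESERVED = set('<>:"/\\|?*')


def parse_ftp_path(raw, charset="utf-8"):
    # Deserialize: bytes from the wire become a list of name components.
    # Malformed bytes raise UnicodeDecodeError instead of being "cleaned up".
    text = raw.decode(charset)
    return [part for part in text.split("/") if part]


def to_nt_component(name):
    # Serialize one component for the NT path parser. Anything that cannot
    # be represented is an error, never a silent rewrite.
    if name in (".", ".."):
        raise UnrepresentableName("relative component: %r" % name)
    for ch in name:
        if ch in NT_RESERVED or ord(ch) < 32:
            raise UnrepresentableName("cannot quote %r in %r" % (ch, name))
    return name


def to_nt_path(root, components):
    # `root` comes from server configuration, so it is joined as-is.
    parts = [root.rstrip("\\")] + [to_nt_component(c) for c in components]
    return "\\".join(parts)


ok = to_nt_path("C:\\ftproot", parse_ftp_path(b"pub/report.txt"))

try:
    to_nt_path("C:\\ftproot", parse_ftp_path(b"pub/report.txt:hidden"))
    stream_error = None
except UnrepresentableName as exc:
    # In a real server this would become an error reply to the client.
    stream_error = str(exc)
```

The knowledge about colons lives in the one function that talks to the NT
path parser, and the error travels back up to whoever sent the name.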