Re: Thoughts and a possible solution on homograph attacks
Duncan Simpson wrote:
> Homograph attacks might be a closed subject but nobody has mentioned this, so
> maybe I should. Surely it is possible for a web browser to apply some similar
> character mapping rules and react only if it finds something.
>
> Thus if the IDN looks like www.ebay.com on the screen the web browser will
> notice www.ebay.com exists, pop up a warning and deny access if you just click
> OK. An option safe from those who just click OK without reading anything could
> allow access to those websites.
>
> The best fix would be to stop the registry's granting homograph names to
> random people and revoking the existing ones with immediate effect but I do
> think this is within the power of bugtraq.
In fact, it seems that the Unicode folk are thinking about these issues
and similar solutions.
http://unicode.org/reports/tr36/
Draft Unicode Technical Report #36
Security Considerations for the Implementation of Unicode and
Related Technology
Summary
This document describes security considerations that are important
to be aware of when working with Unicode, and provides specific
recommendations for dealing with the issues that arise.
[ToC]
1. Introduction
Unicode represents a very significant advance over all previous
methods of encoding characters. For the first time, all of the
world's characters could be represented in a uniform manner, for the
first time making it feasible for the vast majority of programs to
be globalized: built to handle any language in the world.
In many ways, the use of Unicode makes programs much more robust and
secure. When systems needed to use a hodge-podge of different charsets
for representing characters, it was possible to take advantage of
differences between those charsets, or in the way in which programs
converted to and from them.
However, because Unicode contains such a large number of characters,
and because it incorporates the varied writing systems of the world,
incorrect usage can expose programs or systems to possible security
attacks. This document describes some of the security considerations
that should be taken into account by programmers, system analysts,
standards-developers, and users.
We anticipate that this document will grow over time, adding
additional sections as needed. Initially, there are two areas that
will be discussed: canonical representation and visual spoofing. For
more information, see also the Unicode FAQ on Security Issues.
Each section below presents background information on the kinds of
problems that can occur, then a list of specific recommendations for
avoiding the problems.
This report discusses most of the issues raised in this thread, and
often from a greater depth of understanding. For example, several
posters disagreed with my suggestion that "homograph" was the wrong
term, but seemed to only be aware of the problems that could arise from
Latin letters that have effective equivalents in non-Latin character
sets, (necessarily) represented by different codepoints in Unicode.
The examples provided in TR36, although still noted by its author as
needing further samples, provide many other cases of clearly non-
equivalent, but visually confusable characters, especially when
displayed at small-ish point/pixel sizes as is common in web browser
address/location/status/etc bars.
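
To make the "different codepoints" point concrete, here is a minimal sketch
using Python's standard unicodedata module. The pairing of Latin "a" with
Cyrillic "a" and the spoofed host name are illustrative choices, not examples
taken from TR36:

```python
import unicodedata

latin_a = "a"          # U+0061
cyrillic_a = "\u0430"  # U+0430, renders much like Latin "a" in many fonts

# Print the codepoint and official Unicode name of each character.
for ch in (latin_a, cyrillic_a):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Illustrative spoof: same appearance in many fonts, different string.
genuine = "paypal.com"
spoof = "p" + cyrillic_a + "ypal.com"
print(genuine == spoof)  # the comparison is on codepoints, not on glyphs
```

The two strings compare unequal even though many fonts draw them the same
way, which is exactly the gap a spoofer relies on.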
3. Visual Spoofing
Visual spoofing is where a similarity in visual appearance fools a
user, and causes him or her to take unsafe actions. This is not new
to Unicode: it was possible to spoof simply with ASCII characters:
"inteI.com" for example, uses a capital I instead of an L. The
infamous example here is of course "paypaI.com":
... Not only was "Paypai.com" very convincing, but the scam
artist even goes one step further. He or she is apparently
emailing PayPal customers, saying they have a large payment
waiting for them in their account.
The message then offers up a link, urging the recipient to claim
the funds. But the URL that is displayed for the unwitting victim
uses a capital "i" (I), which looks just like a lowercase "L"
(l), in many computer fonts. ...
-- Beware the 'PaypaI' scam
And the spoofs nowadays are pretty clever. One is an email that
looks like it comes from a trusted source, like your bank. It even
has an explicit disclaimer to not trust links in email, and directs
you to copy text to your address bar in your browser. The text looks
ok to you, so you won't realize that you are going to a completely
different site, which is then set up to simulate your bank well
enough to get your password.
These spoofs depend on the use of visually confusable strings:
D1. Two different strings of Unicode characters are said to be
visually confusable when their appearance in common fonts in
small sizes at screen resolutions is sufficiently close that
people easily mistake one for the other.
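
The kind of check Duncan describes, built on a definition like D1, could be
sketched roughly as follows. This is only an illustration under loud
assumptions: the CONFUSABLES table is a tiny hand-picked sample, KNOWN_SITES
is a hypothetical stand-in for "this name exists" (a real browser would need
some other way to decide that), and the names skeleton() and spoof_target()
are invented here. It is not the procedure TR36 specifies.

```python
# Hypothetical, hand-picked sample of confusable characters mapped to the
# ASCII letter they resemble. A real table would be far larger.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
    "I": "l",       # capital I resembles lowercase L in many fonts
    "1": "l",
    "0": "o",
}

# Hypothetical stand-in for "names known to exist".
KNOWN_SITES = {"www.ebay.com", "www.paypal.com", "www.intel.com"}


def skeleton(name):
    """Map each confusable character to its look-alike, then lowercase."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in name).lower()


def spoof_target(name):
    """Return the known site that `name` visually imitates, or None."""
    if name.lower() in KNOWN_SITES:
        return None  # it is the genuine name, not an imitation
    skel = skeleton(name)
    for site in KNOWN_SITES:
        if skeleton(site) == skel:
            return site
    return None


print(spoof_target("www.\u0435bay.com"))  # Cyrillic "ie" in place of "e"
print(spoof_target("www.paypaI.com"))     # capital I in place of lowercase L
print(spoof_target("www.example.org"))
```

A browser using something like this would warn only when spoof_target()
returns a name, so ordinary IDNs that resemble nothing on the list pass
through untouched.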
I'll leave the rest of TR36 to those interested in reading it (it deals
extensively with IDN-based spoofing), and recommend it even though I
still disagree with the term "homograph", which TR36 uses often. All
the definitions of "homograph" I've seen have it meaning "same spelling
but different meaning and/or pronunciation and/or origin". My
suggestion of "pseudograph" much better captures the point of these
things, which is that they are (deliberately) constructed "mis-
spellings" that falsely represent something they are not, or disguise
that they are not what they represent themselves as (think of
"pseudonym" as something of an analog).
Anyway, discussion of the terminology is probably less important than
working out some useful ameliorative actions, so I'm going to drop the
homograph vs. pseudograph issue and suggest anyone interested read TR36
as it makes several suggestions and recommendations, but is a work in
progress and may be amenable to change through your input...
Regards,
Nick FitzGerald