Re: Thoughts and a possible solution on homograph attacks

To: bugtraq@xxxxxxxxxxxxxxxxx
Subject: Re: Thoughts and a possible solution on homograph attacks
From: Nick FitzGerald <nick@xxxxxxxxxxxxxxxxxxx>
Date: Tue, 22 Mar 2005 11:31:34 +1200
In-reply-to: <200503201507.PAA09003@xxxxxxxxxxxxxxxxxxx>
Organization: Personal account
Priority: normal
Reply-to: nick@xxxxxxxxxxxxxxxxxxx

Duncan Simpson wrote:

> Homograph attacks might be a closed subject but nobody has mentioned this, so
> maybe I should. Surely it is possible for a web browser to apply some similar
> character mapping rules and react only if it finds something.
>
> Thus if the IDN looks like www.ebay.com on the screen the web browser will
> notice www.ebay.com exists, pop up a warning and deny access if you just click
> OK. An option safe from those who just click OK without reading anything could
> allow access to those websites.
>
> The best fix would be to stop the registries granting homograph names to
> random people and revoking the existing ones with immediate effect but I do
> think this is within the power of bugtraq.

In fact, it seems that the Unicode folk are thinking about these issues
and similar solutions.

   http://unicode.org/reports/tr36/

   Draft Unicode Technical Report #36

   Security Considerations for the Implementation of Unicode and
   Related Technology

   Summary

   This document describes security considerations that are important
   to be aware of when working with Unicode, and provides specific
   recommendations for dealing with the issues that arise.

   [ToC]

   1. Introduction

   Unicode represents a very significant advance over all previous
   methods of encoding characters. For the first time, all of the
   worlds characters could be represented in a uniform manner, for the
   first time making it feasible for the vast majority of programs to
   be globalized: built to handle any language in the world.

   In many ways, the use of Unicode makes programs much more robust and
   secure. When systems need to use a hodge-podge of different charsets
   for representing characters, it was possible to take advantage of
   differences between those charsets, or in the way in which programs
   converted to and from them.

   However, because Unicode contains such a large number of characters,
   and because it incorporates the varied writing systems of the world,
   incorrect usage can expose programs or systems to possible security
   attacks. This document describes some of the security considerations
   that should be taken into account by programmers, system analysts,
   standards-developers, and users.

   We anticipate that this document will grow over time, adding
   additional sections as needed. Initially, there are two areas that
   will be discussed: canonical representation and visual spoofing. For
   more information, see also the Unicode FAQ on Security Issues.

   Each section below presents a background information on the kinds of
   problems that can occur, then a list of specific recommendations for
   avoiding the problems.

This report discusses most of the issues raised in this thread, often
with a greater depth of understanding.  For example, several posters
disagreed with my suggestion that "homograph" was the wrong term, but
seemed to be aware only of the problems that arise from Latin letters
that have effective equivalents in non-Latin character sets and are
(necessarily) represented by different codepoints in Unicode.  The
examples in TR36, although its author notes that further samples are
still needed, include many other cases of characters that are clearly
not equivalent but are visually confusable, especially when displayed
at the small point/pixel sizes common in web browser address, location
and status bars.
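
To make the "different codepoints" point concrete, here is a minimal
Python sketch (my own illustration, not something taken from TR36).
The helper name describe() and the sample label are assumptions chosen
for the example; it only uses the standard unicodedata module and the
built-in "idna" codec.

```python
import unicodedata


def describe(label):
    """Return (character, codepoint, Unicode name) for each character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in label
    ]


latin = "ebay"
# Illustrative spoof: the first letter is CYRILLIC SMALL LETTER IE,
# which renders like a Latin "e" in many fonts.
mixed = "\u0435bay"

for ch, codepoint, name in describe(mixed):
    print(ch, codepoint, name)

# The two labels look alike on screen but are different strings.
print(latin == mixed)

# The ASCII form sent over the wire gives the game away: a label with
# non-ASCII characters is encoded with an "xn--" prefix.
print(mixed.encode("idna"))
```

A browser that showed the encoded form, or flagged labels mixing
scripts, would at least give the user something to notice.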

   3. Visual Spoofing

   Visual spoofing is where a similarity in visual appearance fools a
   user, and causes him or her to take unsafe actions. This is not new
   to Unicode: it was possible to spoof simply with ASCII character:
   "inteI.com" for example, uses a capital I instead of an L. The
   infamous example here is of course "paypaI.com":

      ... Not only was "Paypai.com" very convincing, but the scam
      artist even goes one step further. He or she is apparently
      emailing PayPal customers, saying they have a large payment
      waiting for them in their account.

      The message then offers up a link, urging the recipient to claim
      the funds. But the URL that is displayed for the unwitting victim
      uses a capital "i" (I), which looks just like a lowercase "L"
      (l), in many computer fonts. ...

                                            -- Beware the 'PaypaI' scam

   And the spoofs nowadays are pretty clever. One is an email that
   looks like it comes from a trusted source, like your bank. It even
   has an explicit disclaimer to not trust links in email, and directs
   you to copy text to your address bar in your browser. The text looks
   ok to you, so you won't realize that you are going to a completely
   different site, which is then set up to simulate your bank well
   enough to get your password.

   These spoofs depend on the use of visually confusable strings:

   D1.  Two different strings of Unicode characters are said to be
        visually confusable when their appearance in common fonts in
        small sizes at screen resolutions is sufficiently close that
        people easily mistake one for the other.
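
A rough sketch of how a check along the lines of D1, and of the
mapping Duncan suggested, might look in Python.  Everything here is an
assumption for illustration: the CONFUSABLES table is a tiny hand-picked
sample, KNOWN_SITES is a made-up list, and skeleton() and
spoof_target() are hypothetical names.  A real implementation would
need far larger, properly maintained data.

```python
# Hypothetical, hand-picked sample of look-alike characters, each
# folded to one representative.  Not a complete or authoritative table.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
    "I": "l",       # capital I looks like lowercase L in many fonts
}

# Hypothetical list of names worth protecting.
KNOWN_SITES = {"www.ebay.com", "www.paypal.com"}


def skeleton(name):
    """Fold each character to its look-alike representative."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in name)


def spoof_target(name):
    """Return the known site `name` could be mistaken for, or None."""
    if name in KNOWN_SITES:
        return None  # the genuine name is not a spoof of itself
    folded = skeleton(name)
    for site in KNOWN_SITES:
        if skeleton(site) == folded:
            return site
    return None


for candidate in ("www.\u0435bay.com", "www.paypaI.com", "www.ebay.com"):
    print(repr(candidate), "->", spoof_target(candidate))
```

Two strings count as confusable here when they fold to the same
skeleton; a browser could warn whenever spoof_target() returns a name.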

I'll leave the rest of TR36 to those interested in reading it (it deals
extensively with IDN-based spoofing), and I recommend it even though I
still disagree with the term "homograph", which TR36 uses often.  All
the definitions of "homograph" I've seen give it as "same spelling but
different meaning and/or pronunciation and/or origin".  My suggestion
of "pseudograph" better captures the point of these things, which is
that they are (deliberately) constructed "misspellings" that falsely
represent something they are not, or that disguise the fact that they
are not what they present themselves as (think of "pseudonym" as
something of an analog).

Anyway, discussion of the terminology is probably less important than
working out some useful ameliorative actions, so I'm going to drop the
homograph vs. pseudograph issue and suggest that anyone interested read
TR36.  It makes several suggestions and recommendations, but it is a
work in progress and may be amenable to change through your input...

Regards,

Nick FitzGerald

References:
- Re: Thoughts and a possible solution on homograph attacks
  - From: Duncan Simpson
