Re: Thoughts and a possible solution on homograph attacks
At 00:48 09/03/2005, Nick FitzGerald wrote:
> Maybe it's better to attack this problem on the browser side and have a
Given IDN would seem to be here to stay, I'd say that will be the only
place we can attack it...
Personally, I think the best place to attack it is at the registry -
unfortunately most of those are bound by commercial motivation instead of
trying to be good citizens, or help Internet users in general, so it might
be harder to change their behaviour than change browser behaviour.
My proposal would be:
1) IDNs only allowed on ccTLDs (not gTLDs). After all, the whole point of
IDNs is to have a domain name in the locally readable script to target
people within your own region/nation/etc. gTLDs are to have domains to
target people globally. I see no purpose (other than vanity) to having an
IDN in a gTLD.
2) IDNs should only be allowed to consist of a single character set - be
that Latin, Western European, Japanese, Cyrillic etc.
3) A ccTLD should only allow IDNs in their local character set(s). So, you
couldn't have a Cyrillic IDN on a .us domain, and you couldn't have a Greek
IDN on a .ru domain.
4) A domain registry's DRS system should take into account
homograph/pseudograph attacks.
5) Possibly any domains containing only characters which are graphically
equivalent to Latin characters should not be allowed, but I'm not sure of
this one.
I think if IDNs followed these rules they would still keep most of their
benefit, but also make it MUCH harder to have a homograph/pseudograph attack.
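As a rough illustration of how a registry might check rules 1 to 3 at
registration time, here is a minimal Python sketch. The ALLOWED_SCRIPTS
table and the function names are made up for the example, and the script of
a character is only approximated from the first word of its Unicode
character name, which is a shortcut rather than a proper script lookup.

```python
import unicodedata

# Hypothetical policy table (illustrative only): scripts each ccTLD accepts.
ALLOWED_SCRIPTS = {
    "us": {"LATIN"},
    "ru": {"LATIN", "CYRILLIC"},
    "gr": {"LATIN", "GREEK"},
}

def scripts_in(label):
    """Return the set of scripts used by the letters in a label.

    Assumption: the first word of the Unicode character name
    (e.g. 'CYRILLIC SMALL LETTER A') is a good enough stand-in
    for the character's script.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def registration_allowed(label, tld):
    """Apply rules 1 to 3 to a single label under the given TLD."""
    allowed = ALLOWED_SCRIPTS.get(tld.lower())
    if allowed is None:
        # Rule 1: anything not listed as a ccTLD here gets no IDNs at all.
        return label.isascii()
    scripts = scripts_in(label)
    if len(scripts) > 1:
        # Rule 2: a label may not mix character sets.
        return False
    # Rule 3: the single script must be one the ccTLD accepts.
    return scripts <= allowed
```

With this table, a Cyrillic label is accepted under .ru but refused under
.us and .com, and a label mixing Latin and Cyrillic letters is refused
everywhere.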
(1) is needed as it's still possible to make up certain words using
characters from the Cyrillic and/or Greek (and possibly other) character
sets that look like words from the Latin character set. Having this rule
limits the scope of these possible attacks to people in Russia or Greece.
(2) is needed for (hopefully) obvious reasons.
(3) is needed for the same reason as (1)
(4) Obvious
This confines (AFAICS) the possible attacks to Russia or Greece (and
possibly other countries with similar character sets), and to a very few
domains. For instance, EBAY.RU would still be possible using Cyrillic 0415,
0412, 0410, 0423 as well as Latin 0045, 0042, 0041, 0059.
But there is only this single combination of Cyrillic characters which
could resemble the 'proper' EBAY.RU. If you don't have rule (2), then you
could have lots of different combinations, e.g. 0045, 0412, 0041, 0059 etc.
So, having rule (2) means that eBay could register (or use DRS to regain) a
single extra domain to protect themselves and their customers, instead of 15
(the 16 possible Latin/Cyrillic combinations minus the genuine one).
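The counting can be checked with a few lines of Python. This sketch uses
only the code points quoted above; each of the four letters can be either
Latin or Cyrillic, giving 16 spellings in total, one of which is the genuine
all-Latin name.

```python
from itertools import product

# Each Latin capital paired with the Cyrillic capital that looks like it,
# using the code points quoted in the message.
PAIRS = [
    ("\u0045", "\u0415"),  # E
    ("\u0042", "\u0412"),  # B
    ("\u0041", "\u0410"),  # A
    ("\u0059", "\u0423"),  # Y
]

# Every way of picking Latin or Cyrillic for each of the four positions.
all_variants = ["".join(chars) for chars in product(*PAIRS)]
latin_only = "".join(latin for latin, _ in PAIRS)
cyrillic_only = "".join(cyrillic for _, cyrillic in PAIRS)

# Spellings that mix the two scripts; rule (2) forbids all of these.
mixed = [v for v in all_variants if v not in (latin_only, cyrillic_only)]

print(len(all_variants))  # 16
print(len(mixed))         # 14
```

So with rule (2) in place, the 14 mixed spellings cannot be registered, and
the all-Cyrillic spelling is the only lookalike left to defend against.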
Rule (5) would stop even this, but it could cause problems with legitimate
Russian or Greek words which contain only characters that look like Latin
ones. This really needs behavioural investigation. For instance, if a Greek
Internet user saw WWW.KAPPA.GR (or www.<another Greek word which can be
written in Greek or Latin letters>.gr), would they type in 'KAPPA' in Greek
letters, or in Latin letters? I suspect they'd type it in Latin, in which
case rule (5) would be OK, but if they'd type it in Greek, then rule (5)
would not be OK.
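For what it's worth, a check for rule (5) could look something like the
following Python sketch. The LATIN_LOOKALIKES table is a tiny hand-written
sample for illustration only (a real registry would need a far more complete
table), and both function names are invented for the example.

```python
# Tiny hand-written sample of capitals that look like Latin capitals.
# Illustrative only; nowhere near a complete lookalike table.
LATIN_LOOKALIKES = {
    "\u0410": "A", "\u0412": "B", "\u0415": "E", "\u0423": "Y",  # Cyrillic
    "\u0391": "A", "\u0392": "B", "\u0395": "E",                 # Greek
    "\u039A": "K", "\u03A1": "P",                                # Greek
}

def latin_skeleton(label):
    """Return the Latin string the label could be mistaken for, or None
    if any character has no Latin lookalike in the sample table."""
    out = []
    for ch in label:
        if ch.isascii():
            out.append(ch)
        elif ch in LATIN_LOOKALIKES:
            out.append(LATIN_LOOKALIKES[ch])
        else:
            return None
    return "".join(out)

def violates_rule_5(label):
    """True if a non-ASCII label consists only of Latin lookalikes."""
    return (not label.isascii()) and latin_skeleton(label) is not None

# Kappa, Alpha, Rho, Rho, Alpha: Greek capitals that read as Latin 'KAPPA'.
greek_lookalike = "\u039A\u0391\u03A1\u03A1\u0391"
print(latin_skeleton(greek_lookalike))   # KAPPA
print(violates_rule_5(greek_lookalike))  # True
```

A label with even one character outside the table (Greek theta, say) passes
the check, which is exactly why the open question is how many legitimate
local words would be caught.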
I think limiting the protection to browsers has several problems:
- it stops IDNs from having most of their usefulness
- it makes browser development more complicated through no fault of the
browser developers
- in countries where IDNs are widely used, they're still open to attack as
they'd *have to* enable IDNs to be able to use the Internet adequately
- it doesn't just affect browsers, it affects email clients, chat programs,
ftp programs, news readers, etc.
Even if my 'rules' were optional, well-behaved registries could use them,
and advertise they use them, then browsers could warn about (or un-IDN-ise)
IDNs from other registries, but show IDNs from the well-behaved registries
in their proper character set.
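On the browser side, that display policy might be sketched in Python as
below. The function name and the idea of passing in a set of trusted TLDs
are assumptions for the example; how a browser would actually learn which
registries follow the rules is left open. The conversion to the ASCII
"xn--" form uses Python's built-in "idna" codec.

```python
def display_form(domain, trusted_tlds):
    """Return the form of a domain name a browser might show the user.

    Assumption: trusted_tlds is a set of TLDs whose registries are known
    to follow the rules; how that set is maintained is not specified.
    """
    tld = domain.rsplit(".", 1)[-1].lower()
    if domain.isascii() or tld in trusted_tlds:
        # Plain ASCII name, or an IDN from a well-behaved registry:
        # show it in its proper character set.
        return domain
    # Otherwise 'un-IDN-ise' it: show the ASCII-compatible (xn--) form.
    return domain.encode("idna").decode("ascii")

name = "\u043f\u0440\u0438\u043c\u0435\u0440"  # a Cyrillic label
print(display_form(name + ".ru", {"ru"}))   # shown in Cyrillic
print(display_form(name + ".com", {"ru"}))  # shown in xn-- form
```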
Paul
VPOP3 - Internet Email Server/Gateway
support@xxxxxxxxxx http://www.pscs.co.uk/