New gTLD "validation" problems...
I did a little Google search for e-mail validation tools and tricks,
and checked what they assume TLDs look like. (Searches:
"email validation javascript", "email valid javascript", "email
validation asp", and so on.)
Looking at the results, there's an awful lot of code and regular
expressions around (in JavaScript, VBScript, Perl, and friends), some
of it bad, some of it horrible.
For a sample dirty dozen, see the link list at the end of this
e-mail. All of these are from either the Google top ten or top
twenty for some of the searches I did.
The most typical mistakes include assuming that:
- TLDs are 2-N characters long, with N ranging anywhere from 3 to 6.
(In fact, 3 and 6 seem to be the most common upper bounds assumed;
perhaps the most absurd case used 4, with a comment explicitly
referencing .info...)
- It is a good idea to have a hard-coded list of TLDs. Such lists
frequently *include* the current set of new gTLDs, so these are
good news for the current new gTLD operators, and really bad news
for the next round.
(In one case, there was at least a comment referencing ICANN and
the need to update -- but, of course, these JavaScript code
snippets are the kind of stuff which gets deployed and forgotten,
so that comment is worthless.)
Remarkably, most of the code I looked at simply accepted any
two-letter TLD, with just one (probably not so popular) exception that
would only accept ".tv" and ".us".
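
To illustrate the alternative to the mistakes listed above, here is a
minimal sketch of a check that looks only at the syntactic shape of
the domain part and makes no assumption about TLD length or
membership in a fixed list. The function name plausible_email_domain
and the rule that the domain must contain at least one dot are
assumptions made for this example; none of it is taken from the
validators linked below.

```python
import re

# One DNS label: 1-63 letters, digits, or hyphens, with no hyphen at
# either end. Nothing here depends on which TLDs currently exist.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def plausible_email_domain(address):
    """Return True if the part after the last '@' looks like a domain name.

    Deliberately does NOT check the TLD against a hard-coded list or a
    length range such as 2-4 characters.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local or len(domain) > 253:
        return False
    labels = domain.split(".")
    if len(labels) < 2:
        return False  # assumption of this sketch: require "name.tld"
    return all(_LABEL.fullmatch(label) for label in labels)
```

With this shape-only check, addresses such as webmaster@nic.museum
and roessler@does-not-exist.info pass. Whether the domain actually
exists is left to a DNS lookup or to the mail system itself.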
In general terms, I'd suggest that any advisory the GNSO may
initiate on acceptance problems with new TLDs should take up the
basic theme that the root zone is a dynamic thing, and that
operators and programmers should not make unwarranted assumptions
about what's in there.
Besides the kinds of programming errors mentioned above, that brings
up two more dangerous practices:
1. Downloading a copy of the root zone and installing it on a
resolver running BIND, practically turning that resolver into a
root server. If the root zone copy isn't updated regularly,
things will break -- not just when new TLDs are added, but also
when existing TLDs migrate to different servers. (What's the
transition plan for .org, again?) I have no idea how common this
kind of setup actually is.
2. Using fake TLDs for local networks. It's not uncommon to just
use a random, unused TLD for machines on an intranet; these host
names aren't supposed to be seen on the Internet. Of course,
it's extremely easy to screw up this kind of setup, and to
inadvertently create a "local" collision with a future TLD.
Fixing setups like this might get quite costly.
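
As a rough illustration of the second point, the sketch below checks
a list of intranet host names against a set of delegated TLDs. The
function name colliding_hosts, the sample host names, and the sample
TLD set in the usage note are all hypothetical; in practice the TLD
set would have to come from a current copy of the root zone, and it
would have to be refreshed whenever the root zone changes.

```python
def colliding_hosts(internal_hosts, delegated_tlds):
    """Return the internal host names whose last label is a delegated TLD.

    delegated_tlds is supplied by the caller; this sketch does not
    fetch or hard-code the contents of the root zone.
    """
    delegated = {tld.strip(".").lower() for tld in delegated_tlds}
    collisions = []
    for host in internal_hosts:
        # The last label is the (possibly fake) TLD; ignore a trailing dot.
        last_label = host.rstrip(".").rsplit(".", 1)[-1].lower()
        if last_label in delegated:
            collisions.append(host)
    return collisions
```

For example, given the hypothetical hosts "fileserver.corp" and
"printer.lan" and a hypothetical TLD set containing "corp", only
"fileserver.corp" is reported.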
At the same time, all this indicates that the "visibility" problems
for new gTLDs will persist for quite some time.
The dirty dozen address validators:
http://sageweb.sage.org/resources/publications/perl/perl17.html
http://www.hexillion.com/samples/#Regex
http://www.xs4all.nl/~ppk/js/mailcheck.html?email=webmaster%40nic.museum
http://insights.iwarp.com/advanced/javascript/validate/formemail.html?email=roessler%40does-not-exist.info&button=Submit
http://javascript.internet.com/forms/email-address-validation.html?email=roessler%40does-not-exist.info
http://members.blue.net.au/felgall/emailval.js
http://forums.devshed.com/archive/1/2002/09/4/44410
http://www.experts-exchange.com/Web/Web_Languages/JavaScript/Q_20572818.html
http://www.js-examples.com/example/?ex=946&mode=1
http://javascriptkit.com/script/script2/acheck.shtml?emailcheck=roessler%40does-not-exist.info
http://www.aspfree.com/examples/1574,1/examples.aspx
http://www.123aspx.com/resdetail.aspx?sfm=308&res=890
--
Thomas Roessler <roessler@xxxxxxxxxxxxxxxxxx>