[IP] more on WH censoring its website?
From: Nathan Dintenfass <nathan@xxxxxxxxxxxxxxx>
IPers might be interested to get a few more facts on this. The robots.txt
file at whitehouse.gov has a lot of lines like:
Disallow: /holiday/2002/art/iraq
That tells a search engine not to index anything under
http://www.whitehouse.gov/holiday/2002/art/iraq
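For anyone who wants to check what such a line does, here is a minimal sketch using the robots.txt parser in Python's standard library. The two-line rule set and the index.html page name are made up for illustration; only the Disallow path comes from the real file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical two-line robots.txt built around the rule quoted above.
rules = [
    "User-agent: *",
    "Disallow: /holiday/2002/art/iraq",
]

parser = RobotFileParser()
parser.parse(rules)  # parse() takes an iterable of lines

base = "http://www.whitehouse.gov"
# index.html is an invented page name; the directory may not exist at all.
blocked = parser.can_fetch("*", base + "/holiday/2002/art/iraq/index.html")
allowed = parser.can_fetch("*", base + "/holiday/2002/art/")
print("iraq page fetchable:", blocked)
print("parent directory fetchable:", allowed)
```

The rule is a plain prefix match, so it covers everything under that path and nothing in the parent directory.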
But that directory doesn't even exist! I did a little snooping and found
that, of the 1562 total "Disallow" lines:
700 end with "iraq"
855 do not end with "iraq"
The interesting part is that ALL of the lines that end with "iraq" match
another "Disallow" line, except that the matching line ends with something
other than "iraq". In other words, it looks like a systematic effort to add
"iraq" to existing exclusion directories. A rough glance did not make clear
what the pattern is for the 155 entries that don't have an "iraq" partner on
the list.
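The counting above can be reproduced with something like the sketch below. It is not the script I used, just one way to do it. It assumes that "ends with iraq" means the last path segment is exactly "iraq", and that a "partner" is another Disallow line with the same parent directory and a different last segment. The function names and the sample text are invented; of the sample paths, only /holiday/2002/art/iraq comes from the real file.

```python
from collections import defaultdict


def disallow_paths(robots_txt):
    """Return the path from every non-empty 'Disallow:' line, in order."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:
                paths.append(path)
    return paths


def split_path(path):
    """Split '/a/b/c' into its parent '/a/b' and last segment 'c'."""
    parent, _, last = path.rstrip("/").rpartition("/")
    return parent, last


def iraq_report(paths, word="iraq"):
    """Count lines ending in `word` and list those without a partner."""
    # Collect the last segments seen under each parent directory.
    siblings = defaultdict(set)
    for p in paths:
        parent, last = split_path(p)
        siblings[parent].add(last)

    with_word = [p for p in paths if split_path(p)[1] == word]
    # A line is unpaired if no other segment shares its parent directory.
    unpaired = [p for p in with_word
                if not (siblings[split_path(p)[0]] - {word})]
    return {
        "total": len(paths),
        "with_word": len(with_word),
        "without_word": len(paths) - len(with_word),
        "unpaired": unpaired,
    }


# Invented sample, not the real whitehouse.gov file.
sample = """User-agent: *
Disallow: /holiday/2002/art/text
Disallow: /holiday/2002/art/iraq
Disallow: /news/iraq
Disallow: /cgi-bin
"""

report = iraq_report(disallow_paths(sample))
print(report)
```

Running the same functions over the real file's text would show whether any "iraq" line lacks a partner under these assumptions.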
Although this does show a systematic effort, it DOES NOT show an attempt to
exclude any particular content that lives in a directory with "iraq" in its
URL (or perhaps it's even more elaborate than we think, and most of the
entries are a smokescreen). Something is fishy about this, but it doesn't
seem to be as simple as the first impressions floating around the
blogosphere suppose.