[IP] The perils of Googling

To: ip@xxxxxxxxxxxxxx
Subject: [IP] The perils of Googling
From: Dave Farber <dave@xxxxxxxxxx>
Date: Thu, 11 Mar 2004 07:32:05 -0500
List-help: <http://v2.listbox.com/doc/help_sub?list_name=ip@v2.listbox.com>
List-id: <ip@xxxxxxxxxxxxxx>
List-software: listbox.com v2.0
List-subscribe: <mailto:subscribe-ip@v2.listbox.com>, <http://v2.listbox.com/subscribe/?listname=ip@v2.listbox.com>
List-unsubscribe: <mailto:unsubscribe-ip@v2.listbox.com>, <http://v2.listbox.com/member/unsubscribe/?listname=ip@v2.listbox.com>
Reply-to: dave@xxxxxxxxxx
Sender: owner-ip@xxxxxxxxxxxxxx


Delivered-To: dfarber+@xxxxxxxxxxxxxxxxxx
Date: Thu, 11 Mar 2004 00:08:20 -0800
From: Dewayne Hendricks <dewayne@xxxxxxxxxxxxx>

The perils of Googling
By Scott Granneman, SecurityFocus
Posted: 10/03/2004 at 10:27 GMT
<http://www.theregister.co.uk/content/55/36142.html>

Google is in many ways the most dangerous website on the Internet for thousands of individuals and organisations, writes SecurityFocus columnist Scott Granneman. Most computer users still have no idea that they may be revealing far more to the world than they would want.

I'm not putting down Google. Far from it: it's a great search engine, and I use it all the time. I couldn't do my many jobs without Google, so I've spent some time learning how to maximize its value, how to find exactly what I want, how to plumb its depths to find just the right nugget of information that I need. In the same way that Google can be used for good, though, it can also be used by malevolent individuals to root out vulnerabilities, discover passwords and other sensitive data, and in general find out way more about systems than they need to know. And, of course, Google's not the only game in town - but it is certainly the biggest, the most widely-used, and in many ways the easiest to use.


Throwing back the curtain

Most people just head to Google, type in the words they're looking for, and hit Google Search. Some more knowledgeable folks know that they can put quotation marks around phrases, or put a "+" in front of required words or a "-" in front of words that should not appear, or even use Boolean search terms like AND, OR, and NOT. Greater Google aficionados know about Google's Advanced Search page, where you can get really specific.

The page that Google provides for its Advanced Search is nice, and it's certainly easy and full of necessary tips, but if you really want to master all the tricks that Google offers the dedicated searcher, you need to learn at least some of what is detailed on the Google Advanced Search Operators page. For instance, let's say you just type the word "budget" into a Google search box, without the quotation marks. You're going to get over 11,000,000 hits, so many that it would take a tremendously long time to find anything troublesome from a security perspective.

Now try that same search, but include the search operator "filetype" along with it. Using the filetype operator, you can specify the kind of file you're looking for. Google's Advanced Search page lists several common formats, including Microsoft Word, Microsoft Excel, and Adobe Acrobat PDF, but you can actually search for far more than those. Let's change our search from just "budget" to "budget filetype:xls" (again without the quotes; in fact, just ignore the quotation marks unless I mention otherwise) and see what we get.


63,000 hits and counting

Hmmm ... now we're down to 63,000 hits. Still an overwhelming number, but if you start looking through the first couple of pages, you'll notice some items that would be of interest to an attacker looking for information he shouldn't have. Let's add another operator into the mix.

The "site" operator allows you to narrow down your results to a particular subdomain, a second-level domain, or even a top-level domain. For instance, if you wanted to find out what Google has indexed at SecurityFocus on the topic of password cracking, try this search: "site:www.securityfocus.com password cracking", which gives you 449 results. I often use this trick even when a site provides its own search engine, as Google's index is often far better than the search that many sites include.

Let's try our search, but stick to the .edu top-level domain, so we're looking for "budget filetype:xls site:edu". 15,200 hits. Not bad. Things are starting to look very interesting.

Let's introduce another tool into your toolbox: the ability to look only on pages that use a certain word or words in their title by incorporating the "intitle" operator into your search. At SecurityFocus, this query would narrow our results list down to only five, an incredible tightening of our search: "site:www.securityfocus.com intitle:password cracking" (note that "password" is the only word that must be in the title; "cracking" should appear on the page as a search term, but not in the title, since I didn't place "intitle:" prior to it).
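
The operators described so far simply combine into one query string. As a minimal sketch in Python (the helper names build_query and search_url are hypothetical, and the search URL format is an assumption rather than anything stated in the column), the queries above could be composed like this. The operators come out in a fixed order that may differ from the examples in the text.

```python
from urllib.parse import urlencode


def build_query(terms, filetype=None, site=None, intitle=None):
    """Compose a query string from plain terms plus optional operators.

    Hypothetical helper: it only concatenates text and contacts nothing.
    """
    parts = []
    if intitle:
        # Quote multi-word titles so the operator covers the whole phrase.
        title = f'"{intitle}"' if " " in intitle else intitle
        parts.append(f"intitle:{title}")
    if terms:
        parts.append(terms)
    if filetype:
        parts.append(f"filetype:{filetype}")
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """Return a URL with the query percent-encoded (base URL is an assumption)."""
    return base + "?" + urlencode({"q": query})


if __name__ == "__main__":
    print(build_query("budget", filetype="xls", site="edu"))
    print(build_query("cracking", site="www.securityfocus.com", intitle="password"))
    print(search_url(build_query("budget", filetype="xls", site="edu")))
```

Keeping the query builder separate from the URL builder makes it easy to print the queries for review before anything is sent anywhere.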


Enter the bad guys

Bad guys know about the "intitle" operator, but they know something else that makes it even more powerful. Often Web servers are left configured to list the contents of directories if there is no default Web page in those directories; on top of that, those directories often contain lots of stuff that the website owners don't actually want to be on the Web. That makes such directory lists prime targets for snoopers. The title of these directory listings almost always starts with "Index of", so let's try a new query that I guarantee will generate results that should make you sit up and worry: "intitle:"index of" site:edu password". 2,940 results, and many, if not most, would be completely useless to a potential attacker. Many, however, would yield passwords in plain text, while others could be cracked using common tools like Crack and John the Ripper.
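
The same "Index of" title that makes these listings easy to find also makes them easy to check for on servers you are responsible for. The following is a rough sketch, not a complete audit tool: it assumes you have already fetched the HTML of a page you administer, and the names TitleParser and looks_like_directory_listing are made up for this illustration.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the page's <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def looks_like_directory_listing(html):
    """Return True if the page title starts with "Index of" (any case)."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip().lower().startswith("index of")


if __name__ == "__main__":
    sample = "<html><head><title>Index of /private</title></head></html>"
    print(looks_like_directory_listing(sample))
```

A title check like this is only a heuristic; a server can be configured to produce listings with a different title, so a negative result does not prove that listings are disabled.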

There are other operators, but these should be enough to make the picture clear. Once you start to think about it, the potentially troublesome words and phrases that can be searched for and leveraged should begin to multiply in your mind: passwd. htpasswd. accounts. users.pwd. web_store.cgi. finances. admin. secret. fpadmin.htm. credit card. ssn. And so on. Heck, even "robots.txt" would be useful: after all, if someone doesn't want search engines to find the stuff listed in robots.txt, that stuff could very well be worth a look. Remember, robots.txt just indicates that the website doesn't want search engines to index the files and folders listed in robots.txt; nothing inherently stops users from accessing that content once they know it exists.
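
To see why robots.txt works as a roadmap rather than a lock, consider how little effort it takes to read one. This is a minimal sketch that assumes the file's text is already in hand; the function name disallowed_paths and the sample paths are invented for illustration.

```python
def disallowed_paths(robots_txt):
    """List every non-empty Disallow path found in a robots.txt body."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split "Field: value".
        line = line.split("#", 1)[0].strip()
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    # Hypothetical file contents, not taken from any real site.
    sample = (
        "User-agent: *\n"
        "Disallow: /private/\n"
        "Disallow: /budget.xls  # old copy\n"
        "Disallow:\n"
    )
    for path in disallowed_paths(sample):
        print(path)
```

Running this against your own site's robots.txt is a quick way to review what the file advertises to anyone who asks for it.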


Sensitive information

A couple of websites have even sprung up dedicated to listing words and phrases that reveal sensitive information and vulnerabilities. My favorite of these, Googledorks, is a treasure trove of ideas for the budding attacker. As a protective countermeasure, all security pros should visit this site and try out some of the suggestions on the sites that they oversee or with whom they consult. With a little elbow grease, some Perl, and the Google Web API, you could write scripts that would automate the process and generate some nice reports that you could show to your clients. Of course, so could the bad guys... except I don't think your clients will ever see those reports, just the end results.
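
The column suggests Perl and the Google Web API for this; the sketch below swaps in Python and deliberately stops short of calling any search API, since the details of that API are not given here. It only generates the list of queries a defender might run by hand against a site they oversee. The function name audit_queries, the domain example.edu, and the term list are all illustrative assumptions.

```python
def audit_queries(site, terms):
    """Build one directory-listing query per term, restricted to one site.

    Hypothetical helper: it returns strings only and sends no requests.
    """
    return [f'intitle:"index of" site:{site} {term}' for term in terms]


if __name__ == "__main__":
    # A few of the troublesome words mentioned in the text.
    terms = ["passwd", "htpasswd", "users.pwd", "admin"]
    for query in audit_queries("example.edu", terms):
        print(query)
```

Printing the queries first, and reviewing the results manually, keeps the exercise within whatever terms of use apply to the search engine.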

Even the Google cache can aid in exposing holes in systems. Couple the operators outlined above with Google's cache, which can provide you with a look at files that have changed or been removed, and attackers have an incredibly powerful tool at their disposal.


Responses

As I said at the beginning of this column, the fact that it is actually quite easy to find dangerous information using just a search engine and some intelligent guesses is not exactly news to people who think about security professionally. But I'm afraid that there are many uneducated folks putting content onto Web servers that they think is hidden to the world, when it is in reality anything but.

We have two seemingly opposite problems at work here: simplicity and complexity. On the one hand, it has become very easy for non-technical users to post content onto Web servers, sometimes without realizing that they're in fact placing that content on a Web server. It has even become easier to Web-enable databases, which has led in one case to the exposure of a database containing the records of a medical college's patients (and by the way, the search terms discussed in that article are still very much active at Google, one year later).

Even when people do understand that their content is about to go onto the Web, many do not fully think through what they're about to post. They don't examine that content in light of a few simple questions: How could this information be used against me? Or my organisation? And should this even go on the Web in the first place?

Well, of course ordinary users don't think to ask these questions! They're just interested in getting their content out there, and most of the time are just pleased as punch that they could publish on the Web in the first place. Critically examining that content for security vulnerabilities is not something they've been trained to do.


Points of failure

On the other side of the coin we have complexity. For all the ease that has come about in the past several years, no matter how simple it has become for Bob in Marketing to publish the company's public sales figures online, the fact remains that we're dealing with complex systems that have many, many points of potential failure. That knowledge scares the hell out of the people who live security, while Bob goes blithely on successfully publishing the company's public sales figures ... and accidentally publishing the spreadsheet containing the company's top customers, complete with contact info, sales figures, and notes about who the salespeople think are good for a few thousand more this year.

For instance, FrontPage is touted by Microsoft as an extremely simple-to-use Web publishing solution that enables users to "move files easily between local and remote locations and publish in both directions". Unfortunately for those average Joes who buy into the hype, FrontPage is still a very complicated program that can easily expose passwords and other sensitive data if it is not administered correctly. Don't believe me? Just search Google for "_vti_pvt password intitle:index.of" and take a look at what you find.

FrontPage is not the only offender, but it is certainly an easy one to find in abundance on our favourite search engine. Now think about all the other programs out there that people are using every day. Personal Web servers that come with operating systems. Turnkey shopping cart software. Web-enabled Access databases. The list goes on and on. Take a moment and start to think about the organisations you oversee. See the list of potential problems tumble off into infinity. Oy.

Sure, it's possible for the folks creating Web content to tell Google and other search engines not to index that content. O'Reilly's website has a marvellous short piece titled "Removing Your Materials From Google" that should be required reading for anyone who even thinks about putting anything on or even near a Web server. Of course, as I mentioned above, relying on robots.txt to protect sensitive content is a bit like putting a sign up saying "Please ignore the expensive jewels hidden inside this shack". But at least it will get folks thinking.


Understand the threat

And really, that's what it comes down to: we have to get folks thinking. Sure, those of us responsible for security can try to shut everything down and turn everything off that could pose a threat - and we should, within reason. But those pesky users are going to do their job: use the systems we provide them, and some we don't provide. We need to help them understand the threats that any Web-enabled technology can present.

Print out this column and hand it out. Show them how easy it is to find sensitive content online. Talk to them about appropriate and inappropriate content. Try to get them on your side so they trust you and come to you with requests for help beforehand instead of coming to you after the fact, when it's too late and the toothpaste is out of the tube. Finally, realise that humans have an innate need to communicate and will seize on any tool to do so, and if that means talking to your users and setting up a wiki or bulletin board or other collaborative tool, then do so.

Google and other search tools have made the world available to us all, if we just know what to ask for. It's our job as security pros to help make the folks we work and interact with aware of that fact, in all of its far-reaching ramifications.


Archives at: <http://Wireless.Com/Dewayne-Net>
Weblog at: <http://weblog.warpspeed.com>
