[IP] What can't you find on Google? Vital statistics
Delivered-To: dfarber+@xxxxxxxxxxxxxxxxxx
Date: Sat, 01 May 2004 21:47:55 -0400
From: Claudio Gutiérrez <cgutierrez@xxxxxxxxxxxxxx>
Subject: What can't you find on Google? Vital statistics
To: dave@xxxxxxxxxx
*What can't you find on Google? Vital statistics*
*John Naughton*
http://observer.guardian.co.uk/business/story/0,6903,1202522,00.html
Here's a cheap trick to play on an audience - especially one drawn from the
business community. Ask them how many use Microsoft software. Virtually
every hand in the room will go up. How many use Apple Macs? One or two - at
most. How many use Linux? If the audience is drawn from corporate suits, no
hands will show. Now comes the punchline: who uses Google? A forest of
hands appears. 'Ah,' you say, 'that's very interesting, because it means
you're all Linux users.' Stunned looks all round.
The computing engine that powers Google is the largest cluster of Linux
servers in the history of the world. If you talk to computer-science folks,
you find that they regard this - rather than the number of web pages
indexed - as the most interesting thing about the company. Managing such a
vast server-farm is a formidable task. For example, how do you implement
security patches and operating-system upgrades (much more frequent in Linux
than in proprietary systems from Microsoft or Sun) on thousands of servers
without causing disruption to service? Google manages it with sophisticated
techniques for rippling changes through the cluster while maintaining 100 per
cent uptime. This is serious stuff, and there are a lot of
IT managers out there who would give their eye-teeth to be able to do it
half as well.
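What 'rippling' means in the abstract is easy enough to sketch: take a small
batch of machines out of the serving rotation, patch them, put them back, and
repeat until the whole farm is done. The toy Python below is only a sketch of
that shape - the server names are made up and the patch step is a stub, since
the real mechanism is exactly what Google won't describe.

# Toy sketch of rippling an upgrade through a cluster: patch a small
# batch of machines at a time while the rest keep serving traffic.
# Everything here is a stand-in; Google's actual tooling is not public.

servers = [f"server-{n:05d}" for n in range(10_000)]
in_rotation = set(servers)            # machines currently serving queries

def apply_patch(server: str) -> bool:
    # Placeholder for the real upgrade step; pretend it always succeeds.
    return True

def rolling_upgrade(batch_size: int = 50) -> None:
    for i in range(0, len(servers), batch_size):
        batch = servers[i:i + batch_size]
        for s in batch:
            in_rotation.discard(s)    # drain: stop routing queries here
        for s in batch:
            if apply_patch(s):
                in_rotation.add(s)    # healthy again, back into rotation
        # At no point are more than batch_size machines out of service.

rolling_upgrade()
print(len(in_rotation), "of", len(servers), "servers back in rotation")

Doing that continuously, across thousands of machines, without ever dropping a
query is the part that impresses the professionals.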
Google is famous for being a confident, open company. Its clean,
uncluttered search page is supposed to be a metaphor for the organisation
behind it. But when you start asking questions about its technology, then
the water rapidly becomes murky. More than half the company's 1,000
employees are techies, and they are much in demand as seminar speakers in
university computer-science departments, where people are curious about
Google's technology. Wall Street - with its beady eye on the forthcoming
IPO - wants to know what Google does (and more importantly, what it plans
to do next). Computer scientists, in contrast, want to know how Google does it.
The two questions are different but increasingly, it seems, interlinked. At
any rate, the technical community has begun to realise that presentations
by Google techies have been run through some kind of corporate filter
before they make it into PowerPoint. The operation of the filter is erratic
(it's difficult for PR flacks effectively to censor geeks at the best of
times), but it seems that the overall aim is to understate every aspect of
Google's technology and technical performance by several orders of magnitude.
How do we know this? Mainly because of internal inconsistencies in the data
provided by Google employees. One university presentation, for example,
claimed that Google handled 150 million queries a day, and 1,000 per second
at peak times. This prompted Simson Garfinkel of MIT's Technology Review
to do some simple calculations. If the system handles a peak load of 1,000
queries per second, he reasoned, then even running flat out it could serve at
most 86.4 million queries per day (86,400 seconds times 1,000 queries) - well
short of the claimed 150 million - or perhaps 40 million queries per day if
you assume that the system spends only half its time at peak capacity. 'No
matter how you crank the math', he concluded, 'Google's statistics are not
self-consistent'.
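The sums behind his complaint take only a few lines. Assuming an 86,400-second
day and taking the published figures at face value:

peak_rate = 1_000                       # claimed peak load, queries per second
seconds_per_day = 24 * 60 * 60          # 86,400

ceiling = peak_rate * seconds_per_day   # 86,400,000 queries/day even at constant peak
half_peak = ceiling // 2                # roughly 43 million if at peak only half the time
claimed = 150_000_000                   # queries per day, as presented

print(ceiling, half_peak, claimed)      # the claim sits well above the ceiling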
Or take the number of servers that Google operates. The only figure the
company will admit to is '10,000+'. They also claim to have '4+ petabytes'
of disk storage, and have let slip that each server is fitted with two 80
gigabyte hard drives. Now a petabyte is 10 to the power of 15 bytes, so if
Google had only 10,000 servers, that would come to 400 GB per server - yet two
80 GB drives give only 160 GB per machine. So again the numbers don't add up.
I could go on, but you will get the point.
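Once more, the arithmetic is easy to reproduce from the figures Google has
given out:

petabyte = 10 ** 15                           # bytes
storage = 4 * petabyte                        # '4+ petabytes' of disk
server_count = 10_000                         # '10,000+' servers

implied_per_server = storage / server_count   # 4e11 bytes, i.e. 400 GB
fitted_per_server = 2 * 80 * 10 ** 9          # two 80 GB drives, i.e. 160 GB

print(implied_per_server / 1e9, "GB implied per server")   # 400.0
print(fitted_per_server / 1e9, "GB actually fitted")       # 160.0
# Either each machine carries far more disk than admitted,
# or there are far more machines than admitted.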
But what it all comes down to is this: Google has far more computing power
at its disposal than it is letting on. In fact, there have been rumours in
the business for months that the Google cluster actually has 100,000
servers - which, if true, means that the company's technical competence
beggars belief.
Now the interesting question raised by all this is: why the reticence? Most
companies lose no opportunity to brag about their technology. (Think of all
those Oracle ads.) Is this an example of Google behaving ultra-responsibly
- being careful not to hype its prospects prior to an IPO? Or is it a sign
of a deeper commercial strategy? The latter is what Garfinkel suspects.
'After all,' he says, 'if Google publicised how many pages it has indexed
and how many computers it has in its data centres around the world, search
competitors such as Yahoo!, Teoma, and Mooter would know how much capital
they had to raise in order to have a hope of displacing the king at the top
of the hill.' If truth is the first casualty of war, openness is the first
casualty of going public.
-------------------------------------
You are subscribed as roessler@xxxxxxxxxxxxxxxxxx
To manage your subscription, go to
http://v2.listbox.com/member/?listname=ip
Archives at: http://www.interesting-people.org/archives/interesting-people/