[IP] more on United computer outage
Hell, we did better in the 50s with ESS telephone systems.
Begin forwarded message:
From: John Levine <johnl@xxxxxxxx>
Date: January 5, 2006 4:21:57 PM EST
To: dave@xxxxxxxxxx
Subject: Re: [IP] more on United computer outage
A "processor" failure??!! djf
Yup, almost certainly processor as in CPU.
Airline systems like Galileo still run on tight clusters of IBM
mainframes. These are basically database engines with phenomenal
transaction rates. While it's not hard to do distributed searches in
parallel, updates are limited by locking, which works worse the more
computers you have contending for the locks. So the core systems are
clusters of a few mainframes, each with a couple of dozen CPUs and
shared memory, cranking away on the transactions.
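To put rough numbers on the locking argument, here is a toy model, not a description of how Galileo or any real mainframe cluster works. It assumes every update does some work in parallel and then holds one global lock, and that the lock hold time grows with each extra machine that has to be coordinated. The function names and all the constants are made up for illustration.

```python
def lock_hold_time(n, base=1.0, per_peer=0.5):
    """Time one update holds the global lock with n machines.

    Assumption: each additional machine adds a fixed coordination cost.
    """
    return base + per_peer * (n - 1)


def max_update_rate(n, work=10.0, base=1.0, per_peer=0.5):
    """Upper bound on updates per time unit for n machines (toy model)."""
    hold = lock_hold_time(n, base, per_peer)
    parallel_limit = n / (work + hold)  # all machines busy, no waiting
    lock_limit = 1.0 / hold             # only one lock holder at a time
    return min(parallel_limit, lock_limit)


# Hypothetical cluster sizes to compare.
rates = {n: max_update_rate(n) for n in (1, 2, 4, 8, 16, 32)}
best = max(rates, key=rates.get)

for n, rate in rates.items():
    print(f"{n:2d} machines: {rate:.3f} updates per time unit")
print("best cluster size in this model:", best)
```

With these invented constants the update rate climbs for the first few machines and then falls as the lock becomes the bottleneck, which is the shape of the argument for keeping the core to a few large machines.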
Modern mainframes are designed to be very, very reliable. The CPUs
come in groups of maybe 16, with at least two of the 16 reserved as
spares, and extensive hardware checking so that if a CPU fails, one of
the spares takes over immediately. They have facilities for doing hot
add and remove of equipment which work well enough that the system
uptime is measured in years. It sounds to me like one of the CPUs
wedged in some way that the recovery hardware couldn't deal with, and
if the system is wedged, it's down. This is a big embarrassment for
IBM since the main selling point for million-dollar mainframes is
reliability.
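The sparing scheme can be sketched in a few lines. This is only an illustration of the idea of reserved spares taking over from a failed CPU, using the 16-and-2 figures mentioned above; the class and method names are hypothetical, and real mainframes do this in hardware and firmware, not in anything resembling this code.

```python
class CpuGroup:
    """Toy model of a CPU group with reserved spares (names are hypothetical)."""

    def __init__(self, total=16, spares=2):
        # CPUs 0..(total - spares - 1) start active; the rest wait as spares.
        self.active = set(range(total - spares))
        self.spares = list(range(total - spares, total))
        self.failed = set()

    def fail(self, cpu):
        """Mark an active CPU as failed and bring in a spare if one is left.

        Returns the id of the spare that took over, or None if no spare
        remained and the group now runs with reduced capacity.
        """
        if cpu not in self.active:
            raise ValueError(f"CPU {cpu} is not active")
        self.active.remove(cpu)
        self.failed.add(cpu)
        if self.spares:
            replacement = self.spares.pop(0)
            self.active.add(replacement)
            return replacement
        return None


group = CpuGroup()
first = group.fail(3)    # a spare takes over, capacity unchanged
second = group.fail(0)   # the last spare takes over
third = group.fail(1)    # no spares left, capacity drops
```

The model only covers the clean case where a failure is detected and a spare can be swapped in. It has no way to represent a CPU that wedges in a manner the recovery logic cannot handle.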
I'll be interested to hear what reports, if any, we get about what the
problem was.
R's,
John
-------------------------------------
Archives at: http://www.interesting-people.org/archives/interesting-people/