Sun M-class hardware denial of service
Sun/Fujitsu M4000-M9000 machines are very expensive multicpu sparc64
architecture machines, scaling all the way up to 64 processors, 256
cores, and 512 threads. They use the Fujitsu SPARC64 VI (and more
recently VII) processors. The smallest models are large (6U 84kg),
and the larger models are fridge sized and cost more than a house.
These machines can be split into domains. These domains are like
virtual machines which can run their own OS, except that they are not
virtual. The chassis contains actual partitioning hardware which
routes the various cpus to only see specific hardware devices. The
physical segmentation of the hardware obviously must be completely
secure and reliable to meet Sun's promises of high availability.
Sun's system partitioning domains are supposed to be the best of the
isolation schemes in the market. But perhaps even they have problems.
During the porting of OpenBSD/sparc64 to this family of machines it
was discovered that the OS kernel can trigger a fault. This fault is
caught by the systems management controller (the XSCF, Fujitsu's
version of LOM/RSC console) which then powers the domain down, marks
the mainboard in the chassis as faulty, and refuses to allow domains
relying on that mainboard to be started.
To clarify, the OS kernel does not crash; no -- the domain powers
down. Normally one uses commands in the XSCF to power domains on and
off. Those commands refuse to power up that domain again saying it
has faulted.
To repair this problem one must phone a friendly Sun support team.
After providing them with the machine's serial number, Sun will
dispatch an engineer with a generated series of codes that are valid
for a 48-hour period. These codes are used to generate a
one-time-password which enables a login to the service console within
the XSCF. The engineer then uses the service console to clear the
fault on the mainboard. That command then requires a POWERCYCLE OF
THE ENTIRE CHASSIS. This means any other domains running on the same
hardware must be shutdown to clear the fault generated in another
domain.
Please note that we have not tried to power cycle the entire chassis
without clearing the fault using the Sun procedure. However, we do
not see a difference in availability between that and a powercycle
requested by the service console.
These machines are run in mission critical environments where the
concept of 'availability' blends with the concept of 'security'. The
main customer base for these machines is apparently banks and other
financial institutes. Machine prices start at $29,000, rocket to
$180,000 (8 cpu), and continue higher to "Sun won't tell you on the
web", so one could expect that the machine should probably not fail in
such a harsh way. We do not have any information about how or why
this problem happens, but feel compelled to speculate that there might
be further problems with domain seperation. Having to power down all
the other domains is already, effectively, a big problem in domain
seperation.
The problem is triggered when OpenBSD/sparc64 spins up the additional
strands (threads) of each physical cpu in the domain. The OS
continues running for a few moments and then the fault occurs. Newer
versions of OpenBSD/sparc64 workaround this problem (diff linked
below) by not spinning up the additional strands on SPARC64 VI cpus.
But we don't really know why this workaround helps. Since we do not
have any tools to characterize the exact problem, the workaround might
be accidentally avoiding the fault, but some other action could still
cause it. The same problems do not occur on the other domain-capable
Sun machines that OpenBSD runs on, for instance, those using
UltraSPARC III IV, T1, or T2 processors.
http://www.openbsd.org/cgi-bin/cvsweb/src/sys/arch/sparc64/sparc64/cpu.c.diff?r1=1.43&r2=1.44
With the workaround in effect, the result is that this machine running
OpenBSD is using half the available cpu, all to avoid a machine
problem that might be triggered by something else. We do not yet know
if the problem is due to a bug in the cpu, the chassis, or some other
firmware component that is involved in domain partitioning.
OpenBSD/sparc64 is probably doing something wrong -- but then the OS
should crash instead of the domain.
Whatever this hardware problem is, it could also be exercised in
Solaris by loading a kernel module which does whatever OpenBSD is
doing, and thus triggers a domain fault. If an attacker can gain root
on a Solaris domain and load such a kernel module, the owner would be
forced to eventually powerdown the entire machine and take all other
domains down as well (which are running mission critical services,
obviously). At http://www.sun.com/servers/white-papers/domains.html
Sun claims that their domain technology offers "Complete isolation
from software errors in other domains" and provide the benefit that
"Mission-critical applications are not impacted by applications
running within other domains". Maybe after this bug is fixed...
Sun & Fujitsu should fix at least two things:
- Greater recoverability. Don't require a powercycle of the
chassis for such a type of domain fault, so that a failure
of one domain does not kill the availability of other domains.
- Don't fault in the first place! Find out what OS action
is causing the fault, and make the firmware/hardware accept
and handle this condition without faulting the domain.
Sun (Australia) was alerted about this problem on July 24, 2008.
Various other channels into Sun and Fujitsu were tried as well, but
unfortunately noone in "security" seemed to understand that this issue
matters, and it seems the engineering people made no progress either.
We think the Sun engineers didn't even go through the effort to
install OpenBSD in order to reproduce the problem. Any Sun / Fujitsu
engineer who wants to solve this problem can either build their own
kernel with the above patch un-applied, or can contact
<dlg@xxxxxxxxxxx>, <deraadt@xxxxxxxxxxx> or <kettenis@xxxxxxxxxxx>.