Re: Sun M-class hardware denial of service
> While having to power cycle the remainder of the frame may be a pain, the
> fact it isolates the fault to only power off the affected domain suggests to
> me that it is working as designed (the relative virtue of the design not up
> for debate). The power cycle of the remainder of the frame can be done at
> your leisure.
Didn't you read the advisory?
You don't get any crashed domain back until you power cycle the entire
machine. If you need that domain back, you have to make a very nasty
choice. It is a denial of service.
> It is for this reason I would not class this as a DoS attack,
> as the "attacker" could not affect the availability of the other domains,
> only the admin could.
The admin is forced to choose between "bring the crashed domain back
now by calling Sun and then powering the whole machine down" and
"accept that the crashed domain is down until you call Sun and power
the whole machine down". How is that not a denial of service? Do you
work for Sun?
If that is not a denial of service, I don't know what is.
> You don't state what privileges are required on the affected domain to
> initiate the fault.
This was very obvious from the advisory.
> If this is executable by unprivileged users, then I
> would agree with you that this represents a DoS issue for *that domain*.
Have you ever used VMware? I don't see how Sun domains are supposed
to be any different from VMware in that case. Obviously you are
handing sub-admins control over a domain so that they can run any OS
they need to. Their hardware isolation is not supposed to be a joke. It
is serious stuff. It has to work.
Obviously you expect that what a sub-admin does in his domain should
not affect the rest of your machine, i.e., force you to power it off.
But that is exactly what is required -- a power off of the whole chassis
and all the existing domains. You cannot even do the equivalent of VMotion.
The admin eventually MUST take all his other domains down. That is a
denial of service.
> It
> sounds like the XSCF is monitoring the domain for certain events, and
> mistaking legitimate operation for one of these events which leads it to
> disable a component in the domain. While I haven't worked with the M-class
> systems, I have some experience with the F15K/E25K range, and it sounds like
> the XSCF is blacklisting some component (likely a system board). Requiring
> a power cycle of the whole frame to clear a fault with a single (or even
> multiple) components is fairly poor, the most I would expect is to power
> cycle the domain components.
That's what you expect, but that is not what happens. It requires a
service call out to Sun to repair the machine, followed by a power
down of the entire machine. That is just "fairly poor"?? What the
hell do you think people are paying so much money for? A complete
illusion of reliability??
> I'm not surprised you didn't get any interest from Fujitsu/Sun security
> people, for the reasons stated above. As for engineering, I would expect
> they will only address the issue if they see a commercial or reputational
> benefit in doing so (i.e. someone wants to spend a *lot* of money on
> hardware to run OpenBSD, and this issue is a show-stopper).
As the advisory made clear, we are certain that someone could write a
Solaris kernel module that would trigger this same behaviour.
In other words, please learn to read. You'll get further in life.