Re: Sun M-class hardware denial of service
On Tue, Sep 9, 2008 at 8:42 PM, Theo de Raadt <deraadt@xxxxxxxxxxxxxxx> wrote:
>> While having to power cycle the remainder of the frame may be a pain, the
>> fact it isolates the fault to only power off the affected domain suggests to
>> me that it is working as designed (the relative virtue of the design not up
>> for debate). The power cycle of the remainder of the frame can be done at
>> your leisure.
>
> Didn't you read the advisory?
>
> You don't get any crashed domain back until you power cycle the entire
> machine. If you need that domain back, you have to make a very nasty
> choice. It is a denial of service.
>
>> It is for this reason I would not class this as a DoS attack,
>> as the "attacker" could not affect the availability of the other domains,
>> only the admin could.
>
> The admin is forced to choose between "bring the crashed domain back
> now by calling Sun and then powering the whole machine down" and
> "accept that the crashed domain is down until you call Sun and power
> the whole machine down". How is that not a denial of service? Do you
> work for Sun?
>
> If that is not a denial of service, I don't know what is.
Firstly, I don't work for Sun.
I apologise if I'm misunderstanding you, but it seems to me that this
issue can only be initiated by a privileged user on a domain. The
only system immediately affected is that particular domain. The
removal of service from the other domains is a system/service
management decision, rather than an exploit of some kind. That's why
I don't view it as a DoS vulnerability. If you exploit this on your
own domain, and it then becomes unavailable, then frankly, tough. You
wait until the frame administrators choose to power cycle the other
domains to bring you back.
You stated in your original message that this is a high-end frame, of
the kind generally used by financial institutions etc. I would
imagine any system which warrants this kind of hardware would have
some level of redundancy or DR.
>
>> You don't state what privileges are required on the affected domain to
>> initiate the fault.
>
> This was very obvious from the advisory.
>
>> If this is executable by unprivileged users, then I
>> would agree with you that this represents a DoS issue for *that domain*.
>
> Have you ever used vmware? I don't see how Sun domains are supposed
> to be any different from vmware in that case. Obviously you are
> handing sub-admins control over a domain so that they can run any OS
> they need to. Their hardware isolation is not supposed to be a joke. It
> is serious stuff. It has to work.
>
> Obviously you expect that what a sub-admin does in his domain should
> not affect the rest of your machine; i.e. force you to power it off.
> But that is exactly what is required -- a power off of the whole chassis
> and all the existing domains. You cannot even do the equivalent of VMotion.
>
> The admin eventually MUST take all his other domains down. That is a
> denial of service.
I view hardware isolation as not being able to do something in one
domain which can affect processes running in another domain. I see
nothing here which will do that. As I said above, the removal of
service from other domains is a management decision.
>
>> It
>> sounds like the XSCF is monitoring the domain for certain events, and
>> mistaking legitimate operation for one of these events which leads it to
>> disable a component in the domain. While I haven't worked with the M-class
>> systems, I have some experience with the F15K/E25K range, and it sounds like
>> the XSCF is blacklisting some component (likely a system board). Requiring
>> a power cycle of the whole frame to clear a fault with a single (or even
>> multiple) components is fairly poor, the most I would expect is to power
>> cycle the domain components.
>
> That's what you expect, but that is not what happens. It requires a
> service call out to Sun to repair the machine, followed by a power
> down of the entire machine. That is just "fairly poor"?? What the
> hell do you think people are paying so much money for? A complete
> illusion of reliability??
>
I don't disagree that this appears to be a bug of some kind in the
error handling by the system controller. What I'm arguing is that it
is not a DoS vulnerability, as the attacker cannot immediately
precipitate a lack of service for any system other than the one over
which they have administrative control.
>> I'm not surprised you didn't get any interest from Fujitsu/Sun security
>> people, for the reasons stated above. As for engineering, I would expect
>> they will only address the issue if they see a commercial or reputational
>> benefit in doing so (i.e. someone wants to spend a *lot* of money on
>> hardware to run OpenBSD, and this issue is a show-stopper).
>
> As the advisory made clear, we are certain that someone could write a
> Solaris kernel module that would trigger this same behaviour.
Yet you don't know what it is that causes the issue? What's Sun's
support arrangement for OpenBSD on SPARC? If it is reproduced in
Solaris, then I'm sure Sun would address it, but where is the benefit
for them to do so at present?
>
> In other words, please learn to read. You'll get further in life.
>
Thanks for the tip, I'd never thought of that.....