Re: Sun single-CPU DOS



:single CPU Sun microsystems system running solaris7, 8, or 9
:(haven't tested on 10). E.g. netra.
:
:if you telnet to a local router, disable nagle (on purpose
:or by accident or whatever - if nagle is turned off), and then

TCP_NODELAY by any other name, I assume.  
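For anyone following along, "turning off Nagle" at the socket level usually
means setting the TCP_NODELAY option.  A minimal sketch in Python, assuming an
ordinary TCP socket (the helper name disable_nagle is made up for
illustration and is not anything the router or Solaris telnet exposes):

```python
import socket

def disable_nagle(sock):
    """Set TCP_NODELAY so small writes go out immediately (hypothetical helper)."""
    # A non-zero value disables Nagle's algorithm on this socket.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
disable_nagle(s)
# Reading the option back confirms it took effect.
nodelay_enabled = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
s.close()
```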

:ping another device with interpacket delay of 0 and a count

Define what you mean by "interpacket delay".  Are you referring to an
Ethernet-specific setting, perhaps?  Ethernet's "interpacket gap" is
really about the gap between Ethernet frames, not IP packets.  Having
"packet" in the terminology leads people to think it's an IP thing,
and ranks up there with "collisions" as far as misleading Ethernet
terminology goes.  Think of it as "interframe gap", or IFG.

For that matter, define "ping".  You're certainly not talking about
/usr/sbin/ping, but something that spews out TCP, correct?  It sounds
like you're hitting the Sun system with a TCP ping stream -from- your
router, correct?

:of somewhere above 100,000 pings, it will effectively
:DOS the machine you are telneting from.
:
:The machine becomes unusable, will not accept break on console.
:totally hung.
:
:After opening a case with Sun on this issue and going back and
:forth for 9 months, they have decided that I am manufacturing
:jabber and the appropriate course of action is to remove the
:offending device (the router in this case) from the network.

If you're talking IFG...

Having an IFG < 96 bit times (where the wall-clock length of a bit time
varies with the specific ethernet speed) leads to out-of-spec
Ethernet frames, which could reasonably be parsed as "jabber".  The
too-short IFG could leave the other node(s) on the ethernet not knowing
when you've stopped sending any given frame.  In a shared ethernet,
you can also end up with fun conditions like the "capture effect".
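
To put wall-clock numbers on "96 bit times", here is a small worked
calculation.  It only assumes the 96-bit-time minimum mentioned above; the
function name and the three link speeds are picked for illustration.

```python
IFG_BIT_TIMES = 96  # minimum interframe gap, in bit times

def ifg_nanoseconds(link_bps):
    """Duration of the minimum IFG, in nanoseconds, at the given bit rate."""
    # One bit time lasts 1/link_bps seconds; scale 96 of them to nanoseconds.
    return IFG_BIT_TIMES * 1_000_000_000 // link_bps

for name, bps in (("10 Mb/s", 10_000_000),
                  ("100 Mb/s", 100_000_000),
                  ("1 Gb/s", 1_000_000_000)):
    print(f"{name}: {ifg_nanoseconds(bps)} ns")
# prints:
# 10 Mb/s: 9600 ns
# 100 Mb/s: 960 ns
# 1 Gb/s: 96 ns
```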

There's no requirement for the networking to that particular interface
on the Sun to actually work in the face of a too-short IFG or any other
physical out-of-spec condition.  Now, that doesn't mean the -console- 
should go out to lunch (sounds like you're getting a little too much 
"The Network Is The Computer" :) ), but it's perfectly ok to simply not
listen or xmit on an ethernet that's chronically out-of-spec.  

If Sun were to tweak things so it could detect and log the out-of-spec
network and react to it by downing the interface, rather than just keep
listening and accumulating a ton of bogusly-spaced interrupts that bog
it down, that would seem to be reasonable.  Some Unixes have userspace
routing daemons that periodically look for network brokenness and will
ifconfig the interface down.  But if the system is bogged down quickly
enough that those processes never get a chance to run, such forms
of mitigation won't work.
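
The decision such a daemon makes can be sketched roughly as below.  This is
only an illustration of the idea, not how any particular Unix does it: the
function names and the threshold are invented, and reading the interface
error counters and actually downing the interface are platform-specific
steps left to the caller.

```python
def error_rate(prev_count, cur_count, interval_s):
    """Errors per second between two samples of an interface error counter."""
    return (cur_count - prev_count) / interval_s

def should_down_interface(prev_count, cur_count, interval_s, max_errors_per_s):
    """True if the error rate over the sampling interval exceeds the threshold."""
    # The threshold is a made-up tunable; a real daemon would also want
    # hysteresis so a single noisy sample doesn't flap the interface.
    return error_rate(prev_count, cur_count, interval_s) > max_errors_per_s

# Example: 5000 new errors in a 5 second window against a 100/s threshold.
decision = should_down_interface(0, 5000, 5, 100)
```

Note that this kind of check only helps if the process gets scheduled at all.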

Oh, as an important side note -- your Sun is set up so that it won't hang
owing to network dependencies if its interface is ifconfig'ed up, but
the actual network it talks to is offline, right?  Otherwise, you are
making yourself DoS-prone in a whole lot of ways besides pfutzing with
out-of-spec ethernets.

:In other words, they refuse to fix the DOS issue under the assertion
:that it is a physical issue rather than an issue of the OS
:improperly handling a stream of small TCP packets.

My -suspicion- here is that it's the interrupts that the "stream of
small TCP packets" generates that is leading to the system hang, but
it'd take some kernel profiling to understand the specific impact.
If the only way to generate the particular concentration of network
interrupts along that ethernet interface involves outright breaking
the ethernet spec, I can see where Sun rejects this as bogus from a
-security- perspective.  
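
For a sense of scale, here is the frame rate an in-spec ethernet can deliver
with minimum-size frames.  The sketch assumes the standard 64-byte minimum
frame, 8 bytes of preamble plus start-of-frame delimiter, and the 96-bit-time
IFG.  Whether each frame costs the host one interrupt depends on the NIC and
driver, so treating the result as an interrupt rate is an assumption, not
something measured on the system in question.

```python
PREAMBLE_SFD_BYTES = 8    # preamble plus start-of-frame delimiter
MIN_FRAME_BYTES = 64      # minimum ethernet frame, including FCS
MIN_IFG_BIT_TIMES = 96    # in-spec interframe gap

def max_frames_per_second(link_bps, frame_bytes=MIN_FRAME_BYTES,
                          ifg_bit_times=MIN_IFG_BIT_TIMES):
    """Upper bound on frames per second for back-to-back frames of one size."""
    # Each frame occupies the wire for its own bits, the preamble, and the gap.
    bit_times_per_frame = (PREAMBLE_SFD_BYTES + frame_bytes) * 8 + ifg_bit_times
    return link_bps // bit_times_per_frame

print(max_frames_per_second(10_000_000))    # prints 14880
print(max_frames_per_second(100_000_000))   # prints 148809
```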

Have you tried with, say, tiny UDP packets, without messing around
with the IFG (and no need to mess with the TCP-only Nagle algorithm)?
That might hit the interface hard in a way that will show the problem
without an out-of-spec ethernet (or it might not -- interrupt timing
attacks can be very "fussy").  Have you tried doing a back-to-back
configuration with another Sun?  It -could- be the case that only a 
very particular flavor of interrupt load triggers this, that it's
not a terribly generic problem.  
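
A rough sketch of the tiny-UDP test, for use only against a machine you own.
The function name, payload, and defaults are invented for illustration.  A
plain Python loop is unlikely to reach wire rate on a fast link, so a C
sender or a hardware traffic generator may be needed to reproduce the load
described.

```python
import socket

def send_tiny_udp(host, port, count, payload=b"\x00"):
    """Send `count` small UDP datagrams to (host, port); return how many were sent."""
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for _ in range(count):
            # One sendto() per datagram; no Nagle to worry about with UDP.
            sock.sendto(payload, (host, port))
            sent += 1
    return sent
```

Something like send_tiny_udp("192.0.2.10", 9, 100000) would aim 100,000
one-byte datagrams at the discard port of a test host (192.0.2.10 is a
documentation placeholder address, not a real target).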

FWIW, there's a number of old (and sometimes not-so-old) ethernet NICs
that will "seize up" and need "kicking" (typically an ifconfig down/up
at a Unix OS level) in the face of various flavors of out-of-spec events.
No, I don't have a laundry list of such NICs, though I'd imagine that
the folks at http://iol.unh.edu might.  As long as the OS drivers for such
NICs don't take the rest of the OS along for the ride when the NIC hangs
up in the face of the out-of-spec event, it's not a big deal DoS in my
mind.  If you have some ethernet where it's way too easy to propagate 
out-of-spec ethernet events, fix the ethernet.  

:They have closed the escalation, so I am left with no recourse but
:to report it as a bug to the rest of you.
:
:For machines with more than 1 CPU, one cpu becomes bogged down but
:the other CPU continues to handle OS tasks ok.

-- 
 Mail: mjo@xxxxxxxxxxx  WWW: http://dojo.mi.org/~mjo/  Phone: +1 650 933 9487
 =--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--=
"That's thirty minutes away.  I'll be there in ten."            -Winston Wolf