Re: [OT] Re: Changing Headers in the Compose screen



On Wed, Jan 21, 2004 at 05:42:27PM EST, David Champion wrote:
> * On 2004.01.21, in <20040121074254.GM26679@xxxxxxx>,
> *     "David Yitzchak Cohen" <lists+mutt_users@xxxxxxxxxxxxxx> wrote:

> > > > a small little program:
> > > > filter <file | 
> > > > read_everything_into_memory_and_spit_back_everything_on_eof >file
> > > 
> > > Kids, don't try this at home. This depends totally on the shell in
> > > use, not on cat or r_e_i_m_a_s_b_e_o_e.
> > 
> > What's that?
> 
> It's read_everything_into_memory_and_spit_back_everything_on_eof....

LOL :-)

> > That's interesting ... I use plain old bash, and have done the above
> > successfully ... weird. . .
> 
> I'm pretty sure I have too, on occasion. But more often it's overwritten
> my input before anything happens.

Hmm. . .
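
The truncation half of it is easy enough to see with no pipes involved
at all (just a quick illustration, nothing cat-specific):

  $ echo hello > file
  $ cat <file >file     # ">file" truncates "file" before cat gets to read it
  $ wc -c file
  0 file

With the pipes in the middle, the only question is whether that truncation
happens before or after the first cat manages to slurp the data in.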

> > > This means that pretty much every shell is going to
> > > overwrite your input file before anyone gets to read it. If anything
> > > else happens, it's a scheduler fluke, and certainly not reliable.
> > ...
> > 
> > For some reason, having only two cat(1)s doesn't do the trick nearly
> > as often, but you'll notice that with three cat(1)s, we get the filter
> > executed "in-place" more than 85% of the time - not too bad, eh?
> 
> Your second processor might explain that, somewhat.

Do you mind trying the same command on your system, to see what happens?
You've got me all curious now. . .
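
Something along these lines should reproduce the test, if you feel like it
(filenames and iteration count are arbitrary, of course):

  seq 1 1000 > file.orig
  ok=0
  for i in `seq 1 100`; do
        cp file.orig file
        cat <file | cat | cat >file                  # the pipeline in question
        cmp -s file file.orig && ok=`expr $ok + 1`   # survived intact?
  done
  echo "$ok/100 survived"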

> For the command "cat <file | cat | cat >file", here's roughly what'll
> happen system-wise:
> 
> shell pipe()  = A                     # to connect stage3 to stage2
> shell pipe()  = B                     # to connect stage2 to stage1
> shell fork()  = shell2                # to create stage 3
> shell fork()  = shell3                # to create stage 2
> shell fork()  = shell4                # to create stage 1
> shell2        open()                          # to open "file" for write
> shell2        dup2()                          # to redirect stdout into "file"
> shell2        dup2()                          # to connect pipe A to stdin
> shell2        exec()                          # to execute stage 3 "cat"
> shell3        dup2()                          # to connect stdout to pipe A
> shell3        dup2()                          # to connect stdin to pipe B
> shell3        exec()                          # to execute stage 2 "cat"
> shell4        open()                          # to open "file" for read
> shell4        dup2()                          # to connect stdout to pipe B
> shell4        dup2()                          # to connect stdin to "file"
> shell4        exec()                          # to execute stage 1 "cat"

That was enlightening.  I had always just assumed shell1 would open(2)
"file" for write (and read separately) and dup2(2) the descriptors,
rather than delegate that task to the subshells.
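
If you want to watch who actually does which open(2), strace on Linux
will show it, with each child's calls tagged by PID.  Something like
this rough sketch should do:

  strace -f -e trace=open sh -c 'cat <file | cat | cat >file' 2>&1 | grep '"file"'

You should see one O_RDONLY open and one O_WRONLY|O_CREAT|O_TRUNC open of
"file", each from a different child, and neither from the parent shell.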

> Once those fork()s occur, the task execution can happen in any order.
> (In theory, the shell's children (shell2-4) could signal themselves to
> stop as soon as fork() completes, and the controlling shell could signal
> them to resume at determined points, in order to control execution
> order; but this might mess with i/o in unexpected ways, too, and would
> be somewhat insane and needlessly complex, I imagine.)

...not to mention that you'd then have to deal with the 4K (or 16K,
or whatever size your system's pipe(2) buffers happen to be) buffer
problem. . .
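
You can watch a writer hit that limit pretty easily, by the way (just a
rough probe - the numbers are whatever your kernel gives you):

  ( i=0
    while dd if=/dev/zero bs=1k count=1 2>/dev/null; do
          i=`expr $i + 1`
          echo "${i}K" >&2         # last number printed ~= pipe capacity
    done ) | sleep 10              # sleep never reads a byte, so the loop
                                   # freezes as soon as the pipe buffer fills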

> So having multiple CPUs could mean that some of those shell children
> get scheduled elsewhere, and shells will contend for CPU time with
> some of the other shells. Shell4, the one that finally reads "file",
> might actually execute open() before shell1 gets around to open()ing,
> especially if shell1 is queued behind shell4 while shell2 and shell3 are
> running or waiting. And if the scheduler gives priority to new processes
> over old ones (e.g., it's optimized for task concurrency rather than
> thread speed), you'll see this kind of command work more often than
> otherwise.

AFAIK, the standard Linux scheduler prefers to keep running the current
process after a fork(2) until its timeslice evaporates, rather than
context-switching to the child immediately.  Still, I'd be awfully
curious to see the results of my pipeline on your system. . .
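
A crude way to eyeball who runs first after a fork(2) - noisy, since the
backgrounded echo costs a fork of its own, but it gives a rough feel:

  for i in `seq 1 100`; do
        sh -c 'echo child & echo parent; wait'
  done | paste - - | sort | uniq -c

If the parent really does keep its timeslice, the "parent child"
ordering should dominate.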

> I wonder whether your 85% success rate is inflated because of that
> second CPU. If you turned it off, I expect your success rate would
> drop.

No can do ... this system is in production use.  (I just took it down
a week ago for a kernel upgrade, and my users complained.  They'd be
rather pissed if I took it down again so quickly, especially if it came
back up again at only half speed.)

> Anyway. This is rather off-topic.... :/ Mainly I wanted to warn against
> depending on that kind of command; a lot of data gets lost by people
> who think that should work.

Fair enough, point taken :-)

> If you do find yourself wanting to do this
> kind of thing often, with programs other than perl, you need a different
> kind of tool. As it happens, I've written something that can address
> this, though it was mainly designed to solve a different problem:
> see http://home.uchicago.edu/~dgc/sw/pipeline . But you can call it
> read_everything_into_memory_and_spit_back_everything_on_eof if you like.
> :)

Yup, it's a rather neat tool, if you ask me.  In a former life, I'd have
used it as the base for writing a new shell, but since discovering a way
cool program called CINT [1], I've been hard at work creating libraries
to make CINT usable as a real shell.

 - Dave

[1] http://root.cern.ch/root/Cint.html

-- 
Uncle Cosmo, why do they call this a word processor?
It's simple, Skyler.  You've seen what food processors do to food, right?

Please visit this link:
http://rotter.net/israel
