On Wed, Jan 21, 2004 at 05:42:27PM EST, David Champion wrote:
> * On 2004.01.21, in <20040121074254.GM26679@xxxxxxx>,
> * "David Yitzchak Cohen" <lists+mutt_users@xxxxxxxxxxxxxx> wrote:
> > > > a small little program:
> > > > filter <file |
> > > > read_everything_into_memory_and_spit_back_everything_on_eof >file
> > >
> > > Kids, don't try this at home. This depends totally on the shell in
> > > use, not on cat or r_e_i_m_a_s_b_e_o_e.
> >
> > What's that?
>
> It's read_everything_into_memory_and_spit_back_everything_on_eof....

LOL :-)

> > That's interesting ... I use plain old bash, and have done the above
> > successfully ... weird. . .
>
> I'm pretty sure I have too, on occasion. But more often it's overwritten
> my input before anything happens.

Hmm. . .

> > > This means that pretty much every shell is going to
> > > overwrite your input file before anyone gets to read it. If anything
> > > else happens, it's a scheduler fluke, and certainly not reliable.
> > ...
> >
> > For some reason, having only two cat(1)s doesn't do the trick nearly
> > as often, but you'll notice that with three cat(1)s, we get the filter
> > executed "in-place" more than 85% of the time - not too bad, eh?
>
> Your second processor might explain that, somewhat.

Do you mind trying the same command on your system, to see what happens?
You've got me all curious now. . .

> For the command "cat <file | cat | cat >file", here's roughly what'll
> happen system-wise:
>
> shell  pipe() = A       # to connect stage3 to stage2
> shell  pipe() = B       # to connect stage2 to stage1
> shell  fork() = shell2  # to create stage 3
> shell  fork() = shell3  # to create stage 2
> shell  fork() = shell4  # to create stage 1
> shell2 open()           # to open "file" for write
> shell2 dup2()           # to redirect stdout into "file"
> shell2 dup2()           # to connect pipe A to stdin
> shell2 exec()           # to execute stage 3 "cat"
> shell3 dup2()           # to connect stdout to pipe A
> shell3 dup2()           # to connect stdin to pipe B
> shell3 exec()           # to execute stage 2 "cat"
> shell4 open()           # to open "file" for read
> shell4 dup2()           # to connect stdout to pipe B
> shell4 dup2()           # to connect stdin to "file"
> shell4 exec()           # to execute stage 1 "cat"

That was enlightening. I had always just assumed shell1 would open(2)
"file" for write (and read separately) and dup2(2) the descriptors,
rather than delegate that task to the subshells.
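Just to convince myself I had the picture straight, I sketched that
wiring as a standalone C program. It's only a toy (no error checking,
and a real shell closes descriptors far more carefully), but it makes
the race obvious: the O_TRUNC open() done for stage 3 can easily land
before stage 1 ever open()s "file" for reading.

/* Toy version of what the shell sets up for:  cat <file | cat | cat >file
 * No error checking; the stage and pipe numbering follows your outline.
 */
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int A[2], B[2];

    pipe(A);                  /* A: stage 2 -> stage 3 */
    pipe(B);                  /* B: stage 1 -> stage 2 */

    if (fork() == 0) {        /* "shell2" becomes stage 3: cat >file */
        int fd = open("file", O_WRONLY | O_CREAT | O_TRUNC, 0666); /* truncates "file" now */
        dup2(fd, 1);          /* stdout -> "file" */
        dup2(A[0], 0);        /* stdin  <- pipe A */
        close(A[1]); close(B[0]); close(B[1]); close(fd);
        execlp("cat", "cat", (char *)0);
    }
    if (fork() == 0) {        /* "shell3" becomes stage 2: the middle cat */
        dup2(A[1], 1);        /* stdout -> pipe A */
        dup2(B[0], 0);        /* stdin  <- pipe B */
        close(A[0]); close(B[1]);
        execlp("cat", "cat", (char *)0);
    }
    if (fork() == 0) {        /* "shell4" becomes stage 1: cat <file */
        int fd = open("file", O_RDONLY); /* if stage 3 won the race, this sees an empty file */
        dup2(B[1], 1);        /* stdout -> pipe B */
        dup2(fd, 0);          /* stdin  <- "file" */
        close(A[0]); close(A[1]); close(B[0]); close(fd);
        execlp("cat", "cat", (char *)0);
    }

    close(A[0]); close(A[1]); close(B[0]); close(B[1]);
    while (wait(NULL) > 0)    /* the "shell" just reaps its three children */
        ;
    return 0;
}

Running that a bunch of times against a scratch "file" should show the
same scheduler-dependent behavior we've been seeing.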
> Once those fork()s occur, the task execution can happen in any order.
> (In theory, the shell's children (shell2-4) could signal themselves to
> stop as soon as fork() completes, and the controlling shell could signal
> them to resume at determined points, in order to control execution
> order; but this might mess with i/o in unexpected ways, too, and would
> be somewhat insane and needlessly complex, I imagine.)

...not to mention that you'd then have to deal with the 4K (or 16K, or
whatever size pipe(2) buffers your system's using) buffer problem. . .

> So having multiple CPUs could mean that some of those shell children
> get scheduled elsewhere, and shells will contend for CPU time with
> some of the other shells. Shell4, the one that finally reads "file",
> might actually execute open() before shell1 gets around to open()ing,
> especially if shell1 is queued behind shell4 while shell2 and shell3 are
> running or waiting. And if the scheduler gives priority to new processes
> over old ones (e.g., it's optimized for task concurrency rather than
> thread speed), you'll see this kind of command work more often than
> otherwise.

AFAIK, the standard Linux scheduler prefers to continue the current
process after a fork(2) until the timeslice evaporates, instead of
context switching immediately. Still, I'd be awfully curious to see the
results of my pipeline on your system. . .

> I wonder whether your 85% success rate is inflated because of that
> second CPU. If you turned it off, I expect your success rate would
> drop.

No can do ... this system is in production use. (I just took it down a
week ago for a kernel upgrade, and my users complained. They'd be
rather pissed if I took it down again so quickly, especially if it came
back up again at only half speed.)

> Anyway. This is rather off-topic.... :/ Mainly I wanted to warn against
> depending on that kind of command; a lot of data gets lost by people
> who think that should work.

Fair enough, point taken :-)

> If you do find yourself wanting to do this
> kind of thing often, with programs other than perl, you need a different
> kind of tool. As it happens, I've written something that can address
> this, though it was mainly designed to solve a different problem:
> see http://home.uchicago.edu/~dgc/sw/pipeline . But you can call it
> read_everything_into_memory_and_spit_back_everything_on_eof if you like.
> :)

Yup, it's a rather neat tool, if you ask me. In a former life, I'd have
used that as the base for writing a new shell, but since I've discovered
a way cool program called CINT [1], I've been hard at work creating
libraries to make it usable as a real shell.
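For the curious, the core trick such a tool needs is small enough to
sketch in C: slurp stdin completely, and only after EOF open the target
file for writing yourself, so the shell never gets a chance to truncate
it out from under the upstream reader. (This is just a toy of mine to
illustrate the idea - I'm not claiming it's how dgc's pipeline works,
and the name "overwrite" is made up.)

/* overwrite.c - toy illustration only: read all of stdin into memory,
 * then, only after EOF, open the named file and replace its contents.
 * Because this program (not the shell) does the O_TRUNC open(), nothing
 * is clobbered before the upstream filter has finished reading.
 */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    size_t cap = 65536, len = 0;
    char *buf;
    ssize_t n;
    int fd;

    if (argc != 2 || (buf = malloc(cap)) == NULL)
        return 1;

    /* 1. read everything from stdin until EOF, growing the buffer */
    while ((n = read(0, buf + len, cap - len)) > 0) {
        len += (size_t)n;
        if (len == cap && (buf = realloc(buf, cap *= 2)) == NULL)
            return 1;
    }
    if (n < 0)
        return 1;

    /* 2. only now truncate and rewrite the target file */
    fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0 || write(fd, buf, len) != (ssize_t)len)
        return 1;
    close(fd);
    free(buf);
    return 0;
}

You'd run it as "filter <file | overwrite file" instead of redirecting
with ">file". A real tool would write to a temp file and rename(2) it
into place so a failed write can't eat the original, handle partial
writes, and so on - this is only meant to show why letting the last
stage do the final open() itself dodges the race.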
 - Dave

[1] http://root.cern.ch/root/Cint.html

-- 
Uncle Cosmo, why do they call this a word processor?
It's simple, Skyler. You've seen what food processors do to food, right?

Please visit this link:
http://rotter.net/israel