Re: Mark messages in mailbox as read without massive performance hit?!



On Thu, Dec 29, 2005 at 12:40:54AM -0500 I heard the voice of
Derek Martin, and lo! it spake thus:
> 
> Generally, each rename operation will result in the kernel doing
> this separately.

Oh, it doesn't have to.  The kernel could leave the directory open for
a few seconds internally to save on re-opening in cases where a lot of
renames are happening.  I don't know if any extant systems do that, of
course, but it's not a hopeless case.

There are a few things to bear in mind here, though.

- The overhead in rename() isn't nearly comparable to the difference
  between read()'ing an mbox and open()/read()/close()'ing each
  message in a maildir, if only because with mbox you're crossing the
  userland<->kernel boundary once rather than serializing the steps
  across multiple boundary crossings.

- mutt write()'s out messages one at a time as far as I can tell, and
  from the behavior when I tag a bunch of messages and copy them into
  an mbox, it almost looks like it fsync()'s after each message (I've
  not verified this in the code).  With behavior like that, mbox will
  NEVER cross over to faster than maildir; it's too much serialization
  and waiting.
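
To put a rough shape on the fsync() point above, here is a minimal
Python sketch.  It is not mutt's code, and whether mutt really syncs
after each message is unverified, as noted.  The helper names
append_messages and compare are made up for illustration.  It appends
the same messages to two mbox-style files, once with fsync() after
every message and once with a single fsync() at the end, and returns
both timings.

```python
import os
import tempfile
import time


def append_messages(path, messages, sync_each):
    """Append messages to an mbox-style file and return elapsed seconds.

    sync_each=True mimics the suspected write-then-fsync-per-message
    pattern; sync_each=False syncs once after the last message.
    """
    start = time.perf_counter()
    with open(path, "ab") as f:
        for msg in messages:
            f.write(msg)
            if sync_each:
                f.flush()
                os.fsync(f.fileno())
        if not sync_each:
            f.flush()
            os.fsync(f.fileno())
    return time.perf_counter() - start


def compare(count=200):
    """Write `count` dummy messages both ways; return (t_each, t_once, same)."""
    # Dummy messages; the address and headers are placeholders.
    msgs = [
        b"From sender@example.invalid Thu Dec 29 00:40:54 2005\n"
        b"Subject: test %d\n\nbody\n\n" % i
        for i in range(count)
    ]
    with tempfile.TemporaryDirectory() as d:
        each_path = os.path.join(d, "each.mbox")
        once_path = os.path.join(d, "once.mbox")
        t_each = append_messages(each_path, msgs, True)
        t_once = append_messages(once_path, msgs, False)
        with open(each_path, "rb") as a, open(once_path, "rb") as b:
            same = a.read() == b.read()
    return t_each, t_once, same
```

Both files end up byte-identical; only the waiting differs.  How big
the gap is depends heavily on the filesystem and the disk, so no
numbers are claimed here.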


Now, I don't dispute that theoretically there's a point where the
lines cross and mbox wins.  But I think that point is a lot farther
out than you're suggesting it is, and that some of the realities of
how mutt does things push it even farther.  And empirically, I've
always found mbox to be orders of magnitude slower on status changes
(particularly in the degenerate case where you're changing one or a
few messages in the middle of the mailbox), on any size active mailbox
I ever use (up to maybe 10 thousand messages, though usually below 5
thousand).
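
The degenerate case mentioned above can be sketched roughly as
follows, assuming a POSIX filesystem and a deliberately simplified
mbox layout (no "From " escaping, no locking).  The function names
are hypothetical, and this is not how mutt implements either format.
In a maildir, marking a message seen is one rename() that adds the S
flag; in an mbox, adding a Status header to a message in the middle
means rewriting everything from that point to the end of the file.

```python
import os


def maildir_mark_seen(path):
    """Add the S (seen) flag to a maildir message by renaming it."""
    directory, name = os.path.split(path)
    base, _, flags = name.partition(":2,")
    if "S" in flags:
        return path
    # Maildir flags are kept in ASCII order after the ":2," marker.
    new_name = base + ":2," + "".join(sorted(flags + "S"))
    new_path = os.path.join(directory, new_name)
    os.rename(path, new_path)
    return new_path


def mbox_mark_read(path, index):
    """Insert 'Status: RO' into message `index`; return bytes rewritten."""
    with open(path, "rb") as f:
        data = f.read()
    # A message starts at a "From " line at the start of the file or
    # right after a newline (simplified; real parsers are pickier).
    starts = [0] if data.startswith(b"From ") else []
    pos = data.find(b"\nFrom ")
    while pos != -1:
        starts.append(pos + 1)
        pos = data.find(b"\nFrom ", pos + 1)
    # The header goes right after the chosen message's "From " line,
    # so everything after that point has to be written out again.
    insert_at = data.index(b"\n", starts[index]) + 1
    tail = b"Status: RO\n" + data[insert_at:]
    with open(path, "r+b") as f:
        f.seek(insert_at)
        f.write(tail)
    return len(tail)
```

The return value of mbox_mark_read grows with the amount of mail
sitting after the changed message, while the maildir side stays a
single rename() no matter how large the mailbox is.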

Having written mbox parsing code, I deeply hate the [blend of totally
different] format[s], but it has a lot of advantages in compactness
etc.  I never let it near an active mailbox, but I use it for archives
all the time, because it's more compact, leaves my preciouss inodes
alone, and is much, MUCH faster to read.  But mboxes fall over pretty
hard in most use-cases that aren't read-almost-entirely, IME, both in
performance, and in making me sweat about the myriad ways they can
corrupt my data and lose the history of all my favorite flamewars.


> Obviously, the user who posted this message is running into this
> problem, or else we would not be having this discussion...

Now, in THIS case, I certainly don't buy it.  The OP is suggesting
he's seeing problems on mailboxes on the order of *50* messages.  That
shouldn't break down on FAT12, much less any filesystem on a *nix
system installed in the last 15 years.  I doubt I could measure by
hand the time 50 rename()'s take on a 386SX.  Something else is
tripping him up.
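
Anyone who wants to check the 50-rename claim on their own machine
could use something like this rough Python sketch.  The file names
and the ",S" suffix are placeholders rather than real maildir names,
and time_renames is a made-up helper.  It creates a batch of small
files in a temporary directory, renames each one, and returns the
elapsed time along with how many files were renamed.

```python
import os
import tempfile
import time


def time_renames(count=50):
    """Rename `count` small files; return (elapsed_seconds, renamed)."""
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for i in range(count):
            p = os.path.join(d, "msg%d" % i)  # placeholder names
            with open(p, "wb") as f:
                f.write(b"x")
            paths.append(p)
        # Only the rename loop is timed, not the file creation.
        start = time.perf_counter()
        for p in paths:
            os.rename(p, p + ",S")
        elapsed = time.perf_counter() - start
        renamed = len([n for n in os.listdir(d) if n.endswith(",S")])
    return elapsed, renamed
```

Calling time_renames(50) gives a figure to compare against the delay
the OP is seeing.  A temporary directory on a local disk is not the
same as a mail spool on NFS or a loaded server, so treat the result
as a lower bound, not a diagnosis.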


-- 
Matthew Fuller     (MF4839)   |  fullermd@xxxxxxxxxxxxxxx
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
           On the Internet, nobody can hear you scream.