
Re: mutt - slow mbox'es



On Fri, Jul 23, 2004 at 03:00:10AM EDT, Nicolas Rachinsky wrote:
> * David Yitzchak Cohen <lists+mutt_users@xxxxxxxxxxxxxx> [2004-07-22 22:49 -0400]:
> > On Thu, Jul 22, 2004 at 02:27:49PM EDT, Nicolas Rachinsky wrote:

> > > After removing the cache, the same test with another mailbox, now
> > > containing 68244 emails with a total size of 300MB.

> > > /usr/bin/time -h -l mutt -F /dev/null -f small.mbox -e 'push x'
> > >         25.82s real             14.87s user             1.81s sys
> > >       4859  block input operations
> > >          0  block output operations

> > > /usr/bin/time -h -l mutt -F /dev/null -f small.mbox -e 'push x'
> > >         16.91s real             14.63s user             1.16s sys
> > >          0  block input operations
> > >          0  block output operations

> With Content-Length removed:

> /usr/bin/time -h -l mutt -F /dev/null -f small2 -e 'push x'
>         24.91s real             15.52s user             1.78s sys
>       4828  block input operations
>          0  block output operations

> /usr/bin/time -h -l mutt -F /dev/null -f small2 -e 'push x'
>         17.71s real             15.39s user             1.11s sys
>          0  block input operations
>          0  block output operations

> Not such a big difference. Not what I expected.

...not what I expected either, but here's why:

$ echo $((4500*68244))
307098000

When messages are that small on average (about 4.5K each), you end up
reading the entire message off your disk (4K, 8K, or 16K physical blocks)
regardless, so with or without a Content-Length header, you're still
basically reading the entire mailbox.  You'll notice slightly higher
user-mode times, though, since Mutt does a bunch of calculations after
every read to determine where to read next (which is why it's able to
save 31 reads).
Clearly, with messages of this size, you're not going to benefit much
from a Content-Length header.  That's good to know, incidentally,
since it means that for most mail, there's no point in procmail(1)ing a
Content-Length header onto incoming messages (except very large ones).
I must admit I was rather shocked by the results for a moment there,
though ;-)
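
To make the block-size argument concrete, here is a rough sketch of how
many physical blocks an average message spans.  The 4500-byte average is
the estimate from the calculation above; the helper name blocks_spanned
and the assumption that a message starts on a block boundary are mine,
purely for illustration.

```shell
#!/bin/sh
# Ceiling division: number of blocks a message of $1 bytes spans for a
# block size of $2 bytes, assuming (a simplification) that the message
# starts on a block boundary.
blocks_spanned() {
    size=$1
    block=$2
    echo $(( (size + block - 1) / block ))
}

avg=4500   # estimated average message size in bytes
for block in 4096 8192 16384; do
    echo "block=$block blocks_per_message=$(blocks_spanned "$avg" "$block")"
done
```

Every message touches at least one block whatever the block size, so
seeking past a body with Content-Length can hardly skip any disk reads
at this message size.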

> > > /usr/bin/time -h -l mutt -F /dev/null -f small.maildir -e 'push x'
> > >         39.63s real             8.73s user              7.21s sys
> > >      68963  block input operations
> > >          0  block output operations
> > 
> > It's impressive that maildir is able to keep from falling below 50% of
> > the mbox speed even with >64K mails in a directory (You're on reiserfs,
> > I assume?).
> 
> No. There is no reiserfs for FreeBSD. It's just UFS with linear lists
> as directories.

Oh, stupid me ... I just assumed you were on Linux ;-/

I don't know anything about UFS, so I'm useless here.

> > Note, though, the syscall time there.  We're talking about
> > massive amounts of work being done by your kernel, servicing about
> > 17 times as many read(2)s as the mbox.  Also worth noting is the time
> > spent idling while waiting on the disk.  The mbox waits for only about
> > 9 seconds, while maildir winds up waiting for almost 24!  Clearly, you
> > wouldn't want several users banging away on maildirs at the same time
> > on your system. . .
> 
> Don't mistake me for one of the 'maildir is the best' proponents. I'm using
> maildirs for my incoming folders (*) and mbox for most archive
> boxes (**).

I was just pointing out one particular case where maildir may be
preferable for a single-user system, but definitely not for a multiuser
system.
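
For reference, the "waiting on the disk" figures quoted above can be
reproduced as real minus user minus sys from the time(1) output.  This
is only a sketch: the helper name idle_time is made up, and treating
the remainder as pure disk wait is an approximation.

```shell
#!/bin/sh
# Wall-clock time not accounted for by user or system CPU time.
# Arguments: real user sys (seconds, as printed by /usr/bin/time).
idle_time() {
    awk -v r="$1" -v u="$2" -v s="$3" 'BEGIN { printf "%.2f\n", r - u - s }'
}

echo "mbox, first open:    $(idle_time 25.82 14.87 1.81)s"
echo "maildir, first open: $(idle_time 39.63 8.73 7.21)s"
```

That gives 9.14s for the mbox and 23.69s for the maildir, matching the
"about 9" and "almost 24" seconds mentioned earlier.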

> (*) Here maildirs are IMHO more reliable, and they are -- even without
> the header cache -- clearly faster, not necessarily while opening
> but for marking/deleting/... mails.

Yup, maildirs take advantage of the filesystem, which is always organized
in some binary fashion.  The mbx format is substantially faster than
maildir, though, since only a single physical read/write pair is
necessary for most operations (i.e., no extra filesystem overhead to
first find the file in question, etc.).  Another neat little advantage
of mbx over maildirs if you're going to convert to mboxes in the future
(for archiving, for example) is that all mbox information is retained
in the mbx format, so you can have lossless conversion back to mbox in
the future.

> (**) I didn't use them for archive boxes with relatively big mails, but
> for the other boxes mboxes are faster -- without cache -- and they need less
> disk space -- even without a header cache. And very old boxes can be
> easily compressed.

I don't understand that ... I must be too tired :-(

> > > /usr/bin/time -h -l mutt -F /dev/null -f small.maildir -e 'push x'
> > >         36.66s real             8.61s user              6.52s sys
> > >      68253  block input operations
> > >          0  block output operations
> > 
> > Now, isn't that interesting?  1GB of RAM was barely able to cache anything
> > between your opens.  (Maybe a cron job or something happened between the
> > two opens?  I wasn't expecting the second open to be _that_ bad. . .)
> > Assuming it's not a fluke, somebody on your FS team needs to be blamed:
> 
> It's no fluke.
> 
> > it looks to me like the only thing that was still cached was the directory
> > index itself (and maybe the permission structs in the inode tables).
> > With a whole gigabyte of RAM, I think you have the right to expect better.
> > The system was waiting for a whopping 21 seconds of disk time, to access
> > data that could've easily been cached if Linux only cared enough to :-(
> 
> It's FreeBSD not Linux, but you're right, it could have been better.
> Are there any FreeBSD FS gurus reading this? :)

/me ducks ;-)

> > > /usr/bin/time -h -l mutt -F /dev/null -f small.maildir -e 'set maildir_cache=cache' -e 'unset maildir_cache_verify' -e 'push x'
> > >         3.87s real              2.35s user              0.84s sys
> > >          1  block input operations
> > >         12  block output operations
> > 
> > I'm not sure what's happening there, exactly.
> 
> 'unset maildir_cache_verify' should leave out one stat per message,
> so it's a bit faster -- as expected.

Okay, I guess. . .

> > > BTW:
> > > 136M    cache
> > 
> > When was that number obtained?  I'd expect buffers+cache to be about
> > 300MB after the first mbox read.
> 
> Ah, this was misleading. The cache file after reading small.maildir
> with the headercache was 136MB big.

I wonder where the rest of the file went, then?

> With a page size of 16k, instead
> of the default 2k, mutt is a bit faster and the file is only 52MB big.

...and now I _certainly_ wonder where the rest of the file went. . .
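
A rough per-message figure might help frame that question.  This sketch
assumes that small.maildir holds the same 68244 mails as the mbox and
that "MB" means 1048576 bytes; neither is stated outright in the thread,
and the helper name cache_bytes_per_message is made up.

```shell
#!/bin/sh
# Integer bytes of cache file per message.
# Arguments: cache size in MB (taken as 1048576 bytes each), message count.
cache_bytes_per_message() {
    size_mb=$1
    messages=$2
    echo $(( size_mb * 1048576 / messages ))
}

messages=68244   # assumption: same message count as the mbox
echo "2k pages:  $(cache_bytes_per_message 136 "$messages") bytes/message"
echo "16k pages: $(cache_bytes_per_message 52 "$messages") bytes/message"
```

Under those assumptions the cache works out to 2089 bytes per message
with 2k pages (roughly one page each) and 798 bytes per message with
16k pages.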

Thanks again,
 - Dave

-- 
Uncle Cosmo, why do they call this a word processor?
It's simple, Skyler.  You've seen what food processors do to food, right?

Please visit this link:
http://rotter.net/israel
