
Re: mutt - slow mbox'es



Hello David,

* David Yitzchak Cohen <lists+mutt_users@xxxxxxxxxxxxxx> [040725 08:10]:
> > Talking about Maildirs.

> When messages are that small on average, you end up reading the entire
> message off your disk (4K, 8K, or 16K physical blocks) anyway, so whether
> or not there's a Content-Length header, you're still basically reading the
> entire mailbox, anyway.  You'll notice slightly higher user-mode times,
> though, since Mutt does a bunch of calculations after every read to
> determine where to read next (which is why it's able to save 31 reads).
> Clearly, with messages of this size, you're not going to benefit much
> from a Content-Length header.  That's good to know, incidentally,
> since it means that for most mail, there's no point in procmail(1)ing a
> Content-Length header onto incoming messages (except very large ones).
> I must admit I was rather shocked by the results for a moment there,
> though ;-)

More or less wrong. First of all, Maildirs store one file per message.
That means if you take the size of the file and subtract the size of the
header, you have ... the size of the body ... wow, that was easy and so
fast. So forget about the Content-Length header for Maildirs.

A look at mh.c in the sources confirmed my assumption:

      h->content->length = st.st_size - h->content->offset;
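For illustration only, here is a minimal sketch of the same arithmetic outside mutt. It is not mutt's code: the helper names `header_end_offset` and `body_length` are hypothetical, and the sketch assumes the header ends at the first empty line (LF or CRLF) of a single-message file.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() and fstat() */
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: offset of the first byte after the header, i.e.
 * just past the empty line that ends it. Returns -1 if there is none. */
static long header_end_offset(FILE *fp)
{
    int c;
    int prev = '\n';  /* start of file counts as "just after a newline" */

    assert(fp != NULL);
    rewind(fp);
    while ((c = getc(fp)) != EOF) {
        if (c == '\n' && prev == '\n')
            return ftell(fp);   /* blank line found: the body starts here */
        if (c != '\r')          /* skip CR so CRLF files work as well */
            prev = c;
    }
    return -1;
}

/* Same idea as the mh.c line: body length = file size - header offset.
 * No Content-Length header is consulted at all. */
static long body_length(FILE *fp)
{
    struct stat st;
    long offset = header_end_offset(fp);

    if (offset < 0 || fstat(fileno(fp), &st) != 0)
        return -1;
    return (long)st.st_size - offset;
}
```

For a file containing "From: a@example.org\nSubject: hi\n\nHello\nworld\n", the header ends at offset 33 and `body_length` returns 12, straight from the file size.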

> > Don't mistake me for the 'maildir is the best' proponents. I'm using
> > maildirs for my incoming folders (*) and mbox for most archive
> > boxes (**).

> I was just pointing out one particular case where maildir may be
> preferable for a single-user system, but definitely not for a multiuser
> system.

Okay, my *use maildir or die* thesis is a little single-minded. I myself
use mbox in some cases, e.g. for publishing mailing list archives on the
web. However, I don't see why I should 'use mbox on multiuser and maildir
on single-user systems'. That doesn't make sense to me at all.

> > (*) here maildirs are IMHO more reliable and they are -- even without
> > the header cache -- clearly faster, not necessarily when opening
> > but for marking/deleting/... mails.

> Yup, maildirs take advantage of the filesystem, which is always organized
> in some binary fashion.  The mbx format is substantially faster than
> maildir, though, since only a single physical read/write pair is
> necessary for most operations (i.e., no extra filesystem overhead to
> first find the file in question, etc.).  Another neat little advantage
> of mbx over maildirs if you're going to convert to mboxes in the future
> (for archiving, for example) is that all mbox information is retained
> in the mbx format, so you can have lossless conversion back to mbox in
> the future.

'Lossless conversion', aha. Could you please tell me what information
you lose when you convert between mbox and maildir, for example?

> > (**) I didn't use them for archive boxes with relatively big mails, but
> > for the other boxes mboxes are faster -- without cache -- and they need less
> > disk space -- even without a header cache. And very old boxes can be
> > easily compressed.

> I don't understand that ... I must be too tired :-(

He basically said that Maildirs eat more disk space and that mboxes were
faster for small e-mails without the header cache. He also often uses
mbox to archive e-mails that are seldom read nowadays. A nice benefit is
that you can just gzip an mbox and still open it with mutt, which you
can't do with a maildir. My only argument against this is that disk
space is cheap these days.
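Why Maildirs eat more disk space can be shown with a little arithmetic. This is a simplified sketch with hypothetical helper names: it assumes every file occupies a whole number of filesystem blocks (no tail packing of small files), and it ignores mbox "From " separator lines, inodes and directory entries.

```c
#include <assert.h>
#include <stddef.h>

/* Round a size up to a whole number of filesystem blocks. */
static unsigned long long round_to_blocks(unsigned long long size,
                                          unsigned long long block)
{
    assert(block > 0);
    return (size + block - 1) / block * block;
}

/* Maildir-like: every message is its own file, so each one is rounded up. */
static unsigned long long maildir_usage(const unsigned long long *sizes,
                                        size_t n, unsigned long long block)
{
    unsigned long long total = 0;
    for (size_t i = 0; i < n; i++)
        total += round_to_blocks(sizes[i], block);
    return total;
}

/* mbox-like: all messages live in one file, so it is rounded up once. */
static unsigned long long mbox_usage(const unsigned long long *sizes,
                                     size_t n, unsigned long long block)
{
    unsigned long long total = 0;
    for (size_t i = 0; i < n; i++)
        total += sizes[i];
    return round_to_blocks(total, block);
}
```

With 4096-byte blocks and 1000 messages of 2000 bytes each, `maildir_usage` gives 4,096,000 bytes while `mbox_usage` gives 2,002,944 bytes: the small messages waste about half of every block they occupy.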

> > Ah, this was misleading. The cache file after reading small.maildir
> > with the headercache was 136MB big.

> I wonder where the rest of the file went, then?

Which 'rest of the file'? The header cache only caches headers, not the
whole message body.

> > With a page size of 16k, instead
> > of the default 2k, mutt is a bit faster and the file is only 52MB big.

> ...and now I _certainly_ wonder where the rest of the file went. . .

The default block size is 2k, and a typical header cache entry is about
1.2 kbytes. If you use a bigger block size in the database, you have less
administrative overhead. That's more or less how it works.
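A toy model of that overhead, as a rough sketch: it assumes a page holds only whole entries and ignores all per-page and per-entry bookkeeping of whatever database backs the header cache. The function names are hypothetical, and 1229 bytes merely stands in for "about 1.2 kbytes".

```c
#include <assert.h>

/* Toy model: how many whole entries fit into one database page? */
static unsigned entries_per_page(unsigned page_size, unsigned entry_size)
{
    assert(entry_size > 0);
    return page_size / entry_size;
}

/* Average page space consumed per entry (entry must fit into a page). */
static double bytes_per_entry(unsigned page_size, unsigned entry_size)
{
    unsigned n = entries_per_page(page_size, entry_size);
    assert(n > 0);
    return (double)page_size / n;
}
```

With 2k pages only one ~1.2k entry fits, so about 40% of each page stays unused; with 16k pages 13 entries fit and the average cost drops to roughly 1260 bytes per entry. This simplified model does not reproduce the measured 136MB vs 52MB; it only shows the direction of the effect.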

Hope that helps (I like this slogan - not really),
                                                Thomas