Re: The future of mailboxes?



On Mon, Aug 14, 2006 at 12:20:47AM +0100, Paul Walker wrote:
> On Sun, Aug 13, 2006 at 05:39:14PM +0200, Nicolas George wrote:
> 
> > Thus, if part of your mail is in a database, then having all your mail in
> > the database is a sensible and straightforward choice.
> 
> This depends on which viewpoint you're coming from - basically, whether the
> stuff in the database is an optimisation which lets you do certain things
> [faster], or the stuff in the database *is* the mail.
> 
> From the point of view of interoperating with, well, anything else, the
> first option would seem more sensible.

Agreed, though from the perspective of maximizing performance, the
latter is definitely the way to go.  One point: if support for this
were written using good software engineering principles (modular
code), it could easily be provided as a separate library which other
mail clients could choose to adopt (or not).  I think they would
adopt it, if people started using it.  I always hated the idea that
Outlook does this, because it doesn't play nice with others...  but if
there were a standard way to do it that was well implemented and
freely available (e.g. under the GPL or some such license), that would
kick ass.  As pointed out already though, Mutt doesn't exactly have a
modular mailbox driver API, so integrating that into Mutt might be
difficult.  It's probably still worth doing right (if done at all),
and it might even pave the way for Mutt's mailbox code to get cleaned
up.

Storing mail in files is fine, but a database definitely can speed up
a lot of operations.  For example:

People always point to maildir being faster than mbox at deleting
(expunging) a message from a folder, which is true... FOR A SINGLE
MESSAGE.  But the reverse becomes true if you are deleting a large
number of messages -- a fact not often mentioned by the maildir
people.  This is because mbox rewrites the folder file once, and the
more messages you delete, the fewer it has left to rewrite, whereas
maildir needs a separate unlink operation for every deleted message,
and all those unlinks carry a lot of overhead.  I routinely see the
effects of this at work when I return from my weekend, where I will
often have more than 1000 messages which I don't care about in a
particular folder, and deleting them all using maildir takes forever.
The same operation in an mbox folder is quite fast by comparison.
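
To make the difference concrete, here is a minimal sketch using
Python's standard mailbox module (not Mutt's code).  The expunge()
helper is hypothetical.  It only shows where the work happens: with a
Maildir each remove() unlinks one file, while with an mbox the removals
are just recorded and flush() rewrites the folder file once.  It does
no locking, which a real client would need.

```python
import mailbox


def expunge(box, should_delete):
    """Remove every message for which should_delete(msg) is true.

    Works on any mailbox.Mailbox; returns the number of messages removed.
    """
    # Collect the keys first so the folder is not modified while iterating.
    doomed = [key for key, msg in box.iteritems() if should_delete(msg)]
    for key in doomed:
        # Maildir: one file is unlinked right here, once per message.
        # mbox: the message is only marked as removed for now.
        box.remove(key)
    # mbox: the folder file is rewritten once, without the removed messages.
    # Maildir: nothing is left to do at this point.
    box.flush()
    return len(doomed)
```

For example, expunge(mailbox.Maildir(path), pred) and
expunge(mailbox.mbox(path), pred) give the same result, but the first
costs one unlink per deleted message and the second costs one rewrite
of whatever is left.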

But if you're using a database, it's fast no matter how many
messages you are deleting.  You just remove records from a table,
essentially, and let the database worry about cleaning up the data.
Reading message indexes is similar.  For very large folders (sans
caching) maildir is quite slow; mbox performs much better but is still
slow, and caching improves the performance of both.  With a database
you'll be putting all of the common headers into their own fields in
some table, which you can index, so displaying the message index will
stay fast (compared to any file-based format) no matter how many
messages you have, as will searching on message headers.  This is a
big win for the user.  You'll need to use a database backend which
supports regular expression searching though, if you want to maintain
the power of Mutt.
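
As a rough illustration of that layout, here is a sketch using SQLite
through Python's standard sqlite3 module.  The table, column and
function names are all made up for the example, and SQLite is just one
possible backend.  SQLite has no built-in regular expression matching,
so the sketch registers a Python function to back its REGEXP operator.

```python
import re
import sqlite3


def _regexp(pattern, value):
    # SQLite evaluates "value REGEXP pattern" as regexp(pattern, value).
    return int(value is not None and re.search(pattern, value) is not None)


def open_store(path=":memory:"):
    """Open (and if needed create) a hypothetical mail store."""
    db = sqlite3.connect(path)
    db.create_function("REGEXP", 2, _regexp)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS message (
            id      INTEGER PRIMARY KEY,
            folder  TEXT NOT NULL,
            sender  TEXT,
            subject TEXT,
            date    TEXT,   -- ISO 8601, so text order is date order
            raw     BLOB    -- the complete message as received
        );
        CREATE INDEX IF NOT EXISTS message_by_folder_date
            ON message (folder, date);
    """)
    return db


def folder_index(db, folder):
    """Rows for the message index of one folder, oldest first."""
    return db.execute(
        "SELECT id, date, sender, subject FROM message"
        " WHERE folder = ? ORDER BY date",
        (folder,),
    ).fetchall()


def search_subject(db, folder, pattern):
    """Messages in a folder whose subject matches a regular expression."""
    return db.execute(
        "SELECT id, subject FROM message"
        " WHERE folder = ? AND subject REGEXP ? ORDER BY date",
        (folder, pattern),
    ).fetchall()


def delete_messages(db, ids):
    """Delete any number of messages in a single transaction."""
    with db:
        db.executemany("DELETE FROM message WHERE id = ?", [(i,) for i in ids])
```

Note that a REGEXP match like this one is evaluated row by row; the
index helps the folder listing, not the regular expression itself.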

And, as previously pointed out, this makes implementing virtual
folders insanely easy.  That's a Good Thing.
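
One way to picture it: with the headers in a table, a virtual folder
is little more than a saved query.  The sketch below uses Python's
standard sqlite3 module; the table layout, the folder names and the
VIRTUAL_FOLDERS mapping are all hypothetical, and the demo rows exist
only so the example can run on its own.

```python
import sqlite3

# Hypothetical saved searches: name -> (WHERE clause, parameters).
VIRTUAL_FOLDERS = {
    "from-alice": ("sender LIKE ?", ("%alice@%",)),
    "mailbox-thread": ("subject LIKE ?", ("%future of mailboxes%",)),
}


def make_demo_store():
    """Build a tiny in-memory store holding mail from three real folders."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE message ("
        " id INTEGER PRIMARY KEY, folder TEXT, sender TEXT, subject TEXT)"
    )
    db.executemany(
        "INSERT INTO message (folder, sender, subject) VALUES (?, ?, ?)",
        [
            ("inbox", "alice@example.invalid", "Re: The future of mailboxes?"),
            ("mutt-dev", "bob@example.invalid", "Re: The future of mailboxes?"),
            ("work", "alice@example.invalid", "Lunch"),
        ],
    )
    db.commit()
    return db


def open_virtual_folder(db, name):
    """Return the messages of a virtual folder, whichever folder holds them."""
    where, params = VIRTUAL_FOLDERS[name]
    # The WHERE text comes only from the mapping above, never from user input.
    return db.execute(
        f"SELECT folder, sender, subject FROM message WHERE {where} ORDER BY id",
        params,
    ).fetchall()
```

Opening "from-alice" here would list messages from both the inbox and
work folders without either message being moved or copied.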

> (This is starting from the assumption that sticking this metadata in
> databases is a good plan, which I'm unconvinced of.)

Caching the metadata does bring performance benefits, but they don't
compare to those of using a database wholesale.  Besides, isn't that
what hcache already does?

Also, note that the virtual folder aspects of this can be implemented
using either mbox or maildir plus caching/indexing, though you don't
get the same performance benefits regarding the other operations.  You
don't even need to keep all the messages in one folder; you just need
to keep a cache/index of where the messages are.  Complex?  Yeah, but
absolutely worth doing, I think, if people want to stay with
mail-in-files folders.  In addition to making virtual folders
possible, it will also make things generally faster (because headers
and such will necessarily be cached and indexed, for all supported
folder formats).
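
A minimal sketch of such a cache/index, assuming Python's standard
mailbox and sqlite3 modules rather than anything in Mutt: the mail
stays in its mbox and maildir folders, and the index only records a
few headers plus where each message lives.  All names are
hypothetical.  One caveat the sketch does not solve: mbox keys are
positions in the file, so a folder would need re-indexing after it is
rewritten.

```python
import mailbox
import sqlite3


def open_index(path=":memory:"):
    """Open (and if needed create) the hypothetical location index."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS location ("
        " folder TEXT NOT NULL,"  # path of the mbox file or maildir
        " key,"                   # untyped: mbox keys are ints, maildir keys strings
        " sender TEXT,"
        " subject TEXT,"
        " PRIMARY KEY (folder, key))"
    )
    return db


def index_folder(db, folder, box):
    """(Re)index one folder; box is any mailbox.Mailbox opened on it."""
    rows = [
        (folder, key, str(msg["From"] or ""), str(msg["Subject"] or ""))
        for key, msg in box.iteritems()
    ]
    with db:
        db.execute("DELETE FROM location WHERE folder = ?", (folder,))
        db.executemany("INSERT INTO location VALUES (?, ?, ?, ?)", rows)
    return len(rows)


def find(db, subject_part):
    """Locate messages in any indexed folder by a piece of their subject."""
    return db.execute(
        "SELECT folder, key, sender, subject FROM location"
        " WHERE subject LIKE ? ORDER BY folder, key",
        (f"%{subject_part}%",),
    ).fetchall()
```

Each hit carries the folder and key, so the full message can still be
fetched from the original mailbox with box[key].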

-- 
Derek D. Martin    http://www.pizzashack.org/   GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address.  Replying to it will result in
undeliverable mail.  Sorry for the inconvenience.  Thank the spammers.