
Re: The future of mailboxes?



On quartidi 24 thermidor, an CCXIV, Chris wrote:
>  It's almost serendipitous that I posted a message immediately before
> this thread was posted (subject "Implementation of virtual folders in
> mutt") in which I said:

I thought the very same thing when I found your mail in my incoming mailbox
just after having posted my own. I take it as a sign that this issue is a
relevant one.

>  This vfs could then be mounted to the actual filesystem, and from
> mutt's perspective, it's a message like any other.  The actual message
> remains in the filesystem, so it avoids a number of the problems
> mentioned in the "email + database = bad" document mentioned in this
> thread.
> 
>  I must admit I have not researched exactly *how* to mount this as a fs,
> but I have heard mention on the WMII mailing list of v9fs/ixp being
> potentially mountable by the kernel so I'm sure it is doable somehow.
> The program would have to apply any filename changes to the created
> symlink to the original file, and trying to move the symlink sounds like
> a big complication (move the target file rather than the symlink,
> perhaps?).
> 
>  This solution would not be of any help for mbox-format messages, but it
> could be used for things other than mail (such as an mp3 library, for
> example), so I kind of like it.

I briefly thought of a virtual filesystem, but discarded the idea. Now that
I take a closer look at it, I see it could be a very interesting solution if
done properly. Here are a few quick thoughts:

- It should be a libc-level virtual filesystem, not a kernel-level one:
  first, that means it can work on systems without FUSE-like mechanisms, or
  where the sysadmin has disabled them; second, that means that the kernel
  does not see the myriad of files or symlinks, so there is no thrashing of
  the inode and dentry caches. Furthermore, it only needs to be visible to
  mutt, and maybe to an IMAP server and a few tools too, so there are no
  complex synchronization problems.

- The view shown to mutt does not need to be in the same format as the
  actual source for the spool. In read-only mode, it is quite trivial for a
  virtual filesystem to show an mbox file as a maildir tree (see the sketch
  after this list). Read-write access needs a bit more careful examination,
  but it is doable too.

- Predefined views can be defined in the configuration/database of the
  virtual filesystem, and not in mutt. They would then appear as
  directories, and thus as folders in mutt's interface.
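
Here is a rough sketch of what the read-only mbox-as-maildir view from the
second point boils down to (this is not mutt code, and the names are made
up): scan the mbox once, record the offset and length of each message, and
let the virtual filesystem answer open()/read() on names like "cur/<n>"
from those offsets, without copying any data:

/* Toy sketch: index an mbox so that a read-only virtual maildir view
 * could expose each message as "cur/<n>".  The (offset, length) pairs
 * are all such a view needs to serve reads on a virtual message file.
 * Real mbox parsing also has to cope with ">From " quoting in bodies;
 * omitted here. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct vmsg {
    long offset;   /* start of the message in the mbox */
    long length;   /* bytes up to the next "From " separator */
};

int main(int argc, char **argv)
{
    FILE *mbox;
    char line[4096];
    struct vmsg *msgs = NULL;
    size_t count = 0, alloc = 0;
    long pos = 0;

    if (argc != 2) {
        fprintf(stderr, "usage: %s mbox-file\n", argv[0]);
        return 1;
    }
    mbox = fopen(argv[1], "r");
    if (!mbox) { perror(argv[1]); return 1; }

    while (fgets(line, sizeof(line), mbox)) {
        if (strncmp(line, "From ", 5) == 0) {
            if (count > 0)
                msgs[count - 1].length = pos - msgs[count - 1].offset;
            if (count == alloc) {
                alloc = alloc ? alloc * 2 : 64;
                msgs = realloc(msgs, alloc * sizeof(*msgs));
                if (!msgs) { perror("realloc"); return 1; }
            }
            msgs[count].offset = pos;
            msgs[count].length = 0;
            count++;
        }
        pos = ftell(mbox);   /* offset of the start of the next line */
    }
    if (count > 0)
        msgs[count - 1].length = pos - msgs[count - 1].offset;

    /* The virtual view would map "cur/<n>" to (offset, length). */
    for (size_t i = 0; i < count; i++)
        printf("cur/%zu -> offset %ld, length %ld\n",
               i, msgs[i].offset, msgs[i].length);

    free(msgs);
    fclose(mbox);
    return 0;
}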

I like that idea, and I will think further about it.

> I tried sniffing Coke once, but the ice cubes got stuck in my nose.

LOL :)


On quartidi 24 thermidor, an CCXIV, Seth Arnold wrote:
> Heh, there's already four or more filesystems implemented with FUSE to
> do similar tasks:
> http://fuse.sourceforge.net/wiki/index.php/FileSystems
> 
> Perhaps one of these would satisfy my needs. :) Thanks for the thought.

Which one are you thinking of in particular?


On quintidi 25 thermidor, an CCXIV, Kyle Wheeler wrote:
> You can also do things like order by size without reading headers.

To sort by size, you need to stat() each file, which means loading the
inodes: to me, that is already the beginning of reading the headers. It is
done in the kernel, but it has a cost nonetheless, and a fairly high one
too: that is a lot of context switches.
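
For reference, here is roughly what that per-message stat() loop looks like
for a maildir (a toy sketch of mine, the directory name is just an example):

/* Why "sort by size" already touches every inode: with a maildir,
 * each message is one file, so every size means one stat() call. */
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

int main(void)
{
    const char *dir = "Maildir/cur";   /* example path */
    char path[4096];
    DIR *d = opendir(dir);
    struct dirent *e;
    struct stat st;

    if (!d) { perror(dir); return 1; }
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')
            continue;
        snprintf(path, sizeof(path), "%s/%s", dir, e->d_name);
        if (stat(path, &st) == 0)          /* one syscall per message */
            printf("%10lld %s\n", (long long)st.st_size, e->d_name);
    }
    closedir(d);
    return 0;
}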

>                                                                    In 
> any case, sorting by delivery order is a very common sort order (for 
> example, for INBOXes). In this case, a mail reader can quickly 
> identify the 20 most recently delivered messages, and parse just them 
> (for display of the index).

I have this feeling that sorting by delivery time is something people are
very fond of because they learned mail with limited user agents that could
do no better.

And that is, I think, the major flaw in your arguments: you focus on the
efficiency of actually reading the mail, while disregarding the choice of
which mail to read.

To ease that choice is the whole point of virtual folders, so it should be
the central focus in this thread.

>                             Additionally, parsing is easy to do in 
> parallel; let’s say I want to spawn five threads to do all the 
> parsing: dividing work among them is far easier with maildir than with 
> mbox. Doing multiple access of a single file is frequently more 
> difficult (it can be done, of course), and dividing work among many 
> threads is hard to do evenly.

If the limiting factor is CPU power, then you will need a five-CPU box. If
the limiting factor is disk bandwidth, which is more likely, then you will
need a five-disk RAID just to avoid losing performance.

Parallel parsing of folders does not seem a big point to me.

Oh, and you would need locks to merge the results of the partial parsing :-Þ
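
A toy sketch of what I mean (placeholder file names, the actual header
parsing left out, compile with -pthread): even with the parsing divided
among five threads, the partial results still have to be merged into one
shared index, and that merge is exactly where the lock comes back:

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 5

static const char *messages[] = {
    "cur/msg.1", "cur/msg.2", "cur/msg.3", "cur/msg.4",
    "cur/msg.5", "cur/msg.6", "cur/msg.7", "cur/msg.8",
};
#define NMESSAGES (sizeof(messages) / sizeof(messages[0]))

static pthread_mutex_t index_lock = PTHREAD_MUTEX_INITIALIZER;
static const char *shared_index[NMESSAGES];
static size_t index_used;

static void *worker(void *arg)
{
    size_t id = (size_t)arg;

    /* Each thread takes every NTHREADS-th message. */
    for (size_t i = id; i < NMESSAGES; i += NTHREADS) {
        /* ... open messages[i] and parse its headers here ... */

        /* Merging the result into the shared index needs the lock. */
        pthread_mutex_lock(&index_lock);
        shared_index[index_used++] = messages[i];
        pthread_mutex_unlock(&index_lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t threads[NTHREADS];

    for (size_t i = 0; i < NTHREADS; i++)
        pthread_create(&threads[i], NULL, worker, (void *)i);
    for (size_t i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);

    /* The order here depends on thread scheduling, which is the other
     * half of the merging problem. */
    for (size_t i = 0; i < index_used; i++)
        printf("%s\n", shared_index[i]);
    return 0;
}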

> Please do. I’m always interested in being more accurate, but don’t 
> expect me to bow down before your mighty intellect.

I will start on it as soon as I finish this mail.

> I have found both the performance and the reliability to be better. 
> Performance-wise, compare sendfile() to mysql_fetch_row() plus 
> fwrite(): sendfile goes into the kernel once and makes one copy. The 
> other goes into the kernel four times, and makes four copies.

That is the cost of isolation. I think someone interested in qmail should
know it very well: a database engine embedded in the server itself allows
writing directly from the mmap()ed data, which is roughly identical to
sendfile (and, unlike sendfile, is standard Unix). On the other hand, a
separate database server will provide an additional level of protection
against security issues, and can more easily be moved to a dedicated server
(or server*s*) if speed becomes an issue.
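
A rough sketch of the two variants I am comparing (Linux-specific
sendfile(), made-up function names, minimal error handling); the four-copy
case is the separate-server round trip, which is not shown here:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Variant 1: sendfile(), one kernel entry, no copy through userspace. */
int send_with_sendfile(int sock, int fd, size_t len)
{
    off_t off = 0;
    return sendfile(sock, fd, &off, len) == (ssize_t)len ? 0 : -1;
}

/* Variant 2: a single write() straight from an mmap()ed region, which
 * is what "writing directly from the mmap()ed data" means above. */
int send_from_mmap(int sock, int fd, size_t len)
{
    void *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    ssize_t r;

    if (map == MAP_FAILED)
        return -1;
    r = write(sock, map, len);
    munmap(map, len);
    return r == (ssize_t)len ? 0 : -1;
}

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
    int fd = open(argv[1], O_RDONLY);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) < 0) { perror(argv[1]); return 1; }

    /* Demonstrate variant 2 on stdout; variant 1 would be used with a
     * real client socket. */
    return send_from_mmap(STDOUT_FILENO, fd, st.st_size);
}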

Which one is better is not something that can be decided once and for all.

>                                                               As for 
> reliability, I agree that if you throw sufficient hardware (known as 
> “big iron”) at a database, it can take advantage of it and be quite 
> reliable. On the other hand, with a simple RAID you can make a 
> filesystem extremely reliable,

I was not thinking about hardware failures; they have exactly the same
consequences on databases and filesystems. I was thinking about fsck:
filesystems have bugs just as databases do.

>                                and with network filesystems such as 
> AFS and GFS you can take further advantage of having multiple 
> (geographically disjoint) systems.

Databases can do that too, since they are mostly servers.

>                                    Backing up a filesystem is as easy 
> as an rsync, backing up a database (reliably) is much harder.

Database engines come with backup tools. They are fairly complex, of course,
but rsync itself is not really simple, and may be much less reliable
(imagine rsyncing a tree while a file is being moved from one directory to
another: if rsync scans the target directory before the move, and the source
directory after it, this particular file will never be seen; databases would
not have this problem, thanks to transactions).

> So? No locks. Databases use locks. So does mbox (a trivial database).

No locks? Maybe you should look more closely at your favorite filesystem's
source code. Filesystems are full of locks. Moreover, they are kernel-space
locks, which are somewhat trickier than userspace locks.

>                                        all collections of data are 
> databases, and the distinction is essentially implementation detail. 

The implementation details are the same too: look at what filesystems look
like on their block devices, and you will find exactly the same data
structures as in database engines.

> And it’s the local spools where you start demanding read-only access?

Yes, of course. Last week's mail needs read-write access, but last year's
mostly does not.

That is exactly what I said in my first mail: I want my mail archives, which
are properly tagged and sorted and do not require write access, somewhere
where no harm can come to them and where they do not eat too much of the
system's resources; I want my current, live mail somewhere where it can be
accessed, for both read and write, in an efficient way. And I want access to
both at the same time, transparently.

This is asking a lot, but nothing impossible.

Regards,

-- 
  Nicolas George
