
Re: header cache for all folder types?



* On 2007.02.05, in <20070205174845.GT24582@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>,
*       "Thomas Roessler" <roessler@xxxxxxxxxxxxxxxxxx> wrote:
> I'm inclined to suggest that we turn on the header cache for *all*
> folder types, not just for maildirs -- or at least to do some
> performance testing as to whether there's a way to activate it for
> mbox folders that would make parsing these much faster.

I would like to see that.  I'm afraid I can't spare time now -- many
projects open at once -- but perhaps I could at some point if it's not
already adopted.  Nonetheless, maybe it's meaningful to talk about
strategy.

T. Glanzmann has implied that caching mbox is not doable[1] because of
cache consistency concerns, presumably because mbox messages aren't
discrete, and aren't each associated with a unique filestore object
that carries change metadata.

I think a "good enough" solution can be reached by storing message byte
offsets in the cache db with a checksum/hash of the N bytes following
that offset.  From offset deltas you can deduce the message length in
the real mbox file (the cache may already know the length), and the
hash over length N gives you a confidence of roughly N/length that the
message has not been externally modified.

If N is equal to the header length, then that's equal to Maildir's
confidence, but N can vary (at cost of confidence) if it improves
performance.  N can be different for each message, and cached.
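
To make the idea concrete, here is a rough sketch in Python.  It is
only an illustration of the scheme described above, not anything that
exists in mutt: the names CacheEntry, make_entry, entry_is_valid and
message_lengths are made up for this example, and SHA-1 is just an
assumed choice of hash.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class CacheEntry:
    """Hypothetical per-message cache record."""
    offset: int    # byte offset of the message start within the mbox
    n: int         # number of bytes hashed, starting at offset
    digest: bytes  # hash of those n bytes


def make_entry(mbox_path, offset, n):
    """Hash the n bytes following offset and record the result."""
    with open(mbox_path, "rb") as f:
        f.seek(offset)
        data = f.read(n)
    # len(data) may be smaller than n if the file ends early.
    return CacheEntry(offset, len(data), hashlib.sha1(data).digest())


def entry_is_valid(mbox_path, entry):
    """Re-read the same span and compare it against the cached hash."""
    with open(mbox_path, "rb") as f:
        f.seek(entry.offset)
        data = f.read(entry.n)
    return (len(data) == entry.n
            and hashlib.sha1(data).digest() == entry.digest)


def message_lengths(entries, file_size):
    """Deduce message lengths from the deltas between cached offsets."""
    offsets = sorted(e.offset for e in entries) + [file_size]
    return [end - start for start, end in zip(offsets, offsets[1:])]
```

An entry that fails entry_is_valid would mean falling back to a normal
parse of the folder; n can be chosen per message and stored alongside
the offset.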

Performance of an mbox cached in this way is probably not notably
greater (if at all) than uncached where messages are small (under one
or two read blocks), but in an mbox folder with large messages, the
seek should improve performance somewhat.  The test data would be
interesting, anyway.


[1] Message-id: <20040722064752.GA2727@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>

-- 
 -D.    dgc@xxxxxxxxxxxx        NSIT    University of Chicago