
fadvise WILLNEED for tokyocabinet header cache



Hello,

Probably for the first time I feel I'm getting satisfying performance
while opening large folders using tokyocabinet and headercache on
maildir format on cold caches (compression option enabled by default,
and maildir_header_cache_verify=no).

But with vmstat 1 I noticed that when tokyocabinet loads its data from
disk, the bandwidth is way below what the disk could deliver
(<10M/sec), so I fixed it with this patch. With this patch the
headercache file is read from disk at >30M/sec (with readahead, large
DMA commands, and few seeks, or at least as few as the filesystem
permits). There's no need for an expensive SSD to get mutt reading
from disk at >30M/sec on a cold cache open, so I think it's pretty
important to fix this.

I'd like it if you could add a maildir_header_cache_fadvise=200m option
to the .muttrc file, disabled (=0) by default but easy to enable with
a one-liner in .muttrc. It is meant to be enabled on systems where you
know the kernel pagecache is much larger than 200m; if the pagecache
is smaller than 200m it would only thrash and cause more I/O. When the
data is already in cache the slowdown is minimal. With this the header
cache file is read at over 30M/sec, not below 10M/sec. With
compression, tokyocabinet generates a header cache file well below
100M even for a hundred thousand messages in the folder, so 200m as a
setting seems quite enough, and most systems will have more than 200m
of pagecache. In addition to making the preload size configurable in
.muttrc, the patch should also be changed not to fadvise if the header
cache file is larger than the value of the option in .muttrc. If
there is interest I can make both changes myself and repost a proper
patch; this is a local hack for myself so far.
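
To make the two proposed changes concrete, here is a rough sketch of
what the size-limited preload could look like. The helper name
hcache_prefetch and its limit parameter are hypothetical; nothing like
this exists in mutt today, and the limit would have to come from the
proposed (not yet existing) maildir_header_cache_fadvise option.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600   /* ask libc to declare posix_fadvise() */
#endif
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of mutt: ask the kernel to read the
 * header cache file ahead, but only if the feature is enabled
 * (limit > 0) and the file is no larger than limit bytes.
 *
 * Returns 1 if the hint was issued, 0 if it was skipped, and -1 if
 * the file could not be opened or examined or the hint failed.
 */
static int
hcache_prefetch (const char *path, off_t limit)
{
  struct stat st;
  int fd, rc;

  if (limit <= 0)               /* 0 means disabled, the proposed default */
    return 0;

  fd = open (path, O_RDONLY);
  if (fd < 0)
    return -1;

  if (fstat (fd, &st) != 0 || !S_ISREG (st.st_mode))
  {
    close (fd);
    return -1;
  }

  /* An empty file has nothing to preload; a file larger than the
   * limit would risk thrashing the pagecache, so skip both. */
  if (st.st_size == 0 || st.st_size > limit)
  {
    close (fd);
    return 0;
  }

  /* posix_fadvise() returns 0 on success or an error number. */
  rc = posix_fadvise (fd, 0, st.st_size, POSIX_FADV_WILLNEED);
  close (fd);
  return rc == 0 ? 1 : -1;
}
```

In hcache_open_tc this would presumably be called as
hcache_prefetch(path, limit) just before tcbdbopen, ignoring the
return value, since the hint is only an optimization.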

On a side note, a next step for further optimization would be to
eliminate the seek-heavy, low-bandwidth getdents/readdir on the very
large 'cur' directory. I'm unsure why that getdents is there, and why
running a getdents on the tiny 'new' directory (plus of course reading
the header cache representing the 'cur' directory, but at >30M/sec
with the patch below) isn't enough. Setting
maildir_header_cache_verify=no at least eliminates a flood of stats,
but I hope the getdents on 'cur' can also be removed eventually!

Hope this helps.
Andrea

Index: mutt-1.5.19/hcache.c
--- mutt-1.5.19/hcache.c.orig   2009-06-21 03:31:11.000000000 +0200
+++ mutt-1.5.19/hcache.c        2009-06-21 03:39:52.000000000 +0200
@@ -894,9 +894,15 @@ mutt_hcache_delete(header_cache_t *h, co
 static int
 hcache_open_tc (struct header_cache* h, const char* path)
 {
+  int fd;
   h->db = tcbdbnew();
   if (option(OPTHCACHECOMPRESS))
     tcbdbtune(h->db, 0, 0, 0, -1, -1, BDBTDEFLATE);
+  fd = open(path, O_RDONLY);
+  if (fd >= 0) {
+    posix_fadvise(fd, 0, 200*1024*1024, POSIX_FADV_WILLNEED);
+    close(fd);
+  }
   if (tcbdbopen(h->db, path, BDBOWRITER | BDBOCREAT))
     return 0;
   else