I'm once again full-text-indexing all my email, and it's working great. This has only limited relation to mutt; I use mutt, but most any MUA could be plugged in place of it. In the rest of this message I'll describe what I'm doing and how.

Last time I did full-text indexing, I used glimpse; but that codebase has gone down a road I didn't want to follow (restricted-use commercial code). For a long time I did without.

Recently I took another look over the available full-text indexers; Freshmeat gave me: Lupy, glimpse, harvest, holmes, namazu, swish, swish++, and yase. The first one I looked at closely was swish++, and I stopped there. Perhaps some of the others would have worked better; I didn't check.

Given swish++, the whole job was almost perfectly trivial. I find that on my platform, the memory scaling of index is somewhat different from the author's; on Red Hat 8 I found -W100000 climbed to 76MB before finishing my email archives, where the author was seeing 64MB per 250Kwords. Aside from that everything is slick.

My email I archive in Maildirs; that's critical to this strategy. I built the initial index, didn't take all that long, and I incrementally re-index periodically and that's _really_ quick; I re-index with:

    #!/bin/sh -e
    cd $HOME/archive/Mail
    find */??? -type f -newer swish++.index | index -W100000 -I -
    mv swish++.index.new swish++.index

With a current index, I can do keyword searches for email with the attached perl script; invoked with keywords (actually, search takes boolean relations of keywords) it will build a tmp maildir populated with links to the matching messages, and invoke mutt on it. Very, very fast.

-Bennett
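The incremental step above depends on find's -newer test picking up only the messages that arrived after the index was last written. Here is a minimal sketch of that selection alone, run in a scratch directory; the folder name inbox, the message file names, and the timestamps are made up for illustration, and the index file is an empty stand-in, not a real swish++ index.

```shell
#!/bin/sh
set -e
# Hypothetical scratch area; the real archive lives in $HOME/archive/Mail.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
mkdir -p inbox/cur

# An "old" message, timestamped before the index.
touch -t 202001010000 inbox/cur/old-message
# Empty stand-in for swish++.index, timestamped after the old message.
touch -t 202001020000 swish++.index
# A "new" message that arrived after the index was written.
touch -t 202001030000 inbox/cur/new-message

# Same selection as the re-index script: */??? matches the three-letter
# Maildir subdirectories (cur, new, tmp) of every folder, and -newer
# keeps only files modified after the index.
newer=$(find */??? -type f -newer swish++.index)
echo "$newer"
# prints: inbox/cur/new-message
```

Because each message is its own file in a Maildir, the list handed to index stays as short as the new mail, which is why the periodic re-index is so quick.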
#!/usr/bin/perl -w
use strict;
use IO::File;
use File::Basename;

# Message shown when the search matches nothing.
my $nothing = <<'EoF';
Lucy Locket lost her pocket;
Kitty Fisher found it.
Nothing in it, nothing in it,
but the binding 'round it.
EoF

# Temporary maildir, unique per process; removed again on exit.
my $tmpbox = $ENV{HOME} . '/.mailsearch' . $$;
END { exec "rm", "-rf", $tmpbox; }

mkdir $tmpbox, 0700 or die;
mkdir "$tmpbox/$_" or die for qw(tmp new cur);
my $cur = "$tmpbox/cur";

# search reports paths relative to the directory that holds the index.
chdir $ENV{HOME} . '/archive/Mail' or die;

my $gotsome = 0;
my $fi = IO::File->new("search @ARGV|") or die;
while (defined($_ = $fi->getline)) {
    next if /^#/;            # skip comment lines in search's output
    my $fn = (split)[1];     # second field is the message's file name
    link $fn, "$cur/@{[basename($fn)]}" or die;
    $gotsome = 1;
}
die $nothing unless $gotsome;

system "mutt", "-f", $tmpbox;
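For anyone who would rather try the linking step without swish++ or mutt installed, here is a rough shell sketch of the same idea. It is not a replacement for the script above. The fake_search function is a hypothetical stand-in for search; its output format (comment lines starting with '#', then lines whose second field is the file path) is only what the Perl script's parsing implies, and the folder and message names are invented.

```shell
#!/bin/sh
set -e
# Hypothetical scratch archive and result maildir, both under a temp dir.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/archive/inbox/cur"
mkdir -p "$work/results/tmp" "$work/results/new" "$work/results/cur"
cd "$work/archive"
echo "Subject: one" > inbox/cur/msg1
echo "Subject: two" > inbox/cur/msg2

# Stand-in for `search keywords`; the format is an assumption.
fake_search() {
    printf '%s\n' '# results: 2' \
        '99 inbox/cur/msg1 13 one' \
        '42 inbox/cur/msg2 13 two'
}

# Save the hits to a file so the loop runs in this shell, not a subshell,
# and the counter survives after the loop ends.
fake_search > "$work/hits"
linked=0
while read -r rank path rest; do
    case $rank in '#'*) continue ;; esac   # skip comment lines
    # Hard-link the message into the result maildir's cur/ directory.
    ln "$path" "$work/results/cur/$(basename "$path")"
    linked=$((linked + 1))
done < "$work/hits"
echo "linked $linked messages"
```

Hard links cost no extra disk space and need the result maildir to sit on the same filesystem as the archive, which is presumably why the Perl script puts its temporary box under $HOME. Pointing mutt -f at the results directory would then open only the matches.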