Re: [Mutt] #2870: color index foo foo ~h pattern causes many
#2870: color index foo foo ~h pattern causes many fileops on resync only
Changes (by brendan):
* status: new => closed
* resolution: => duplicate
Old description:
> {{{
> Executive summary:
> I am using header_cache="~/Maildir/" and set maildir_header_cache_verify
> = no
> Each "color index ~h" rule does not cause any stats in the Maildir/.snd/
> directory when I _open_ the folder (apparently, the header cache is used
> for that).
> However each of my sent messages is stated and opened four times when
> mutt resyncs the already opened folder, that is 4 times per ~h line that
> is processed from my muttrc
>
> Details:
> I had a bunch of rules like these:
> color index brightgreen black '! ~p'
> color index black brightblue '~C afscv'
> color index brightwhite black '~p'
>
> Those rules are ok, mutt processes them quickly enough (a few seconds)
> whether I open a folder, or mutt resyncs it.
>
> However, rules like these are causing significant delays on auto resyncs
> (i.e. when an open folder is externally modified). 12 rules like the
> ones below caused a resync time of 5mn for a 40,000 message folder
> color index black brightyellow '~h List-Id:.*bugtraq'
> color index brightblue black '~h hits\=3'
> color index brightblue brightyellow '~h post\ from'
>
> mutt can process ~f, ~C and ~p from the header cache, but not ~h. Or at
> least it can't do it on resync apparently (unless the colors are cached
> in the header cache file, which makes for a fast open, but a slow
> resync?)
>
> There is still a clear problem. Just adding a single:
> > color index black brightyellow '~h List-Id:.*bugtraq'
>
> causes 4 stats and opens for _each_ message (again, on resync only)
> > open("/home/merlin/Maildir//.snd/cur/1162956827.15796_57.magic:2,S",
> O_RDONLY|O_LARGEFILE) = 3
> > fstat64(3, {st_mode=S_IFREG|0600, st_size=1914, ...}) = 0
> > open("/home/merlin/Maildir//.snd/cur/1162956827.15796_57.magic:2,S",
> O_RDONLY|O_LARGEFILE) = 3
> > fstat64(3, {st_mode=S_IFREG|0600, st_size=1914, ...}) = 0
> > open("/home/merlin/Maildir//.snd/cur/1162956827.15796_57.magic:2,S",
> O_RDONLY|O_LARGEFILE) = 3
> > fstat64(3, {st_mode=S_IFREG|0600, st_size=1914, ...}) = 0
> > open("/home/merlin/Maildir//.snd/cur/1162956827.15796_57.magic:2,S",
> O_RDONLY|O_LARGEFILE) = 3
> > fstat64(3, {st_mode=S_IFREG|0600, st_size=1914, ...}) = 0
>
> If I add a second ~h line, I get 8 fstat and open calls _per_ message.
> Now, that explains why I was seeing 5mn rsync times.
>
> The unanswered questions are:
> 1) why can mutt use the header cache to run the ~h rules when the folder
> is opened, but not when it's resynced
>
> 2) Why does adding a message to a folder cause mutt to ignore its entire
> cache and rebuild a brand new one at resync time, scanning all
> messages
> multiple times, when at folder open time, mutt looks smart enough to
> use
> the cache for already cached messages, and only open/index/scan
> whatever
> few messages that weren't in the cache already?
>
> 3) 4 fstat and open calls per message in the folder? That's obviously at
> least 3 too many, but most likely 4 times too many.
>
> >How-To-Repeat:
> Take 40,000 message ~/Maildir/.snd folder, open from mutt, send a random
> email that gets fcced to that same folder, see how long mutt takes to
> reopen the .snd folder.
> Repeat operation by adding 10 "color index red black ~h fooheader" rules
> in muttrc
> Verify fileops with strace -e trace=file mutt
> >Fix:
> my current workaround has been to remove all ~h color rules.
>
> Fixes would be:
> 1) mutt should never open a stat the same file 4 times in a row
> 2) mutt resync code looks very suboptimal compared to folder open code.
> If it were to only parse new messages in the folder (messages not in
> the cache), it would effectively fix this problem.
> }}}
New description:
{{{
Executive summary:
I am using header_cache="~/Maildir/" and set maildir_header_cache_verify =
no
Each "color index ~h" rule does not cause any stats in the Maildir/.snd/
directory when I _open_ the folder (apparently, the header cache is used
for that).
However each of my sent messages is stated and opened four times when
mutt resyncs the already opened folder, that is 4 times per ~h line that
is processed from my muttrc
Details:
I had a bunch of rules like these:
color index brightgreen black '! ~p'
color index black brightblue '~C afscv'
color index brightwhite black '~p'
Those rules are ok, mutt processes them quickly enough (a few seconds)
whether I open a folder, or mutt resyncs it.
However, rules like these are causing significant delays on auto resyncs
(i.e. when an open folder is externally modified). 12 rules like the
ones below caused a resync time of 5mn for a 40,000 message folder
color index black brightyellow '~h List-Id:.*bugtraq'
color index brightblue black '~h hits\=3'
color index brightblue brightyellow '~h post\ from'
mutt can process ~f, ~C and ~p from the header cache, but not ~h. Or at
least it can't do it on resync apparently (unless the colors are cached
in the header cache file, which makes for a fast open, but a slow resync?)
There is still a clear problem. Just adding a single:
> color index black brightyellow '~h List-Id:.*bugtraq'
causes 4 stats and opens for _each_ message (again, on resync only)
> open("/home/merlin/Maildir//.snd/cur/1162956827.15796_57.magic:2,S",
O_RDONLY|O_LARGEFILE) = 3
> fstat64(3, {st_mode=S_IFREG|0600, st_size=1914, ...}) = 0
> open("/home/merlin/Maildir//.snd/cur/1162956827.15796_57.magic:2,S",
O_RDONLY|O_LARGEFILE) = 3
> fstat64(3, {st_mode=S_IFREG|0600, st_size=1914, ...}) = 0
> open("/home/merlin/Maildir//.snd/cur/1162956827.15796_57.magic:2,S",
O_RDONLY|O_LARGEFILE) = 3
> fstat64(3, {st_mode=S_IFREG|0600, st_size=1914, ...}) = 0
> open("/home/merlin/Maildir//.snd/cur/1162956827.15796_57.magic:2,S",
O_RDONLY|O_LARGEFILE) = 3
> fstat64(3, {st_mode=S_IFREG|0600, st_size=1914, ...}) = 0
If I add a second ~h line, I get 8 fstat and open calls _per_ message.
Now, that explains why I was seeing 5mn rsync times.
The unanswered questions are:
1) why can mutt use the header cache to run the ~h rules when the folder
is opened, but not when it's resynced
2) Why does adding a message to a folder cause mutt to ignore its entire
cache and rebuild a brand new one at resync time, scanning all messages
multiple times, when at folder open time, mutt looks smart enough to
use
the cache for already cached messages, and only open/index/scan
whatever
few messages that weren't in the cache already?
3) 4 fstat and open calls per message in the folder? That's obviously at
least 3 too many, but most likely 4 times too many.
>How-To-Repeat:
Take 40,000 message ~/Maildir/.snd folder, open from mutt, send a random
email that gets fcced to that same folder, see how long mutt takes to
reopen the .snd folder.
Repeat operation by adding 10 "color index red black ~h fooheader" rules
in muttrc
Verify fileops with strace -e trace=file mutt
>Fix:
my current workaround has been to remove all ~h color rules.
Fixes would be:
1) mutt should never open a stat the same file 4 times in a row
2) mutt resync code looks very suboptimal compared to folder open code.
If it were to only parse new messages in the folder (messages not in
the cache), it would effectively fix this problem.
}}}
Comment:
This looks like a duplicate of #1216, and should be fixed by
[73894f3f1943].
--
Ticket URL: <http://dev.mutt.org/trac/ticket/2870#comment:4>