<<< Date Index >>>     <<< Thread Index >>>

Re: [Mutt] #2870: color index foo foo ~h pattern causes many



#2870: color index foo foo ~h pattern causes many fileops on resync only

Changes (by brendan):

  * status:  new => closed
  * resolution:  => duplicate

Old description:

> {{{
> Executive summary:
> I am using header_cache="~/Maildir/" and set maildir_header_cache_verify
> = no
> Each "color index ~h" rule does not cause any stats in the Maildir/.snd/
> directory when I _open_ the folder (apparently, the header cache is used
> for that).
> However each of my sent messages is stated and opened four times when
> mutt resyncs the already opened folder, that is 4 times per ~h line that
> is processed from my muttrc
>
> Details:
> I had a bunch of rules like these:
> color index     brightgreen     black           '! ~p'
> color index     black           brightblue      '~C afscv'
> color index     brightwhite     black           '~p'
>
> Those rules are ok, mutt processes them quickly enough (a few seconds)
> whether I open a folder, or mutt resyncs it.
>
> However, rules like these are causing significant delays on auto resyncs
> (i.e. when an open folder is externally modified). 12 rules like the
> ones below caused a resync time of 5mn for a 40,000 message folder
> color index    black           brightyellow    '~h List-Id:.*bugtraq'
> color index    brightblue      black           '~h hits\=3'
> color index    brightblue      brightyellow    '~h post\ from'
>
> mutt can process ~f, ~C and ~p from the header cache, but not ~h. Or at
> least it can't do it on resync apparently (unless the colors are cached
> in the header cache file, which makes for a fast open, but a slow
> resync?)
>
> There is still a clear problem. Just adding a single:
> > color index    black           brightyellow    '~h List-Id:.*bugtraq'
>
> causes 4 stats and opens for _each_ message (again, on resync only)
> > open("/home/merlin/Maildir//.snd/cur/1162956827.15796_57.magic:2,S",
> O_RDONLY|O_LARGEFILE) = 3
> > fstat64(3, {st_mode=S_IFREG|0600, st_size=1914, ...}) = 0
> > open("/home/merlin/Maildir//.snd/cur/1162956827.15796_57.magic:2,S",
> O_RDONLY|O_LARGEFILE) = 3
> > fstat64(3, {st_mode=S_IFREG|0600, st_size=1914, ...}) = 0
> > open("/home/merlin/Maildir//.snd/cur/1162956827.15796_57.magic:2,S",
> O_RDONLY|O_LARGEFILE) = 3
> > fstat64(3, {st_mode=S_IFREG|0600, st_size=1914, ...}) = 0
> > open("/home/merlin/Maildir//.snd/cur/1162956827.15796_57.magic:2,S",
> O_RDONLY|O_LARGEFILE) = 3
> > fstat64(3, {st_mode=S_IFREG|0600, st_size=1914, ...}) = 0
>
> If I add a second ~h line, I get 8 fstat and open calls _per_ message.
> Now, that explains why I was seeing 5mn rsync times.
>
> The unanswered questions are:
> 1) why can mutt use the header cache to run the ~h rules when the folder
>    is opened, but not when it's resynced
>
> 2) Why does adding a message to a folder cause mutt to ignore its entire
>    cache and rebuild a brand new one at resync time, scanning all
> messages
>    multiple times, when at folder open time, mutt looks smart enough to
> use
>    the cache for already cached messages, and only open/index/scan
> whatever
>    few messages that weren't in the cache already?
>
> 3) 4 fstat and open calls per message in the folder? That's obviously at
>    least 3 too many, but most likely 4 times too many.
>
> >How-To-Repeat:
> Take 40,000 message ~/Maildir/.snd folder, open from mutt, send a random
> email that gets fcced to that same folder, see how long mutt takes to
> reopen the .snd folder.
> Repeat operation by adding 10 "color index red black ~h fooheader" rules
> in muttrc
> Verify fileops with strace -e trace=file mutt
> >Fix:
> my current workaround has been to remove all ~h color rules.
>
> Fixes would be:
> 1) mutt should never open a stat the same file 4 times in a row
> 2) mutt resync code looks very suboptimal compared to folder open code.
>    If it were to only parse new messages in the folder (messages not in
>    the cache), it would effectively fix this problem.
> }}}

New description:

 {{{
 Executive summary:
 I am using header_cache="~/Maildir/" and set maildir_header_cache_verify =
 no
 Each "color index ~h" rule does not cause any stats in the Maildir/.snd/
 directory when I _open_ the folder (apparently, the header cache is used
 for that).
 However each of my sent messages is stated and opened four times when
 mutt resyncs the already opened folder, that is 4 times per ~h line that
 is processed from my muttrc

 Details:
 I had a bunch of rules like these:
 color index     brightgreen     black           '! ~p'
 color index     black           brightblue      '~C afscv'
 color index     brightwhite     black           '~p'

 Those rules are ok, mutt processes them quickly enough (a few seconds)
 whether I open a folder, or mutt resyncs it.

 However, rules like these are causing significant delays on auto resyncs
 (i.e. when an open folder is externally modified). 12 rules like the
 ones below caused a resync time of 5mn for a 40,000 message folder
 color index    black           brightyellow    '~h List-Id:.*bugtraq'
 color index    brightblue      black           '~h hits\=3'
 color index    brightblue      brightyellow    '~h post\ from'

 mutt can process ~f, ~C and ~p from the header cache, but not ~h. Or at
 least it can't do it on resync apparently (unless the colors are cached
 in the header cache file, which makes for a fast open, but a slow resync?)

 There is still a clear problem. Just adding a single:
 > color index    black           brightyellow    '~h List-Id:.*bugtraq'

 causes 4 stats and opens for _each_ message (again, on resync only)
 > open("/home/merlin/Maildir//.snd/cur/1162956827.15796_57.magic:2,S",
 O_RDONLY|O_LARGEFILE) = 3
 > fstat64(3, {st_mode=S_IFREG|0600, st_size=1914, ...}) = 0
 > open("/home/merlin/Maildir//.snd/cur/1162956827.15796_57.magic:2,S",
 O_RDONLY|O_LARGEFILE) = 3
 > fstat64(3, {st_mode=S_IFREG|0600, st_size=1914, ...}) = 0
 > open("/home/merlin/Maildir//.snd/cur/1162956827.15796_57.magic:2,S",
 O_RDONLY|O_LARGEFILE) = 3
 > fstat64(3, {st_mode=S_IFREG|0600, st_size=1914, ...}) = 0
 > open("/home/merlin/Maildir//.snd/cur/1162956827.15796_57.magic:2,S",
 O_RDONLY|O_LARGEFILE) = 3
 > fstat64(3, {st_mode=S_IFREG|0600, st_size=1914, ...}) = 0

 If I add a second ~h line, I get 8 fstat and open calls _per_ message.
 Now, that explains why I was seeing 5mn rsync times.

 The unanswered questions are:
 1) why can mutt use the header cache to run the ~h rules when the folder
    is opened, but not when it's resynced

 2) Why does adding a message to a folder cause mutt to ignore its entire
    cache and rebuild a brand new one at resync time, scanning all messages
    multiple times, when at folder open time, mutt looks smart enough to
 use
    the cache for already cached messages, and only open/index/scan
 whatever
    few messages that weren't in the cache already?

 3) 4 fstat and open calls per message in the folder? That's obviously at
    least 3 too many, but most likely 4 times too many.

 >How-To-Repeat:
 Take 40,000 message ~/Maildir/.snd folder, open from mutt, send a random
 email that gets fcced to that same folder, see how long mutt takes to
 reopen the .snd folder.
 Repeat operation by adding 10 "color index red black ~h fooheader" rules
 in muttrc
 Verify fileops with strace -e trace=file mutt
 >Fix:
 my current workaround has been to remove all ~h color rules.

 Fixes would be:
 1) mutt should never open a stat the same file 4 times in a row
 2) mutt resync code looks very suboptimal compared to folder open code.
    If it were to only parse new messages in the folder (messages not in
    the cache), it would effectively fix this problem.
 }}}

Comment:

 This looks like a duplicate of #1216, and should be fixed by
 [73894f3f1943].

-- 
Ticket URL: <http://dev.mutt.org/trac/ticket/2870#comment:4>