[Mutt] #3396: Spam command is unpredictable



#3396: Spam command is unpredictable
-------------------------+--------------------------------------------------
 Reporter:  matthijs     |       Owner:  mutt-dev
     Type:  enhancement  |      Status:  new     
 Priority:  minor        |   Milestone:          
Component:  mutt         |     Version:  1.5.20  
 Keywords:               |  
-------------------------+--------------------------------------------------
 It took me a while to figure this one out. I have a pretty complex spam
 header matching setup, since I want nicely aligned output (and thus have
 different matching rules for negative spam scores, spam scores with more
 than one digit, and messages without any spam score at all), but I
 currently can't do that reliably. Here goes.

 The manual says "If the $spam_separator variable is unset, then each spam
 pattern match supersedes the previous one. Instead of getting joined
 format strings, you'll get only the last one to match."

 I've always interpreted this as "You'll get the result of the last pattern
 that matches any header in the message". However, after reading the
 source, I've understood this means "You'll get the result of the last
 _header_ in the message that matches any pattern". Furthermore, if
 multiple patterns match a single header, you'll get the first one to
 match, not the last one.

 In other words, this means the order of headers in the messages greatly
 influences the match result and you can't affect that by reordering the
 spam commands.

 Thinking about this, it seems that when a header occurs twice in a
 message, it's better to use the last one that matches. This ensures that
 you always use the email scanner that is "closest" to you, e.g., your own
 scanner overrides the one of your ISP. However, when you apply this to two
 different headers, things get confusing.

 So the current situation is: Use the last header that matches a pattern
 and use the first pattern in case there are multiple patterns that match
 that last header.
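
 To make the current behaviour concrete, here is a small Python model of it
 (not mutt's actual C code). The header names and tags are made up for
 illustration, and format-string expansion is left out; it only shows which
 rule ends up winning when spam_separator is unset.

```python
import re

def current_spam_tag(headers, rules):
    """Model of the behaviour described above with spam_separator unset.

    headers: header lines in message order.
    rules:   (regex, tag) pairs in the order of the spam commands.
    """
    result = None
    for line in headers:                # headers are processed line by line
        for pattern, tag in rules:      # first pattern matching this header wins
            if re.search(pattern, line):
                result = tag            # a later matching header overwrites it
                break
    return result

# Hypothetical headers and rules, for illustration only.
rules = [(r"^X-Local-Spam: ", "local"), (r"^X-ISP-Spam: ", "isp")]
print(current_spam_tag(["X-Local-Spam: 1", "X-ISP-Spam: 7"], rules))  # isp
print(current_spam_tag(["X-ISP-Spam: 7", "X-Local-Spam: 1"], rules))  # local
```

 Reversing the order of the entries in rules gives the same two results,
 which is exactly the problem: only the header order matters.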

 I would propose the following: Use the first pattern that matches any
 header. If it matches multiple headers, use the last header that matches.

 This will change nothing in the common case of a single pattern. When
 multiple patterns matching _different_ headers are used, one can now
 prioritize them using the order of the patterns. For multiple patterns
 matching the same header nothing will change either, they will still use
 the first pattern that matches the last header.
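
 Stated as code, the proposal could look roughly like the following Python
 sketch. It is a specification-style model rather than a patch: it keeps
 all headers in a list, the header names are hypothetical, and it uses
 Python's \1 group syntax where mutt's format strings use %1.

```python
import re

def proposed_spam_tag(headers, rules):
    """Proposed behaviour with spam_separator unset.

    headers: header lines in message order.
    rules:   (regex, template) pairs in the order of the spam commands.
    """
    for pattern, template in rules:          # earlier spam commands take priority
        matches = [m for m in (re.search(pattern, h) for h in headers) if m]
        if matches:
            # Several headers matched this pattern: use the last one.
            return matches[-1].expand(template)
    return None

# Hypothetical headers and rules, for illustration only.
headers = ["X-ISP-Flag: Yes", "X-Spam-Score: 7", "X-Spam-Score: 2"]
rules = [(r"^X-Spam-Score: (\d+)", r"\1/SA"), (r"^X-ISP-Flag: Yes", "90/ISP")]
print(proposed_spam_tag(headers, rules))        # 2/SA
print(proposed_spam_tag(headers, rules[::-1]))  # 90/ISP
```

 Here reordering the rules does change the result, so the order of the spam
 commands can be used to prioritize one header over another.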

 This applies only when spam_separator is unset. If spam_separator is
 set, I would leave the behaviour completely unchanged. It might make sense
 to change the ordering of matches, so it is more obvious to say "If
 spam_separator is unset, it simply takes the last (or first) part of the
 value that you get when spam_separator is set". However, this would
 require significantly changing the implementation by keeping all the
 (matching) header lines in memory, whereas the current implementation just
 processes them line by line.

 Note that the implementation needs only minor changes for my proposal in
 the case where spam_separator is unset: In addition to keeping the
 matching spam tag, you also keep track of the pattern (index) that it
 resulted from. Now, when you find a new match in a subsequent header, only
 overwrite your previous match when the pattern index of the new match is
 lower than or equal to the pattern index of the previous match (equal, so
 that a later header matching the same pattern still wins). AFAICS, this
 should implement the above proposal with minimal changes.
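
 A minimal Python sketch of that single-pass idea follows (again a model,
 not mutt's C code; header names are hypothetical and templates use
 Python's \1 where mutt uses %1). The comparison is written as <= rather
 than strictly lower, so that a later header matching the same pattern
 replaces an earlier one, as the proposal requires.

```python
import re

def streaming_spam_tag(headers, rules):
    """Process headers line by line, remembering only the best match so far."""
    best_index = None   # index of the pattern that produced best_tag
    best_tag = None
    for line in headers:
        for index, (pattern, template) in enumerate(rules):
            m = re.search(pattern, line)
            if m:
                # Overwrite only if this pattern has the same or higher priority.
                if best_index is None or index <= best_index:
                    best_index, best_tag = index, m.expand(template)
                break   # first pattern matching this header, as today
    return best_tag

# Hypothetical headers and rules, for illustration only.
rules = [(r"^X-Spam-Score: (\d+)", r"\1/SA"), (r"^X-ISP-Flag: Yes", "90/ISP")]
print(streaming_spam_tag(
    ["X-Spam-Score: 7", "X-ISP-Flag: Yes", "X-Spam-Score: 2"], rules))  # 2/SA
print(streaming_spam_tag(
    ["X-Spam-Score: 7", "X-ISP-Flag: Yes"], rules))                     # 7/SA
```

 Only one tag and one integer are kept between headers, so nothing about
 the line-by-line processing has to change.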

 If you think this is a good idea, let me know and I'll prepare a patch for
 this. If not, please add some clarification about how it works now and
 close this ticket :-)

-- 
Ticket URL: <http://dev.mutt.org/trac/ticket/3396>
Mutt <http://www.mutt.org/>
The Mutt mail user agent