[Mutt] #3396: Spam command is unpredictable
-------------------------+--------------------------------------------------
Reporter: matthijs | Owner: mutt-dev
Type: enhancement | Status: new
Priority: minor | Milestone:
Component: mutt | Version: 1.5.20
Keywords: |
-------------------------+--------------------------------------------------
It took me a while to figure this one out. I have a fairly complex spam
header matching setup, since I want nicely aligned output (and thus have
different matching rules for negative spam scores, for spam scores with
more than one digit and for headers without any spam score at all), but
currently I can't make it work reliably. Here goes.
The manual says "If the $spam_separator variable is unset, then each spam
pattern match supersedes the previous one. Instead of getting joined
format strings, you'll get only the last one to match."
I had always interpreted this as "You'll get the result of the last pattern
that matches any header in the message". However, after reading the
source, I understand that it actually means "You'll get the result of the
last _header_ in the message that matches any pattern". Furthermore, if
multiple patterns match a single header, you'll get the first one to
match, not the last one.
In other words, the order of the headers in the message determines which
match wins, and you can't change that by reordering the spam commands.
Thinking about this, it seems that, if a header occurs twice in a
message, it's better to use the last one that matches. This ensures that
you always use the email scanner that is "closest" to you, e.g., your own
scanner overrides your ISP's. However, when you apply this rule to two
different headers, things get confusing.
So the current behaviour is: use the last header that matches any
pattern, and if multiple patterns match that last header, use the first
of those patterns.
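A minimal C sketch of the current behaviour as described above, under a simplified model: the helper names (`apply_rule`, `spam_tag_current`) are hypothetical and not mutt's own functions, and a plain substring match stands in for mutt's regular expressions, with the text after the match playing the role of a `%1` capture.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one spam rule: instead of a regex, the rule is
 * a literal substring; on a match, the text after it plays the role of %1. */
static const char *apply_rule(const char *header, const char *needle)
{
    const char *p = strstr(header, needle);
    return p != NULL ? p + strlen(needle) : NULL;
}

/* Current behaviour with spam_separator unset, as described in the ticket:
 * headers are visited in message order, each header takes the FIRST rule
 * that matches it, and every matching header overwrites the previous
 * result, so the LAST matching header wins. */
static const char *spam_tag_current(const char *const *headers, size_t n_headers,
                                    const char *const *rules, size_t n_rules)
{
    const char *tag = NULL;

    assert(headers != NULL || n_headers == 0);
    for (size_t h = 0; h < n_headers; h++) {
        for (size_t r = 0; r < n_rules; r++) {
            const char *t = apply_rule(headers[h], rules[r]);
            if (t != NULL) {
                tag = t; /* overwrite whatever an earlier header gave */
                break;   /* ignore later rules for this header */
            }
        }
    }
    return tag;
}
```

With the rules `X-Spam-Score: ` and `X-ISP-Spam: `, a message with `X-Spam-Score: 3.2` followed by `X-ISP-Spam: 7` yields `7`, while swapping the two headers yields `3.2`; reordering the rules does not change either result.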
I would propose the following: use the first pattern that matches any
header. If it matches multiple headers, use the last header that matches.
This changes nothing in the common case of a single pattern. When
multiple patterns matching _different_ headers are used, one can now
prioritize them through the order of the patterns. For multiple patterns
matching the same header nothing changes either: the first pattern that
matches the last header is still used.
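The proposed rule can be written down directly by making the rules the outer loop. This is a hypothetical sketch of the intended semantics, not a patch: `spam_tag_proposed` and `apply_rule` are illustrative names, and the substring match again stands in for mutt's regular expressions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one spam rule: a literal substring match whose
 * trailing text plays the role of a %1 capture. */
static const char *apply_rule(const char *header, const char *needle)
{
    const char *p = strstr(header, needle);
    return p != NULL ? p + strlen(needle) : NULL;
}

/* Proposed behaviour (spam_separator unset): rules are tried in
 * configuration order; the first rule that matches any header wins, and
 * if that rule matches several headers, the last of them is used. */
static const char *spam_tag_proposed(const char *const *headers, size_t n_headers,
                                     const char *const *rules, size_t n_rules)
{
    assert(headers != NULL || n_headers == 0);
    for (size_t r = 0; r < n_rules; r++) {
        const char *tag = NULL;
        for (size_t h = 0; h < n_headers; h++) {
            const char *t = apply_rule(headers[h], rules[r]);
            if (t != NULL)
                tag = t; /* keep the LAST header this rule matches */
        }
        if (tag != NULL)
            return tag;  /* the first rule with any match wins */
    }
    return NULL;
}
```

With the rules `X-Spam-Score: ` and `X-ISP-Spam: `, this returns `3.2` for both header orders `X-Spam-Score: 3.2` / `X-ISP-Spam: 7` and the reverse, and with two `X-Spam-Score:` headers it returns the value from the later one.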
This applies only when spam_separator is unset. If spam_separator is
set, I would leave the behaviour completely unchanged. It might make sense
to change the ordering of matches there as well, so that one could simply
say "If spam_separator is unset, you get the last (or first) part of the
value that you would get with spam_separator set". However, this would
require significant changes to the implementation, since all the
(matching) header lines would have to be kept in memory, whereas the
current implementation just processes them line by line.
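For comparison, the case where spam_separator is set, which the proposal leaves alone, can be modelled the same way: every matching header contributes the result of its first matching rule, in message order, joined by the separator. Under the current behaviour, the unset case is then just the last element of this list. This is a hypothetical, simplified sketch (`spam_tag_joined` and `append` are illustrative names, a substring match stands in for mutt's regular expressions, and truncation to a fixed buffer is a choice of the sketch, not something taken from mutt).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one spam rule (substring match, %1-like tail). */
static const char *apply_rule(const char *header, const char *needle)
{
    const char *p = strstr(header, needle);
    return p != NULL ? p + strlen(needle) : NULL;
}

/* Appends `s` to `out` (current length `len`), truncating to fit. */
static size_t append(char *out, size_t out_size, size_t len, const char *s)
{
    size_t n = strlen(s);
    if (n > out_size - 1 - len)
        n = out_size - 1 - len;
    memcpy(out + len, s, n);
    out[len + n] = '\0';
    return len + n;
}

/* spam_separator set: each matching header, in message order, contributes
 * the tag of its first matching rule; tags are joined with `sep`. */
static void spam_tag_joined(char *out, size_t out_size,
                            const char *const *headers, size_t n_headers,
                            const char *const *rules, size_t n_rules,
                            const char *sep)
{
    size_t len = 0, matches = 0;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    for (size_t h = 0; h < n_headers; h++) {
        for (size_t r = 0; r < n_rules; r++) {
            const char *t = apply_rule(headers[h], rules[r]);
            if (t == NULL)
                continue;
            if (matches++ > 0)
                len = append(out, out_size, len, sep);
            len = append(out, out_size, len, t);
            break; /* only the first matching rule counts per header */
        }
    }
}
```

With the headers `X-ISP-Spam: 7`, `Subject: hi`, `X-Spam-Score: 3.2` and the separator `,`, this produces `7,3.2`; the current unset behaviour corresponds to its last element, `3.2`.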
Note that my proposal needs only minor implementation changes in the
case where spam_separator is unset: in addition to the matching spam
tag, you also keep track of the index of the pattern it resulted from.
When you find a new match in a subsequent header, you only overwrite the
previous match if the pattern index of the new match is lower than or
equal to the pattern index of the previous match (equal, so that the last
header matching the winning pattern is used). AFAICS, this should
implement the above proposal with minimal changes.
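The incremental change can be sketched as follows, again with hypothetical names (`struct spam_state`, `spam_state_feed`) and a substring match standing in for mutt's regular expressions. Only the tag and the index of the rule that produced it are kept between header lines, so headers are still processed one line at a time. The comparison is `r <= st->rule_idx`: with a strict `<`, the first rather than the last header matched by the winning rule would be kept.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical state carried while headers are read one line at a time. */
struct spam_state {
    const char *tag; /* current spam tag, NULL if none yet */
    size_t rule_idx; /* index of the rule that produced `tag` */
};

/* Hypothetical stand-in for one spam rule (substring match, %1-like tail). */
static const char *apply_rule(const char *header, const char *needle)
{
    const char *p = strstr(header, needle);
    return p != NULL ? p + strlen(needle) : NULL;
}

static void spam_state_init(struct spam_state *st)
{
    st->tag = NULL;
    st->rule_idx = SIZE_MAX; /* worse than any real rule index */
}

/* Called once per header line, in message order. */
static void spam_state_feed(struct spam_state *st, const char *header,
                            const char *const *rules, size_t n_rules)
{
    assert(st != NULL && header != NULL);
    for (size_t r = 0; r < n_rules; r++) {
        const char *t = apply_rule(header, rules[r]);
        if (t == NULL)
            continue;
        /* "<=" so that a later header matched by the same best rule
         * replaces the earlier one (last matching header wins). */
        if (r <= st->rule_idx) {
            st->tag = t;
            st->rule_idx = r;
        }
        return; /* only the first rule matching this header counts */
    }
}
```

Feeding `X-Spam-Score: 9.9`, `X-ISP-Spam: 7` and `X-Spam-Score: 1.0` in that order leaves the tag `1.0` from rule 0, the same result as evaluating the rules one by one.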
If you think this is a good idea, let me know and I'll prepare a patch for
this. If not, please add some clarification about how it works now and
close this ticket :-)
--
Ticket URL: <http://dev.mutt.org/trac/ticket/3396>
Mutt <http://www.mutt.org/>
The Mutt mail user agent