Re: [PATCH] generic spam detection
v3 is attached.
* On 2004.07.14, in <20040715012321.GW24127@xxxxxxxxxxxxxxxxx>,
* "David Champion" <dgc@xxxxxxxxxxxx> wrote:
>
> I think that "" should not be the default, but I'm split evenly on
> whether it should be unset or "," (or the like).
I changed the default $spam_separator to "," for this version of the
patch -- just for variety, in case anyone is trying this and feels that
it makes a better default.
> It's surprising to me that anyone would want to folder-hook these --
> my original thought was that spam patterns would remain the same for
> all folders, and it seems strange that one folder might use different
> external spam engines than another. But perhaps this is for a non-spam
> application of the functionality. More below.
Another note on this topic: notice that HEADER->env->spam is only
updated during folder reads. This means that folder-hook is the *only*
circumstance in which it makes any sense to change/add/remove spam or
nospam lists, or $spam_separator, during runtime. I overlooked this
before, in fact: for $spam_separator, R_NONE is a suitable flag because
changing $spam_separator won't update any spam tags until the mailbox is
reread anyway, and that already implies a full redraw.
This isn't strictly necessary; it's just an optimization that avoids
assembling lists into strings on the fly as the index is rendered. But
it seems like a wise one.
> > It would be consistent with behavior of other list management
> > functions in mutt if spam would go through the nospam list and
> > remove a possibly identical regular expression (in addition to
> > "spamming" something), and if "nospam" would go through the spam
> > list and remove anything matching from that one.
>
> Agreed. This is doable, and I'm willing to extend the patch to cover
> this. It shouldn't be hard.
This is done. Suppose the following four rules:
spam aaa aaa # if "aaa" matches, add "aaa" into the spam tag
spam bbb bbb # if "bbb" matches, add "bbb" into the spam tag
nospam aaabbb # a special-case exception to "spam aaa"
nospam bbbaaa # a special-case exception to "spam bbb"
Then:
spam bbbaaa # removes the second "nospam" above
nospam aaa # removes the first "spam" above
Additionally, adding a "spam" rule whose pattern already exists will now
update it to use the new template. (Formerly this was a silent no-op.)
So:
spam bbb ccc # changes "spam" rule #2.
The result is effectively:
spam bbb ccc # if "bbb" matches, add "ccc" into the spam tag
nospam aaabbb # an exception to "spam bbb"
Also, "nospam *" removes all spam and nospam entries. (No point leaving
nospam rules when removing all spam rules.) This will be useful in
default folder-hooks, for resetting both spam and nospam lists to empty.
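The list semantics described above can be modelled with two singly linked lists. What follows is a minimal C sketch, not code from the patch: `struct rule`, `do_spam()` and `do_nospam()` are hypothetical names, patterns are compared as plain strings (case-sensitively) instead of compiled regexps, and `do_spam()` with a NULL template stands for a one-argument `spam` command.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for SPAM_LIST and RX_LIST entries: patterns are
 * plain strings, and templ is NULL for nospam entries. */
struct rule {
  char *pattern;
  char *templ;
  struct rule *next;
};

static char *dup_or_null (const char *s)
{
  char *d;

  if (!s)
    return NULL;
  if ((d = malloc (strlen (s) + 1)) != NULL)
    strcpy (d, s);
  return d;
}

/* Free the entry at *link and splice its successor into its place. */
static void drop (struct rule **link)
{
  struct rule *dead = *link;

  *link = dead->next;
  free (dead->pattern);
  free (dead->templ);
  free (dead);
}

/* Remove every entry whose pattern equals pat; return how many went. */
static int remove_rules (struct rule **list, const char *pat)
{
  int n = 0;

  while (*list)
  {
    if (strcmp ((*list)->pattern, pat) == 0)
    {
      drop (list);
      n++;
    }
    else
      list = &(*list)->next;
  }
  return n;
}

/* Update the template of an identical pattern, or append a new entry. */
static int upsert_rule (struct rule **list, const char *pat, const char *templ)
{
  struct rule *r;
  char *t = dup_or_null (templ);

  if (templ && !t)
    return -1;
  for (; *list; list = &(*list)->next)
    if (strcmp ((*list)->pattern, pat) == 0)
    {
      free ((*list)->templ);
      (*list)->templ = t;
      return 0;
    }
  if (!(r = calloc (1, sizeof *r)) || !(r->pattern = dup_or_null (pat)))
  {
    free (r);
    free (t);
    return -1;
  }
  r->templ = t;
  *list = r;                    /* *list was the NULL tail link */
  return 0;
}

/* "spam PAT TEMPL" adds or updates a rule; "spam PAT" (templ == NULL)
 * only removes an identical nospam exception. */
int do_spam (struct rule **spam, struct rule **nospam,
             const char *pat, const char *templ)
{
  if (templ)
    return upsert_rule (spam, pat, templ);
  remove_rules (nospam, pat);
  return 0;
}

/* "nospam *" empties both lists; "nospam PAT" removes an identical spam
 * rule if there is one, and otherwise records PAT as an exception. */
int do_nospam (struct rule **spam, struct rule **nospam, const char *pat)
{
  if (strcmp (pat, "*") == 0)
  {
    while (*spam)
      drop (spam);
    while (*nospam)
      drop (nospam);
    return 0;
  }
  if (remove_rules (spam, pat) > 0)
    return 0;
  return upsert_rule (nospam, pat, NULL);
}
```

Replaying the seven commands from the example (from "spam aaa aaa" through "spam bbb ccc") through these functions leaves one spam rule (bbb, template ccc) and one nospam rule (aaabbb), which is the result shown above.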
> > That said, I'm wondering if the spam/nospam stuff shouldn't reuse
> > the current hook framework; we'd then have "unhook" as a
> > coarse-grained mechanism to clean up the situation.
> >
> > spam-hook, ham-hook?
>
> The hook framework would need to be extended to handle backreferences,
> I think, but this seems like it would work. It has the advantage you
> mention, but I think it still requires a new datatype (e.g. spam_list_t)
> to map the hook pattern to a spam tag template, and most of the
> functionality in parse_spam_list() would be duplicated anyway inside
> mutt_parse_hook(). It seems like the primary advantage of changing
> it would be in the extent to which it makes the user experience
> more consistent. But it's not clear to me whether it would be more
> consistent, particularly. I wonder what others think of this.
I don't mean to remove this option from discussion, I just felt that the
other approach was something I could address today.
--
-D. dgc@xxxxxxxxxxxx NSIT::ENSS
--- mutt-1.5.6/PATCHES~ never
+++ mutt-1.5.6/PATCHES Thu Jul 15 01:46:16 CDT 2004
@@ -1,0 +1 @@
+patch-1.5.6.dgc.hormel.3
diff -ur mutt-1.5.6-base/commands.c mutt-1.5.6-hormel.3/commands.c
--- mutt-1.5.6-base/commands.c Sun Feb 1 11:10:57 2004
+++ mutt-1.5.6-hormel.3/commands.c Sun Jul 11 22:19:55 2004
@@ -501,9 +501,9 @@
int method = Sort; /* save the current method in case of abort */
switch (mutt_multi_choice (reverse ?
- _("Rev-Sort (d)ate/(f)rm/(r)ecv/(s)ubj/t(o)/(t)hread/(u)nsort/si(z)e/s(c)ore?: ") :
- _("Sort (d)ate/(f)rm/(r)ecv/(s)ubj/t(o)/(t)hread/(u)nsort/si(z)e/s(c)ore?: "),
- _("dfrsotuzc")))
+ _("Rev-Sort (d)ate/(f)rm/(r)ecv/(s)ubj/t(o)/(t)hread/(u)nsort/si(z)e/s(c)ore/s(p)am?: ") :
+ _("Sort (d)ate/(f)rm/(r)ecv/(s)ubj/t(o)/(t)hread/(u)nsort/si(z)e/s(c)ore/s(p)am?: "),
+ _("dfrsotuzcp")))
{
case -1: /* abort - don't resort */
return -1;
@@ -542,6 +542,10 @@
case 9: /* s(c)ore */
Sort = SORT_SCORE;
+ break;
+
+ case 10: /* s(p)am */
+ Sort = SORT_SPAM;
break;
}
if (reverse)
diff -ur mutt-1.5.6-base/doc/manual.sgml.head mutt-1.5.6-hormel.3/doc/manual.sgml.head
--- mutt-1.5.6-base/doc/manual.sgml.head Sun Feb 1 11:49:53 2004
+++ mutt-1.5.6-hormel.3/doc/manual.sgml.head Thu Jul 15 01:08:27 2004
@@ -1492,6 +1492,106 @@
removed. The pattern ``*'' is a special token which means to clear the list
of all score entries.
+<sect1>Spam detection<label id="spam">
+<p>
+Usage: <tt/spam/ <em/pattern/ <em/format/
+Usage: <tt/nospam/ <em/pattern/
+
+Mutt has generalized support for external spam-scoring filters.
+By defining your spam patterns with the <tt/spam/ and <tt/nospam/
+commands, you can <em/limit/, <em/search/, and <em/sort/ your
+mail based on its spam attributes, as determined by the external
+filter. You also can display the spam attributes in your index
+display using the <tt/%H/ selector in the <ref id="index_format"
+name="$index_format"> variable. (Tip: try <tt/%?H?[%H] ?/
+to display spam tags only when they are defined for a given message.)
+
+Your first step is to define your external filter's spam patterns using
+the <tt/spam/ command. <em/pattern/ should be a regular expression
+that matches a header in a mail message. If any message in the mailbox
+matches this regular expression, it will receive a ``spam tag'' or
+``spam attribute'' (unless it also matches a <tt/nospam/ pattern -- see
+below.) The appearance of this attribute is entirely up to you, and is
+governed by the <em/format/ parameter. <em/format/ can be any static
+text, but it also can include back-references from the <em/pattern/
+expression. (A regular expression ``back-reference'' refers to a
+sub-expression contained within parentheses.) <tt/%1/ is replaced with
+the first back-reference in the regex, <tt/%2/ with the second, etc.
+
+If you're using multiple spam filters, a message can have more than
+one spam-related header. You can define <tt/spam/ patterns for each
+filter you use. If a message matches two or more of these patterns, and
+the $spam_separator variable is set to a string, then the
+message's spam tag will consist of all the <em/format/ strings joined
+together, with the value of $spam_separator separating
+them.
+
+For example, suppose I use DCC, SpamAssassin, and PureMessage. I might
+define these spam settings:
+<tscreen><verb>
+spam "X-DCC-.*-Metrics:.*(....)=many" "90+/DCC-%1"
+spam "X-Spam-Status: Yes" "90+/SA"
+spam "X-PerlMX-Spam: .*Probability=([0-9]+)%" "%1/PM"
+set spam_separator=", "
+</verb></tscreen>
+
+If I then received a message that DCC registered with ``many'' hits
+under the ``Fuz2'' checksum, and that PureMessage registered with a
+97% probability of being spam, that message's spam tag would read
+<tt>90+/DCC-Fuz2, 97/PM</tt>. (The four characters before ``=many'' in a
+DCC report indicate the checksum used -- in this case, ``Fuz2''.)
+
+If the $spam_separator variable is unset, then each
+spam pattern match supersedes the previous one. Instead of getting
+joined <em/format/ strings, you'll get only the last one to match.
+
+The spam tag is what will be displayed in the index when you use
+<tt/%H/ in the <tt/$index_format/ variable. It's also the
+string that the <tt/~H/ pattern-matching expression matches against for
+<em/search/ and <em/limit/ functions. And it's what sorting by spam
+attribute will use as a sort key.
+
+That's a pretty complicated example, and most people's actual
+environments will have only one spam filter. The simpler your
+configuration, the more effective mutt can be, especially when it comes
+to sorting.
+
+Generally, when you sort by spam tag, mutt will sort <em/lexically/ --
+that is, by ordering strings alphanumerically. However, if a spam tag
+begins with a number, mutt will sort numerically first, and lexically
+only when two numbers are equal in value. (This is like UNIX's
+<tt/sort -n/.) A message with no spam attributes at all -- that is, one
+that didn't match <em/any/ of your <tt/spam/ patterns -- is sorted at
+lowest priority. Numbers are sorted next, beginning with 0 and ranging
+upward. Finally, non-numeric strings are sorted, with ``a'' taking lower
+priority than ``z''. Clearly, in general, sorting by spam tags is most
+effective when you can coerce your filter to give you a raw number. But
+in case you can't, mutt can still do something useful.
+
+The <tt/nospam/ command can be used to write exceptions to <tt/spam/
+patterns. If a header pattern matches something in a <tt/spam/ command,
+but you nonetheless do not want it to receive a spam tag, you can list a
+more precise pattern under a <tt/nospam/ command.
+
+If the <em/pattern/ given to <tt/nospam/ is exactly the same as the
+<em/pattern/ on an existing <tt/spam/ list entry, the effect will be to
+remove the entry from the spam list, instead of adding an exception.
+Likewise, if a <tt/spam/ command is given a <em/pattern/ but no
+<em/format/, and that <em/pattern/ is the same as an entry on the
+<tt/nospam/ list, that <tt/nospam/ entry will be removed. If the
+<em/pattern/ for <tt/nospam/ is ``*'', <em/all entries on both lists/
+will be removed. This is a convenient default action if you use
+<tt/spam/ and <tt/nospam/ in conjunction with a <tt/folder-hook/.
+
+You can have as many <tt/spam/ or <tt/nospam/ commands as you like.
+You can even do your own primitive spam detection within mutt -- for
+example, if you consider all mail from <tt/MAILER-DAEMON/ to be spam,
+you can use a <tt/spam/ command like this:
+
+<tscreen><verb>
+spam "^From: .*MAILER-DAEMON" "999"
+</verb></tscreen>
+
+
<sect1>Setting variables<label id="set">
<p>
Usage: <tt/set/ [no|inv]<em/variable/[=<em/value/] [ <em/variable/ ... ]<newline>
@@ -1759,6 +1859,7 @@
~f USER messages originating from USER
~g cryptographically signed messages
~G cryptographically encrypted messages
+~H EXPR messages with a spam attribute matching EXPR
~h EXPR messages which contain EXPR in the message header
~k message contains PGP key material
~i ID message which match ID in the ``Message-ID'' field
@@ -2390,7 +2491,7 @@
<sect1>Start a WWW Browser on URLs (EXTERNAL)<label id="urlview">
<p>
-If a message contains URLs (<em/unified ressource locator/ = address in the
+If a message contains URLs (<em/unified resource locator/ = address in the
WWW space like <em>http://www.mutt.org/</em>), it is efficient to get
a menu with all the URLs and start a WWW browser on one of them. This
functionality is provided by the external urlview program which can be
@@ -3053,6 +3154,10 @@
<tt><ref id="set" name="unset"></tt> <em/variable/ [<em/variable/ ... ]
<item>
<tt><ref id="source" name="source"></tt> <em/filename/
+<item>
+<tt><ref id="spam" name="spam"></tt> <em/pattern/ <em/format/
+<item>
+<tt><ref id="spam" name="nospam"></tt> <em/pattern/
<item>
<tt><ref id="lists" name="subscribe"></tt> <em/address/ [ <em/address/ ... ]
<item>
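To make the back-reference rules in the manual text above concrete, here is a minimal C sketch using POSIX `<regex.h>`. It assumes extended, case-insensitive regexps (`REG_EXTENDED | REG_ICASE`); `expand_template()` and `match_and_expand()` are hypothetical helpers, not functions from the patch. One deliberate difference: a `%` not followed by a digit is copied literally here, whereas the patch's atoi()-based loop would treat it as `%0` (the whole match).

```c
#include <ctype.h>
#include <regex.h>
#include <stddef.h>

/* Copy templ into out, replacing %N with capture group N of s (group 0
 * is the whole match). Groups that did not participate expand to
 * nothing, and the result is truncated to fit in outlen bytes. */
size_t expand_template (const char *templ, const char *s,
                        const regmatch_t *m, size_t nm,
                        char *out, size_t outlen)
{
  size_t len = 0;
  const char *p = templ;

  if (outlen == 0)
    return 0;
  while (*p)
  {
    if (*p == '%' && isdigit ((unsigned char) p[1]))
    {
      size_t n = 0;
      regoff_t i;

      for (++p; isdigit ((unsigned char) *p); ++p)
        n = n * 10 + (size_t) (*p - '0');
      if (n < nm && m[n].rm_so >= 0)
        for (i = m[n].rm_so; i < m[n].rm_eo && len + 1 < outlen; i++)
          out[len++] = s[i];
    }
    else
    {
      if (len + 1 < outlen)
        out[len++] = *p;
      ++p;
    }
  }
  out[len] = '\0';
  return len;
}

/* Hypothetical helper: returns 1 if line matches pattern (out then holds
 * the expanded template), 0 if it does not, -1 if pattern is invalid. */
int match_and_expand (const char *pattern, const char *templ,
                      const char *line, char *out, size_t outlen)
{
  regex_t rx;
  regmatch_t m[10];
  int rc;

  if (regcomp (&rx, pattern, REG_EXTENDED | REG_ICASE) != 0)
    return -1;
  rc = regexec (&rx, line, 10, m, 0);
  regfree (&rx);
  if (rc != 0)
    return 0;
  expand_template (templ, line, m, 10, out, outlen);
  return 1;
}
```

Fed the made-up headers `X-DCC-Example-Metrics: host 1234; Body=1 Fuz1=1 Fuz2=many` and `X-PerlMX-Spam: Gauge=IIIIIII, Probability=97%, Report=BAYES_90`, the DCC and PureMessage rules from the manual expand to `90+/DCC-Fuz2` and `97/PM`; joined with `, ` they give the tag quoted in the manual.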
diff -ur mutt-1.5.6-base/doc/muttrc.man.head mutt-1.5.6-hormel.3/doc/muttrc.man.head
--- mutt-1.5.6-base/doc/muttrc.man.head Sun Feb 1 11:15:18 2004
+++ mutt-1.5.6-hormel.3/doc/muttrc.man.head Mon Jul 12 01:15:49 2004
@@ -336,6 +336,15 @@
\fBsource\fP \fIfilename\fP
The given file will be evaluated as a configuration file.
.TP
+.nf
+\fBspam\fP \fIpattern\fP \fIformat\fP
+\fBnospam\fP \fIpattern\fP
+.fi
+These commands define spam-detection patterns from external spam
+filters, so that mutt can sort, limit, and search on
+``spam tags'' or ``spam attributes'', or display them
+in the index. See the Mutt manual for details.
+.TP
\fBunhook\fP [\fB * \fP | \fIhook-type\fP ]
This command will remove all hooks of a given type, or all hooks
when \(lq\fB*\fP\(rq is used as an argument. \fIhook-type\fP
@@ -384,6 +393,7 @@
~f \fIEXPR\fP messages originating from \fIEXPR\fP
~g PGP signed messages
~G PGP encrypted messages
+~H \fIEXPR\fP messages with spam tags matching \fIEXPR\fP
~h \fIEXPR\fP messages which contain \fIEXPR\fP in the message header
~k message contains PGP key material
~i \fIEXPR\fP message which match \fIEXPR\fP in the \(lqMessage-ID\(rq field
diff -ur mutt-1.5.6-base/globals.h mutt-1.5.6-hormel.3/globals.h
--- mutt-1.5.6-base/globals.h Sun Feb 1 11:15:17 2004
+++ mutt-1.5.6-hormel.3/globals.h Sun Jul 11 02:16:57 2004
@@ -102,6 +102,7 @@
WHERE char *Signature;
WHERE char *SimpleSearch;
WHERE char *Spoolfile;
+WHERE char *SpamSep;
#if defined(USE_SSL) || defined(USE_NSS)
WHERE char *SslCertFile INITVAL (NULL);
WHERE char *SslEntropyFile INITVAL (NULL);
@@ -125,6 +126,8 @@
WHERE RX_LIST *Alternates INITVAL(0);
WHERE RX_LIST *MailLists INITVAL(0);
WHERE RX_LIST *SubscribedLists INITVAL(0);
+WHERE SPAM_LIST *SpamList INITVAL(0);
+WHERE RX_LIST *NoSpamList INITVAL(0);
/* bit vector for boolean variables */
#ifdef MAIN_C
diff -ur mutt-1.5.6-base/hdrline.c mutt-1.5.6-hormel.3/hdrline.c
--- mutt-1.5.6-base/hdrline.c Sun Feb 1 11:15:17 2004
+++ mutt-1.5.6-hormel.3/hdrline.c Sun Jul 11 02:16:57 2004
@@ -433,6 +433,18 @@
optional = 0;
break;
+ case 'H':
+ /* (Hormel) spam score */
+ if (optional)
+ optional = hdr->env->spam ? 1 : 0;
+
+ if (hdr->env->spam)
+ mutt_format_s (dest, destlen, prefix, NONULL (hdr->env->spam->data));
+ else
+ mutt_format_s (dest, destlen, prefix, "");
+
+ break;
+
case 'i':
mutt_format_s (dest, destlen, prefix, hdr->env->message_id ?
hdr->env->message_id : "<no.id>");
break;
diff -ur mutt-1.5.6-base/init.c mutt-1.5.6-hormel.3/init.c
--- mutt-1.5.6-base/init.c Sun Feb 1 12:21:00 2004
+++ mutt-1.5.6-hormel.3/init.c Thu Jul 15 01:11:04 2004
@@ -366,6 +365,112 @@
}
+static int add_to_spam_list (SPAM_LIST **list, const char *pat, const char *templ, BUFFER *err)
+{
+ SPAM_LIST *t = NULL, *last = NULL;
+ REGEXP *rx;
+ int n;
+ const char *p;
+
+ if (!pat || !*pat || !templ)
+ return 0;
+
+ if (!(rx = mutt_compile_regexp (pat, REG_ICASE)))
+ {
+ snprintf (err->data, err->dsize, _("Bad regexp: %s"), pat);
+ return -1;
+ }
+
+ /* check to make sure the item is not already on this list */
+ for (last = *list; last; last = last->next)
+ {
+ if (ascii_strcasecmp (rx->pattern, last->rx->pattern) == 0)
+ {
+ /* Already on the list. Formerly we just skipped this case, but
+ * now we're supporting removals, which means we're supporting
+ * re-adds conceptually. So we probably want this to imply a
+ * removal, then do an add. We can achieve the removal by freeing
+ * the template, and leaving t pointed at the current item.
+ */
+ t = last;
+ safe_free(&t->template);
+ mutt_free_regexp(&rx); /* t keeps its existing, equivalent regexp */
+ break;
+ }
+ if (!last->next)
+ break;
+ }
+
+ /* If t is set, it's pointing into an extant SPAM_LIST* that we want to
+ * update. Otherwise we want to make a new one to link at the list's end.
+ */
+ if (!t)
+ {
+ t = mutt_new_spam_list();
+ t->rx = rx;
+ if (last)
+ last->next = t;
+ else
+ *list = t;
+ }
+
+ /* Now t is the SPAM_LIST* that we want to modify. It is prepared. */
+ t->template = strdup(templ);
+
+ /* find highest match number in template string */
+ t->nmatch = 0;
+ for (p = templ; *p;)
+ {
+ if (*p == '%')
+ {
+ n = atoi(++p);
+ if (n > t->nmatch)
+ t->nmatch = n;
+ while (*p && isdigit((int)*p))
+ ++p;
+ }
+ else
+ ++p;
+ }
+ t->nmatch++; /* match 0 is always the whole expr */
+
+ return 0;
+}
+
+static int remove_from_spam_list (SPAM_LIST **list, const char *pat)
+{
+ SPAM_LIST *spam, *prev;
+ int nremoved = 0;
+
+ /* Being first is a special case. An empty list has nothing to remove. */
+ spam = *list;
+ if (!spam)
+ return 0;
+ if (spam->rx && !mutt_strcmp(spam->rx->pattern, pat))
+ {
+ *list = spam->next;
+ mutt_free_regexp(&spam->rx);
+ safe_free(&spam->template);
+ safe_free(&spam);
+ return 1;
+ }
+
+ prev = spam;
+ for (spam = prev->next; spam;)
+ {
+ if (!mutt_strcmp(spam->rx->pattern, pat))
+ {
+ prev->next = spam->next;
+ mutt_free_regexp(&spam->rx);
+ safe_free(&spam->template);
+ safe_free(&spam);
+ spam = prev->next;
+ ++nremoved;
+ }
+ else
+ spam = spam->next;
+ }
+
+ return nremoved;
+}
+
static void remove_from_list (LIST **l, const char *str)
{
LIST *p, *last = NULL;
@@ -502,6 +607,76 @@
while (MoreArgs (s));
return 0;
+}
+
+static int parse_spam_list (BUFFER *buf, BUFFER *s, unsigned long data, BUFFER *err)
+{
+ BUFFER templ;
+
+ memset(&templ, 0, sizeof(templ));
+
+ /* Insist on at least one parameter */
+ if (!MoreArgs(s))
+ {
+ if (data == M_SPAM)
+ strfcpy(err->data, _("spam: no matching pattern"), err->dsize);
+ else
+ strfcpy(err->data, _("nospam: no matching pattern"), err->dsize);
+ return -1;
+ }
+
+ /* Extract the first token, a regexp */
+ mutt_extract_token (buf, s, 0);
+
+ /* data should be either M_SPAM or M_NOSPAM. M_SPAM is for spam commands. */
+ if (data == M_SPAM)
+ {
+ /* If there's a second parameter, it's a template for the spam tag. */
+ if (MoreArgs(s))
+ {
+ mutt_extract_token (&templ, s, 0);
+
+ /* Add to the spam list. */
+ if (add_to_spam_list (&SpamList, buf->data, templ.data, err) != 0)
+ return -1;
+ }
+
+ /* If not, try to remove from the nospam list. */
+ else
+ {
+ remove_from_rx_list(&NoSpamList, buf->data);
+ }
+
+ return 0;
+ }
+
+ /* M_NOSPAM is for nospam commands. */
+ else if (data == M_NOSPAM)
+ {
+ /* nospam only ever has one parameter. */
+
+ /* "*" is a special case. */
+ if (!mutt_strcmp(buf->data, "*"))
+ {
+ mutt_free_spam_list (&SpamList);
+ mutt_free_rx_list (&NoSpamList);
+ return 0;
+ }
+
+ /* If it's on the spam list, just remove it. */
+ if (remove_from_spam_list(&SpamList, buf->data) != 0)
+ return 0;
+
+ /* Otherwise, add it to the nospam list. */
+ if (add_to_rx_list (&NoSpamList, buf->data, REG_ICASE, err) != 0)
+ return -1;
+
+ return 0;
+ }
+
+ /* This should not happen. */
+ strfcpy(err->data, _("spam: unknown command type (internal error)"), err->dsize);
+ return -1;
}
static int parse_unlist (BUFFER *buf, BUFFER *s, unsigned long data, BUFFER *err)
diff -ur mutt-1.5.6-base/init.h mutt-1.5.6-hormel.3/init.h
--- mutt-1.5.6-base/init.h Sun Feb 1 11:15:17 2004
+++ mutt-1.5.6-hormel.3/init.h Wed Jul 14 20:32:46 2004
@@ -901,6 +901,7 @@
** .dt %E .dd number of messages in current thread
** .dt %f .dd entire From: line (address + real name)
** .dt %F .dd author name, or recipient name if the message is from you
+ ** .dt %H .dd spam attribute(s) of this message
** .dt %i .dd message-id of the current message
** .dt %l .dd number of lines in the message (does not work with maildir,
** mh, and possibly IMAP folders)
@@ -2314,6 +2315,7 @@
** . mailbox-order (unsorted)
** . score
** . size
+ ** . spam
** . subject
** . threads
** . to
@@ -2379,6 +2381,15 @@
** the message whether or not this is the case, as long as the
** non-``$$reply_regexp'' parts of both messages are identical.
*/
+ { "spam_separator", DT_STR, R_NONE, UL &SpamSep, UL "," },
+ /*
+ ** .pp
+ ** ``$spam_separator'' controls what happens when multiple spam headers
+ ** are matched: if unset, each successive header will overwrite any
+ ** previous match's value for the spam label. If set, each successive
+ ** match will append to the previous, using ``$spam_separator'' as a
+ ** separator.
+ */
{ "spoolfile", DT_PATH, R_NONE, UL &Spoolfile, 0 },
/*
** .pp
@@ -2678,6 +2689,7 @@
{ "threads", SORT_THREADS },
{ "to", SORT_TO },
{ "score", SORT_SCORE },
+ { "spam", SORT_SPAM },
{ NULL, 0 }
};
@@ -2696,6 +2708,7 @@
*/
{ "to", SORT_TO },
{ "score", SORT_SCORE },
+ { "spam", SORT_SPAM },
{ NULL, 0 }
};
@@ -2728,6 +2741,7 @@
static int parse_list (BUFFER *, BUFFER *, unsigned long, BUFFER *);
static int parse_rx_list (BUFFER *, BUFFER *, unsigned long, BUFFER *);
+static int parse_spam_list (BUFFER *, BUFFER *, unsigned long, BUFFER *);
static int parse_unlist (BUFFER *, BUFFER *, unsigned long, BUFFER *);
static int parse_rx_unlist (BUFFER *, BUFFER *, unsigned long, BUFFER *);
@@ -2793,6 +2807,8 @@
{ "send-hook", mutt_parse_hook, M_SENDHOOK },
{ "set", parse_set, 0 },
{ "source", parse_source, 0 },
+ { "spam", parse_spam_list, M_SPAM },
+ { "nospam", parse_spam_list, M_NOSPAM },
{ "subscribe", parse_subscribe, 0 },
{ "toggle", parse_set, M_SET_INV },
{ "unalias", parse_unalias, 0 },
diff -ur mutt-1.5.6-base/mutt.h mutt-1.5.6-hormel.3/mutt.h
--- mutt-1.5.6-base/mutt.h Sun Feb 1 11:15:17 2004
+++ mutt-1.5.6-hormel.3/mutt.h Wed Jul 14 20:59:15 2004
@@ -220,6 +220,7 @@
M_ID,
M_BODY,
M_HEADER,
+ M_HORMEL,
M_WHOLE_MSG,
M_SENDER,
M_MESSAGE,
@@ -312,6 +313,9 @@
#define M_SEL_MULTI (1<<1)
#define M_SEL_FOLDER (1<<2)
+/* flags for parse_spam_list */
+#define M_SPAM 1
+#define M_NOSPAM 2
/* boolean vars */
enum
@@ -405,6 +409,7 @@
OPTSIGDASHES,
OPTSIGONTOP,
OPTSORTRE,
+ OPTSPAMSEP,
OPTSTATUSONTOP,
OPTSTRICTTHREADS,
OPTSUSPEND,
@@ -512,10 +517,20 @@
struct rx_list_t *next;
} RX_LIST;
+typedef struct spam_list_t
+{
+ REGEXP *rx;
+ int nmatch;
+ char *template;
+ struct spam_list_t *next;
+} SPAM_LIST;
+
#define mutt_new_list() safe_calloc (1, sizeof (LIST))
#define mutt_new_rx_list() safe_calloc (1, sizeof (RX_LIST))
+#define mutt_new_spam_list() safe_calloc (1, sizeof (SPAM_LIST))
void mutt_free_list (LIST **);
void mutt_free_rx_list (RX_LIST **);
+void mutt_free_spam_list (SPAM_LIST **);
int mutt_matches_ignore (const char *, LIST *);
/* add an element to a list */
@@ -550,6 +565,7 @@
char *supersedes;
char *date;
char *x_label;
+ BUFFER *spam;
LIST *references; /* message references (in reverse order) */
LIST *in_reply_to; /* in-reply-to header content */
LIST *userhdrs; /* user defined headers */
diff -ur mutt-1.5.6-base/muttlib.c mutt-1.5.6-hormel.3/muttlib.c
--- mutt-1.5.6-base/muttlib.c Sun Feb 1 11:15:17 2004
+++ mutt-1.5.6-hormel.3/muttlib.c Sun Jul 11 02:16:57 2004
@@ -1283,6 +1283,60 @@
sleep (s);
}
+/*
+ * Creates and initializes a BUFFER*. If passed an existing BUFFER*,
+ * just initializes. Frees anything already in the buffer.
+ *
+ * Disregards the 'destroy' flag, which seems reserved for caller.
+ * This is bad, but there's no apparent protocol for it.
+ */
+BUFFER * mutt_buffer_init(BUFFER *b)
+{
+ if (!b)
+ {
+ b = malloc(sizeof(BUFFER));
+ if (!b)
+ return NULL;
+ }
+ else
+ {
+ safe_free(&b->data);
+ }
+ memset(b, 0, sizeof(BUFFER));
+ return b;
+}
+
+/*
+ * Creates and initializes a BUFFER*. If passed an existing BUFFER*,
+ * just initializes. Frees anything already in the buffer. Copies in
+ * the seed string.
+ *
+ * Disregards the 'destroy' flag, which seems reserved for caller.
+ * This is bad, but there's no apparent protocol for it.
+ */
+BUFFER * mutt_buffer_from(BUFFER *b, char *seed)
+{
+ if (!seed)
+ return NULL;
+
+ b = mutt_buffer_init(b);
+ b->data = strdup(seed);
+ b->dsize = strlen(seed);
+ b->dptr = b->data + b->dsize; /* casting a pointer to int truncates it on 64-bit hosts */
+ return b;
+}
+
+void mutt_buffer_free(BUFFER **b)
+{
+ if (!b)
+ return;
+ if ((*b)->data)
+ safe_free(&((*b)->data));
+ safe_free(b);
+}
+
void mutt_buffer_addstr (BUFFER* buf, const char* s)
{
mutt_buffer_add (buf, s, mutt_strlen (s));
@@ -1379,6 +1433,21 @@
}
}
+void mutt_free_spam_list (SPAM_LIST **list)
+{
+ SPAM_LIST *p;
+
+ if (!list) return;
+ while (*list)
+ {
+ p = *list;
+ *list = (*list)->next;
+ mutt_free_regexp (&p->rx);
+ safe_free(&p->template);
+ FREE (&p);
+ }
+}
+
int mutt_match_rx_list (const char *s, RX_LIST *l)
{
if (!s) return 0;
@@ -1388,6 +1457,57 @@
if (regexec (l->rx->rx, s, (size_t) 0, (regmatch_t *) 0, (int) 0) == 0)
{
dprint (5, (debugfile, "mutt_match_rx_list: %s matches %s\n", s, l->rx->pattern));
+ return 1;
+ }
+ }
+
+ return 0;
+}
+
+int mutt_match_spam_list (const char *s, SPAM_LIST *l, char *text, int x)
+{
+ static regmatch_t *pmatch = NULL;
+ static int nmatch = 0;
+ int i, n, tlen;
+ char *p;
+
+ if (!s) return 0;
+
+ tlen = 0;
+
+ for (; l; l = l->next)
+ {
+ /* If this pattern needs more matches, expand pmatch. */
+ if (l->nmatch > nmatch)
+ {
+ safe_realloc ((void**) &pmatch, l->nmatch * sizeof(regmatch_t));
+ nmatch = l->nmatch;
+ }
+
+ /* Does this pattern match? */
+ if (regexec (l->rx->rx, s, (size_t) l->nmatch, (regmatch_t *) pmatch, (int) 0) == 0)
+ {
+ dprint (5, (debugfile, "mutt_match_spam_list: %s matches %s\n", s, l->rx->pattern));
+ dprint (5, (debugfile, "mutt_match_spam_list: %d subs\n", l->rx->rx->re_nsub));
+
+ /* Copy template into text, with substitutions. */
+ for (p = l->template; *p && tlen < x - 1;) /* leave room for the NUL */
+ {
+ if (*p == '%')
+ {
+ n = atoi(++p); /* find pmatch index */
+ while (isdigit((unsigned char) *p))
+ ++p; /* skip subst token */
+ for (i = pmatch[n].rm_so; (i < pmatch[n].rm_eo) && (tlen < x - 1); i++)
+ text[tlen++] = s[i];
+ }
+ else
+ {
+ text[tlen++] = *p++;
+ }
+ }
+ text[tlen] = '\0';
+ dprint (5, (debugfile, "mutt_match_spam_list: \"%s\"\n", text));
return 1;
}
}
diff -ur mutt-1.5.6-base/parse.c mutt-1.5.6-hormel.3/parse.c
--- mutt-1.5.6-base/parse.c Wed Nov 5 03:41:33 2003
+++ mutt-1.5.6-hormel.3/parse.c Sun Jul 11 02:16:57 2004
@@ -1267,6 +1267,7 @@
long loc;
int matched;
size_t linelen = LONG_STRING;
+ char buf[LONG_STRING+1];
if (hdr)
{
@@ -1308,6 +1309,49 @@
fseek (f, loc, 0);
break; /* end of header */
+ }
+
+ *buf = '\0';
+
+ if (mutt_match_spam_list(line, SpamList, buf, sizeof(buf)))
+ {
+ if (!mutt_match_rx_list(line, NoSpamList))
+ {
+
+ /* if spam tag already exists, figure out how to amend it */
+ if (e->spam && *buf)
+ {
+ /* If SpamSep defined, append with separator */
+ if (SpamSep)
+ {
+ mutt_buffer_addstr(e->spam, SpamSep);
+ mutt_buffer_addstr(e->spam, buf);
+ }
+
+ /* else overwrite */
+ else
+ {
+ e->spam->dptr = e->spam->data;
+ *e->spam->dptr = '\0';
+ mutt_buffer_addstr(e->spam, buf);
+ }
+ }
+
+ /* spam tag is new, and match expr is non-empty; copy */
+ else if (!e->spam && *buf)
+ {
+ e->spam = mutt_buffer_from(NULL, buf);
+ }
+
+ /* match expr is empty; plug in null string if no existing tag */
+ else if (!e->spam)
+ {
+ e->spam = mutt_buffer_from(NULL, "");
+ }
+
+ if (e->spam && e->spam->data)
+ dprint(5, (debugfile, "p822: spam = %s\n", e->spam->data));
+ }
}
*p = 0;
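The branches in the parse.c hunk above decide how each new match is folded into a message's spam tag. A minimal C sketch of the same decision table, assuming a fixed 256-byte tag instead of a BUFFER and assuming the caller has already rejected lines that match the nospam list; `struct spam_tag` and `amend_spam_tag()` are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for HEADER->env->spam: has_tag records whether
 * any spam rule has matched this message yet. */
struct spam_tag {
  int has_tag;
  char text[256];
};

/* Fold one expanded template (piece) into the tag: the first match sets
 * the tag (even to ""); later non-empty pieces are appended after sep,
 * or replace the tag outright when sep is NULL ($spam_separator unset). */
void amend_spam_tag (struct spam_tag *t, const char *piece, const char *sep)
{
  size_t used;

  if (!t->has_tag)
  {
    t->has_tag = 1;
    snprintf (t->text, sizeof t->text, "%s", piece);
    return;
  }
  if (!*piece)
    return;                     /* an empty expansion keeps the old tag */
  if (!sep)
  {
    snprintf (t->text, sizeof t->text, "%s", piece);
    return;
  }
  used = strlen (t->text);      /* append, truncating at the buffer end */
  snprintf (t->text + used, sizeof t->text - used, "%s%s", sep, piece);
}
```

With `sep` set to `", "`, the pieces `90+/DCC-Fuz2` and `97/PM` accumulate to `90+/DCC-Fuz2, 97/PM`; with `sep` NULL only `97/PM`, the last match, survives.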
diff -ur mutt-1.5.6-base/pattern.c mutt-1.5.6-hormel.3/pattern.c
--- mutt-1.5.6-base/pattern.c Wed Nov 5 03:41:33 2003
+++ mutt-1.5.6-hormel.3/pattern.c Sun Jul 11 02:16:57 2004
@@ -58,6 +58,7 @@
{ 'g', M_CRYPT_SIGN, 0, NULL },
{ 'G', M_CRYPT_ENCRYPT, 0, NULL },
{ 'h', M_HEADER, M_FULL_MSG, eat_regexp },
+ { 'H', M_HORMEL, 0, eat_regexp },
{ 'i', M_ID, 0, eat_regexp },
{ 'k', M_PGP_KEY, 0, NULL },
{ 'L', M_ADDRESS, 0, eat_regexp },
@@ -1045,6 +1046,8 @@
return (pat->not ^ ((h->security & APPLICATION_PGP) && (h->security & PGPKEY)));
case M_XLABEL:
return (pat->not ^ (h->env->x_label && regexec (pat->rx, h->env->x_label, 0, NULL, 0) == 0));
+ case M_HORMEL:
+ return (pat->not ^ (h->env->spam && h->env->spam->data && regexec (pat->rx, h->env->spam->data, 0, NULL, 0) == 0));
case M_DUPLICATED:
return (pat->not ^ (h->thread && h->thread->duplicate_thread));
}
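The new M_HORMEL case in pattern.c applies the `~H` regexp to the computed spam tag, not to the raw headers. A minimal sketch of that test, assuming a precompiled POSIX `regex_t`; `spam_pattern_matches()` is a hypothetical name. Because a missing tag never matches, a negated `~H` also selects messages that have no spam tag at all.

```c
#include <regex.h>
#include <stddef.h>

/* Sketch of the M_HORMEL case: a message with no spam tag (tag == NULL)
 * never matches, and a non-zero negate inverts the result. */
int spam_pattern_matches (const regex_t *rx, const char *tag, int negate)
{
  int hit = tag != NULL && regexec (rx, tag, 0, NULL, 0) == 0;

  return negate ? !hit : hit;
}
```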
diff -ur mutt-1.5.6-base/protos.h mutt-1.5.6-hormel.3/protos.h
--- mutt-1.5.6-base/protos.h Sun Feb 1 11:15:17 2004
+++ mutt-1.5.6-hormel.3/protos.h Sun Jul 11 02:16:57 2004
@@ -32,6 +32,9 @@
HEADER *, format_flag);
int mutt_extract_token (BUFFER *, BUFFER *, int);
+BUFFER * mutt_buffer_init (BUFFER *);
+BUFFER * mutt_buffer_from (BUFFER *, char *);
+void mutt_buffer_free(BUFFER **);
void mutt_buffer_add (BUFFER*, const char*, size_t);
void mutt_buffer_addstr (BUFFER*, const char*);
void mutt_buffer_addch (BUFFER*, char);
@@ -291,6 +294,7 @@
int mutt_is_valid_mailbox (const char *);
int mutt_lookup_mime_type (BODY *, const char *);
int mutt_match_rx_list (const char *, RX_LIST *);
+int mutt_match_spam_list (const char *, SPAM_LIST *, char *, int);
int mutt_messages_in_thread (CONTEXT *, HEADER *, int);
int mutt_multi_choice (char *prompt, char *letters);
int mutt_needs_mailcap (BODY *);
diff -ur mutt-1.5.6-base/sort.c mutt-1.5.6-hormel.3/sort.c
--- mutt-1.5.6-base/sort.c Sun Feb 1 11:10:58 2004
+++ mutt-1.5.6-hormel.3/sort.c Sun Jul 11 23:53:59 2004
@@ -149,6 +149,57 @@
return (SORTCODE ((*ha)->index - (*hb)->index));
}
+int compare_spam (const void *a, const void *b)
+{
+ HEADER **ppa = (HEADER **) a;
+ HEADER **ppb = (HEADER **) b;
+ char *aptr, *bptr;
+ unsigned long anum, bnum;
+ int ahas, bhas;
+ int result = 0;
+
+ /* Firstly, require spam attributes for both msgs */
+ /* to compare. Determine which msgs have one. */
+ ahas = (*ppa)->env && (*ppa)->env->spam;
+ bhas = (*ppb)->env && (*ppb)->env->spam;
+
+ /* If one msg has a spam attr but the other does not, sort the one */
+ /* without it first (lowest priority). */
+ if (ahas && !bhas)
+ return (SORTCODE(1));
+ if (!ahas && bhas)
+ return (SORTCODE(-1));
+
+ /* Else, if neither has a spam attr, presume equality. Fall back on aux. */
+ if (!ahas && !bhas)
+ {
+ AUXSORT(result, a, b);
+ return (SORTCODE(result));
+ }
+
+
+ /* Both have spam attrs. */
+
+ /* preliminary numeric examination */
+ anum = strtoul((*ppa)->env->spam->data, &aptr, 10);
+ bnum = strtoul((*ppb)->env->spam->data, &bptr, 10);
+
+ /* If aptr or bptr is equal to data, there is no numeric value for */
+ /* that spam attribute. Numeric attrs sort before non-numeric ones, */
+ /* and two non-numeric attrs are compared lexically. */
+ if ((aptr == (*ppa)->env->spam->data) && (bptr == (*ppb)->env->spam->data))
+ return (SORTCODE(strcmp(aptr, bptr)));
+ if (aptr == (*ppa)->env->spam->data)
+ return (SORTCODE(1));
+ if (bptr == (*ppb)->env->spam->data)
+ return (SORTCODE(-1));
+
+ /* Otherwise, we have numeric value for both attrs. Compare them */
+ /* without subtracting, since the difference of two unsigned longs */
+ /* can wrap or overflow an int. If these values are equal, then we */
+ /* first fall back upon string comparison, then upon auxiliary sort. */
+ result = (anum > bnum) - (anum < bnum);
+ if (result == 0)
+ {
+ result = strcmp(aptr, bptr);
+ if (result == 0)
+ AUXSORT(result, a, b);
+ }
+
+ return (SORTCODE(result));
+}
+
sort_t *mutt_get_sort_func (int method)
{
switch (method & SORT_MASK)
@@ -169,6 +220,8 @@
return (compare_to);
case SORT_SCORE:
return (compare_score);
+ case SORT_SPAM:
+ return (compare_spam);
default:
return (NULL);
}
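The ordering described in the manual text (no tag first, then tags with a leading number in numeric order, then purely textual tags lexically) can be sketched as a qsort() comparator. This is a simplified, hypothetical version: `struct msg` stands in for HEADER, there is no $sort_aux fallback or reverse handling, and, like the patch, it relies on strtoul() to detect the leading number (so leading whitespace or a sign would also be accepted).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for HEADER: spam is NULL when no rule matched. */
struct msg {
  const char *spam;
};

/* Classify a tag: 0 = none, 1 = starts with a number, 2 = non-numeric.
 * For ranks 1 and 2, *num and *rest receive the parsed number and the
 * text after it (the whole tag when there is no number). */
static int spam_rank (const char *tag, unsigned long *num, const char **rest)
{
  char *end;

  if (!tag)
    return 0;
  *num = strtoul (tag, &end, 10);
  *rest = end;
  return end == tag ? 2 : 1;
}

int compare_spam_tags (const void *a, const void *b)
{
  const struct msg *ma = a, *mb = b;
  unsigned long na = 0, nb = 0;
  const char *ra = "", *rb = "";
  int ka = spam_rank (ma->spam, &na, &ra);
  int kb = spam_rank (mb->spam, &nb, &rb);

  if (ka != kb)
    return ka < kb ? -1 : 1;
  if (ka == 0)
    return 0;                   /* the patch falls back on $sort_aux here */
  if (ka == 1 && na != nb)
    return na < nb ? -1 : 1;    /* compare, don't subtract: no overflow */
  return strcmp (ra, rb);
}
```

Calling `qsort (msgs, n, sizeof msgs[0], compare_spam_tags)` puts untagged messages first and, for example, orders `5x`, `97/PM`, `100`, `zebra` in that sequence.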
diff -ur mutt-1.5.6-base/sort.h mutt-1.5.6-hormel.3/sort.h
--- mutt-1.5.6-base/sort.h Mon Jan 6 04:25:35 2003
+++ mutt-1.5.6-hormel.3/sort.h Sun Jul 11 21:58:17 2004
@@ -29,9 +29,12 @@
#define SORT_ADDRESS 11
#define SORT_KEYID 12
#define SORT_TRUST 13
-#define SORT_MASK 0xf
-#define SORT_REVERSE (1<<4)
-#define SORT_LAST (1<<5)
+#define SORT_SPAM 14
+/* dgc: Sort & SortAux are shorts, so I'm bumping these bitflags up from
+ * bits 4 & 5 to bits 8 & 9 to make room for more sort keys in the future. */
+#define SORT_MASK 0xff
+#define SORT_REVERSE (1<<8)
+#define SORT_LAST (1<<9)
typedef int sort_t (const void *, const void *);
sort_t *mutt_get_sort_func (int);
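Per the comment in the sort.h hunk, Sort and SortAux are shorts, so the method number and the modifier flags share 16 bits. A small sketch of how a packed value splits apart, reusing the constants from the hunk; `sort_method()` and `sort_is_reversed()` are hypothetical helpers.

```c
/* Constants copied from the sort.h hunk above. */
#define SORT_SPAM    14
#define SORT_MASK    0xff
#define SORT_REVERSE (1<<8)
#define SORT_LAST    (1<<9)

/* The method lives in the low 8 bits and the flags sit above it, so the
 * highest flag (bit 9) still fits comfortably in a 16-bit short. */
int sort_method (short sort)
{
  return sort & SORT_MASK;
}

int sort_is_reversed (short sort)
{
  return (sort & SORT_REVERSE) != 0;
}
```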