
Re: sort-mailbox by spam tag score sorting strangeness



David Champion wrote:
> When the spam marking and sorting was implemented, I had not used
> SpamAssassin except for testing, and it never occurred to me that a
> message might ever receive a numeric spam score less than 0.  (I'm still
> not sure why that makes sense, but it's not for me to rule on -- the
> whole point of mutt's spam design is to deal with the spam software that
> exists without imposing rules and behaviors.)

Well, numbers are just numbers, and while SpamAssassin triggers at 5
there is nothing that prevents a trigger at 0 using positive and
negative numbers.  I thought some Bayes engines did that.  Using zero
as the trigger point actually makes the most sense to me but that is
neither here nor there.

For SA in particular most well known published rules give positive
points only.  If the message triggers a rule then points are added.
If a published rule awarded negative points then spammers would
exploit it.  (This is in fact what happened with the old Habeas rule
and others like it.  It just did not work in practice.)  So all
negative rules now need to be either very small or
unknown to the world.  Local rules are one example.  The Bayes engine
being dynamic is another.  In neither case can the spammer deduce a
pattern that will trigger the negative rule.  But still the negative
points are useful, especially in the case of dynamic engines such as
the Bayes.

> But the result is that the spam sorting code only sorts positive
> numbers as numbers.  Anything else is a string.

Okay.  That explains it.  That makes sense.

> The spam stuff tries to accommodate a variety of counterspam systems,
> including those which give non-numeric spam scores.  As such it actually
> tries to sort first numerically and then alphabetically.  That's why
> you're seeing this behavior only in certain number ranges -- negative
> numbers are in fact getting alpha-sorted against the lexical value of
> whatever they're compared to, while positive numbers are numerically
> sorted only against other positive numbers, and lexically sorted against
> negative ones.  So in the final order, if two adjacent numbers are
> both positive, they'll arrange themselves numerically, but the full
> list is fundamentally lexical since it contains negative values.  It's
> complicated.

A clever heuristic.  It did not quite work out here.  But just the
same I appreciate the effort you put into trying to make it work
across multiple different types of input.
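
To make the numeric-then-lexical idea concrete, here is a minimal
sketch in C of a comparator that also handles negative scores.  It is
not mutt's code: parse_score, compare_scores and sort_scores are
hypothetical names, and putting numeric tags ahead of non-numeric ones
is an assumption made here so that the ordering stays consistent.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse a leading signed integer from s.
 * Returns 1 and stores the value if digits were found, else 0. */
static int parse_score(const char *s, long *out)
{
    char *end;
    long v = strtol(s, &end, 10);

    if (end == s)
        return 0;           /* no numeric prefix at all */
    *out = v;
    return 1;
}

/* qsort comparator over an array of C strings (illustrative only).
 * Numeric tags compare by signed value, non-numeric tags compare
 * lexically, and numeric tags sort before non-numeric ones. */
static int compare_scores(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;
    long na = 0, nb = 0;
    int a_num = parse_score(a, &na);
    int b_num = parse_score(b, &nb);

    if (a_num && b_num) {
        if (na != nb)
            return (na > nb) - (na < nb);   /* avoids overflow */
        return strcmp(a, b);                /* tie-break on text */
    }
    if (a_num != b_num)
        return a_num ? -1 : 1;
    return strcmp(a, b);
}

/* Sort n spam tags in place using the comparator above. */
void sort_scores(const char **tags, size_t n)
{
    qsort(tags, n, sizeof *tags, compare_scores);
}
```

Given the tags 10, -3, High, 2, -12, Low and 0, sort_scores orders
them as -12, -3, 0, 2, 10, High, Low.  Comparing with (a > b) - (a < b)
rather than subtracting keeps the result correct for large values.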

> There's no way to fix this in configuration -- it requires a code
> change.  If you're comfortable recompiling mutt yourself, the change is
> trivial: in sort.c, change both occurrences of "strtoul" to "strtol".

This did not fix the problem for me.  I applied the following
patch to the mutt-1.5.13 source base from Debian sarge-backports.  But
unfortunately there was no visible change to the sorting.  Perhaps
more is needed?  Should I file a flea or put something in the gnats
database?
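
One possible reason the strtoul-to-strtol change made no visible
difference, offered as a guess and not confirmed anywhere in this
thread: standard C strtoul already accepts a leading minus sign and
negates the value in the unsigned type, so "-3" counts as numeric
either way.  The sketch below compares the two subtractions.  The
function names are hypothetical, and it assumes the difference ends up
in an int, which the excerpt of sort.c does not show.

```c
#include <stdlib.h>

/* Difference computed the way the unpatched line does it.
 * Converting an out-of-range unsigned long to int is
 * implementation-defined; common compilers wrap modulo 2^N. */
int diff_unsigned(const char *a, const char *b)
{
    return (int)(strtoul(a, NULL, 10) - strtoul(b, NULL, 10));
}

/* Difference computed the way the patched line does it. */
int diff_signed(const char *a, const char *b)
{
    return (int)(strtol(a, NULL, 10) - strtol(b, NULL, 10));
}

/* Returns 1 if strtoul consumed any characters of s, which is the
 * same test the "aptr equal to data" check relies on. */
int strtoul_sees_number(const char *s)
{
    char *end;

    (void)strtoul(s, &end, 10);
    return end != s;
}
```

On typical two's-complement platforms diff_unsigned("-3", "2") and
diff_signed("-3", "2") both give -5, and strtoul_sees_number("-3")
returns 1.  If that holds in mutt as well, the cause of the odd
ordering would have to lie somewhere other than these two calls.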

Thanks anyway!

Bob

--- original/sort.c 2006-11-21 22:08:21.000000000 -0700
+++ new/sort.c      2006-11-21 22:31:58.000000000 -0700
@@ -183,8 +183,8 @@
   /* Both have spam attrs. */
 
   /* preliminary numeric examination */
-  result = (strtoul((*ppa)->env->spam->data, &aptr, 10) -
-            strtoul((*ppb)->env->spam->data, &bptr, 10));
+  result = (strtol((*ppa)->env->spam->data, &aptr, 10) -
+            strtol((*ppb)->env->spam->data, &bptr, 10));
 
   /* If either aptr or bptr is equal to data, there is no numeric    */
   /* value for that spam attribute. In this case, compare lexically. */