<<< Date Index >>>     <<< Thread Index >>>

Re: [Mutt] #2995: Mutt is folding subject line folds using a tab,



Hi,

* Mutt wrote:

>  I don't see why this is so difficult.  Granted the function is hairy
>  (which is a problem in and of itself), but it shouldn't be difficult
>  to simply insert a newline before some whitespace.  However, that's
>  not what it does... it does this:
> 
>        /* Space padding, but only if necessary */
>        if (!first && *cp != '\t' && *cp != ' ')
>        {
>          if (fputc ('\t', fp) == EOF)
>            return -1;
>          col += 8 - (col % 8);
>        }
>      }
> 
>  If you read RFC (2)822, this seems clearly wrong.  The only character
>  you are allowed to insert for line wrapping is a newline, which MUST
>  be inserted before whitespace.

ACK. The algorithm to me seems quite simple and clear (simply insert
CRLF before the word that would make line too long, just as you
described) which is why I don't understand why the function is that
complicated in the first place. Whenever I stumble about code that seems
to complicated for me, I assume it's there for a reason.

> Looking at this function, it appears to me that it has at least 4
> fairly serious problems:

>  1. It's long and hairy.  Everyone who's looked at it has said they
>  would be loathe to try to modify it.  This is a pretty good sign that
>  the function needs to be rewritten and probably broken down into
>  smaller logical pieces.

ACK. I still hesitate to touch it because automated testing is hard in
mutt's current state. Sure, the current code is broken but a solution
should be well-tested before it appears in some official release.

>  2. It uses iswspace() to determine if the next character is a
>  whitespace character.  This is almost certainly wrong; RFC 2822
>  defines WSP in headers as either ' ' or '\t'.  No other characters are
>  considered WSP.  However, iswspace() probably evaluates to true in
>  many other instances in various locales.  Probably a function
>  is_rfc822_wsp() is needed (or a macro, though I really hate them).

ACK. See #2956, iswspace()/isspace() is another bigger issue that needs
to be solved. I vote for simply testing for the characters in question
since the allowed tokens are defined formally.

>  It does not say "a CRLF and arbitrary additional WSP may be inserted
>  before any WSP."   It absolutely doesn't say that you can change
>  whitespace with arbitrary other whitespace.  It says you can insert a
>  CRLF.  That's it.  Wherever WSP exists in the header, you can insert a
>  CRLF before it.  That's not what mutt is doing, and what mutt
>  interacts badly with other software that folds and unfolds correctly.

ACK.

>  This code is wrong...  For the love of all that is good and decent in
>  the world, please fix it, or let someone else fix it. :)

Sure, just go ahead... :) Any help in this area is highly appreciated.

>  FWIW, if I wrote this (which I could possibly do, maybe as soon as
>  this weekend), I think the algorithm should (very) roughly be:
> 
>    1. start at the beginning of value (which is also called cp)
>    2. get the next token (either WSP or MB-character), with the number
>       bytes consumed from value, if there's one to get.
>    3. If the token causes the line length to be greater than wraplen,
>       and there are still chars left in value, or if there are no more
>       characters left in value, insert '\n'.
>    4. Add the token to the ouput buffer (if there was one).
>    5. move the pointer that points to value up by the number of bytes
>       consumed.
>    6. If there are no more characters, print the buffer.

Yes, combined with a flag for display/sending mail.

Rocco