Re: [Mutt] #2995: Mutt is folding subject line folds using a tab,
#2995: Mutt is folding subject line folds using a tab, which appears to be
Reporter: frnkblk | Owner: mutt-dev
Type: defect | Status: new
Priority: minor | Milestone: 1.6
Component: mutt | Version: 1.5.17
Resolution: | Keywords: header folding white-space
Comment(by Derek Martin):
It is hard to believe this ticket has caused as much difficulty as it
has... or that this bug has existed for almost exactly two years. :)
On Tue, Mar 17, 2009 at 02:18:59PM -0000, Mutt wrote:
This function was added in Mutt-1.5.14, which was released in April of
2007. As far as I can tell, Mutt did not insert a tab before that.
This bug was reported in November of 2007, roughly 7 months later.
Rocco asked why no one had noticed this before then... I think the
most likely explanation is a combination of the fact that until
roughly then, (relatively) not many people were running those releases
of Mutt, with the fact that most humans tend to write short subject
lines (the subject is the header where this matters most, AFAICT). I
didn't spend the time to track it down, but I'd be willing to bet that
the bug was reported shortly after Debian took a version of mutt
containing this change...
I don't see why this is so difficult. Granted the function is hairy
(which is a problem in and of itself), but it shouldn't be difficult
to simply insert a newline before some whitespace. However, that's
not what it does... it does this:
/* Space padding, but only if necessary */
if (!first && *cp != '\t' && *cp != ' ')
if (fputc ('\t', fp) == EOF)
col += 8 - (col % 8);
If you read RFC (2)822, this seems clearly wrong. The only character
you are allowed to insert for line wrapping is a newline, which MUST
be inserted before whitespace. It doesn't say anything about
*changing* existing whitespace. Looking at this function, it appears
to me that it has at least 4 fairly serious problems:
1. It's long and hairy. Everyone who's looked at it has said they
would be loathe to try to modify it. This is a pretty good sign that
the function needs to be rewritten and probably broken down into
smaller logical pieces.
2. It uses iswspace() to determine if the next character is a
whitespace character. This is almost certainly wrong; RFC 2822
defines WSP in headers as either ' ' or '\t'. No other characters are
considered WSP. However, iswspace() probably evaluates to true in
many other instances in various locales. Probably a function
is_rfc822_wsp() is needed (or a macro, though I really hate them).
3. It is inserting a tab, where there was none, and where no RFC says
you should insert one. It is also *removing a space* in many cases.
Who said you could do that?
4. If the newline which was inserted for line wrapping was inserted
correctly in the first place, this code should NEVER execute, since
the next character after the inserted newline (which was from the
original header) MUST be either ' ' or '\t'. RFC 2822 says this:
Each header field is logically a single line of characters
comprising the field name, the colon, and the field body. For
convenience however, and to deal with the 998/78 character
limitations per line, the field body portion of a header field can
be split into a multiple line representation; this is called
"folding".The general rule is that wherever this standard allows
for folding white space (not simply WSP characters), a CRLF may be
inserted before any WSP.
It does not say "a CRLF and arbitrary additional WSP may be inserted
before any WSP." It absolutely doesn't say that you can change
whitespace with arbitrary other whitespace. It says you can insert a
CRLF. That's it. Wherever WSP exists in the header, you can insert a
CRLF before it. That's not what mutt is doing, and what mutt
interacts badly with other software that folds and unfolds correctly.
FWIW Rocco's comments in earlier ticket updates about spaces being
removed between encoded words in RFC 2047 are interesting, but
irrelevant. RFC 2047 says white space between two encoded words must
be ignored to account for the limit of 75 characters of any one
encoded word, i.e. you need to be able to put them together into a
single decoded encoded-word, if that is what they are to represent.
Space between an encoded word and any unencoded word must not be
ignored, as it is genuine space. But we are not dealing with encoded
words in this ticket, so RFC 2047 is not relevant.
This code is wrong... For the love of all that is good and decent in
the world, please fix it, or let someone else fix it. :)
FWIW, if I wrote this (which I could possibly do, maybe as soon as
this weekend), I think the algorithm should (very) roughly be:
1. start at the beginning of value (which is also called cp)
2. get the next token (either WSP or MB-character), with the number
bytes consumed from value, if there's one to get.
3. If the token causes the line length to be greater than wraplen,
and there are still chars left in value, or if there are no more
characters left in value, insert '\n'.
4. Add the token to the ouput buffer (if there was one).
5. move the pointer that points to value up by the number of bytes
6. If there are no more characters, print the buffer.
I think the function should be divided in two: one that does step 2
above (essentially the stuff in the big if block in the exiting
function), and one that does the rest. The block of code I quoted
above should be removed entirely.
A couple of other things to note:
It looks to me like calling foldingstrfcpy() may be meant to fix
this problem. However, the '\t' is written directly to the file,
instead of into the buffer, so it can't fix it. Otherwise, it's not
clear to me why the function is called (strncpy() would seem to
be just as good here). In fact, this seems to do the wrong thing; any
tabs which the user *wanted* (ignoring the fact that this will
essentially never be true) will be converted to spaces.
Someone commented that this code may be shared by both the compose and
the display code. If that's the case, fixing it right probably
requires an option to the function that controls whether the
folded line is indented with a tab or not, for pretty display.
If we're in an encoded word, but we find a '\n' (which is not allowed
in an encoded word), we just assume that we were not really in an
encoded word. This seems possibly wrong, though maybe it isn't. At
the least, it seems like maybe mutt ought to warn the user that the
header may be malformed (though I suppose if it is, they'll figure
it out). Is it possible this could be a worse bug? Space is also not
allowed in an encoded word, but there is no checking of this condition
in that section of the function...
Also while I can see why one might want to pass in the header's value
as a const, there's no need to create the alias cp to value. You can
still increment the pointer even though the stuff it points to is
constant. Once the alias is created, the variable value is never
referred to again... In fact, the way the code is written is
detrimental: Generally, if you try to modify *cp the compiler will
not complain, but your program will either segfault or let you do it!
If you try to modify *value, you will get a compiler error. [At
least, this is how gcc behaves on my system.]
Lastly the variables last and wrapped appear to be initialized
twice; once in the variable declaration section, and again a bit
further down in the code.
Ticket URL: <http://dev.mutt.org/trac/ticket/2995#comment:>
The Mutt mail user agent