Re: bug#1876: mutt-1.5.6i: Mutt doesn't handle invalid characters when replying to a mail



    [BTS to mutt-dev forwarding seems broken: half of the mails]
    [don't appear (like your last 3). Hence the CC to mutt-dev.]

 On Saturday, May 29, 2004 at 11:24:12 AM +0200, Vincent Lefèvre wrote:

    [raw headers]
> On 2004-05-27 21:45:02 +0200, Alain Bench wrote:
>> L1 and L2 are definitely undistinguishable when unlabelled.
> I don't want the assumed_charset here, but question marks (or similar
> valid characters).

    What do you mean by "similar valid characters"?


> how could I *automatically* know the correct encoding?

    No way. Some external hints may help you guess with good
probability: domain, mailing list, MUA, whatever. But guessing is not
knowing.


 On Monday, May 31, 2004 at 11:56:06 AM +0200, Vincent Lefèvre wrote:

    [windows-1252 mislabelled as Latin-1]
> On 2004-05-22 16:00:11 +0200, Alain Bench wrote:
>> Is there really a problem? Here I get the expected question marks
> It is a problem, because I get invalid characters (instead of question
> marks), and emacs is confused by them.

    Understood. This depends on the terminal's $charset. If iconv can't
convert the char (from the MIME charset to $charset), you get a question
mark. If iconv can, you get the converted char in the editor.
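
    For illustration only, the rule can be mimicked in Python, with its
codecs standing in for iconv. This is not Mutt's code, and `to_terminal`
is a hypothetical name. The body is decoded from the MIME charset and
re-encoded into the terminal's $charset; anything the target charset
cannot represent becomes a question mark:

```python
def to_terminal(raw: bytes, mime_charset: str, term_charset: str) -> bytes:
    """Hypothetical stand-in for the iconv step, not Mutt's actual code.

    Decode the body with its declared MIME charset (raises
    UnicodeDecodeError if the bytes don't fit it), then encode into the
    terminal's $charset. Python's "replace" handler on encode turns every
    character the target charset cannot represent into b"?".
    """
    text = raw.decode(mime_charset)
    return text.encode(term_charset, errors="replace")


if __name__ == "__main__":
    body = b"caf\xe9"  # "café" in ISO-8859-1
    print(to_terminal(body, "iso-8859-1", "utf-8"))     # b'caf\xc3\xa9': converted char
    print(to_terminal(body, "iso-8859-1", "us-ascii"))  # b'caf?': question mark
```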

    Latin-1 chars in the range 0x80-0x9F are not invalid: they are
defined as control chars, and they are perfectly convertible to UTF-8.
Example with Latin-1 byte 0x80, U+0080 PADDING CHARACTER (PAD):

| $ echo -ne "\200" | iconv -f iso-8859-1 -t utf-8 | hex
| C2 80

    And the glibc-2.3.2/localedata/charmaps/UTF-8 table confirms:

| <U0080>     /xc2/x80     PADDING CHARACTER (PAD)

    Iconv tells Mutt the char is convertible, so Mutt passes the valid,
converted UTF-8 char to the editor. Why is the editor confused?
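
    The same check can be run over the whole 0x80-0x9F range, not only
0x80, with a small Python loop. This is a sketch that assumes Python's
iso-8859-1 and utf-8 codecs agree with iconv for these bytes (both use
the standard identity mapping). Each byte maps to the code point of the
same value, which Unicode classifies as a control char (category Cc),
and encodes to the two-byte UTF-8 sequence C2 xx:

```python
from __future__ import annotations

import unicodedata


def latin1_c1_to_utf8() -> dict[int, bytes]:
    """Return {byte: UTF-8 encoding} for every Latin-1 byte 0x80..0x9F."""
    table = {}
    for byte in range(0x80, 0xA0):
        ch = bytes([byte]).decode("iso-8859-1")
        # Latin-1 maps byte 0xNN straight to code point U+00NN ...
        if ord(ch) != byte:
            raise ValueError(f"unexpected mapping for {byte:#04x}")
        # ... and U+0080..U+009F are C1 controls: defined, not invalid.
        if unicodedata.category(ch) != "Cc":
            raise ValueError(f"U+{byte:04X} is not a control character")
        table[byte] = ch.encode("utf-8")
    return table


if __name__ == "__main__":
    table = latin1_c1_to_utf8()
    print(table[0x80].hex(" ").upper())  # C2 80, as in the iconv output above
    print(all(table[b] == bytes([0xC2, b]) for b in range(0x80, 0xA0)))  # True
```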


> iso-8859-1 isn't windows-1252; the problem needs to be reported to the
> sender.

    I fully agree. You seem to use Mutt as a problem detector. Others
may prefer to use Mutt as an efficient MUA that works around, or even
hides, problems as much as possible, and use another tool to detect
them. The nice thing about Mutt is that it mostly permits both
approaches. Sorry to have sent you advice that was perhaps valid but
misdirected (MS .HLP bad influence).
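
    The iso-8859-1 versus windows-1252 point can be made concrete by
decoding the same bytes both ways. A Python sketch: the sample bytes are
made up (typical 1252 curly quotes and a euro sign), and `decode_both`
is a hypothetical helper. Read as windows-1252, the bytes are printable
punctuation; read under the Latin-1 label, they become the kind of C1
control chars that end up in the editor:

```python
from __future__ import annotations

import unicodedata


def decode_both(raw: bytes) -> tuple[str, str]:
    """Hypothetical helper: decode the same bytes as windows-1252 and as Latin-1."""
    return raw.decode("cp1252"), raw.decode("iso-8859-1")


if __name__ == "__main__":
    # Made-up sample: 1252 curly quotes around a word, then a euro sign.
    sample = b"\x93quoted\x94 \x80 5"
    meant, labelled = decode_both(sample)
    # What the sender meant: printable punctuation.
    print([unicodedata.name(c) for c in meant if ord(c) > 0x7F])
    # ['LEFT DOUBLE QUOTATION MARK', 'RIGHT DOUBLE QUOTATION MARK', 'EURO SIGN']
    # What the iso-8859-1 label claims: C1 control characters.
    print([f"U+{ord(c):04X} {unicodedata.category(c)}" for c in labelled if ord(c) > 0x7F])
    # ['U+0093 Cc', 'U+0094 Cc', 'U+0080 Cc']
```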


    [raw headers]
>> with $assumed_charset=us-ascii a raw byte 0xE9 in "Subject:" should
>> be displayed exactly as an "=?us-ascii?q?=E9?=": As a question mark.
>> But you're right, raw unconvertable bytes in headers are just printed
>> raw unconverted: Bad.
> This is even worse: Mutt stops the parsing at the first invalid
> character (at least when this is an ISO-8859-1 character and Mutt was
> started with UTF-8 locales), and in the case of an invalid "From:"
> header, an incorrect address is generated when replying, because of
> that.

    Confirmed. But someone has made a not yet released patch for these
problems that works perfectly so far: chars that can't be converted
from the first $assumed_charset are masked as question marks, so there
are no more parsing problems. And this doesn't break the possibility of
setting multiple charsets.
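
    The patch is not quoted here, so what follows is only a Python
sketch of the behaviour as described, under stated assumptions: the
$assumed_charset list is tried in order, the first charset that decodes
the raw header cleanly wins, and if none does, the header is decoded
with the first one and every undecodable byte is masked as "?". The
names `decode_raw_header`, `decode_encoded_word` and the "qmask" error
handler are hypothetical. The second function checks that a raw 0xE9
under us-ascii then ends up exactly like "=?us-ascii?q?=E9?=":

```python
from __future__ import annotations

import codecs
from email.header import decode_header

# Hypothetical error handler: replace each undecodable range the codec
# reports (one byte at a time for us-ascii) with a single "?".
codecs.register_error("qmask", lambda err: ("?", err.end))


def decode_raw_header(raw: bytes, assumed: list[str]) -> str:
    """Sketch of the described behaviour for unlabelled 8-bit header bytes.

    Assumption: charsets are tried in order and the first one that decodes
    cleanly wins; if none does, the FIRST charset is used with "?"-masking,
    so decoding never stops at the first bad byte.
    """
    for charset in assumed:
        try:
            return raw.decode(charset)
        except UnicodeDecodeError:
            continue
    return raw.decode(assumed[0], errors="qmask")


def decode_encoded_word(word: str) -> str:
    """Decode a single RFC 2047 encoded-word with the same "?"-masking."""
    [(payload, charset)] = decode_header(word)
    return payload.decode(charset, errors="qmask")


if __name__ == "__main__":
    raw_from = b"Ren\xe9 <rene@example.org>"  # made-up From: value
    print(decode_raw_header(raw_from, ["us-ascii"]))  # Ren? <rene@example.org>
    print(decode_raw_header(b"\xe9", ["us-ascii"]) == decode_encoded_word("=?us-ascii?q?=E9?="))  # True
```

    Because the bad byte no longer aborts decoding, the address after it
survives, so a reply built from such a From: still gets a usable address.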


Bye!    Alain.
-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?