Re: bug#1876: mutt-1.5.6i: Mutt doesn't handle invalid characters when replying to a mail



On 2004-10-14 09:38:07 +0200, Alain Bench wrote:
>     [match any unknown charset]
[...]
>     But: The first day some new version of official charset appears, be
> it UTF-9 or ISO-8859-17, you may prefer to notice it so you can upgrade
> iconv, and turn unknown to new one. That argument is not against the
> feature, but for it being optional.

Well, this is an argument in favor of the feature, since aliasing
any unknown charset to us-ascii will mask every non-ASCII character
and prevent Mutt from doing incorrect things, like mixing up two
similar charsets (such as iso-8859-1 and iso-8859-15) without the
user being aware of it.
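
To illustrate the idea, here is a minimal Python sketch (not Mutt's
code; the function name decode_body and the "?" mask are assumptions
made for this example). A charset label the runtime does not know is
treated as us-ascii, so every non-ASCII byte is masked rather than
guessed at:

```python
import codecs


def decode_body(raw: bytes, charset: str, mask: str = "?") -> str:
    """Decode `raw`; fall back to masked us-ascii for an unknown label.

    Hypothetical helper for illustration only, not part of Mutt.
    """
    try:
        codecs.lookup(charset)
    except LookupError:
        # Unknown label: keep ASCII bytes, mask everything else so that
        # no byte is silently interpreted in the wrong charset.
        return "".join(chr(b) if b < 0x80 else mask for b in raw)
    # Known label: decode normally, replacing undecodable bytes.
    return raw.decode(charset, errors="replace")


# The same byte 0xE9 under an unknown label and under a known one.
masked = decode_body(b"caf\xe9", "iso-8859-17")
decoded = decode_body(b"caf\xe9", "iso-8859-1")
```

The masked result makes the problem visible, so the user still notices
the new label and can add a proper alias for it later.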

>     OTOH what real every day problem would it solve? Unknown charsets
> are not so numerous, and most cases can be dealt with a pile of hooks.
> The day a new label appears, you notice it, and add a charset-hook: A
> constraint, but doable. A feature with so little benefit has limited
> chances to be accepted.

I agree that this can be done with hooks, but without these hooks,
Mutt should never propagate incorrect sequences, such as iso-8859-1
characters in utf-8 text. Moreover, adding a hook requires restarting
Mutt so that it rereads the .muttrc file. This may be annoying.
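
As a rough sketch of what "never propagate incorrect sequences" could
mean for utf-8 text (Python, for illustration only; decode_utf8_safely
is a hypothetical name, and this is not how Mutt is implemented): a
byte that does not form valid UTF-8, such as a stray iso-8859-1 byte,
is replaced instead of being copied through.

```python
from typing import Tuple


def decode_utf8_safely(raw: bytes) -> Tuple[str, bool]:
    """Return (text, was_valid) for bytes that claim to be UTF-8.

    Hypothetical helper: invalid sequences become U+FFFD instead of
    being passed on unchanged.
    """
    try:
        return raw.decode("utf-8"), True
    except UnicodeDecodeError:
        return raw.decode("utf-8", errors="replace"), False


# 0xE9 is "e acute" in iso-8859-1 but an incomplete sequence in UTF-8.
text, was_valid = decode_utf8_safely(b"caf\xe9")
```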

>     [bytes 80-9F with label Latin-1]

>     What chars would be masked? U+0080-U+009F only, or all controls? If
> the latter, some need to be excluded: Which ones? User configurable?

All controls except the ones used by the mail standard. Also, when
replying to a message, it would be a good idea to convert TABs to
space characters *before* the message is quoted, so that the message
is really indented by a fixed number of columns.
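
Both steps can be sketched in a few lines of Python (an illustration
under stated assumptions, not Mutt's behaviour: the set of controls to
keep is assumed here to be TAB, CR and LF, and the names mask_controls
and quote_reply, the "?" mask, the "> " prefix and the tab width of 8
are all choices made for this example):

```python
KEEP = {"\t", "\n", "\r"}  # assumed set of controls that mail text needs


def is_control(ch: str) -> bool:
    """True for C0 controls, DEL, and C1 controls (U+0080-U+009F)."""
    code = ord(ch)
    return code < 0x20 or 0x7F <= code <= 0x9F


def mask_controls(text: str, mask: str = "?") -> str:
    """Replace every control character that is not in KEEP."""
    return "".join(
        mask if is_control(ch) and ch not in KEEP else ch for ch in text
    )


def quote_reply(text: str, prefix: str = "> ", tabsize: int = 8) -> str:
    """Expand TABs first, then add the quote prefix to each line."""
    # Split on "\n" only: str.splitlines() would also split on U+0085.
    lines = text.split("\n")
    return "\n".join(prefix + line.expandtabs(tabsize) for line in lines)


quoted = quote_reply(mask_controls("a\tb\x85c"))
```

Because the TAB is expanded before the prefix is added, the quoted
text keeps its original alignment and is simply shifted right by the
width of the prefix.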

> > user would not propagate broken chars in his reply.
> 
>     They are not broken, but annoyingly valid.

They are valid, but unwanted (as control characters). This is what
I meant by "broken".

>     BTW I tried three editors: One displays "\200", the next a
> replacement dot, and last "~@". All save back original control char.

Probably because these editors are not specifically mail editors.
A generic editor shouldn't corrupt the initial file.

Regards,

-- 
Vincent Lefèvre <vincent@xxxxxxxxxx> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / SPACES project at LORIA