Re: $assumed_charset settings (was: special chars)
Hi Alain,
On 2007-03-25 20:21:35 +0200, Alain Bench wrote:
> Hello Vincent,
>
> On Sunday, March 25, 2007 at 4:28:36 +0200, Vincent Lefèvre wrote:
>
> > On 2007-03-24 16:05:08 +0100, Alain Bench wrote:
> >> Setting UTF-8 after ISO-8859-1 is useless. Any string is always
> >> valid Latin-1.
> > Shouldn't characters 128-159 be regarded as invalid?
>
> No, I don't think so, for a number of reasons:
>
> - Mutt doesn't decide what is valid or invalid; it asks iconv, which replies
> that Latin-1 128-159 are valid and convertible.
>
> - 128-159 are (part of) printable characters in some charsets.
Yes, I meant in ISO-8859-1, where they are *non-printable* characters.
> - If avoidable, we prefer not to hardcode special cases in Mutt.
I agree, but testing printability should be sufficient.
> - We would not get a clean benefit anyway: Many UTF-8 strings would
> still be wrongly detected as Latin-1. Not all, but many.
>
> - To properly distinguish UTF-8 from a 256-character charset (like
> Latin-1), we really need to set UTF-8 first. In this order, invalidating
> 128-159 buys us nothing.
Regarding these two points, I was thinking of files that contain
both ISO-8859-1 and UTF-8 text, where the user should decide. Note that this is
not necessarily an error. For instance, it can happen in diffs where
some files are encoded in ISO-8859-1 and others in UTF-8.
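To make the ordering argument and the printability idea concrete, here is a minimal Python sketch. It is not Mutt code (Mutt delegates conversion to iconv), and the helper names guess_line, has_c1_controls and guess_lines are hypothetical. It tries UTF-8 first, falls back to ISO-8859-1 (which never fails), and flags Latin-1 lines containing C1 controls (0x80-0x9F) so the user could be asked, line by line, as for a diff that mixes both encodings.

```python
# Sketch only: hypothetical helpers, not Mutt's implementation.
# Per-line charset guessing for text that may mix UTF-8 and ISO-8859-1.


def has_c1_controls(text: str) -> bool:
    """True if text contains a C1 control character (U+0080..U+009F)."""
    return any(0x80 <= ord(ch) <= 0x9F for ch in text)


def guess_line(raw: bytes) -> tuple[str, str, bool]:
    """Return (charset, decoded text, suspicious) for one line of bytes.

    UTF-8 is tried first because its decoder rejects malformed byte
    sequences, whereas ISO-8859-1 maps every byte 0x00-0xFF to a code
    point, so a Latin-1 decode never fails and must come last.
    """
    try:
        return "utf-8", raw.decode("utf-8"), False
    except UnicodeDecodeError:
        pass
    text = raw.decode("iso-8859-1")  # cannot raise
    # Bytes 0x80-0x9F are non-printable C1 controls in ISO-8859-1:
    # flag such lines so the user can decide instead of trusting the guess.
    return "iso-8859-1", text, has_c1_controls(text)


def guess_lines(data: bytes) -> list[tuple[str, str, bool]]:
    """Apply guess_line to each line independently."""
    return [guess_line(line) for line in data.splitlines()]


if __name__ == "__main__":
    sample = b"caf\xc3\xa9\ncaf\xe9\n\x93quoted\x94\n"
    for charset, text, suspicious in guess_lines(sample):
        print(charset, repr(text), "suspicious" if suspicious else "")
```

The sketch also shows why a printability test alone cannot replace trying UTF-8 first. The UTF-8 bytes for "é" (0xC3 0xA9) decode as Latin-1 to "Ã©", which contains no C1 control and would be accepted. The UTF-8 bytes for "€" (0xE2 0x82 0xAC) do contain 0x82 and would be caught.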
--
Vincent Lefèvre <vincent@xxxxxxxxxx> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)