Re: $assumed_charset settings (was: special chars)



Hi Alain,

On 2007-03-25 20:21:35 +0200, Alain Bench wrote:
> Hello Vincent,
> 
>  On Sunday, March 25, 2007 at 4:28:36 +0200, Vincent Lefèvre wrote:
> 
> > On 2007-03-24 16:05:08 +0100, Alain Bench wrote:
> >> Setting UTF-8 after ISO-8859-1 is useless. Any string is always
> >> valid Latin-1.
> > Shouldn't characters 128-159 be regarded as invalid?
> 
>     No, I don't think so, for a number of reasons:
> 
>  - Mutt doesn't decide valid/invalid; it asks iconv, which replies
> that Latin-1 128-159 are valid and convertible.
> 
>  - 128-159 are (part of) printable characters in some charsets.

Yes, I meant in ISO-8859-1, where 128-159 are *non-printable* (C1
control) characters.

>  - If avoidable, we prefer to not hardcode special cases in Mutt.

I agree, but testing printability should be sufficient.
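To make the idea concrete, here is a minimal sketch of trying a list of
charsets in order, with an optional check that rejects a candidate when
the decoded text contains C1 controls (U+0080-U+009F). It uses Python's
built-in codecs rather than iconv, and the names guess_charset and
has_c1_controls are hypothetical; this is not how Mutt is implemented.

```python
def has_c1_controls(text):
    """Return True if text contains any C1 control (U+0080-U+009F)."""
    return any(0x80 <= ord(ch) <= 0x9F for ch in text)


def guess_charset(data, charsets, reject_c1=False):
    """Return the first charset in `charsets` that accepts `data`.

    A charset accepts the data if decoding succeeds and, when reject_c1
    is set, the decoded text contains no C1 control characters.
    Returns None if no candidate accepts the data.
    """
    for name in charsets:
        try:
            text = data.decode(name)
        except UnicodeDecodeError:
            continue  # invalid in this charset, try the next one
        if reject_c1 and has_c1_controls(text):
            continue  # decodes, but yields non-printable C1 controls
        return name
    return None


latin1_first = ["iso-8859-1", "utf-8"]
e_acute_utf8 = "\u00e9".encode("utf-8")   # bytes C3 A9
en_dash_utf8 = "\u2013".encode("utf-8")   # bytes E2 80 93

# Without the check, Latin-1 accepts everything, so UTF-8 is never reached.
print(guess_charset(en_dash_utf8, latin1_first))
# With the check, bytes 80 and 93 disqualify Latin-1 for the en dash...
print(guess_charset(en_dash_utf8, latin1_first, reject_c1=True))
# ...but C3 A9 decodes to two printable Latin-1 characters, so it is
# still taken for Latin-1.
print(guess_charset(e_acute_utf8, latin1_first, reject_c1=True))
```

With Latin-1 listed first, the check helps only for UTF-8 sequences
whose continuation bytes fall in 128-159, which matches the "not all,
but many" remark quoted below.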

>  - We would not get a clean benefit anyway: Many UTF-8 strings would
> still be wrongly detected as Latin-1. Not all, but many.
> 
>  - To properly distinguish UTF-8 from a 256 chars charset (like
> Latin-1), we really need to set UTF-8 first. In this order, invalidating
> 128-159 buys us nothing.

Concerning these two points, I was thinking about files that contain
both ISO-8859-1 and UTF-8, to let the user decide. Note that this is
not necessarily an error. For instance, it can happen in diffs where
some files are encoded in ISO-8859-1 and others in UTF-8.
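
As an illustration of the mixed case, a rough sketch that decodes such
a diff line by line, trying UTF-8 first and falling back to ISO-8859-1.
The function name decode_mixed is hypothetical, and this only shows the
idea; it is not something Mutt does.

```python
def decode_mixed(data):
    """Decode bytes line by line, trying UTF-8 before ISO-8859-1.

    Returns a list of (text, charset) pairs, one per line. Pure ASCII
    lines are valid UTF-8 and are therefore reported as "utf-8".
    """
    result = []
    for raw in data.split(b"\n"):
        try:
            result.append((raw.decode("utf-8"), "utf-8"))
        except UnicodeDecodeError:
            # Any byte string decodes as ISO-8859-1, so this cannot fail.
            result.append((raw.decode("iso-8859-1"), "iso-8859-1"))
    return result


# A two-line "diff" whose lines come from differently encoded files.
sample = "+caf\u00e9".encode("utf-8") + b"\n" + "-caf\u00e9".encode("iso-8859-1")
for text, charset in decode_mixed(sample):
    print(charset, text)
```

A line that happens to be valid in both encodings is always reported as
UTF-8 here, so the user would still need a way to override the guess.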

-- 
Vincent Lefèvre <vincent@xxxxxxxxxx> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)