Re: $assumed_charset settings (was: special chars)

To: Mutt dev ml <mutt-dev@xxxxxxxx>
Subject: Re: $assumed_charset settings (was: special chars)
From: Vincent Lefevre <vincent@xxxxxxxxxx>
Date: Mon, 26 Mar 2007 16:39:36 +0200
In-reply-to: <20070325182134.GA1245@xxxxxxx>
List-unsubscribe: <mailto:mutt-dev-request@mutt.org?Subject="Unsubscribe Mutt Dev"?body=unsubscribe>
Mail-followup-to: Mutt dev ml <mutt-dev@xxxxxxxx>
References: <20070318204518.GA24380@xxxxxxxxxxxxxxxxxxxxx> <20070319011040.GA13794@xxxxxxxxxxxxxxxxxxxxxxxxxx> <20070324150508.GA486@xxxxxxx> <20070325022836.GB12795@xxxxxxxxxxxxxxxxxxx> <20070325182134.GA1245@xxxxxxx>
Sender: owner-mutt-dev@xxxxxxxx
User-agent: Mutt/1.5.14-vl-r16324 (2007-03-20)

Hi Alain,

On 2007-03-25 20:21:35 +0200, Alain Bench wrote:
> Hello Vincent,
> 
>  On Sunday, March 25, 2007 at 4:28:36 +0200, Vincent Lefèvre wrote:
> 
> > On 2007-03-24 16:05:08 +0100, Alain Bench wrote:
> >> Setting UTF-8 after ISO-8859-1 is useless. Any string is always
> >> valid Latin-1.
> > Shouldn't characters 128-159 be regarded as invalid?
> 
>     No, I don't think so, for a number of reasons:
> 
>  - Mutt doesn't decide what is valid or invalid; it asks iconv, which
> replies that Latin-1 bytes 128-159 are valid and convertible.
> 
>  - 128-159 are (part of) printable characters in some charsets.

Yes, I meant in ISO-8859-1, where they are *non-printable* characters.

>  - When it can be avoided, we prefer not to hardcode special cases in Mutt.

I agree, but testing printability should be sufficient.

>  - We would not get a clean benefit anyway: Many UTF-8 strings would
> still be wrongly detected as Latin-1. Not all, but many.
> 
>  - To properly distinguish UTF-8 from a 256 chars charset (like
> Latin-1), we really need to set UTF-8 first. In this order, invalidating
> 128-159 buys us nothing.
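The ordering argument can be illustrated with a minimal Python sketch, assuming Python's strict codecs as a stand-in for iconv (Mutt itself calls iconv from C). The helpers `first_valid_charset` and `has_c1_controls` are hypothetical, written only to mimic an ordered $assumed_charset list, with an optional C1 check standing in for the printability test discussed above.

```python
import codecs
from typing import List, Optional

# Canonical codec name for ISO-8859-1, whatever alias the caller uses.
LATIN1 = codecs.lookup("latin-1").name


def has_c1_controls(text: str) -> bool:
    """True if text contains a C1 control character (U+0080..U+009F).

    In ISO-8859-1, bytes 128-159 decode to exactly these non-printable
    code points.
    """
    return any("\x80" <= ch <= "\x9f" for ch in text)


def first_valid_charset(data: bytes, charsets: List[str],
                        reject_c1: bool = False) -> Optional[str]:
    """Return the first charset in `charsets` that decodes `data` strictly.

    Hypothetical helper modelled on an ordered charset list; Python's
    codecs stand in for iconv. With reject_c1=True, an ISO-8859-1 result
    containing C1 controls is treated as invalid.
    """
    for cs in charsets:
        try:
            text = data.decode(cs)  # strict mode: raises on invalid input
        except UnicodeDecodeError:
            continue
        is_latin1 = codecs.lookup(cs).name == LATIN1
        if reject_c1 and is_latin1 and has_c1_controls(text):
            continue
        return cs
    return None


utf8_eacute = "é".encode("utf-8")          # b'\xc3\xa9'
utf8_euro = "€".encode("utf-8")            # b'\xe2\x82\xac'
latin1_eacute = "é".encode("iso-8859-1")   # b'\xe9', not valid UTF-8

# Latin-1 listed first: it accepts any byte string, so UTF-8 is never tried.
print(first_valid_charset(utf8_eacute, ["iso-8859-1", "utf-8"]))    # iso-8859-1
# UTF-8 listed first: UTF-8 text is recognized, real Latin-1 falls through.
print(first_valid_charset(utf8_eacute, ["utf-8", "iso-8859-1"]))    # utf-8
print(first_valid_charset(latin1_eacute, ["utf-8", "iso-8859-1"]))  # iso-8859-1
# The C1 test rejects "€" misread as Latin-1 ('â', U+0082, '¬') ...
print(first_valid_charset(utf8_euro, ["iso-8859-1", "utf-8"], reject_c1=True))    # utf-8
# ... but not "é" misread as Latin-1, which gives the printable 'Ã©'.
print(first_valid_charset(utf8_eacute, ["iso-8859-1", "utf-8"], reject_c1=True))  # iso-8859-1
```

The last two calls show both sides of the exchange: a printability check catches some UTF-8 strings misread as Latin-1, but not all of them, so listing UTF-8 first remains the reliable order.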

Concerning these two points, I was thinking of files that contain
both ISO-8859-1 and UTF-8 text, where the user should be left to
decide. Note that this is not necessarily an error: for instance, it
can happen in diffs where some files are encoded in ISO-8859-1 and
others in UTF-8.
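The mixed-diff case can be sketched in Python as well. The helper `classify_lines` and the sample diff below are hypothetical and not part of Mutt; the sketch simply decodes each line on its own, trying UTF-8 first and falling back to ISO-8859-1, so one can see that a single diff may legitimately contain both encodings.

```python
from typing import List, Tuple


def classify_lines(data: bytes) -> List[Tuple[str, str]]:
    """Decode each line independently: try UTF-8 first, then ISO-8859-1.

    Returns (charset, decoded_line) pairs. Hypothetical helper for
    illustration only. Pure-ASCII lines are valid UTF-8, so they are
    reported as "utf-8"; ISO-8859-1 never fails, so it is the last resort.
    """
    result = []
    for raw in data.splitlines():
        try:
            result.append(("utf-8", raw.decode("utf-8")))
        except UnicodeDecodeError:
            result.append(("iso-8859-1", raw.decode("iso-8859-1")))
    return result


# A made-up diff touching one UTF-8 file and one ISO-8859-1 file.
lines = [
    b"--- a/notes-utf8.txt",
    b"+++ b/notes-utf8.txt",
    "+café".encode("utf-8"),
    b"--- a/notes-latin1.txt",
    b"+++ b/notes-latin1.txt",
    "+café".encode("iso-8859-1"),
]
diff = b"\n".join(lines) + b"\n"

for charset, line in classify_lines(diff):
    print(f"{charset:10} {line}")
```

Per-line decoding is only one possible way to surface such content; how the choice would be presented to the user is left open here.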

-- 
Vincent Lefèvre <vincent@xxxxxxxxxx> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)

