Re: [Mutt] #3040: charset difference between index browser and



On 2008-03-27 11:21:15 -0400, Derek Martin wrote:
> On Thu, Mar 27, 2008 at 01:00:29AM +0100, Vincent Lefevre wrote:
> > On 2008-03-22 16:28:43 +0100, Thomas Roessler wrote:
> > > The user isn't assumed to run under UTF-8.  The user is assumed to
> > > run in a consistent environment in which the terminal, the file
> > > system, and local files share a single character set which is
> > > inferred from the user's locale settings.
> > 
> > This is absolute nonsense!
> 
> That statement is stupid, annoying, and inflammatory.  What Thomas
> described is how Unix is designed to work.  It's not nonsense.

This is nonsense: download various files from the web and you'll see
that they have different encodings (note: web browsers don't do any
charset conversion when saving files). In fact, there is no need to
download any file: files provided by the OS distribution already have
some encoding, which may differ from the user's. The assumption that
"local files share a single character set" is just stupid.
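To illustrate the point, here is a minimal sketch in Python. The sample
text and the helper name decodes_as_utf8 are hypothetical; the two byte
strings stand in for two local files that came from different sources.
It shows that the same text yields different bytes under UTF-8 and
ISO-8859-1, so a reader that assumes a single charset gets one of them
wrong.

```python
# Hypothetical contents of two local files holding the same text,
# saved by tools that used different encodings.
text = "café"
utf8_file = text.encode("utf-8")          # bytes: 63 61 66 c3 a9
latin1_file = text.encode("iso-8859-1")   # bytes: 63 61 66 e9


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form valid UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The two "files" differ on disk even though the text is identical.
same_bytes = utf8_file == latin1_file

# A UTF-8 locale cannot even decode the ISO-8859-1 file...
utf8_ok = decodes_as_utf8(utf8_file)
latin1_ok = decodes_as_utf8(latin1_file)

# ...and an ISO-8859-1 locale silently misreads the UTF-8 file.
misread = utf8_file.decode("iso-8859-1")

print(same_bytes, utf8_ok, latin1_ok, misread)
```

Whichever charset the locale selects, one of the two files is
misinterpreted unless the reader handles per-file encodings.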

> If your locale settings are not consistent, you should expect to
> have problems.

My locale settings are consistent and I don't have any problem with
them. At least one major editor can accept files in encodings other
than the one specified by the locales: Emacs. And I have no problems
with that either. This is purely a benefit for users.

> > In particular nowadays, where files (e.g. mail messages) often come
> > from remote people, who may use various encodings.
> 
> That's what iconv is for.

iconv has a major problem: it loses information in some conversions,
e.g. when converting a file from UTF-8 into ISO-8859-1. Also, the
user is not expected to run iconv on every file he wants to read.
Encoding conversion is better done by the editor itself: it can be
automatic and there is no information loss (the editor keeps the
original data internally).
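The difference can be sketched as follows. This is only an
illustration under stated assumptions: it uses Python's built-in
codecs rather than iconv itself, the sample text is hypothetical, and
substituting "?" for unrepresentable characters is just one possible
lossy policy. It does not describe how Emacs is implemented.

```python
# Hypothetical message body containing characters outside ISO-8859-1.
original = "café 日本語"
utf8_bytes = original.encode("utf-8")

# Up-front conversion to ISO-8859-1: characters that cannot be
# represented are replaced by "?", and the result is all that remains.
latin1_bytes = utf8_bytes.decode("utf-8").encode("iso-8859-1", errors="replace")
round_trip = latin1_bytes.decode("iso-8859-1")
lost = round_trip != original

# Editor-style approach: keep the original bytes internally and derive
# a (possibly lossy) view only for display on the terminal.
internal = utf8_bytes
display_view = (
    internal.decode("utf-8")
    .encode("iso-8859-1", errors="replace")
    .decode("iso-8859-1")
)
preserved = internal.decode("utf-8") == original

print(round_trip, lost, display_view, preserved)
```

In the second approach the terminal still shows substitutes, but what
is saved or sent onward is the untouched original data.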

> If you're not using a Unicode environment, you're making a conscious
> choice that only data encoded in the locale you've selected matters
> to you.

Many applications do not work that way, in particular graphical ones.

> If you're a native French speaker, concerned that someone will send
> you a mail in EUC-JP and you won't be able to read the characters in
> it,

With Emacs, I can, even in ISO-8859-1 locales. But anyway you missed
the point that not being able to read them (e.g. in the terminal) is
not the main problem. The problem is that these characters will be
munged by Mutt when contents are transmitted to remote users. This
is not acceptable.

> If you only care to be able to read ISO-8859-1 encoded data, then
> using that environment, and configuring your locale consistently for
> it, works perfectly fine.  If you've chosen to use ISO-8859-1, and you
> care about other encodings, then you're clearly either uninformed
> about how locales work in Unix, have constraints placed upon you by
> outside factors (i.e. work requirements), or you're just an idiot.

No, you didn't understand anything, and *you* are the idiot.

-- 
Vincent Lefèvre <vincent@xxxxxxxxxx> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)