Re: [Mutt] #3040: charset difference between index browser and

To: mutt-dev@xxxxxxxx, fleas@xxxxxxxx
Subject: Re: [Mutt] #3040: charset difference between index browser and
From: Derek Martin <invalid@xxxxxxxxxxxxxx>
Date: Sat, 5 Apr 2008 22:31:43 -0400
In-reply-to: <20080405151327.GF3478@xxxxxxxxxxxxx>
List-post: <mailto:mutt-dev@mutt.org>
List-unsubscribe: send mail to majordomo@mutt.org, body only "unsubscribe mutt-dev"
Mail-followup-to: mutt-dev@xxxxxxxx, fleas@xxxxxxxx
References: <20080321104159.GA546@xxxxxxx> <20080321141241.GG28333@xxxxxxxxxxxxxxxxxxx> <20080321142025.GA577@xxxxxxxxxxxxxxxxxxxxxxxxxxx> <20080321154651.GK28333@xxxxxxxxxxxxxxxxxxx> <20080322152843.GE255@xxxxxxxxxxxxxxxxxxxxxxxxxxx> <20080327000028.GT17011@xxxxxxxxxxxxxxxxxxx> <20080327152115.GO26870@xxxxxxxxxxxxx> <20080328091430.GC17011@xxxxxxxxxxxxxxxxxxx> <20080328134802.GP26870@xxxxxxxxxxxxx> <20080405151327.GF3478@xxxxxxxxxxxxx>
Reply-to: mutt-dev@xxxxxxxx
Sender: owner-mutt-dev@xxxxxxxx
User-agent: Mutt/1.5.13 (2006-11-28)

On Sat, Apr 05, 2008 at 05:13:27PM +0200, Vincent Lefevre wrote:
> On 2008-03-28 09:48:02 -0400, Derek Martin wrote:
> > switch your environment to UTF-8.  So who's the idiot?
> 
> You're not living in the real world. 

I beg to differ, see below.

> There are also many bugs related to UTF-8, e.g. Debian bug 391452
> (that's the most important one), 

It seems likely that this is not a bug at all, but an improper
expectation of the user.  The font string
"-*-helvetica-medium-o-*-*-12-*-*-*-*-*-*-* " almost certainly first
matches a font which is not an ISO-10646 Unicode font, and as such is
not compatible with UTF-8 environments.  This is probably due to
limitations of the original methods of rendering fonts in the X
Window System, and is not actually a bug, but merely a
misconfiguration.  As I (and others) have said, if your settings are
not consistent, you should not expect things to work correctly.

Fixing this *could* be as simple as changing the font string to
"-*-helvetica-medium-o-*-*-12-*-*-*-*-*-iso10646-1" though I can't
say, and have no desire to attempt to reproduce this problem since I
don't use fvwm and don't have it installed anywhere.  It may also be
that there are no helvetica fonts on your system which are compatible
with Unicode.  That also is not a bug, it's a misconfiguration.
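
To illustrate why the wildcard font string is ambiguous, here is a
minimal Python sketch.  It is only a field-by-field approximation, not
how the X server actually resolves fonts.  It assumes the standard
14-field XLFD layout, in which the last two fields are the charset
registry and encoding (spelled iso10646-1 for Unicode).  The two full
font names are illustrative and may not be installed on any given
system; xlfd_matches is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def xlfd_matches(pattern: str, font_name: str) -> bool:
    """Compare an XLFD pattern to a full font name, one field at a time.

    Simplification: each "*" stands for exactly one field here.
    """
    p_fields = pattern.split("-")
    f_fields = font_name.split("-")
    if len(p_fields) != len(f_fields):
        return False
    return all(fnmatchcase(f, p) for p, f in zip(p_fields, f_fields))


# Illustrative font names (assumed, not taken from the bug report).
LATIN1_FONT = "-adobe-helvetica-medium-o-normal--12-120-75-75-p-67-iso8859-1"
UNICODE_FONT = "-adobe-helvetica-medium-o-normal--12-120-75-75-p-67-iso10646-1"

# Charset fields left as wildcards: any encoding may be picked.
ANY_CHARSET = "-*-helvetica-medium-o-*-*-12-*-*-*-*-*-*-*"
# Charset fields pinned to Unicode.
UNICODE_ONLY = "-*-helvetica-medium-o-*-*-12-*-*-*-*-*-iso10646-1"

for font in (LATIN1_FONT, UNICODE_FONT):
    print(font, xlfd_matches(ANY_CHARSET, font), xlfd_matches(UNICODE_ONLY, font))
```

The wildcard pattern accepts both fonts, so which one is chosen depends
on the order in which the system lists them; the pinned pattern accepts
only the Unicode one.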

Another possible solution is to use a more modern window manager which
makes use of modern font rendering techniques.

In some cases, upgrading to newer versions of various applications
than Debian provides may also solve your problem (compile from source,
or whatever).  You may not like your choices, but you *do* have
choices which work...  Mutt should not be redesigned in such a way as
to be inconsistent with the current Unix paradigm for handling
encodings, just because you don't like your options...

> Debian bug 254507, cooked mode
[...]
> I've started to switch to UTF-8, but I'm fed up with all these bugs
> I encounter, so that I'm still mainly under ISO-8859-1 locales.

You're running Debian Stable, the slowest-updating OS in the history
of the universe (OK, I exaggerate *a little*).  You're also using
decades-old software (like FVWM) that still uses technologies that
predate Unicode by many years.  You're living in the past, and
expecting to be able to cope with the modern world, and you want the
rest of the world to bend to your ways, instead of using modern,
well-established solutions that work for many, many people.  That's
crazy (but very typical of you).

Your comments above seem more like an argument against using Debian
Stable, or Mac OS X, than against using Unicode.  I've been using
Unicode exclusively since about the beginning of 2004 on Red
Hat/Fedora and (more recently) Ubuntu systems, and I have not
experienced *any* of the problems you're complaining about...  If you
choose to run broken software, just as if you choose to misconfigure
your system, there is no helping you.

I can't say much about Mac OS X as I've never used it... but I do know
others who do use it and need Unicode support, and they have not
mentioned such problems to me.  I'm willing to believe it has bugs,
but I'm also willing to believe that by choosing different software,
you would not have such problems, as in the case of FVWM.

> Another main problem: Most of my files are in ISO-8859-1. And things
> like grep wouldn't work under UTF-8 (is there any wrapper?).

That is unfortunate, but there are solutions.  Use iconv to convert
them, where possible.  If you need a particular document, review it
and hand-edit any broken characters before you send it off. This is no
different than if you had been previously working on an IBM mainframe
using EBCDIC, and then decided to migrate to Windows XP (or, well,
anything).  You'd need to convert your data.  Here again, your
expectation is unrealistic.
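
As a rough sketch of the conversion step, the following Python does
for one file what "iconv -f ISO-8859-1 -t UTF-8" does on the command
line.  The function names are hypothetical, and the "already valid
UTF-8, so leave it alone" check is an assumption about what is wanted;
files it skips may still deserve a look by hand.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (pure ASCII does)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_to_utf8(src: Path, dst: Path, from_enc: str = "iso-8859-1") -> bool:
    """Write a UTF-8 copy of src to dst.

    Returns True if the content was transcoded, False if it was already
    valid UTF-8 and was copied unchanged.
    """
    raw = src.read_bytes()
    if is_valid_utf8(raw):
        dst.write_bytes(raw)
        return False
    dst.write_bytes(raw.decode(from_enc).encode("utf-8"))
    return True
```

Once the files are in UTF-8, tools such as grep running under a UTF-8
locale see data that is consistent with the locale settings.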

> > > With Emacs, I can, even in ISO-8859-1 locales. 
> > 
> > You can view Japanese characters in Emacs running in an xterm with
> > ISO-8859-1 settings and fonts?  I sincerely doubt that...
> 
> Most of the time, I use Emacs under ISO-8859-1 in its own window.

So, such usage is irrelevant to the case of Mutt.  In that mode, Emacs
is a graphical program which does not have the same limitations mutt
has.  Because of this, and because it attempted to solve these
problems well before Unicode was created, it has implemented a wide
variety of functionality to help solve these problems, which no sane
program should ever need to implement ever again.  The reason for that
is that we now have Unicode, which solves the problem nicely and
completely, so long as your system is configured properly to use it.

> So, I can view non-ISO-8859-1 characters without any problem.
> Otherwise, I don't mind not being able to view them. But it is
> important that these characters are not lost when replying to a
> message (e.g. one gets the same subject, starting with "Re: ");
> and Mutt currently cannot do this in locales different from UTF-8
> (contrary to other MUA's).

Can other *TEXT-MODE* (only) MUAs do it?  None that I know of...
Mutt lives entirely in the confines of your terminal.  It *MUST*
convert data which is not in your current locale, because:

 - It needs to be able to display it on your terminal.  

   Mutt provides the ability to view your edited message, and (in some
   cases) also your attachments, right within Mutt.  It therefore must
   convert the data to an encoding consistent with your terminal
   settings.  Attempting to display data in a different encoding than
   your terminal is configured for can *break your terminal*, and
   should not be attempted.

   But more importantly:

 - It relies on any number of external tools to do things.

   Externally spawned tools will generally not be able to determine
   what encoding a file is encoded with.  All they can do is assume
   that your locale settings are consistent, and use those.  In
   general, this is how Unix works.  This means that any data mutt
   sends to them MUST be encoded in your locale's encoding, or else
   the tools will very likely break, and render you effectively unable
   to process your mail.

   Mutt does not know whether or not your applications can cope with
   broken locales, and in general, they can't.  It must therefore
   assume that they can't, and convert the data.  There is no other
   logical choice.
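
The two points above can be sketched in Python.  This is not Mutt's
code (Mutt is written in C); locale_charset and to_locale_bytes are
hypothetical names, and locale.nl_langinfo is available on Unix-like
systems only.  The sketch looks up the character set named by the
locale, then converts a message body from its declared charset into
that one before it would be displayed or handed to an external tool.

```python
import locale


def locale_charset() -> str:
    """Return the character set named by the current locale settings."""
    try:
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # unsupported locale in the environment: keep the current one
    return locale.nl_langinfo(locale.CODESET)


def to_locale_bytes(body: bytes, declared_charset: str, target_charset: str) -> bytes:
    """Transcode body from the message's declared charset to the target.

    Characters the target cannot represent are replaced with "?", so the
    result is always safe for the terminal but may lose information.
    """
    text = body.decode(declared_charset, errors="replace")
    return text.encode(target_charset, errors="replace")


if __name__ == "__main__":
    subject = "Re: \u65e5\u672c\u8a9e caf\u00e9".encode("utf-8")
    print(to_locale_bytes(subject, "utf-8", locale_charset()))
```

Under a UTF-8 locale nothing is lost.  Under an ISO-8859-1 locale the
accented letter survives but the Japanese characters become "?", which
is the loss being complained about in this thread.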

You don't have to like it, but that is the world we live in, by
design.  If you don't like it, you're free to use other software.
It's true that Mutt could implement all manner of conversion code
directly inside Mutt, like Emacs does, and leave the data intact
internally.  But in the generic case, the only reliable way Mutt has
to determine the correct encoding for a given data stream is to check
your locale settings and assume they are correct.  This is the
philosophy behind handling locales which is prevalent throughout the
Unix world (Emacs notwithstanding), and as such Mutt should follow it.
Furthermore, if Mutt did leave the data unconverted, then when it
interfaced with external tools, that interaction would very likely
break, due to the data not being consistent with the user's configured
locale.  Even if a few clever programs can manage it, the overwhelming
majority cannot.
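
A small Python sketch of why a tool generally cannot work out the
encoding from the bytes alone: the same byte string is often valid in
several encodings and simply means something different in each.  The
function name and the list of candidate encodings are assumptions
chosen for illustration.

```python
def plausible_decodings(data: bytes,
                        candidates=("utf-8", "iso-8859-1", "iso-8859-15")) -> dict:
    """Map each candidate encoding that accepts the bytes to the decoded text."""
    results = {}
    for enc in candidates:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding
    return results


if __name__ == "__main__":
    for sample in (b"caf\xc3\xa9", b"100 \xa4"):
        print(sample, plausible_decodings(sample))
```

The first sample decodes without error in all three encodings but only
reads correctly as UTF-8.  The second is invalid UTF-8, yet is a
currency sign in ISO-8859-1 and a euro sign in ISO-8859-15.  Nothing
in the data says which reading is intended, which is why programs fall
back on the locale.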

If this is a problem for you, all one can suggest is for you to switch
to Emacs...

> Emacs can work with UTF-8 files under ISO-8859-1 without any
> problem. 

Emacs implements a great many things, some of which are necessary for
this to work, which make it horribly bloated.  Many people, myself
included, don't run it for that very reason.  Mutt is designed to work
with a vast array of external tools, not just your particular
favorite.  It can't assume that all tools are equally silly. :)

> > I didn't miss the point.  Switch to UTF-8 and you don't have this
> > problem.  That's the point.
> 
> So, Mutt is assuming that to be able to work correctly, the user
> must be under UTF-8 locales. A limitation that some other MUA's
> don't have.

No, that's not right.  As we've said multiple times, Mutt is assuming
that to be able to work correctly, your data and your locale are
consistent.  In any Unix work-alike, if you happen to need to work
with data in multiple encodings, then using Unicode is the only
sensible choice, BY DESIGN.  Not Mutt's design, but Unix/POSIX's
design (and that of the creators of Unicode, where different).

If you refuse to configure your environment properly, or if you choose
to run broken software, that's not Mutt's problem, it's yours.  If you
communicate with such people, there's only so much Mutt can be
expected to do.

-- 
Derek D. Martin    http://www.pizzashack.org/   GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address.  Replying to it will result in
undeliverable mail due to spam prevention.  Sorry for the inconvenience.
