Re: [Mutt] #3040: charset difference between index browser and pager

To: miek@xxxxxxx, brendan@xxxxxxxxxx
Subject: Re: [Mutt] #3040: charset difference between index browser and pager
From: Mutt <fleas@xxxxxxxx>
Date: Sun, 06 Apr 2008 02:31:50 -0000
Cc: mutt-dev@xxxxxxxx
In-reply-to: <037.b4d3a5e597da310ee107a426b2fc4a0e@xxxxxxxx>
List-post: <mailto:mutt-dev@mutt.org>
List-unsubscribe: send mail to majordomo@mutt.org, body only "unsubscribe mutt-dev"
Mail-followup-to: fleas@xxxxxxxx
References: <037.b4d3a5e597da310ee107a426b2fc4a0e@xxxxxxxx>
Reply-to: fleas@xxxxxxxx
Sender: owner-mutt-dev@xxxxxxxx

#3040: charset difference between index browser and pager

Comment (by Derek Martin):

 {{{
 On Sat, Apr 05, 2008 at 05:13:27PM +0200, Vincent Lefevre wrote:
 > On 2008-03-28 09:48:02 -0400, Derek Martin wrote:
 > > switch your environment to UTF-8.  So who's the idiot?
 >
 > You're not living in the real world.

 I beg to differ, see below.

 > There are also many bugs related to UTF-8, e.g. Debian bug 391452
 > (that's the most important one),

 It seems likely that this is not a bug at all, but an improper
 expectation of the user.  The font string
 "-*-helvetica-medium-o-*-*-12-*-*-*-*-*-*-*" almost certainly first
 matches a font which is not an ISO 10646 (Unicode) font, and as such is
 not compatible with UTF-8 environments.  This is probably due to
 limitations of the original methods of rendering fonts in the X
 Window System, and is not actually a bug, but merely a
 misconfiguration.  As I (and others) have said, if your settings are
 not consistent, you should not expect things to work correctly.

 Fixing this *could* be as simple as changing the font string to
 "-*-helvetica-medium-o-*-*-12-*-*-*-*-*-iso10646-1", though I can't
 say for sure, and have no desire to attempt to reproduce this problem
 since I don't use fvwm and don't have it installed anywhere.  It may also be
 that there are no helvetica fonts on your system which are compatible
 with Unicode.  That also is not a bug, it's a misconfiguration.
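 To make the font-string point concrete: an XLFD font name has 14
 hyphen-separated fields, and the last two (CHARSET_REGISTRY and
 CHARSET_ENCODING) name the character set the font covers.  Unicode
 fonts use "iso10646-1".  The Python sketch below uses hypothetical
 helpers (xlfd_charset, is_unicode_font) that only inspect the string,
 without querying an X server, and only handle fully spelled-out
 14-field names.

```python
# Sketch: report the charset fields of an XLFD font name or pattern.
# The last two of the 14 fields are CHARSET_REGISTRY and CHARSET_ENCODING,
# e.g. "iso8859" + "1" for Latin-1 or "iso10646" + "1" for Unicode.
# xlfd_charset/is_unicode_font are hypothetical helpers; they only look
# at the string and do not ask the X server which fonts exist.

from typing import Optional, Tuple

XLFD_FIELD_COUNT = 14


def xlfd_charset(name: str) -> Optional[Tuple[str, str]]:
    """Return (registry, encoding) for a 14-field XLFD name, else None."""
    name = name.strip()
    if not name.startswith("-"):
        return None
    fields = name[1:].split("-")
    if len(fields) != XLFD_FIELD_COUNT:
        return None  # not a fully spelled-out 14-field name
    return fields[-2].lower(), fields[-1].lower()


def is_unicode_font(name: str) -> bool:
    """True only if the name explicitly asks for an ISO 10646 font."""
    return xlfd_charset(name) == ("iso10646", "1")


if __name__ == "__main__":
    for pattern in ("-*-helvetica-medium-o-*-*-12-*-*-*-*-*-*-*",
                    "-*-helvetica-medium-o-*-*-12-*-*-*-*-*-iso10646-1"):
        # Prints ('*', '*') False, then ('iso10646', '1') True.
        print(pattern, xlfd_charset(pattern), is_unicode_font(pattern))
```

 With the first pattern both charset fields are wildcards, so nothing
 in the name stops a non-Unicode Helvetica from matching first.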

 Another possible solution is to use a more modern window manager which
 makes use of modern font rendering techniques.

 In some cases, upgrading various applications to newer versions than
 Debian provides may also solve your problem (compile from source,
 or whatever).  You may not like your choices, but you *do* have
 choices which work...  Mutt should not be redesigned in such a way as
 to be inconsistent with the current Unix paradigm for handling
 encodings, just because you don't like your options...

 > Debian bug 254507, cooked mode
 [...]
 > I've started to switch to UTF-8, but I'm fed up with all these bugs
 > I encounter, so that I'm still mainly under ISO-8859-1 locales.

 You're running Debian Stable, the slowest-updating OS in the history
 of the universe (OK, I exaggerate *a little*).  You're also using
 decades-old software (like FVWM) that still uses technologies that
 predate Unicode by many years.  You're living in the past, and
 expecting to be able to cope with the modern world, and you want the
 rest of the world to bend to your ways, instead of using modern,
 well-established solutions that work for many, many people.  That's
 crazy (but very typical of you).

 Your comments above seem more like an argument against using Debian
 Stable, or Mac OS X, than against using Unicode.  I've been using
 Unicode exclusively since about the beginning of 2004 on Red
 Hat/Fedora and (more recently) Ubuntu systems, and I have not
 experienced *any* of the problems you're complaining about...  If you
 choose to misconfigure your system, or to run broken software, there
 is no helping you.

 I can't say much about Mac OS X as I've never used it... but I do know
 others who do use it and need Unicode support, and they have not
 mentioned such problems to me.  I'm willing to believe it has bugs,
 but I'm also willing to believe that by choosing different software,
 you would not have such problems, as in the case of FVWM.

 > Another main problem: Most of my files are in ISO-8859-1. And things
 > like grep wouldn't work under UTF-8 (is there any wrapper?).

 That is unfortunate, but there are solutions.  Use iconv to convert
 them, where possible.  If you need a particular document, review it
 and hand-edit any broken characters before you send it off. This is no
 different than if you had been previously working on an IBM mainframe
 using EBCDIC, and then decided to migrate to Windows XP (or, well,
 anything).  You'd need to convert your data.  Here again, your
 expectation is unrealistic.
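 A minimal Python sketch of the conversion step, roughly what
 "iconv -f ISO-8859-1 -t UTF-8" does for a single file.  It assumes the
 legacy files really are ISO-8859-1, and it uses a heuristic (an
 assumption, not a rule) of skipping files that already decode as
 valid UTF-8.  Decoding as ISO-8859-1 can never fail, because every
 byte maps to some character.  A file actually written in another
 8-bit charset would therefore be "converted" silently into the wrong
 characters, which is why reviewing the result still matters.
 convert_bytes and convert_file are hypothetical names.

```python
# Sketch: re-encode legacy ISO-8859-1 text files as UTF-8, roughly like
# `iconv -f ISO-8859-1 -t UTF-8`.  Files are rewritten in place, so keep
# a backup.  convert_bytes/convert_file are hypothetical helper names.

import sys
from pathlib import Path
from typing import Optional


def convert_bytes(data: bytes) -> Optional[bytes]:
    """Return the data re-encoded as UTF-8, or None if no change is needed."""
    try:
        data.decode("utf-8")
        # Already valid UTF-8 (plain ASCII lands here too): leave it alone.
        # Heuristic: a real ISO-8859-1 file could, rarely, pass this test.
        return None
    except UnicodeDecodeError:
        pass
    # Every byte is a valid ISO-8859-1 character, so this cannot fail,
    # even if the file was really in some other 8-bit charset.
    return data.decode("iso-8859-1").encode("utf-8")


def convert_file(path: Path) -> bool:
    """Convert one file in place; return True if it was rewritten."""
    converted = convert_bytes(path.read_bytes())
    if converted is None:
        return False
    path.write_bytes(converted)
    return True


if __name__ == "__main__":
    for name in sys.argv[1:]:
        status = "converted" if convert_file(Path(name)) else "skipped"
        print(f"{status}: {name}")
```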

 > > > With Emacs, I can, even in ISO-8859-1 locales.
 > >
 > > You can view Japanese characters in Emacs running in an xterm with
 > > ISO-8859-1 settings and fonts?  I sincerely doubt that...
 >
 > Most of the time, I use Emacs under ISO-8859-1 in its own window.

 So, such usage is irrelevant to the case of Mutt.  In that mode, Emacs
 is a graphical program which does not have the same limitations mutt
 has.  Because of this, and because it attempted to solve these
 problems well before Unicode was created, it has implemented a wide
 variety of functionality to help solve these problems, which no sane
 program should ever need to implement again.  The reason for that
 is that we now have Unicode, which solves the problem nicely and
 completely, so long as your system is configured properly to use it.

 > So, I can view non-ISO-8859-1 characters without any problem.
 > Otherwise, I don't mind not being able to view them. But it is
 > important that these characters are not lost when replying to a
 > message (e.g. one gets the same subject, starting with "Re: ");
 > and Mutt currently cannot do this in locales different from UTF-8
 > (contrary to other MUA's).

 Can other *TEXT-MODE* (only) MUAs do it?  None that I know of...
 Mutt lives entirely in the confines of your terminal.  It *MUST*
 convert data which is not in your current locale, because:

  - It needs to be able to display it on your terminal.

    Mutt provides the ability to view your edited message, and (in some
    cases) also your attachments, right within Mutt.  It therefore must
    convert the data to an encoding consistent with your terminal
    settings.  Attempting to display data in a different encoding than
    your terminal is configured for can *break your terminal*, and
    should not be attempted.

    But more importantly:

  - It relies on any number of external tools to do things.

    Externally spawned tools will generally not be able to determine
    what encoding a file is encoded with.  All they can do is assume
    that your locale settings are consistent, and use those.  In
    general, this is how Unix works.  This means that any data mutt
    sends to them MUST be encoded in your locale's encoding, or else
    the tools will very likely break, and render you effectively unable
    to process your mail.

    Mutt does not know whether or not your applications can cope with
    broken locales, and in general, they can't.  It must therefore
    assume that they can't, and convert the data.  There is no other
    logical choice.
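 The conversion step described above can be sketched in Python as
 follows.  This illustrates the general idea and is not Mutt's actual
 code.  to_locale_charset is a hypothetical helper, and replacing
 unrepresentable characters with "?" is an assumed policy.  The
 example shows why characters that do not fit the locale's charset are
 gone once converted, which is the limitation you describe for replies.

```python
# Sketch of converting message text into the locale's charset before
# displaying it or handing it to an external tool.  Illustrative only:
# to_locale_charset is a hypothetical helper, not Mutt's code, and the
# "?" replacement policy is an assumption.

import locale
from typing import Optional


def to_locale_charset(raw: bytes, msg_charset: str,
                      locale_charset: Optional[str] = None) -> bytes:
    """Re-encode raw message bytes from msg_charset to the locale charset."""
    if locale_charset is None:
        # The charset implied by the user's environment (LANG, LC_ALL, ...).
        locale_charset = locale.getpreferredencoding(False)
    # Undecodable input bytes become U+FFFD ...
    text = raw.decode(msg_charset, errors="replace")
    # ... and characters the target charset lacks become "?".
    return text.encode(locale_charset, errors="replace")


if __name__ == "__main__":
    subject = "Re: 日本 café".encode("utf-8")
    shown = to_locale_charset(subject, "utf-8", "iso-8859-1")
    print(shown)  # b'Re: ?? caf\xe9': the Japanese characters are gone
```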

 You don't have to like it, but that is the world we live in, by
 design.  If you don't like it, you're free to use other software.
 It's true that Mutt could implement all manner of conversion code
 directly inside Mutt, like Emacs does, and leave the data intact
 internally.  But in the general case, the only reliable way for Mutt
 to determine the correct encoding of a given data stream is to check
 your locale settings and assume they are correct.  This is the
 philosophy behind handling locales which is prevalent throughout the
 Unix world (Emacs notwithstanding), and as such Mutt should follow
 it.  Furthermore, if Mutt did that, then when it interfaced with other
 external tools, that interaction would very likely break, due to the
 data not being consistent with the user's configured locale.  Even if
 a few clever programs can manage it, the overwhelming majority
 cannot.
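 The "check your locale settings" step can be sketched as follows.
 The sketch follows the POSIX precedence for the character-type
 category: LC_ALL, if set and non-empty, overrides LC_CTYPE, which in
 turn overrides LANG.  This is a simplified, hypothetical illustration.
 Real programs would call setlocale(LC_CTYPE, "") and then
 nl_langinfo(CODESET) (in Python, locale.setlocale and
 locale.nl_langinfo) rather than parse the names themselves, since
 names like "C" carry no explicit codeset.

```python
# Simplified sketch of how the locale's character encoding is chosen
# from the environment, following POSIX precedence for LC_CTYPE.
# ctype_locale and codeset are hypothetical helpers for illustration.

import os
from typing import Mapping, Optional


def ctype_locale(env: Mapping[str, str]) -> str:
    """Return the locale name in effect for character handling."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:  # unset and empty variables are both skipped
            return value
    return "C"  # nothing set: the POSIX default locale


def codeset(locale_name: str) -> Optional[str]:
    """Extract the codeset from language[_territory][.codeset][@modifier]."""
    if "." not in locale_name:
        return None  # e.g. "C" or "de_DE@euro": no explicit codeset
    return locale_name.split(".", 1)[1].split("@", 1)[0]


if __name__ == "__main__":
    name = ctype_locale(os.environ)
    print(f"locale {name!r} -> codeset {codeset(name)!r}")
```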

 If this is a problem for you, all one can suggest is for you to switch
 to Emacs...

 > Emacs can work with UTF-8 files under ISO-8859-1 without any
 > problem.

 Emacs implements a great many things, some of which are necessary for
 this to work, and together they make it horribly bloated.  Many people, myself
 included, don't run it for that very reason.  Mutt is designed to work
 with a vast array of external tools, not just your particular
 favorite.  It can't assume that all tools are equally silly. :)

 > > I didn't miss the point.  Switch to UTF-8 and you don't have this
 > > problem.  That's the point.
 >
 > So, Mutt is assuming that to be able to work correctly, the user
 > must be under UTF-8 locales. A limitation that some other MUA's
 > don't have.

 No, that's not right.  As we've said multiple times, Mutt is assuming
 that to be able to work correctly, your data and your locale are
 consistent.  In any Unix work-alike, if you happen to need to work
 with data in multiple encodings, then using Unicode is the only
 sensible choice, BY DESIGN.  Not Mutt's design, but Unix/POSIX's
 design (and that of the creators of Unicode, where different).

 If you refuse to configure your environment properly, or if you choose
 to run broken software, that's not Mutt's problem, it's yours.  If you
 communicate with such people, there's only so much Mutt can be
 expected to do.
 }}}

-- 
Ticket URL: <http://dev.mutt.org/trac/ticket/3040#comment:>
