Re: [Mutt] #3040: charset difference between index browser and



On 2008-04-05 22:31:43 -0400, Derek Martin wrote:
> On Sat, Apr 05, 2008 at 05:13:27PM +0200, Vincent Lefevre wrote:
> > There are also many bugs related to UTF-8, e.g. Debian bug 391452
> > (that's the most important one), 
> 
> It seems likely that this is not a bug at all, but an improper
> expectation of the user.  The font string
> "-*-helvetica-medium-o-*-*-12-*-*-*-*-*-*-* " almost certainly first
> matches a font which is not an ISO-10646 Unicode font, and as such is
> not compatible with UTF-8 environments.

I disagree. It seems that you don't know how fvwm works. From the
tests I did, the correct font is chosen, e.g. accented characters
are correctly displayed. In short, fvwm can automatically choose
the correct encoding. This is important, so that the user can have
a config file that doesn't depend on the locales.

> Fixing this *could* be as simple as changing the font string to
> "-*-helvetica-medium-o-*-*-12-*-*-*-*-*-iso-10646-*" though I can't

Note: there should be no dash here: iso10646.

> say, and have no desire to attempt to reproduce this problem since I
> don't use fvwm and don't have it installed anywhere.

I've tried explicitly specifying the encoding, and this doesn't solve
the problem.
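
To illustrate the remark about the dash: an XLFD name has 14 fields
separated by dashes, so a dash inside the registry field shifts the
remaining fields. Here is a minimal Python sketch of that count; the
function name parse_xlfd and the field labels are mine, chosen for
illustration, and the check is much simpler than what an X server
actually does when matching fonts.

```python
# Labels for the 14 XLFD fields, in order (the labels are illustrative).
XLFD_FIELDS = (
    "foundry", "family", "weight", "slant", "setwidth", "addstyle",
    "pixel_size", "point_size", "res_x", "res_y", "spacing",
    "average_width", "charset_registry", "charset_encoding",
)


def parse_xlfd(pattern):
    """Split an XLFD pattern into named fields.

    Raises ValueError if the pattern does not have exactly 14 fields.
    """
    if not pattern.startswith("-"):
        raise ValueError("an XLFD pattern must start with '-'")
    parts = pattern[1:].split("-")
    if len(parts) != len(XLFD_FIELDS):
        raise ValueError(
            "expected %d fields, got %d" % (len(XLFD_FIELDS), len(parts))
        )
    return dict(zip(XLFD_FIELDS, parts))


if __name__ == "__main__":
    # Registry written without a dash: 14 fields, parses fine.
    print(parse_xlfd("-*-helvetica-medium-o-*-*-12-*-*-*-*-*-iso10646-*"))
    # Registry written as "iso-10646": 15 fields, rejected.
    try:
        parse_xlfd("-*-helvetica-medium-o-*-*-12-*-*-*-*-*-iso-10646-*")
    except ValueError as err:
        print("rejected:", err)
```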

> It may also be that there are no helvetica fonts on your system
> which are compatible with Unicode. That also is not a bug, it's a
> misconfiguration.

Helvetica fonts are present and can be displayed (confirmed by xfontsel).

> Another possible solution is to use a more modern window manager which
> makes use of modern font rendering techniques.

Do you have any serious argument or is this just FUD?

> In some cases, upgrading to newer versions of various applications
> than Debian provides may also solve your problem (compile from source,
> or whatever).  You may not like your choices, but you *do* have
> choices which work...  Mutt should not be redesigned in such a way as
> to be inconsistent with the current Unix paradigm for handling
> encodings, just because you don't like your options...

No, that's not the current Unix paradigm. It is a fact that files
can be in various encodings, and software needs to take that into
account. You wouldn't assume that all HTML and XML files are in UTF-8,
would you?

> You're running Debian Stable, the slowest-updating OS in the history
> of the universe (OK, I exaggerate *a little*).

Now you blame all users that run Debian stable. That's ridiculous.
FYI, on my own Debian machines, I run Debian/unstable. But I have
no choice on machines where I'm just a user.

> You're also using decades-old software (like FVWM) that still uses
> technologies that predate Unicode by many years. You're living in
> the past, and expecting to be able to cope with the modern world,
> and you want the rest of the world to bend to your ways,

No, that's not just my way. Look at how Emacs works, for instance.

> Your comments above seem more like an argument against using Debian
> Stable, or Mac OS X, than against using Unicode.

My comments are about what happens in the real world: Debian stable,
Mac OS X, and so on. Unicode is the future, but currently, there are
reasons to use other encodings.

>  I've been using Unicode exclusively since about the beginning of
> 2004 on Red Hat/Fedora and (more recently) Ubuntu systems, and I
> have not experienced *any* of the problems you're complaining
> about... Just as if you choose to misconfigure your system, also if
> you choose to run broken software, there is no helping you.

Again, I didn't misconfigure my system. It is really stupid to say
that a system is misconfigured just because something doesn't work.

> I can't say much about Mac OS X as I've never used it... but I do know
> others who do use it and need Unicode support, and they have not
> mentioned such problems to me.  I'm willing to believe it has bugs,
> but I'm also willing to believe that by choosing different software,
> you would not have such problems, as in the case of FVWM.

That's incredible! Cooked mode may not be very modern and has its
limitations, but this is a *standard* Unix feature. And it works
correctly under Debian (even on Debian stable).

> > Another main problem: Most of my files are in ISO-8859-1. And things
> > like grep wouldn't work under UTF-8 (is there any wrapper?).
> 
> That is unfortunate, but there are solutions.  Use iconv to convert
> them, where possible.

Doing that every time manually would be tedious. Doing that once and
for all has its own problems too. One of them is that I manage my
files with Subversion, and things like "svn ann" would return useless
information.
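
As for a grep wrapper, the idea would be to decode each file with an
explicitly given charset instead of the one from the locale, so the
files themselves stay untouched. A rough Python sketch, assuming the
encoding is known in advance; the name grep_encoded is hypothetical
and this handles only a small part of what grep does:

```python
import re


def grep_encoded(pattern, path, encoding="iso-8859-1"):
    """Return (line_number, line) pairs for lines of `path` matching `pattern`.

    The file is decoded with `encoding`, not with the locale's charset.
    """
    regex = re.compile(pattern)
    matches = []
    with open(path, encoding=encoding) as handle:
        for number, line in enumerate(handle, start=1):
            if regex.search(line):
                # Drop the trailing newline before reporting the line.
                matches.append((number, line.rstrip("\n")))
    return matches


if __name__ == "__main__":
    import sys

    # Usage: script.py PATTERN FILE [ENCODING]
    charset = sys.argv[3] if len(sys.argv) > 3 else "iso-8859-1"
    for number, line in grep_encoded(sys.argv[1], sys.argv[2], charset):
        # print() re-encodes the text for the terminal's own charset.
        print("%d:%s" % (number, line))
```

The file on disk is never rewritten, so the Subversion history is not
affected.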

BTW, I've discovered two other bugs related to UTF-8 today:

1. Emacs on the N810 sets locale-coding-system to iso-8859-1, though
   the N810 only has UTF-8 locales installed.

2. zsh 4.3.6 (i.e. the latest version) has incorrect prompt alignment
   when a prompt contains non-ASCII characters.

> > > > With Emacs, I can, even in ISO-8859-1 locales. 
> > > 
> > > You can view Japanese characters in Emacs running in an xterm with
> > > ISO-8859-1 settings and fonts?  I sincerely doubt that...
> > 
> > Most of the time, I use Emacs under ISO-8859-1 in its own window.
> 
> So, such usage is irrelevant to the case of Mutt.  In that mode, Emacs
> is a graphical program which does not have the same limitations mutt
> has.  Because of this, and because it attempted to solve these
> problems well before Unicode was created,

No, this is still necessary, e.g. to edit files that have been
obtained from other people. Not everyone is using UTF-8... Even
Mutt still supports various charsets. If Mutt has chosen to be
able to work with various charsets (independent from the locales),
why not Emacs?

> > So, I can view non-ISO-8859-1 characters without any problem.
> > Otherwise, I don't mind not being able to view them. But it is
> > important that these characters are not lost when replying to a
> > message (e.g. one gets the same subject, starting with "Re: ");
> > and Mutt currently cannot do this in locales different from UTF-8
> > (contrary to other MUA's).
> 
> Can other *TEXT-MODE* (only) MUA's do it?  None that I know of...

The fact that it is text-mode should only concern what is related
to the terminal. For instance, when copying a UTF-8 mail from one
mailbox to another, all the characters are preserved.

> Mutt lives entirely in the confines of your terminal.  It *MUST*
> convert data which is not in your current locale, because:
> 
>  - It needs to be able to display it on your terminal.

For the particular case of sending data to an editor, Mutt doesn't
display anything.

But I agree that it must convert data for the terminal I/O (I've
never said anything contrary).

>    But more importantly:
> 
>  - It relies on any number of external tools to do things.

Yes.

>    Externally spawned tools will generally not be able to determine
>    what encoding a file is encoded with.

The important word is: "generally". Generally, but not always.

>    All they can do is assume that your locale settings are
>    consistent, and use those. In general, this is how Unix works.

This is just a default. If there is a way to determine the encoding,
there is no problem with sending data in another encoding.
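
Mail itself is an example of data that declares its encoding: a MIME
part carries a charset parameter. A short Python sketch using the
standard email module shows a consumer decoding from that declaration
rather than from the locale. The sample message and the helper name
decode_body are made up for illustration.

```python
from email import message_from_bytes

# A made-up message whose body is declared as ISO-8859-1.
RAW_MESSAGE = (
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"caf\xe9\r\n"
)


def decode_body(raw_message, fallback="ascii"):
    """Decode a single-part message body using its declared charset."""
    msg = message_from_bytes(raw_message)
    # Use the charset parameter of Content-Type, if there is one.
    charset = msg.get_content_charset() or fallback
    payload = msg.get_payload(decode=True)  # body as raw bytes
    return payload.decode(charset, errors="replace")


if __name__ == "__main__":
    print(repr(decode_body(RAW_MESSAGE)))
```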

>    This means that any data mutt sends to them MUST be encoded in
>    your locale's encoding, or else the tools will very likely break,
>    and render you effectively unable to process your mail.

The "MUST" is incorrect: if there is a way to determine what encoding
is used, there's no problem using other encodings (see below).

>    Mutt does not know whether or not your applications can cope with
>    broken locales, and in general, they can't.

Mutt can't, but the *user* can. If Mutt has an option to allow the user
to choose some encoding (this is what I suggested) and the editor chosen
by the user allows the user to configure the file encoding (e.g. Emacs),
then everything works fine.

BTW, in some way, it is better if the editor can recognize the encoding
(see Mutt's <edit> command).

> > So, Mutt is assuming that to be able to work correctly, the user
> > must be under UTF-8 locales. A limitation that some other MUA's
> > don't have.
> 
> No, that's not right.  As we've said multiple times, Mutt is assuming
> that to be able to work correctly, your data and your locale are
> consistent.

You didn't understand the point. Even if they are consistent, it does
NOT work correctly: if the user has chosen ISO-8859-1 (everything
consistent), then Mutt will not preserve non-ISO-8859-1 characters
in a reply. And the fact that Mutt is a text-based MUA is no excuse.
There are various solutions (the option I suggested, together with
a few other changes, could be one, reversible transliteration could
be another one), but they are not implemented.
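
To make the reversible transliteration idea concrete, here is a small
Python sketch. It is not how Mutt works and not a proposed patch; the
\u{...} escape syntax and both function names are invented for the
example. A plain conversion to ISO-8859-1 replaces the characters it
cannot represent with "?", while an escaping scheme keeps enough
information to restore them.

```python
import re

# Matches an escaped backslash or a \u{hex} escape (hypothetical syntax).
_ESCAPE = re.compile(r"\\(\\|u\{([0-9a-fA-F]+)\})")


def encode_reversible(text, charset="iso-8859-1"):
    """Encode text, escaping characters that `charset` cannot represent."""
    out = []
    for ch in text:
        if ch == "\\":
            # Double literal backslashes so escapes stay unambiguous.
            out.append("\\\\")
            continue
        try:
            ch.encode(charset)
            out.append(ch)
        except UnicodeEncodeError:
            out.append("\\u{%x}" % ord(ch))
    return "".join(out).encode(charset)


def decode_reversible(data, charset="iso-8859-1"):
    """Invert encode_reversible()."""
    def restore(match):
        if match.group(2) is None:
            return "\\"  # an escaped backslash
        return chr(int(match.group(2), 16))

    return _ESCAPE.sub(restore, data.decode(charset))


if __name__ == "__main__":
    subject = "Re: 日本語 café"
    # Lossy: unrepresentable characters become "?" and cannot be restored.
    print(subject.encode("iso-8859-1", errors="replace"))
    # Reversible: the original subject survives the round trip.
    encoded = encode_reversible(subject)
    print(encoded)
    print(decode_reversible(encoded) == subject)
```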

-- 
Vincent Lefèvre <vincent@xxxxxxxxxx> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)