Re: wrong charset

To: mutt-users@xxxxxxxx
Subject: Re: wrong charset
From: "Luis A. Florit" <mutt-users2008@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sat, 16 May 2009 10:54:12 -0300
In-reply-to: <20090512182727.GM20214@xxxxxxxxxxxxx>
List-post: <mailto:mutt-users@mutt.org>
List-unsubscribe: send mail to majordomo@mutt.org, body only "unsubscribe mutt-users"
Mail-followup-to: mutt-users@xxxxxxxx
References: <20090504202345.GU2575@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <20090508004000.GA21269@xxxxxxx> <20090508042450.GA95541@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <20090508210826.GB3655@xxxxxxx> <20090508224739.GZ2443@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <20090512175611.GB14956@xxxxxxx> <20090512182727.GM20214@xxxxxxxxxxxxx>
Sender: owner-mutt-users@xxxxxxxx
User-agent: Mutt/1.5.18 (2008-05-17)

* On 12/05/09 at 13:27, Kyle Wheeler rambled:

> On Tuesday, May 12 at 02:56 PM, quoth Luis A. Florit:
> > I did it, and mutt sets charset=utf-8.
>
> On the Nokia? Then the Nokia's locales must all be UTF-8-only.
>
> > Because ':set ?charset' gives 'charset=utf-8', and because the
> > accented characters appear as garbage.
>
> Okay, so, it would appear that the correct character set (the only
> one your locales support) is utf-8. The next question is: why do the
> accented characters appear as garbage?
>
> But let's first be clear: when LANG is correctly set to pt_PT and
> mutt has correctly detected $charset to be 'utf-8', do the accented
> characters show up as \123 or do they show up as random characters,
> such as: â€™ ?

It depends on what you set in the terminal's drop-down menu.
If you set it to UTF-8, characters are shown properly regardless of
the LANG value; if you set it to ISO-8859-1, they are shown as random
characters.
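The "random characters" in the ISO-8859-1 case are what you get when
UTF-8 bytes are read back as single-byte Latin-1. A small Python sketch
that reproduces the effect (an illustration only, not anything mutt
itself runs; the function name is made up):

```python
def as_seen_by_latin1_terminal(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as ISO-8859-1.
    This is effectively what a terminal set to ISO-8859-1 does with
    UTF-8 output: every byte becomes one Latin-1 character."""
    return text.encode("utf-8").decode("iso-8859-1")

print(as_seen_by_latin1_terminal("ação"))  # aÃ§Ã£o
```

Each accented letter turns into two characters, the first usually "Ã",
because every two-byte UTF-8 sequence for Latin letters starts with 0xC3.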

> If it shows up as random characters, then it would seem to me that
> the Nokia is *lying*; that its terminal is, in fact, incapable of
> understanding UTF-8, because it's being sent valid UTF8 character
> codes and is treating them as Windows-1252 characters instead.
>
> However, if it shows up as \123, then for some reason, mutt thinks
> that those characters are non-printable, and is attempting to mask
> them. In this case, what may be happening is that the underlying
> libraries that mutt relies on are broken and/or unreliable. To work
> around these problems, you may need to recompile mutt (and
> reconfigure it). Specifically, when you run mutt's ./configure
> program, add the --without-wc-funcs and maybe add the
> --enable-locales-fix. Here's what mutt's build documentation has to
> say about these:
>
> --enable-locales-fix on some systems, the result of isprint() can't
> be used reliably to decide which characters are printable, even if
> you set the LANG environment variable. If you set this option, Mutt
> will assume all characters in the ISO-8859-* range are printable. If
> you leave it unset, Mutt will attempt to use isprint() if either of
> the environment variables LANG, LC_ALL or LC_CTYPE is set, and will
> revert to the ISO-8859-* range if they aren't. If you need
> --enable-locales-fix then you will probably need --without-wc-funcs
> too. However, on a correctly configured modern system you shouldn't
> need either (try setting LANG, LC_CTYPE, or LC_ALL instead).
>
> --without-wc-funcs by default Mutt uses the functions mbrtowc(),
> wctomb() and wcwidth() provided by the system, when they are
> available. With this option Mutt will use its own version of those
> functions, which should work with 8-bit display charsets, UTF-8,
> euc-jp or shift_jis, even if the system doesn't normally support
> those multibyte charsets.
>
> If you find Mutt is displaying non-ascii characters as octal escape
> sequences (e.g. \243), even though you have set LANG and LC_CTYPE
> correctly, then you might find you can solve the problem with either
> or both of --enable-locales-fix and --without-wc-funcs.
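Regarding --without-wc-funcs: the work it moves into mutt's own code is
essentially splitting a byte stream into multibyte characters, which is
what mbrtowc() does. A toy UTF-8 version in Python, reporting the code
point and the number of bytes consumed for each character, might look
like this. It only illustrates the idea; it is not mutt's implementation,
and it skips the overlong and surrogate checks a real decoder needs.

```python
def utf8_chars(data: bytes):
    """Yield (code_point, byte_length) for each UTF-8 sequence in data,
    roughly what successive mbrtowc() calls report in a UTF-8 locale.
    Raises ValueError on a bad lead byte, bad continuation byte, or a
    truncated sequence."""
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                  # plain ASCII, 1 byte
            cp, n = lead, 1
        elif 0xC2 <= lead <= 0xDF:       # 2-byte sequence
            cp, n = lead & 0x1F, 2
        elif 0xE0 <= lead <= 0xEF:       # 3-byte sequence
            cp, n = lead & 0x0F, 3
        elif 0xF0 <= lead <= 0xF4:       # 4-byte sequence
            cp, n = lead & 0x07, 4
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        if i + n > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for b in data[i + 1:i + n]:
            if b & 0xC0 != 0x80:         # continuation bytes are 10xxxxxx
                raise ValueError(f"bad continuation byte 0x{b:02X}")
            cp = (cp << 6) | (b & 0x3F)  # append 6 payload bits
        yield cp, n
        i += n

print(list(utf8_chars("ação".encode("utf-8"))))
# [(97, 1), (231, 2), (227, 2), (111, 1)]
```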

Escapes like \234 are shown when I ':set charset=iso-8859-1', regardless
of the terminal's encoding setting and the value of LANG...
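That fits the octal-escape explanation: with $charset=iso-8859-1, any
byte of a UTF-8 sequence that falls in 0x80-0x9F is a C1 control code
in ISO-8859-1, so it is not printable and gets masked. For instance, the
UTF-8 encoding of a left double quotation mark is E2 80 9C, and 0x9C is
234 in octal. A rough Python imitation of that masking (a simplification
assumed for illustration, not mutt's actual code; the function name is
made up):

```python
def mask_for_iso8859_1(data: bytes) -> str:
    """Show raw bytes the way a pager working in ISO-8859-1 might:
    printable Latin-1 bytes as characters, control bytes (C0, DEL, C1)
    as backslash-octal escapes. Simplified: newlines and tabs are
    escaped too."""
    out = []
    for b in data:
        if 0x20 <= b <= 0x7E or b >= 0xA0:
            out.append(chr(b))          # printable in ISO-8859-1
        else:
            out.append(f"\\{b:03o}")    # e.g. 0x9C -> \234
    return "".join(out)

print(mask_for_iso8859_1("“ação”".encode("utf-8")))
```

The quotation marks come out as "â\200\234" and "â\200\235", while the
accented letters still show up as Latin-1 garbage ("Ã§", "Ã£").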

> > > The terminal shouldn't matter in this case.
> >
> > Perhaps I should have said this before, but I use the very same
> > .muttrc in Fedora and Nokia. Both Fedora's iso-8859-1 rxvt and
> > xterm show chars perfectly. It's the Nokia that doesn't. And both
> > have the same locales: everything as en_US. What could it be, but
> > the console?
>
> More likely than not, it's the system's string manipulation
> libraries.
>
> There are, of course, more things that can go wrong. The terminal
> may be the problem (but it's usually not), you may also not have the
> right fonts for the terminal, etc. etc. But those are unusual
> problems these days, and library and/or locale issues are far more
> common. I'm operating on the zebra principle here: if you hear
> hoofbeats, think horses, not zebras. I suppose it *could* be the
> terminal, but let's eliminate the other options first.

Yes, I agree.

> > Perhaps all the Nokia locales are UTF-8 based...?
>
> Probably. Which would make it all the more annoying if their string
> manipulation libraries cannot handle UTF-8.

My problem is with iso-8859-1. They seem to understand only UTF-8.
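One way to check that on the Nokia is to ask which codeset each locale
actually provides, since mutt derives its default $charset from the
locale (LC_CTYPE/LANG). A Python sketch, assuming Python is available on
the device and the libc provides nl_langinfo(); the locale names below
are only examples, use whatever 'locale -a' lists:

```python
import locale

def codeset_for(name: str):
    """Return the codeset of LC_CTYPE locale `name`, or None if that
    locale is not installed. The previous LC_CTYPE setting is restored."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query, don't change
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        return None                            # locale not installed
    try:
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)

for name in ("pt_PT", "pt_PT.ISO-8859-1", "en_US",
             "en_US.ISO-8859-1", "pt_PT.UTF-8"):
    print(f"{name:20} -> {codeset_for(name) or 'not installed'}")
```

If every installed locale reports UTF-8, that would confirm the
suspicion that there is no ISO-8859-1 locale to fall back on.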

    L.
