Re: locale problem



Hello, Derek.
Sorry for taking so long to reply to your mail. I've been quite busy
lately trying to solve this problem, and I have partly resolved it.
See below.

* Derek Martin <invalid@xxxxxxxxxxxxxx> [19-08-2004 12:33]:
> Can you be more specific about when the problem occurs?   Can you
> provide a link to a mailbox which exhibits the problem?

I have narrowed it down: the messages that display accented characters
as a space in the index are the ones that are iso-8859-1 but have
attachments.
Their Content-Type shows as "multipart/alternative" or something
similar, and they don't declare a charset anywhere in the headers.
In the pager, the accented character in this case shows as "\341".
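For reference, in a multipart message the charset is normally declared on each text part's own Content-Type header, not on the top-level multipart header, and the pager's "\341" is an octal escape for byte 0xE1, which is "á" in ISO-8859-1. A minimal Python sketch, using a made-up message (the boundary "XYZ" and the body text are hypothetical), shows where the charset lives:

```python
from email import message_from_bytes

# Hypothetical multipart message: the top-level Content-Type carries only a
# boundary, while the text part declares its own charset.
raw = (
    b'Content-Type: multipart/mixed; boundary="XYZ"\n'
    b"\n"
    b"--XYZ\n"
    b"Content-Type: text/plain; charset=iso-8859-1\n"
    b"\n"
    b"J\xe1 chegou\n"
    b"--XYZ--\n"
)

msg = message_from_bytes(raw)
top_charset = msg.get_content_charset()      # None: the container has no charset

texts = []
for part in msg.walk():
    if part.get_content_maintype() == "text":
        charset = part.get_content_charset()          # "iso-8859-1"
        body = part.get_payload(decode=True)          # raw bytes of this part
        texts.append(body.decode(charset).strip())

# The pager's "\341" is octal 341 = 0xE1, i.e. "á" in ISO-8859-1.
pager_escape = bytes([0o341]).decode("iso-8859-1")
print(top_charset, texts, pager_escape)
```

So a reader that looks only at the top-level headers finds no charset, even though the part itself is labelled.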
However, look at this:

From: =?iso-8859-1?q?Patr=EDcia=20Rosa?= <plsrosa2002@xxxxxxxxxxxx>
Subject: Lim ozinho_safado!!!!!!!!!!!!!!
Content-Type: multipart/mixed; boundary="0-1219368190-1092794218=:86288"

The "From" is displayed correctly in the pager, but not in the index,
where the accented character (the name is Patrícia Rosa) shows up as a
space. I think mutt should display it correctly in both views, since it
appears to be properly encoded.
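The From line above uses an RFC 2047 encoded-word ("Q" encoding, charset iso-8859-1), so the charset travels with the text itself. As a sketch of what a correct decoder produces, Python's standard email.header module (used here only for illustration; this is not mutt's code) decodes it like this:

```python
from email.header import decode_header, make_header

# The encoded-word from the From: header quoted above.
encoded = "=?iso-8859-1?q?Patr=EDcia=20Rosa?="

# decode_header() splits the value into (bytes, charset) chunks;
# "=ED" is byte 0xED ("í" in ISO-8859-1) and "=20" is a space.
chunks = decode_header(encoded)

# make_header() decodes each chunk using its declared charset.
name = str(make_header(chunks))
print(name)    # Patrícia Rosa
```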
The accented character in the Subject shows as a space in the index and
as \343 in the pager. However, the Subject is not encoded in any way, so
I guess mutt has no way to know whether that byte should be interpreted
as utf-8 or as iso-8859-1. In fact, when I copy and paste it into
Mozilla, it shows as a square with FF F0 inside (so it is clearly being
treated as utf-8).
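To illustrate why an unlabelled 8-bit Subject is ambiguous: "\343" is byte 0xE3, which is "ã" in ISO-8859-1 (so the subject presumably reads "Limãozinho_safado"), but the same byte is not a complete UTF-8 sequence, so a UTF-8 decoder substitutes the replacement character U+FFFD. A small Python sketch, assuming the raw header bytes are as shown (the trailing "!" run is shortened):

```python
# Hypothetical raw Subject bytes: 0xE3 (octal 343) where the index shows a space.
raw_subject = b"Lim\343ozinho_safado!!!"

# Every byte is valid ISO-8859-1, so this always succeeds.
as_latin1 = raw_subject.decode("iso-8859-1")

# 0xE3 starts a 3-byte UTF-8 sequence, but "o" is not a continuation byte,
# so the decoder replaces 0xE3 with U+FFFD and carries on.
as_utf8 = raw_subject.decode("utf-8", errors="replace")

print(as_latin1)
print(as_utf8)
```

Without a charset label, nothing in the bytes says which of the two readings is intended.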

> I'm not sure that there is...  You could pipe it through iconv, but
> you will probably break the mail.

Yes... I haven't even tried it.
What did make some things work better was recompiling ncurses, passing
"--enable-widec" to its configure script to enable wide-character
(utf-8) support.
This initially broke some things, because when built this way the
library is named libncursesw.so, not libncurses.so. A simple symlink
solved that, but I was told this will break other things, since the
two libraries are not binary compatible.

Recompiling ncurses this way and making mutt use it instead of slang
resolved the display glitches in the index view. Before that, every
line containing an accented character was shifted a bit to the left (as
if the library were counting the 2 bytes of a utf-8 character as two
characters, but displaying just one). So the From name would appear
correctly, but the remaining fields would be shifted some columns to
the left. When I pressed page-down to go to the next screen, mutt
wouldn't redraw it correctly, and I had to press Ctrl-L.
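The leftward shift described above is what happens when column padding is computed from byte counts: "í" is one character but two bytes in UTF-8, so one space too few is added after the name and every following field moves left. A small Python illustration, with a hypothetical 20-column From field:

```python
name = "Patrícia Rosa"
WIDTH = 20  # hypothetical width of the From column

# Padding by byte count: "í" is 2 bytes in UTF-8, so one space too few is added.
byte_padded = name.encode("utf-8").ljust(WIDTH).decode("utf-8")

# Padding by character count: correct as long as each character takes one column.
char_padded = name.ljust(WIDTH)

print(repr(byte_padded + "|"))
print(repr(char_padded + "|"))
print(len(byte_padded), len(char_padded))   # 19 20
```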

However, linking mutt against the new ncursesw library doesn't solve the
problem in the pager view. I'm quite sure this must be either a mutt
bug or a configuration error.

> Ok, that's fine, but how are you starting gnome terminal?  You didn't
> actually answer my question...  Even if you set your locale properly
> in your .bashrc or whatever, the gnome-terminal may still be started
> with a different locale if it is started by your window manager.  For
> example, if your system's default locale is different from that which
> you have set in your .bashrc, the windowing system may not (and
> probably won't) read your .bashrc before it starts, so programs
> started from it will have the system's default locale, NOT the one you
> defined in your .bashrc file.
> 
> The above depends on your default system locale, as well as how the
> windowing system was started...  If the system's default locale is
> en_US, and the windowing system is started at boot time, then programs
> started by it may actually be started with a locale of en_US.  This
> will cause problems.

The whole system starts in utf-8. GDM sets my locale to utf-8, and all
GNOME applications seem to work fine with it. The terminal's locale is
en_US.UTF-8, and I'm not setting it in any script.

> The problem here is that the characters are being output using an
> encoding which is different from your locale.  In other words, the
> script is sending iso-8859-1 characters to your terminal, but the
> terminal is interpreting them as UTF-8 characters.  Any codes which
> don't map the same in both locales will be errors.

I know that... vim does the conversion correctly. Running that script
shows spaces instead of the accented characters, as expected.
What bothers me is that mutt doesn't do the conversion.
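The conversion in question amounts to decoding from the charset the message declares and re-encoding into the terminal's charset, which is what `iconv -f ISO-8859-1 -t UTF-8` does on the command line. A minimal Python sketch, assuming the message charset is known (the function `to_terminal` is a hypothetical helper, not anything in mutt):

```python
def to_terminal(data: bytes, message_charset: str, terminal_charset: str = "utf-8") -> bytes:
    """Hypothetical helper: re-encode message text for the terminal.

    Decode the bytes with the charset the message declares, then encode
    the resulting text in the charset the terminal expects.
    """
    text = data.decode(message_charset)
    # Characters the terminal charset cannot represent become "?"
    # instead of raising an error.
    return text.encode(terminal_charset, errors="replace")


latin1_bytes = b"Patr\xedcia"                          # "í" is the single byte 0xED
print(to_terminal(latin1_bytes, "iso-8859-1"))          # b'Patr\xc3\xadcia'
print(to_terminal("ç".encode("utf-8"), "utf-8", "us-ascii"))  # b'?'
```

The key input is the source charset: when a message does not declare one, the conversion has nothing reliable to start from.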

> For characters to be displayed properly, all of the following must
> match locale:
> 
>  - the locale which your terminal was started with
>  - the actual characters being sent to the terminal
>  - the locale of the shell being run by your terminal
>  - the font used to display the characters

The font doesn't seem to be the problem here, as I can see the
characters when I type them.
The locale also seems to be right. All files I write are encoded in
utf-8 by default, and everything else works fine.

However, the characters being sent to the terminal seem to be wrong. I'm
not sure whether this is a configuration problem or a problem inside
mutt, but there is surely a problem somewhere.
It doesn't seem to be a configuration error, as I have already tried
running mutt with 'mutt -nF /dev/null', and the problem is still the
same.

> In the case of e-mail, the received e-mail's MIME type must also
> match.  If the charset of the data being written to the terminal is
> different from the charset of your terminal, then you need to use
> iconv to convert it.  However, if you have your settings set properly,
> mutt should do this for you.  
> 
> So, please identify a specific message which exhibits the problem, and
> then answer the following questions:
> 
> 1. What is the character set named in the Content-Type field of the
>    message?
> 
> 2. What is the complete output of the locale command on your terminal
>    prior to starting

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
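In glibc's `locale` output, quoted values are derived rather than set explicitly, so everything above follows LANG. The lookup order POSIX specifies (a non-empty LC_ALL first, then the category's own variable, then LANG) can be sketched in Python as below; the helper names are hypothetical, and this only models the environment lookup, not what the C library then does with the locale name:

```python
import os


def effective_locale(env: dict, category: str = "LC_CTYPE") -> str:
    """Hypothetical helper: resolve the locale name for one category.

    A non-empty LC_ALL wins, then the category's own variable, then LANG;
    if none is set, the implementation default ("C") applies.
    """
    for var in ("LC_ALL", category, "LANG"):
        value = env.get(var, "")
        if value:                  # unset and empty variables are both skipped
            return value
    return "C"


def codeset(locale_name: str) -> str:
    """Hypothetical helper: the part after the dot, e.g. "UTF-8" in "en_US.UTF-8".

    Returns "" when the name has no explicit codeset; the real codeset then
    depends on how that locale is defined on the system.
    """
    name = locale_name.split("@", 1)[0]       # drop any @modifier
    return name.split(".", 1)[1] if "." in name else ""


ctype = effective_locale(dict(os.environ))
print(ctype, codeset(ctype))
```

This is why a terminal started by the window manager can end up with a different LC_CTYPE than an interactive shell: it depends only on the environment that process inherited.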

> 3. What is the system's default locale?

Same as above.

> 4. After you start mutt, what is the value of $charset (in mutt)?

utf-8

> 5. Are you sure your font can display all the relevant glyphs?

Yes, I can see the characters: ç ã á é ó í
They all display fine here (c with cedilla, a with tilde, and a, e, o
and i with acute accents).

> 6. What is the value of mutt's $send_charset?

send_charset="us-ascii:iso-8859-1:utf-8"

This isn't set in my config file; it seems to be the default.
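As mutt's manual describes it, $send_charset is a colon-separated list tried in order, and the first charset into which the outgoing text can be converted exactly is used. A simplified Python model of that rule (the function name and the final UTF-8 fallback are assumptions for illustration, not mutt code):

```python
def pick_send_charset(text: str, send_charset: str = "us-ascii:iso-8859-1:utf-8") -> str:
    """Simplified model of $send_charset: return the first charset in the
    colon-separated list that can represent the whole text exactly."""
    for charset in send_charset.split(":"):
        try:
            text.encode(charset)       # strict: fails on any unrepresentable character
            return charset
        except UnicodeEncodeError:
            continue
    return "utf-8"                     # assumption: fallback if nothing in the list fits


print(pick_send_charset("hello"))      # us-ascii
print(pick_send_charset("Patrícia"))   # iso-8859-1
print(pick_send_charset("€"))          # utf-8 (no euro sign in ISO-8859-1)
```

This only affects outgoing mail; it has no bearing on how received messages are displayed.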

The display problem in the index view was resolved by switching from
libncurses to libncursesw. However, the ncurses maintainer told me that
mutt must take into account that utf-8 characters may be 2 or 3 bytes
long, in order to keep things properly aligned.
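The underlying point is that byte length, character count and screen columns are three different quantities. C programs usually get column widths from wcwidth()/wcswidth() after converting with mbstowcs(); Python's standard library has no wcswidth, so the sketch below approximates it with unicodedata, treating combining marks as zero width and East Asian wide characters as two columns. It is a rough approximation for illustration, not the width table ncurses or mutt actually uses:

```python
import unicodedata


def display_width(text: str) -> int:
    """Rough approximation of wcswidth(): terminal columns a string occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue                                   # combining accents add no column
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2                                 # wide/fullwidth (e.g. CJK) take two
        else:
            width += 1
    return width


for s in ("Rosa", "Patrícia", "e\u0301", "日本"):
    print(repr(s), len(s.encode("utf-8")), "bytes,", display_width(s), "columns")
```

"Patrícia" is 9 bytes but 8 columns, a decomposed "é" is 3 bytes but 1 column, and "日本" is 6 bytes but 4 columns, so alignment has to be based on columns rather than on byte counts.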

-- 
Bruno Lustosa, aka Lofofora          | Email: bruno@xxxxxxxxxxx
Network Administrator/Web Programmer | ICQ: 1406477
Rio de Janeiro - Brazil              |
