
problem with utf-8 encoding using mutt + vim



Hi all.
Yes, I know this is a known problem. But I searched the list and
read the FAQ and the man page and I couldn't solve it. This mail is
forwarded from the gentoo users list; I sent it there first looking
for a solution, but maybe the problem is with mutt, so I'm asking for
help here now.

The mail:

On Wed, Jul 27, 2005 at 09:04:29AM +0300, Moshe Kaminsky wrote:
> * Fernando Canizo <conan@xxxxxxxxxxxxx> [27/07/05 07:15]:
> > 
> > Hi all.
> > 
> > I'm having trouble with my encoding using mutt + vim + utf-8,
> > basically my emails are sent with the wrong encoding when
> > *replying*. I've tracked the problem, searched, read FAQs and I
> > found that maybe my problem is this: while mutt is linked to
> > libncursesw (wide library), vim is linked to libncurses (normal).
> > This is the output of ldd:
> 
> I find it hard to believe that this is the problem. You say that you can 
> use utf8 when you are composing (or writing some other stuff), right? 
> What are the values of 'encoding' and 'fileencoding' in vim when 
> replying?
> Moshe

Like I said to Richard, maybe you're right. I mean: I can write a
utf-8 file from scratch using vim alone, so why wouldn't it work when
invoking vim from mutt? Maybe mutt is telling vim something incorrect
when they communicate.

Well, I'll give more information, but this is going to grow large ;)

I went through the mutt FAQ (http://wiki.mutt.org/index.cgi?MuttFaq/Charset)
and checked that everything is ok:

locale seems to be ok:
~$ locale
LANG=es_AR.utf-8
LC_CTYPE="es_AR.utf-8"
LC_NUMERIC="es_AR.utf-8"
LC_TIME="es_AR.utf-8"
LC_COLLATE="es_AR.utf-8"
LC_MONETARY="es_AR.utf-8"
LC_MESSAGES="es_AR.utf-8"
LC_PAPER="es_AR.utf-8"
LC_NAME="es_AR.utf-8"
LC_ADDRESS="es_AR.utf-8"
LC_TELEPHONE="es_AR.utf-8"
LC_MEASUREMENT="es_AR.utf-8"
LC_IDENTIFICATION="es_AR.utf-8"
LC_ALL=es_AR.utf-8

the locale settings are supported:
~$ locale -a
C
es_AR
es_AR.utf8
POSIX
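
One thing that looks odd above: LANG is spelled es_AR.utf-8 while
locale -a prints es_AR.utf8. As far as I understand, glibc treats both
spellings as the same locale because it normalizes the codeset part
before looking it up. The Python below is only a rough approximation of
that normalization, not the libc code; normalize_codeset and
same_locale are made-up names for illustration.

```python
def normalize_codeset(codeset: str) -> str:
    """Approximate glibc-style codeset normalization (assumption, not libc code)."""
    # Keep only letters and digits, lowercased: "UTF-8" -> "utf8".
    kept = "".join(ch for ch in codeset if ch.isalnum()).lower()
    # Purely numeric codesets are understood to get an "iso" prefix.
    return "iso" + kept if kept.isdigit() else kept


def same_locale(a: str, b: str) -> bool:
    """Compare two locale names, ignoring codeset spelling (modifiers not handled)."""
    def split(name: str):
        lang, _, codeset = name.partition(".")
        return lang, normalize_codeset(codeset)
    return split(a) == split(b)


if __name__ == "__main__":
    print(same_locale("es_AR.utf-8", "es_AR.utf8"))
```

If this holds, the spelling difference alone should not be the cause of
the problem.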

checking if the locales work with perl:
~$ perl -e ""
ok, it doesn't show anything

checking that perl does the right thing by setting an erroneous
locale:
~$ env LC_ALL=nocharset perl -e ""
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = "es_AR.utf-8",
        LC_ALL = "nocharset",
        LC_CTYPE = "es_AR.utf-8",
        LANG = "es_AR.utf-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").

ok, it complains, so it's working

my ~/.signature is in utf-8, and my ~/.alias is too

I have this in my ~/.vimrc:
set encoding=utf-8
set fileencoding=utf-8
set termencoding=utf-8

and when mutt invokes vim I re-check that this is ok, and it is (I
mean I check at runtime and it obeys the configuration)

I have this in my ~/.muttrc:
set send_charset="us-ascii:utf-8"
set charset="utf-8"
set locale="es_AR.utf8"

from the mutt man page I know these settings should not be necessary,
since the system is configured ok, but I tried with and without them
and got no difference.
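
For reference, the way I read the manual, send_charset is a
colon-separated list and mutt uses the first charset into which the
outgoing text converts exactly, falling back to $charset otherwise.
A small Python sketch of that rule, under that reading of the manual;
pick_send_charset is a hypothetical helper, not anything from mutt:

```python
def pick_send_charset(text: str,
                      send_charset: str = "us-ascii:utf-8",
                      charset: str = "utf-8") -> str:
    """Return the first charset in the list that can encode text exactly."""
    for name in send_charset.split(":"):
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # this charset cannot represent the text, try the next
        return name
    return charset  # fallback, mirroring $charset in this sketch


if __name__ == "__main__":
    print(pick_send_charset("hola"))
    print(pick_send_charset("est\u00e1"))
```

So with my setting, a plain ASCII mail should go out as us-ascii and
anything containing an accented letter as utf-8.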

Ok, that's all concerning configuration. Now I'll tell you how the
problem shows up: in mutt, if I compose a mail from scratch, without
anything, not even a signature, and put in a LATIN SMALL LETTER A WITH
ACUTE (got that name from the Unicode chart), and then send it to
myself and to a friend, my friend sees it ok and so do I.

But if I now reply to this same mail, when vim comes up with the
quoted text that mutt passes to it, I see garbage.

So mutt is ok displaying and sending utf-8, vim is ok writing and
reading utf-8, but when both "cooperate", things get screwed up.

I investigated what ended up in the mailbox, so I saved a copy (using
the 'C' command in mutt) of the first message (the one I received from
myself). file says: 'UTF-8 Unicode mail text'. Checking inside with
hexedit, I see that LATIN SMALL LETTER A WITH ACUTE is encoded with
this hex: C3 A1. The Unicode chart at http://www.unicode.org/charts/
lists it as 00E1, but that is the code point; C3 A1 is its two-byte
UTF-8 encoding.
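
To make the relation between the chart value and the bytes on disk
concrete, here is a minimal Python check. The chart gives the code
point; what hexedit shows depends on the encoding used to store it.
The helper name hex_bytes is just for illustration.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes the way a hex editor shows them, e.g. 'C3 A1'."""
    return " ".join(f"{b:02X}" for b in data)


char = "\u00e1"  # LATIN SMALL LETTER A WITH ACUTE

code_point = f"{ord(char):04X}"               # value listed in the chart
utf8_form = hex_bytes(char.encode("utf-8"))   # bytes in a UTF-8 file
utf16_form = hex_bytes(char.encode("utf-16-be"))  # bytes in a UTF-16 (big-endian) file

if __name__ == "__main__":
    print(code_point, "|", utf8_form, "|", utf16_form)
```

So seeing C3 A1 in a UTF-8 file is expected; 00 E1 would only appear in
a file stored as big-endian UTF-16.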

Then I took that same mail and pressed 'r' in mutt to reply. vim came
up with garbled text, and without touching anything I saved it under
another name and then cancelled the message. file on this saved text
gives me: 'UTF-8 Unicode text', but when I look inside with hexedit, I
get this hex for the same letter: C3 83 C2 A1, so now I have 4 bytes
instead of the two before. So vim-mutt (?) is re-encoding the stuff.
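
Those four bytes are exactly what you get when the two UTF-8 bytes are
read as if they were Latin-1 characters and then written out as UTF-8
again. The sketch below reproduces that in Python. It assumes Latin-1
as the wrongly applied charset (the bytes are consistent with it) and
says nothing about which program does the conversion; double_encode and
undo_double_encode are hypothetical names.

```python
def double_encode(data: bytes, wrong_charset: str = "latin-1") -> bytes:
    """Misread UTF-8 bytes as wrong_charset, then encode the result as UTF-8."""
    return data.decode(wrong_charset).encode("utf-8")


def undo_double_encode(data: bytes, wrong_charset: str = "latin-1") -> bytes:
    """Reverse double_encode: recover the original UTF-8 bytes."""
    return data.decode("utf-8").encode(wrong_charset)


original = "\u00e1".encode("utf-8")     # the two bytes seen in the saved mail
garbled = double_encode(original)       # the four bytes seen after pressing 'r'
restored = undo_double_encode(garbled)  # back to the original two bytes

if __name__ == "__main__":
    for label, value in (("original", original),
                         ("garbled", garbled),
                         ("restored", restored)):
        print(label, " ".join(f"{b:02X}" for b in value))
```

This suggests that somewhere between mutt and vim the text is being
treated as Latin-1 once and converted to utf-8 a second time.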

Like I said before, checking enc and fenc in vim when it is called
from mutt gives utf-8, like it should.

I created a file with vim to compare against the Unicode chart and I
got C3 A1 there too. Since C3 A1 is the correct UTF-8 form of code
point 00E1, vim on its own writes the character correctly.

Well, that's all I remember having done about this problem. I've been
at it for 4 months now, I think: I take up the problem, get tired of
searching and testing stuff, leave it for a month, come back to it
again, and so on. But now I really want to solve it.

I think I'm going to crosspost this to the vim and mutt mailing lists.
But if someone here knows how to solve this, I'd appreciate any help,
tip, or direction to look or search, or maybe praying would do the job.

Relinking vim against libncursesw after the build, the way Richard
described a couple of mails ago, didn't solve it. If there is a way to
tell emerge to link vim to this library from the beginning, I would
like to try it.

Also, if anyone knows how to find out which programs are using a
library, please tell me (I have done it before, but right now my brain
is fried).

What I haven't tried yet is 'the redmond way'; I want to stay away
from that method, if possible.

Thanks in advance.
-- 
Fernando Canizo - LUGMen: www.lugmen.org.ar - A8N: a8n.lugmen.org.ar
QOTD:
        "What I like most about myself is that I'm so understanding
        when I mess things up."