
Re: problem with utf-8 encoding using mutt + vim



Hi Fernando, *very* nice report.

 On Wednesday, July 27, 2005 at 8:21:10 AM -0300, Fernando Canizo wrote:

> El Wed, Jul 27, 2005 at 09:04:29AM +0300, Moshe Kaminsky me decía:
> my ~/.signature is in utf-8, my ~/.alias is too

    In your muttrc, the $attribution string contains an "í" (i acute,
U+00ED) encoded as the Latin-1 byte ED instead of the UTF-8 bytes C3 AD.
This Latin-1 "í" is passed as-is to vim, which autodetects the file as
Latin-1 and converts it to UTF-8. You see a correct "í" on screen, but
the quoted characters that were UTF-8 from the start get converted too,
into garbage.
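
    A quick way to spot such a stray Latin-1 byte is to ask iconv to
read the file as UTF-8 and see whether it complains. This is only a
sketch: is_utf8 is a name made up for illustration, and it assumes an
iconv that exits non-zero on invalid input (GNU iconv does).

```shell
# Hypothetical helper: succeeds only if the file decodes cleanly as UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Example use (the path is an assumption):
#   is_utf8 ~/.muttrc || echo "muttrc contains non-UTF-8 bytes"
```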

    Solution: Either convert the whole muttrc to UTF-8 once and for all,
or put "set config_charset=iso-8859-1" before each Latin-1 section and
"set config_charset=utf-8" before each UTF-8 section (especially where
you source ~/.alias).
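
    For the first option, the one-time conversion could look roughly
like this. It is a sketch that assumes the file is Latin-1 throughout (a
file that already mixes both encodings would get its UTF-8 parts
double-encoded); latin1_to_utf8 and the .latin1.bak suffix are names
made up for illustration.

```shell
# Hypothetical helper: convert a Latin-1 file to UTF-8 in place,
# keeping the original as <file>.latin1.bak.
latin1_to_utf8() {
    src=$1
    cp -p "$src" "$src.latin1.bak" &&
    iconv -f ISO-8859-1 -t UTF-8 "$src.latin1.bak" > "$src"
}

# Example use (the path is an assumption):
#   latin1_to_utf8 ~/.muttrc
```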

    Note that to avoid such oddities, it is also better to disable the
editor's charset autodetection when it is called from Mutt.


>| LC_ALL=es_AR.utf-8

    Drop it.


>| i got this in my ~/.muttrc:
>| set send_charset="us-ascii:utf-8"

    Highly suboptimal: Either drop it, or use the one in my signature
below.


>| set charset="utf-8"

    Drop it.


>| set locale="es_AR.utf8"

    Keep it, and use it: An English date and time in the middle of a
Spanish attribution is not so nice.


> got this hex for the same letter: C3 83 C2 A1, so now i have 4 bytes
> instead of the too before. So vim-mutt (?) is re-encoding the stuff.

    Exactly:

| $ printf "\xE1" | iconv -f l1 -t utf-8 | iconv -f l1 -t utf-8 | hex
| C3 83 C2 A1
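
    The same demonstration with only standard tools, in case "hex" and
the "l1" alias are not available on your system (this assumes "hex" is a
local helper; od and the full charset name are used instead). The last
line shows that decoding once in the other direction undoes the damage.

```shell
# "á" as the single Latin-1 byte E1 (octal 341), converted to UTF-8 once...
once=$(printf '\341' | iconv -f ISO-8859-1 -t UTF-8 \
       | od -An -tx1 | tr -d ' \n')

# ...and wrongly converted a second time, as if it were still Latin-1.
twice=$(printf '\341' | iconv -f ISO-8859-1 -t UTF-8 \
        | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')

# Reading the double-encoded bytes as UTF-8 and writing Latin-1 repairs them.
fixed=$(printf '\303\203\302\241' | iconv -f UTF-8 -t ISO-8859-1 \
        | od -An -tx1 | tr -d ' \n')

echo "once:  $once"     # c3a1
echo "twice: $twice"    # c383c2a1
echo "fixed: $fixed"    # c3a1
```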


> i got C3 A1 too, so maybe the problem is with vim, it should put 00 E1
> for LATIN SMALL LETTER A WITH ACUTE.

    Normal: C3 A1 is the UTF-8 for "á" U+00E1.


> What i didn't tried yet is 'the redmond way', i want to stay away from
> that metod, if possible.

    What is that? Ah yes: Uninstall and reinstall Mutt and Vim. That
is surely a method known to work. ;-)


Bye!    Alain.
-- 
Mutt muttrc tip: send mail in the first charset that is both necessary and
sufficient (version for Western Latin-1/Latin-9/CP-850/CP-1252 terminal users):
set send_charset="us-ascii:iso-8859-1:iso-8859-15:windows-1252:utf-8"