
Re: wrong charset




On Thursday, May  7 at 09:40 PM, quoth Luis A. Florit:
>> On Monday, May  4 at 05:05 PM, quoth Luis A. Florit:
>>> I use an ISO-8859-1 encoded xterm in Maemo, but :set ?charset
>>> gives me charset="utf-8".
>>
>> Are you setting it in your config somewhere? (test it by running
>> `mutt -n -F /dev/null` and seeing what the value of $charset is
>> there)
>
>utf-8. No matter what I do...
>
>But I have three charsets:
>
>$charset=//TRANSLIT
>?charset=utf-8

What? That doesn't make any sense. Are those two lines actually in 
your muttrc?

I think at least part of the problem here is that you aren't 
understanding what ?charset means. The "charset" variable is almost 
always referred to as $charset, with the dollar sign. If you swap the 
dollar sign for a question mark, that's a way of telling mutt you want 
it to display the value of the variable. It's NOT a way to set the 
variable.

In other words, "set ?charset=us-ascii" is completely bogus, and 
meaningless.
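
To make the difference concrete, here is a toy model in Python of how a plain `set charset=...` assigns a value while a `?`-prefixed name only reports the current one. It is NOT mutt's actual parser: the class name `ToyMuttVars`, its starting value and its error messages are all made up for illustration. This toy simply rejects `?charset=us-ascii`; real mutt may react differently, but either way it won't change $charset.

```python
from typing import Dict, Optional


class ToyMuttVars:
    """Toy model of mutt's "set" command: assignment versus "?" query.

    Hypothetical illustration only; this is not how mutt is implemented.
    """

    def __init__(self) -> None:
        # Made-up starting value, as if mutt had detected it from the locale.
        self.vars: Dict[str, str] = {"charset": "utf-8"}

    def set(self, arg: str) -> Optional[str]:
        """Handle one "set" argument; return the text to display, if any."""
        if arg.startswith("?"):
            name = arg[1:]
            if "=" in name:
                # A query never changes anything, so "?name=value" is bogus.
                raise ValueError(f"query cannot assign a value: {arg!r}")
            # Query: report the current value and leave it untouched.
            return f'{name}="{self.vars[name]}"'
        name, sep, value = arg.partition("=")
        if not sep:
            raise ValueError(f"this toy only understands name=value: {arg!r}")
        self.vars[name] = value  # plain assignment
        return None


if __name__ == "__main__":
    m = ToyMuttVars()
    print(m.set("?charset"))             # charset="utf-8"
    m.set("charset=iso-8859-1")          # like ":set charset=iso-8859-1"
    print(m.set("?charset"))             # charset="iso-8859-1"
```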

>In fact, it seems that I am not able to change that ?charset
>variable to ISO-8859-1.

So, if, while running mutt, you execute the command:

    :set charset=iso-8859-1

What happens? Does an error get displayed?

>>> I tried setting by hand LANG, LC_ALL, LC_CTYPE to pt_BR and such,
>>> but no luck. No, pt_BR.ISO-8859-1 is not among the xterm locales.
>>
>> Okay, I think the first thing you need to do here (aside from ensuring
>> that you're not setting $charset manually somewhere) is to find out
>> what locales your machine supports. Something like this will probably
>> work:
>>
>>      locale -a | grep '^pt_BR'
>
>Just pt_BR. But I don't want to change the default language, just the
>encoding.

If pt_BR is the only pt_BR-related locale you have installed, then 
you're stuck with the default charset, whatever that happens to be. If 
you want access to other charsets (such as utf-8), you'll have to 
install additional locales (e.g. the locale named pt_BR.utf8).

On Debian, this can be done by using `dpkg-reconfigure locales`. I'm
sure other distributions have similar means of installing/enabling 
additional locales.
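
If you'd rather check from a script which of those locales exist and which charset each one implies, here's a minimal sketch in Python, assuming a POSIX system (`nl_langinfo` isn't available everywhere). The locale names in the example are only candidates to try, not names I know exist on Maemo, and `probe_locales` is a made-up helper. As far as I know, the charset reported this way is what mutt uses to pick its default $charset; from a shell, `LC_ALL=pt_BR locale charmap` should tell you the same thing.

```python
import locale
from typing import Dict, List, Optional


def probe_locales(names: List[str]) -> Dict[str, Optional[str]]:
    """Map each locale name to the charset it implies, or None if missing."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current LC_CTYPE
    results: Dict[str, Optional[str]] = {}
    try:
        for name in names:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                results[name] = None  # not installed or not recognised
                continue
            # CODESET is the character set the C library associates with
            # the active LC_CTYPE locale.
            results[name] = locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore the original
    return results


if __name__ == "__main__":
    candidates = ["pt_BR", "pt_BR.ISO-8859-1", "pt_BR.utf8"]
    for name, codeset in probe_locales(candidates).items():
        print(f"{name:20} {codeset or 'not available'}")
```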

>> Whatever it outputs, those are the values your computer (currently)
>> understands, and so those are the values that LANG or the LC_*
>> variables can be set to.
>
> I see. But even if I set LC_ALL=pt_BR, I get the messages in
> Portuguese but the encoding in UTF-8. Exactly the opposite of what I
> want.

What do you mean "the encoding in UTF-8"? You mean the messages you 
receive are encoded in UTF-8? That's fine; it doesn't matter what the 
messages are encoded in, as long as mutt (and all the supporting 
libraries it uses) know what characters can be displayed, so that it 
can convert from the message's encoding to the correct encoding for 
display on your terminal.
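
Here's a minimal sketch of that conversion step in Python. It only illustrates the idea: mutt does this in C through the system's conversion library (iconv), and the function name `render_for_terminal` is made up. Undecodable bytes and characters the terminal can't show are replaced rather than causing an error.

```python
def render_for_terminal(raw: bytes, msg_charset: str, term_charset: str) -> bytes:
    """Convert a message body from its own charset to the terminal's charset.

    Hypothetical helper: decode using the charset the message declares, then
    encode using the charset the terminal displays. Bytes that don't decode
    and characters the terminal can't show are replaced instead of raising.
    """
    text = raw.decode(msg_charset, errors="replace")    # bytes -> characters
    return text.encode(term_charset, errors="replace")  # characters -> bytes


if __name__ == "__main__":
    body = "Olá, coração".encode("utf-8")  # a message that arrived as UTF-8
    print(render_for_terminal(body, "utf-8", "iso-8859-1"))
    # b'Ol\xe1, cora\xe7\xe3o'  (the right bytes for an ISO-8859-1 xterm)

    # The euro sign has no ISO-8859-1 equivalent, so it becomes "?".
    print(render_for_terminal("5 €".encode("utf-8"), "utf-8", "iso-8859-1"))
    # b'5 ?'
```

In other words, the message's encoding and the terminal's encoding are two separate things, and only the second one is what $charset describes.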

>> It's possible that if you really want your xterm to only display 
>> ISO-8859-1 characters, you may have to install the right character 
>> sets (how to do this is often distro-dependent).
>
> But my xterm works perfectly with ISO-8859-1; vim, for example, does.
> That is not the problem. The problem is that mutt just does not want
> to understand the encoding.

Okay, we're overusing the term "encoding" here. Let's try to be
clear about what's going on:

1. When you run mutt, it reports that the charset it thinks is
appropriate is utf-8.
2. Nothing you do seems to convince mutt to avoid utf-8.

It sounds like somewhere in your mutt config, you're setting $charset 
to be utf-8, and then attempting (perhaps with the wrong syntax) to 
set it to be something else.

Mutt's generally pretty good at figuring out the right $charset value 
to use, if you leave it to its own devices.

>> On the other hand, if your machine ALREADY correctly understands
>> UTF8... go with it! UTF8 is far more capable than ISO-8859-1 or any
>> other ISO charset.
>
> Several years ago I tried UTF-8, but the vast majority (I mean,
> almost 100%) of the emails/texts/etc I read/save are ISO-8859-1
> (and those are not displayed correctly on a UTF8 console). I don't
> need any of the non-ISO characters in UTF-8.

This is probably not worth arguing about... BUT - ISO-8859-1 files
*should* display properly on a UTF8 console, as long as whatever shows
them converts them from ISO-8859-1 to UTF-8 first. If they don't, then
something (your terminal, your text reader, whatever) is broken.
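
A quick Python sketch shows why: every ISO-8859-1 byte maps to exactly one Unicode character, so converting ISO-8859-1 text to UTF-8 never loses anything, whereas dumping the raw bytes on a UTF-8 console (simulated here by decoding them as UTF-8) produces garbage. `latin1_to_utf8` is a made-up helper name, not part of any tool mentioned in this thread.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Convert ISO-8859-1 bytes to UTF-8. Never fails: every byte is a character."""
    return raw.decode("iso-8859-1").encode("utf-8")


if __name__ == "__main__":
    raw = b"cora\xe7\xe3o"  # "coração" saved as ISO-8859-1

    # Shown without conversion, the bytes are not valid UTF-8, so this
    # prints replacement characters where the accented letters should be.
    print(raw.decode("utf-8", errors="replace"))

    # Converted first, they display correctly on a UTF-8 console.
    print(latin1_to_utf8(raw).decode("utf-8"))  # coração

    # All 256 ISO-8859-1 byte values survive a round trip through UTF-8.
    every_byte = bytes(range(256))
    assert latin1_to_utf8(every_byte).decode("utf-8").encode("iso-8859-1") == every_byte
```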

>I think I can summarize it in two questions:
>
>1) Why is ?charset=utf-8 if I am working in an ISO-8859-1 xterm?

*Probably* because somewhere in your muttrc, you're setting $charset 
to utf-8.

>2) Why am I not able to change ?charset by hand in .muttrc
>or manually with a :set command?

Because ?charset isn't a variable you can set.

Now, I admit, mutt isn't very clear in this respect, because it's 
unlike anything else. But "charset" is the name of the variable. 
"$charset" is usually the way it's referred to. However, since mutt 
doesn't have an "echo" command or anything similar, one of the 
developers (I don't know who) thought that a convenient way of 
displaying the current value of a variable would be to refer to it 
with a question mark. In other words "set ?charset" means "what is the 
$charset variable set to?".

~Kyle
-- 
Strong coffee, much strong coffee, is what awakens me. Coffee gives me 
warmth, waking, an unusual force and a pain that is not without very 
great pleasure.
                                                  -- Napoleon Bonaparte