Re: Charset Issue
- To: mutt-users@xxxxxxxx
- Subject: Re: Charset Issue
- From: Kyle Wheeler <kyle-mutt@xxxxxxxxxxxxxx>
- Date: Wed, 12 Mar 2008 16:18:16 -0500
- In-reply-to: <20080312200255.GE35324@xxxxxxxx>
- List-post: <mailto:mutt-users@mutt.org>
- List-unsubscribe: send mail to majordomo@mutt.org, body only "unsubscribe mutt-users"
- Mail-followup-to: mutt-users@xxxxxxxx
- Openpgp: id=CA8E235E; url=http://www.memoryhole.net/~kyle/kyle-pgp.asc; preference=signencrypt
- References: <20080312200255.GE35324@xxxxxxxx>
- Sender: owner-mutt-users@xxxxxxxx
- User-agent: Mutt/1.5.17 (2008-02-27)
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Alain can correct me if I'm wrong about any of this. :)
On Wednesday, March 12 at 04:02 PM, quoth Jorge Luis:
>set charset="iso-8859-1"
Setting the $charset manually is usually a bad idea.
>satyr's environment includes LANG=en_US.UTF-8; yekk's is
>LANG=en_US.ISO8859-1.
And THAT is a good example of WHY setting the $charset manually is a
bad idea. iso-8859-1 is not byte-compatible with UTF-8 for anything
outside plain ASCII.
>My .sigs include an "a" with an acute accent (C octal escaped UTF-8:
>\303\241). Sigs on both machines were created with emacs in a Latin-1
>language environment.
>
>Mail sent from satyr to yekk comes across with the accented glyph
>properly displayed.
That is most likely due to your editor, not mutt. Your editor is being
fed a file with a UTF-8 character, and is recognizing that it needs to
convert that character into the charset specified by $LANG.
>The headers include:
>
>Content-Type: text/plain; charset=iso-8859-1
>Content-Transfer-Encoding: quoted-printable
>
>The same mail, when viewed in satyr's =Sent folder, shows an escaped
>character (\341) in place of the á, even though the headers of the
>saved mail are the same.
The octal value 0341 (or 0xE1 in hex) is the encoding of á in
ISO-8859-1. So, the headers are all correct, and so is what you sent.
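If you want to check those byte values yourself, here is a small
sketch in Python (picked purely for illustration, nothing in mutt uses
it; the helper name as_octal_escapes is made up for this example):

```python
# Encode the same character, a-acute (U+00E1), in both character sets.
a_acute = "\u00e1"

latin1_bytes = a_acute.encode("iso-8859-1")
utf8_bytes = a_acute.encode("utf-8")


def as_octal_escapes(data: bytes) -> str:
    """Render bytes as C-style octal escapes, one escape per byte."""
    return "".join("\\%03o" % b for b in data)


print(as_octal_escapes(latin1_bytes))  # \341
print(as_octal_escapes(utf8_bytes))    # \303\241
```

One character, two different byte sequences: a single byte in
ISO-8859-1, two bytes in UTF-8.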
Here's the thing, though: on satyr (the UTF-8 environment), mutt reads
that and thinks that it doesn't need to do ANY conversion to display
properly. Which isn't true: 0xE1 means something very different
(namely, it means you have a malformed character) in a UTF-8
environment. So when the validation of that character for display
(done by ncurses or whatever your mutt uses, software that does not
see mutt's $charset setting but instead sees the LANG environment
variable) *FAILS*, mutt falls back to the ASCII-only escaped version.
If mutt were aware that it was operating in a UTF-8 environment, it
would convert that character and successfully display it.
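A rough model of that sequence, again in Python. This is only a sketch
of the behavior described above, not mutt's actual code; the function
render_for_utf8_terminal and its assumed_charset parameter (standing in
for $charset) are hypothetical:

```python
def render_for_utf8_terminal(raw: bytes, assumed_charset: str) -> str:
    """Return what ends up on a UTF-8 terminal.

    raw is a message body labeled iso-8859-1. assumed_charset is what
    the mail client believes the terminal uses.
    """
    # Convert from the labeled charset to the charset the client
    # believes the terminal wants.
    out = raw.decode("iso-8859-1").encode(assumed_charset)
    try:
        # The terminal really is UTF-8, so the bytes must be valid UTF-8.
        return out.decode("utf-8")
    except UnicodeDecodeError:
        # Validation failed: fall back to ASCII with octal escapes.
        return "".join(chr(b) if b < 0x80 else "\\%03o" % b for b in out)


# $charset forced to iso-8859-1 on a UTF-8 terminal: escaped output.
print(render_for_utf8_terminal(b"\xe1", "iso-8859-1"))  # \341
# $charset matching the terminal: the character converts cleanly.
print(render_for_utf8_terminal(b"\xe1", "utf-8") == "\u00e1")  # True
```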
>The only way I can get the accented character to display properly on
>both machines is to create a utf-8 encoded signature and set
>allow_8bit on satyr so that there's no qp encoding of the mail.
Here's the thing, though: how does your terminal handle malformed
characters? Many terminals fall back to displaying the malformed
pieces of characters as if they were ISO-8859-1 characters. When mutt
doesn't have to decode quoted-printable, it doesn't verify every
character letter-by-letter, and instead just does a wholesale
conversion from the character set the mail is labeled as (iso-8859-1,
in this case) to $charset (so, no change is made in this case) and
dumps the result to the terminal. Your terminal sees the 0xE1 byte,
recognizes that it is a malformed character, and does its fallback:
pretends that the byte is an ISO-8859-1 character. What you're seeing
is not "correct"; you're seeing your terminal's error-recovery mode.
:)
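That kind of terminal error recovery can be imitated with a custom
decode error handler. This is a sketch of one plausible recovery
policy, not a description of any particular terminal; the handler name
"latin1_fallback" and the function terminal_display are invented for
the example:

```python
import codecs


def _latin1_fallback(err):
    # Hypothetical recovery policy: show each undecodable byte as the
    # ISO-8859-1 character with the same value, then resume decoding.
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    return bad.decode("iso-8859-1"), err.end


codecs.register_error("latin1_fallback", _latin1_fallback)


def terminal_display(data: bytes) -> str:
    """Decode as UTF-8, treating malformed bytes as ISO-8859-1."""
    return data.decode("utf-8", errors="latin1_fallback")


# A bare 0xE1 is malformed UTF-8, yet it still comes out as a-acute.
print(terminal_display(b"\xe1") == "\u00e1")      # True
# Proper UTF-8 for the same character gives the same result.
print(terminal_display(b"\xc3\xa1") == "\u00e1")  # True
```

Both inputs look identical on screen, which is why the fallback hides
the problem instead of fixing it.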
The difference here is, I think, that when mutt is decoding
quoted-printable, it checks whether each decoded character is
displayable, while when displaying messages that are not
quoted-printable-encoded, it does not check each and every byte
(because that would take too long).
>Is the global LANG environment variable overriding the charset that's
>set in muttrc?
Think of it this way: your terminal is going to accept a specific
character set. In order for your applications to know what the
terminal will display, they rely on LANG (and all the other related
environment variables). When you specify a $charset manually, you're telling
mutt to ignore LANG, but it ignores it at its own peril. What happens
is that mutt then tries to display characters that are valid in
$charset---but they may be invalid characters as far as the terminal
is concerned.
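To make the "rely on LANG" part concrete, here is a simplified sketch
of pulling the character set out of a locale name. Real programs
normally ask the C library (setlocale() followed by
nl_langinfo(CODESET)), and LC_ALL and LC_CTYPE take precedence over
LANG; the helper charset_from_lang and its default value are
assumptions made for the example:

```python
def charset_from_lang(lang: str, default: str = "US-ASCII") -> str:
    """Extract the codeset from a value such as 'en_US.UTF-8'."""
    # Locale names look like language[_territory][.codeset][@modifier].
    without_modifier = lang.split("@", 1)[0]
    if "." in without_modifier:
        return without_modifier.split(".", 1)[1]
    # No codeset given (for example "C"): assume plain ASCII.
    return default


print(charset_from_lang("en_US.UTF-8"))      # UTF-8
print(charset_from_lang("en_US.ISO8859-1"))  # ISO8859-1
print(charset_from_lang("C"))                # US-ASCII
```

The two LANG values are the ones from satyr and yekk: each machine
advertises a different terminal character set, and a hard-coded
$charset can only match one of them.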
To use another analogy, imagine that your terminal only outputs
characters through a square hole. LANG indicates "square hole".
However, you've set $charset to "round hole". That $charset setting
doesn't change the fact that your terminal only outputs characters
through a square hole. Thus, as mutt runs, it says "ah, $charset says
that I'm pushing characters through a round hole" and so mutt converts
all characters to look like round pegs. But no matter whether mutt
believes the hole to be round, it is actually square, and you can't
fit round pegs through a square hole. It would have been better if you
allowed mutt to set $charset itself, because then it could
automatically detect whether it needs to output characters as square
pegs or round pegs. Does that make sense?
>How can I set mutt to use the iso-8859-1 charset? Am
>I missing something obvious?
The question that comes to my mind is: what are you trying to achieve?
~Kyle
- --
Come to me, son of Jor-El. Kneel before Zod. Snootchie-bootchies.
-- Jay
-----BEGIN PGP SIGNATURE-----
Comment: Thank you for using encryption!
iEYEARECAAYFAkfYSJgACgkQBkIOoMqOI17dmQCg2RKYbKFQ6MBPdjhkWna8Lgym
fZcAn2wSf/ixu0NP9MhCqc8k12kOYuM8
=QhaE
-----END PGP SIGNATURE-----