Re: utf-8 problems continued
- To: mutt-users@xxxxxxxx
- Subject: Re: utf-8 problems continued
- From: Kyle Wheeler <kyle-mutt@xxxxxxxxxxxxxx>
- Date: Wed, 19 Mar 2008 11:28:52 -0500
- In-reply-to: <20080319154219.GB1152@th-shell-1>
- List-post: <mailto:mutt-users@mutt.org>
- List-unsubscribe: send mail to majordomo@mutt.org, body only "unsubscribe mutt-users"
- Mail-followup-to: mutt-users@xxxxxxxx
- Openpgp: id=CA8E235E; url=http://www.memoryhole.net/~kyle/kyle-pgp.asc; preference=signencrypt
- References: <20080319145719.GA25875@xxxxxxxxxxxxxxxxxxxxxxx> <20080319150217.GA26007@th-shell-1> <20080319151443.GC6131@xxxxxxxxxxxxxxxxx> <20080319154219.GB1152@th-shell-1>
- Sender: owner-mutt-users@xxxxxxxx
- User-agent: Mutt/1.5.17 (2008-02-27)
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Wednesday, March 19 at 03:42 PM, quoth Chris G:
>> http://www.mutt.org/doc/devel/manual.html#send-charset
>
> On reading the manual I'm not sure I'm any the wiser, it says "Mutt
> will use the first character set into which the text can be converted
> exactly.", what does this mean?
For example, let's take the euro symbol. If your email includes a euro
symbol, mutt needs to decide which charset to use. If your
send_charset is "us-ascii:iso-8859-1:utf-8", what happens? First, mutt
uses iconv to try to convert your email into us-ascii. Since us-ascii
doesn't include a euro symbol, this will fail, forcing mutt to try the
next charset: iso-8859-1. This conversion attempt will ALSO fail,
because iso-8859-1 doesn't include a euro symbol either. Finally, mutt
will try utf-8, which will succeed.

Now let's consider something like a u-umlaut. Again, it does not exist
in us-ascii, so converting a message containing a u-umlaut into
us-ascii will fail. The iso-8859-1 character set, however, DOES have a
u-umlaut, so the conversion will succeed, and mutt will use that
character set to send the email.
You can re-create these conversion attempts yourself. Create a text
file (e.g. an email message) and make sure the file is in utf-8 format
(just for demonstration purposes). To do the same checks that mutt
does, run these commands in your shell:

  iconv -f utf-8 -t us-ascii file.txt >/dev/null && echo success!

Did it print out success? If file.txt contained a euro symbol, it
didn't. Now try this:

  iconv -f utf-8 -t iso-8859-1 file.txt >/dev/null && echo success!

Did that work? Again, if file.txt contained a euro symbol, it
shouldn't have. Now this (which is trivial, since the file is already
utf-8):

  iconv -f utf-8 -t utf-8 file.txt >/dev/null && echo success!

That's how mutt uses $send_charset: it picks the first charset in the
list that can represent the whole email.
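
The three manual checks can be rolled into one loop. What follows is only a
rough sketch that imitates the behaviour described here, not mutt's actual
code. The function name pick_charset is made up for illustration, the input
file is assumed to be utf-8, and it assumes an iconv (such as glibc's) that
exits non-zero when a character cannot be converted.

```shell
# pick_charset LIST FILE
# Print the first charset in the colon-separated LIST into which FILE
# (assumed to be utf-8) converts without error. Hypothetical helper.
pick_charset() {
    # Charset names contain no spaces, so splitting on ':' via tr is safe.
    for cs in $(printf '%s\n' "$1" | tr ':' ' '); do
        if iconv -f utf-8 -t "$cs" "$2" >/dev/null 2>&1; then
            echo "$cs"
            return 0
        fi
    done
    # No charset in the list could represent the file.
    return 1
}
```

Running it as `pick_charset us-ascii:iso-8859-1:utf-8 file.txt` should print
the charset that the manual checks would have settled on.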
> What does mutt expect the text that is fed into it to be?
Mutt expects the text your editor generates to be in $attach_charset,
or if that isn't set, $charset.
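
If you want to check whether a file your editor saved really is in the
charset you think it is, something like the following may help. It is a
sketch under assumptions: check_charset is a hypothetical name, it falls back
to `locale charmap` on the assumption that your $charset follows your locale,
and it relies on iconv rejecting invalid input.

```shell
# check_charset FILE [CHARSET]
# Report whether FILE is valid text in CHARSET. When CHARSET is omitted,
# use the encoding of the current locale as reported by `locale charmap`.
check_charset() {
    cs=${2:-$(locale charmap)}
    # Converting from a charset to itself forces iconv to decode every byte.
    if iconv -f "$cs" -t "$cs" "$1" >/dev/null 2>&1; then
        echo "$1 is valid $cs"
    else
        echo "$1 is NOT valid $cs"
        return 1
    fi
}
```

This is mostly useful for utf-8. In a single-byte charset such as iso-8859-1
nearly every byte sequence decodes, so passing the check proves very little.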
> If I understand then mutt is trying to choose the 'least complex'
> charset that it can. So even if my system is locally all set up to
> work in utf-8 and my editor sends mutt a file with pound signs
> encoded in utf-8 if the only special characters are the pound signs
> then mutt will re-encode the file as iso-8859-1 and send it with
> that charset.
Correct. Another way of saying "least complex" would be "most
compatible with other email programs".
> That would explain why my utf-8 encoded pounds were sent correctly
> and understood by everyone, mutt recognised the pounds, saw no other
> special characters and sent it all as iso-8859-1. So far so good
> (if my understanding is correct). On the other hand my iso-8859-1
> pound signs *weren't* understood by mutt so it sent them as utf-8
> 'bad' characters.
More or less, yes.
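
To see the difference at the byte level, here is a small demonstration with
the pound sign. It is only an illustration of the two encodings using iconv,
not of what mutt does internally; the helper names are made up, and it again
assumes an iconv that fails on invalid input.

```shell
# The pound sign is one byte (0xA3) in iso-8859-1 but two bytes
# (0xC2 0xA3) in utf-8. printf octal escapes are used for portability.
latin1_pound() { printf '\243\n'; }
utf8_pound()   { printf '\302\243\n'; }

# The utf-8 bytes decode as utf-8, and the pound fits into iso-8859-1.
utf8_pound | iconv -f utf-8 -t iso-8859-1 >/dev/null 2>&1 \
    && echo "utf-8 pound: converted"

# The lone iso-8859-1 byte is not a valid utf-8 sequence, so a program
# that assumes utf-8 input cannot make sense of it.
latin1_pound | iconv -f utf-8 -t iso-8859-1 >/dev/null 2>&1 \
    || echo "iso-8859-1 pound: rejected as utf-8"
```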
> It would seem that all (ha, ha) that I need to do then is to get my
> editor to play ball properly and be quite sure that mutt understands
> what I have entered using the editor.
Exactly!
~Kyle
- --
We must not confuse dissent with disloyalty. When the loyal opposition
dies, I think the soul of America dies with it.
-- Edward R. Murrow
-----BEGIN PGP SIGNATURE-----
Comment: Thank you for using encryption!
iEYEARECAAYFAkfhP0QACgkQBkIOoMqOI17ifACcCdxnL4NDulEt64dbczUjFlGO
pFkAoMck/Zr67cCDYboh+J+lu0Wnftrb
=LejX
-----END PGP SIGNATURE-----