
Re: utf-8 problems continued



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wednesday, March 19 at 03:42 PM, quoth Chris G:
>> http://www.mutt.org/doc/devel/manual.html#send-charset
>
> On reading the manual I'm not sure I'm any the wiser, it says "Mutt 
> will use the first character set into which the text can be converted 
> exactly.", what does this mean?

For example, let's take the euro symbol. If your email includes the 
euro symbol, mutt needs to decide which charset to use. If your 
send_charset is "us-ascii:iso-8859-1:utf-8", what happens? First, mutt 
uses iconv to try to convert your email into us-ascii. Since your 
email includes a euro symbol, this will fail because us-ascii doesn't 
include a euro symbol, forcing mutt to try the next charset: 
iso-8859-1. This conversion attempt will ALSO fail, because iso-8859-1 
doesn't include a euro symbol either. Finally, mutt will try utf-8, 
which will succeed.

Now let's consider something like a u-umlaut. Again, it does not exist 
in us-ascii, so converting a message containing a u-umlaut into 
us-ascii will fail. The iso-8859-1 character set, however, DOES have a 
u-umlaut, so the conversion will succeed, and mutt will use that 
character set to send the email.

You can re-create these conversion attempts by creating a text file 
(standing in for an email message). Make sure the file is in utf-8 
format (just for demonstration purposes). To do the same checks that 
mutt does, run these commands in your shell:

     iconv -f utf-8 -t us-ascii file.txt >/dev/null && echo success!

Did it print "success!"? If file.txt contained a euro symbol, it 
didn't. Now try this:

     iconv -f utf-8 -t iso-8859-1 file.txt >/dev/null && echo success!

Did that work? Again, if file.txt contained a euro symbol, it 
shouldn't have. Now try this, which trivially succeeds because the 
file is already utf-8:

     iconv -f utf-8 -t utf-8 file.txt >/dev/null && echo success!

That's how mutt uses $send_charset to figure out the simplest charset 
that can represent the whole email.
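
If you want to see the whole fallback in one go, here is a rough 
sketch that walks a colon-separated list the same way. The function 
name pick_charset and the sample files are made up for illustration; 
this is not mutt's code, it only mimics the "first exact conversion 
wins" rule using the iconv command, and it assumes the input is utf-8.

```shell
#!/bin/sh
# pick_charset FILE LIST
# Hypothetical helper (not part of mutt): print the first charset in the
# colon-separated LIST into which FILE converts without error.
# Assumption: FILE is encoded in utf-8.
pick_charset() {
    file=$1
    old_ifs=$IFS
    IFS=:
    set -- $2            # split LIST on colons into positional parameters
    IFS=$old_ifs
    for cs in "$@"; do
        if iconv -f utf-8 -t "$cs" "$file" >/dev/null 2>&1; then
            echo "$cs"
            return 0
        fi
    done
    return 1             # nothing in LIST could represent the file
}

# Sample files, written byte by byte so the shell's own locale is irrelevant.
tmp=$(mktemp -d)
printf 'price: 5 \342\202\254\n' > "$tmp/euro.txt"    # euro symbol in utf-8
printf 'gr\303\274n\n'          > "$tmp/umlaut.txt"  # u-umlaut in utf-8

pick_charset "$tmp/euro.txt"   "us-ascii:iso-8859-1:utf-8"
pick_charset "$tmp/umlaut.txt" "us-ascii:iso-8859-1:utf-8"
```

If your iconv rejects unconvertible characters as described above, the 
first call should report utf-8 and the second iso-8859-1.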

> What does mutt expect the text that is fed into it to be?

Mutt expects the text your editor generates to be in $attach_charset, 
or if that isn't set, $charset.

> If I understand then mutt is trying to choose the 'least complex' 
> charset that it can.  So even if my system is locally all set up to 
> work in utf-8 and my editor sends mutt a file with pound signs 
> encoded in utf-8 if the only special characters are the pound signs 
> then mutt will re-encode the file as iso-8859-1 and send it with 
> that charset.

Correct. Another way of saying "least complex" would be "most 
compatible with other email programs".

> That would explain why my utf-8 encoded pounds were sent correctly 
> and understood by everyone, mutt recognised the pounds, saw no other 
> special characters and sent it all as iso-8859-1.  So far so good 
> (if my understanding is correct).  On the other hand my iso-8859-1 
> pound signs *weren't* understood by mutt so it sent them as utf-8 
> 'bad' characters.

More or less, yes.
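
You can see the difference between the two kinds of pound sign with 
iconv as well. This is only a sketch: the helper name "converts" and 
the file names are invented for the demonstration, and it assumes the 
text is being read as utf-8, which is what a utf-8 setup would tell 
mutt to expect.

```shell
#!/bin/sh
# converts FILE FROM TO
# Hypothetical helper: succeeds if FILE, read as FROM, converts to TO.
converts() {
    iconv -f "$2" -t "$3" "$1" >/dev/null 2>&1
}

tmp=$(mktemp -d)
# A pound sign is the two bytes C2 A3 in utf-8,
# but the single byte A3 in iso-8859-1.
printf '\302\243\n' > "$tmp/pound-utf8.txt"
printf '\243\n'     > "$tmp/pound-latin1.txt"

# Read as utf-8, the first file converts cleanly to iso-8859-1.
converts "$tmp/pound-utf8.txt" utf-8 iso-8859-1 \
    && echo "utf-8 pound: converts to iso-8859-1"

# Read as utf-8, the second file is not even valid input.
converts "$tmp/pound-latin1.txt" utf-8 utf-8 \
    || echo "iso-8859-1 pound: not valid utf-8"

# Read with the right source charset, it converts fine.
converts "$tmp/pound-latin1.txt" iso-8859-1 utf-8 \
    && echo "iso-8859-1 pound: fine when read as iso-8859-1"
```

The last check is the point: the bytes themselves are not wrong, they 
just have to be read with the charset they were written in.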

> It would seem that all (ha, ha) that I need to do then is to get my 
> editor to play ball properly and be quite sure that mutt understands 
> what I have entered using the editor.

Exactly!

~Kyle
- -- 
We must not confuse dissent with disloyalty. When the loyal opposition 
dies, I think the soul of America dies with it.
                                                    -- Edward R. Murrow
-----BEGIN PGP SIGNATURE-----
Comment: Thank you for using encryption!

iEYEARECAAYFAkfhP0QACgkQBkIOoMqOI17ifACcCdxnL4NDulEt64dbczUjFlGO
pFkAoMck/Zr67cCDYboh+J+lu0Wnftrb
=LejX
-----END PGP SIGNATURE-----