Re: Problems with mutt and utf-8, can't talk to itself even!
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Wednesday, March 19 at 02:14 PM, quoth Chris G:
>Well I'm still not sure things are right, even after getting my
>editor to do (approximately) the right thing.
>
>Here are some incorrect pound signs:-
Those are all encoded as three bytes: 0xEF 0xBF 0xBD
>Here are some correct (as in correctly encoded as utf-8 by my editor)
>pound signs:-
Those are also all the same three bytes: 0xEF 0xBF 0xBD
That *looks* like valid utf-8.
For a quick tutorial in three-byte utf-8, the way three-byte characters
are encoded (in binary) is like this, where the x, y, and z bits carry
the character's number:
1110xxxx 10yyyyyy 10zzzzzz
The three bytes 0xEF 0xBF and 0xBD are, in binary, this:
11101111 10111111 10111101
Thus, the decoded portions are:
1111 111111 111101
Put them back together as a single binary number:
1111111111111101
That's 65533 in decimal (0xFFFD in hex). In Unicode, that code point is
written U+FFFD, which (according to the Unicode specification) is:
REPLACEMENT CHARACTER
- used to replace an incoming character whose value is unknown or
unrepresentable in Unicode
- compare the use of U+001A as a control character to indicate the
substitute function
In other words, if that's what your editor is generating, then it
obviously doesn't know how to handle a pound symbol, even though it
DOES seem to understand UTF-8 (kinda).
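To check that arithmetic by hand, here is a rough Python sketch of the
three-byte decoding walked through above. The function name
decode_three_byte is made up for illustration, and it only handles the
1110xxxx 10yyyyyy 10zzzzzz pattern (no overlong or surrogate checks).
The last line is only a guess at one way a lone Latin-1 pound byte
(0xA3) could end up as U+FFFD; it is not a diagnosis of your editor.

```python
def decode_three_byte(b0, b1, b2):
    """Decode one three-byte UTF-8 sequence (hypothetical helper)."""
    # Check the marker bits: 1110.... 10...... 10......
    if (b0 & 0xF0) != 0xE0 or (b1 & 0xC0) != 0x80 or (b2 & 0xC0) != 0x80:
        raise ValueError("not a three-byte UTF-8 sequence")
    x = b0 & 0x0F  # low 4 bits of the lead byte
    y = b1 & 0x3F  # low 6 bits of the first continuation byte
    z = b2 & 0x3F  # low 6 bits of the second continuation byte
    # Put the 4 + 6 + 6 payload bits back together as one number.
    return (x << 12) | (y << 6) | z


cp = decode_three_byte(0xEF, 0xBF, 0xBD)
print(hex(cp))  # 0xfffd

# Python's own decoder agrees with the hand calculation.
print(bytes([0xEF, 0xBF, 0xBD]).decode("utf-8") == chr(cp))  # True

# Assumption: a bare 0xA3 byte decoded as UTF-8 with replacement
# yields U+FFFD, since 0xA3 alone is not a valid UTF-8 sequence.
print(b"\xa3".decode("utf-8", errors="replace") == "\ufffd")  # True
```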
For what it's worth, the CORRECT utf-8 encoding of the pound symbol
(U+00A3) is only two bytes. Here's how we get it. Characters in the
range U+0080 through U+07FF take two bytes, encoded like this (in
binary):
110yyyyy 10zzzzzz
U+00A3 translates to the hex number 0xA3, which in binary is this:
10100011
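If we split that into the low six bits (the z's) and the remaining
high bits (the y's, which get padded with leading zeros to 00010),
that becomes: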
10 100011
Thus, in UTF-8 it's encoded as:
11000010 10100011
Thus, the correct UTF-8 encoding for a pound symbol is 0xC2 0xA3.
Here's an example: £
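The same two-byte encoding can be sketched in Python. The helper name
encode_two_byte is made up for illustration and covers only the
two-byte range (U+0080 through U+07FF); the built-in str.encode is used
at the end as a cross-check.

```python
def encode_two_byte(cp):
    """Encode a code point as 110yyyyy 10zzzzzz (hypothetical helper)."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("code point does not fit the two-byte form")
    y = cp >> 6    # the high bits, at most 5 of them
    z = cp & 0x3F  # the low 6 bits
    # Prefix the payload bits with the 110 and 10 markers.
    return bytes([0xC0 | y, 0x80 | z])


encoded = encode_two_byte(0xA3)
print(encoded.hex())  # c2a3

# Python's own encoder produces the same two bytes.
print(encoded == "\u00a3".encode("utf-8"))  # True
```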
~Kyle
- --
I contend that we are both atheists. I just believe in one fewer god
than you do. When you understand why you dismiss all the other
possible gods, you will understand why I dismiss yours.
-- Sir Stephen Henry Roberts
-----BEGIN PGP SIGNATURE-----
Comment: Thank you for using encryption!
iEYEARECAAYFAkfhMFsACgkQBkIOoMqOI144+gCg5bLJ2t7fK7+Ih1A6qBFgeuka
jO0AoKDy+JgwsknmCiSDkOwG4OTE2p0Z
=euIx
-----END PGP SIGNATURE-----