
Re: Split-screen mode in mutt?



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tuesday, May  2 at 06:51 PM, quoth Michelle Konzack:
> Hello Kyle,
>
> Am 2006-04-30 10:58:56, schrieb Kyle Wheeler:
>
>> Unless I’m misunderstanding you, it doesn’t sound like it’s 
>> particularly necessary for the two sides of your split to talk to each 
>> other. If that’s the case, then you can use ways to divide your 
>> terminal and run multiple instances of mutt in order to best use the 
>> space you have.
>
> Your UTF-8 Encoding is broken...

No, it's not. Better yet, I can prove it.

If you examine my email in its raw form, you will see that (for 
example) the curly single right-quotes are encoded, in 
"quoted-printable" format, as the string =E2=80=99, which of course is 
the quoted-printable form of the three bytes 0xE2, 0x80, and 0x99. In 
octal, those bytes are 0342 0200 0231.
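
For anyone who would rather check that than take my word for it, here 
is a minimal sketch, assuming Python 3 is at hand (Python and its 
standard quopri module are picked purely for illustration; nothing here 
is specific to mutt). It decodes the quoted-printable string and prints 
the resulting bytes in hex and in octal:

```python
import quopri

# The quoted-printable text as it appears in the raw message.
encoded = b"=E2=80=99"

# quopri.decodestring turns each =XX escape back into the single byte 0xXX.
raw = quopri.decodestring(encoded)

# Render the same three bytes in hexadecimal and in octal.
hex_bytes = " ".join(f"0x{b:02X}" for b in raw)
octal_bytes = " ".join(f"0{b:o}" for b in raw)

print(hex_bytes)
print(octal_bytes)
```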

When I do "echo -e '\342\200\231'" in my UTF-8 terminal, I get a curly 
right single-quote. What do you get?
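
If your terminal is the thing in doubt, the same check can be done 
without relying on how anything is displayed. This is only a sketch, 
again assuming Python 3; it decodes the three octal bytes as UTF-8 and 
reports which code point comes out, by number and by Unicode name:

```python
import unicodedata

# The same three bytes given to echo above, written in octal.
raw = bytes([0o342, 0o200, 0o231])

# Decoding raises UnicodeDecodeError if the bytes are not valid UTF-8.
char = raw.decode("utf-8")

# Identify the character without depending on the terminal's rendering.
code_point = f"U+{ord(char):04X}"
name = unicodedata.name(char)

print(code_point, name)
```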

Now, let's figure out what it *should* be. In Unicode, the curly 
single right-quote is 0x2019 (you can look it up here: 
http://www.unicode.org/charts/PDF/U2000.pdf) which, in binary, is 
0b0010000000011001 (you'll see why this is important in a moment). 
Since this value (0x2019) is bigger than 0x7ff and no bigger than 
0xffff, when it is encoded in UTF-8, it is represented by three bytes. 
The top four of its 16 bits go in the first byte, the next six bits in 
the second byte, and the last six bits in the last byte. These bytes 
follow the template:

    1110xxxx 10yyyyyy 10zzzzzz

So when the template blanks are filled in with the Unicode value for 
the curly single right-quote, that becomes:

    1110xxxx 10yyyyyy 10zzzzzz
        0010   000000   011001
or
    11100010 10000000 10011001

Those values, represented in hexadecimal, are:

    0xE2 0x80 0x99

Which is exactly the set of bytes that was encoded in my email.
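
The template-filling above can also be sketched in code. This is a 
rough illustration, assuming Python 3; the function name 
encode_three_byte_utf8 is made up for this example, and it handles only 
the three-byte range, not UTF-8 in general. It shifts and masks the 
code point into the three templates and compares the result with 
Python's own encoder:

```python
def encode_three_byte_utf8(code_point: int) -> bytes:
    """Fill the 1110xxxx 10yyyyyy 10zzzzzz template by hand.

    Hypothetical helper for illustration: it covers only the
    three-byte range and does not reject surrogate code points.
    """
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("code point is outside the three-byte range")
    first = 0b11100000 | (code_point >> 12)              # top four bits
    second = 0b10000000 | ((code_point >> 6) & 0b111111)  # middle six bits
    third = 0b10000000 | (code_point & 0b111111)         # last six bits
    return bytes([first, second, third])


encoded = encode_three_byte_utf8(0x2019)
bits = " ".join(f"{b:08b}" for b in encoded)

print(bits)
# Cross-check against the built-in UTF-8 encoder.
print(encoded == "\u2019".encode("utf-8"))
```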

If you've already deleted my original emails, you can double-check the 
archives, for example: 
http://marc.theaimsgroup.com/?l=mutt-users&m=114650327408492&w=2

The first instance of the curly-quote is in the word "I'm" in the 
line:

    Heh, much to the detriment? Meh. I'm encouraging those...

You may notice that those web archives don't declare an encoding. Most 
browsers (e.g. Firefox) assume that any web page without a declared 
encoding is in Windows-1252 (aka CP1252), so those bytes show up 
somewhat differently. They show up (in my version of Firefox) as an 
"a" with a hat (^), a euro symbol, and a trademark (TM) character. If 
you look those up in the Windows-1252 chart (for example, on 
Wikipedia: http://en.wikipedia.org/wiki/CP1252), you'll see that the a 
with a hat corresponds to the byte 0xE2, the euro symbol corresponds 
to 0x80, and the trademark character corresponds to 0x99.
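
That misreading is easy to reproduce. A small sketch, assuming Python 3 
and its built-in "cp1252" codec: the same three bytes are decoded once 
as Windows-1252 and once as UTF-8, and each byte is listed next to the 
character Windows-1252 assigns to it:

```python
# The bytes that were actually sent.
raw = b"\xe2\x80\x99"

# Read as Windows-1252, every byte becomes its own character.
as_cp1252 = raw.decode("cp1252")

# Read as UTF-8, the three bytes together form one character.
as_utf8 = raw.decode("utf-8")

# List each byte next to the code point Windows-1252 maps it to.
for byte, ch in zip(raw, as_cp1252):
    print(f"0x{byte:02X} -> U+{ord(ch):04X}")

print(len(as_cp1252), len(as_utf8))
```

Three characters in one reading and one character in the other, from 
identical bytes, is what a mismatched decoder looks like.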

> I can view all other UTF-8 Mails under de_DE@euro but not yours.  
> Please can you check it?

Yup. I think you need to double-check your side.

~Kyle
- -- 
It's amazing how much "mature wisdom" resembles being too tired.
                                                 -- Robert A. Heinlein
-----BEGIN PGP SIGNATURE-----
Comment: Thank you for using encryption!

iD8DBQFEW5/VBkIOoMqOI14RAtzVAJ9MYN9HQYbB36qAXDMr2QOFqhyoZwCgxQxs
kK4iIGOl3gmCvV6ZAZo1c0E=
=K+tb
-----END PGP SIGNATURE-----