
Re: Subject üî



 On Monday, September 3, 2007 at 15:22:31 -0400, dv1445@xxxxxxxxx wrote:

> my muttrc, which wants to see "set charset=UTF-8" instead of
> "=en_US.UTF-8", while the latter is what $LANG wants to see.

    Exactly. When I said they have to agree, it was about the charset
only. However, you should not set $charset in muttrc at all: The default
value is automatically derived from the current locale. This automatic
link was absent or broken in previous versions, but since around
MacOS 10.3 Panther, it works well.
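
    To illustrate the "charset part" of a locale name, here is a minimal
Python sketch. It is only an illustration: the function name and the
fallback value are made up for this example, and as far as I know Mutt
asks the C library for the codeset rather than parsing $LANG by hand.

```python
def codeset_from_locale(name, default="us-ascii"):
    """Return the codeset part of a POSIX-style locale name.

    Hypothetical helper for illustration only. Locale names have the
    form language[_territory][.codeset][@modifier].
    """
    # Drop an optional "@modifier" suffix such as "@euro".
    name = name.split("@", 1)[0]
    # No "." means the name carries no explicit codeset.
    if "." not in name:
        return default
    return name.split(".", 1)[1]


print(codeset_from_locale("en_US.UTF-8"))              # UTF-8
print(codeset_from_locale("de_DE.ISO-8859-15@euro"))   # ISO-8859-15
print(codeset_from_locale("C"))                        # us-ascii
```

    So LANG=en_US.UTF-8 and a charset of UTF-8 "agree", even though the
two strings are not identical.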


> in the UTF-8 mode, I see that my subject line in the header is encoded
> in ISO-8859-1.

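    Feature: When sending a text, Mutt picks the first charset in the
$send_charset list that can represent the whole text. With a list
ordered from narrow to wide, that is the minimal necessary and
sufficient charset. All the characters of your subject exist in
ISO-8859-1, hence that declaration.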
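
    The selection rule can be sketched in a few lines of Python. This is
a rough model under assumptions, not Mutt's actual code: the function
name and the final fallback are invented for the example, and Python's
codecs stand in for the conversion library Mutt really uses.

```python
def pick_send_charset(text, send_charset):
    """Return the first charset in a colon-separated list that can
    encode `text` without loss (hypothetical helper)."""
    for name in send_charset.split(":"):
        try:
            text.encode(name)
        except (UnicodeEncodeError, LookupError):
            # Either a character does not fit, or Python does not know
            # this charset name: try the next candidate.
            continue
        return name
    # Assumption for this sketch: fall back to UTF-8 if nothing fits.
    return "utf-8"


default_list = "us-ascii:iso-8859-1:utf-8"
print(pick_send_charset("plain text", default_list))  # us-ascii
print(pick_send_charset("üîå", default_list))          # iso-8859-1
print(pick_send_charset("Œdipus", default_list))      # utf-8
```

    The order of the list matters: putting utf-8 first would make every
mail go out as UTF-8, since it can represent everything.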


> I have to do something to xterm to force it to work under UTF-8

    Right. Try to start it via the uxterm command.


> I should be doing something to my vimrc to make vim write the file in
> the right encoding.

    That may very well be the case: Mutt's $editor has to dumbly read
and write in the locale's charset, period. Smart charset auto-sensing
(Vim's 'fileencodings') is useless (to Mutt), and can be harmful (when
it gives the appearance of rightness to broken characters). However,
that doesn't seem to be your problem of today.


> Ångström.  SØren.  Cristóbal. Æneid.  Straße.  Œdipus.

    All seems well in the body. The subject is "[]üîå[]", meaning "uia"
flanked by 2 U+FFFD replacement characters (in my font they look like
empty squares). That seems to be what you described sending. Next
question is: What added those U+FFFD chars?
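
    For reference, here is one common way U+FFFD comes into being,
sketched in Python. It is a generic illustration, not a diagnosis of
this particular subject line: a program decodes bytes under the wrong
charset assumption and substitutes U+FFFD for whatever it can't decode.

```python
# "ü" stored as a single Latin-1 byte.
latin1_bytes = "ü".encode("iso-8859-1")
print(latin1_bytes)  # b'\xfc'

# Reading that byte as UTF-8 fails, since 0xFC is not valid UTF-8.
# With errors="replace" the decoder substitutes U+FFFD instead of
# raising an exception.
decoded = latin1_bytes.decode("utf-8", errors="replace")
print(ascii(decoded))  # '\ufffd'
```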

    I dropped the replacements, and just kept the "uia" in the subject:
Please reply with your UTF setup. So we'll see if replacement chars are
reintroduced when you don't type special chars yourself.


> Is it better to do Latin 1 or UTF-8?

    Latin-1 is extremely common, and contains the characters needed by
most western languages: Spanish, German, French, Portuguese, and the
like. Well... It lacks the Euro symbol €, and the oe ligatures œ Œ.
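
    A quick way to check those gaps, as a Python sketch. The comparison
with ISO-8859-15 (Latin-9) is my own choice for the example: that
charset reuses three Latin-1 byte values for exactly these characters,
so the same bytes misread as Latin-1 show up as ¤ ½ ¼.

```python
# None of these three characters can be encoded in Latin-1.
for ch in "€œŒ":
    try:
        ch.encode("iso-8859-1")
        print(ch, "fits in Latin-1")
    except UnicodeEncodeError:
        print(ch, "is missing from Latin-1")

# ISO-8859-15 does have them, at byte values 0xA4, 0xBD and 0xBC.
raw = "€œŒ".encode("iso-8859-15")
print(raw)  # b'\xa4\xbd\xbc'

# The same bytes interpreted as Latin-1 give different characters.
misread = raw.decode("iso-8859-1")
print(misread)  # ¤½¼
```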

    UTF-8 is an encoding of Unicode, which covers all characters of the
world: the above, plus Polish, Russian, Chinese ideograms, Tamil, and
all such. It is however a little less portable, a little more difficult
to set up (nothing hairy), and on Macs a little less stable.

    My advice would be to make the (little) effort to set up UTF-8 for
yourself. And for sending, trust Mutt's optimal choice. As for setting
the right $send_charset list, it depends on the languages you write in.
The default $send_charset is a good start, and look at my infosig for
another example.


 On Monday, September 3, 2007 at 16:10:16 -0400, dv1445@xxxxxxxxx wrote:

> Wow, I can see myself that this looks terrible. So does the subject
> line

    The parent mail looks good (outside of the U+FFFD pair). You see it
broken because you broke your Latin-1 setup: Do not set $charset.


> when I set everything to UTF-8, the little arrows drawn in thread view
> are all wrong.

    Those semigraphic chars should work provided that $charset=utf-8
exactly (that's the default value derived from LANG=en_US.UTF-8), that
TERM=nsterm-16color, and that you have unchecked Terminal.app's "Wide
glyphs for Japanese/Chinese/etc." setting. If they are still wrong,
please describe what they look like.


Bye!    Alain.
-- 
Mutt muttrc tip to send mails in best adapted first necessary and sufficient
charset (version for East Europe Latin-2/CP-852/CP-1250 terminal users):
set send_charset="us-ascii:iso-8859-1:iso-8859-15:windows-1252:iso-8859-2:windows-1250:utf-8"