Smarter send_charset



Long-time user, first-time poster.

The default send_charset is "us-ascii:iso-8859-1:utf-8".  From that list,
"Mutt will use the first character set into which the text can be converted
exactly."

I'm struggling to think of any way the utf-8 encoding will be selected -
because all byte values from the smallest 0x00 to the grandest 0xFF are
valid ISO-8859-1 (as far as I know).  Try it and see:
    head /dev/urandom | iconv -f iso-8859-1 -t utf-8 > /dev/null
Run this as many times as you like, and iconv will never complain.  Now,
change that "-f iso-8859-1" to "-f utf-8", and your odds of iconv accepting
the input are worse than winning your state's lottery.
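
The same comparison can be sketched in Python for anyone without iconv
handy.  This is only an illustration: the helper name decodes_as, the fixed
seed, and the sample sizes are arbitrary choices for the example, and it
relies on Python's codecs rather than iconv.

```python
import random


def decodes_as(data: bytes, charset: str) -> bool:
    """Return True if `data` is a valid byte sequence in `charset`."""
    try:
        data.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


# Every possible byte value, 0x00 through 0xFF, in one string.
all_bytes = bytes(range(256))

# A fixed seed keeps the run repeatable; the sizes are arbitrary.
rng = random.Random(0)
samples = [bytes(rng.randrange(256) for _ in range(64)) for _ in range(1000)]

latin1_ok = sum(decodes_as(s, "iso-8859-1") for s in samples)
utf8_ok = sum(decodes_as(s, "utf-8") for s in samples)

print("all 256 byte values valid iso-8859-1:", decodes_as(all_bytes, "iso-8859-1"))
print(f"iso-8859-1 accepted {latin1_ok} of {len(samples)} random samples")
print(f"utf-8 accepted {utf8_ok} of {len(samples)} random samples")
```

Note that this checks whether raw bytes can be read as a given charset,
which is the same thing the iconv one-liner tests.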

This means that, though the following line is going to be valid UTF-8, my
client will lie to you all about the charset being used:
    "Und sie sprachen: Wohlan, bauen wir uns eine Stadt und einen Turm,
    dessen Spitze an den Himmel reiche, und machen wir uns einen Namen, daß
    wir nicht zerstreut werden über die ganze Erde!"

My proposal, then, is to change the default send_charset to
"us-ascii:utf-8:iso-8859-1".
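
To see what the reordering changes, here is a rough Python sketch of the
rule quoted above ("the first character set into which the text can be
converted exactly").  It is not Mutt's code: pick_charset is a hypothetical
helper, Python's codecs stand in for Mutt's conversion, and whatever Mutt
does when nothing in the list fits is not modelled here.

```python
from typing import Optional

DEFAULT_ORDER = "us-ascii:iso-8859-1:utf-8"
PROPOSED_ORDER = "us-ascii:utf-8:iso-8859-1"


def pick_charset(text: str, send_charset: str) -> Optional[str]:
    """Return the first charset in the colon-separated list that can
    encode `text` without loss, or None if none of them can."""
    for name in send_charset.split(":"):
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # some character has no representation; try the next
        return name
    return None  # assumption: the fallback behaviour is left out


german = "daß wir nicht zerstreut werden über die ganze Erde!"
greek = "λόγος"  # illustrative text with characters outside Latin-1

for label, order in (("default", DEFAULT_ORDER), ("proposed", PROPOSED_ORDER)):
    print(label, "ascii  ->", pick_charset("Hello, world", order))
    print(label, "german ->", pick_charset(german, order))
    print(label, "greek  ->", pick_charset(greek, order))
```

Under this reading, pure ASCII stays us-ascii either way, and the German
sentence is labelled iso-8859-1 with the default order but utf-8 with the
proposed one.  Text with characters outside Latin-1, like the Greek word,
comes out as utf-8 under both orders.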

I can't see how this behavior would surprise anyone, due to UTF-8's
strictness.  Even if it did, isn't it time to start making UTF-8 the
default everywhere?

-rjk