Smarter send_charset

To: mutt-dev@xxxxxxxx
Subject: Smarter send_charset
From: Ryan King <rking@xxxxxxxxxxxx>
Date: Mon, 5 Sep 2005 23:09:54 -0400
List-unsubscribe: <mailto:mutt-dev-request@mutt.org?body=unsubscribe>
Sender: owner-mutt-dev@xxxxxxxx
User-agent: Mutt/1.5.10i

Long-time user, first-time poster.

The default send_charset is "us-ascii:iso-8859-1:utf-8".  From that list,
"Mutt will use the first character set into which the text can be converted
exactly."

I'm struggling to think of any way utf-8 will ever be selected, because
every byte value from the smallest 0x00 to the grandest 0xFF is valid
ISO-8859-1 (as far as I know).  Try it and see:
    head /dev/urandom | iconv -f iso-8859-1 -t utf-8 > /dev/null
Run this as many times as you like, and iconv will never complain.  Now,
change that "-f iso-8859-1" to "-f utf-8", and your odds of iconv accepting
the input are worse than winning your state's lottery.
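
For a deterministic version of the same experiment, here is a minimal
Python sketch.  It feeds every possible byte value to both decoders
instead of sampling /dev/urandom.  The helper name decodes_as is made up
for illustration, and Python's codecs stand in for iconv, so treat this as
an approximation of the shell test rather than an exact equivalent.

```python
def decodes_as(data: bytes, charset: str) -> bool:
    """Return True if `data` is a valid byte sequence in `charset`."""
    try:
        data.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


# Every byte value once, 0x00 through 0xFF.
all_bytes = bytes(range(256))

print(decodes_as(all_bytes, "iso-8859-1"))  # True: all 256 bytes are mapped
print(decodes_as(all_bytes, "utf-8"))       # False: a lone 0x80 is not valid UTF-8
```

The UTF-8 decoder gives up at the first byte that cannot start a sequence,
which is why random input almost never gets through.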

This means that, though the following line is going to be valid UTF-8, my
client will lie to you all about the charset being used:
    "Und sie sprachen: Wohlan, bauen wir uns eine Stadt und einen Turm,
    dessen Spitze an den Himmel reiche, und machen wir uns einen Namen, daß
    wir nicht zerstreut werden über die ganze Erde!"

My proposal, then, is to change the default send_charset to
"us-ascii:utf-8:iso-8859-1".
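
To make the difference between the two orderings concrete, here is a rough
Python sketch of a first-fit selection.  It is not Mutt's code: the
function name pick_charset is hypothetical, and it assumes "converted
exactly" means the text encodes into the candidate charset without error.

```python
from typing import Optional


def pick_charset(text: str, send_charset: str) -> Optional[str]:
    """Return the first charset in the colon-separated list that can
    encode `text` without loss, or None if none of them can."""
    for name in send_charset.split(":"):
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # this candidate cannot represent the text; try the next
        return name
    return None


sample = "daß wir nicht zerstreut werden über die ganze Erde!"

print(pick_charset(sample, "us-ascii:iso-8859-1:utf-8"))         # iso-8859-1
print(pick_charset(sample, "us-ascii:utf-8:iso-8859-1"))         # utf-8
print(pick_charset("plain ASCII", "us-ascii:utf-8:iso-8859-1"))  # us-ascii
```

Pure ASCII messages are labelled us-ascii under either ordering; only text
with non-ASCII characters is affected by the proposed change.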

I can't see how this behavior would surprise anyone, due to UTF-8's
strictness.  Even if it did, isn't it time to start making UTF-8 the
default everywhere?

-rjk
