Re: multiple entries in send_charset



 On Thursday, October 23, 2003 at 8:49:04 AM +0200, Andrei A. Voropaev wrote:

> On Thu, Oct 16, 2003 at 03:34:44PM +0200, Alain Bench wrote:
>> People with -HAVE_LANGINFO_CODESET can do it with [snip]
> Does it really have to be so complicated? Right now I don't have
> anything and things are working great both in UTF-8 and in 8-bit
> environments.

    You run a +HAVE_LANGINFO_CODESET system (note the leading '+'), so
your default $charset is automatically taken from the current locale.
Other systems lack this and need $charset to be set manually: old
FreeBSD < 4.5, old NetBSD < 1.6, OpenBSD, Linux libc5 and some early
glibc2 versions, Darwin, Cygwin...
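    A minimal sketch of what the '+' buys you, assuming a Unix system.
The build option refers to the C call nl_langinfo(CODESET), made after
setlocale(); Python's locale module wraps the same two calls, so the
sketch uses them. The helper name locale_codeset is hypothetical, and
the exact name returned depends on the platform and locale.

```python
import locale


def locale_codeset() -> str:
    """Return the charset name implied by the user's locale (Unix only).

    Hypothetical helper mirroring the C sequence setlocale(LC_CTYPE, "")
    then nl_langinfo(CODESET), which a +HAVE_LANGINFO_CODESET build can
    rely on to choose a default $charset.
    """
    try:
        # Adopt the character-type settings from LANG / LC_ALL / LC_CTYPE.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The environment names a locale that is not installed: use "C".
        locale.setlocale(locale.LC_CTYPE, "C")
    return locale.nl_langinfo(locale.CODESET)


if __name__ == "__main__":
    # Under a UTF-8 locale this is typically "UTF-8"; names vary by platform.
    print(locale_codeset())
```

On the systems listed above there is no such call to ask, which is why
$charset has to be written into the muttrc by hand there.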


>> My present émail is sent L1/8bit
> See if this response got converted.

    Yes, it arrived converted to quoted-printable ("quoted-unreadable")
by some unknown MTA. Not a problem to read Latin-1/QP: Mutt decoded it
cleanly.
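    To illustrate what happened in transit, a small sketch using only
Python's standard quopri module: the Latin-1 8bit body is re-encoded
as quoted-printable (as a relaying MTA may do), then decoded back the
way a mailer like Mutt does. The sample string is the quoted line
above; nothing here models any particular MTA.

```python
import quopri

original = "My present émail is sent L1/8bit"

# The 8bit body as it left the sender: raw ISO-8859-1 (Latin-1) bytes.
latin1_body = original.encode("iso-8859-1")

# What a relaying MTA may do for a 7bit-only hop: quoted-printable.
qp_body = quopri.encodestring(latin1_body)
print(qp_body)  # b'My present =E9mail is sent L1/8bit'

# What the reader's mailer does: undo QP, then decode with the charset.
decoded = quopri.decodestring(qp_body).decode("iso-8859-1")
print(decoded == original)  # True
```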


>> a $send_charset of "...:koi8-r:koi8-u:windows-1251:utf-8"
> I think koi-8-r and windows-1251 won't work together since russian
> characters take exactly the same positions in these encodings. Just
> different characters order. So mutt will never figure out if this is
> koi8-r or windows-1251

    There is no such difficulty, since Mutt already knows the $charset
of the text it starts from. It selects a charset for sending by trying
to convert the text from $charset to each entry of $send_charset in
turn, stopping at the first one that converts it fully.

    If KOI8-R and CP-1251 really contained exactly the same characters
in a different order, Mutt would simply always select whichever is
listed first.
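    A simplified model of that selection, as a Python sketch (not Mutt's
own C code; Python codecs stand in for iconv). The function name
choose_send_charset and the send_list value are hypothetical. It shows
that the input bytes are always read via the known $charset, so the
same Russian word typed under KOI8-R or CP-1251 ends up with the same
choice: the first listed charset that covers it.

```python
from typing import Optional, Sequence


def choose_send_charset(raw: bytes, charset: str,
                        send_charsets: Sequence[str]) -> Optional[str]:
    """Return the first entry of send_charsets that can hold the whole text."""
    # $charset says how to read the bytes, so the input is never guessed.
    text = raw.decode(charset)
    for candidate in send_charsets:
        try:
            text.encode(candidate)  # strict mode: any unmappable char raises
        except UnicodeEncodeError:
            continue                # this charset lacks a char; try the next
        return candidate            # first charset that converts fully wins
    return None                     # no listed charset can represent the text


# Hypothetical $send_charset, in order of preference.
send_list = ["us-ascii", "koi8-r", "windows-1251", "utf-8"]
word = "Привет"

# Same word, different bytes, typed under two different locales:
print(choose_send_charset(word.encode("koi8-r"), "koi8-r", send_list))  # koi8-r
print(choose_send_charset(word.encode("windows-1251"), "windows-1251",
                          send_list))                                   # koi8-r
```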

    But CP-1251 differs substantially from KOI8-R: it has 57 characters
that KOI8-R lacks (mostly Cyrillic letters for Serbian, Ukrainian, and
Byelorussian) and lacks 58 that KOI8-R has (mostly semi-graphics).
This makes CP-1251 somewhat more "general" for Cyrillic users.
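    These counts can be checked with a short Python sketch against the
standard codec tables. Both charsets share ASCII in 0x00-0x7F, so only
the upper halves need comparing; the helper name upper_half is
hypothetical.

```python
def upper_half(codec: str) -> set:
    """Characters assigned to bytes 0x80-0xFF in a single-byte codec."""
    chars = set()
    for byte in range(0x80, 0x100):
        try:
            chars.add(bytes([byte]).decode(codec))
        except UnicodeDecodeError:
            pass  # unassigned position (e.g. 0x98 in Python's windows-1251)
    return chars


koi8r = upper_half("koi8-r")
cp1251 = upper_half("windows-1251")

only_cp1251 = cp1251 - koi8r  # in CP-1251 but not in KOI8-R
only_koi8r = koi8r - cp1251   # in KOI8-R but not in CP-1251
print(len(only_cp1251), len(only_koi8r))  # 57 58
```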

    And nearly all of those added characters are also in ISO-8859-5.
Someone told me offlist (thanks!) that even though ISO-8859-5 is not
much used in practice, it should in theory always take priority over
country- or platform-specific charsets such as KOI8-* or 1251.
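    A quick Python check of the "nearly all", restricted (as an
assumption about what counts here) to characters whose Unicode name
starts with CYRILLIC: it lists the Cyrillic letters of CP-1251 that
ISO-8859-5 cannot encode.

```python
import unicodedata

missing = []
for byte in range(0x80, 0x100):
    try:
        ch = bytes([byte]).decode("windows-1251")
    except UnicodeDecodeError:
        continue  # unassigned position in the table
    if not unicodedata.name(ch, "").startswith("CYRILLIC"):
        continue  # skip punctuation and symbols, keep Cyrillic letters only
    try:
        ch.encode("iso-8859-5")
    except UnicodeEncodeError:
        missing.append(ch)

# Only U+0490 and U+0491 (Ukrainian GHE WITH UPTURN, "Ґ" and "ґ").
print([f"U+{ord(c):04X} {unicodedata.name(c)}" for c in missing])
```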

    Following that advice, the proposal becomes:

| $send_charset="...:iso-8859-5:koi8-r:koi8-u:windows-1251:..."

    Note that this results in most mail being sent in ISO-8859-5. KOI8-*
would almost never be selected: only when one writes Cyrillic together
with semi-graphics. Likewise for 1251: it is only selected when one
writes Cyrillic together with some Windows-specific characters such as
the infamous smart apostrophe, or other symbols such as the euro sign.
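    The outcomes above can be sketched in Python with the same
first-fit rule (a simplified model, not Mutt's code). Only the Cyrillic
part of the proposal is used; the elided "..." entries are left out and
"utf-8" is appended as a hypothetical catch-all. The function name
first_fitting_charset is hypothetical.

```python
from typing import Optional, Sequence


def first_fitting_charset(text: str, candidates: Sequence[str]) -> Optional[str]:
    """First charset in candidates that can encode all of text, else None."""
    for candidate in candidates:
        try:
            text.encode(candidate)  # strict: fails on any unmappable char
        except UnicodeEncodeError:
            continue
        return candidate
    return None


# Cyrillic part of the proposal, plus a hypothetical utf-8 fallback.
proposal = ["iso-8859-5", "koi8-r", "koi8-u", "windows-1251", "utf-8"]

print(first_fitting_charset("Привет, мир", proposal))            # iso-8859-5
print(first_fitting_charset("Итого \u2500\u2500\u2500 42", proposal))  # koi8-r (box drawing)
print(first_fitting_charset("Привет\u2019", proposal))           # windows-1251 (smart apostrophe)
print(first_fitting_charset("Цена 5 \u20ac", proposal))          # windows-1251 (euro sign)
print(first_fitting_charset("\u0491анок", proposal))             # koi8-u (Ukrainian ґ)
```

The last line shows one more case where KOI8-U gets picked: Ukrainian
"ґ", which neither ISO-8859-5 nor KOI8-R contains.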

    Would this cause any problems? Not with Mutt of course, but perhaps
with other mailers: say a Russian Mutt user mails another Russian who
uses any platform and any mailer. Mutt will select ISO-8859-5 for
sending. Are there mailers able to display both KOI8-R and 1251, but
not ISO-8859-5? Are mailers with a fixed KOI8-R (or fixed 1251) charset
widely used?


    I am trying to build a recommendation that suits everyone. Any
comments are welcome. Thanks to you, Andrei, and thanks to Sergei.


Bye!    Alain.