
Re: charset question



 On Wednesday, September 8, 2004 at 11:24:07 AM +0200, Martin F. Krafft wrote:

> also sprach Alain Bench <messtic@xxxxxxxxx> [2004.09.08.1030 +0200]:
>> You probably want single "windows-1252" [in $assumed_charset] (it
>> covers itself, and in turn Latin-1, and US-Ascii).
> I don't see why I should use anything with "windows" in the name...

    I proposed CP-1252 because it's handy and effective: Being a
perfect superset of the two most common charsets, it can be declared
alone and give the functional equivalent of all 3 in order, if
declaring 3 charsets were possible. But it is not really.

    And also because in such mails where $assumed_charset applies, the
actual charset does happen to really be CP-1252.

    But you can just as well set $assumed_charset="iso-8859-1": It will
work in nearly the same way. The loss of readability will be quite
reasonable (the €uro sign and a few such characters, in a rare minority
of mails, will be masked by question marks). Your choice, and this one
is perfectly valid.
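
    To make the difference concrete, here is a small Python sketch. It
is only an illustration, not Mutt's code: the display() helper is
hypothetical, and the way it masks unprintable characters with "?" is an
assumption about how a pager typically behaves.

```python
def display(raw: bytes, charset: str) -> str:
    """Decode raw bytes, masking anything unprintable with '?'."""
    text = raw.decode(charset)
    return "".join(ch if ch.isprintable() else "?" for ch in text)

# Byte 0x80 is the Euro sign in CP-1252, but an unprintable C1 control
# character in ISO-8859-1.
sample = b"\x80uro"
print(display(sample, "cp1252"))      # €uro
print(display(sample, "iso-8859-1"))  # ?uro

# The printable Latin-1 range (A0-FF) and US-ASCII (00-7F) decode
# identically under both charsets.
high = bytes(range(0xA0, 0x100))
low = bytes(range(0x80))
print(high.decode("cp1252") == high.decode("iso-8859-1"))  # True
print(low.decode("cp1252") == low.decode("us-ascii"))      # True
```

    Only the 80-9F range differs, which is why the loss with Latin-1
stays small.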


>> you /probably/ don't want UTF-8
> I should want UTF-8. I should force my readers to upgrade to
> unicode...

    We were talking about $assumed_charset: That is about what *you*
read, not what you send to your readers. I confirm you /probably/ don't
want UTF-8 in $assumed_charset.


    [$file_charset]
>> Highly inconsistent.
> I completely don't understand.

    One attaches a text file. Mutt tries to guess which charset the
file is in. Later, upon sending, Mutt will convert the file to
$send_charset, and MIME-label it accordingly.

    To guess the file's charset, Mutt tries each charset listed in
$file_charset in turn, until one is valid for the entire text file.

    My point was that your $file_charset was badly constructed, with one
catch-nearly-all charset "masking" the following ones, which would never
(or rarely) be selected. With this setting, your Mutt was able to
auto-detect Latin-1 text files OK, but never Latin-9, and UTF-8 only
half of the time.
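
    The first-valid-charset-wins logic can be sketched in Python as
below. This is an illustration only, not Mutt's implementation (Mutt is
written in C). The guess_charset() function and both charset lists are
hypothetical examples, not your actual setting.

```python
from typing import Optional, Sequence

def guess_charset(data: bytes, candidates: Sequence[str]) -> Optional[str]:
    """Return the first candidate charset in which all of data is valid."""
    for charset in candidates:
        try:
            data.decode(charset)
        except UnicodeDecodeError:
            continue  # invalid in this charset, try the next one
        return charset
    return None

# Hypothetical lists: Latin-1 accepts every byte, so anything placed
# after it can never be reached.
badly_ordered = ["us-ascii", "iso-8859-1", "iso-8859-15", "utf-8"]
strict_first = ["us-ascii", "utf-8", "iso-8859-1"]

latin9_file = "100 €".encode("iso-8859-15")
utf8_file = "é".encode("utf-8")
latin1_file = "é".encode("iso-8859-1")

print(guess_charset(latin9_file, badly_ordered))  # iso-8859-1 (wrong)
print(guess_charset(utf8_file, badly_ordered))    # iso-8859-1 (wrong)
print(guess_charset(utf8_file, strict_first))     # utf-8
print(guess_charset(latin1_file, strict_first))   # iso-8859-1
```

    The strict charsets (those that reject many byte sequences) have to
come before the permissive ones, or they are masked.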

    Take these 2 anonymous bytes: C3 A9. Read as UTF-8, that's *one*
character (e acute "é"), but they can also be misinterpreted as two
Latin-1 characters ("Ã©": A tilde, copyright sign). Check Latin-1 first,
and Mutt is fooled. Check UTF-8 first, and the correct "é" is recognised.
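
    The two readings of those bytes can be checked with a few lines of
Python. The describe() helper is a hypothetical name, used here only to
list the Unicode character names each decoding produces.

```python
import unicodedata

def describe(raw: bytes, charset: str) -> list:
    """Decode raw in charset and return the Unicode name of each character."""
    return [unicodedata.name(ch) for ch in raw.decode(charset)]

two_bytes = bytes([0xC3, 0xA9])

# One character when read as UTF-8:
print(describe(two_bytes, "utf-8"))
# ['LATIN SMALL LETTER E WITH ACUTE']

# Two characters when read as Latin-1:
print(describe(two_bytes, "iso-8859-1"))
# ['LATIN CAPITAL LETTER A WITH TILDE', 'COPYRIGHT SIGN']
```

    Both decodings succeed without error, so validity alone cannot tell
them apart. Only the order of the checks does.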


> I will try to read Mascheck's page...

    Oh sorry: My .sig was primarily intended for Niklas (another Mutt
user), not you. Very interesting site, and related, though mostly off
today's topic.


> Mail-Followup-To: Mutt users ml <mutt-users@xxxxxxxx>
> [please keep CC'ing me]

    You can help repliers CC you automatically by replacing the
declaration "subscribe ^mutt-users@mutt\\.org$" in your muttrc with
"lists ^mutt-users@mutt\\.org$". Your MFT will then reflect your wish.


Bye!    Alain.
-- 
Give your computer's unused idle processor cycles to a scientific goal:
The Folding@home project at <URL:http://folding.stanford.edu/>.