Re: Demoroniser (was: Display Filters)



 On Monday, July 3, 2006 at 19:16:58 -0700, Gary Johnson wrote:

> [iso8859toascii.c] converts some characters in the range 128-255 to
> strings of one or more ASCII characters.

    Very similar to Demoroniser, with some pros and cons. But I'm
afraid most limitations and drawbacks I listed are the same, as they
are in fact inherent to the $display_filter approach.

    OTOH I can now guess that you got more benefit from this filter than
the average guy, because of the specifics of your setup. You are more
frequently in my 3rd point case, and less or never in the 1st case: more
\200 octalisations and chances for filter action than ?-masks. I guessed
that from seeing:


> User-Agent: Mutt/1.5.9i
> Content-Type: text/plain; charset=iso-8859-1
>> The U+20AC '€' Euro symbol
>> the U+0192 'ƒ' hooked letter f
>> The U+2030 '‰' per mille sign

    Gaargl! Mutt sent out an ugly, Outlook-like, lying MIME charset
label... Scary! Contrary to the very common confusion, ISO-8859-1 and
CP-1252 are different charsets (your filter should be named
cp1252toascii.c). And the characters I wrote do not exist in ISO-8859-1.
This means something is broken, probably iconv, and should be fixed.
What is the output of:

| $ printf "\x80 \x83 \x89\n" | iconv -f windows-1252 -t us-ascii//TRANSLIT
| EUR f o/oo
|
| $ printf "\x80 \x83 \x89\n" | iconv -f cp1252 -t us-ascii//TRANSLIT
| EUR f o/oo
|
| $ mutt -v

    And what is the value of ":set ?charset" in Mutt?
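
    To make the ISO-8859-1 versus CP-1252 difference concrete, here is
a minimal Python sketch (Python is only an illustration here, nothing
Mutt or iconv uses it). It decodes the same three bytes as the iconv
test above under both charsets, then checks whether the three
characters can be encoded as ISO-8859-1 at all. The names SAMPLE,
code_points and fits_latin1 are made up for this example.

```python
# The three bytes used in the iconv test: 0x80, 0x83, 0x89.
SAMPLE = b"\x80\x83\x89"


def code_points(data, charset):
    """Decode data with the given charset and list the resulting code points."""
    return ["U+%04X" % ord(ch) for ch in data.decode(charset)]


def fits_latin1(text):
    """Return True if every character of text exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


cp1252_points = code_points(SAMPLE, "cp1252")
latin1_points = code_points(SAMPLE, "iso-8859-1")

# CP-1252 puts printable characters in 0x80-0x9F; ISO-8859-1 has C1 controls there.
print("cp1252:    ", cp1252_points)
print("iso-8859-1:", latin1_points)

# Euro sign, hooked f, per mille sign: none of them exists in ISO-8859-1.
print("fits in iso-8859-1:", fits_latin1("\u20ac\u0192\u2030"))
```

    Under CP-1252 the bytes come out as U+20AC, U+0192 and U+2030; under
ISO-8859-1 the same bytes are the control codes U+0080, U+0083 and
U+0089, which is why a message containing those characters cannot
honestly be labelled iso-8859-1.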


> do you know of a HOW-TO or bootstrap procedure I could follow to get
> this working?

    The Mutt Wiki <URL:http://wiki.mutt.org/?MuttFaq/Charset> has nearly
everything from base settings to advanced solutions for some corner
problems. But not much about X, fonts, and such.

    I'd say that the first step should be to determine exactly what
charset your current terminal and font display. Please describe what
you see when you run this at the shell:

| $ printf "\xC3\xBC \x9E \n"
| ü ž

 - capital A with tilde, 1/4 symbol, small z with caron ==> CP-1252
 - capital A with tilde, 1/4 symbol, nothing ==> Latin-1
 - capital A with tilde, OE ligature, nothing ==> Latin-9
 - 2 line drawing chars, and a Peseta symbol ==> CP-437
 - 2 line drawing chars, and an x (multiplication sign) ==> CP-850
 - small u with diaeresis, nothing or garbage ==> UTF-8
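
    The table above can be cross-checked with a small Python sketch
that decodes the same probe bytes (0xC3 0xBC 0x9E) under each candidate
charset and prints the Unicode names of what comes out. This is only an
illustration of the mapping, under the assumption that Python's codec
tables match what the terminal font shows; PROBE, CANDIDATES and
interpret are names invented for this example.

```python
import unicodedata

# Same bytes as the printf test, without the trailing space and newline.
PROBE = b"\xC3\xBC \x9E"

# (label used in the list above, Python codec name)
CANDIDATES = [
    ("CP-1252", "cp1252"),
    ("Latin-1", "iso-8859-1"),
    ("Latin-9", "iso-8859-15"),
    ("CP-437", "cp437"),
    ("CP-850", "cp850"),
    ("UTF-8", "utf-8"),
]


def interpret(data, codec):
    """Decode data with codec and return the Unicode name of each non-space character.

    Undecodable bytes become U+FFFD; unnamed characters (such as C1
    controls, which a terminal shows as nothing) are reported by code point.
    """
    text = data.decode(codec, errors="replace")
    return [
        unicodedata.name(ch, "U+%04X (no name)" % ord(ch))
        for ch in text
        if ch != " "
    ]


for label, codec in CANDIDATES:
    print("%-8s %s" % (label, ", ".join(interpret(PROBE, codec))))
```

    The "nothing" entries in the list correspond to the unnamed control
code U+009E, and the "nothing or garbage" UTF-8 case corresponds to the
stray 0x9E byte, which is not valid UTF-8 on its own.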


Bye!    Alain.
-- 
A "Reply-To:" header field pointing to the same email address
as the "From:" is uselessly redundant: A loss of space.