
Re: Demoroniser (was: Display Filters)



On 2006-07-01, Alain Bench <veronatif@xxxxxxx> wrote:
> Hi Gary,
> 
>  On Thursday, June 29, 2006 at 18:58:50 -0700, Gary Johnson wrote:
> 
> >>> converts certain Microsoft characters to their ASCII equivalents.
> > that [Demoroniser] script is widely used and has documentation, I'd
> > recommend using it
> 
>     Sorry, I wouldn't. I downloaded Demoroniser and evaluated its
> special-character conversion feature in the context of Mutt's
> $display_filter. It has other features, and can be used for HTML
> outside of Mutt. But under the precise conditions I evaluated,
> Demoroniser seems very bad to me:
> 
>  · In many cases it doesn't provide the expected benefit. You just get 1
> to 3 question marks for those special chars, as usual.
> 
>  · It doesn't apply to index, nor to replies you compose.
> 
>  · In some cases it acts, and provides the wanted Ascii approximations
> for \200-\237. That's mostly cases where something is broken: Mutt
> setup, iconv, locale, or the mail itself. Properly fixing the brokenness
> from the beginning would give better results.
> 
>  · The said Ascii approximations are sometimes not really nice. The
> U+20AC '€' Euro symbol shouts an error. Spurious HTML tags do appear
> (like "<em>f</em>" for the U+0192 'ƒ' hooked letter f). The U+2030 '‰'
> per mille sign hurts in some locales...
> 
>  · Demoroniser breaks badly in many locales having a charset different
> from Latin-1 (and similar scheme). Especially it garbles display of some
> "normal" accented letters and symbols in locales using UTF-8, CP-852,
> GBK, EUC-JP, EUC-TW, and such.
> 
>     Incomplete action and grave drawbacks. Conclusion: I recommend
> against using Demoroniser as $display_filter in Mutt. Better, cleaner,
> more elegant and more universal solutions to the underlying problem do
> exist.

Hi Alain,

Rats!  I thought I remembered Demoroniser working better than that 
when I tried it out myself.  I should know better than to recommend 
something I don't regularly use myself.  I'm sorry I gave everyone 
bad advice.

I have attached the C source for the filter I really use for this 
purpose.  It's just an ad hoc translator that converts some 
characters in the range 128-255 to strings of one or more ASCII 
characters.  It works pretty well for me for characters in the 
e-mail I regularly receive that are otherwise not displayed properly 
by my setup.  Given many of your comments above, however, I'm not 
sure it is really any better than Demoroniser.
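
For readers without the attachment, here is a minimal sketch of that
kind of translator. It is not the attached filter: the function names
ascii_for_byte and filter_stream and the particular replacement strings
are hypothetical, and it assumes the input is single-byte Windows-1252
text. As Alain points out, a byte-wise mapping like this will garble
UTF-8 and other multibyte charsets.

```c
#include <stdio.h>

/*
 * Return an ASCII replacement for one Windows-1252 byte, or NULL if the
 * byte should be passed through unchanged.  Only a handful of bytes are
 * mapped; the choice of replacements is an assumption, not a standard.
 */
const char *ascii_for_byte(unsigned char c)
{
    switch (c) {
    case 0x80: return "EUR";    /* euro sign */
    case 0x85: return "...";    /* horizontal ellipsis */
    case 0x91:                  /* left single quotation mark */
    case 0x92: return "'";      /* right single quotation mark */
    case 0x93:                  /* left double quotation mark */
    case 0x94: return "\"";     /* right double quotation mark */
    case 0x95: return "*";      /* bullet */
    case 0x96: return "-";      /* en dash */
    case 0x97: return "--";     /* em dash */
    case 0x99: return "(TM)";   /* trade mark sign */
    case 0xA0: return " ";      /* no-break space */
    case 0xA9: return "(c)";    /* copyright sign */
    default:   return NULL;     /* leave every other byte alone */
    }
}

/* Copy `in` to `out`, substituting the mapped bytes as they go by. */
void filter_stream(FILE *in, FILE *out)
{
    int c;

    while ((c = getc(in)) != EOF) {
        const char *s = ascii_for_byte((unsigned char)c);

        if (s != NULL)
            fputs(s, out);
        else
            putc(c, out);
    }
}
```

Adding a main() that calls filter_stream(stdin, stdout) turns this into
a stand-alone filter, which Mutt can then run on each displayed message
via something like set display_filter="/path/to/filter" (the path is a
placeholder).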

Until recently, mutt's display of characters just hasn't been a 
major problem for me.  All of the e-mail I receive is in English, 
most of it has contained only characters within the ASCII range, and 
I have fixed the funky Microsoft characters from Outlook with the 
above-mentioned display_filter.

I think that the time may have come for me to finally do this right, 
though, as you suggest.  I have received e-mail from other countries 
for a long time, but those senders generally used the ASCII 
character set.  Now I'm starting to receive e-mail containing 
character sets that support the native language of the sender and 
which include punctuation marks and non-English characters that 
don't render correctly.

Given all the parts that have to work correctly--font sets, font 
server, terminal, application charset-recognition--and given that no 
one else on these systems uses anything other than ASCII, do you 
know of a HOW-TO or bootstrap procedure I could follow to get this 
working?

Regards,
Gary

-- 
Gary Johnson                               | Agilent Technologies
garyjohn@xxxxxxxxxxxxxxx                   | Wireless Division
http://www.spocom.com/users/gjohnson/mutt/ | Spokane, Washington, USA