
Re: Demoroniser (was: Display Filters)



On 2006-07-04, Alain Bench <veronatif@xxxxxxx> wrote:
>  On Monday, July 3, 2006 at 19:16:58 -0700, Gary Johnson wrote:
> 
> > [iso8859toascii.c] converts some characters in the range 128-255 to
> > strings of one or more ASCII characters.
> 
>     Very similar to Demoroniser, with some pros and cons. But I'm
> afraid most limitations and drawbacks I listed are the same, as they
> are in fact inherent to the $display_filter approach.
> 
>     OTOS I can now guess that you got from this filter more benefit than
> the average guy, because of the specifics of your setup. You are more
> frequently in my 3rd point case, and less or never in the 1st case. More
> \200 octalisations and chances for filter action, than ?-masks. Guessed
> that seeing:
> 
> 
> > User-Agent: Mutt/1.5.9i
> > Content-Type: text/plain; charset=iso-8859-1
> >> The U+20AC '€' Euro symbol
> >> the U+0192 'ƒ' hooked letter f
> >> The U+2030 '‰' per mille sign
> 
>     Gaargl! Mutt sent out ugly Outlook-like lying MIME charset label...
> Scary! Contrary to the very common confusion, ISO-8859-1 and CP-1252 are
> different charsets (your filter should be named cp1252toascii.c). And
> the characters I wrote do not exist in ISO-8859-1. This means something
> is broken, probably iconv, and should be fixed. What is the output of:
> 
> | $ printf "\x80 \x83 \x89\n" | iconv -f windows-1252 -t us-ascii//TRANSLIT
> | EUR f o/oo

After discovering that printf on this SunOS 5.8 system does not 
support the \x escape and converting to octal:

$ printf "\200 \203 \211\n" | iconv -f windows-1252 -t us-ascii//TRANSLIT
EUR f o/oo
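
For anyone else stuck with a printf that lacks \x, the hex-to-octal
rewrite can be scripted rather than done by hand. The following is only
a sketch: hex2oct is a hypothetical helper name, and it assumes a POSIX
shell whose printf accepts 0x-prefixed numeric arguments (not verified
on SunOS 5.8).

```shell
# hex2oct (hypothetical helper): rewrite each \xHH escape in the argument
# as a three-digit octal escape, leaving every other character untouched.
hex2oct() {
    s=$1
    out=
    while [ -n "$s" ]; do
        case $s in
            '\x'[0-9A-Fa-f][0-9A-Fa-f]*)
                hex=${s#??}          # drop the leading \x
                rest=${hex#??}       # everything after the two hex digits
                hex=${hex%"$rest"}   # keep only the two hex digits
                # Assumption: printf parses 0xHH as a number for %o.
                out=$out$(printf '\\%03o' "0x$hex")
                s=$rest
                ;;
            *)
                rest=${s#?}          # everything after the first character
                out=$out${s%"$rest"} # copy the first character as-is
                s=$rest
                ;;
        esac
    done
    printf '%s\n' "$out"
}

# Print the octal form of the format string used in the test above.
hex2oct '\x80 \x83 \x89\n'
```

The result can then be passed to printf as the format string in place of
the original.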

> | $ printf "\x80 \x83 \x89\n" | iconv -f cp1252 -t us-ascii//TRANSLIT
> | EUR f o/oo

$ printf "\200 \203 \211\n" | iconv -f cp1252 -t us-ascii//TRANSLIT
EUR f o/oo

> |
> | $ mutt -v

Mutt 1.5.9i (2005-03-13)
Copyright (C) 1996-2002 Michael R. Elkins and others.
Mutt comes with ABSOLUTELY NO WARRANTY; for details type `mutt -vv'.
Mutt is free software, and you are welcome to redistribute it
under certain conditions; type `mutt -vv' for details.

System: SunOS 5.8 (sun4u) [using ncurses 5.4]
Compile options:
-DOMAIN
+DEBUG
-HOMESPOOL  -USE_SETGID  +USE_DOTLOCK  -DL_STANDALONE  
+USE_FCNTL  -USE_FLOCK   -USE_INODESORT   
+USE_POP  -USE_IMAP  -USE_GSS  -USE_SSL  -USE_GNUTLS  -USE_SASL  -USE_SASL2  
+HAVE_REGCOMP  -USE_GNU_REGEX  
+HAVE_COLOR  +HAVE_START_COLOR  +HAVE_TYPEAHEAD  +HAVE_BKGDSET  
+HAVE_CURS_SET  +HAVE_META  +HAVE_RESIZETERM  
+CRYPT_BACKEND_CLASSIC_PGP  +CRYPT_BACKEND_CLASSIC_SMIME  -CRYPT_BACKEND_GPGME  
+BUFFY_SIZE -EXACT_ADDRESS  -SUN_ATTACHMENT  
+ENABLE_NLS  -LOCALES_HACK  +HAVE_WC_FUNCS  +HAVE_LANGINFO_CODESET  
+HAVE_LANGINFO_YESEXPR  
+HAVE_ICONV  +ICONV_NONTRANS  -HAVE_LIBIDN  +HAVE_GETSID  +HAVE_GETADDRINFO  
-USE_HCACHE  
ISPELL="/opt/TWWfsw/bin/ispell"
SENDMAIL="/usr/lib/sendmail"
MAILPATH="/var/mail"
PKGDATADIR="/home/garyjohn/src/SunOS/mutt-1.5.9i/share/mutt"
SYSCONFDIR="/home/garyjohn/src/SunOS/mutt-1.5.9i/etc"
EXECSHELL="/bin/sh"
-MIXMASTER
To contact the developers, please mail to <mutt-dev@xxxxxxxx>.
To report a bug, please use the flea(1) utility.

patch-1.5.5.1.gj.sigontop_space_fix.1
patch-1.5.5.1.gj.attach_sanitize.1
patch-1.5.5.1.gj.stuff_all_quoted.3

>     And what is in Mutt the value of ":set ?charset"

charset="iso-8859-1"

> > do you know of a HOW-TO or bootstrap procedure I could follow to get
> > this working?
> 
>     The Mutt Wiki <URL:http://wiki.mutt.org/?MuttFaq/Charset> has nearly
> everything from base settings to advanced solutions for some corner
> problems. But not much about X, fonts, and such.
> 
>     I'd say that the first step should be to determine exactly what
> charset your current terminal and font display. Please describe what
> you see doing at shell:
> 
> | $ printf "\xC3\xBC \x9E \n"
> | ü ?
> 
>  - capital A with tilde, 1/4 symbol, small z with caron ==> CP-1252
>  - capital A with tilde, 1/4 symbol, nothing ==> Latin-1
>  - capital A with tilde, OE ligature, nothing ==> Latin-9
>  - 2 line drawing chars, and a Peseta symbol ==> CP-437
>  - 2 line drawing chars, and an x (multiplication sign) ==> CP-850
>  - small u with diaeresis, nothing or garbage ==> UTF-8
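
The table above can be cross-checked from a terminal already known to
display UTF-8 by decoding the same probe bytes under each candidate
charset. This is a rough sketch: decode_probe is a hypothetical helper,
the probe uses octal escapes for portability, and the charset names are
the ones GNU iconv accepts, so other iconv implementations may spell
them differently.

```shell
# decode_probe (hypothetical helper): interpret the probe bytes
# C3 BC 20 9E as charset $1 and re-encode them as UTF-8 for display.
decode_probe() {
    printf '\303\274 \236\n' | iconv -f "$1" -t UTF-8
}

# Show how each candidate charset renders the probe.
# Charset names are assumptions; check `iconv -l` on your system.
for cs in CP1252 ISO-8859-1 ISO-8859-15 CP437 CP850; do
    printf '%-12s ' "$cs"
    decode_probe "$cs"
done
```

Each output line should match the corresponding description in the list,
with the Latin-1 and Latin-9 rows ending in an invisible control
character instead of a printable glyph.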

I usually use an xterm at work, but right now I'm using PuTTY on 
Windows XP to log in remotely from home.  When I first tried the 
above (after converting the hex escapes to octal), I saw the second 
choice.  I checked my PuTTY Window -> Translation setting and saw 
that the character set was set to "ISO-8859-1:1998 (Latin-1, West 
Europe)".  I changed that to "UTF-8" and saw the last choice:  a 
small u with diaeresis followed by nothing.

I'll be in the office only briefly this week, but I'll try to run 
that experiment in an xterm there and report what I see.

In the meantime, I'll take a look at the wiki.

Thanks very much for your help.

Regards,
Gary

-- 
Gary Johnson                               | Agilent Technologies
garyjohn@xxxxxxxxxxxxxxx                   | Wireless Division
http://www.spocom.com/users/gjohnson/mutt/ | Spokane, Washington, USA