Re: unicode body search fails, squares in subjects



Hello Alexandros,

 On Tuesday, August 17, 2004 at 8:38:18 PM +0200, Alexandros Droseltis wrote:

> in a UTF-8 locale [...] I can read and write greek and german [but]
> body-search (/~b) fails (also when it shouldn't) when the search
> pattern has greek or special german characters;

    A UTF-8 pattern can only match bodies that have been decoded and
converted to UTF-8. Setting $thorough_search should do it (at the
expense of a little processor time).
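
    As a minimal sketch, assuming nothing else in your muttrc touches
this variable, the setting would look like:

| set thorough_search

    A body search such as /~b with a Greek or umlaut pattern should then
be run against the decoded text.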

    In case that's not enough, perhaps try to unset LC_COLLATE so it
takes the default el_GR.UTF-8 value of LANG.


> some subject lines from greek/german mails consist of squares instead
> of the greek/special german characters

    Very probably a bad raw 8-bit "Subject:" header without charset
indication. The sending mailer is at fault (it should use RFC 2047
encoding for Greek or umlauts in headers). It would be nice to warn the
sender and perhaps advise him about the "allow 8 bits chars in headers"
option that must be unchecked in MSOE. If it's MSOE.

    You will need the $assumed_charset feature of the JA-patch by
Takashi Takizawa at <URL:http://www.emaillab.org/mutt/download14.html>.
Then set a different $assumed_charset value depending on language. Say
you have Greek mails everywhere, and German mails in specific folder(s):

| folder-hook . "set assumed_charset=iso-8859-7"
| folder-hook german\\.mbox$ "set assumed_charset=iso-8859-1"

    Or, more completely, to also deal with other sorts of problem mails:

| folder-hook . "\
|       unhook charset-hook ;\
|       charset-hook ^us-ascii$ iso-8859-7 ;\
|       set assumed_charset=iso-8859-7 "
| folder-hook german\\.mbox$ "\
|       unhook charset-hook ;\
|       charset-hook ^us-ascii$   windows-1252 ;\
|       charset-hook ^iso-8859-1$ windows-1252 ;\
|       set assumed_charset=windows-1252 "


> I would be grateful for any help

    It's a pleasure!


> In muttrc I have
> set charset="utf-8"

    Drop this line: the default (the same value here) will be derived
from your current locale. Your system has +HAVE_LANGINFO_CODESET.


> LC_CTYPE=el_GR.UTF-8

    Remove this from the environment: the same default will be taken
from LANG.

    This way, the day you experiment with a different terminal, you just
change LANG accordingly, and everything works at its best.


> Content-Type: text/plain; charset=iso-8859-7

    Not the optimal charset: US-ASCII would have sufficed. You should
review your $send_charset: Mutt uses the first charset in the list into
which the message can be converted exactly. May I suggest the untested:

| set send_charset="us-ascii:iso-8859-1:iso-8859-15:windows-1252:iso-8859-7:windows-1253:utf-8"


Bye!    Alain.
-- 
Hotmail users break umlauts for everyone else on a mailing list!
They should stop doing so immediately!
        « MSN considered HARMFUL » PCC CB on MU. © June 2002