Re: reply_regex bug on Linux?
On 2008-11-12, TAKAHASHI Tamotsu <ttakah@xxxxxxxxxxxxxxxxx> wrote:
> * Wed Nov 12 2008 Gary Johnson <garyjohn@xxxxxxxxxxxxxxx>
> > worked fine until I added a particular term to the 'reply_regex'
> > expression. That term contained some non-ASCII characters that have
> > appeared in place of "Re" in replies I've received from Outlook
> > users in Beijing. Apparently my Solaris mutt reads those characters
> > just as literal characters in the expression while my Linux mutt
> > interprets them as something else--I don't know what. I also don't
> > know if the differing interpretation occurs in the regular
> > expression engine or when mutt parses the line.
> >
> > The term I added, after "|aw", was
> >
> > |\347\255.\345\244.
>
> Compare your locale settings (e.g. LC_CTYPE, LC_ALL and LANG
> environment variables and $charset) of your two systems, i.e.
> Solaris and Linux(glibc).
>
> mutt_which_case() in pattern.c forces case-sensitive regex
> if mbrtowc() fails. I don't know what charset your term is in,
> but the charset has to be the same as your $charset. Otherwise
> your regex ("re:") doesn't match case-insensitively ("RE:").
That was it. Thank you so much!

The output of 'locale' on Solaris is:
LANG=
LC_CTYPE=en_US.ISO8859-1
LC_NUMERIC=en_US.ISO8859-1
LC_TIME=en_US.ISO8859-1
LC_COLLATE=C
LC_MONETARY=en_US.ISO8859-1
LC_MESSAGES=C
LC_ALL=
The output of 'locale' on Linux was:
LANG=en_US.UTF-8
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=C
The value of 'charset' is "iso-8859-1//TRANSLIT".
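
For anyone finding this thread later, the behavior Tamotsu describes
can be sketched roughly as follows. This is Python, not mutt's C, and
it is only an analogy: which_case() is a hypothetical stand-in for
mutt_which_case(), decoding as "ascii" stands in for mbrtowc() under
the "C" locale, and the regex is a simplified reply_regex, not mutt's
default.

```python
import re

def which_case(pattern: bytes, encoding: str) -> int:
    """Hypothetical analog of mutt_which_case(): pick the regex flags."""
    try:
        text = pattern.decode(encoding)
    except UnicodeDecodeError:
        # Stand-in for mbrtowc() failing: assume case-sensitive.
        return 0
    if any(ch.isalpha() and ch.isupper() for ch in text):
        # An uppercase letter in the pattern also means case-sensitive.
        return 0
    return re.IGNORECASE

# Simplified reply_regex alternatives, including the added term as raw bytes.
TERM = b"re|aw|\347\255.\345\244."

def compile_reply_regex(encoding: str) -> "re.Pattern[bytes]":
    """Compile the simplified expression with the flags chosen above."""
    return re.compile(b"^(?:" + TERM + b"):", which_case(TERM, encoding))

c_locale = compile_reply_regex("ascii")        # like LC_ALL=C
latin1 = compile_reply_regex("iso-8859-1")     # like LC_CTYPE=en_US.ISO8859-1

print(bool(c_locale.match(b"RE: hello")))
print(bool(latin1.match(b"RE: hello")))
```

Under the "ascii" stand-in the non-ASCII bytes cannot be decoded, so
the expression is compiled case-sensitively and "RE:" no longer
matches, while "re:" still does.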
I spent some time reading about locale and did some experimenting.
It turns out that this Linux system (Red Hat Enterprise Linux WS
release 4) defaults to these values:
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
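but some of these had caused various problems in the past, so I had
put first "LC_COLLATE=C" and then "LC_ALL=C" in my ~/.profile to fix
those problems when using Linux. Since LC_ALL overrides LANG and
every other LC_* variable, that left LC_CTYPE in the "C" locale,
where the non-ASCII bytes in my term are presumably not valid
characters. I have replaced "LC_ALL=C" with "LANG=en_US.ISO8859-1"
and Mutt's 'reply_regex' now appears to work as it should. FWIW, the
output of 'locale' on Linux is now: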
LANG=en_US.ISO8859-1
LC_CTYPE=en_US.ISO8859-1
LC_NUMERIC="en_US.ISO8859-1"
LC_TIME="en_US.ISO8859-1"
LC_COLLATE=C
LC_MONETARY="en_US.ISO8859-1"
LC_MESSAGES="en_US.ISO8859-1"
LC_PAPER="en_US.ISO8859-1"
LC_NAME="en_US.ISO8859-1"
LC_ADDRESS="en_US.ISO8859-1"
LC_TELEPHONE="en_US.ISO8859-1"
LC_MEASUREMENT="en_US.ISO8859-1"
LC_IDENTIFICATION="en_US.ISO8859-1"
LC_ALL=
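
The precedence that produced the outputs above can be sketched as a
small Python function. This is a minimal illustration, not what libc
actually runs: effective_locale() is a made-up helper, the fallback to
"C" when nothing is set is an assumption (POSIX leaves that default to
the implementation), and the contents of the two environments are
inferred from the 'locale' output, where unquoted values appear to be
the ones set explicitly.

```python
def effective_locale(env: dict, category: str) -> str:
    """Resolve one locale category: LC_ALL, then the category, then LANG."""
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:  # an empty value counts as unset
            return value
    return "C"  # assumed fallback when nothing is set

# Old ~/.profile settings: LC_ALL=C wins over everything else.
old_env = {"LANG": "en_US.UTF-8", "LC_COLLATE": "C", "LC_ALL": "C"}

# New settings: LC_ALL is empty, so each category is resolved separately.
new_env = {"LANG": "en_US.ISO8859-1", "LC_CTYPE": "en_US.ISO8859-1",
           "LC_COLLATE": "C", "LC_ALL": ""}

for label, env in (("old", old_env), ("new", new_env)):
    for category in ("LC_CTYPE", "LC_COLLATE", "LC_MESSAGES"):
        print(label, category, effective_locale(env, category))
```

With the old settings every category resolves to "C"; with the new
ones only LC_COLLATE does, and LC_CTYPE matches 'charset'.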
Regards,
Gary