Re: reply_regex bug on Linux?



On 2008-11-12, TAKAHASHI Tamotsu <ttakah@xxxxxxxxxxxxxxxxx> wrote:
> * Wed Nov 12 2008 Gary Johnson <garyjohn@xxxxxxxxxxxxxxx>
> > worked fine until I added a particular term to the 'reply_regex' 
> > expression.  That term contained some non-ASCII characters that have 
> > appeared in place of "Re" in replies I've received from Outlook 
> > users in Beijing.  Apparently my Solaris mutt reads those characters 
> > just as literal characters in the expression while my Linux mutt 
> > interprets them as something else--I don't know what.  I also don't 
> > know if the differing interpretation occurs in the regular 
> > expression engine or when mutt parses the line.
> > 
> > The term I added, after "|aw", was
> > 
> >    |\347\255.\345\244.
> 
> Compare the locale settings (e.g. the LC_CTYPE, LC_ALL and LANG
> environment variables, and $charset) on your two systems, i.e.
> Solaris and Linux (glibc).
> 
> mutt_which_case() in pattern.c forces a case-sensitive regex
> if mbrtowc() fails. I don't know what charset your term is in,
> but it has to be the same as your $charset. Otherwise
> your regex ("re:") won't match case-insensitively ("RE:").

That was it.  Thank you so much!
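
To illustrate the behaviour described in the quoted text, here is a rough Python sketch. It is not mutt's actual C code: the function name and the sample pattern are made up for illustration, decoding the bytes with a named charset stands in for mbrtowc() under a given LC_CTYPE, and the uppercase check is a simplified assumption about the rest of the function.

```python
def regex_is_case_insensitive(pattern: bytes, charset: str) -> bool:
    """Simplified analogue of the check described above (hypothetical helper).

    Returns False (case-sensitive match) when the pattern bytes cannot be
    decoded in the given charset, or when the pattern contains an uppercase
    letter. Returns True (case-insensitive match) otherwise.
    """
    try:
        text = pattern.decode(charset)
    except UnicodeDecodeError:
        # Stand-in for mbrtowc() failing: fall back to case-sensitive.
        return False
    # Assumption: an uppercase letter in the pattern also requests
    # case-sensitive matching.
    return not any(ch.isupper() for ch in text)


# Hypothetical pattern containing the raw bytes from the term above.
PATTERN = b"^(re|aw|\xe7\xad.\xe5\xa4.):"

if __name__ == "__main__":
    for cs in ("ascii", "iso-8859-1", "utf-8"):
        print(cs, regex_is_case_insensitive(PATTERN, cs))
```

With "ascii" standing in for the C locale, the high bytes cannot be decoded, so the whole expression falls back to case-sensitive matching and "re" no longer matches "RE:". Under iso-8859-1 every byte is a valid character and the expression stays case-insensitive.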

The output of 'locale' on Solaris is:

   LANG=
   LC_CTYPE=en_US.ISO8859-1
   LC_NUMERIC=en_US.ISO8859-1
   LC_TIME=en_US.ISO8859-1
   LC_COLLATE=C
   LC_MONETARY=en_US.ISO8859-1
   LC_MESSAGES=C
   LC_ALL=

The output of 'locale' on Linux was:

   LANG=en_US.UTF-8
   LC_CTYPE="C"
   LC_NUMERIC="C"
   LC_TIME="C"
   LC_COLLATE="C"
   LC_MONETARY="C"
   LC_MESSAGES="C"
   LC_PAPER="C"
   LC_NAME="C"
   LC_ADDRESS="C"
   LC_TELEPHONE="C"
   LC_MEASUREMENT="C"
   LC_IDENTIFICATION="C"
   LC_ALL=C

The value of 'charset' is "iso-8859-1//TRANSLIT".

I spent some time reading about locales and experimenting.

It turns out that this Linux system (Red Hat Enterprise Linux WS 
release 4) defaults to these values:

   LANG=en_US.UTF-8
   LC_CTYPE="en_US.UTF-8"
   LC_NUMERIC="en_US.UTF-8"
   LC_TIME="en_US.UTF-8"
   LC_COLLATE="en_US.UTF-8"
   LC_MONETARY="en_US.UTF-8"
   LC_MESSAGES="en_US.UTF-8"
   LC_PAPER="en_US.UTF-8"
   LC_NAME="en_US.UTF-8"
   LC_ADDRESS="en_US.UTF-8"
   LC_TELEPHONE="en_US.UTF-8"
   LC_MEASUREMENT="en_US.UTF-8"
   LC_IDENTIFICATION="en_US.UTF-8"
   LC_ALL=

but some of these had caused various problems in the past, so I had 
first put "LC_COLLATE=C" and later "LC_ALL=C" in my ~/.profile to work 
around them on Linux.  Since LC_ALL overrides every other locale 
variable, that left LC_CTYPE at "C", which is presumably why mbrtowc() 
failed on the non-ASCII bytes in my expression.  I have now replaced 
"LC_ALL=C" with "LANG=en_US.ISO8859-1", and Mutt's 'reply_regex' 
appears to work as it should.  FWIW, the output of 'locale' on Linux 
is now:

   LANG=en_US.ISO8859-1
   LC_CTYPE=en_US.ISO8859-1
   LC_NUMERIC="en_US.ISO8859-1"
   LC_TIME="en_US.ISO8859-1"
   LC_COLLATE=C
   LC_MONETARY="en_US.ISO8859-1"
   LC_MESSAGES="en_US.ISO8859-1"
   LC_PAPER="en_US.ISO8859-1"
   LC_NAME="en_US.ISO8859-1"
   LC_ADDRESS="en_US.ISO8859-1"
   LC_TELEPHONE="en_US.ISO8859-1"
   LC_MEASUREMENT="en_US.ISO8859-1"
   LC_IDENTIFICATION="en_US.ISO8859-1"
   LC_ALL=
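
For anyone puzzled by why LC_ALL=C won out over LANG=en_US.UTF-8 in the earlier output, the lookup order can be sketched in a few lines of Python. This is only an illustration of the usual POSIX precedence (LC_ALL, then the specific LC_* variable, then LANG); the function name is made up, and the final fallback to "C" is an assumption about the implementation default.

```python
def effective_locale(env: dict, category: str) -> str:
    """Return the locale that applies to one category, e.g. "LC_CTYPE".

    Precedence: LC_ALL first, then the category's own variable, then LANG.
    A variable that is unset or set to the empty string is skipped.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:
            return value
    # Assumption: the implementation default is "C" when nothing is set.
    return "C"


# Settings before the change: LC_ALL=C overrides everything else.
OLD_ENV = {"LANG": "en_US.UTF-8", "LC_COLLATE": "C", "LC_ALL": "C"}

# Settings after the change: only LC_COLLATE is pinned to C.
NEW_ENV = {"LANG": "en_US.ISO8859-1", "LC_COLLATE": "C", "LC_ALL": ""}

if __name__ == "__main__":
    for cat in ("LC_CTYPE", "LC_COLLATE", "LC_TIME"):
        print(cat, effective_locale(OLD_ENV, cat), effective_locale(NEW_ENV, cat))
```

This also matches the quoting in the 'locale' output: values shown in double quotes are derived from LANG or LC_ALL, while unquoted ones are set explicitly.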

Regards,
Gary