Re: reply_regex bug on Linux?

To: mutt-dev@xxxxxxxx
Subject: Re: reply_regex bug on Linux?
From: Gary Johnson <garyjohn@xxxxxxxxxxxxxxx>
Date: Fri, 14 Nov 2008 15:54:01 -0800
In-reply-to: <20081112141902.GB2040@xxxxxxxxxxxxxxxxxxxxxxxxxx>
List-post: <mailto:mutt-dev@mutt.org>
List-unsubscribe: send mail to majordomo@mutt.org, body only "unsubscribe mutt-dev"
Mail-followup-to: mutt-dev@xxxxxxxx
References: <20081111081806.GA28438@xxxxxxxxxxxxxxxxxxxxxxx> <20081111144516.GO33138@xxxxxxxxxxxxx> <20081112085052.GB10795@xxxxxxxxxxxxxxxxxxxxxxx> <20081112141902.GB2040@xxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: owner-mutt-dev@xxxxxxxx
User-agent: Mutt/1.5.17 (2007-11-01)

On 2008-11-12, TAKAHASHI Tamotsu <ttakah@xxxxxxxxxxxxxxxxx> wrote:
> * Wed Nov 12 2008 Gary Johnson <garyjohn@xxxxxxxxxxxxxxx>
> > worked fine until I added a particular term to the 'reply_regex' 
> > expression.  That term contained some non-ASCII characters that have 
> > appeared in place of "Re" in replies I've received from Outlook 
> > users in Beijing.  Apparently my Solaris mutt read those characters 
> > just as literal characters in the expression while my Linux mutt 
> > interprets them as something else--I don't know what.  I also don't 
> > know if the differing interpretation occurs in the regular 
> > expression engine or when mutt parses the line.
> > 
> > The term I added, after "|aw", was
> > 
> >    |\347\255.\345\244.
> 
> Compare your locale settings (e.g. LC_CTYPE, LC_ALL and LANG
> environment variables and $charset) of your two systems, i.e.
> Solaris and Linux(glibc).
> 
> mutt_which_case() in pattern.c forces case-sensitive regex
> if mbrtowc() fails. I don't know what charset your term is in,
> but the charset has to be the same as your $charset. Otherwise
> your regex ("re:") doesn't match case-insensitively ("RE:").

That was it.  Thank you so much!
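
For anyone following along, here is a rough Python sketch of the fallback described above, as I understand it.  It is not Mutt's C code: which_case() is a hypothetical stand-in for mutt_which_case(), a failed decode stands in for mbrtowc() failing, "ascii" stands in for the C locale, and the "uppercase in the pattern means case-sensitive" rule is my assumption about the rest of the function.

```python
import re

def which_case(pattern: bytes, charset: str) -> int:
    """Return regex flags for a pattern (hypothetical helper, not Mutt's code)."""
    try:
        text = pattern.decode(charset)
    except UnicodeDecodeError:
        # Stand-in for mbrtowc() failing: give up and stay case-sensitive.
        return 0
    # Assumption: a pattern containing uppercase is treated as case-sensitive.
    if any(ch.isupper() for ch in text):
        return 0
    return re.IGNORECASE

# The reply_regex fragment from my earlier message, with the octal escapes as bytes.
TERM = b"^(re|aw|\xe7\xad.\xe5\xa4.):"

for charset in ("ascii", "utf-8", "iso-8859-1"):
    flags = which_case(TERM, charset)
    print(charset, "case-insensitive" if flags else "case-sensitive")
```

Only the iso-8859-1 case decodes cleanly, so only there does the sketch keep the match case-insensitive.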

The output of 'locale' on Solaris is:

   LANG=
   LC_CTYPE=en_US.ISO8859-1
   LC_NUMERIC=en_US.ISO8859-1
   LC_TIME=en_US.ISO8859-1
   LC_COLLATE=C
   LC_MONETARY=en_US.ISO8859-1
   LC_MESSAGES=C
   LC_ALL=

The output of 'locale' on Linux was:

   LANG=en_US.UTF-8
   LC_CTYPE="C"
   LC_NUMERIC="C"
   LC_TIME="C"
   LC_COLLATE="C"
   LC_MONETARY="C"
   LC_MESSAGES="C"
   LC_PAPER="C"
   LC_NAME="C"
   LC_ADDRESS="C"
   LC_TELEPHONE="C"
   LC_MEASUREMENT="C"
   LC_IDENTIFICATION="C"
   LC_ALL=C

The value of 'charset' is "iso-8859-1//TRANSLIT".
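
The symptom itself can be reproduced outside Mutt with a byte-wise regex, sketched below in Python under some assumptions: the pattern is the relevant fragment of my 'reply_regex', and the third byte of each non-ASCII character in the sample subject is made up for illustration, since my expression only uses "." in that position.

```python
import re

# Fragment of reply_regex with the octal escapes written as bytes.
PATTERN = b"^(re|aw|\xe7\xad.\xe5\xa4.):"

insensitive = re.compile(PATTERN, re.IGNORECASE)  # what I expected
sensitive = re.compile(PATTERN)                   # what the fallback gave me

def matches(rx, subject: bytes) -> bool:
    """True if the subject starts with a recognized reply prefix."""
    return rx.match(subject) is not None

subjects = [
    b"re: test",
    b"RE: test",
    b"\xe7\xad\x94\xe5\xa4\x8d: test",  # illustrative bytes, third byte assumed
]

for s in subjects:
    print(s, matches(insensitive, s), matches(sensitive, s))
```

The case-sensitive pattern still matches "re:" and the non-ASCII prefix but misses "RE:", which is what I was seeing on Linux.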

I spent some time reading about locale and did some experimenting.  

It turns out that this Linux system (Red Hat Enterprise Linux WS 
release 4) defaults to these values:

   LANG=en_US.UTF-8
   LC_CTYPE="en_US.UTF-8"
   LC_NUMERIC="en_US.UTF-8"
   LC_TIME="en_US.UTF-8"
   LC_COLLATE="en_US.UTF-8"
   LC_MONETARY="en_US.UTF-8"
   LC_MESSAGES="en_US.UTF-8"
   LC_PAPER="en_US.UTF-8"
   LC_NAME="en_US.UTF-8"
   LC_ADDRESS="en_US.UTF-8"
   LC_TELEPHONE="en_US.UTF-8"
   LC_MEASUREMENT="en_US.UTF-8"
   LC_IDENTIFICATION="en_US.UTF-8"
   LC_ALL=

but some of these had caused various problems in the past, so I had 
put first "LC_COLLATE=C" and then "LC_ALL=C" in my ~/.profile to fix 
those problems when using Linux.  Since LC_ALL overrides LANG and 
every other LC_* variable, that left LC_CTYPE at "C", which does not 
match my 'charset'.  I have replaced "LC_ALL=C" with 
"LANG=en_US.ISO8859-1" and Mutt's 'reply_regex' now appears to work 
as it should.  FWIW, the output of 'locale' on Linux is now:

   LANG=en_US.ISO8859-1
   LC_CTYPE=en_US.ISO8859-1
   LC_NUMERIC="en_US.ISO8859-1"
   LC_TIME="en_US.ISO8859-1"
   LC_COLLATE=C
   LC_MONETARY="en_US.ISO8859-1"
   LC_MESSAGES="en_US.ISO8859-1"
   LC_PAPER="en_US.ISO8859-1"
   LC_NAME="en_US.ISO8859-1"
   LC_ADDRESS="en_US.ISO8859-1"
   LC_TELEPHONE="en_US.ISO8859-1"
   LC_MEASUREMENT="en_US.ISO8859-1"
   LC_IDENTIFICATION="en_US.ISO8859-1"
   LC_ALL=
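
The precedence that explains both Linux listings (LC_ALL first, then the specific LC_* variable, then LANG) can be sketched in a few lines of Python.  effective_locale() and the two dictionaries are hypothetical names of mine; the variable values are my reading of the 'locale' output above, where unquoted values are the ones set explicitly, and the final fallback to "C" is an assumption about the system default.

```python
def effective_locale(env: dict, category: str) -> str:
    """Resolve one locale category the way POSIX describes it."""
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:  # unset and empty are treated the same
            return value
    return "C"  # assumed default when nothing is set

# Before: LC_ALL=C in ~/.profile overrides everything else.
before = {"LANG": "en_US.UTF-8", "LC_COLLATE": "C", "LC_ALL": "C"}
# After: LC_ALL removed, LANG changed.
after = {"LANG": "en_US.ISO8859-1", "LC_CTYPE": "en_US.ISO8859-1",
         "LC_COLLATE": "C"}

for label, env in (("before", before), ("after", after)):
    for category in ("LC_CTYPE", "LC_COLLATE", "LC_TIME"):
        print(label, category, effective_locale(env, category))
```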

Regards,
Gary
