
Re: $thorough_search ([Mutt] #1317: wish $edit_charset)



Hi,

On 2009-07-05 11:39:04 +0200, Rocco Rutte wrote:
> Hi,
> 
> * Vincent Lefevre wrote:
> 
> > I don't know what you mean here, but by default, Mutt does bad things
> > with charsets. The $thorough_search variable is broken by design and
> > should be removed.
> 
> Can you explain that a bit, please?

I've attached a test case. Open it in, say, a UTF-8 locale with
$thorough_search unset (the problem doesn't occur when it is set),
then limit to "~Bé". Only the message "body in utf-8" is found; the
"body in iso-8859-1" one is missed.
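
To illustrate what seems to happen, here is a minimal Python sketch. It
only models a raw byte search under the assumption that the pattern is
kept in the locale charset and the body is not decoded; it is not
Mutt's actual code, and the names raw_match, PATTERN and BODY are made
up for the example.

```python
# Sketch only: models a raw byte search, not Mutt's implementation.
PATTERN = "é"
BODY = "In the index, limit to ~Bé with $thorough_search unset.\n"


def raw_match(pattern: str, body: str, locale_charset: str, body_charset: str) -> bool:
    """Look for the pattern bytes (locale charset) in the undecoded body bytes."""
    needle = pattern.encode(locale_charset)
    haystack = body.encode(body_charset)
    return needle in haystack


# UTF-8 locale, UTF-8 body: the byte sequences agree.
utf8_hit = raw_match(PATTERN, BODY, "utf-8", "utf-8")
# UTF-8 locale, ISO-8859-1 body: "é" is one byte in the body, two in the pattern.
latin1_hit = raw_match(PATTERN, BODY, "utf-8", "iso-8859-1")
```

With these inputs only the UTF-8 body matches, which is the behaviour
seen with the attached mailbox.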

Note: I wonder why Mutt doesn't encode the regexp in the encoding of
the message / body part (each time it finds a new encoding). IMHO, that
would be faster than decoding the body part, as the number of different
encodings remains limited in practice.
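
A rough Python sketch of that idea, under simplifying assumptions: the
pattern is a literal string rather than a full regexp, and the body is
sent as 8bit (quoted-printable or base64 parts would still have to be
decoded). The helpers pattern_for and body_matches are hypothetical
names, not anything that exists in Mutt.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=None)
def pattern_for(literal: str, charset: str):
    """Compile the literal once per charset; None if it cannot be encoded."""
    try:
        return re.compile(re.escape(literal.encode(charset)))
    except (UnicodeEncodeError, LookupError):
        # Character not representable in this charset, or unknown charset.
        return None


def body_matches(literal: str, raw_body: bytes, charset: str) -> bool:
    """Search the raw body with the pattern re-encoded in the body's charset."""
    compiled = pattern_for(literal, charset.lower())
    return compiled is not None and compiled.search(raw_body) is not None
```

The cache holds one compiled pattern per charset seen, so the cost of
re-encoding is paid only a handful of times per search. A real regexp
would need more care, since a character class containing a multibyte
character cannot simply be re-encoded byte by byte.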

Note that the manual says:

   Users searching attachments or for non-ASCII characters should set this
   value because decoding also includes MIME parsing/decoding and possible
   character set conversions. Otherwise mutt will attempt to match against
   the raw message received (for example quoted-printable encoded or with
   encoded headers) which may lead to incorrect search results.

but it is worse than that. For instance, I need to set
$thorough_search even to search for some strings containing only
ASCII characters, when such strings contain a space, as some mailers
encode all spaces as =20 (more generally, =20 can also occur at the
end of a line).
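
A small Python illustration of the =20 case, using the standard quopri
module; the sample body is invented for the example.

```python
import quopri

# Hypothetical quoted-printable body from a mailer that encodes every space.
raw = b"search=20for=20foo=20bar\n"
decoded = quopri.decodestring(raw)

needle = b"foo bar"            # ASCII only, but it contains a space
raw_hit = needle in raw        # what a search on the raw message sees
decoded_hit = needle in decoded  # what a search on the decoded body sees
```

The pure-ASCII string is found only after decoding, so leaving
$thorough_search unset misses it.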

-- 
Vincent Lefèvre <vincent@xxxxxxxxxx> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)
>From a@xxxxxxxxx Sun Jul  5 00:43:52 2009
Date: Sun, 5 Jul 2009 00:43:52 +0200
Subject: body in utf-8
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
Status: RO
Content-Length: 57
Lines: 1

In the index, limit to ~Bé with $thorough_search unset.

>From a@xxxxxxxxx Sun Jul  5 00:43:52 2009
Date: Sun, 5 Jul 2009 00:43:52 +0200
Subject: body in iso-8859-1
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
Status: RO
Content-Length: 56
Lines: 1

In the index, limit to ~Bé with $thorough_search unset.