Re: mutt/1536: Segment fault with long lines when LANG=*.UTF-8

To: Mutt Developers <mutt-dev@xxxxxxxx>
Subject: Re: mutt/1536: Segment fault with long lines when LANG=*.UTF-8
From: Brendan Cully <brendan@xxxxxxxxxx>
Date: Fri, 05 Aug 2005 22:35:23 -0700
In-reply-to: <20050805094044.GD4188@xxxxxxxxxxxxxxxxxxxxxxxxxx>
List-unsubscribe: <mailto:mutt-dev-request@mutt.org?body=unsubscribe>
Mail-followup-to: Mutt Developers <mutt-dev@xxxxxxxx>
References: <mutt-pr-1536@xxxxxxxxxxxxx> <E1E0sCk-00040b-ID@xxxxxxxxxxxxxxxxxxxx> <20050805042028.GA7893@xxxxxxxxxxxx> <20050805094044.GD4188@xxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: owner-mutt-dev@xxxxxxxx
User-agent: Mutt/1.5.9i

On Friday, 05 August 2005 at 18:40, TAKAHASHI Tamotsu wrote:
> * Fri Aug  5 2005 Anders Helmersson <anders.helmersson.utsikt@xxxxxxxxxxxx>
> > On Fri, 2005-08-05 at 04:45:02 +0200, Brendan Cully wrote:
> > >  Looking at it a bit more closely I wonder if it could be done more
> > >  efficiently. Is it possible to scan from the back of the buffer until
> > >  mbrtowc returns n > 0, then trim the buffer to current_pos + n? Or am
> > >  I missing some tricky multibyte issue? Seems a bit nicer than walking
> > >  over every character of every line.
> > 
> > At least for UTF-8 it should be possible to do this, since the first
> > byte in a multibyte character has a unique pattern that includes the
> > length. If we include (all) other multibyte encodings it may become
> > more complicated, I haven't checked yet.
> 
> Yeah, I guess it could break ISO-2022, though I'm not familiar with
> the charset standards. So it would be good to use the ``from-back''
> trimming only for UTF-8 strings (or use AH's original patch for all
> the charsets).
> 
>   FYI: debian bug#260623 has ASP's patch which implements Brendan's
>   idea: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=260623
> 
> And I'm afraid the patches should malloc, clear and use mbstate
> instead of NULL. Every other mbrtowc() in mutt is using its own
> mbstate, AFAIK. It could work well even without its own mbstate,
> but it would be hard to debug once a problem occurred.
> 
> I'm going to ask Japanese users for comments about this.
> Perhaps someone would have an insight. :)
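
For reference, the "own mbstate" point above could look roughly like the
sketch below. It is only an illustration, not code from mutt or from any of
the patches: count_mb_chars is a hypothetical helper, and a zeroed local
mbstate_t stands in for the malloc'ed one mentioned above.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not from mutt): counts the characters in
 * buf[0..len) under the current locale. Returns (size_t) -1 if an
 * invalid or truncated multibyte sequence is found. */
size_t count_mb_chars(const char *buf, size_t len)
{
  mbstate_t st;
  size_t count = 0;

  /* Private conversion state, cleared to the initial state, instead of
   * passing NULL and sharing mbrtowc's hidden internal state. */
  memset(&st, 0, sizeof(st));

  while (len > 0)
  {
    wchar_t wc;
    size_t k = mbrtowc(&wc, buf, len, &st);

    if (k == (size_t) -1 || k == (size_t) -2)
      return (size_t) -1;   /* invalid (-1) or incomplete (-2) sequence */
    if (k == 0)
      k = 1;                /* embedded NUL: mbrtowc reports 0, skip one byte */

    buf += k;
    len -= k;
    count++;
  }
  return count;
}
```

With a private state, a bad sequence in one buffer cannot leave stale shift
state behind for an unrelated mbrtowc() call elsewhere.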

OK, this sounds a little bit risky. How about another suggestion: we
only do the check when b_read == blen - 2, that is, when fgets has run
all the way to the end of the buffer? That should keep things speedy
in the normal case.
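
A rough sketch of what I mean, for the UTF-8 case only. This is not the
patch itself: utf8_complete_len and read_chunk are made-up names, the
"buffer is full" test uses plain fgets semantics (the exact b_read/blen
expression depends on how the patch defines those variables), and the
trimming relies on the UTF-8 lead-byte pattern, so it would not carry
over to ISO-2022 and friends.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: length of buf[0..len) after dropping a trailing,
 * incomplete UTF-8 sequence. Assumes the data is otherwise valid UTF-8;
 * anything that does not look like UTF-8 is left untouched. */
size_t utf8_complete_len(const char *buf, size_t len)
{
  size_t i = len;
  size_t need;
  unsigned char c;

  /* Step back over at most 3 continuation bytes (10xxxxxx). */
  while (i > 0 && len - i < 3 && ((unsigned char) buf[i - 1] & 0xC0) == 0x80)
    i--;
  if (i == 0)
    return len;                 /* no lead byte in range: leave it alone */

  c = (unsigned char) buf[i - 1];
  if (c < 0x80)
    need = 1;                   /* ASCII */
  else if ((c & 0xE0) == 0xC0)
    need = 2;
  else if ((c & 0xF0) == 0xE0)
    need = 3;
  else if ((c & 0xF8) == 0xF0)
    need = 4;
  else
    return len;                 /* not a lead byte: not ours to fix */

  /* The last sequence starts at i - 1 and has len - (i - 1) bytes. */
  return (len - (i - 1) < need) ? i - 1 : len;
}

/* Hypothetical wrapper: reads one chunk with fgets and runs the check
 * only when the buffer filled up without reaching a newline. Returns
 * the number of usable bytes; *held is the number of bytes of a split
 * character left at buf + (return value), which the caller has to
 * carry over to the next chunk. Embedded NULs are not handled. */
size_t read_chunk(FILE *fp, char *buf, size_t size, size_t *held)
{
  size_t n;

  *held = 0;
  if (size < 2 || !fgets(buf, (int) size, fp))
    return 0;

  n = strlen(buf);
  if (n == size - 1 && buf[n - 1] != '\n')
  {
    size_t keep = utf8_complete_len(buf, n);
    *held = n - keep;
    n = keep;
  }
  return n;
}
```

Lines that fit in the buffer never reach utf8_complete_len, so the common
case costs one strlen and one comparison.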
