
Re: mutt/1536: Segment fault with long lines when LANG=*.UTF-8



On Friday, 05 August 2005 at 18:40, TAKAHASHI Tamotsu wrote:
> * Fri Aug  5 2005 Anders Helmersson <anders.helmersson.utsikt@xxxxxxxxxxxx>
> > On Fri, 2005-08-05 at 04:45:02 +0200, Brendan Cully wrote:
> > >  Looking at it a bit more closely I wonder if it could be done more
> > >  efficiently. Is it possible to scan from the back of the buffer until
> > >  mbrtowc returns n > 0, then trim the buffer to current_pos + n? Or am
> > >  I missing some tricky multibyte issue? Seems a bit nicer than walking
> > >  over every character of every line.
> > 
> > At least for UTF-8 it should be possible to do this, since the first
> > byte in a multibyte character has a unique pattern that includes the
> > length. If we include (all) other multibyte encodings it may become
> > more complicated, I haven't checked yet.
> 
> Yeah, I guess it could break ISO-2022, though I'm not familiar with
> the charset standards. So it would be good to use the ``from-back''
> trimming only for UTF-8 strings (or use AH's original patch for all
> the charsets).
> 
>   FYI: debian bug#260623 has ASP's patch which implements Brendan's
>   idea: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=260623
> 
> And I'm afraid the patches should malloc, clear and use mbstate
> instead of NULL. Every other mbrtowc() in mutt is using its own
> mbstate, AFAIK. It could work well even without its own mbstate,
> but it would be hard to debug once a problem occurred.
> 
> I'm going to ask Japanese users for comments about this.
> Perhaps someone would have an insight. :)

Ok, this sounds a little bit risky. How about another suggestion: we
only do the check when b_read == blen - 2? That is, when fgets has run
all the way to the end of the buffer. That should keep things speedy
in the normal case.
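
To make the two ideas concrete, here is a rough sketch of how they could
combine for UTF-8 only: scan back from the end of the chunk to the lead
byte, and do so only when fgets filled the buffer without reaching a
newline. This is not the patch from the bug report. The names
utf8_complete_len, chunk_filled_buffer and trim_chunk are made up for
illustration, and the code assumes the chunk is otherwise valid UTF-8.
The full-buffer test follows the standard fgets contract (at most
size - 1 bytes stored), so the exact comparison in mutt depends on how
b_read and blen are defined there.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of buf[0..len) once an incomplete trailing
 * UTF-8 sequence, if any, has been dropped. Assumes valid UTF-8 otherwise. */
static size_t utf8_complete_len(const char *buf, size_t len)
{
  size_t i = len;
  size_t need, have;
  unsigned char c;

  /* Step back over at most three continuation bytes (10xxxxxx). */
  while (i > 0 && len - i < 3 && ((unsigned char) buf[i - 1] & 0xC0) == 0x80)
    i--;

  if (i == 0)
    return len;               /* no lead byte in reach, leave it alone */

  c = (unsigned char) buf[i - 1];
  if (c < 0x80)
    need = 1;                 /* plain ASCII */
  else if ((c & 0xE0) == 0xC0)
    need = 2;
  else if ((c & 0xF0) == 0xE0)
    need = 3;
  else if ((c & 0xF8) == 0xF0)
    need = 4;
  else
    return len;               /* not a lead byte, do not try to repair */

  have = len - (i - 1);       /* bytes present from the lead byte onward */
  return have < need ? i - 1 : len;
}

/* Hypothetical helper: did fgets(buf, size, fp) stop because the buffer
 * was full rather than because it saw a newline? */
static int chunk_filled_buffer(const char *buf, size_t size)
{
  size_t n = strlen(buf);
  return size > 1 && n == size - 1 && buf[n - 1] != '\n';
}

/* Trim only in the full-buffer case, so short lines cost one strlen. */
static size_t trim_chunk(char *buf, size_t size)
{
  size_t n = strlen(buf);

  if (chunk_filled_buffer(buf, size))
  {
    n = utf8_complete_len(buf, n);
    buf[n] = '\0';
  }
  return n;
}
```

A caller would run trim_chunk(buf, sizeof (buf)) after each successful
fgets. The dropped bytes still have to be carried over to the next read,
which is left out here. For stateful encodings such as ISO-2022 the byte
pattern test does not apply, so this only covers the UTF-8 case.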
