Re: mutt/1536: Segment fault with long lines when LANG=*.UTF-8



* Fri Aug  5 2005 Anders Helmersson <anders.helmersson.utsikt@xxxxxxxxxxxx>
> On Fri, 2005-08-05 at 04:45:02 +0200, Brendan Cully wrote:
> >  Looking at it a bit more closely I wonder if it could be done more
> >  efficiently. Is it possible to scan from the back of the buffer until
> >  mbrtowc returns n > 0, then trim the buffer to current_pos + n? Or am
> >  I missing some tricky multibyte issue? Seems a bit nicer than walking
> >  over every character of every line.
> 
> At least for UTF-8 it should be possible to do this, since the first
> byte of a multibyte character has a unique pattern that includes the
> length. If we include (all) other multibyte encodings it may become
> more complicated; I haven't checked yet.

Yeah, I guess it could break ISO-2022, where escape sequences make the
meaning of a byte depend on what came before it, though I'm not
familiar with the charset standards. So it would be good to use the
``from-back'' trimming only for UTF-8 strings (or use AH's original
patch for all the charsets).
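
For illustration only, here is a rough sketch of what UTF-8-only
``from-back'' trimming could look like. It is not taken from either
patch; the names utf8_seq_len() and utf8_trim_incomplete() are made up
for this example, and it assumes the buffer really is UTF-8. It steps
back over at most three continuation bytes, reads the length from the
lead byte, and cuts the buffer before that lead byte if the sequence
is truncated.

```c
#include <stddef.h>

/* Length of a UTF-8 sequence as announced by its first byte, or 0 if
 * the byte cannot start a sequence (continuation or invalid byte). */
static size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((c & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((c & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((c & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;
}

/* Hypothetical helper: return a length <= len such that buf[0..result)
 * does not end in the middle of a UTF-8 sequence. Input that is not
 * recognisable as a truncated sequence is left untouched. */
size_t utf8_trim_incomplete(const char *buf, size_t len)
{
    size_t i = len;
    size_t back = 0;

    /* Step back over at most 3 trailing continuation bytes (10xxxxxx). */
    while (i > 0 && back < 3 && ((unsigned char)buf[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0)
        return len;                    /* no lead byte found */

    size_t need = utf8_seq_len((unsigned char)buf[i - 1]);
    size_t have = len - (i - 1);       /* bytes from lead byte to end */

    if (need > have)
        return i - 1;                  /* truncated: cut before lead byte */
    return len;
}
```

This only looks at the last few bytes of the buffer, which is the
point of Brendan's suggestion, but it deliberately does nothing clever
for invalid input or for any charset other than UTF-8.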

  FYI: debian bug#260623 has ASP's patch which implements Brendan's
  idea: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=260623

Also, I'm afraid the patches should allocate and clear their own
mbstate and pass it to mbrtowc() instead of NULL. Every other
mbrtowc() call in mutt uses its own mbstate, AFAIK. Passing NULL might
well work, but then mbrtowc() falls back on a hidden internal state,
and it would be hard to debug once a problem occurred.
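
To make the mbstate point concrete, here is a minimal sketch of a
forward walk with an explicit state object. It is not the code from
either patch; complete_prefix_len() is a hypothetical name, and I use
a zeroed mbstate_t on the stack where the real code might malloc one.
Skipping one byte after an invalid sequence is also just an assumption
made for this example.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: walk buf with mbrtowc() and return how many
 * bytes were consumed before hitting an incomplete trailing character
 * (or the end of the buffer). */
size_t complete_prefix_len(const char *buf, size_t len)
{
    mbstate_t st;
    wchar_t wc;
    size_t pos = 0;

    memset(&st, 0, sizeof(st));        /* explicit initial shift state */

    while (pos < len) {
        size_t n = mbrtowc(&wc, buf + pos, len - pos, &st);

        if (n == (size_t)-2)
            break;                     /* incomplete character at the end */
        if (n == (size_t)-1) {
            /* Invalid sequence: state is undefined, so reset it and
             * skip a single byte (assumed recovery policy). */
            memset(&st, 0, sizeof(st));
            pos++;
            continue;
        }
        if (n == 0)
            n = 1;                     /* embedded NUL occupies one byte */
        pos += n;
    }
    return pos;
}
```

Note that the result depends on the LC_CTYPE locale in effect, so the
caller has to have called setlocale() for multibyte input to be
recognised at all.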

I'm going to ask Japanese users for comments about this.
Perhaps someone will have some insight. :)

-- 
tamo