Re: mutt/1536: Segment fault with long lines when LANG=*.UTF-8

To: Mutt Developers <mutt-dev@xxxxxxxx>
Subject: Re: mutt/1536: Segment fault with long lines when LANG=*.UTF-8
From: TAKAHASHI Tamotsu <ttakah@xxxxxxxxxxxxxxxxx>
Date: Fri, 5 Aug 2005 18:40:44 +0900
In-reply-to: <20050805042028.GA7893@xxxxxxxxxxxx>
List-unsubscribe: <mailto:mutt-dev-request@mutt.org?body=unsubscribe>
Mail-followup-to: Mutt Developers <mutt-dev@xxxxxxxx>
References: <mutt-pr-1536@xxxxxxxxxxxxx> <E1E0sCk-00040b-ID@xxxxxxxxxxxxxxxxxxxx> <20050805042028.GA7893@xxxxxxxxxxxx>
Sender: owner-mutt-dev@xxxxxxxx
User-agent: Mutt/1.5.9i

* Fri Aug  5 2005 Anders Helmersson <anders.helmersson.utsikt@xxxxxxxxxxxx>
> On Fri, 2005-08-05 at 04:45:02 +0200, Brendan Cully wrote:
> >  Looking at it a bit more closely I wonder if it could be done more
> >  efficiently. Is it possible to scan from the back of the buffer until
> >  mbrtowc returns n > 0, then trim the buffer to current_pos + n? Or am
> >  I missing some tricky multibyte issue? Seems a bit nicer than walking
> >  over every character of every line.
> 
> At least for UTF-8 it should be possible to do this, since the first
> byte in a multibyte character has a unique pattern that includes the
> length. If we include (all) other multibyte encodings it may become
> more complicated, I haven't checked yet.

Yeah, I guess it could break ISO-2022, though I'm not familiar with
the charset standards. So it would be good to use the ``from-back''
trimming only for UTF-8 strings (or use AH's original patch for all
the charsets).
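
For the UTF-8-only case, the from-back trimming could look roughly like
the sketch below. It relies only on the UTF-8 byte patterns Anders
mentions (the lead byte encodes the length, continuation bytes look like
10xxxxxx) and does not call mbrtowc() at all. The name utf8_trim_len is
hypothetical; it is not taken from mutt or from any of the patches in
this thread, and it assumes the buffer is well-formed UTF-8 apart from
being cut at an arbitrary byte.

```c
#include <stddef.h>

/* Hypothetical helper, not from mutt or any patch in this thread.
 * Returns the largest length <= len at which buf ends on a complete
 * UTF-8 character.  Assumes buf[0..len) is well-formed UTF-8 that may
 * have been cut in the middle of its last character. */
size_t utf8_trim_len(const char *buf, size_t len)
{
  size_t p = len;
  size_t need;
  unsigned char c;

  /* Step back over trailing continuation bytes (10xxxxxx), at most 3. */
  while (p > 0 && len - p < 3 && ((unsigned char) buf[p - 1] & 0xC0) == 0x80)
    p--;
  if (p == 0)
    return 0;                 /* empty, or nothing but continuation bytes */

  p--;                        /* buf[p] is the candidate lead byte */
  c = (unsigned char) buf[p];

  if (c < 0x80)
    need = 1;                 /* 0xxxxxxx: ASCII */
  else if ((c & 0xE0) == 0xC0)
    need = 2;                 /* 110xxxxx */
  else if ((c & 0xF0) == 0xE0)
    need = 3;                 /* 1110xxxx */
  else if ((c & 0xF8) == 0xF0)
    need = 4;                 /* 11110xxx */
  else
    return p;                 /* not a valid lead byte: drop it */

  /* Keep the last character only if all of its bytes are present. */
  return (p + need <= len) ? p + need : p;
}
```

A caller would truncate the line to utf8_trim_len(buf, len) bytes. This
only looks at the last four bytes at most, but it says nothing about
stateful encodings such as ISO-2022, which is why it would have to be
limited to UTF-8.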

  FYI: debian bug#260623 has ASP's patch which implements Brendan's
  idea: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=260623

And I'm afraid the patches should allocate and clear their own mbstate
and pass it to mbrtowc() instead of NULL. Every other mbrtowc() call in
mutt uses its own mbstate, AFAIK. With NULL, mbrtowc() falls back on a
hidden internal state; that may well work, but it would be hard to
debug once a problem occurred.
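
To illustrate what I mean by "its own mbstate", here is a minimal sketch
of a forward walk with mbrtowc() and a private, zeroed mbstate_t. The
function name complete_prefix_len is hypothetical and the code is not
from mutt or from either patch. It uses a stack variable cleared with
memset() rather than malloc, which is an assumption on my part about
what is simplest; the result depends on the current locale.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not from mutt.  Walks buf with mbrtowc() in the
 * current locale and returns how many leading bytes form complete,
 * valid characters. */
size_t complete_prefix_len(const char *buf, size_t len)
{
  mbstate_t st;
  size_t pos = 0;

  /* Private conversion state, cleared to the initial state. */
  memset(&st, 0, sizeof(st));

  while (pos < len)
  {
    wchar_t wc;
    size_t n = mbrtowc(&wc, buf + pos, len - pos, &st);

    if (n == (size_t) -2)     /* incomplete character at the end */
      break;
    if (n == (size_t) -1)     /* invalid sequence: stop here */
      break;
    if (n == 0)               /* embedded NUL: mbrtowc() reports 0 */
      n = 1;
    pos += n;
  }
  return pos;
}
```

Because the state lives in the function, nothing left over from an
earlier, unrelated mbrtowc(..., NULL) call can influence the result.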

I'm going to ask Japanese users for comments about this.
Perhaps someone will have some insight. :)

-- 
tamo
