On Friday, 05 August 2005 at 18:40, TAKAHASHI Tamotsu wrote:
> * Fri Aug 5 2005 Anders Helmersson <anders.helmersson.utsikt@xxxxxxxxxxxx>
> > On Fri, 2005-08-05 at 04:45:02 +0200, Brendan Cully wrote:
> > > Looking at it a bit more closely I wonder if it could be done more
> > > efficiently. Is it possible to scan from the back of the buffer until
> > > mbrtowc returns n > 0, then trim the buffer to current_pos + n? Or am
> > > I missing some tricky multibyte issue? Seems a bit nicer than walking
> > > over every character of every line.
> >
> > At least for UTF-8 it should be possible to do this, since the first
> > byte in a multibyte character has a unique pattern that includes the
> > length. If we include (all) other multibyte encodings it may become
> > more complicated, I haven't checked yet.
>
> Yeah, I guess it could break ISO-2022, though I'm not familiar with
> the charset standards. So it would be good to use the ``from-back''
> trimming only for UTF-8 strings (or use AH's original patch for all
> the charsets).
>
> FYI: debian bug#260623 has ASP's patch which implements Brendan's
> idea: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=260623
>
> And I'm afraid the patches should malloc, clear and use mbstate
> instead of NULL. Every other mbrtowc() in mutt is using its own
> mbstate, AFAIK. It could work well even without its own mbstate,
> but it would be hard to debug once a problem occurred.
>
> I'm going to ask Japanese users for comments about this.
> Perhaps someone would have an insight. :)

Ok, this sounds a little bit risky. How about another suggestion: we
only do the check when b_read == blen - 2? That is, when fgets has run
all the way to the end of the buffer. That should keep things speedy
in the normal case.
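
To make the two ideas above concrete, here is a rough sketch in C. It is not taken from any of the patches mentioned in the thread. The helper names (utf8_seq_len, utf8_trim_partial, buffer_is_full) are made up for illustration, and the "buffer is full" test uses the plain fgets contract (at most size - 1 bytes stored, no trailing newline), which may not match how b_read and blen are counted in the actual code. The backward scan relies only on UTF-8 byte patterns, so it applies to UTF-8 data only, as discussed.

```c
#include <stddef.h>
#include <string.h>

/* Expected sequence length for a UTF-8 lead byte, or 0 if the byte
 * cannot start a sequence (continuation byte or invalid value). */
static size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((c & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((c & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((c & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;
}

/* Hypothetical helper: return how many bytes of buf[0..len) to keep so
 * that the data does not end in a truncated UTF-8 sequence.
 * Malformed input is left untouched (len is returned). */
static size_t utf8_trim_partial(const char *buf, size_t len)
{
    size_t i = len;
    size_t back = 0;

    /* Step back over at most three continuation bytes (10xxxxxx). */
    while (i > 0 && back < 3 && ((unsigned char)buf[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0)
        return len;                    /* no lead byte found */

    size_t need = utf8_seq_len((unsigned char)buf[i - 1]);
    size_t have = back + 1;            /* lead byte plus continuations */

    if (need > have)
        return i - 1;                  /* incomplete: cut before the lead byte */
    return len;
}

/* Assumption: buf was filled by fgets(buf, size, fp). The read ran to
 * the end of the buffer if size - 1 bytes were stored and the last one
 * is not a newline. Only then is the trailing check worth doing. */
static int buffer_is_full(const char *buf, size_t size)
{
    size_t n = strlen(buf);
    return size > 1 && n == size - 1 && buf[n - 1] != '\n';
}
```

A caller would run utf8_trim_partial only when buffer_is_full returns nonzero, and carry the trimmed-off bytes over to the front of the next read. For charsets other than UTF-8 this byte-pattern trick does not apply; there a forward walk with mbrtowc() and a zeroed mbstate_t of its own, as suggested above, would be the safer route.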