Re: mutt_FormatString() not multibyte-aware

To: mutt-dev@xxxxxxxx
Subject: Re: mutt_FormatString() not multibyte-aware
From: Rocco Rutte <pdmef@xxxxxxx>
Date: Sat, 24 Jun 2006 09:21:45 +0000
In-reply-to: <20060624074616.GA15338@xxxxxxxxxxxxxxxxxxxxxxxxxx>
List-unsubscribe: <mailto:mutt-dev-request@mutt.org?Subject="Unsubscribe Mutt Dev"?body=unsubscribe>
Mail-followup-to: mutt-dev@xxxxxxxx
Organization: Berlin University of Technology
References: <20060623130224.GD3538@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <20060624074616.GA15338@xxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: owner-mutt-dev@xxxxxxxx
User-agent: mutt-ng/devel-r802 (SunOS)

* TAKAHASHI Tamotsu <ttakah@xxxxxxxxxxxxxxxxx>:

* Fri Jun 23 2006 Rocco Rutte <pdmef@xxxxxxx>

I would like to fix it but don't know how, since one of the multibyte functions failed for me by always returning 1 for the width. Maybe we could also convert to UTF-8 first, because it's so trivial to test for continuation bytes (as mutt IIRC already does in other places).

Yeah, mbs is too hard to handle because you have to keep mbstate.
See the attached patch. This is just a hack, but it works if you
have wcswidth, wmemcpy, wcslen, etc.

Ouch, it's really a hack and requires more work because once you have wchar_t, all the callbacks should be aware of it too, I guess.

Should I file a bug report for this to have a discussion in the BTS, or is a real fix easy enough so I don't have to?

I'm afraid there is no easy fix.

As said above, I think with UTF-8 it's easier than with wchar_t because wcwidth() is totally trivial and one doesn't need MB state and such.

I think of a fix as follows: in mutt_FormatString(), convert to UTF-8 first, and don't get the padding char with 'ch = *src++' but with a simple while loop testing for UTF-8 continuation bytes. After all is done, convert back to $charset.
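
To illustrate the while loop on continuation bytes, here is a minimal sketch. It assumes the input has already been converted to valid UTF-8; the helper names (is_utf8_cont, utf8_char_len, get_pad_char) are made up for this example and are not existing mutt functions.

```c
#include <stddef.h>
#include <string.h>

/* True if c is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int is_utf8_cont(unsigned char c)
{
  return (c & 0xC0) == 0x80;
}

/* Byte length of the character starting at s: the lead byte plus all
 * continuation bytes that follow it. Returns 0 at the end of the string.
 * Assumes s holds valid UTF-8. */
static size_t utf8_char_len(const char *s)
{
  size_t n;

  if (!*s)
    return 0;
  n = 1;
  while (is_utf8_cont((unsigned char) s[n]))
    n++;
  return n;
}

/* Hypothetical replacement for 'ch = *src++': copy the whole padding
 * character at *src into pad (NUL-terminated) and advance *src past it.
 * Returns the number of bytes consumed, or 0 if there is no character
 * left or it does not fit into pad. */
static size_t get_pad_char(const char **src, char *pad, size_t padlen)
{
  size_t n = utf8_char_len(*src);

  if (n == 0 || n >= padlen)
    return 0;
  memcpy(pad, *src, n);
  pad[n] = '\0';
  *src += n;
  return n;
}
```

With this, the padding character is kept as a short string rather than a single char, so the code that repeats it has to append all of its bytes each time and count it by display column instead of by byte.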

Even then the problem with the callbacks remains. I guess with your patch they need fixing, and when going with UTF-8 they'd need adjustments too (convert a subject or an author name to UTF-8 first before appending, etc.)


However, there's a real bug here...

Another approach which just popped into my brain is to really rework mutt_FormatString() like this: as a first pass, go through the input and use the callbacks to build an array of replacements. Afterwards, mutt_FormatString() could use whatever semantics it wants internally, like going with wchar_t or UTF-8 or...
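
A rough sketch of that two-pass idea, under heavy simplifications: expandos are just '%' plus one letter (no widths, flags, padding or conditionals), and the callback type, struct and function names below are invented for the example and do not match mutt's real format_t interface.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_PARTS 32
#define PART_LEN 128

/* Hypothetical callback type: writes the expansion of expando op to buf. */
typedef void format_cb(char *buf, size_t buflen, char op, void *data);

struct part
{
  int is_expando;        /* 0: literal text, 1: produced by the callback */
  char text[PART_LEN];
};

/* Pass 1: split fmt into literal parts and callback replacements.
 * Returns the number of parts stored. */
static size_t collect_parts(struct part *parts, size_t max, const char *fmt,
                            format_cb *cb, void *data)
{
  size_t n = 0;

  while (*fmt && n < max)
  {
    struct part *p = &parts[n++];

    if (fmt[0] == '%' && fmt[1])
    {
      p->is_expando = 1;
      p->text[0] = '\0';
      cb(p->text, sizeof(p->text), fmt[1], data);
      fmt += 2;
    }
    else
    {
      /* literal run up to the next '%' (a trailing lone '%' is literal) */
      size_t len = strcspn(fmt + 1, "%") + 1;
      size_t copy = len < PART_LEN - 1 ? len : PART_LEN - 1;

      p->is_expando = 0;
      memcpy(p->text, fmt, copy);
      p->text[copy] = '\0';
      fmt += len;
    }
  }
  return n;
}

/* Pass 2: assemble the parts. This version only concatenates bytes; it is
 * the place where conversion to wchar_t or UTF-8 and width handling would
 * go, since all callback output is already available here. */
static void join_parts(char *dest, size_t destlen, const struct part *parts,
                       size_t n)
{
  size_t used = 0, i;

  if (destlen == 0)
    return;
  for (i = 0; i < n; i++)
  {
    size_t len = strlen(parts[i].text);

    if (len > destlen - 1 - used)
      len = destlen - 1 - used;
    memcpy(dest + used, parts[i].text, len);
    used += len;
  }
  dest[used] = '\0';
}

/* Example callback (hypothetical): expands %s and %a to fixed strings. */
static void demo_cb(char *buf, size_t buflen, char op, void *data)
{
  (void) data;
  snprintf(buf, buflen, "%s",
           op == 's' ? "Subject" : op == 'a' ? "Author" : "?");
}
```

The point of the split is that the callbacks stay as they are and only the second pass needs to know about the internal representation. Note that the byte-wise truncation in join_parts() can still cut a multibyte character in half, which is exactly what a real second pass would have to avoid.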


  bye, Rocco
--
:wq!
