
Re: mutt_FormatString() not multibyte-aware



* TAKAHASHI Tamotsu <ttakah@xxxxxxxxxxxxxxxxx>:
* Fri Jun 23 2006 Rocco Rutte <pdmef@xxxxxxx>

I would like to fix it but don't know how, since one of the mbyte functions failed for me by always returning 1 for the width. Maybe we could also convert to UTF-8 first, because it's so trivial to test for continuation bytes there (as mutt IIRC already does in other places).
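To illustrate why UTF-8 makes continuations easy to spot: every continuation byte has the bit pattern 10xxxxxx, so counting characters needs no state at all. The helper names below are hypothetical, not mutt functions. Note that this counts code points, not display columns; a CJK character still occupies two columns, so a width table or wcwidth() would still be needed for alignment.

```c
#include <stddef.h>

/* A byte of the form 10xxxxxx is a UTF-8 continuation byte. */
static int utf8_is_cont(unsigned char c)
{
  return (c & 0xC0) == 0x80;
}

/* Count code points in a NUL-terminated UTF-8 string by counting every
 * byte that is NOT a continuation byte. No shift state is involved. */
static size_t utf8_strlen(const char *s)
{
  size_t n = 0;

  for (; *s; s++)
    if (!utf8_is_cont((unsigned char)*s))
      n++;
  return n;
}
```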

Yeah, multibyte strings (mbs) are too hard to handle because you have to keep track of the mbstate.
See the attached patch. This is just a hack, but it works if you
have wcswidth, wmemcpy, wcslen, etc.
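For comparison, a minimal sketch of the state that has to be carried with the standard mbrtowc() API: the same mbstate_t must be passed to every call while walking a string in the locale's encoding. The function name is hypothetical; it only counts characters and assumes the program has already called setlocale() if it wants something other than the "C" locale.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Count the characters in a NUL-terminated multibyte string in the
 * current locale's encoding. The mbstate_t must be carried from one
 * mbrtowc() call to the next. Returns (size_t)-1 on an invalid or
 * truncated sequence. */
static size_t mbs_char_count(const char *s)
{
  mbstate_t st;
  size_t n = 0, len = strlen(s);

  memset(&st, 0, sizeof(st));   /* initial conversion state */
  while (len > 0) {
    size_t k = mbrtowc(NULL, s, len, &st);

    if (k == (size_t)-1 || k == (size_t)-2)
      return (size_t)-1;        /* invalid or incomplete sequence */
    /* k == 0 would mean a NUL character, which cannot occur here
     * because len excludes the terminator. */
    s += k;
    len -= k;
    n++;
  }
  return n;
}
```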

Ouch, it's really a hack and requires more work because once you have wchar_t, all the callbacks should be aware of it too, I guess.

Should I file a bug report for this to have a discussion in the BTS, or is a real fix easy enough that I don't need to?

I'm afraid there is no easy fix.

As said above, I think with UTF-8 it's easier than with wchar_t because wcwidth() is totally trivial and one doesn't need MB state and such.

I'm thinking of a fix along these lines: in mutt_FormatString(), convert to UTF-8 first, and don't fetch the padding char with 'ch = *src++' but with a simple while loop that also takes any following UTF-8 continuation bytes. When everything is done, convert back to $charset.
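A rough sketch of the padding-character idea, assuming the format string has already been converted to UTF-8. utf8_take_char() and utf8_pad() are hypothetical helpers, not part of mutt: the first stands in for the single 'ch = *src++', the second repeats the (possibly multi-byte) pad string the way a %> or %| expansion would.

```c
#include <stddef.h>
#include <string.h>

/* Copy one UTF-8 character (lead byte plus any continuation bytes)
 * from *srcp into buf, advance *srcp past it and return its length in
 * bytes. Returns 0 at end of string or if buf is too small. */
static size_t utf8_take_char(const char **srcp, char *buf, size_t buflen)
{
  const char *src = *srcp;
  size_t len = 0;

  if (*src == '\0' || buflen < 2)
    return 0;
  buf[len++] = *src++;                      /* lead byte or plain ASCII */
  while (((unsigned char)*src & 0xC0) == 0x80 && len < buflen - 1)
    buf[len++] = *src++;                    /* continuation bytes 10xxxxxx */
  buf[len] = '\0';
  *srcp = src;
  return len;
}

/* Append 'count' copies of the pad string to the NUL-terminated dst
 * (buffer size dstlen), stopping early if it would overflow. */
static void utf8_pad(char *dst, size_t dstlen, const char *pad, size_t count)
{
  size_t used = strlen(dst), plen = strlen(pad);

  while (count-- > 0 && used + plen < dstlen) {
    memcpy(dst + used, pad, plen);
    used += plen;
  }
  dst[used] = '\0';
}
```

Because whole characters are copied as byte strings, the padding loop never splits a character; only the column count per character still has to come from somewhere else.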

Even then, the problem with the callbacks remains. I guess with your patch they need fixing, and when going with UTF-8 they'd need adjustments too (e.g. converting a subject or an author name to UTF-8 before appending it).
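What such a callback adjustment might look like, as a sketch using POSIX iconv(3): the field is converted from its source charset to UTF-8 before being appended. append_utf8() is a hypothetical name; a real patch would probably reuse mutt's own charset conversion code instead.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Convert 'src' from charset 'from' to UTF-8 and append it to the
 * NUL-terminated dst (buffer size dstlen, assumed > strlen(dst)).
 * Returns 0 on success, -1 on failure; dst stays NUL-terminated. */
static int append_utf8(char *dst, size_t dstlen, const char *src, const char *from)
{
  size_t used = strlen(dst);
  char *in = (char *)src;               /* iconv() takes char ** input */
  size_t inleft = strlen(src);
  char *out = dst + used;
  size_t outleft = dstlen - used - 1;   /* keep room for the NUL */
  iconv_t cd = iconv_open("UTF-8", from);
  size_t r;

  if (cd == (iconv_t)-1)
    return -1;                          /* unknown charset */
  r = iconv(cd, &in, &inleft, &out, &outleft);
  iconv_close(cd);
  *out = '\0';                          /* terminate whatever was converted */
  return (r == (size_t)-1) ? -1 : 0;
}
```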

However, there's a real bug here...

Another approach that just popped into my head is to really rework mutt_FormatString() like this: in a first pass, go through the input and use the callbacks to build an array of replacements. Afterwards, mutt_FormatString() could use whatever representation it wants internally, like wchar_t or UTF-8 or...
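A minimal sketch of that two-pass structure, using made-up types: collect_pieces() calls a callback for each %X and stores the results, and join_pieces() assembles the final string. The callback signature is hypothetical and much simpler than mutt's real format callbacks (no width/precision, no conditionals, no %> padding). The point is that pass 2 sees every expansion up front, so it could convert all of them to wchar_t or UTF-8 and compute widths and padding in one place.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: expand format letter 'op' into dst. */
typedef void (*expando_cb)(char *dst, size_t dstlen, char op, void *data);

/* One piece of the parsed format: literal text or a callback result. */
struct piece {
  char text[128];
};

/* Pass 1: walk the format once, copying literals and asking the
 * callback for each %X. Returns the number of pieces stored. */
static size_t collect_pieces(const char *fmt, struct piece *pieces,
                             size_t max, expando_cb cb, void *data)
{
  size_t n = 0;

  while (*fmt && n < max) {
    if (fmt[0] == '%' && fmt[1] != '\0') {
      cb(pieces[n].text, sizeof pieces[n].text, fmt[1], data);
      fmt += 2;
    } else {
      size_t len = strcspn(fmt, "%");
      if (len == 0)
        len = 1;                               /* lone trailing '%' */
      if (len >= sizeof pieces[n].text)
        len = sizeof pieces[n].text - 1;       /* rest goes in next piece */
      memcpy(pieces[n].text, fmt, len);
      pieces[n].text[len] = '\0';
      fmt += len;
    }
    n++;
  }
  return n;
}

/* Pass 2: all expansions are known here, so layout could be done in any
 * representation; this sketch only concatenates them. */
static void join_pieces(char *dst, size_t dstlen, const struct piece *pieces, size_t n)
{
  size_t i;

  dst[0] = '\0';
  for (i = 0; i < n; i++)
    strncat(dst, pieces[i].text, dstlen - strlen(dst) - 1);
}

/* Example callback: %a -> author, %s -> subject (made-up data). */
static void demo_cb(char *dst, size_t dstlen, char op, void *data)
{
  const char **fields = data;   /* fields[0] = subject, fields[1] = author */
  const char *val = op == 's' ? fields[0] : op == 'a' ? fields[1] : "";

  snprintf(dst, dstlen, "%s", val);
}
```

With fields {"Re: test", "Rocco"}, the format "%a: %s" yields three pieces that join to "Rocco: Re: test".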

  bye, Rocco
--
:wq!