Re: mutt_FormatString() not multibyte-aware
* Alain Bench [06-06-30 12:10:11 +0200] wrote:
On Friday, June 23, 2006 at 13:02:24 +0000, Rocco Rutte wrote:
we could also convert to utf-8 first because it's so trivial to test
for continuations (as mutt IIRC does in other places already).
I don't get it: We need to count cells. Conversion to UTF-8 would
easily give the count of characters, but each one can take 0 to 2 cells.
So something around wcwidth() like mutt_strwidth() or such is still
needed. And those don't want UTF-8, but wc or current locale mb.
In rfc2047.c there is:
#define CONTINUATION_BYTE(c) (((c) & 0xc0) == 0x80)
for UTF-8 with which you can easily determine how many bytes a multibyte
character from a 'char*' has and that is what we need for padding.
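To illustrate the idea, here is a minimal sketch of how the macro could be used to find character boundaries. It assumes the input is already valid UTF-8; the helper names utf8_char_len() and utf8_safe_prefix() are made up for this example and are not existing mutt functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same test as the macro in rfc2047.c: true for bytes of the form 10xxxxxx. */
#define CONTINUATION_BYTE(c) (((c) & 0xc0) == 0x80)

/* Hypothetical helper: number of bytes used by the UTF-8 character that
 * starts at s. Assumes s is valid UTF-8 and points at a lead byte.
 * Returns 0 at the end of the string. */
static size_t utf8_char_len(const char *s)
{
  size_t n;

  assert(s != NULL);
  if (!*s)
    return 0;
  n = 1;
  /* The terminating NUL is not a continuation byte, so this stops there. */
  while (CONTINUATION_BYTE((unsigned char) s[n]))
    n++;
  return n;
}

/* Hypothetical helper: largest prefix length <= max (in bytes) that does
 * not cut a multibyte character in half. */
static size_t utf8_safe_prefix(const char *s, size_t max)
{
  size_t len;

  assert(s != NULL);
  len = strlen(s);
  if (max >= len)
    return len;
  /* Back up while the byte at the cut position continues a character. */
  while (max > 0 && CONTINUATION_BYTE((unsigned char) s[max]))
    max--;
  return max;
}
```

Note that this only yields byte counts and safe cut points, not screen widths.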
The RfC2047 encoder now converts everything to UTF-8 and uses that macro
to produce encoded words which do not break within multibyte characters
(which RfC2047 requires, but you likely know that).
And I can think of something similar for mutt_FormatString().
That would enable us to make the status lines more correct; for each
single token extracted we would still need to use wcwidth() to
determine its width on screen; but for detecting padding chars the
above is good enough (assuming the performance cost of the
local->utf8->local round trip doesn't count much)... and better than
'foo=*bar++'.
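For the width part, something along these lines is what I have in mind. It is only a rough sketch: str_cells() is a hypothetical name (mutt already has mutt_strwidth() for this job), it works on a string in the current locale's encoding, and counting an invalid byte as one cell and a non-printable character as zero cells are assumptions of this example.

```c
#define _XOPEN_SOURCE 700 /* needed for wcwidth() on some systems */
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: number of screen cells occupied by the multibyte
 * string s, interpreted in the current locale's encoding. */
static int str_cells(const char *s)
{
  mbstate_t st;
  size_t left, k;
  wchar_t wc;
  int cells = 0, w;

  assert(s != NULL);
  memset(&st, 0, sizeof(st));
  left = strlen(s);
  while (left > 0)
  {
    k = mbrtowc(&wc, s, left, &st);
    if (k == (size_t) (-1) || k == (size_t) (-2))
    {
      /* Invalid or truncated sequence: reset the state, skip one byte
       * and count it as one cell (an assumption of this sketch). */
      memset(&st, 0, sizeof(st));
      k = 1;
      cells++;
    }
    else
    {
      if (k == 0) /* embedded NUL, cannot happen with strlen() above */
        break;
      w = wcwidth(wc);
      if (w > 0) /* wcwidth() returns -1 for non-printable characters */
        cells += w;
    }
    s += k;
    left -= k;
  }
  return cells;
}
```

A caller would run setlocale(LC_CTYPE, "") once at startup so that mbrtowc() decodes the user's encoding; without it the "C" locale is used.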
On platforms where wcwidth() is unreliable, we could embed the
replacement in wcwidth.c via -HAVE_WC_FUNCS.
This is the case already, see wcwidth.c (which could need an update,