Hi,

* Alain Bench [06-06-30 12:10:11 +0200] wrote:
> On Friday, June 23, 2006 at 13:02:24 +0000, Rocco Rutte wrote:

>> we could also convert to utf-8 first because it's so trivial to test
>> for continuations (as mutt IIRC does in other places already).

> I don't get it: We need to count cells. Conversion to UTF-8 would
> easily give the count of characters, but each one can take 0 to 2
> cells. So something around wcwidth() like mutt_strwidth() or such is
> still needed. And those don't want UTF-8, but wc or current locale mb.

In rfc2047.c there is:

  #define CONTINUATION_BYTE(c) (((c) & 0xc0) == 0x80)

for UTF-8, with which you can easily determine how many bytes a multibyte character in a 'char*' has, and that is what we need for padding.
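
To illustrate the point, here is a minimal sketch of how that macro gives the byte length of one character and the character count of a string. The helper names utf8_char_len() and utf8_char_count() are made up for this mail and are not mutt functions; both assume the input is already valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

#define CONTINUATION_BYTE(c) (((c) & 0xc0) == 0x80)

/* Hypothetical helper (not in mutt): number of bytes occupied by the
 * UTF-8 character starting at s. Assumes s points at the first byte of
 * a character in a valid, NUL-terminated UTF-8 string. */
static size_t utf8_char_len(const char *s)
{
    size_t n = 1;

    assert(s != NULL && *s != '\0');
    /* Every byte after the first one is a continuation byte; the
     * terminating NUL is not, so the loop stops there at the latest. */
    while (CONTINUATION_BYTE((unsigned char) s[n]))
        n++;
    return n;
}

/* Hypothetical helper (not in mutt): number of characters in s, i.e.
 * the number of bytes that are not continuation bytes. This counts
 * characters, not screen cells. */
static size_t utf8_char_count(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s; s++)
        if (!CONTINUATION_BYTE((unsigned char) *s))
            count++;
    return count;
}
```

Note that this only yields bytes and characters. The number of screen cells still has to come from something like wcwidth().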

The RFC 2047 encoder now converts everything to UTF-8 and uses the above #define to produce encoded words which do not break within multibyte characters (which RFC 2047 requires, but you likely know that).
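
Roughly, the idea of not breaking within a multibyte character can be sketched like this. This is not the actual rfc2047.c code: utf8_split_point() is a hypothetical helper, and the real encoder has to respect the length of the encoded word rather than a plain byte limit. The sketch only shows how a cut position is moved back to a character boundary, assuming valid UTF-8 input.

```c
#include <assert.h>
#include <stddef.h>

#define CONTINUATION_BYTE(c) (((c) & 0xc0) == 0x80)

/* Hypothetical helper (not the rfc2047.c implementation): return the
 * largest offset <= max at which the UTF-8 buffer s of len bytes can be
 * cut without splitting a multibyte character. */
static size_t utf8_split_point(const char *s, size_t len, size_t max)
{
    size_t n;

    assert(s != NULL);
    n = (max < len) ? max : len;
    /* If the byte right after the cut is a continuation byte, the cut
     * would land inside a character, so back up to its first byte. */
    while (n > 0 && n < len && CONTINUATION_BYTE((unsigned char) s[n]))
        n--;
    return n;
}
```

A caller would emit the first n bytes as one chunk and continue from s + n. If n comes back as 0 for a non-empty buffer, the limit is smaller than the first character and the caller has to handle that case itself.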

And I can think of something similar for mutt_FormatString().

That would enable us to make the status lines more correct. On single tokens extracted we would still need to use wcwidth() to determine their width on screen, but for detecting padding chars the above is good enough (given that the performance cost of the local->utf8->local conversion doesn't count much)... and better than 'foo=*bar++'.
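
As a sketch of what detecting a padding char could look like once the format string is in UTF-8: extract_pad_char() below is a hypothetical helper, not existing mutt code, and it assumes src already points at the padding character (for example the one following a padding expando) in valid UTF-8. It copies the whole character, however many bytes it has, instead of a single byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CONTINUATION_BYTE(c) (((c) & 0xc0) == 0x80)

/* Hypothetical helper (not in mutt): copy the UTF-8 character at src
 * into dst as a NUL-terminated string. Returns the number of bytes
 * consumed from src, or 0 if src is empty or dst is too small. */
static size_t extract_pad_char(const char *src, char *dst, size_t dstlen)
{
    size_t n;

    assert(src != NULL && dst != NULL);
    if (*src == '\0')
        return 0;

    /* One lead byte plus all continuation bytes that follow it. */
    n = 1;
    while (CONTINUATION_BYTE((unsigned char) src[n]))
        n++;

    if (n + 1 > dstlen)     /* need room for the character and the NUL */
        return 0;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

The extracted character would then be converted back and measured with wcwidth() to know how many cells each repetition fills.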

> On platforms where wcwidth() is unreliable, we could embed the
> replacement in wcwidth.c via -HAVE_WC_FUNCS.

This is the case already, see wcwidth.c (which could use an update, btw).

bye, Rocco
-- 
:wq!