Re: mutt_FormatString() not multibyte-aware

To: Mutt dev ml <mutt-dev@xxxxxxxx>
Subject: Re: mutt_FormatString() not multibyte-aware
From: Rocco Rutte <pdmef@xxxxxxx>
Date: Tue, 4 Jul 2006 12:20:49 +0000
In-reply-to: <20060630101010.GA21076@xxxxxxx>
List-unsubscribe: <mailto:mutt-dev-request@mutt.org?Subject="Unsubscribe Mutt Dev"?body=unsubscribe>
Mail-followup-to: Mutt dev ml <mutt-dev@xxxxxxxx>
Organization: Berlin University of Technology
References: <20060623130224.GD3538@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <20060630101010.GA21076@xxxxxxx>
Sender: owner-mutt-dev@xxxxxxxx
User-agent: Mutt/1.5.11-pdmef-2006-07-03

Hi,

* Alain Bench [06-06-30 12:10:11 +0200] wrote:

On Friday, June 23, 2006 at 13:02:24 +0000, Rocco Rutte wrote:

we could also convert to utf-8 first because it's so trivial to test
for continuations (as mutt IIRC does in other places already).


   I don't get it: We need to count cells. Conversion to UTF-8 would
easily give the count of characters, but each one can take 0 to 2 cells.
So something around wcwidth() like mutt_strwidth() or such is still
needed. And those don't want UTF-8, but wc or current locale mb.


In rfc2047.c there is:

  #define CONTINUATION_BYTE(c) (((c) & 0xc0) == 0x80)

for UTF-8, with which you can easily determine how many bytes a
multibyte character in a 'char*' has, and that is what we need for
padding.
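To illustrate the idea, here is a minimal sketch of how that macro could
be used to get the byte length of one character and the number of
characters in a string. The helper names utf8_char_len() and
utf8_char_count() are made up for this example and are not taken from
mutt; the input is assumed to be valid, NUL-terminated UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Same test as the macro quoted from rfc2047.c:
 * UTF-8 continuation bytes have the bit pattern 10xxxxxx. */
#define CONTINUATION_BYTE(c) (((c) & 0xc0) == 0x80)

/* Hypothetical helper: number of bytes occupied by the character that
 * starts at s. Returns 0 at the end of the string. Assumes valid UTF-8. */
static size_t utf8_char_len(const char *s)
{
  size_t n;

  assert(s != NULL);
  if (!*s)
    return 0;

  /* One lead byte, then every following continuation byte belongs to
   * the same character. The terminating NUL is not a continuation
   * byte, so the loop stops there as well. */
  n = 1;
  while (CONTINUATION_BYTE((unsigned char) s[n]))
    n++;
  return n;
}

/* Hypothetical helper: number of characters (not bytes, not screen
 * cells) in s, counted as the number of non-continuation bytes. */
static size_t utf8_char_count(const char *s)
{
  size_t count = 0;

  assert(s != NULL);
  for (; *s; s++)
    if (!CONTINUATION_BYTE((unsigned char) *s))
      count++;
  return count;
}
```

Note that this only gives bytes and characters; it says nothing about
how many cells a character takes on screen.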

The RFC 2047 encoder now converts everything to UTF-8 and uses the above
#define to produce encoded words which do not break within multibyte
characters (which RFC 2047 requires, but you likely know that).
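As a rough sketch of the "do not break within multibyte characters"
part: given a byte budget, back up from the cut position while the first
dropped byte is a continuation byte. This is not the actual code from
rfc2047.c; utf8_safe_cut() is a hypothetical name, and the input is
assumed to be valid, NUL-terminated UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same test as the macro quoted from rfc2047.c. */
#define CONTINUATION_BYTE(c) (((c) & 0xc0) == 0x80)

/* Hypothetical helper: largest length <= max at which s can be cut
 * without splitting a multibyte character. Assumes valid UTF-8. */
static size_t utf8_safe_cut(const char *s, size_t max)
{
  size_t len;

  assert(s != NULL);
  len = strlen(s);
  if (max >= len)
    return len;               /* the whole string fits */

  /* s[max] would be the first byte dropped. If it is a continuation
   * byte, the cut falls inside a character, so move the cut back to
   * that character's lead byte. */
  while (max > 0 && CONTINUATION_BYTE((unsigned char) s[max]))
    max--;
  return max;
}
```

For the six-byte string "a", U+00E4, U+20AC (61 c3 a4 e2 82 ac), a
budget of 2 bytes yields a cut at 1, and a budget of 5 yields a cut at
3, so no character is split.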


And I can think of something similar for mutt_FormatString().

That would enable us to make the status lines more correct; on single
tokens extracted we would still need to use wcwidth() to determine their
width on screen; but for detecting padding chars the above is good
enough (given that the performance implication of local->utf8->local
doesn't count much)... and better than 'foo=*bar++'.

   On platforms where wcwidth() is unreliable, we could embed the
replacement in wcwidth.c via -HAVE_WC_FUNCS.

This is the case already, see wcwidth.c (which could use an update,
btw).


  bye, Rocco
--
:wq!
