Re: display of CP-1258

To: Mutt dev ml <mutt-dev@xxxxxxxx>, bug-gnu-libiconv@xxxxxxx
Subject: Re: display of CP-1258
From: Alain Bench <veronatif@xxxxxxx>
Date: Tue, 24 Feb 2004 01:50:14 +0100 (CET)
In-reply-to: <200402221417.05155.bruno@xxxxxxxxx>
List-unsubscribe: <mailto:mutt-dev-request@mutt.org?body=unsubscribe>
Mail-followup-to: Mutt dev ml <mutt-dev@xxxxxxxx>, bug-gnu-libiconv@xxxxxxx
References: <20040211214526.GA3107@xxxxxxx> <200402221417.05155.bruno@xxxxxxxxx>
Reply-to: Mutt dev ml <mutt-dev@xxxxxxxx>, bug-gnu-libiconv@xxxxxxx
Sender: owner-mutt-dev@xxxxxxxx
User-agent: Mutt/1.4i-ja.1

    [crosspost and MFT mutt-dev and bug-gnu-libiconv]

Hello Bruno, and thank you for the explanations and patch.

 On Sunday, February 22, 2004 at 2:17:05 PM +0100, Bruno Haible wrote:

> Alain Bench wrote:
>> possible bugs in iconv() converting CP-1255 or CP-1258 to Latin-1.
> This is not really a bug.

    Many thanks for the explanation: I'm beginning to half understand. I
hadn't thought about combining chars...


> What happens inside the iconv() function is that when the "oe"
> character is read, it is stored in the state, for possible combination
> with a following accent. Then the "u" character is read. Since it is
> not an accent, the "oe" is scheduled for output, and since ISO-8859-1
> doesn't contain this character, it is now that EILSEQ is returned.

    Wouldn't it be possible to temporarily unread the innocent 'u' char
when iconv() sees that it doesn't combine with the stored state, that is,
before the 'œ' is output and EILSEQ possibly returned?
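
    To make the question concrete, here is a toy model in Python of a
converter that holds back each base character until it has seen the next
one. It is only a sketch of the behaviour described above, not libiconv's
code: the names convert() and display(), the unread flag, and the use of
Unicode NFC composition in place of libiconv's own tables are all
assumptions made for illustration.

```python
import unicodedata

def emit(ch, out):
    """Append ch to out if Latin-1 can represent it; report success."""
    try:
        ch.encode("latin-1")
    except UnicodeEncodeError:
        return False
    out.append(ch)
    return True

def convert(text, start=0, unread=False):
    """Toy stateful conversion to Latin-1 (hypothetical, not libiconv).

    Returns (output, next_pos, failed). A base character is held in
    `pending` until the following character shows whether it combines.
    """
    out, pending, pos = [], "", start
    while pos < len(text):
        ch = text[pos]
        pos += 1                      # ch has now been read
        if pending:
            combined = unicodedata.normalize("NFC", pending + ch)
            if unicodedata.combining(ch) and len(combined) == 1:
                pending = ""
                if not emit(combined, out):
                    return "".join(out), pos, True
                continue
            held, pending = pending, ""
            if not emit(held, out):
                if unread:
                    pos -= 1          # give the innocent char back
                return "".join(out), pos, True
        pending = ch
    if pending and not emit(pending, out):
        return "".join(out), pos, True
    return "".join(out), pos, False

def display(text, unread=False):
    """Caller that shows '?' for each failure and resumes at next_pos."""
    shown, pos = [], 0
    while True:
        chunk, pos, failed = convert(text, pos, unread)
        shown.append(chunk)
        if not failed:
            return "".join(shown)
        shown.append("?")

print(display("v\u0153ux"))               # v?x  ('u' is eaten)
print(display("v\u0153ux", unread=True))  # v?ux
```

    In this model the failure is only noticed after the 'u' has been
read, so a caller that resumes at the reported position loses it unless
the converter steps back first.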


> Still an improvement is possible. Namely, there is a priori no
> possible combination of "oe" with an accent. The appended patch for
> libiconv implements this improvement.
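
    The idea that some characters never need to be held back can be
sketched as follows. This is a rough check under stated assumptions: it
uses Unicode canonical composition as a stand-in for whatever table
libiconv really consults, and can_combine() and CP1258_ACCENTS are names
made up for this example.

```python
import unicodedata

# The five combining marks encoded in CP-1258:
# grave, acute, tilde, hook above, dot below.
CP1258_ACCENTS = "\u0300\u0301\u0303\u0309\u0323"

def can_combine(base):
    """True if base plus one of the accents composes to a single char."""
    return any(
        len(unicodedata.normalize("NFC", base + accent)) == 1
        for accent in CP1258_ACCENTS
    )

print(can_combine("\u0153"))  # False: nothing composes with the oe ligature
print(can_combine("\u0102"))  # True: A breve + grave gives U+1EB0
print(can_combine("u"))       # True
```

    A character for which this returns False could be output at once,
with no need to wait for the following character.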

    Works OK: with your patch the 1258 test mail is now displayed in
Mutt as well as it can be, with 'œ' as '?' of course, but with no more
innocent chars eaten:

| Méilleurs v?ux à tous !
| Je peux éventuellement me rendre disponible.

    As expected, replacing the 9C by a C3 (A breve U+0102) still eats
the 'u'. And the original mail vi-ed to CP-1255 still eats chars after
E0 (alef U+05D0) and E9 (yod U+05D9):

| M?lleurs v?ux ?tous !
| Je peux ?entuellement me rendre disponible.


    But Edmund's test program gives me a strange result:

| test("M\xE9illeurs", "windows-1258", "ISO-8859-1");
| Converting from windows-1258 to ISO-8859-1
| iconv returned 0
| Read 9 bytes and wrote 8 bytes
| Méilleurs » Méilleur

    With 1 last byte unwritten and uncounted?
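
    One reading that would fit these numbers, offered only as a guess:
if the converter holds back the last base character in case an accent
follows, a successful call can leave that character in the state until
the caller asks for a flush (with iconv() that would be a call with a
NULL input buffer). The toy class below models just that; HoldingConverter
and its methods are hypothetical names, and it is not taken from libiconv
or from Edmund's program.

```python
import unicodedata

class HoldingConverter:
    """Toy converter that keeps the last base char until flushed."""

    def __init__(self):
        self.pending = ""

    def feed(self, text):
        """Convert text, holding back the final base character."""
        out = []
        for ch in text:
            if self.pending:
                combined = unicodedata.normalize("NFC", self.pending + ch)
                if unicodedata.combining(ch) and len(combined) == 1:
                    out.append(combined)
                    self.pending = ""
                    continue
                out.append(self.pending)
            self.pending = ch
        return "".join(out)

    def flush(self):
        """Return whatever is still held and reset the state."""
        held, self.pending = self.pending, ""
        return held

conv = HoldingConverter()
print(conv.feed("M\u00e9illeurs"))  # Méilleur  (9 chars read, 8 written)
print(conv.flush())                 # s
```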


Bye!    Alain.
-- 
set honor_followup_to=yes in muttrc is the default value, and makes your
list replies go where the original author wanted them to go: Only to the
list, or with a private copy.
