Re: display of CP-1258
[crosspost and MFT mutt-dev and bug-gnu-libiconv]
Hello Bruno, and thank you for the explanations and patch.
On Sunday, February 22, 2004 at 2:17:05 PM +0100, Bruno Haible wrote:
> Alain Bench wrote:
>> possible bugs in iconv() converting CP-1255 or CP-1258 to Latin-1.
> This is not really a bug.
Many thanks for the explanation: I am beginning to half understand. I
hadn't thought about combining chars...
> What happens inside the iconv() function is that when the "oe"
> character is read, it is stored in the state, for possible combination
> with a following accent. Then the "u" character is read. Since it is
> not an accent, the "oe" is scheduled for output, and since ISO-8859-1
> doesn't contain this character, it is now that EILSEQ is returned.
Wouldn't it be possible to temporarily unread the innocent 'u' char
when iconv() sees that it doesn't combine with the stored state, that
is, before the 'œ' is output and EILSEQ possibly returned?
> Still an improvement is possible. Namely, there is a priori no
> possible combination of "oe" with an accent. The appended patch for
> libiconv implements this improvement.
Works OK: with your patch the CP-1258 test mail is now displayed in
Mutt as well as it can be, with 'œ' as '?' of course, but no innocent
char is eaten any more:
| Méilleurs v?ux à tous !
| Je peux éventuellement me rendre disponible.
As expected, replacing the 9C with a C3 (A breve U+0102) still eats
the 'u'. And the original mail vi-ed to CP-1255 still eats the chars
after E0 (alef U+05D0) and E9 (yod U+05D9):
| M?lleurs v?ux ?tous !
| Je peux ?entuellement me rendre disponible.
But Edmund's test program gives me a strange result:
| test("M\xE9illeurs", "windows-1258", "ISO-8859-1");
| Converting from windows-1258 to ISO-8859-1
| iconv returned 0
| Read 9 bytes and wrote 8 bytes
| Méilleurs » Méilleur
With 1 last byte unwritten and uncounted?
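A guess, not checked against the libiconv source: the final 's' may
simply still be sitting in the conversion state, waiting for a possible
accent, just like the "oe" above. The toy model below is only a sketch
of that buffering, not libiconv's code. The names toy_state,
toy_is_accent(), toy_convert() and toy_flush() are invented for the
illustration, the list of accent bytes is my assumption about CP-1258,
and bytes are copied unchanged, which is only valid for characters that
have the same code in both charsets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of a converter that holds back one base character.
 * Hypothetical names throughout; this is NOT libiconv's implementation. */
typedef struct {
    int pending;   /* buffered base character, or -1 if nothing is held */
} toy_state;

/* Assumption: these are the CP-1258 combining accent bytes. */
int toy_is_accent(unsigned char c)
{
    return c == 0xCC || c == 0xD2 || c == 0xDE || c == 0xEC || c == 0xF2;
}

/* Feed n input bytes; returns the number of bytes written to out.
 * out must have room for at least n bytes. */
size_t toy_convert(toy_state *st, const unsigned char *in, size_t n,
                   unsigned char *out)
{
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = in[i];
        if (toy_is_accent(c)) {
            /* An accent combines with the held character (or is stray).
             * A real converter would look up the composed character;
             * the toy just writes a '?' placeholder. */
            out[written++] = '?';
            st->pending = -1;
        } else {
            /* Not an accent: the held character can finally be output,
             * and the new one is held back in its place. */
            if (st->pending >= 0)
                out[written++] = (unsigned char)st->pending;
            st->pending = c;
        }
    }
    return written;
}

/* End of input: write out whatever is still held. Returns 0 or 1. */
size_t toy_flush(toy_state *st, unsigned char *out)
{
    if (st->pending < 0)
        return 0;
    out[0] = (unsigned char)st->pending;
    st->pending = -1;
    return 1;
}
```

Fed the 9 bytes of "M\xE9illeurs", toy_convert() writes 8 bytes and
leaves the 's' pending; toy_flush() then writes the ninth. If libiconv
behaves like that, a final iconv(cd, NULL, NULL, &outbuf, &outbytesleft)
call in the test program should push out the missing byte, but that is
untested here.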
Bye! Alain.
--
set honor_followup_to=yes in muttrc is the default value, and makes your
list replies go where the original author wanted them to go: Only to the
list, or with a private copy.