Re: display of CP-1258
Alain Bench <veronatif@xxxxxxx>:
> It "works": I get 2 readable lines. But the first next char (an 'u')
> got eaten: I get "v?x", not "v?ux" in place of "vœux". Again the 'u'
> disappears when mail is either CP-1255 or CP-1258, not others.
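This is because of a glibc/iconv bug that I can't really work around:
for CP-1258 (and apparently CP-1255) iconv has already stepped past the
unconvertible byte when it returns the error, so when mutt then skips
one byte itself it eats the 'u'. See the C program below (which I am
including inline because I vaguely remember something about attachments
causing messages to mutt-dev to require moderator approval).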
This patch is good for CVS, I think: it shouldn't break anything and
should give you "v?x" rather than "v?????...".
Index: charset.c
===================================================================
RCS file: /home/roessler/cvs/mutt/charset.c,v
retrieving revision 3.6
diff -u -r3.6 charset.c
--- charset.c 11 Dec 2002 11:19:39 -0000 3.6
+++ charset.c 5 Feb 2004 14:39:02 -0000
@@ -379,18 +379,24 @@
if (*t)
continue;
}
- if (outrepl)
+ /* Replace the output */
+ if (!outrepl)
+ outrepl = "?";
+ iconv (cd, 0, 0, &ob, &obl);
+ if (obl)
{
- /* Try replacing the output */
int n = strlen (outrepl);
- if (n <= obl)
+ if (n > obl)
{
- memcpy (ob, outrepl, n);
- ++ib, --ibl;
- ob += n, obl -= n;
- ++ret;
- continue;
+ outrepl = "?";
+ n = 1;
}
+ memcpy (ob, outrepl, n);
+ ++ib, --ibl;
+ ob += n, obl -= n;
+ ++ret;
+ iconv (cd, 0, 0, 0, 0); /* for good measure */
+ continue;
}
}
*inbuf = ib, *inbytesleft = ibl;
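
For reference, the output-replacement fallback in the patch can be read in isolation as something like the following. This is only a sketch: put_replacement is a hypothetical name, not a function in mutt, and it leaves out the iconv calls that reset the conversion state. It just shows the "use outrepl if it fits, otherwise a single '?'" rule.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of mutt.
 * Writes the replacement string repl at *ob and advances *ob / *obl.
 * Falls back to "?" when repl is NULL or does not fit in the space left.
 * Returns the number of bytes written, 0 if the buffer is already full. */
size_t put_replacement(char **ob, size_t *obl, const char *repl)
{
    size_t n;

    assert(ob && *ob && obl);

    if (!repl)
        repl = "?";
    if (*obl == 0)
        return 0;               /* no room at all, write nothing */

    n = strlen(repl);
    if (n > *obl) {             /* replacement too long for the buffer */
        repl = "?";
        n = 1;
    }
    memcpy(*ob, repl, n);
    *ob += n;
    *obl -= n;
    return n;
}
```

A caller would invoke this after iconv fails with EILSEQ, then skip the offending input byte and carry on converting. With the glibc behaviour described in this message that skip is where the extra character gets lost.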
#include <assert.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Convert str from charset "from" to charset "to" and report how far
 * iconv advanced the input and output pointers. */
void test(char *str, char *from, char *to)
{
  char *ib, *ob;
  size_t ibl, obl, ret;
  char buf[100];
  iconv_t cd;

  printf("\nConverting from %s to %s\n", from, to);
  cd = iconv_open(to, from);
  if (cd == (iconv_t)-1) {
    printf("iconv_open failed\n");
    return;
  }
  ib = str;
  ibl = strlen(str);
  ob = buf;
  obl = sizeof(buf);
  ret = iconv(cd, &ib, &ibl, &ob, &obl);
  /* ret is a size_t; cast so that (size_t)-1 prints as -1 */
  printf("iconv returned %ld\n", (long)ret);
  assert(ib + ibl == str + strlen(str));
  assert(ob + obl == buf + sizeof(buf));
  printf("Read %ld bytes and wrote %ld bytes\n",
         (long)(ib - str), (long)(ob - buf));
  iconv_close(cd);
}

int main(void)
{
  test("v\x9cux", "windows-1258", "ISO-8859-1");
  test("v\xc5\x93ux", "utf-8", "ISO-8859-1");
  return 0;
}
/*
With glibc-2.3.2 (Red Hat 9) I obtained the following output:
Converting from windows-1258 to ISO-8859-1
iconv returned -1
Read 2 bytes and wrote 1 bytes
Converting from utf-8 to ISO-8859-1
iconv returned -1
Read 1 bytes and wrote 1 bytes
When converting from windows-1258, iconv skips past the unconvertible
character (2 bytes read). When converting from utf-8, it stops in front
of the unconvertible character (1 byte read).
*/