Re: display of CP-1258



Alain Bench <veronatif@xxxxxxx>:

>     It "works": I get 2 readable lines. But the first next char (an 'u')
> got eaten: I get "v?x", not "v?ux" in place of "vœux". Again the 'u'
> disappears when mail is either CP-1255 or CP-1258, not others.

This is because of a glibc/iconv bug that I can't really work around.
See the C program below (which I am including inline because I vaguely
remember something about attachments causing messages to mutt-dev to
require moderator approval).

This patch is good for CVS, I think: it shouldn't break anything and
should give you "v?x" rather than "v?????...".

Index: charset.c
===================================================================
RCS file: /home/roessler/cvs/mutt/charset.c,v
retrieving revision 3.6
diff -u -r3.6 charset.c
--- charset.c   11 Dec 2002 11:19:39 -0000      3.6
+++ charset.c   5 Feb 2004 14:39:02 -0000
@@ -379,18 +379,24 @@
        if (*t)
          continue;
       }
-      if (outrepl)
+      /* Replace the output */
+      if (!outrepl)
+       outrepl = "?";
+      iconv (cd, 0, 0, &ob, &obl);
+      if (obl)
       {
-       /* Try replacing the output */
        int n = strlen (outrepl);
-       if (n <= obl)
+       if (n > obl)
        {
-         memcpy (ob, outrepl, n);
-         ++ib, --ibl;
-         ob += n, obl -= n;
-         ++ret;
-         continue;
+         outrepl = "?";
+         n = 1;
        }
+       memcpy (ob, outrepl, n);
+       ++ib, --ibl;
+       ob += n, obl -= n;
+       ++ret;
+       iconv (cd, 0, 0, 0, 0); /* for good measure */
+       continue;
       }
     }
     *inbuf = ib, *inbytesleft = ibl;



#include <assert.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

void test(char *str, char *from, char *to)
{
  char *ib, *ob;
  size_t ibl, obl, ret;
  char buf[100];
  iconv_t cd;

  printf("\nConverting from %s to %s\n", from, to);
  cd = iconv_open(to, from);
  if (cd == (iconv_t)-1) {
    printf("iconv_open failed\n");
    return;
  }
  ib = str;
  ibl = strlen(str);
  ob = buf;
  obl = sizeof(buf);
  ret = iconv(cd, &ib, &ibl, &ob, &obl);
  /* ret is a size_t; cast so that (size_t)-1 prints as -1 portably */
  printf("iconv returned %ld\n", (long)ret);
  assert(ib + ibl == str + strlen(str));
  assert(ob + obl == buf + sizeof(buf));
  printf("Read %ld bytes and wrote %ld bytes\n",
         (long)(ib - str), (long)(ob - buf));
  iconv_close(cd);
}

int main(void)
{
  test("v\x9cux", "windows-1258", "ISO-8859-1");
  test("v\xc5\x93ux", "utf-8", "ISO-8859-1");
  return 0;
}

/*
  With glibc-2.3.2 (Red Hat 9) I obtained the following output:

Converting from windows-1258 to ISO-8859-1
iconv returned -1
Read 2 bytes and wrote 1 bytes

Converting from utf-8 to ISO-8859-1
iconv returned -1
Read 1 bytes and wrote 1 bytes

  When converting from windows-1258, the unconvertible character is skipped.
  When converting from utf-8, the unconvertible character is not skipped.
*/
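
For reference, the replacement strategy used in the patch above can be
sketched as a standalone function. This is only a simplified illustration,
not the actual mutt_iconv code: the name convert_with_replacement and its
signature are hypothetical, it always substitutes a single '?', and it
assumes the target charset is ASCII-compatible so that '?' can be written
to the output directly. Because it skips one input byte after each EILSEQ,
it has the same limitation as the patch: if iconv has already advanced past
the offending character, as in the windows-1258 case shown above, the
following byte is lost.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical helper, not part of mutt: converts the NUL-terminated string
 * `in` from charset `from` to charset `to` and stores a NUL-terminated
 * result in `out`. Each input byte that iconv rejects with EILSEQ is
 * skipped and replaced by '?'. Returns the number of replacements, or -1
 * on any other failure (unknown charset, output full, truncated input). */
int convert_with_replacement(const char *from, const char *to,
                             const char *in, char *out, size_t out_size)
{
  iconv_t cd;
  char *ib = (char *)in;  /* iconv does not write through this pointer */
  size_t ibl = strlen(in);
  char *ob = out;
  size_t obl;
  int replaced = 0;

  assert(out_size > 0);
  obl = out_size - 1;     /* reserve room for the terminating NUL */

  cd = iconv_open(to, from);
  if (cd == (iconv_t)-1)
    return -1;

  while (ibl > 0) {
    if (iconv(cd, &ib, &ibl, &ob, &obl) != (size_t)-1)
      continue;           /* success: all input consumed, so ibl is now 0 */
    if (errno != EILSEQ) {
      iconv_close(cd);
      return -1;
    }
    /* Flush any pending shift sequence before writing the replacement. */
    iconv(cd, NULL, NULL, &ob, &obl);
    if (obl == 0) {
      iconv_close(cd);
      return -1;
    }
    *ob++ = '?', --obl;   /* assumes an ASCII-compatible target charset */
    ++ib, --ibl;          /* skip one byte of the offending input */
    ++replaced;
    iconv(cd, NULL, NULL, NULL, NULL);  /* reset to the initial state */
  }

  iconv(cd, NULL, NULL, &ob, &obl);     /* final flush; no-op if stateless */
  iconv_close(cd);
  *ob = '\0';
  return replaced;
}
```

Calling it with "UTF-8", "ISO-8859-1" and an input containing a byte that
is invalid in UTF-8 (0xff, for instance) should leave a '?' in place of
that byte and return 1.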