Re: MB_LEN_MAX has to be more than 5 if mutt uses UTF-8



* Sun Jul 13 2008 TAKAHASHI Tamotsu <ttakah@xxxxxxxxxxxxxxxxx>
> Now mutt depends on UTF-8, so MB_LEN_MAX needs to be
> at least 6. Therefore I suggest mutt.h checks it.

A multibyte guru, NOZAKI-san, told me that this is not enough.
You must expect errno==E2BIG even if you malloc'ed MB_LEN_MAX *
ibl bytes, because iconv sometimes does "N:M conversion".

For example, imagine your MB_LEN_MAX is 1. This value should be
enough if you use only US-ASCII and ISO-8859-1 because they are
NOT multibyte. But some TRANSLIT locales convert
        (e accent aigu) to "e'"
and
        (a umlaut) to "ae"
So, MB_LEN_MAX*ibl is not enough for obl.

The above two are just "1:N" cases.
The situation is far worse in a real multibyte world.
(Imagine a conversion like ascii-"ae" to utf8-a-umlaut. This
is not a real example, but you can see how hard it gets.)
So mutt has to handle the E2BIG case with realloc.
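
To make the idea concrete, here is a minimal standalone sketch of the
"retry on E2BIG" loop. It is not mutt code: convert_growing() and
grow() are hypothetical helpers, the buffer-doubling policy is just an
assumption, and EILSEQ/EINVAL are simply treated as failure instead of
being repaired the way mutt_iconv does with its replacement strings.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: double the output buffer and re-point the
 * output cursor.  Returns 0 on success, -1 if realloc fails. */
static int grow (char **buf, char **ob, size_t *obl, size_t *cap)
{
  size_t used = (size_t) (*ob - *buf);
  size_t newcap = *cap * 2;
  char *p = realloc (*buf, newcap + 1);   /* +1 for the final NUL */
  if (!p)
    return -1;
  *buf = p;
  *ob = p + used;
  *obl = newcap - used;
  *cap = newcap;
  return 0;
}

/* Hypothetical helper (not part of mutt): convert `in` from charset
 * `from` to charset `to`, growing the output whenever iconv reports
 * E2BIG.  Returns a malloc'ed NUL-terminated string, or NULL. */
char *convert_growing (const char *from, const char *to,
                       const char *in, size_t initial_cap)
{
  assert (from && to && in);

  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return NULL;

  size_t cap = initial_cap ? initial_cap : 1;
  char *buf = malloc (cap + 1);
  if (!buf)
  {
    iconv_close (cd);
    return NULL;
  }

  char *ib = (char *) in;      /* iconv takes char **, input is not modified */
  size_t ibl = strlen (in);
  char *ob = buf;
  size_t obl = cap;
  int flushing = 0;            /* 0: converting input, 1: final flush */

  for (;;)
  {
    size_t r = flushing ? iconv (cd, NULL, NULL, &ob, &obl)
                        : iconv (cd, &ib, &ibl, &ob, &obl);
    if (r != (size_t) -1)
    {
      if (flushing)
        break;                 /* input converted and state flushed */
      flushing = 1;
      continue;
    }
    /* On E2BIG iconv has already advanced ib/ob past what it
     * converted, so we only need more room and can call it again. */
    if (errno != E2BIG || grow (&buf, &ob, &obl, &cap) != 0)
    {
      free (buf);
      iconv_close (cd);
      return NULL;
    }
  }

  *ob = '\0';
  iconv_close (cd);
  return buf;
}
```

Calling convert_growing ("ISO-8859-1", "UTF-8", "caf\xe9", 1) starts
with a deliberately tiny one-byte buffer, so the loop has to grow it
several times before the conversion completes. Note that the flush
call goes through the same E2BIG check as the main conversion.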

A patch follows.

An example subject:
> =?iso-2022-jp?b?MSAbJEI3byROGyhCIE11dHQtai11c2VycyAbJEI/PUBBGyhC?=
>       =?iso-2022-jp?b?GyRCMEY3byQsJCIkaiReJDkbKEI=?=
made dprint log
> E2BIG: ibl=9, obl=2, new obl=54, safe_realloc(92)
on my MB_LEN_MAX==1 system with EUC-JP locale.

(In fact, the final flushing call iconv (cd, 0, 0, &ob, &obl) may
also return E2BIG, but I didn't check for that in this patch. Some
corner cases might cause an overflow when mutt _reads_ the incomplete
strings, but it shouldn't be a security hole, AFAIK.)

diff -r cc67b008038c charset.c
--- a/charset.c Fri Jul 11 11:34:42 2008 +0200
+++ b/charset.c Wed Jul 16 13:57:59 2008 +0900
@@ -391,6 +391,8 @@
     ret1 = iconv (cd, &ib, &ibl, &ob, &obl);
     if (ret1 != (size_t)-1)
       ret += ret1;
+    else /* if (errno == E2BIG) */
+      ret = -1;
     if (ibl && obl && errno == EILSEQ)
     {
       if (inrepls)
@@ -479,7 +481,19 @@
     obl = MB_LEN_MAX * ibl;
     ob = buf = safe_malloc (obl + 1);
     
-    mutt_iconv (cd, &ib, &ibl, &ob, &obl, inrepls, outrepl);
+    /* MB_LEN_MAX may be insufficient */
+    while (mutt_iconv (cd, &ib, &ibl, &ob, &obl, inrepls, outrepl) == (size_t)-1)
+    {
+      if (errno != E2BIG)
+       break;
+      dprint(4, (debugfile, "mutt_convert_string E2BIG: ibl=%u, obl=%u, ", ibl, obl));
+      len = ob - buf;
+      obl = 6 * ibl; /* XXX: "6" is a magic number */
+      dprint(4, (debugfile, "new obl=%u, safe_realloc(%u)\n", obl, len+obl+1));
+      safe_realloc (&buf, len + obl + 1);
+      ob = buf + len;
+    }
+    iconv (cd, 0, 0, &ob, &obl);
     iconv_close (cd);
 
     *ob = '\0';