
[PATCH] more lenient RFC2047 parsing (was Re: RFC2047 Subjects)



On Thu, Sep 02, 2010 at 07:20:18PM -0400, Ed Blackman wrote:
> Does mutt rely on the fact that encoded-text shouldn't have "?" or SPACE because it makes the implementation easier? Or is it just following the RFC strictly? Reading the RFC, it's not clear to me *why* encoded-text can't have "?" or SPACE.

Having unquoted delimiters (the question mark) in the middle of the string makes it harder to parse. The space character can be problematic in certain structured header fields. Thus, it is always a good idea to encode those characters.
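
To illustrate the parsing problem, here is a minimal sketch, not taken from mutt. The helper name strict_text_len is hypothetical. It mimics a strict parser that treats every '?' as a field delimiter, so the encoded-text ends at the first '?' after the encoding field.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of mutt): length of the encoded-text as a
 * strict parser sees it. The input is expected to look like
 * "=?charset?encoding?encoded-text?=". Returns 0 if fewer than three '?'
 * delimiters are present. */
static size_t strict_text_len(const char *word)
{
    const char *p = word;
    int i;

    /* Skip the three delimiters that precede the encoded-text:
     * the one in "=?", the one after charset, the one after encoding. */
    for (i = 0; i < 3; i++) {
        p = strchr(p, '?');
        if (!p)
            return 0;
        p++;
    }

    /* A strict parser stops at the very next '?', whatever follows it. */
    return strcspn(p, "?");
}
```

With a properly encoded word such as "=?UTF-8?Q?Ready=3F_Go?=" the helper sees the whole text (11 characters). With an unquoted question mark, "=?UTF-8?Q?Ready?_Go?=", it stops after "Ready" (5 characters), and the character after that '?' is not the '=' a strict parser expects.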

> I forwarded the message I copied the headers from, along with one that had spaces in the encoded-text, to my work Outlook and to my Gmail account. Both Outlook and Gmail decoded the subjects as intended, which is probably why Intrade and Twitter can get away with sending out non-conformant messages.
>
> Any chance of a rfc2047 lenient decode, perhaps as an option?

Try the attached patch.
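
The lenient idea can be sketched in isolation as follows. This is an illustration under assumptions, not the patched mutt code; lenient_text_end is a hypothetical name. A '?' counts as the end of the encoded-text only when it is immediately followed by '='.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of mutt): given a pointer to the start of
 * the encoded-text, return a pointer to the '?' of the terminating "?=",
 * or NULL if no terminator exists. Question marks that are not followed by
 * '=' are treated as ordinary text. */
static const char *lenient_text_end(const char *text)
{
    const char *p = strchr(text, '?');

    /* Keep searching while the '?' found is not part of "?=". */
    while (p && p[1] != '=')
        p = strchr(p + 1, '?');

    return p;
}
```

For the text "Ready?_Go?= tail" the returned pointer is 9 characters past the start, so the embedded '?' stays part of the text. One limitation of this approach: if the unencoded text itself contains the sequence "?=", the scan stops there, so that case remains ambiguous.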

me
diff --git a/rfc2047.c b/rfc2047.c
--- a/rfc2047.c
+++ b/rfc2047.c
@@ -629,12 +629,23 @@
   const char *t, *t1;
   int enc = 0, count = 0;
   char *charset = NULL;
+  int rv = -1;
 
   pd = d0 = safe_malloc (strlen (s));
 
   for (pp = s; (pp1 = strchr (pp, '?')); pp = pp1 + 1)
   {
     count++;
+
+    /* hack for non-compliant MUAs that allow unquoted question marks in encoded-text */
+    if (count == 4)
+    {
+      while (pp1 && *(pp1 + 1) != '=')
+       pp1 = strchr(pp1 + 1, '?');
+      if (!pp1)
+         goto error_out_0;
+    }
+
     switch (count)
     {
       case 2:
@@ -650,11 +661,7 @@
        else if (toupper ((unsigned char) *pp) == 'B')
          enc = ENCBASE64;
        else
-       {
-         FREE (&charset);
-         FREE (&d0);
-         return (-1);
-       }
+         goto error_out_0;
        break;
       case 4:
        if (enc == ENCQUOTEDPRINTABLE)
@@ -707,9 +714,11 @@
     mutt_convert_string (&d0, charset, Charset, M_ICONV_HOOK_FROM);
   mutt_filter_unprintable (&d0);
   strfcpy (d, d0, len);
+  rv = 0;
+error_out_0:
   FREE (&charset);
   FREE (&d0);
-  return (0);
+  return rv;
 }
 
 /*
@@ -731,7 +740,8 @@
       ;
     if (q[0] != '?' || !strchr ("BbQq", q[1]) || q[2] != '?')
       continue;
-    for (q = q + 3; 0x20 < *q && *q < 0x7f && *q != '?'; q++)
+    /* non-strict check since many MUAs will not encode spaces and question marks */
+    for (q = q + 3; 0x20 <= *q && *q < 0x7f && (*q != '?' || q[1] != '='); q++)
       ;
     if (q[0] != '?' || q[1] != '=')
     {