[PATCH] more lenient RFC2047 parsing (was Re: RFC2047 Subjects)
On Thu, Sep 02, 2010 at 07:20:18PM -0400, Ed Blackman wrote:
> Does mutt rely on the fact that encoded-text shouldn't have "?" or
> SPACE because it makes the implementation easier? Or is it just
> following the RFC strictly? Reading the RFC, it's not clear to me
> *why* encoded-text can't have "?" or SPACE.
Having unquoted delimiters (the question mark) in the middle of the string
makes it harder to parse, because the parser can no longer treat the first "?"
it sees as the end of the encoded-text. The space character can be problematic
in structured header fields, where whitespace separates tokens. Thus, it is
always a good idea to encode those characters.
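To illustrate the parsing difference, here is a minimal standalone sketch. It
is not mutt code: the function names find_end_strict and find_end_lenient are
made up for this example, and both take a pointer to the start of the
encoded-text (just past "=?charset?Q?"). The loop conditions mirror the strict
and relaxed checks in the last hunk of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Strict scan: encoded-text may contain only printable ASCII other than
 * SPACE and '?'. Returns a pointer to the terminating "?=", or NULL if the
 * text breaks that rule or is unterminated. (Hypothetical helper.) */
const char *find_end_strict(const char *text)
{
    const char *q;

    assert(text != NULL);
    for (q = text; 0x20 < *q && *q < 0x7f && *q != '?'; q++)
        ;
    return (q[0] == '?' && q[1] == '=') ? q : NULL;
}

/* Lenient scan: SPACE is accepted, and a '?' ends the text only when it is
 * immediately followed by '='. (Hypothetical helper.) */
const char *find_end_lenient(const char *text)
{
    const char *q;

    assert(text != NULL);
    for (q = text; 0x20 <= *q && *q < 0x7f && (*q != '?' || q[1] != '='); q++)
        ;
    return (q[0] == '?' && q[1] == '=') ? q : NULL;
}
```

With the input "Really?_Yes?=", the strict scan stops at the first "?" and
rejects the word, while the lenient scan runs on to the final "?=". The lenient
rule is still a heuristic: in "Really?=21?=" a raw "?" followed by an encoded
"=21" looks like a terminator, so the word is cut short after "Really".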
> I forwarded the message I copied the headers from, along with one
> that had spaces in the encoded-text, to my work Outlook and to my
> Gmail account. Both Outlook and Gmail decoded the subjects as
> intended, which is probably why Intrade and Twitter can get away with
> sending out non-conformant messages.
> Any chance of a rfc2047 lenient decode, perhaps as an option?
Try the attached patch.
me
diff --git a/rfc2047.c b/rfc2047.c
--- a/rfc2047.c
+++ b/rfc2047.c
@@ -629,12 +629,23 @@
const char *t, *t1;
int enc = 0, count = 0;
char *charset = NULL;
+ int rv = -1;
pd = d0 = safe_malloc (strlen (s));
for (pp = s; (pp1 = strchr (pp, '?')); pp = pp1 + 1)
{
count++;
+
+ /* hack for non-compliant MUAs that allow unquoted question marks in encoded-text */
+ if (count == 4)
+ {
+ while (pp1 && *(pp1 + 1) != '=')
+ pp1 = strchr(pp1 + 1, '?');
+ if (!pp1)
+ goto error_out_0;
+ }
+
switch (count)
{
case 2:
@@ -650,11 +661,7 @@
else if (toupper ((unsigned char) *pp) == 'B')
enc = ENCBASE64;
else
- {
- FREE (&charset);
- FREE (&d0);
- return (-1);
- }
+ goto error_out_0;
break;
case 4:
if (enc == ENCQUOTEDPRINTABLE)
@@ -707,9 +714,11 @@
mutt_convert_string (&d0, charset, Charset, M_ICONV_HOOK_FROM);
mutt_filter_unprintable (&d0);
strfcpy (d, d0, len);
+ rv = 0;
+error_out_0:
FREE (&charset);
FREE (&d0);
- return (0);
+ return rv;
}
/*
@@ -731,7 +740,8 @@
;
if (q[0] != '?' || !strchr ("BbQq", q[1]) || q[2] != '?')
continue;
- for (q = q + 3; 0x20 < *q && *q < 0x7f && *q != '?'; q++)
+ /* non-strict check since many MUAs will not encode spaces and question marks */
+ for (q = q + 3; 0x20 <= *q && *q < 0x7f && (*q != '?' || q[1] != '='); q++)
;
if (q[0] != '?' || q[1] != '=')
{