[ja-patch] short versions (Re: What should go into 1.5.7?)

To: mutt-dev@xxxxxxxx
Subject: [ja-patch] short versions (Re: What should go into 1.5.7?)
From: TAKAHASHI Tamotsu <ttakah@xxxxxxxxxxxxxxxxx>
Date: Fri, 28 Jan 2005 23:08:52 +0900
Cc: Marco d'Itri <md@xxxxxxxx>, Thomas Roessler <roessler@xxxxxxxxxxxxxxxxxx>
In-reply-to: <20050127183459.GA15128@xxxxxxxxxxxxxxxxxxx>
Mail-followup-to: mutt-dev@xxxxxxxx, Marco d'Itri <md@xxxxxxxx>, Thomas Roessler <roessler@xxxxxxxxxxxxxxxxxx>
References: <20050126115002.GH4589@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> <20050127183459.GA15128@xxxxxxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.6i

On Thu, Jan 27, 2005 at 07:34:59PM +0100, Marco d'Itri wrote:

> The other patch is a subset of the ja patch, it adds the config
> options assumed_charset, file_charset and strict_mime. I can't see how
> people using mutt in the real world and having to deal with Outlook
> users can live without it.

Thank you!

>   - File: http://www.emaillab.org/mutt/1.5/patch-1.5.6.tt.assumed_charset.1.gz

A shorter version is attached if you are interested.
I removed $strict_mime in this version.
If $assumed_charset is set to "us-ascii" (the default), mutt behaves the same
as it does now, so this version is still compatible with the current mutt by
default (though incompatible with the current JA patch).

Some users (I've forgotten who, but a French user posted to mutt-dev IIRC) may
think that strings which cannot be converted should be sanitized. Otherwise,
irregular data is written to the screen, which is very annoying. So I added
mutt_convert_string() as a fallback filter. However, this never happens if
$assumed_charset is unset or set to "us-ascii".
(For example: set assumed_charset="utf-8:iso-8859-1"
a. First, mutt tries utf-8.
b. If that fails, mutt tries iso-8859-1.
c. If that also fails, mutt converts the string from utf-8 to $charset anyway,
 with '?' substituted for characters that cannot be converted.)

But OTOH, other users (including Alain?) may want "pass-through" behaviour,
i.e. mutt tries all the charsets in $assumed_charset, and gives up converting
if it reaches "us-ascii" in $assumed_charset.
(For example: set assumed_charset="utf-8:iso-8859-1:us-ascii"
a. Mutt tries utf-8.
b. If that fails, mutt tries iso-8859-1.
c. If that also fails, mutt passes the string through without conversion.)
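
The two policies above can be sketched as one walk over the colon-separated
list. This is a minimal illustration, not the patch's code: classify(), the
outcome enum, and the decode_ok_fn predicate (a stand-in for "iconv converted
the string without error") are all hypothetical names, and the two sample
predicates exist only to make the sketch self-contained.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strcasecmp (POSIX) */

/* Possible outcomes of walking an $assumed_charset-style list. */
enum outcome {
  CONVERTED,      /* some candidate charset decoded the string */
  PASSED_THROUGH, /* "us-ascii" was reached: leave the bytes untouched */
  SUBSTITUTED     /* list exhausted: convert from the first entry with '?' */
};

/* Hypothetical predicate: nonzero if `s` decodes cleanly as `charset`. */
typedef int (*decode_ok_fn) (const char *s, const char *charset);

enum outcome classify (const char *list, const char *s, decode_ok_fn ok)
{
  const char *c = list, *end;
  char name[64];

  while (c && *c)
  {
    size_t n;

    end = strchr (c, ':');
    n = end ? (size_t) (end - c) : strlen (c);
    if (n > 0 && n < sizeof (name))   /* skip empty or oversized entries */
    {
      memcpy (name, c, n);
      name[n] = '\0';
      if (!strcasecmp (name, "us-ascii"))
        return PASSED_THROUGH;
      if (ok (s, name))
        return CONVERTED;
    }
    c = end ? end + 1 : NULL;
  }
  return SUBSTITUTED;
}

/* Stand-in predicates, for illustration only. */
int never_ok (const char *s, const char *charset)
{
  (void) s; (void) charset;
  return 0;
}

int only_latin1_ok (const char *s, const char *charset)
{
  (void) s;
  return !strcasecmp (charset, "iso-8859-1");
}
```

With a predicate that always fails, "utf-8:iso-8859-1" ends in SUBSTITUTED,
while "utf-8:iso-8859-1:us-ascii" ends in PASSED_THROUGH, which is the
difference between the two examples.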

I am not sure this is better than the JA or compat patch.
But it is shorter.

>   - File: http://www.emaillab.org/mutt/1.5/patch-1.5.6.tt.adjust_edited_file.1.gz

Thomas has already committed this to CVS. Thanks!

>   - File: http://www.emaillab.org/mutt/1.5/patch-1.5.6.tt.adjust_line.3.gz

I have a slightly modified version of this patch, too.
It uses the mutt_strwidth() function that mutt already has. Attached.

-- 
tamo

[assumed_charset]

Some broken MUAs send non-ASCII messages without declaring a charset.
This patch allows you to set $assumed_charset.

This variable takes a "1:2:3:..." format, like the $PATH environment variable.

For the message body, only the first charset will be used.
For headers, mutt will try all the charsets in order. If none of them
can convert the string, mutt will use the same conversion as for the
message body. If you don't like that, set "us-ascii" as the last
charset in $assumed_charset. Then such a string will be passed through.

If $assumed_charset is "us-ascii" (the default), strings will be assumed
to be in $charset (i.e. they will be passed through). This is the current
behaviour of mutt, so this patch is completely backwards compatible.
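
The "only the first charset is used for the body" rule can be sketched as
follows. This is a hypothetical helper written for illustration: unlike the
patch's mutt_get_first_charset(), which returns a pointer into a static
buffer, first_charset() here writes into a caller-supplied buffer. Falling
back to "us-ascii" for an empty list mirrors the default described above.

```c
#include <stddef.h>
#include <string.h>

/* Copy the first entry of a colon-separated charset list into buf.
 * An empty or NULL list yields "us-ascii". The result is truncated
 * if it does not fit. Returns buf, or NULL if no buffer was given. */
const char *first_charset (const char *list, char *buf, size_t buflen)
{
  size_t n;

  if (!buf || buflen == 0)
    return NULL;
  if (!list || !*list)
    list = "us-ascii";

  n = strcspn (list, ":");      /* length up to the first ':' or the end */
  if (n >= buflen)
    n = buflen - 1;
  memcpy (buf, list, n);
  buf[n] = '\0';
  return buf;
}
```

A caller-supplied buffer avoids the static-buffer aliasing that a
non-reentrant helper would have; whether that matters for mutt itself is not
something this sketch claims.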

[file_charset]

When you attach a text file, the file may have a different charset
than your $charset. So, this patch allows you to set $file_charset.
Such files are assumed to be $file_charset.
This variable takes the same format as $assumed_charset. ("1:2:3:4")
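
As a configuration sketch, the two variables might be set together in a
muttrc like this. The charset lists are the Japanese examples from the
patch's own init.h documentation; suitable lists for other locales are left
as an assumption.

```
# Charsets to guess for headers and bodies that declare none
# (only the first entry is used for message bodies)
set assumed_charset="iso-2022-jp:euc-jp:shift_jis:utf-8"

# Charsets to try for attached text files; iso-2022-* goes first
set file_charset="iso-2022-jp:euc-jp:shift_jis:utf-8"
```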

--- mutt-1.5.6.orig/PATCHES     Mon Feb  2 02:42:47 2004
+++ mutt-1.5.6/PATCHES  Sat Feb 14 10:02:10 2004
@@ -0,0 +1 @@
+patch-1.5.6.tt+tamo.short_assumed_file_charset.1

--- mutt-1.5.6.orig/charset.c   Tue Jan 21 21:25:21 2003
+++ mutt-1.5.6/charset.c        Sat Feb 14 10:02:03 2004
@@ -581,3 +584,95 @@
     iconv_close (fc->cd);
   FREE (_fc);
 }
+
+char *mutt_get_first_charset (const char *charset)
+{
+  static char fcharset[SHORT_STRING];
+  const char *c, *c1;
+  
+  c = charset;
+  if (!mutt_strlen(c))
+    return "us-ascii";
+  if (!(c1 = strchr (c, ':')))
+    return charset;
+  strfcpy (fcharset, c, c1 - c + 1);
+  return fcharset;
+}
+
+static size_t convert_string (ICONV_CONST char *f, size_t flen,
+                             const char *from, const char *to,
+                             char **t, size_t *tlen)
+{
+  iconv_t cd;
+  char *buf, *ob;
+  size_t obl, n;
+  int e;
+
+  if (!ascii_strcasecmp (from, "us-ascii"))
+  {
+    *t = safe_strdup (f);
+    *tlen = mutt_strlen (f);
+    return 0;
+  }
+
+  cd = mutt_iconv_open (to, from, M_ICONV_HOOK_FROM);
+  if (cd == (iconv_t)(-1))
+    return (size_t)(-1);
+  obl = 4 * flen + 1;
+  ob = buf = safe_malloc (obl);
+  n = iconv (cd, &f, &flen, &ob, &obl);
+  if (n == (size_t)(-1) || iconv (cd, 0, 0, &ob, &obl) == (size_t)(-1))
+  {
+    e = errno;
+    FREE (&buf);
+    iconv_close (cd);
+    errno = e;
+    return (size_t)(-1);
+  }
+  *ob = '\0';
+  
+  *tlen = ob - buf;
+
+  safe_realloc (&buf, ob - buf + 1);
+  *t = buf;
+  iconv_close (cd);
+
+  return n;
+}
+
+int mutt_convert_nomime_string (char **ps)
+{
+  const char *c, *c1;
+
+  for (c = AssumedCharset; c; c = c1 ? c1 + 1 : 0)
+  {
+    char *u = *ps;
+    char *s;
+    char *fromcode;
+    size_t m, n;
+    size_t ulen = mutt_strlen (*ps);
+    size_t slen;
+
+    if (!u || !*u)
+      return 0;
+
+    c1 = strchr (c, ':');
+    n = c1 ? c1 - c : mutt_strlen (c);
+    if (!n)
+      continue;
+    fromcode = safe_malloc (n + 1);
+    strfcpy (fromcode, c, n + 1);
+    m = convert_string (u, ulen, fromcode, Charset, &s, &slen);
+    FREE (&fromcode);
+    if (m != (size_t)(-1))
+    {
+      FREE (ps);
+      *ps = s;
+      return 0;
+    }
+  }
+  mutt_convert_string (ps,
+    mutt_get_first_charset (AssumedCharset), Charset, M_ICONV_HOOK_FROM);
+  return -1;
+}
+

--- mutt-1.5.6.orig/charset.h   Tue Mar  4 16:49:43 2003
+++ mutt-1.5.6/charset.h        Sat Feb 14 10:01:29 2004
@@ -35,6 +35,8 @@
 #endif
 
 int mutt_convert_string (char **, const char *, const char *, int);
+char *mutt_get_first_charset (const char *);
+int mutt_convert_nomime_string (char **);
 
 iconv_t mutt_iconv_open (const char *, const char *, int);
 size_t mutt_iconv (iconv_t, ICONV_CONST char **, size_t *, char **, size_t *, ICONV_CONST char **, const char *);

--- mutt-1.5.6.orig/globals.h   Mon Feb  2 02:15:17 2004
+++ mutt-1.5.6/globals.h        Sat Feb 14 10:01:29 2004
@@ -32,6 +32,7 @@
 
 WHERE char *AliasFile;
 WHERE char *AliasFmt;
+WHERE char *AssumedCharset;
 WHERE char *AttachSep;
 WHERE char *Attribution;
 WHERE char *AttachFormat;
@@ -45,6 +46,7 @@
 WHERE char *DsnReturn;
 WHERE char *Editor;
 WHERE char *EscChar;
+WHERE char *FileCharset;
 WHERE char *FolderFormat;
 WHERE char *ForwFmt;
 WHERE char *Fqdn;

--- mutt-1.5.6.orig/handler.c   Wed Nov  5 18:41:31 2003
+++ mutt-1.5.6/handler.c        Sat Feb 14 10:01:29 2004
@@ -1718,12 +1726,22 @@
   int istext = mutt_is_text_part (b);
   iconv_t cd = (iconv_t)(-1);
 
-  if (istext && s->flags & M_CHARCONV)
-  {
-    char *charset = mutt_get_parameter ("charset", b->parameter);
-    if (charset && Charset)
-      cd = mutt_iconv_open (Charset, charset, M_ICONV_HOOK_FROM);
-  }
+  if (istext)
+  {
+    if(s->flags & M_CHARCONV)
+    {
+      char *charset = mutt_get_parameter ("charset", b->parameter);
+      if (!charset)
+        charset = mutt_get_first_charset (AssumedCharset);
+      if (charset && Charset)
+        cd = mutt_iconv_open (Charset, charset, M_ICONV_HOOK_FROM);
+    }
+    else
+    {
+      if (b->file_charset)
+        cd = mutt_iconv_open (Charset, b->file_charset, M_ICONV_HOOK_FROM);
+    }
+  }
 
   fseek (s->fpin, b->offset, 0);
   switch (b->encoding)

--- mutt-1.5.6.orig/init.h      Mon Feb  2 02:15:17 2004
+++ mutt-1.5.6/init.h   Sat Feb 14 10:01:29 2004
@@ -184,6 +184,22 @@
   ** If set, Mutt will prompt you for carbon-copy (Cc) recipients before
   ** editing the body of an outgoing message.
   */  
+  { "assumed_charset", DT_STR, R_NONE, UL &AssumedCharset, UL "us-ascii"},
+  /*
+  ** .pp
+  ** This variable is a colon-separated list of character encoding 
+  ** schemes for messages without character encoding indication.
+  ** Header field values without character encoding indication would be 
+  ** assumed to be written in one of this list.
+  ** Message body content without character encoding indication would be
+  ** assumed to be written in the first entry of the list.
+  ** By default, any header fields and message body without charset 
+  ** indication are assumed to be in $$charset.
+  ** .pp
+  ** For example, Japanese users might prefer this setting:
+  ** .pp
+  **   set assumed_charset="iso-2022-jp:euc-jp:shift_jis:utf-8"
+  */
   { "attach_format",   DT_STR,  R_NONE, UL &AttachFormat, UL "%u%D%I %t%4n %T%.40d%> [%.7m/%.10M, %.6e%?C?, %C?, %s] " },
   /*
   ** .pp
@@ -532,6 +590,20 @@
   ** (PGP only)
   */
+  { "file_charset",    DT_STR,  R_NONE, UL &FileCharset, UL 0 },
+  /*
+  ** .pp
+  ** This variable is a colon-separated list of character encoding
+  ** schemes for text file attachments.
+  ** If unset, $$charset value will be used instead.
+  ** For example, the following configuration would work for Japanese
+  ** text handling:
+  ** .pp
+  **   set file_charset="iso-2022-jp:euc-jp:shift_jis:utf-8"
+  ** .pp
+  ** Note: "iso-2022-*" must be put at the head of the value as shown above
+  ** if included.
+  */
   { "folder",          DT_PATH, R_NONE, UL &Maildir, UL "~/Mail" },
   /*
   ** .pp
   ** Specifies the default location of your mailboxes.  A `+' or `=' at the

--- mutt-1.5.6.orig/mutt.h      Mon Feb  2 02:15:17 2004
+++ mutt-1.5.6/mutt.h   Sat Feb 14 10:01:29 2004
@@ -599,6 +613,7 @@
                                 * If NULL, filename is used 
                                 * instead.
                                 */
+  char *file_charset;          /* charset of attached file */
   CONTENT *content;             /* structure used to store detailed info about
                                 * the content of the attachment.  this is used
                                 * to determine what content-transfer-encoding

--- mutt-1.5.6.orig/parse.c     Wed Nov  5 18:41:33 2003
+++ mutt-1.5.6/parse.c  Sat Feb 14 10:01:29 2004
@@ -208,9 +208,21 @@
 
       if (*s == '"')
       {
+       int state_ascii = 1;
        s++;
-       for (i=0; *s && *s != '"' && i < sizeof (buffer) - 1; i++, s++)
+       for (i=0; *s && i < sizeof (buffer) - 1; i++, s++)
        {
+         if (*s == 0x1b)
+         {
+           if (s[1] == '(' && (s[2] == 'B' || s[2] == 'J'))
+             state_ascii = 1;
+           else
+             state_ascii = 0;
+         }
+         if (state_ascii)
+         {
+           if (*s == '"')
+             break;
          if (*s == '\\')
          {
            /* Quote the next character */
@@ -221,6 +233,9 @@
          else
            buffer[i] = *s;
        }
+         else
+           buffer[i] = *s;
+       }
        buffer[i] = 0;
        if (*s)
          s++; /* skip over the " */
@@ -379,7 +394,9 @@
   if (ct->type == TYPETEXT)
   {
     if (!(pc = mutt_get_parameter ("charset", ct->parameter)))
-      mutt_set_parameter ("charset", "us-ascii", &ct->parameter);
+      mutt_set_parameter ("charset",
+                         (const char *) mutt_get_first_charset (AssumedCharset),
+                         &ct->parameter);
   }
 
 }

--- mutt-1.5.6.orig/rfc2047.c   Wed Nov  5 18:41:33 2003
+++ mutt-1.5.6/rfc2047.c        Sat Feb 14 10:01:29 2004
@@ -729,8 +773,23 @@
     if (!(p = find_encoded_word (s, &q)))
     {
       /* no encoded words */
+      n = mutt_strlen (s);
+      if (ascii_strcasecmp (AssumedCharset, "us-ascii"))
+      {
+       char *t;
+       size_t tlen;
+
+       t = safe_malloc (n + 1);
+       strfcpy (t, s, n + 1);
+       mutt_convert_nomime_string (&t);
+       tlen = mutt_strlen (t);
+       strncpy (d, t, tlen); 
+       d += tlen;
+       FREE (&t);
+       break;
+      }
       strncpy (d, s, dlen);
       d += dlen;
       break;
     }
 
@@ -766,7 +861,7 @@
 {
   while (a)
   {
-    if (a->personal && strstr (a->personal, "=?") != NULL)
+    if (a->personal)
       rfc2047_decode (&a->personal);
 #ifdef EXACT_ADDRESS
     if (a->val && strstr (a->val, "=?") != NULL)

--- mutt-1.5.6.orig/rfc2231.c   Wed Nov  5 18:41:33 2003
+++ mutt-1.5.6/rfc2231.c        Sat Feb 14 10:01:29 2004
@@ -113,6 +113,11 @@
 
       if (option (OPTRFC2047PARAMS) && p->value && strstr (p->value, "=?"))
        rfc2047_decode (&p->value);
+      else
+      {
+       if (ascii_strcasecmp (AssumedCharset, "us-ascii"))
+         mutt_convert_nomime_string (&p->value);
+      }
 
       *last = p;
       last = &p->next;

--- mutt-1.5.6.orig/sendlib.c   Wed Nov  5 18:41:33 2003
+++ mutt-1.5.6/sendlib.c        Sat Feb 14 10:01:29 2004
@@ -496,7 +523,7 @@
   }
 
   if (a->type == TYPETEXT && (!a->noconv))
-    fc = fgetconv_open (fpin, Charset, 
+    fc = fgetconv_open (fpin, a->file_charset, 
                        mutt_get_body_charset (send_charset, sizeof (send_charset), a),
                        0);
   else
@@ -896,6 +923,7 @@
   CONTENT *info;
   CONTENT_STATE state;
   FILE *fp = NULL;
+  char *fromcode;
   char *tocode;
   char buffer[100];
   char chsbuf[STRING];
@@ -930,15 +958,18 @@
   if (b != NULL && b->type == TYPETEXT && (!b->noconv && !b->force_charset))
   {
     char *chs = mutt_get_parameter ("charset", b->parameter);
+    char *fchs = b->use_disp ? ((FileCharset && *FileCharset) ?
+                                FileCharset : Charset) : Charset;
     if (Charset && (chs || SendCharset) &&
-       convert_file_from_to (fp, Charset, chs ? chs : SendCharset,
-                             0, &tocode, info) != (size_t)(-1))
+       convert_file_from_to (fp, fchs, chs ? chs : SendCharset,
+                             &fromcode, &tocode, info) != (size_t)(-1))
     {
       if (!chs)
       {
        mutt_canonical_charset (chsbuf, sizeof (chsbuf), tocode);
        mutt_set_parameter ("charset", chsbuf, &b->parameter);
       }
+      b->file_charset = fromcode;
       FREE (&tocode);
       safe_fclose (&fp);
       return info;
@@ -1318,6 +1349,7 @@
   body->unlink = 1;
   body->use_disp = 0;
   body->disposition = DISPINLINE;
+  body->noconv = 1;
 
   mutt_parse_mime_message (ctx, hdr);

mutt_FormatString() has "wlen" and "len" variables, but it does not
have a "width" variable. This is problematic because some charsets
have characters that take 2 bytes but 1 column, or 3 bytes but 2 columns.
This patch handles such cases correctly.
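
The byte-versus-column mismatch can be illustrated with a small
self-contained sketch. It is not mutt_strwidth(): utf8_columns() and
approx_width() are hypothetical helpers that assume well-formed UTF-8 input
and use a rough table of East Asian wide ranges in place of a real wcwidth().

```c
#include <stddef.h>
#include <string.h>

/* Rough stand-in for wcwidth(): 2 for common East Asian wide ranges,
 * 1 for everything else (control and combining characters are ignored). */
static int approx_width (unsigned long cp)
{
  if ((cp >= 0x1100 && cp <= 0x115F) ||
      (cp >= 0x2E80 && cp <= 0xA4CF) ||
      (cp >= 0xAC00 && cp <= 0xD7A3) ||
      (cp >= 0xF900 && cp <= 0xFAFF) ||
      (cp >= 0xFE30 && cp <= 0xFE4F) ||
      (cp >= 0xFF00 && cp <= 0xFF60) ||
      (cp >= 0xFFE0 && cp <= 0xFFE6))
    return 2;
  return 1;
}

/* Approximate display columns of a UTF-8 string (well-formed input assumed). */
size_t utf8_columns (const char *s)
{
  const unsigned char *p = (const unsigned char *) s;
  size_t cols = 0;

  while (*p)
  {
    unsigned long cp;
    int extra;   /* number of continuation bytes expected */

    if (*p < 0x80)      { cp = *p;        extra = 0; }
    else if (*p < 0xE0) { cp = *p & 0x1F; extra = 1; }
    else if (*p < 0xF0) { cp = *p & 0x0F; extra = 2; }
    else                { cp = *p & 0x07; extra = 3; }
    p++;
    while (extra-- > 0 && (*p & 0xC0) == 0x80)
      cp = (cp << 6) | (*p++ & 0x3F);
    cols += approx_width (cp);
  }
  return cols;
}
```

For the UTF-8 string "\xC3\xA9" (U+00E9), strlen() gives 2 but the width is
1 column; for "\xE6\x97\xA5" (U+65E5), strlen() gives 3 but the width is 2
columns. Padding computed from the byte length would therefore be too short
in both cases.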


--- mutt-1.5.6.orig/PATCHES     Mon Feb  2 02:42:47 2004
+++ mutt-1.5.6/PATCHES  Sat Feb 14 10:02:10 2004
@@ -0,0 +1 @@
+patch-1.5.6.tt.fmtstring.1

--- mutt-1.5.6.orig/muttlib.c   Mon Feb  2 02:15:17 2004
+++ mutt-1.5.6/muttlib.c        Sat Feb 14 10:01:29 2004
@@ -959,11 +922,12 @@
 {
   char prefix[SHORT_STRING], buf[LONG_STRING], *cp, *wptr = dest, ch;
   char ifstring[SHORT_STRING], elsestring[SHORT_STRING];
-  size_t wlen, count, len;
+  size_t wlen, count, len, col, wid;
 
   prefix[0] = '\0';
   destlen--; /* save room for the terminal \0 */
   wlen = (flags & M_FORMAT_ARROWCURSOR && option (OPTARROWCURSOR)) ? 3 : 0;
+  col = wlen;
     
   while (*src && wlen < destlen)
   {
@@ -973,6 +937,7 @@
       {
        *wptr++ = '%';
        wlen++;
+       col++;
        src++;
        continue;
       }
@@ -1045,23 +1010,26 @@
        /* calculate space left on line.  if we've already written more data
           than will fit on the line, ignore the rest of the line */
        count = (COLS < destlen ? COLS : destlen);
-       if (count > wlen)
+       if (count > col)
        {
-         count -= wlen; /* how many chars left on this line */
+         count -= col; /* how many columns left on this line */
          mutt_FormatString (buf, sizeof (buf), src, callback, data, flags);
          len = mutt_strlen (buf);
-         if (count > len)
+         wid = mutt_strwidth (buf);
+         if (count > wid)
          {
-           count -= len; /* how many chars to pad */
+           count -= wid; /* how many chars to pad */
            memset (wptr, ch, count);
            wptr += count;
            wlen += count;
+           col += count;
          }
          if (len + wlen > destlen)
            len = destlen - wlen;
          memcpy (wptr, buf, len);
          wptr += len;
          wlen += len;
+         col += mutt_strwidth (buf);
        }
        break; /* skip rest of input */
       }
@@ -1112,7 +1080,8 @@
 
        memcpy (wptr, buf, len);
        wptr += len;
        wlen += len;
+       col += mutt_strwidth (buf);
       }
     }
     else if (*src == '\\')
@@ -1143,11 +1112,13 @@
       src++;
       wptr++;
       wlen++;
+      col++;
     }
     else
     {
       *wptr++ = *src++;
       wlen++;
+      col++;
     }
   }
   *wptr = 0;

--- mutt-1.5.6.orig/pager.c     Mon Feb  2 02:10:57 2004
+++ mutt-1.5.6/pager.c  Sat Feb 14 10:01:29 2004
@@ -1706,15 +1750,17 @@
       CLEARLINE (statusoffset);
       if (IsHeader (extra))
       {
-       _mutt_make_string (buffer,
-                          COLS-9 < sizeof (buffer) ? COLS-9 : sizeof (buffer),
-                          NONULL (PagerFmt), Context, extra->hdr, M_FORMAT_MAKEPRINT);
+       size_t l1 = (COLS - 9) * MB_LEN_MAX;
+       size_t l2 = sizeof (buffer);
+       _mutt_make_string (buffer, l1 < l2 ? l1 : l2, NONULL (PagerFmt),
+                          Context, extra->hdr, M_FORMAT_MAKEPRINT);
       }
       else if (IsMsgAttach (extra))
       {
-       _mutt_make_string (buffer,
-                          COLS - 9 < sizeof (buffer) ? COLS - 9: sizeof (buffer),
-                          NONULL (PagerFmt), Context, extra->bdy->hdr, M_FORMAT_MAKEPRINT);
+       size_t l1 = (COLS - 9) * MB_LEN_MAX;
+       size_t l2 = sizeof (buffer);
+       _mutt_make_string (buffer, l1 < l2 ? l1 : l2, NONULL (PagerFmt),
+                          Context, extra->bdy->hdr, M_FORMAT_MAKEPRINT);
       }
       mutt_paddstr (COLS-10, IsHeader (extra) || IsMsgAttach (extra) ?
                    buffer : banner);

References:
- What should go into 1.5.7?
  - From: Thomas Roessler
- Re: What should go into 1.5.7?
  - From: Marco d'Itri
