assumed_charset, file_charset and iconv-hook

To: mutt-dev@xxxxxxxx
Subject: assumed_charset, file_charset and iconv-hook
From: Tamotsu Takahashi <ttakah@xxxxxxxxxxxxxxxxx>
Date: Sun, 13 Feb 2005 07:45:34 +0900
List-unsubscribe: <mailto:mutt-dev-request@mutt.org?body=unsubscribe>
Mail-followup-to: mutt-dev@xxxxxxxx
Sender: owner-mutt-dev@xxxxxxxx
User-agent: Mutt/1.5.6i

Thank you, Thomas. Thank you!
And thank you for committing pgp-auto-decode, too.
What a wonderful version 1.5.8 is!

On Sat, Feb 12, 2005 at 09:42:31PM +0100, Thomas Roessler wrote:
> On 2005-02-10 10:06:16 +0900, TAKAHASHI Tamotsu wrote:
> > charset.c mbyte.c hook.c:
> >     Allow iconv-hook overwrite existing charset
> >     (MORIYAMA Masayuki)
> 
> The euc-ms-jp stuff is in, the iconv-hook-on-everything and
> iconv-hook-as-regexp not, for the moment.

I don't think it's iconv-hook-as-regexp.
I think it's just "iconv-hook as ICASE."

And you can see it is already in, when M_CRYPTHOOK is NOT defined.
: #ifdef M_CRYPTHOOK
:     if ((rc = REGCOMP (rx, NONULL(pattern.data), ((data & (M_CRYPTHOOK|M_CHARSETHOOK)) ? REG_ICASE : 0))) != 0)
: #else
:     if ((rc = REGCOMP (rx, NONULL(pattern.data), (data & (M_CHARSETHOOK|M_ICONVHOOK)) ? REG_ICASE : 0)) != 0)
: #endif /* M_CRYPTHOOK */
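
To illustrate what "iconv-hook as ICASE" means in practice, here is a minimal sketch using plain POSIX regcomp(). The helper name hook_pattern_matches and its icase parameter are hypothetical and not part of mutt; icase only stands in for the (data & (M_CHARSETHOOK|M_ICONVHOOK)) test above, and REG_EXTENDED is an assumption about how the pattern is compiled.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not mutt code.
 * Returns 1 if `name` matches `pattern`, 0 if it does not,
 * and -1 if the pattern cannot be compiled.
 * `icase` stands in for the hook-type test that selects REG_ICASE. */
static int hook_pattern_matches (const char *pattern, const char *name, int icase)
{
  regex_t rx;
  int rc;

  assert (pattern != NULL && name != NULL);

  /* REG_EXTENDED is an assumption; REG_NOSUB because we only need yes/no. */
  if (regcomp (&rx, pattern,
               REG_EXTENDED | REG_NOSUB | (icase ? REG_ICASE : 0)) != 0)
    return -1;

  rc = regexec (&rx, name, 0, NULL, 0);
  regfree (&rx);
  return rc == 0 ? 1 : 0;
}
```

With icase set, a pattern such as "^euc-jp$" also matches "EUC-JP"; without it, only the exact lower-case spelling matches. The pattern is still an ordinary regexp in both cases, so the flag changes case sensitivity and nothing else.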

BTW, can anyone ask Edmund to confirm this patch?

> > charset.c charset.h globals.h handler.c init.h parse.c rfc2047.c rfc2231.c:
> >     Assume charset of messages if not declared ($assumed_charset)
> >     (TAKIZAWA Takashi)
> 
> not in at this point.

Hmmm...

> > globals.h init.h mutt.h sendlib.c:
> >     Fix a bug in forwarding messages as MIME attachment ($file_charset)
> >     (TAKIZAWA Takashi)
> 
> I don't think I get what bug is being fixed here; not in.

Sorry, maybe I forgot to explain it.
It is clearly a bug. To reproduce:
0) Set $charset to something other than us-ascii.
1) Forward a message that is neither us-ascii nor
 in your $charset, as a MIME attachment (message/rfc822).
2) Look! The message has a *us-ascii* MIME attachment.
 And it's heavily garbled.

Since you are a programmer, let me give you another explanation.
Look at sendlib.c (line 812):

: /*
:  * Find the first of the fromcodes that gives a valid conversion and
:  * the best charset conversion of the file into one of the tocodes. If
:  * successful, set *fromcode and *tocode to dynamically allocated
:  * strings, set CONTENT *info, and return the number of characters
:  * converted inexactly. If no conversion was possible, return -1.
:  *
:  * Both fromcodes and tocodes may be colon-separated lists of charsets.
:  * However, if fromcode is zero then fromcodes is assumed to be the
:  * name of a single charset even if it contains a colon.
:  */
: static size_t convert_file_from_to (FILE *file,
:                                   const char *fromcodes, const char *tocodes,
:                                   char **fromcode, char **tocode, CONTENT *info)

But the only place that calls this function is (line 936):

:    char *chs = mutt_get_parameter ("charset", b->parameter);
:    if (Charset && (chs || SendCharset) &&
:       convert_file_from_to (fp, Charset, chs ? chs : SendCharset,
:                             0, &tocode, info) != (size_t)(-1))
:    {

So the fourth argument (char **fromcode) is always passed as 0 and is
never actually used. That is why a forwarded message ends up as us-ascii.

Note also that the second argument (char *fromcodes) accepts a
colon-separated list of charsets, but mutt just passes Charset.
So the file_charset patch simply uses an ability mutt already has.
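
The idea of trying a colon-separated list of source charsets can be sketched with plain iconv(3). This is not mutt's convert_file_from_to: the helper name first_working_fromcode is hypothetical, it works on an in-memory buffer instead of a FILE, and it only picks the first charset that converts without error (it does not count inexact conversions or choose among several tocodes).

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not mutt code.
 * Walk the colon-separated list `fromcodes` and return a malloc'ed copy
 * of the first charset from which `in` converts to `tocode` without
 * error. Returns NULL if no candidate works. The caller frees the result. */
static char *first_working_fromcode (const char *fromcodes, const char *tocode,
                                     const char *in, size_t inlen)
{
  const char *p = fromcodes;

  assert (fromcodes != NULL && tocode != NULL && in != NULL);

  while (*p)
  {
    size_t n = strcspn (p, ":");          /* length of the next list entry */

    if (n > 0)
    {
      char *name = malloc (n + 1);
      iconv_t cd;

      if (!name)
        return NULL;
      memcpy (name, p, n);
      name[n] = '\0';

      cd = iconv_open (tocode, name);
      if (cd != (iconv_t)(-1))
      {
        char *ib = (char *) in;           /* iconv takes char **; input is not modified */
        size_t il = inlen;
        int ok = 1;

        while (il > 0 && ok)
        {
          char obuf[256];                 /* output is discarded; we only test validity */
          char *ob = obuf;
          size_t ol = sizeof (obuf);

          /* E2BIG only means the scratch buffer filled up; keep going. */
          if (iconv (cd, &ib, &il, &ob, &ol) == (size_t)(-1) && errno != E2BIG)
            ok = 0;
        }
        iconv_close (cd);
        if (ok)
          return name;
      }
      free (name);
    }

    p += n;
    if (*p == ':')
      p++;
  }
  return NULL;
}
```

For example, with fromcodes "US-ASCII:UTF-8" and a UTF-8 buffer containing a non-ASCII character, the first candidate is rejected and "UTF-8" is returned. A pure ASCII buffer is accepted by the first candidate already, which is why the order of the list matters.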

> > main.c:
> >     Correct "mutt -h"
> >     (Oswald Buddenhagen)
> 
> Committed.
> 
> > init.h doc/manual.sgml.head doc/manual.sgml.tail:
> >     Trivial documentation fixes
> >     (TAKAHASHI Tamotsu, Brendan Cully, Paul Walker, Derek Martin)
> 
> Committed.

Great!
That's good news, because I'm working on ja.po and the Japanese manual.

-- 
tamo
