Re: w3m can't show html mail with charset: gb2312



After checking the mail header, now I think I know what the problem is.

On Wed, Jul 20, 2005 at 09:18:12AM +0800, bxuefeng wrote:
>
> Here is a sample of those garbled messages:
> ==================
> Date: Mon, 18 Jul 2005 08:14:41 +0800
> From:刘德华 <andayd@xxxxxxxxx>
> Reply-To: 刘德华 <andayd@xxxxxxxxx>
> To: ze phyr <phyrster@xxxxxxxxx>
> Subject: Re: 有吗
> Mime-Version: 1.0
> Content-Type: multipart/alternative;
> Status: RO
> Content-Length: 1998
> Lines: 34
>

Have you noticed that the subject line (as well as the address lines) is
completely naked? By naked I mean that the characters are plain: they
haven't been encoded. An encoded subject must look like this (for example):

Subject: =?gb18030?B?suLK1A==?=

This tells mutt and many other mail clients that the subject is gb18030
encoded, so they know how to display it.
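
To see what such an encoded subject carries, here is a minimal Python sketch
that decodes it with the standard library's email.header module. The helper
name decode_subject is only for illustration; it is not what mutt does
internally.

```python
from email.header import decode_header


def decode_subject(raw_value):
    """Decode an RFC 2047 style header value into a readable string."""
    parts = []
    for data, charset in decode_header(raw_value):
        if isinstance(data, bytes):
            # Each encoded word names its own charset; unlabelled bytes
            # are treated as ASCII, with anything undecodable replaced.
            parts.append(data.decode(charset or "ascii", errors="replace"))
        else:
            # Plain, unencoded text comes back as str already.
            parts.append(data)
    return "".join(parts)


print(decode_subject("=?gb18030?B?suLK1A==?="))  # prints: 测试
```

The charset label inside the encoded word is the only reason the bytes can be
decoded reliably here.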

But if the subject is plain, mutt has no idea which coding system to apply,
so it falls back to its default setting.

Since your locale is en_US.UTF-8 and the subject is displayed correctly, I
guess the plain subject is IN FACT UTF-8 ENCODED. But I'm really not sure
about this, so you'd better check, which will help us settle the problem.
You can press `e' to call up your editor, say Emacs; if the editor is
powerful enough, it can tell you which coding system the text uses.
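
If you'd rather check with a script than with an editor, a rough Python
sketch like the following can help. It is only a heuristic: it tries strict
UTF-8 first (GB-encoded Chinese text almost never happens to be valid UTF-8)
and then the GB family. The function name and the candidate list are my own
assumptions.

```python
def guess_charset(raw, candidates=("utf-8", "gb2312", "gb18030")):
    """Return the first candidate charset that decodes raw without error.

    A heuristic only: a successful decode does not prove the guess is right.
    """
    if raw.isascii():  # bytes.isascii() needs Python 3.7 or later
        return "us-ascii"
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# The raw bytes would normally be read from the mailbox file in binary mode.
print(guess_charset("有吗".encode("utf-8")))
print(guess_charset("有吗".encode("gb2312")))
```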

> Content-Type: text/html; charset=GB2312
> Content-Transfer-Encoding: base64
> Content-Disposition: inline

As the `charset' indicates, the body is gb2312 encoded, so if you decode the
text as utf-8, you surely can't get the right result!
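
A small Python sketch of that mismatch, using a made-up two-character body
instead of your real message: the same base64 payload is decoded once with
the declared charset and once as utf-8. The helper name decode_body is
hypothetical.

```python
import base64


def decode_body(payload_b64, charset):
    """Undo the base64 transfer encoding, then decode with the given charset."""
    raw = base64.b64decode(payload_b64)
    # errors="replace" shows the damage instead of raising an exception.
    return raw.decode(charset, errors="replace")


text = "测试"
payload = base64.b64encode(text.encode("gb2312"))  # what the mail would carry

right = decode_body(payload, "gb2312")  # charset as declared in the header
wrong = decode_body(payload, "utf-8")   # charset of the terminal, wrongly applied

print(right)
print(wrong)  # replacement characters instead of the original text
```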

Here's a summary: 

1. The subject line is plain by mistake, so mutt doesn't know which coding
system to apply to it.

2. The body is gb2312 encoded, which may differ from the encoding used for
the subject. Since the subject is plain, it becomes really hard to display
both correctly at the same time.

I guess you can write a program to settle this (just encode the subject
correctly), which is not hard work. Or you can manually call Emacs to edit
the message, where you can convert it to any coding system you wish, and
that settles the problem too. I think many other editors can also do this
job, but I only know Emacs.
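
Such a program could look roughly like this in Python. It is a sketch, not a
finished tool: it assumes the raw subject bytes really are UTF-8 (which still
has to be verified), and the function name encode_subject is made up.

```python
from email.header import Header, decode_header


def encode_subject(raw_subject, assumed_charset="utf-8"):
    """Turn raw, unlabelled subject bytes into a properly encoded header value.

    assumed_charset is a guess about the raw bytes; a wrong guess either
    raises UnicodeDecodeError or produces a wrongly labelled subject.
    """
    text = raw_subject.decode(assumed_charset)
    return Header(text, assumed_charset).encode()


encoded = encode_subject("有吗".encode("utf-8"))
print("Subject: " + encoded)

# Round trip: the encoded form now carries its charset with it.
data, charset = decode_header(encoded)[0]
print(data.decode(charset))
```

The output is plain ASCII, so any mail client can read the charset label and
display the subject without guessing.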

> 
> [-- Autoview using elinks -dump -dump-charset utf-8 -default-mime-type 
> text/html '/tmp/mutt9os81t' --]
> 
> 
> \277\311\304\334\312\307\304\343mutt\323ʼ\376\277\47\266\313 Encoding
> û\311\350\326ú\303
> \301\355\315\342,mutt is a text-based mail client for Unix operating
> systems,\266\370Gmail\265\304\323ʼ\376\317\326\324\332Ĭ\310\317\312Ǹ\273ý\314\345\270\361ʽ
> \323\303webmail
> \265\307½,Firefox,Mozilla,Opera,Konqeror\317¶\274OK.
> ==================
> 
> As you can see, Subject and user's name are displayed correctly, but the
> mail body is rendered into meaningless numbers. If I change the html viewer
> to w3m and add a line like this in mailcap:
> 
> text/html; w3m -T text/html -O utf-8 -dump %s; copiousoutput
> 
> the message body becomes this: 
> ==================
> [-- Autoview using w3m -T text/html -O utf-8 -dump '/tmp/muttsNST1c' --]
>      羭    ҵ  ?
>                            ¥    ,            Linux        ,    ?
> 
>                                       ,              .
> ==================
> 
> you see, it becomes even worse. 
> 
> > Also, you'd better make sure that the contents are really encoded as
> > gb2312, as declared in the header. Some mail clients, especially some
> > foolish webmail systems, don't make sure that the coding system actually
> > used agrees with what is declared in the message header. You'd better
> > check it manually.
> 
> This garbled message is from gmail so I assume they should have done the
> right job. How can I check message header manually?
> 
> regards, 
> 
> 
> bxuef