Re: w3m can't show html mail with charset: gb2312

To: bxuefeng <phyrster@xxxxxxxxx>
Subject: Re: w3m can't show html mail with charset: gb2312
From: Haizi Zheng <haizi_zh@xxxxxxxxxxxx>
Date: Wed, 20 Jul 2005 15:20:43 +0800
Cc: mutt-users@xxxxxxxx
In-reply-to: <20050720011812.GA7116@xxxxxxxxxx>
List-unsubscribe: <mailto:mutt-users-request@mutt.org?body=unsubscribe>
Mail-followup-to: bxuefeng <phyrster@xxxxxxxxx>, mutt-users@xxxxxxxx
References: <20050719140529.GA6229@xxxxxxxxxx> <20050719193330.GB6133@xxxxxxxxxxxxxxxx> <20050720011812.GA7116@xxxxxxxxxx>
Reply-to: Haizi Zheng <haizi_zh@xxxxxxxxxxxx>
Sender: owner-mutt-users@xxxxxxxx
User-agent: Mutt/1.5.9i

After checking the mail headers, I think I know what the problem is.

On Wed, Jul 20, 2005 at 09:18:12AM +0800, bxuefeng wrote:
>
> Here is a sample of those garbled messages:
> ==================
> Date: Mon, 18 Jul 2005 08:14:41 +0800
> From:刘德华 <andayd@xxxxxxxxx>
> Reply-To: 刘德华 <andayd@xxxxxxxxx>
> To: ze phyr <phyrster@xxxxxxxxx>
> Subject: Re: 有吗
> Mime-Version: 1.0
> Content-Type: multipart/alternative;
> Status:
> RO
> Content-Length:
> 1998
> Lines:
> 34
>

Have you noticed that the subject line (as well as the address lines) is in
fact totally naked? By naked I mean that the characters are plain: they
haven't been encoded. An encoded one must look like this (for example):

Subject: =?gb18030?B?suLK1A==?=

This tells mutt and most other mail clients that the subject is gb18030
encoded (the `B' means the bytes are base64), so they know how to display it.
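
To make the example above concrete, here is a minimal Python sketch that takes
that encoded subject apart. It only uses the standard `email.header` and
`base64` modules; the variable names are mine, chosen for illustration.

```python
import base64
from email.header import decode_header

encoded_subject = "=?gb18030?B?suLK1A==?="

# decode_header splits the encoded-word into (raw bytes, declared charset).
raw_bytes, charset = decode_header(encoded_subject)[0]

# The charset label is what lets a client turn those bytes into text.
subject_text = raw_bytes.decode(charset)

# The same bytes can be recovered by hand from the base64 part alone,
# but without the label there is no way to know how to interpret them.
by_hand = base64.b64decode("suLK1A==")
```

Here `raw_bytes` and `by_hand` hold the same four bytes, and `subject_text` is
the two-character Chinese word they stand for in gb18030.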

But if the subject is plain, mutt has no idea which character set applies
to it, so it falls back on its default setting.

Since your locale is en_US.UTF-8 and the subject is displayed correctly, I
guess the plain subject is IN FACT UTF-8 ENCODED. But I'm really not sure
about this, so you'd better check it, which will help us settle the
problem. You can press `e' to open the message in your editor, say Emacs; if
your editor is powerful enough, it can tell you which coding system is in use.
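
If an editor doesn't tell you, a rough check can be scripted. The sketch below
is only an assumption about how one might test it: the function name
`plausible_charsets` and the candidate list are hypothetical, and the sample
text stands in for the raw header bytes you would read from the mailbox file.

```python
def plausible_charsets(raw, candidates=("utf-8", "gb2312")):
    """Return the candidate charsets under which `raw` decodes without error."""
    ok = []
    for name in candidates:
        try:
            raw.decode(name)  # strict decoding: invalid sequences raise
        except UnicodeDecodeError:
            continue
        ok.append(name)
    return ok


sample = "Re: 有吗"

# The same text stored in two different ways, as it might appear on disk.
as_utf8 = plausible_charsets(sample.encode("utf-8"))
as_gb = plausible_charsets(sample.encode("gb2312"))
```

For this sample each byte string is accepted by only one of the two codecs.
Pure ASCII text passes both, and some byte sequences are valid in more than
one charset, so treat the result as a hint and not as proof.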

> Content-Type:
> text/html;
> charset=GB2312
> Content-Transfer-Encoding:
> base64
> Content-Disposition:
> inline

As the `charset' parameter indicates, the body is gb2312 encoded, so if you
decode the text as utf-8, you surely won't get the right result!
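
A small Python sketch of that mismatch, using a made-up one-line body and not
your actual message: the part is declared as GB2312 and base64, as in the
headers quoted above, and is decoded once with the declared charset and once
as utf-8.

```python
import base64
from email import message_from_bytes

# Hypothetical stand-in for the real HTML body.
body_html = "<p>测试</p>"
payload = base64.b64encode(body_html.encode("gb2312"))

raw_message = (
    b"Mime-Version: 1.0\r\n"
    b"Content-Type: text/html; charset=GB2312\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n" + payload + b"\r\n"
)

msg = message_from_bytes(raw_message)
data = msg.get_payload(decode=True)   # undoes the base64 layer only
declared = msg.get_content_charset()  # charset from Content-Type, lowercased

text = data.decode(declared)          # correct: use the declared charset

try:
    data.decode("utf-8")              # wrong charset for these bytes
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

A viewer that is told (or assumes) utf-8 for such a body has the same problem
as the last step here: the bytes simply aren't valid utf-8.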

Here's a summary:

1. The subject line is plain by mistake, so mutt doesn't know which character
set applies to it.

2. The body is gb2312 encoded, which may differ from the encoding used for the
subject. Because the subject carries no charset label, it is really hard to
display both correctly at the same time.

I guess you could write a program to fix this (it just has to encode the
subject correctly), which isn't hard. Or you can manually open the message in
Emacs, where you can convert it to any coding system you wish, and that
settles the problem. I think many other editors can do this job too, but I
only know Emacs.
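
Such a program could look roughly like the Python sketch below. It assumes you
already know (or have guessed) the charset of the raw subject bytes; the
function names `encode_raw_subject` and `decode_subject` are hypothetical, and
rewriting the header inside the mailbox file is left out.

```python
from email.header import Header, decode_header


def encode_raw_subject(raw, assumed_charset="utf-8", out_charset="utf-8"):
    """Turn raw 8-bit subject bytes into a properly labelled encoded header."""
    text = raw.decode(assumed_charset)        # fails loudly if the guess is wrong
    return Header(text, out_charset).encode()  # RFC 2047 encoded-word(s)


def decode_subject(value):
    """Decode an encoded subject back to text, to check the round trip."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii")
        parts.append(chunk)
    return "".join(parts)


raw_subject = "Re: 有吗".encode("utf-8")  # stand-in for the bytes in the mailbox
fixed = encode_raw_subject(raw_subject)
round_trip = decode_subject(fixed)
```

The value in `fixed` is pure ASCII and names its charset, so mutt no longer
has to guess. If the raw subject turns out to be gb2312, pass
`assumed_charset="gb2312"` instead.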

> 
> [-- Autoview using elinks -dump -dump-charset utf-8 -default-mime-type 
> text/html '/tmp/mutt9os81t' --]
> 
> 
> \277\311\304\334\312\307\304\343mutt\323ʼ\376\277\47\266\313 Encoding
> û\311\350\326ú\303
> \301\355\315\342,mutt is a text-based mail client for Unix operating
> systems,\266\370Gmail\265\304\323ʼ\376\317\326\324\332Ĭ\310\317\312Ǹ\273ý\314\345\270\361ʽ
> \323\303webmail
> \265\307½,Firefox,Mozilla,Opera,Konqeror\317¶\274OK.
> ==================
> 
> As you can see, the Subject and the user's name are displayed correctly, but
> the mail body is rendered as meaningless numbers. If I change the html viewer
> to w3m and add a line like this in mailcap:
> 
> text/html; w3m -T text/html -O utf-8 -dump %s; copiousoutput
> 
> the message body becomes this: 
> ==================
> [-- Autoview using w3m -T text/html -O utf-8 -dump '/tmp/muttsNST1c' --]
>      羭    ҵ  ?
>                            ¥    ,            Linux        ,    ?
> 
>                                       ,              .
> ==================
> 
> you see, it becomes even worse. 
> 
> > Also, you'd better make sure that the contents are really encoded as
> > gb2312, as declared in the header. Some mail clients, especially some
> > foolish webmail systems, don't make sure that the encoding actually used
> > is in accord with what is declared in the message header. You'd better
> > check it manually.
> 
> This garbled message is from gmail so I assume they should have done the
> right job. How can I check message header manually?
> 
> regards, 
> 
> 
> bxuef
