Re: japanese text in email body



 On Thursday, June 17, 2004 at 6:40:21 AM +0900, Henry Nelson wrote:

> How do you all do with the character "??"?  My use of this (a one
> inside of a circle) caused a bit of havoc in a recent e-mail.

    You mean "①", the U+2460 circled digit one? In your mail it was
replaced by a pair of question marks.

    Hmm... According to the glibc 2.3.2 charmap tables, this character
is not part of EUC-JP. It exists only in:

| glibc-2.3.2/localedata/charmaps>$ grep -l U2460 *
| BIG5-HKSCS
| CP949
| EUC-JISX0213
| EUC-KR
| EUC-TW
| GB18030
| GB2312
| GBK
| JOHAB
| SHIFT_JISX0213
| UTF-8

    Even if your EUC-JP terminal is in some way enhanced and has U+2460,
iconv is not aware of this, and will fail to convert it from EUC-JP to
anything else.

    Which 2 bytes, in hex, encode the circled one on your terminal?
Here it takes 3 bytes in UTF-8:

| $ echo -n "①" | hex
| E2 91 A0
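
    The same check can be sketched in Python. This is only an
illustration: encode_hex is a hypothetical helper, and Python's codec
tables are assumed here to agree with the glibc charmaps, which they
need not do in every case.

```python
def encode_hex(char, codec):
    """Return the encoded bytes as spaced hex, or None if codec cannot encode char."""
    try:
        data = char.encode(codec)
    except UnicodeEncodeError:
        return None
    return " ".join(f"{byte:02X}" for byte in data)


CIRCLED_ONE = "\u2460"  # U+2460, circled digit one

# Codec names are Python's spellings, not glibc's.
for codec in ("utf-8", "euc_jp", "euc_jisx0213", "shift_jisx0213"):
    print(codec, encode_hex(CIRCLED_ONE, codec))
```

    A None result means that codec has no mapping for the character,
which is the situation described above for plain EUC-JP.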


    [sending EUC or UTF]
> There is however the problem of assuming your recipient will have a
> "modern mailer". [...] some of my students use pretty old equipment
> and OSs

    I guess that when they receive 2022-JP, it's OK. But when they
receive EUC-JP, Shift-JIS, or UTF-8, there is a 2 in 3 risk of failure
as soon as the terminal charset doesn't match... Or are your students
behind your magic NKF all-to-EUC converter?


>> When you post a new message [snip advice .sig]
> I knew this. Sorry if I messed up.

    You are easily excused: It was a general advice .sig, not directed
at you at all. You've done nothing wrong. ;-)


    [3 shoguns]
> I don't view these other than as "=1B$B>..." or "=BE=AD...", i.e., an
> equal sign followed by random letters or symbols?

    What you quoted is verbatim what I sent, but in raw encoded form.
For everyone else, Mutt hides the multipart structure, decodes the
parts, and renders them nicely.

    I'm afraid your nasty procmail rule or gawk script that auto-detaches
and removes attachments is at fault. It destroys the MIME structure
and loses necessary information. After that, no mailer has a chance to
properly decode the mess.

    You may try to <edit-type> (^E) my mail and replace the value at the
prompt with:

| multipart/mixed; boundary="ZGiS0Q5IWpPtfppv"

    ...but I'm not sure of the result. You will perhaps gain the shoguns
but lose my text.
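
    To illustrate why a lost multipart header leaves only "=1B$B..."
noise, here is a minimal sketch using Python's email package. The
boundary, the helper names and the sample text are all made up for the
example (the two kanji are assumed to be the ones for shogun); it is
not a reconstruction of the real mail or of the procmail rule.

```python
import quopri
from email import message_from_bytes

BOUNDARY = b"hypothetical-boundary"  # invented for this sketch
TOP_HEADER = b'Content-Type: multipart/mixed; boundary="' + BOUNDARY + b'"\n'


def build_raw(text):
    """Build a one-part multipart mail with a quoted-printable ISO-2022-JP body."""
    encoded = quopri.encodestring(text.encode("iso2022_jp"))
    return (
        b"MIME-Version: 1.0\n"
        + TOP_HEADER
        + b"\n"
        + b"--" + BOUNDARY + b"\n"
        + b"Content-Type: text/plain; charset=iso-2022-jp\n"
        + b"Content-Transfer-Encoding: quoted-printable\n"
        + b"\n"
        + encoded + b"\n"
        + b"--" + BOUNDARY + b"--\n"
    )


def read_text(raw):
    """Decode the first part if the structure is intact, else return the raw body."""
    msg = message_from_bytes(raw)
    if msg.is_multipart():
        part = msg.get_payload(0)
        data = part.get_payload(decode=True)  # undo quoted-printable
        return data.decode(part.get_content_charset()).strip()
    return msg.get_payload()  # no structure left: raw encoded text


SAMPLE = "\u5c06\u8ecd"  # assumed sample text
intact = build_raw(SAMPLE)
damaged = intact.replace(TOP_HEADER, b"")  # simulate a filter dropping the header

print(read_text(intact) == SAMPLE)
print("=1B$B" in read_text(damaged))
```

    With the top-level Content-Type gone, the parser has no boundary to
split on, so the part headers and the encoded body come back as one
undecoded blob.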


 On Wednesday, June 16, 2004 at 4:49:04 PM -0700, Dave Driscoll wrote:

> Date: Wed, 16 Jun 2004 16:49:04 -0700

    Your clock or timezone seems ahead by 3 hours.


> The Japanese you sent above displays correctly in mutt.

    You see the 2 double-width Kanji for shogun? Good! This probably
means your $charset is OK and matches your terminal. What does
":set ?charset" give in mutt? What is your locale? What is your terminal?


> when I start a reply using emacs it shows the text incorrectly.

    And my shogun that you quoted is garbled: The result is 6 chars,
mostly U+FFFD replacement chars, plus 2 random Kanji, U+5030 and U+803B.

    The editor for Mutt should read and write in the current terminal
charset, perhaps as instructed by the locale, without trying to be smart
about recognising and converting. Just configure Emacs this way, and it
should work better.
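
    A quick way to see whether two charset settings agree is to compare
them after alias resolution. The sketch below is only an illustration:
same_charset and check are hypothetical helpers, and Python's idea of
the locale charset is assumed to reflect what the terminal really uses.

```python
import codecs
import locale


def same_charset(a, b):
    """True when two charset names resolve to the same Python codec."""
    try:
        return codecs.lookup(a).name == codecs.lookup(b).name
    except LookupError:
        return False  # at least one name is unknown to Python


def check(configured):
    """Compare a configured charset with the one the locale announces."""
    from_locale = locale.getpreferredencoding(False)
    return from_locale, same_charset(configured, from_locale)


print(same_charset("UTF-8", "utf8"))
print(check("utf-8"))
```

    The value passed to check would be whatever the editor or mailer is
configured with; a False in the second position hints at a mismatch
worth investigating.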


> Emacs does show the following snip from an email from Japan correctly
> but mutt does not.

    Unfortunately your example was also garbled: Mostly replacement
chars. I can't infer anything for sure. Only a wild guess: the charset
of the mail was not the announced 2022-JP, Emacs autodetected the real
charset and succeeded, while Mutt trusted the false 2022 label and
failed.
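
    The difference between trusting a label and guessing can be
sketched as follows. This is not how Emacs or Mutt actually work
internally; decode_mail, the fallback list and the sample text are
assumptions made for the illustration.

```python
def decode_mail(raw, label, fallbacks=("utf-8", "euc_jp", "shift_jis")):
    """Try the announced charset strictly, then each fallback in order.

    Returns (text, codec_used). codec_used is None when nothing decoded
    cleanly and the bytes were read as ASCII with replacement chars.
    """
    for codec in (label, *fallbacks):
        try:
            return raw.decode(codec), codec
        except (UnicodeDecodeError, LookupError):
            continue
    return raw.decode("ascii", errors="replace"), None


SAMPLE = "\u3053\u3093\u306b\u3061\u306f"  # hypothetical hiragana sample
raw = SAMPLE.encode("euc_jp")              # real charset of the bytes

# The label says ISO-2022-JP, but the bytes are EUC-JP.
text, used = decode_mail(raw, "iso-2022-jp")
print(used, text == SAMPLE)
```

    A mailer that stops after the first attempt fails on such a mail,
while a guessing editor may succeed. The order of the fallbacks
matters, and a guess can also be wrong without any error being raised.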


> This snip always shows incorrect in an xterm window. In a kterm window
> cat and more work but less does not.

    What is the charset of your xterm? And of your kterm?


 On Thursday, June 17, 2004 at 6:53:00 AM +0900, Henry Nelson wrote:

> On Wed, Jun 16, 2004 at 04:49:04PM -0700, Dave Driscoll wrote:
>> On Wed, Jun 16, 2004 at 12:58:16PM +0200, Alain Bench wrote:
>>> [shogun]
> Why is this totally garbled for me?

    And a second level of garbling... I wrote 2 Kanji, Dave quoted 6
other chars, you quoted 10 totally different chars! At least, what you
see and describe matches what you quote, unlike Dave. ;-)


> Any clues?

    Dave's mail was UTF-8. Your procmail rules wrongly interpreted it as
being EUC-JP, and converted it, or overwrote the label. Or something
like that. I get the same strange chars as you if I iconv Dave's raw
UTF-8 mail from EUC-JP to ISO-2022-JP. After such corruption, no mailer
can display it right. Not Mutt's fault.
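
    This kind of corruption is easy to reproduce. In the sketch below,
Python's codecs stand in for iconv, which is an assumption: the two may
treat invalid bytes differently. mislabel is a hypothetical helper and
the sample text is assumed, not taken from the mails.

```python
def mislabel(text, real="utf-8", wrong="euc_jp", target="iso2022_jp"):
    """Encode text in its real charset, misread it as another, then convert.

    Returns (misread_text, converted_bytes).
    """
    raw = text.encode(real)
    misread = raw.decode(wrong, errors="replace")    # wrong interpretation
    converted = misread.encode(target, errors="replace")
    return misread, converted


SAMPLE = "\u5c06\u8ecd"  # assumed sample text, 2 kanji
misread, converted = mislabel(SAMPLE)

print(len(SAMPLE), "chars in,", len(misread), "chars after misreading")
print([f"U+{ord(ch):04X}" for ch in misread])
print(converted)
```

    Once the wrong bytes have been written out under the new label, the
original text cannot be recovered by the receiving mailer, whatever it
does.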


Bye!    Alain.
-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?