Re: japanese text in email body



[Alain, sorry for the confusion.  I'll move back on list, one point
at a time so others do not lose context.]

On Fri, Jun 18, 2004 at 06:39:14PM +0200, Alain Bench wrote:
>  On Thursday, June 17, 2004 at 6:40:21 AM +0900, Henry Nelson wrote:
> 
> > How do you all do with the character "??"?  My use of this (a one
> > inside of a circle) caused a bit of havoc in a recent e-mail.
> 
>     You mean "①", the U+2460 circled digit one? In your mail it was
> replaced by a pair of question marks.
> 
>     Humm... According to the Glibc 2.3.2 charmap tables, this character
> is not part of EUC-JP. It exists only in:
> 
> | glibc-2.3.2/localedata/charmaps>$ grep -l U2460 *
> | BIG5-HKSCS
> | CP949
> | EUC-JISX0213
[...]
> | SHIFT_JISX0213
> | UTF-8
> 
>     Even if your EUC-JP terminal is in some way enhanced and has U+2460,
> iconv is not aware of this, and will fail to convert it from EUC-JP to
> anything else.

From your many hints I finally have discovered that iconv WILL convert
among the Japanese character sets that I loosely referred to as iso-2022-jp,
shift-jis and euc-jp; it's just that you have to correctly identify the
extended-capability charset.  (This is unlike nkf, which, as you guessed,
is very lax in interpreting "-E" to accept either "euc-jp" or "euc-jisx0213"
as input.)

Specifically:
% echo "(1)(2)(3)" > circle123.euc.txt   ## (n) represents the circled digit
% hexdump circle123.euc.txt
0000000 a1ad a2ad a3ad 000a
0000007
% iconv -f euc-jisx0213 -t iso-2022-jp-3 < circle123.euc.txt > circle123.iso.txt
% hexdump circle123.iso.txt
0000000 1b24 284f 2d21 2d22 2d23 1b28 420a
000000e
% iconv -f euc-jisx0213 -t cp932 < circle123.euc.txt > circle123.pck.txt
% hexdump circle123.pck.txt
0000000 8740 8741 8742 0a00
0000007
% iconv -f euc-jisx0213 -t utf-8 < circle123.euc.txt > circle123.utf.txt
% hexdump circle123.utf.txt
0000000 e291 a0e2 91a1 e291 a20a
000000a
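
For readers without a glibc iconv at hand, the same check can be sketched in Python. This is only an illustration, not what Mutt or iconv does internally: it assumes Python's codec names euc_jp, euc_jisx0213, iso2022_jp_3 and cp932 stand in for the iconv charsets used above, and it assumes the input bytes are AD A1, AD A2, AD A3, LF, which is what the converted outputs imply (hexdump prints 16-bit words, so the displayed byte order can vary). The helper names convert and try_convert are made up for this example.

```python
# Bytes the iconv outputs above imply for the input file: AD A1, AD A2, AD A3, LF.
RAW_EUC = bytes.fromhex("ada1ada2ada30a")


def convert(data: bytes, src: str, dst: str) -> bytes:
    """Decode data from charset src and re-encode it as dst (like iconv -f src -t dst)."""
    return data.decode(src).encode(dst)


def try_convert(data: bytes, src: str, dst: str):
    """Return the converted bytes, or None if either codec rejects the data."""
    try:
        return convert(data, src, dst)
    except (UnicodeDecodeError, UnicodeEncodeError):
        return None


if __name__ == "__main__":
    # Compare the plain and the extended EUC source charsets side by side.
    for src in ("euc_jp", "euc_jisx0213"):
        for dst in ("iso2022_jp_3", "cp932", "utf-8"):
            out = try_convert(RAW_EUC, src, dst)
            shown = out.hex(" ") if out is not None else "conversion failed"
            print(f"{src:>13} -> {dst:<12} {shown}")
```

If the plain euc_jp rows fail while the euc_jisx0213 rows succeed, that mirrors the point above: the conversion works once the extended charset is named as the source.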

NOW, how can I put this knowledge to work to get Mutt to display characters
represented in euc-jisx0213, and also to pass them to my editor when I reply
to mail?

As you noticed, I can produce and display those circled digits on the command
line of my shell via the terminal emulator TeraTermPro.  I can also use
them in my editor.  In other words, it seems that my LC_CTYPE locale,
"ja_JP.eucJP", includes the euc-jisx0213 character set.  How do I let Mutt
know about that capability and get iconv to do the right conversion?

I think another way of putting it is: how can I get Mutt to behave like
nkf and accept both euc-jp and euc-jisx0213?

With this mail I am trying charset="euc-jisx0213".  I wonder if you can see
this left-pointing double angle quotation mark ("<<"): "≪"?  (This mail
is composed in nvi; Mutt takes it and passes it directly to sendmail.  There
are no filters or other processing applied to outgoing mail.)

With charset="euc-jisx0213" the "≪" included in your private mail
shows up as "\251\250".
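
To see which bytes an octal display such as "\251\250" stands for, and what a given charset makes of them, something like the following Python sketch may help. It is an assumption-laden illustration: the helper names octal_escapes_to_bytes and describe are invented here, and Python's euc_jp and euc_jisx0213 codecs are used as stand-ins for the corresponding iconv charsets.

```python
import re
import unicodedata


def octal_escapes_to_bytes(text: str) -> bytes:
    """Turn a display string like r'\\251\\250' into the raw bytes it represents."""
    return bytes(int(digits, 8) for digits in re.findall(r"\\([0-7]{3})", text))


def describe(data: bytes, charset: str) -> str:
    """Decode data with charset and name each character, or report the failure."""
    try:
        decoded = data.decode(charset)
    except UnicodeDecodeError:
        return f"not valid {charset}"
    return ", ".join(
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in decoded
    )


if __name__ == "__main__":
    raw = octal_escapes_to_bytes(r"\251\250")
    print(raw.hex(" "))  # the two bytes behind the octal display
    for charset in ("euc_jp", "euc_jisx0213"):
        print(charset, "->", describe(raw, charset))
```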

-- 
henry nelson
 | day job: | http://yuba.kcn.ne.jp/biorec/nehan/henken.html