Re: japanese text in email body
[Alain, sorry for the confusion. I'll move back on list, one point
at a time so others do not lose context.]
On Fri, Jun 18, 2004 at 06:39:14PM +0200, Alain Bench wrote:
> On Thursday, June 17, 2004 at 6:40:21 AM +0900, Henry Nelson wrote:
>
> > How do you all do with the character "??"? My use of this (a one
> > inside of a circle) caused a bit of havoc in a recent e-mail.
>
> You mean "①", the U+2460 circled digit one? In your mail it was
> replaced by a pair of question marks.
>
> Humm... According to the Glibc 2.3.2 charmap tables, this character
> is not part of EUC-JP. It exists only in:
>
> | glibc-2.3.2/localedata/charmaps>$ grep -l U2460 *
> | BIG5-HKSCS
> | CP949
> | EUC-JISX0213
[...]
> | SHIFT_JISX0213
> | UTF-8
>
> Even if your EUC-JP terminal is in some way enhanced and has U+2460,
> iconv is not aware of this, and will fail to convert it from EUC-JP to
> anything else.
From your many hints I have finally discovered that iconv WILL convert
among the Japanese character sets that I loosely referred to as iso-2022-jp,
shift-jis and euc-jp; you just have to correctly identify the
extended-capability charset. (This is unlike nkf, which, as you guessed,
is very lax in interpreting "-E" and accepts either "euc-jp" or
"euc-jisx0213" as input.)
Specifically:
% echo "(1)(2)(3)" > circle123.euc.txt ## (n) represents the circled digit
% hexdump circle123.euc.txt
0000000 a1ad a2ad a3ad 000a
0000007
% iconv -f euc-jisx0213 -t iso-2022-jp-3 < circle123.euc.txt > circle123.iso.txt
% hexdump circle123.iso.txt
0000000 1b24 284f 2d21 2d22 2d23 1b28 420a
000000e
% iconv -f euc-jisx0213 -t cp932 < circle123.euc.txt > circle123.pck.txt
% hexdump circle123.pck.txt
0000000 8740 8741 8742 0a00
0000007
% iconv -f euc-jisx0213 -t utf-8 < circle123.euc.txt > circle123.utf.txt
% hexdump circle123.utf.txt
0000000 e291 a0e2 91a1 e291 a20a
000000a
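For anyone without these files at hand, the same conversions can be
sketched in Python. This is only an illustration: the codec names below
are Python's, and treating them as equivalents of the iconv names used
above is an assumption. The bytes are printed in file order, whereas
hexdump's default format groups them into 16-bit words in host byte
order, so pairs can look swapped between the two.

```python
# Circled digits one to three (U+2460..U+2462) followed by a newline.
text = "\u2460\u2461\u2462\n"

# Python codec names standing in for the iconv charset names (assumed
# equivalents; not checked against any particular iconv build).
codecs_to_try = ["euc_jisx0213", "iso2022_jp_3", "cp932", "utf_8"]


def dump(codec: str) -> str:
    """Return the encoded bytes as space-separated hex pairs, in file order."""
    return text.encode(codec).hex(" ")


for name in codecs_to_try:
    print(f"{name:14} {dump(name)}")
```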
NOW, how can I put this knowledge to work to get Mutt to display characters
represented in euc-jisx0213, and also to pass them to my editor when I reply
to mail?
As you notice, I can produce and display those circled digits on the command
line of my shell via the terminal emulation TeraTermPro. I can also use
them in my editor. In other words, it seems that my locale (LC_CTYPE),
"ja_JP.eucJP" includes the euc-jisx0213 character set. How do I let Mutt
know about that capability and get iconv to do the right conversion?
I think another way of putting it is: how can I get Mutt to behave like
nkf and accept both euc-jp and euc-jisx0213?
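To make "accept both" concrete, here is a minimal sketch of that kind of
lenient decoding in Python. It says nothing about how Mutt or nkf work
internally; decode_lenient_euc is a hypothetical helper that tries the
stricter codec first and falls back to the extended one.

```python
def decode_lenient_euc(raw: bytes):
    """Decode EUC bytes, preferring strict EUC-JP over EUC-JISX0213.

    Returns a (text, codec_name) pair. Hypothetical helper for
    illustration only; it is not part of Mutt, nkf or iconv.
    """
    for codec in ("euc_jp", "euc_jisx0213"):
        try:
            return raw.decode(codec), codec
        except UnicodeDecodeError:
            continue  # try the next, more permissive codec
    raise ValueError("input is neither valid EUC-JP nor EUC-JISX0213")


plain = "日本語".encode("euc_jp")             # ordinary JIS X 0208 text
circled = "\u2460".encode("euc_jisx0213")     # circled digit one

print(decode_lenient_euc(plain))
print(decode_lenient_euc(circled))
```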
With this mail I am trying charset="euc-jisx0213". I wonder if you can see
this left-pointing double angle quotation mark ("<<"): "≪"? (This mail
is composed in nvi, mutt takes it and passes it directly to sendmail. There
are no filters or other processing applied to outgoing mail.)
With charset="euc-jisx0213", the "≪" included in your private mail
shows up as "\251\250".
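One way to find out which character those two octal escapes stand for is
to decode them directly. The sketch below assumes the bytes really are
EUC-JISX0213, as the charset setting suggests; it prints whatever Python's
codec table maps them to rather than asserting a particular answer.

```python
import unicodedata

# The two bytes as displayed, written here as octal values.
raw = bytes([0o251, 0o250])

# Assumption: the sender's text was encoded as EUC-JISX0213.
char = raw.decode("euc_jisx0213")

# Show the raw bytes, the code point and its Unicode name.
print(raw.hex(" "), f"U+{ord(char):04X}", unicodedata.name(char, "unknown"))
```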
--
henry nelson
| day job: | http://yuba.kcn.ne.jp/biorec/nehan/henken.html