
Re: japanese text in email body



 On Saturday, June 19, 2004 at 7:35:18 AM +0900, Henry Nelson wrote:

> On Fri, Jun 18, 2004 at 06:39:14PM +0200, Alain Bench wrote:
>> You mean "①", the U+2460 circled digit one? In your mail it was
>> replaced by a pair of question marks.
> Would doing "^E" and changing charset to "iso-2022-jp" display the
> character? (Maybe not since should already be labeled okay.)

    No, there were really 2 raw Ascii question marks. This is a
consequence of the char not existing in libiconv's idea of EUC-JP. Mutt
can't display it, nor pass it properly as a quote to the editor in a
reply. And when you type one in the editor, Mutt can't select a
$send_charset to convert it to. So Mutt replaces it by as many question
marks as there are bytes in the char.
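
    To make the "one question mark per byte" effect concrete, here is a
minimal Python sketch. It is not Mutt's code: the helper name
question_mark_fallback and its arguments are made up for illustration,
and Python's codecs stand in for libiconv's tables, which may differ.

```python
def question_mark_fallback(text, term_charset, target_charset):
    """Hypothetical helper: keep chars that target_charset can encode,
    and replace each other char by one '?' per byte it occupies in
    term_charset (the charset the char was typed or received in)."""
    out = []
    for ch in text:
        try:
            ch.encode(target_charset)
            out.append(ch)
        except UnicodeEncodeError:
            out.append("?" * len(ch.encode(term_charset)))
    return "".join(out)


# U+2460 takes 3 bytes in UTF-8 and has no Ascii equivalent.
print(question_mark_fallback("(\u2460)", "utf-8", "ascii"))  # (???)

# The case from this thread: the char as typed on an EUC-JISX0213
# terminal, with plain EUC-JP as the target.
print(question_mark_fallback("\u2460", "euc_jisx0213", "euc_jp"))
```

    The second call models the situation above: a 2 byte char on the
terminal side that the target charset doesn't know.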


>> if your EUC-JP terminal is in some way enhanced and has U+2460, iconv
>> is not aware of this
> I compile iconv with "--enable-extra-encodings"; maybe this makes it
> aware?

    Libiconv 1.9.2? Very interesting: --enable-extra-encodings doesn't
change EUC-JP, but adds an "EUC-JISX0213" charset.


> TeraTerm is set to receive and send "EUC"
> % echo -n "??" | hexdump              ## between quotes is (1)
> 0000000 ada1

    Humm... Grepping Glibc tables, there is only one charset that has
circled one U+2460 coded ADA1: EUC-JISX0213.

| glibc-2.3.2/localedata/charmaps>$ grep -i "u2460.*xad.*xa1" *
| EUC-JISX0213:<U2460>     /xad/xa1     CIRCLED DIGIT ONE

    This charset seems very near EUC-JP: The vast majority of common
chars are coded the same. It has some hundreds of chars more (like this
circled one), and some hundreds of chars fewer (like "™" the trade mark
sign U+2122 (little superscript letters TM) coded 8FA2EF in EUC-JP).
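
    The same comparison can be probed without grepping charmaps. This
is only a sketch using Python's own "euc_jp" and "euc_jisx0213" codecs,
which are neither the Glibc nor the libiconv tables, so take its output
as a hint and not as proof of what iconv does. The helper try_encode is
a name I made up.

```python
def try_encode(ch, codec):
    """Return the bytes of ch in codec, or None if codec lacks the char."""
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None


CHARS = (("CIRCLED DIGIT ONE", "\u2460"), ("TRADE MARK SIGN", "\u2122"))

for name, ch in CHARS:
    for codec in ("euc_jp", "euc_jisx0213"):
        raw = try_encode(ch, codec)
        shown = raw.hex().upper() if raw is not None else "not encodable"
        print("%-18s %-13s %s" % (name, codec, shown))
```

    Extending CHARS with a few Kanji you care about is a cheap way to
see where the two charsets disagree before touching the locale.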

    I'd advise you to experiment with creating or selecting a locale
with this charset, and setting $charset accordingly. Perhaps the
TeraTerm manual has some info? I said _experiment_, thoroughly: this
has effects on some Kanji.


> conversion _from_ ISO-2022-JP _to_ EUC-JP is triggered by characters
> between [a raw escape followed by "$B"] and [a raw escape followed by
> "(" and a "B" or "J"]. Are you saying that UTF-8 is stuck between the
> same two tags?

    Well no: There are no mode-changing escape sequences in UTF-8. Each
char is independent, coded by a variable number of bytes. All bytes
are >= 128, except for plain Ascii chars, which stay single bytes < 128.
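
    A short Python sketch of the difference, using Python's "utf-8" and
"iso2022_jp" codecs. The helper utf8_bytes is only an illustration
name; the shogun bytes are the EUC ones quoted later in this thread.

```python
def utf8_bytes(ch):
    """Return the UTF-8 encoding of ch as a list of byte values."""
    return list(ch.encode("utf-8"))


# Ascii stays one byte < 128; every byte of other chars is >= 128.
for ch in ("A", "\u00e9", "\u2460"):
    print(repr(ch), [hex(b) for b in utf8_bytes(ch)])

# The two Kanji of "shogun", decoded from their EUC-JP bytes.
shogun = b"\xbe\xad\xb7\xb3".decode("euc_jp")

# UTF-8: no escape byte anywhere, each char carries its own bytes.
print(shogun.encode("utf-8"))

# ISO-2022-JP: the Kanji sit between ESC $ B and ESC ( B.
print(shogun.encode("iso2022_jp"))
```

    The last line shows the two tags from your question around plain
7-bit bytes, while the UTF-8 line has no such framing at all.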


 On Saturday, June 19, 2004 at 6:11:13 PM +0900, Henry Nelson wrote:

> when I hit 'L' to follow up, all three of the "shoguns" were displayed
> correctly in the editor (nvi)

    Mutt has converted each shogun from its individual MIME charset to
your EUC $charset, and given the result to the editor. Nvi saved the
reply in EUC $charset. Upon sending, Mutt converted from EUC $charset
to the best suited $send_charset, here ISO-2022-JP. All is well. :-)
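
    The sending step can be sketched in Python as "take the first
charset of the list into which the text converts exactly". This is an
approximation and not Mutt's implementation: pick_send_charset is a
made-up name, and the candidate list is an assumption, since your real
$send_charset setting was not shown.

```python
def pick_send_charset(text, candidates):
    """Return the first charset in candidates that can encode all of
    text, or None if none can (hypothetical helper)."""
    for name in candidates:
        try:
            text.encode(name)
            return name
        except UnicodeEncodeError:
            continue
    return None


# What the editor saved: shogun in the EUC-JP $charset.
saved = b"\xbe\xad\xb7\xb3"
text = saved.decode("euc-jp")

# Assumed list, in order of preference.
candidates = ["us-ascii", "iso-2022-jp", "utf-8"]
chosen = pick_send_charset(text, candidates)
print(chosen)                # iso-2022-jp
print(text.encode(chosen))   # the body as it would go on the wire
```

    Pure Ascii text stops at the first entry, and text with chars
outside the Japanese set falls through to the last one.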


> Sometimes, though, I actually like to see the "raw" mail.

    Me too: I made a one-key macro to pipe to less in $pipe_decode=no
mode. And another key piping to "LC_ALL=C less" to see in hex the raw
bytes of non-Ascii chars. Of course "less" is configured to follow the
locale (no forced LESSCHARSET/LESSCHARDEF definitions).
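
    If no pager is at hand, a few lines of Python give a similar raw
view. This is only a rough imitation of what a C locale pager shows,
with show_raw being a name invented here; it leaves control chars such
as ESC untouched.

```python
def show_raw(data):
    """Render bytes >= 0x80 as <XX> hex, leave 7-bit bytes as they are."""
    return "".join(chr(b) if b < 0x80 else "<%02X>" % b for b in data)


print(show_raw(b"shogun: \xbe\xad\xb7\xb3"))  # shogun: <BE><AD><B7><B3>
```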


> how to get mutt to:
> 1) convert and save all (Japanese) mails in euc-jp (so I can view and
> edit them with editors or other software which do not have automatic
> detection of the character set).

    Well <decode-copy> (<Esc>C by default) while $charset=euc-jp.


> 2) save the "record" file in euc-jp.

    No way. But <decode-copy> a mail when needed.


 On Sunday, June 20, 2004 at 8:04:39 AM +0900, Henry Nelson wrote:

> When I try to reply to my own message I can't read the Japanese when
> inside the editor (nvi). Seeing "\xbe\xad\xb7\xb3".

    These are the 4 bytes encoding shogun in EUC. They should have been
displayed as Kanji, just as well as when replying to my mail... Note
your quote here was EUC-JP labelled EUC-JP, good, but had a spurious
"ESC ) B" prepended.
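
    To check such a quote by hand, one can decode the bytes outside of
Mutt. A minimal Python sketch, assuming the stray sequence is exactly
the three bytes ESC ) B at the very start; decode_quote and SPURIOUS are
names made up for this example.

```python
SPURIOUS = b"\x1b)B"  # assumed form of the stray "ESC ) B" prefix


def decode_quote(raw, charset="euc_jp"):
    """Drop a leading stray ESC ) B if present, then decode the rest."""
    if raw.startswith(SPURIOUS):
        raw = raw[len(SPURIOUS):]
    return raw.decode(charset)


quoted = b"\x1b)B\xbe\xad\xb7\xb3"
text = decode_quote(quoted)
print(text, len(text))  # two Kanji from four bytes
```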


>> I will attempt to attach files with the circled digits (1), (2) and
>> (3) in euc-jp, iso-2022-jp, shift-jis and utf-8.
> This turned out to be a total disaster. None of them are right.

    Yes. And the labels were wrong too (charset=us-ascii or
charset="euc-jp:iso-2022-jp:utf-8"). Possibly a bad interaction between
JA-patch's $file_charset autosense and Mutt's $send_charset autoselect.

    When wanting to force a given sending charset, I generally
<edit-type> at compose (^T), set desired charset, and reply "no" to
"Convert to [said charset] upon sending? ([yes]/no):" prompt. This
bypasses the autosense/autoselect thing, and just sends file as is, not
converted, labelled as requested.

    OTOH what have you done? I was unable to reproduce such an invalid
"euc-jp:iso-2022-jp:utf-8" label, short of forcing it explicitly, or
setting $charset to it and counting on unconvertibility.


 On Sunday, June 20, 2004 at 2:36:19 PM +0900, Henry Nelson wrote:

> Is there some way to get mutt to pass the mime-decoded message and
> inlines to your editor?

    When you reply? Yes, that's the normal way: Decoded and converted to
$charset.


Bye!    Alain.
-- 
« Be liberal in what you accept, and conservative in what you send. »
        Jon Postel / Robustness Principle / RFC 1122