
Re: [OT] ncursesw and UTF-8



Hi, Andrei, and thanks for your thoughtful reply :-)

On Tue, Nov 04, 2003 at 09:13:39AM +0100, Andrei A. Voropaev wrote:

> > My locale is set up with LC_ALL, LANG, and LC_CTYPE all set to es_ES.UTF-8
> > everywhere, running on a UTFed console ... and it works fine for many
> > programs - including my own (which use gettext).  In fact, the messages
> > are internationalized even in bash itself (albeit 8859-1, not UTF -
> > perhaps the note you made below applies here).  However, when I type an
> > á for instance, it appears as \303\241 instead (although the resulting
> > message when I try to execute it comes out correctly as "bash: á:
> > command not found").  Bash seems unique in this regard, as every other
> > program I can think of off-hand that I use has no trouble outputting
> > the actual characters I type.
> 
> I'm afraid that this is purely related to console configuration. Bash
> outputs a UTF-8 sequence but the console does not interpret it as such.

My offhand guess is that my console does interpret them as such, since
virtually all other programs (including elvis, which has no knowledge
of UTF) output the characters and not some weird escape sequences.

> Try to
> read the section on Linux console in the document I reference below. It
> mentions unicode_start and unicode_stop programs for making console
> understand UTF-8.

I recompiled my kernel with support for only UTF in the console driver,
so there's no need for unicode_start.  Regardless, though, I have
unicode_start running out of my startup scripts (since I never bothered
to remove it).

> > > In my case I make sure that
> > > there's absolutely no LC_ and no LANG variables in my environment.
> > 
> > Neither of those hurts to have if you never switch locales.
> 
> Yes. As long as you set them up to desired values. My desire is to have
> everything in English, and only show text in different languages.

Well, either way, I have only LC_ALL (es_ES.UTF-8) and LANG (es) set up
from the same bash script, so it doesn't really matter much.

> > Consoles don't need any of that fancy gunk ;-P
> 
> Consoles also don't work as well as X Window applications :(

That's a matter of debate ... I like my console a lot better than X ;-)

> > > And of course it's worth mentioning that en_GB.UTF-8 locale was not
> > > available originally in my distribution so I had to create it using
> > > localedef program. Note that since this locale is UTF-8 you don't really
> > > care if it is en_GB or de_DE or whatever.
> > 
> > Do you happen to have the command line available off-hand?  If not,
> > I'll just look it up; but if yes, you can save me some time.
> 
> This comes from
> http://www.ibiblio.org/pub/Linux/docs/HOWTO/other-formats/html_single/Unicode-HOWTO.html
> 
> localedef -v -c -i de_DE -f UTF-8 de_DE.UTF-8

Thanks a bunch ... I know I don't have the es_ES.UTF-8 locale installed.
My gettext autoconverts everything on-the-fly from 8859-1 to UTF-8.
Having a UTF version available should avoid calls to libiconv :-)
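
To make the on-the-fly conversion concrete, here is a minimal Python
sketch of what recoding a message from 8859-1 to UTF-8 amounts to.  It
is only an illustration of the byte-level change, not how gettext or
libiconv do it internally; the helper name latin1_to_utf8 is made up
for this example.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Recode ISO-8859-1 bytes as UTF-8 (hypothetical helper for illustration)."""
    # Every ISO-8859-1 byte maps directly to the Unicode code point of the
    # same value, so decoding never fails; encoding then yields UTF-8.
    return data.decode("iso-8859-1").encode("utf-8")


# 0xE1 is the single byte for a-acute in ISO-8859-1.
print(latin1_to_utf8(b"\xe1"))
```

A message catalog that is already stored in UTF-8 would not need this
step at all, which is the saving I am hoping for.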

> In place of 'de_DE' anything else can be put. I've used 'en_GB'. But if
> you already have es_ES.UTF-8 then most likely you don't need this
> program. Just look into the section for Linux Console.

Thanks again,
 - Dave

BTW - It's worth pointing out that even bash's messages are
internationalized correctly and autoconverted to UTF before being
displayed.  It's just that bash apparently refuses to acknowledge
that these characters are printable, and so gives them the typical
unprintable-character behavior: printing the escape sequence instead.
Now, since a non-ASCII character such as á is two bytes in UTF-8, and so
two characters as far as bash knows, it prints two escape sequences every
time.  I'm willing to bet that simply relinking bash will do the trick.
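
For what it's worth, the \303\241 I see is just the UTF-8 encoding of á
written out byte by byte in octal.  A small Python sketch of that
mapping follows; the function name octal_escapes is my own invention
for the example, and this only mimics the display, it is not what bash
or readline actually run.

```python
def octal_escapes(text: str, encoding: str = "utf-8") -> str:
    """Show each encoded byte as a backslash followed by three octal digits."""
    return "".join(f"\\{byte:03o}" for byte in text.encode(encoding))


a_acute = "\u00e1"  # the character a-acute, U+00E1

# In UTF-8 this one character occupies two bytes (0xC3 0xA1), hence two escapes.
print(len(a_acute.encode("utf-8")))
print(octal_escapes(a_acute))
```

A program that treats each byte as its own unprintable character would
show exactly this pair of escapes per accented letter.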

-- 
Uncle Cosmo, why do they call this a word processor?
It's simple, Skyler.  You've seen what food processors do to food, right?

Please visit this link:
http://rotter.net/israel
