
Re: UTF-8 issues



On Tue, Dec 09, 2003 at 01:25:18PM -0200, Carlos Laviola wrote:
> On Tue, Dec 09, 2003 at 04:07:08AM -0500, David Yitzchak Cohen wrote:
> > On Tue, Dec 09, 2003 at 03:17:16AM -0200, Carlos Laviola wrote:

> > > I decided to jump on the UTF-8 bandwagon a few weeks ago and I'm having
> > > some weird problems with messages that are sent without any kind of
> > > indication that the message is ISO-8859-1 encoded (or worse, since a
> > > Subject, for instance, should specify the encoding and the accents and
> > > other special characters should be encoded, at least AFAIK).
> > 
> > Can you try sending me a problematic email?  I get loads of email from
> > all kinds of sources, and I haven't noticed any trouble with most mails
> > that aren't deliberately mislabeled by the sending MUA (like Outlook
> > likes to do, for instance).  Mutt simply assumes ISO-8859-1, AFAIK.
> 
> Well, sure.  There's a compressed maildir folder at
> 
> http://carlos.sna.cx/mutt/problematic_message.tar.gz

I took the message from there, and sendmail(1)ed it to myself.
Sure enough, it didn't display properly, so I had to ^E on it.
Even with that, though, the subject doesn't show up properly.  (I have
rfc2047_parameters set, BTW.  The subject isn't individually encoded,
though, so that has no effect here.)  Notice that the message itself
isn't in MIME at all, so I believe a recent post (with a patch) to the
mutt-dev list applies here: without MIME, Mutt essentially doesn't allow
internationalized headers, unless you apply that patch (which uses the
body charset for the header).  If you want, I'll forward you the post.
(The web-based archives for this list suck, so you're almost certainly
better off letting me forward the copy from my own archives to you.)
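
To make the idea concrete, here is a rough sketch in Python (not Mutt's
actual C code, and not the patch itself) of decoding a raw 8-bit header
from a non-MIME message.  The function name decode_raw_header and its
parameters are made up for illustration; the ISO-8859-1 fallback mirrors
the assumption discussed in this thread.

```python
from typing import Optional


def decode_raw_header(raw: bytes,
                      body_charset: Optional[str] = None,
                      default: str = "iso-8859-1") -> str:
    """Decode a header that carries raw 8-bit bytes and no RFC 2047 encoding.

    Hypothetical helper: use the body charset if one is known,
    otherwise fall back to a hardcoded default.
    """
    charset = body_charset or default
    # errors="replace" keeps the header displayable even if the guess is wrong.
    return raw.decode(charset, errors="replace")


# No Content-Type at all: the default charset is the only thing to go on.
subject_no_mime = decode_raw_header(b"Reuni\xe3o")

# Body charset known (e.g. from Content-Type): reuse it for the header.
subject_with_body = decode_raw_header("Reunião".encode("utf-8"), "utf-8")
```

Both calls yield the same readable subject; the only difference is where
the charset guess comes from.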

> Notice it lacks Content-Type, for instance...

It lacks MIME, plain and simple.  It's technically a pre-MIME message,
and Mutt has a totally different set of rules for it :-(

> I have others whose subject is incorrectly displayed just like in
> http://carlos.sna.cx/mutt/mutt.utf8.bug.index_view.png but whose body's
> accents are displayed properly, because (I assume) it defines
> 
> Content-Type: text/plain; charset="iso-8859-1"

Hmm ... that means the subject wasn't RFC 2047-encoded, most likely,
if it was a MIME message.  (If not, the same patch may do the trick,
inferring the header charset from the body charset.)
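
For reference, this is roughly what a properly RFC 2047-encoded subject
looks like, sketched with Python's standard email.header module.  The
sample subject text is invented for the example.

```python
from email.header import Header, decode_header, make_header

subject = "Reunião amanhã"  # invented sample subject

# Encode for the wire: the result is pure ASCII and names its charset.
encoded = Header(subject, charset="iso-8859-1").encode()

# Decode on the receiving side: the charset label makes this unambiguous.
decoded = str(make_header(decode_header(encoded)))
```

A subject sent this way displays correctly regardless of the reader's
terminal charset, because the encoded word itself says how to decode it.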

> whereas the "problematic message" doesn't even have a Content-Type
> header.

Yup, that's a big problem.  Only a hardcoded ISO-8859-1 default
would help there.

> Can you see it properly with Mutt from CVS?

not even close ... I had to ^E it :-(

I'm guessing the only reason it works with 8859-1 terminals is that
the chars are displayed raw without stripping bit 8, and your terminal
interprets the chars correctly because they happen to be the same charset.
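
A small Python sketch of that guess (the sample bytes are invented): the
same raw bytes read fine when interpreted as ISO-8859-1, break when
interpreted as UTF-8, and get mangled if bit 8 is stripped.

```python
raw = b"Reuni\xe3o"  # "Reunião" as ISO-8859-1 bytes, with no charset label

# An 8859-1 terminal shown the raw bytes interprets them correctly.
as_latin1 = raw.decode("iso-8859-1")

# A UTF-8 terminal sees 0xE3 as the start of a multi-byte sequence that
# never completes, so the character is unreadable (U+FFFD here).
as_utf8 = raw.decode("utf-8", errors="replace")

# Stripping bit 8 turns the accented character into unrelated ASCII.
stripped = bytes(b & 0x7F for b in raw).decode("ascii")
```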

> > > website I set up with screenshots of these annoyances
> > > (http://carlos.sna.cx/mutt/) show that, somehow, just invoking edit-type
> > > (bound to ^E here) with its default "text/plain" argument causes what
> > > you saw change from the first to the second screen grab.
> > 
> > That's a mystery to me.  Maybe changing the content-type to text/plain
> > invokes Mutt's assumption-making code automatically?  beats me. . .
> 
> Yeah, that might be a bug in the version I run. (that has been fixed
> already?)

beats me ... if you can forward me a sample message, I can try doing
\e\n on it and report what happens. . .

> I guess I'll just compile mutt by hand and find out for myself, but
> please check anyway :-)

done :-)

 - Dave

-- 
Uncle Cosmo, why do they call this a word processor?
It's simple, Skyler.  You've seen what food processors do to food, right?

Please visit this link:
http://rotter.net/israel
