Re: Different encodings at index and pager views



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thursday, February 26 at 08:07 PM, quoth Carlos Pita:
> I'm noticing different encoding behavior for headers displayed at 
> the index view and those at the pager view for the same email. For 
> example, at the index view I can see subjects like: "?Opin? sobre 
> Bebidas Alcoh?licas y gan? una TV LCD o una Notebook!", while the 
> right chars show instead of ? at the pager view.

Strange!

> First, I want to discard a number of usual causes:
>
> * It's not a cache header issue. I deleted the cache after each related 
> configuration change.

Fair enough.

> * I'm accessing Gmail via IMAP

Ahhh, fun. Could be trouble.

> but it's not a case of bad encoding by them.

Are you sure? Compile mutt with debugging support (pass --enable-debug 
to configure) and get a debug trace. That will show you EXACTLY 
what the Gmail server sent you, so that we can be certain whether this 
is a mutt problem or a Gmail problem. It wouldn't be the first time 
that Gmail has gotten encoding issues completely wrong with their IMAP 
service.

Here's something to consider: the index display and the pager display 
are the results of different IMAP commands. The index is generated by 
asking the server for specific headers (e.g. the subject header). The 
pager is generated by asking the server for the body of the email, 
which is then parsed by mutt. It's entirely possible for the IMAP 
server to give different answers to the two questions.
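To see the two questions side by side, here is a minimal Python sketch 
using the standard imaplib module. The fetch items below are an 
assumption about what a client typically asks for (a header-only fetch 
for the index, a whole-message fetch for the pager); the debug trace is 
the authority on what mutt really sends. The names connect and 
fetch_both are made up for this example.

```python
import imaplib

# Assumed fetch items: header-only (index-like) vs. whole message (pager-like).
INDEX_ITEM = "(BODY.PEEK[HEADER.FIELDS (SUBJECT)])"
PAGER_ITEM = "(BODY.PEEK[])"


def connect(host, user, password, mailbox="INBOX"):
    """Open a read-only IMAP session (hypothetical helper, not called here)."""
    conn = imaplib.IMAP4_SSL(host)
    conn.login(user, password)
    conn.select(mailbox, readonly=True)
    return conn


def fetch_both(conn, msg_num):
    """Return (header-only reply, full-message reply) as raw bytes.

    conn is anything with an imaplib-style fetch(message_set, parts) method.
    """
    _, header_data = conn.fetch(msg_num, INDEX_ITEM)
    _, full_data = conn.fetch(msg_num, PAGER_ITEM)
    # imaplib returns literals as (envelope, payload) tuples; take the payload.
    return header_data[0][1], full_data[0][1]
```

If the Subject bytes in the first reply differ from the Subject bytes 
inside the second reply, the server is answering the two questions 
differently.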

> True, the headers I'm receiving are not 2047 encoded, they are just 
> iso-8859-1 or utf-8 encoded, but in theory I'm forcing their charset 
> by means of assumed_charset (more on this below).

You're not *forcing* the charset, you're *guessing* the charset. 
There's a semantic difference (of course, we computer folk love to 
have semantic arguments, so... ;)
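To make the guessing concrete, here is a minimal Python sketch of the 
idea: try each candidate charset in order and keep the first one that 
decodes without error. The function name guess_decode and the candidate 
list are made up for illustration; this is not mutt's actual code.

```python
def guess_decode(raw, assumed=("utf-8", "iso-8859-1")):
    """Decode raw header bytes with the first candidate charset that fits.

    Returns (text, charset), or (text with '?' placeholders, None) when no
    candidate decodes cleanly.
    """
    for charset in assumed:
        try:
            return raw.decode(charset), charset
        except UnicodeDecodeError:
            continue
    # Nothing fit: show '?' for every undecodable byte.
    fallback = raw.decode("ascii", errors="replace").replace("\ufffd", "?")
    return fallback, None


latin1_bytes = "gan\u00f3".encode("iso-8859-1")
utf8_bytes = "gan\u00f3".encode("utf-8")

print(guess_decode(latin1_bytes))                          # utf-8 fails, latin-1 fits
print(guess_decode(utf8_bytes))                            # utf-8 fits first
print(guess_decode(utf8_bytes, ("iso-8859-1", "utf-8")))   # wrong order: mojibake
```

Note that iso-8859-1 accepts any byte sequence, so it never "fails"; 
putting it first in the list means the guess is always latin-1, right 
or wrong.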

The thing to keep in mind here is that raw non-ASCII characters in 
email headers are *FORBIDDEN* (they have to be RFC 2047 encoded). 
They're not just a bad idea, they're flat-out illegal. Of course, 
idiotic email programs (and spammers) still 
generate them, but my point is that you're dealing with some *broken* 
email here. Every piece of software that touches this email gets to 
make its own decisions about how to handle it, and the answers don't 
always line up in your favor.
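For comparison, here is a small Python sketch (standard email.header 
module) of the difference between a broken header carrying raw 8-bit 
bytes and the same subject properly wrapped as an RFC 2047 encoded 
word. The sample subject and the helper name has_raw_8bit are made up 
for the example.

```python
from email.header import Header, decode_header, make_header


def has_raw_8bit(raw_header):
    """True if the header bytes contain anything outside 7-bit ASCII."""
    return any(byte > 0x7F for byte in raw_header)


subject = "gan\u00f3 una TV LCD"

# Broken: the latin-1 bytes are dropped straight into the header.
broken = b"Subject: " + subject.encode("iso-8859-1")

# Legal: the same text as an RFC 2047 encoded word (pure ASCII on the wire).
encoded = Header(subject, "iso-8859-1").encode()
legal = b"Subject: " + encoded.encode("ascii")

print(has_raw_8bit(broken))   # True
print(has_raw_8bit(legal))    # False
print(encoded)                # the encoded word, which names its own charset

# A receiver needs no guessing: the charset travels with the encoded word.
print(str(make_header(decode_header(encoded))) == subject)   # True
```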

> * I made some tests with my locale configured to en_US.ISO-8859-1 
> and then to en_US.UTF-8.

I presume your terminal is capable of understanding UTF-8 characters?

> I also tested disabling the muttrc charset setting, and forcing it to my 
> current locale, whatever it was. It made no difference at all.

Maybe not, but generally speaking, it's a very VERY bad idea to set 
the $charset manually (unless you really know what you're doing - I 
know of only one situation where it's even useful, much less a good 
idea). Mutt is very good at figuring out the correct charset to use, 
to the point that if mutt guesses wrong, then your system libraries 
are pretty much guaranteed to be giving you incorrect answers.
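A quick way to see what the system libraries report for the current 
locale is the Python sketch below. It assumes a Unix-like system (where 
locale.nl_langinfo exists) and assumes that this locale query is the 
same information mutt relies on for its default; the function name 
locale_charset is made up.

```python
import locale


def locale_charset():
    """Return the charset the C library reports for the environment's locale.

    Returns None if the locale named by LANG/LC_* is not installed.
    """
    try:
        # "" means: take the locale from the environment (LANG, LC_ALL, ...).
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        return None
    return locale.nl_langinfo(locale.CODESET)


print(locale_charset())
```

If this prints something other than what your terminal is actually 
using, the mismatch is in the locale setup rather than in mutt.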

> To make things weirder, some non 2047 encoded headers are shown 
> correctly at both views. For example, I have a utf-8 email and a latin-1 
> email, both with their subject headers encoded in the respective charset 
> (I verified this editing the raw emails with e). No matter what my 
> locale is the utf-8 email subject is correctly displayed while the 
> latin-1 one isn't. Also assumed_charset=iso-8859-1 doesn't fix the 
> problem for the latin-1 message.

Hmmm. Interesting. This is from the Gmail server?

Sounds like the problem *may* be with the Gmail server. I think you 
need to make *sure* that it's not.

> This begins to feel like random behavior but there is another aspect 
> that could be making the difference: the email that is looking bad is a 
> multipart one, with no charset specified at the main Content-Type:
> multipart/alternative header; but the well behaved email is single part, 
> with Content-Type: text/plain; charset=UTF-8.

You're right, that's probably the difference. That also suggests a 
Gmail issue (because mutt doesn't use that header to learn about the 
possible header character sets).

Anyway - step 1 is to find out *exactly* what the conversation between 
mutt and Gmail looks like.

Once you've compiled mutt with the debugging support, run mutt with 
the '-d5' flag. That will create the file ~/.muttdebug0, which will be 
very verbose, but buried in there is the verbatim IMAP conversation. 
If you can't isolate the part relating to these messages, remove your 
username and password from the file and post it somewhere we can see 
it.
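Scrubbing the credentials can be scripted. The Python sketch below 
blanks the arguments of any IMAP LOGIN command it finds. The line 
layout it expects (a tag, then LOGIN, then the arguments) is an 
assumption about the debug file, and other authentication methods 
would need their own patterns, so read the result before posting it. 
The function name redact_logins is made up.

```python
import re

# A tag, the LOGIN command, then whatever arguments follow on that line.
LOGIN_RE = re.compile(r"(\b\w+ LOGIN )\S.*", re.IGNORECASE)


def redact_logins(lines):
    """Yield each line with the arguments of IMAP LOGIN commands removed."""
    for line in lines:
        yield LOGIN_RE.sub(r"\1<redacted>", line)


sample = [
    '4> a0001 LOGIN "someone@example.com" "hunter2"\n',
    "4< * CAPABILITY IMAP4rev1 AUTH=PLAIN AUTH=LOGIN\n",
    "4< a0001 OK authenticated\n",
]
for line in redact_logins(sample):
    print(line, end="")
```

To use it on the real file, open the debug file and a new output file 
and write the lines yielded by redact_logins to the output.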

~Kyle
- -- 
We act as though comfort and luxury were the chief requirements of 
life, when all that we need to make us really happy is something to be 
enthusiastic about.
                                                    -- Charles Kingsley
-----BEGIN PGP SIGNATURE-----
Comment: Thank you for using encryption!

iEYEARECAAYFAkmnKqMACgkQBkIOoMqOI14rdwCgnuv7GiX6LMEkhsVAZyYEOFTf
qMgAoOZ8iiorDab/YV7uDy6yc7J27QVy
=bpcx
-----END PGP SIGNATURE-----