
Re: Q: View as Windows-1252?



On Fri, Aug 03, 2007 at 10:21:52AM -0500, Kyle Wheeler wrote:

> On Friday, August  3 at 04:55 PM, quoth Kai Grossjohann:
> > I think this applies to bad charset specifications.  But in my case 
> > I notice that Ctrl-E either shows me charset=utf-8 (where the 
> > message is in Windows-1252), or charset=us-ascii (msg also in 
> > Windows-1252).
> 
> Ahh, interesting. Well, the latter is easily remedied; windows-1252 is 
> *also* a superset of us-ascii (so this hook won't harm anything):
> 
>     charset-hook us-ascii windows-1252
> 
> The other one is... well, downright malicious! Out of curiosity, what 
> mail client is composing messages mislabelled utf8 like that?

I confess that I have no idea.  Actually, I already had values for
assumed_charset and charset; perhaps those caused it.  I had:

    set charset=utf8
    set assumed_charset=utf-8:windows-1252:iso-8859-1

Perhaps the order of windows-1252 and iso-8859-1 was reversed.  I
thought this was a smart move, because if decoding as UTF-8 works,
then the text is probably UTF-8.
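The reasoning rests on an asymmetry, which this small Python sketch shows. The sample word "café" and the helper `decodes_as` are illustrative assumptions, not anything Mutt itself does: strict UTF-8 decoding rejects a typical Windows-1252 byte, while Windows-1252 accepts the UTF-8 bytes and silently produces garbage.

```python
# Illustrative only: "café" encoded two ways.
cp1252_bytes = "café".encode("windows-1252")  # b'caf\xe9'
utf8_bytes = "café".encode("utf-8")           # b'caf\xc3\xa9'


def decodes_as(data: bytes, charset: str) -> bool:
    """Return True if data decodes in charset with strict error handling."""
    try:
        data.decode(charset)
    except UnicodeDecodeError:
        return False
    return True


print(decodes_as(utf8_bytes, "utf-8"))           # True
print(decodes_as(cp1252_bytes, "utf-8"))         # False: a lone 0xE9 is not valid UTF-8
print(decodes_as(utf8_bytes, "windows-1252"))    # True, but the text is wrong
print(utf8_bytes.decode("windows-1252"))         # cafÃ©
```

So a successful UTF-8 decode is strong evidence, whereas a successful Windows-1252 decode says very little.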

> >> I have a bunch of hooks like this to fix known bad charsets. The 
> >> 'assumed_charset' feature is also really really useful:
> >> 
> >>     set assumed_charset=us-ascii:windows-1252:utf-8
> >
> > I didn't use this because it says "only the first content is valid for 
> > the message body".  But I guess it doesn't hurt to try.
> 
> Hmm, that's a badly worded man-page entry. I think it means one of two 
> things (both of which are, I think, true): either it's saying that 
> only the first charset that is valid for the message will be used 
> (i.e. if windows-1252 is a valid way of interpreting the message, 
> utf-8 will not be tried---this is especially important for Asian
> charsets, where in most cases there's no way to tell if the charset 
> produced random garbage or not),

Hm.  But surely the same thing applies to the header?  So why was it
explicitly talking about the message body?

It seems strange to me to say that it tries all charsets for decoding
the header, even after finding a charset that works.  For then, if more
than one charset works, how would Mutt select one?
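Under the "first valid charset wins" reading quoted above, the answer would simply be list order. A minimal Python sketch of that reading follows; `first_valid` is a hypothetical helper, not Mutt's code, and Python's strict decoders stand in for whatever validity check Mutt actually performs.

```python
from __future__ import annotations


def first_valid(data: bytes, charsets: list[str]) -> tuple[str, str]:
    """Hypothetical helper: return (charset, text) for the first charset
    in the list that decodes data without error."""
    for cs in charsets:
        try:
            return cs, data.decode(cs)
        except UnicodeDecodeError:
            continue  # not valid in this charset; try the next one
    raise ValueError("no charset in the list decodes this data")


body = "naïve résumé".encode("utf-8")

# windows-1252 listed before utf-8: it "works", so utf-8 is never reached.
print(first_valid(body, ["us-ascii", "windows-1252", "utf-8"]))
# utf-8 listed first: the strict check succeeds and the text is correct.
print(first_valid(body, ["utf-8", "windows-1252", "iso-8859-1"]))
```

With this selection rule, the order of the list decides everything whenever more than one charset decodes without error.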

> OR it's saying that if your message 
> comes in multiple parts, the charset that is found to be acceptable 
> for the first part will be used for all subsequent parts.

Sounds plausible.

> But this won't work at all for you, I think, because it only applies 
> to parts of the message without any charset indication, and your 
> problem is incorrect charset labelling.
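The distinction between labelled and unlabelled parts can be sketched with Python's standard `email` package. This is an illustration under assumptions, not Mutt's logic: `ASSUMED` is a hypothetical stand-in for assumed_charset, and the rule "use the label if present, otherwise fall back to the assumed list" is the behaviour described above. A part carrying any charset= label, even a wrong one, never reaches the fallback, which is why a mislabelled part needs something like a charset-hook instead.

```python
import email

# Hypothetical stand-in for the assumed_charset list.
ASSUMED = ["utf-8", "windows-1252"]


def effective_charsets(raw: str) -> list:
    """For each non-multipart part, return its declared charset if it has
    one, otherwise the assumed list."""
    msg = email.message_from_string(raw)
    result = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no text of their own
        label = part.get_content_charset()  # None when there is no charset= parameter
        result.append([label] if label else list(ASSUMED))
    return result


labelled = "Content-Type: text/plain; charset=us-ascii\n\nHello\n"
unlabelled = "Content-Type: text/plain\n\nHello\n"
print(effective_charsets(labelled))    # [['us-ascii']]
print(effective_charsets(unlabelled))  # [['utf-8', 'windows-1252']]
```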

I think I am confused.  Perhaps the situation is this:

The message is sent without a charset indication.  But when I hit
Ctrl-E, a charset is included in the Content-Type header that I can
edit.

And perhaps Mutt was putting utf-8 there after Ctrl-E because that was
the first entry in assumed_charset.

But then, why didn't it try the whole list in the first place?  Then it
would have discovered the correct charset and wouldn't have displayed
question marks for the non-ascii characters.

Very strange.  Apologies for not investigating this fully before
asking here.

Kai