
Re: utf8 file corruption after transmission over email



On Fri, May 08, 2009 at 06:04:42PM -0500, Kyle Wheeler wrote:
> On Friday, May  8 at 03:00 PM, quoth Aaron S.:
> > I have a mystery that I'm trying to solve to no avail.
> 
> Hopefully we can help!
> 
> > I got a little sample XML (utf-8) encoded file that I'm trying to 
> > send as attachment. When I attach it, mutt correctly identifies it: 
> > [text/plain, 8bit, utf-8, 0.3K], since there are non-ASCII 
> > characters, in this case there is only 1 such character.
> 
> Well, actually, that's an incorrect identification. It's NOT a 
> text/plain file, it's an xml file. According to RFC 3023, it should 
> either be sent as application/xml or as text/xml.
> 
> Now, that misidentification shouldn't cause the problem you're having, 
> but correcting it *probably* will fix the problem. I bet that if you 
> add the following to your ~/.mime.types file, the problem goes away:
> 
>      application/xml     jff
> 
> > After I send it, this attached file becomes corrupt.
> 
> I tried sending your file to myself, both with and without that line 
> in my mime.types file, and the file didn't get corrupted either way.
> 
> My guess is that this is ACTUALLY your mail server's fault (did you 
> send it through an MSFT Exchange server maybe? They're really bad 
> about this). Here's what I think happened: you have configured mutt to 
> send things in 8-bit mode (i.e. $allow_8bit). Thus, when sending a 
> utf-8 file attachment with an unusual character in it, mutt sent it 
> completely unmodified, because that's supposed to be safe to do when 
> sending in 8-bit mode. But some servers (and I've had this happen more 
> often than not with Exchange servers) attempt to convert all messages 
> into 7-bit form. Unfortunately, they're often very bad at it. I've had 
> several messages corrupted by Exchange servers simply because they 
> couldn't handle curly-quotes correctly. It's happened often enough 
> that I finally just unset allow_8bit so that mutt would always take 
> care of encoding my messages in a 7-bit safe manner, because mutt is 
> so much better at it than they are.
> 
> Anyway, does that help?

Hello,

Well, it was sent from one @gmail account to another @gmail account, and
I have no idea what they run there at Google. I tried adding that line
to mime.types, and it does work.

What bothers me is that I now have to pay much closer attention whenever
I attach unusual files.

I'm going to have to think of a way to intercept whatever mutt is
sending out, to make sure it's not mutt that messes up this 3-byte UTF-8
character.
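
One possible way to make that check, sketched in Python under a few
assumptions: a raw copy of the outgoing message is available as bytes
(for example mutt's Fcc copy, or the original message downloaded on the
receiving side), and the original attachment is still on disk. The
function names attachment_bytes and first_difference are hypothetical
helpers, and the sample XML with a euro sign is only a stand-in for the
real file, whose actual non-ASCII character is not known here.

```python
import email
from email import policy
from email.message import EmailMessage


def attachment_bytes(raw_message: bytes):
    """Yield (filename, content type, transfer encoding, decoded bytes)
    for every attachment in a raw RFC 5322 message."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    for part in msg.iter_attachments():
        yield (
            part.get_filename(),
            part.get_content_type(),
            str(part.get("Content-Transfer-Encoding", "7bit")),
            # decode=True undoes base64 / quoted-printable and returns bytes
            part.get_payload(decode=True),
        )


def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if equal."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # One input is a prefix of the other: they differ where the shorter ends.
    return None if len(a) == len(b) else min(len(a), len(b))


if __name__ == "__main__":
    # Stand-in for the real attachment: U+20AC is a 3-byte UTF-8 character.
    original = '<?xml version="1.0"?>\n<note>\u20ac</note>\n'.encode("utf-8")

    # Stand-in for the intercepted message; in practice, read the saved
    # raw message from a file instead of building one here.
    demo = EmailMessage()
    demo["Subject"] = "attachment test"
    demo.set_content("see attachment")
    demo.add_attachment(original, maintype="application", subtype="xml",
                        filename="sample.jff")
    raw = demo.as_bytes()

    for name, ctype, cte, data in attachment_bytes(raw):
        print(name, ctype, cte, "first difference:",
              first_difference(original, data))
```

Running the same comparison on the copy saved at send time and on the
copy that arrives should show whether the bytes already differ when the
message leaves mutt or only after the mail servers have handled it.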

In any case, thanks for your pointers.