Re: URLs screwed in the mail body
Kyle,
On Sun, Mar 9, 2008 at 1:58 PM, Francis Moreau <francis.moro@xxxxxxxxx> wrote:
> On Sat, Mar 8, 2008 at 6:15 PM, Kyle Wheeler <kyle-mutt@xxxxxxxxxxxxxx> wrote:
> > On Saturday, March 8 at 02:10 PM, quoth Francis Moreau:
> >
> > > Actually it would be better to fix the source of the problem instead
> > > of trying to find a workaround... But I don't know where these URLs
> > > get split in the first place. Perhaps you could enlighten me?
> >
> > Well, the Internet Message Format RFC (RFC 2822) specifies a recommended
> > maximum line length of 78 characters and a hard limit of 998. If a URL
> > doesn't fit within 78 characters, many email clients will split it.
> >
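To make the 78-column rule concrete, here is a minimal Python sketch of a client that hard-wraps outgoing text and simply chops any word longer than a line. It uses the standard `textwrap` module; `hard_wrap` is a hypothetical name, and real clients such as Outlook may pick other break points (for instance right after a "/"), so treat it only as an illustration of how a long URL can end up on two lines.

```python
import textwrap


def hard_wrap(text: str, width: int = 78) -> list[str]:
    """Mimic a client that hard-wraps outgoing text at `width` columns.

    Assumption: words longer than the remaining space are chopped
    (textwrap's break_long_words), which is one way a long URL gets split.
    """
    return textwrap.wrap(text, width=width, break_long_words=True,
                         break_on_hyphens=False)


url = "http://www.example.com/" + "a" * 90
for line in hard_wrap("see " + url):
    print(line)  # the URL is cut after column 78 and continues on line 2
```

Concatenating the pieces restores the URL here only because we know the break fell inside it; nothing in a plain text/plain body records that fact for the receiver.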
> > One way this is sometimes done, for example by Apple's email client,
> > is to send the message as format=flowed with delsp=yes. This lets the
> > client indicate, with a single space at the end of a broken line, that
> > the next line should be appended without a space separating them.
> >
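As a rough illustration of the receiving side, here is a minimal Python sketch of undoing format=flowed soft breaks as described in RFC 3676. The function name `unflow` is hypothetical, and it assumes the body is already decoded from its transfer encoding and contains no quoted ('>') lines, which the RFC handles by also comparing quote depth.

```python
def unflow(body: str, delsp: bool = True) -> str:
    """Undo soft line breaks in a format=flowed body (sketch of RFC 3676).

    Assumptions: the body is already transfer-decoded and has no quoted
    ('>') lines, whose quote depth would also need tracking.
    """
    out: list[str] = []
    buf = ""
    for line in body.split("\n"):
        # Undo space-stuffing: senders prefix one space to some lines.
        if line.startswith(" "):
            line = line[1:]
        if line == "-- ":
            # The signature separator is never treated as a flowed line.
            if buf:
                out.append(buf)
                buf = ""
            out.append(line)
        elif line.endswith(" "):
            # Soft break. With delsp=yes the trailing space was added by the
            # wrapper and is dropped; with delsp=no it is part of the text.
            buf += line[:-1] if delsp else line
        else:
            # Hard break: this line ends the current paragraph.
            out.append(buf + line)
            buf = ""
    if buf:
        out.append(buf)
    return "\n".join(out)
```

The `delsp` flag would come from the Content-Type header; with Python's `email` package, `msg.get_param("delsp")` returns it when present. Plain (non-flowed) mail carries no such marker, so this only helps when the sender used format=flowed.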
>
> My last example of a mangled URL was sent by Outlook, and format=flowed
> was not set.
> So I guess there's nothing in the email header that could indicate that
> the email has been rewrapped to a maximum line length of 78.
>
> In this case, your approach seems correct: there is no other way than to
> parse the email body and try to detect split URLs. If we find any, we
> need to splice them back together.
>
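Here is a minimal Python sketch of the detection step only: it lists every URL that ends a line and is followed by a non-blank line, i.e. every spot where a wrap may have cut the URL. The URL regex (http/https followed by non-blank characters) and the name `split_url_candidates` are assumptions for illustration; deciding which candidates to actually join is the hard part.

```python
import re

# Assumption: "http://" or "https://" plus non-blank characters is a URL;
# trailing whitespace after it is allowed.
URL_AT_END = re.compile(r"(https?://\S+)\s*$")


def split_url_candidates(body: str) -> list[tuple[int, str]]:
    """Return (line index, URL) for each URL that ends a line and is
    followed by a non-blank line, i.e. each place a wrap may have cut it."""
    lines = body.split("\n")
    found = []
    for i, line in enumerate(lines[:-1]):
        m = URL_AT_END.search(line)
        if m and lines[i + 1].strip():
            found.append((i, m.group(1)))
    return found
```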
OK, I thought more about it and I think the situation is quite bad ;)
Here are the different use cases I can think of:
"""
this is an example of an URL not splitted http://www.splitted.ex/foo
and the next word following the URL is not part of it.
this is an example of an URL not splitted http://www.splitted.ex/foo
but with a trailing space and the next work is not part of the URL.
this is an example of a splitted URL http://www.splitted.ex/foo
/bar/file.html. There's no trailing whitespace.
this is an example of a splitted URL http://www.splitted.ex/foo
/bar/file.html but with a trailing white space.
this is an example of a splitted URL http://www.splitted.ex/foo/
file.html with no trailing white space. This time there's no way
to detect that the next word following the URL is part of it because
it is not a path.
this is an example of a splitted URL http://www.splitted.ex/foo/
file.html with a trailing white space. This time there's no way
to detect that the next word following the URL is part of it.
"""
Looking at them, I can't think of a script that could reliably fix the
URL only when needed.
Probably the best we could do is to write a script with 2 modes:
normal and aggressive.
The normal mode would handle the easiest and most common cases,
whereas the aggressive mode would also fix the special cases but
break the common case. We could bind each mode to a different
shortcut.
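A minimal Python sketch of the two-mode idea follows. It assumes one reading of the modes: normal mode joins a line-ending URL with the next line's first word only when that word starts with "/" (the split-at-slash cases), while aggressive mode always joins it, which fixes the "foo/" + "file.html" cases but glues the next ordinary word onto unsplit URLs. The URL regex and the name `resplice` are hypothetical, and quoted ('>') lines are not handled.

```python
import re

# Assumption: "http://" or "https://" plus non-blank characters is a URL.
URL_AT_END = re.compile(r"https?://\S+$")


def resplice(text: str, aggressive: bool = False) -> str:
    """Rejoin URLs that a mail client hard-wrapped across lines.

    normal mode:     join only when the next line starts with "/".
    aggressive mode: always append the next line's first word to a URL
                     that ends a line (also breaks unsplit URLs).
    """
    lines = text.split("\n")
    i = 0
    while i < len(lines) - 1:
        # Trailing whitespace is ignored: it is not a reliable hint
        # in non-flowed mail.
        current = lines[i].rstrip()
        words = lines[i + 1].split(maxsplit=1)
        if words and URL_AT_END.search(current) and (
            aggressive or words[0].startswith("/")
        ):
            lines[i] = current + words[0]
            rest = words[1] if len(words) > 1 else ""
            if rest:
                lines[i + 1] = rest
            else:
                # The whole next line was URL: drop it and re-check this
                # line in case the URL was split more than once.
                del lines[i + 1]
                continue
        i += 1
    return "\n".join(lines)
```

Wrapped in a small filter that reads the message on stdin and takes the mode from a command-line flag, each mode could then be bound to its own key in Mutt with a `macro` that uses `<pipe-message>`.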
What do you think?
--
Francis