<<< Date Index >>>     <<< Thread Index >>>

Re: MB_LEN_MAX has to be more than 5 if mutt uses UTF-8



* Thu Jul 17 2008 Derek Martin <invalid@xxxxxxxxxxxxxx>
> On Thu, Jul 17, 2008 at 12:48:14PM +0900, TAKAHASHI Tamotsu wrote:
> > * Wed Jul 16 2008 Paul Walker <paul@xxxxxxxxxxxxxxx>
> > > On Wed, Jul 16, 2008 at 02:34:06PM +0900, TAKAHASHI Tamotsu wrote:
> > > 
> > > > +    while (mutt_iconv (cd, &ib, &ibl, &ob, &obl, inrepls, outrepl) == 
> > > > (size_t)-1)
> > > [...]
> > > > +      safe_realloc (&buf, len + obl + 1);
> > > > +    }
> > > 
> > > Is it possible to add some kind of limit condition on this? Call me
> > > paranoid, but I get nervous when I see reallocations in a loop with no 
> > > clear
> > > limits on the amount of memory it will try to take.
> > > 
> > > (Consider badly formed messages (deliberate or otherwise) and buggy iconv
> > > implementations.)
> > 
> > Good point.
> > Yes, MB_LEN_MAX*ibl works in most cases,
> > so it is "some kind of limit condition,"
> > though there is no limit _in theory_,
> > even with valid strings and settings.
> 
> But I think Paul is right...  I believe the patch you've posted is a
> bug.  Bear in mind, I only read the patch and didn't look at the code,
> so I may be missing pieces, in which case the rest of what I'm saying
> will probably be worthless.  ;-)  Assuming I'm not wrong:
> 
> Let's pretend for a moment that we have a case where every character
> in the input text needs to get converted to a string which has a
> lenght of  ibl * 8 bytes.  If I read it correctly, your patch will go
> into an infinite loop, with the call to iconv always returning failure
> with errno = E2BIG.

Ugh, you are right. Thanks for the correction.
I've been off for such a long time.


> My assumption (which may be mistaken) is that ibl does not change, and
> therefore obl * 6 is a constant.  I think what you would want to do is
> either: 
> 
>   - not loop and try the realloc only once, returning an error if the
>     iconv() call again returns E2BIG
> 
>   - Create another variable to limit the loop, something like this:
> 
>     multiplier = 6;
>     factor = 1;
>     while ( (iconv(...)) && (factor <= 3) ) {
>       obl = ibl * multiplier * factor;
>       /* do your processing */
>       ...
>       factor++;
>     }
> 
>     You still need to catch that this failed, and return an error. 

Either way is good and reasonable. I haven't rejected Paul's worry.
And of course there are more options such as
"obl *= multiplier" and "obl = STRING * factor".

>     I'm not sure that using 6 for the multiplier is the right thing
>     though... maybe 3 is better.  If most of the input converts to a
>     string of the same size, with only some requiring extra bytes, 3
>     would be enough, saving a little memory.  OTOH if it isn't, an
>     extra realloc() call will be necessary, slightly reducing
>     performance.

Oh, I thought I was a little stingy when I took 6.
Some systems define MB_LEN_MAX == 16 or more.
IMHO the performance damage with realloc is worse than a little memory waste.
(In fact, I'd rather alloc a larger space at first than MB_LEN_MAX*ibl.)


Summary (for committers):
(1) Please make sure MB_LEN_MAX >= 6, as mutt depends on UTF-8 now.
(2) Please keep in mind that MB_LEN_MAX doesn't assure iconv's success.
(2a) When iconv fails with E2BIG, you have to realloc or output an error.
(2b) If you choose to realloc, take care not to go into an infinite loop. ;)

-- 
tamo