<<< Date Index >>>     <<< Thread Index >>>

Re: MB_LEN_MAX has to be more than 5 if mutt uses UTF-8



On Thu, Jul 17, 2008 at 12:48:14PM +0900, TAKAHASHI Tamotsu wrote:
> * Wed Jul 16 2008 Paul Walker <paul@xxxxxxxxxxxxxxx>
> > On Wed, Jul 16, 2008 at 02:34:06PM +0900, TAKAHASHI Tamotsu wrote:
> > 
> > > +    while (mutt_iconv (cd, &ib, &ibl, &ob, &obl, inrepls, outrepl) == 
> > > (size_t)-1)
> > [...]
> > > +      safe_realloc (&buf, len + obl + 1);
> > > +    }
> > 
> > Is it possible to add some kind of limit condition on this? Call me
> > paranoid, but I get nervous when I see reallocations in a loop with no clear
> > limits on the amount of memory it will try to take.
> > 
> > (Consider badly formed messages (deliberate or otherwise) and buggy iconv
> > implementations.)
> 
> Good point.
> Yes, MB_LEN_MAX*ibl works in most cases,
> so it is "some kind of limit condition,"
> though there is no limit _in theory_,
> even with valid strings and settings.

But I think Paul is right...  I believe the patch you've posted is a
bug.  Bear in mind, I only read the patch and didn't look at the code,
so I may be missing pieces, in which case the rest of what I'm saying
will probably be worthless.  ;-)  Assuming I'm not wrong:

Let's pretend for a moment that we have a case where every character
in the input text needs to get converted to a string which has a
lenght of  ibl * 8 bytes.  If I read it correctly, your patch will go
into an infinite loop, with the call to iconv always returning failure
with errno = E2BIG.

My assumption (which may be mistaken) is that ibl does not change, and
therefore obl * 6 is a constant.  I think what you would want to do is
either: 

  - not loop and try the realloc only once, returning an error if the
    iconv() call again returns E2BIG

  - Create another variable to limit the loop, something like this:

    multiplier = 6;
    factor = 1;
    while ( (iconv(...)) && (factor <= 3) ) {
        obl = ibl * multiplier * factor;
        /* do your processing */
        ...
        factor++;
    }

    You still need to catch that this failed, and return an error. 
    I'm not sure that using 6 for the multiplier is the right thing
    though... maybe 3 is better.  If most of the input converts to a
    string of the same size, with only some requiring extra bytes, 3
    would be enough, saving a little memory.  OTOH if it isn't, an
    extra realloc() call will be necessary, slightly reducing
    performance.


> 
> The realloc loop is needed if either
> (1) your MB_LEN_MAX is much less than enough or
> (2) your locale or usage requires heavy N:M conversion.
> 
> Maybe i18n paranoids want to realloc, and
> normal programmers don't.
> I don't know which is right. ;)
> 
> It's okay to avoid the realloc loop (at least for me).
> But I'd like the developers to know the limitation mutt has
> that mutt truncates strings in some cases.
> (So be careful when you use mutt_convert_string.
> Even if a message is PGP-signed, it can be truncated.)
> 
> Well, adding dprint to mutt_iconv should be good then.
> (e.g. if(iconv(...)==-1)dprint(3,(debugfile,"iconv failed: 
> %s\n",strerror(errno)));)
> 
> 
> -- 
> tamo
> "And ulimit is always your friend."

-- 
Derek D. Martin    http://www.pizzashack.org/   GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address.  Replying to it will result in
undeliverable mail due to spam prevention.  Sorry for the inconvenience.

Attachment: pgpU2Hpt4F_nG.pgp
Description: PGP signature