
Re: base64



On September 24, 2003 at 19:24, David Wilson wrote:

> RFC 2045 states (section 6.8):
> 
>    "The encoded output stream must be represented in lines of no more
>    than 76 characters each.  All line breaks or other characters not
>    found in Table 1 must be ignored by decoding software.  In base64
>    data, characters other than those in Table 1, line breaks, and other
>    white space probably indicate a transmission error, about which a
>    warning message or even a message rejection might be appropriate
>    under some circumstances."
> 
> So, characters outside the 65 character set are "ignored", but "warning"
> or "rejection" might be appropriate. Nicely ambiguous.

Nowadays it is unclear what the proper action is, but you must remember
that the RFC is dated 1996 (and the first MIME RFC is dated 1992), a
time when email security was not as important as it is today and there
was probably a reluctance to put too many "must"s on implementors, or a
wish to give users the opportunity to "recover" badly transmitted data.

You'd need to contact the RFC authors to determine why they made the
warning-or-rejection text only advisory rather than mandatory.

> This does not
> apply strictly to '=', but might be taken to apply to '=' within a base64
> encoding.
> 
>    "Because it is used only for padding at the end of the data, the
>    occurrence of any "=" characters may be taken as evidence that the
>    end of the data has been reached (without truncation in transit).  No
>    such assurance is possible, however, when the number of octets
>    transmitted was a multiple of three and no "=" characters are
>    present."
> 
> This is also not clear. Does it mean that '=' can be taken to delimit
> the data? Or does it mean no more than finding a '=' that means no data
> has been lost?

The '=' is not always present, but when it is, it occurs at the end
of the data.  The statement you quote about '=' not being present
cannot, by itself, be used to assume that the data has been truncated,
since whether '=' appears at all depends on the length of the data.
I.e., other mechanisms outside of the base64 encoding itself must be
used to determine if truncation has occurred.
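Whether '=' appears is determined by the input length. This matches the RFC text quoted above: a length that is a multiple of three produces no padding. The following is a small illustrative sketch using Python's standard base64 module. The helper name padding_count is hypothetical, introduced only for this demonstration.

```python
import base64

def padding_count(data: bytes) -> int:
    """Hypothetical helper: number of '=' characters base64 appends to data."""
    encoded = base64.b64encode(data)
    return len(encoded) - len(encoded.rstrip(b"="))

# Length mod 3 decides the padding: 1 leftover octet -> '==',
# 2 leftover octets -> '=', a full 3-octet group -> no '=' at all.
for n in range(1, 7):
    data = b"x" * n
    print(n, n % 3, base64.b64encode(data).decode("ascii"), padding_count(data))
```

For 3 or 6 octets the output ends without any '=', so a missing '=' says nothing about truncation.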

> So, there is ambiguity in RFC 2045, and this is the point of the
> original post. Different people, and therefore different implementations
> will have different interpretations. There is therefore potential for a
> vulnerability when checks are performed using one interpretation but the
> actual receiver uses another interpretation.

This will only happen if the server check is forgiving about "errors"
in the base64 data rather than only letting properly "well-formed"
base64 data through.
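To make the divergence concrete, here is a sketch using Python's standard base64.b64decode. Its validate flag switches between discarding and rejecting characters outside the base64 alphabet. The wrapper names lenient_decode and strict_decode are hypothetical. A checker built on one behaviour and a receiver built on the other will disagree about the same input.

```python
import base64
import binascii
from typing import Optional

def lenient_decode(text: bytes) -> bytes:
    # validate=False (the default): characters outside the base64
    # alphabet are discarded before decoding, so the error goes unnoticed.
    return base64.b64decode(text, validate=False)

def strict_decode(text: bytes) -> Optional[bytes]:
    # validate=True: any character outside the alphabet raises
    # binascii.Error, which this sketch reports as None ("reject").
    try:
        return base64.b64decode(text, validate=True)
    except binascii.Error:
        return None

# The encoding of b"abc" ("YWJj") with a stray '*' inserted.
suspicious = b"YW*Jj"
print(lenient_decode(suspicious))  # b'abc'
print(strict_decode(suspicious))   # None
```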

> It would be nice to be able to enforce rules in email servers - there
> are many ways in which messages do not conform to the standards - even
> when they are unambiguous. But there are too many common email user
> agents which generate non-conforming messages.

I think you are overgeneralizing to other parts of email rather than
the issue of base64 data.  Can you name one MUA that generates improper
base64 entities?  When just addressing the base64 encoding/decoding
issue, a reasonable set of rules can be established for dealing with
it, since the number of possible variations is small.

Each of the other mail formatting problems will have to be dealt with
separately.  I do advocate that the MIME RFC maintainers be contacted
to make things clearer (hence the term "Request For Comments"), or at
least to get their thoughts on the matter.  But as we all know, even
when a standard is clear about something, there will be software that
is non-conforming and that security folks will have to deal with.

There will also be situations where reasonable people disagree on
what to do about something.  For example, should bad characters in
base64 data cause a rejection, or should some form of error recovery
be allowed?  Who is to say one is always right and the other is always
wrong?  It may be best to state what the best practices are for these
cases rather than mandating a specific action.

> Or should we reject all these broken messages? ;-)

It seems the following rules can be applied by servers and other
software trying to detect malicious data:

* If there are any non-base64 characters (other than line breaks and
  white space) and security is to be as strict as possible, reject the
  message, or at a minimum the entity that is base64 encoded.

* If there is any non-whitespace data after an '=' or '==' sequence,
  reject the entity if strict security is required, since this is just
  a variation of item 1.
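The two items above could be sketched roughly as follows in Python. The sketch assumes the body of the base64 entity has already been extracted from the MIME message. The function name strict_base64_check is hypothetical. The final length check, which requires complete 4-character groups, is an extra assumption and is not one of the two items.

```python
import base64
import binascii
import re
from typing import Optional

# Assumed policy: line breaks and other white space may appear anywhere
# (RFC 2045 says decoders ignore them); everything else must be in the
# base64 alphabet, with at most two '=' and only at the very end.
_WHITESPACE = re.compile(rb"[ \t\r\n]+")
_WELL_FORMED = re.compile(rb"[A-Za-z0-9+/]*={0,2}")

def strict_base64_check(body: bytes) -> Optional[bytes]:
    """Hypothetical checker: decoded bytes, or None to reject the entity."""
    compact = _WHITESPACE.sub(b"", body)
    # Item 1: reject any character outside the base64 alphabet.
    # Item 2: reject any non-whitespace data after '=' or '=='.
    if _WELL_FORMED.fullmatch(compact) is None:
        return None
    # Extra assumption (not one of the two items): the data must form
    # complete 4-character groups, otherwise it is truncated or mis-padded.
    if len(compact) % 4 != 0:
        return None
    try:
        return base64.b64decode(compact, validate=True)
    except binascii.Error:
        return None

print(strict_base64_check(b"YWJj\r\nYWJj"))  # b'abcabc'
print(strict_base64_check(b"YW*Jj"))         # None (stray '*')
print(strict_base64_check(b"YQ==YWJj"))      # None (data after '==')
```

A regular expression over the whitespace-stripped body keeps both items in a single, easily audited rule. The library decoder is then called with validation turned on, so it cannot silently skip anything.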

Basically, assume that MUAs are going to be as lenient as possible.
Virus detection software should always apply the strictest decoding
rules, and for properly encoded data, this is a safe bet.  I personally
have never seen a case of base64-encoded entities that included
non-base64 characters or '='s in the middle of encoded data.  Therefore,
if such a case does happen, it is either a transmission problem or
something malicious being attempted.

--ewh
-- 
Earl Hood, <earl@xxxxxxxxxxxx>
Web: <http://www.earlhood.com/>
PGP Public Key: <http://www.earlhood.com/gpgpubkey.txt>