Re: Second critical mremap() bug found in all Linux kernels
On Wed, 18 Feb 2004, Paul Starzetz wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Synopsis: Linux kernel do_mremap VMA limit local privilege escalation
> vulnerability
> Product: Linux kernel
> Version: 2.2 up to 2.2.25, 2.4 up to 2.4.24, 2.6 up to 2.6.2
> Vendor: http://www.kernel.org/
> URL: http://isec.pl/vulnerabilities/isec-0014-mremap-unmap.txt
> CVE: CAN-2004-0077
> Author: Paul Starzetz <ihaquer@xxxxxxx>
> Date: February 18, 2004
>
>
> Issue:
> ======
>
> A critical security vulnerability has been found in the Linux kernel
> memory management code, in the mremap(2) system call, caused by a
> missing check of a function's return value. This bug is completely
> unrelated to the mremap bug disclosed on 05-01-2004, apart from
> involving the same internal kernel function.
>
>
> Details:
> ========
>
> The Linux kernel manages the list of valid, user-addressable memory
> locations on a per-process basis. Every process owns a singly linked
> list of so-called virtual memory area descriptors (from now on just
> VMAs). Every VMA describes the start of a valid memory region, its
> length, and various memory flags such as the page protection.
>
> Every VMA in the list corresponds to a part of the process's page table.
> The page table contains descriptors (page table entries, PTEs for
> short) of the physical memory pages seen by the process. A VMA
> descriptor can thus be understood as a high-level description of a
> particular region of the process's page table, storing PTE properties
> such as the page R/W flag.
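>
> For illustration, a heavily simplified sketch of such a descriptor is
> given below (the real structure is struct vm_area_struct in
> include/linux/mm.h and holds considerably more state):
>
>   /* Simplified sketch of a VMA descriptor; field names follow the real
>    * struct vm_area_struct, but most members are omitted. */
>   struct vma_sketch {
>           unsigned long      vm_start;  /* first address covered by the area */
>           unsigned long      vm_end;    /* first address past the area */
>           unsigned long      vm_flags;  /* VM_READ, VM_WRITE, VM_EXEC, ... */
>           struct vma_sketch *vm_next;   /* next area in the per-process list */
>   };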
>
> The mremap() system call provides resizing (shrinking or growing) as
> well as moving of existing virtual memory areas, or any part of them,
> across the process's addressable space.
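>
> As a minimal userspace illustration (error handling omitted), moving an
> existing mapping to a caller-chosen address combines the MREMAP_MAYMOVE
> and MREMAP_FIXED flags (on 2.4/2.6 kernels; as noted further below, 2.2
> does not recognize MREMAP_FIXED):
>
>   #define _GNU_SOURCE
>   #include <sys/mman.h>
>
>   /* Move the old_len-byte mapping at old to the fixed address new_addr,
>    * resizing it to new_len; returns the new address or MAP_FAILED. */
>   static void *move_mapping(void *old, size_t old_len,
>                             void *new_addr, size_t new_len)
>   {
>           return mremap(old, old_len, new_len,
>                         MREMAP_MAYMOVE | MREMAP_FIXED, new_addr);
>   }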
>
> Moving a part of the virtual memory from inside a VMA to a new location
> requires creating a new VMA descriptor as well as copying the
> underlying page table entries described by the VMA from the old
> location to the new one in the process's page table.
>
> To accomplish this task, the do_mremap() code calls the internal kernel
> function do_munmap() to remove any existing mapping at the new location
> as well as to remove the old virtual memory mapping. Unfortunately the
> code does not test the return value of do_munmap(), which may fail if
> the maximum number of available VMA descriptors has been exceeded. This
> happens when one tries to unmap the middle part of an existing mapping
> while the process's limit on the number of VMAs (currently 65535) has
> already been reached.
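>
> Reaching that limit requires no special privileges; a rough sketch of
> how an unprivileged process could populate its VMA list is shown below
> (illustration only; the exact count needed depends on the mappings the
> process already has, and error handling is omitted):
>
>   #define _GNU_SOURCE
>   #include <sys/mman.h>
>   #include <unistd.h>
>
>   /* Create 'count' single-page anonymous mappings with alternating
>    * protections so that neighbouring areas cannot be merged and each
>    * one consumes a separate VMA descriptor. */
>   static void fill_vma_list(long count)
>   {
>           long page = sysconf(_SC_PAGESIZE);
>           long i;
>
>           for (i = 0; i < count; i++)
>                   mmap(NULL, page,
>                        (i & 1) ? PROT_READ : PROT_READ | PROT_WRITE,
>                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>   }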
>
> One of the possible situations can be illustrated with the following
> picture. The corresponding page table entries (PTEs) have been marked
> with o and x:
>
> Before mremap():
>
> (oooooooooooooooooooooooo) (xxxxxxxxxxxx)
> [----------VMA1----------] [----VMA2----]
> [REMAPPED-VMA] <---------------|
>
>
> After mremap() without VMA limit:
>
> (oooo)(xxxxxxxxxxxx)(oooo)
> [VMA3][REMAPPED-VMA][VMA4]
>
>
> After mremap() with the VMA limit reached:
>
> (ooooxxxxxxxxxxxxxxoooo)
> [---------VMA1---------]
> [REMAPPED-VMA]
>
>
> Once the maximum number of VMAs in the process's VMA list has been
> reached, do_munmap() refuses to create the necessary hole, because
> doing so would split the original VMA into two disjoint areas and
> thereby exceed the VMA descriptor limit.
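>
> Schematically, the refusal comes from a check of this shape inside
> do_munmap() (paraphrased, not a verbatim quote of the kernel source):
>
>   /* Unmapping the middle of an area would split it in two; refuse if
>    * the process has no VMA descriptors left for the second half. */
>   if (vma->vm_start < addr && vma->vm_end > addr + len &&
>       mm->map_count >= max_map_count)
>           return -ENOMEM;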
>
> Due to the missing return value check after the attempt to unmap the
> middle of VMA1 (this is the first invocation of do_munmap() inside the
> do_mremap() code), the page table entries from VMA2 are still inserted
> into the page table region described by VMA1 and thus become subject to
> VMA1's page protection flags. It must also be mentioned that the
> original PTEs in VMA1 are lost, leaving the corresponding page frames
> unusable forever.
>
> The kernel also tries to insert the overlapping VMA into the VMA
> descriptor list, but this fails due to further checks in the low-level
> VMA manipulation code. In the 2.4 and 2.6 kernels the low-level VMA
> list check simply calls BUG(), terminating the malicious process.
>
> There are also two other unchecked calls to do_munmap() inside the
> do_mremap() code, and we believe that the second occurrence of the
> unchecked do_munmap() is also exploitable. That second occurrence takes
> place when the VMA to be remapped is being truncated in place. Note
> that do_munmap() can also fail under exceptionally low memory
> conditions while trying to allocate a VMA descriptor.
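>
> The essence of a fix is simply to test that return value and fail the
> mremap() instead of carrying on; schematically (shown here with the
> two-argument do_munmap() of the 2.2 tree):
>
>   /* Schematic only -- not the literal upstream fix. */
>   ret = do_munmap(addr, old_len);
>   if (ret)           /* could not create the hole: out of VMA descriptors */
>           goto out;  /* propagate the error instead of ignoring it */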
>
> We were able to create robust proof-of-concept exploit code giving full
> super-user privileges on all vulnerable kernel versions. The exploit
> code will be released next week.
>
>
> Impact:
> =======
>
> Since no special privileges are required to use the mremap(2) system
> call, any process may use its unexpected behavior to disrupt the kernel
> memory management subsystem.
>
> Proper exploitation of this vulnerability leads to local privilege
> escalation giving an attacker full super-user privileges. The
> vulnerability may also lead to a denial-of-service attack on the
> available system memory.
>
> Kernel versions tested and known to be vulnerable are all <= 2.2.25,
> <= 2.4.24 and <= 2.6.2. The 2.2.25 version of the Linux kernel does not
> recognize the MREMAP_FIXED flag, but this does not prevent the bug from
> being successfully exploited. All users are encouraged to patch all
> vulnerable systems as soon as appropriate vendor patches are released.
> There is no hotfix for this vulnerability: even with limited per-user
> virtual memory, do_munmap() can still fail.
>
>
> Credits:
> ========
>
> Paul Starzetz <ihaquer@xxxxxxx> has identified the vulnerability and
> performed further research. COPYING, DISTRIBUTION, AND MODIFICATION OF
> INFORMATION PRESENTED HERE IS ALLOWED ONLY WITH EXPRESS PERMISSION OF
> ONE OF THE AUTHORS.
>
>
> Disclaimer:
> ===========
>
> This document and all the information it contains are provided "as is",
> for educational purposes only, without warranty of any kind, whether
> express or implied.
>
> The authors reserve the right not to be responsible for the topicality,
> correctness, completeness or quality of the information provided in
> this document. Liability claims regarding damage caused by the use of
> any information provided, including any kind of information which is
> incomplete or incorrect, will therefore be rejected.
>
> - --
> Paul Starzetz
> iSEC Security Research
> http://isec.pl/
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.0.7 (GNU/Linux)
>
> iD8DBQFAM1QzC+8U3Z5wpu4RAqXzAKCMOkFu1mXzzRgLyuFYp4ORpQCQDgCfe4M2
> 3IjbGvzniOjv/Hc7KKAzMtU=
> =GJds
> -----END PGP SIGNATURE-----
>
>
The attached patch fixes this bug for kernel 2.2.25; it should also apply
cleanly to kernels since at least 2.2.21. It turns the page-table move into
a copy, checks the return value of do_munmap() before the new VMA is
inserted, and backs the new mapping out again if do_munmap() fails.
--
Sincerely Yours, Dan.
--- linux/mm/mremap.c.security Sun Mar 25 20:31:03 2001
+++ linux/mm/mremap.c Thu Feb 19 05:10:34 2004
@@ -9,6 +9,7 @@
#include <linux/shm.h>
#include <linux/mman.h>
#include <linux/swap.h>
+#include <linux/file.h>
#include <asm/uaccess.h>
#include <asm/pgtable.h>
@@ -25,7 +26,7 @@
if (pgd_none(*pgd))
goto end;
if (pgd_bad(*pgd)) {
- printk("move_one_page: bad source pgd (%08lx)\n",
pgd_val(*pgd));
+ printk("copy_one_page: bad source pgd (%08lx)\n",
pgd_val(*pgd));
pgd_clear(pgd);
goto end;
}
@@ -34,7 +35,7 @@
if (pmd_none(*pmd))
goto end;
if (pmd_bad(*pmd)) {
- printk("move_one_page: bad source pmd (%08lx)\n",
pmd_val(*pmd));
+ printk("copy_one_page: bad source pmd (%08lx)\n",
pmd_val(*pmd));
pmd_clear(pmd);
goto end;
}
@@ -57,34 +58,22 @@
return pte;
}
-static inline int copy_one_pte(pte_t * src, pte_t * dst)
+static int copy_one_page(struct mm_struct *mm, unsigned long old_addr, unsigned long new_addr)
{
- int error = 0;
- pte_t pte = *src;
+ pte_t * src, * dst;
- if (!pte_none(pte)) {
- error++;
- if (dst) {
- pte_clear(src);
- set_pte(dst, pte);
- error--;
+ src = get_one_pte(mm, old_addr);
+ if (src && !pte_none(*src)) {
+ if ((dst = alloc_one_pte(mm, new_addr))) {
+ set_pte(dst, *src);
+ return 0;
}
+ return 1;
}
- return error;
-}
-
-static int move_one_page(struct mm_struct *mm, unsigned long old_addr, unsigned long new_addr)
-{
- int error = 0;
- pte_t * src;
-
- src = get_one_pte(mm, old_addr);
- if (src)
- error = copy_one_pte(src, alloc_one_pte(mm, new_addr));
- return error;
+ return 0;
}
-static int move_page_tables(struct mm_struct * mm,
+static int copy_page_tables(struct mm_struct * mm,
unsigned long new_addr, unsigned long old_addr, unsigned long len)
{
unsigned long offset = len;
@@ -99,7 +88,7 @@
*/
while (offset) {
offset -= PAGE_SIZE;
- if (move_one_page(mm, old_addr + offset, new_addr + offset))
+ if (copy_one_page(mm, old_addr + offset, new_addr + offset))
goto oops_we_failed;
}
return 0;
@@ -113,8 +102,6 @@
*/
oops_we_failed:
flush_cache_range(mm, new_addr, new_addr + len);
- while ((offset += PAGE_SIZE) < len)
- move_one_page(mm, new_addr + offset, old_addr + offset);
zap_page_range(mm, new_addr, len);
flush_tlb_range(mm, new_addr, new_addr + len);
return -1;
@@ -129,7 +116,9 @@
if (new_vma) {
unsigned long new_addr = get_unmapped_area(addr, new_len);
- if (new_addr && !move_page_tables(current->mm, new_addr, addr, old_len)) {
+ if (new_addr && !copy_page_tables(current->mm, new_addr, addr, old_len)) {
+ unsigned long ret;
+
*new_vma = *vma;
new_vma->vm_start = new_addr;
new_vma->vm_end = new_addr+new_len;
@@ -138,9 +127,19 @@
new_vma->vm_file->f_count++;
if (new_vma->vm_ops && new_vma->vm_ops->open)
new_vma->vm_ops->open(new_vma);
+ if ((ret = do_munmap(addr, old_len))) {
+ if (new_vma->vm_ops && new_vma->vm_ops->close)
+ new_vma->vm_ops->close(new_vma);
+ if (new_vma->vm_file)
+ fput(new_vma->vm_file);
+ flush_cache_range(current->mm, new_addr, new_addr + old_len);
+ zap_page_range(current->mm, new_addr, old_len);
+ flush_tlb_range(current->mm, new_addr, new_addr + old_len);
+ kmem_cache_free(vm_area_cachep, new_vma);
+ return ret;
+ }
insert_vm_struct(current->mm, new_vma);
merge_segments(current->mm, new_vma->vm_start,
new_vma->vm_end);
- do_munmap(addr, old_len);
current->mm->total_vm += new_len >> PAGE_SHIFT;
if (new_vma->vm_flags & VM_LOCKED) {
current->mm->locked_vm += new_len >> PAGE_SHIFT;
@@ -176,9 +175,9 @@
* Always allow a shrinking remap: that just unmaps
* the unnecessary pages..
*/
- ret = addr;
if (old_len >= new_len) {
- do_munmap(addr+new_len, old_len - new_len);
+ if (!(ret = do_munmap(addr+new_len, old_len - new_len)))
+ ret = addr;
goto out;
}