HVCI/kCET-aware bugcheck suppressor PoC
Catches Windows kernel bugchecks before they BSOD the system. Uses SEH
unwind through __finally handlers instead of patching kernel code.
Tested on Windows 11 24H2 build 26200 with HVCI + kCET enabled.
Treat this PoC as a thought experiment: "how far can we bend the kernel to suppress a bugcheck without writing any code into the kernel at runtime under HVCI?" The longer-term motivation is exploring whether a true PatchGuard suppressor is feasible on HVCI+kCET systems.
This implementation is a foothold for that question, not the answer to it.
It shows that bugcheck dispatch can be intercepted with data-only hooks,
that recovery can be driven entirely through RtlUnwindEx, and that the resulting flow is shadow-stack clean.
That's the easy half of the PatchGuard problem: catching the bugcheck.
The hard half is surviving after the catch when the bugcheck originates inside ntoskrnl itself
(e.g. PG's own KeBugCheckEx 0x109 from a DPC, or the kernel's own
list-entry validation int 29h from PspProcessDelete).
When a kernel bugcheck would normally take the system down, BugcheckSuppressor:
- Hooks the dispatch table. HalPrivateDispatchTable[HalpPrepareForBugcheck] and [HalNotifyProcessorFreeze] are data pointers, so we can hook them without triggering HVCI.
- Catches the bugcheck on entry. When a bugcheck fires, the hook runs before the BSOD presentation.
- Releases the bugcheck globals so the slave CPUs that were spinning in KiFreezeTargetExecution exit their wait loop. The system is no longer frozen.
- Unwinds the kernel stack via RtlUnwindEx from the hook frame back to the bad driver's caller (typically IopLoadDriver), running every __finally handler on the way. That cleans up the IopLoadDriver-side resources.
- Resumes execution at the caller with RAX = STATUS_ACCESS_VIOLATION. From the caller's perspective, the offending function returned with an error, the same way it would have if SEH had naturally caught the violation.
The bad driver fails to load, the user gets an ERROR_NOACCESS-class
status, and the system stays running.
BugcheckSuppressor/
├── BugcheckSuppressor/ # The suppressor driver
│ ├── source.cpp # Main hook + kernel state release + DPC watchdog reset
│ ├── hvci_seh_recovery.cpp # SEH-driven RtlUnwindEx recovery (bad-driver and kernel-origin)
│ ├── cet_stubs.asm # kCET-aware register restore + indirect JMP (asm fallback)
│ ├── header.h
│ └── BugcheckSuppressor.vcxproj
├── Trigger/ # Companion test driver
│ ├── Source.cpp # Demo: write to RX memory section to trigger 0x50 fault
│ └── Trigger.vcxproj
└── README.md
Requirements:
- Visual Studio 2022 with Spectre-mitigated MSVC (v143)
- Windows Driver Kit (WDK) 10.0.26100.0 or newer
- A test-signing certificate, or bcdedit /set TESTSIGNING ON
Enable Memory Integrity and Kernel-mode Hardware-Enforced Stack Protection in Core Isolation settings.
bcdedit /set TESTSIGNING ON
shutdown /r /t 0
sc create BugcheckSuppressor type= kernel binPath= C:\path\to\BugcheckSuppressor.sys
sc start BugcheckSuppressor
sc create Trigger type= kernel binPath= C:\path\to\Trigger.sys
sc start Trigger
Or load the drivers with OSR Loader.
Trigger.sys writes 0xC3 (RET) into the prologue of nt!KeBugCheckEx.
Under HVCI, the EPT enforces W=0 on kernel code pages, so the write
faults with #PF. KiPageFault routes the unhandled exception into
KeBugCheck2. Without BugcheckSuppressor this is an instant BSOD. With
BugcheckSuppressor loaded, the bugcheck is caught, the trigger driver
fails to load (sc start returns ERROR_NOACCESS), and the system
stays up. Verify in DbgView (with Capture Kernel enabled). You should
see lines like:
[Suppressor] Recover: kOrigin=0 tRip=... tRsp=... tRax=00000000C0000005 ...
[Suppressor/SEH] target found: ip=... frame=... (bad-driver caller, depth=N)
[Suppressor/SEH] dispatching unwind to ip=... frame=...
[Suppressor] Bypass!
HalPrivateDispatchTable is an exported table of function pointers
that lives in a writable data page. HVCI write-protects executable
pages, not ordinary data pages. We swap two slots
(HalpPrepareForBugcheck at offset 0x108, HalNotifyProcessorFreeze at
offset 0x1A8) atomically via InterlockedExchangePointer.
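The slot swap can be modelled in user mode. The following is a minimal sketch, not the driver's code: DispatchTable, DispatchFn, HookSlot and the two demo routines are hypothetical names, std::atomic::exchange stands in for InterlockedExchangePointer, and the slot index is derived from the 0x108 offset above assuming 8-byte pointers.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a HAL dispatch routine. The real slots have
// their own prototypes, which are not reproduced here.
using DispatchFn = int (*)(int);

constexpr std::size_t kSlotCount = 64;  // assumption: large enough for this model
// Offset 0x108 from the text, divided by an assumed 8-byte pointer size.
constexpr std::size_t kPrepareForBugcheckSlot = 0x108 / 8;

// Hypothetical model of the dispatch table: function pointers in plain
// writable data, with no code page involved.
struct DispatchTable {
    std::atomic<DispatchFn> slots[kSlotCount];
};

// Swap one slot and return the previous pointer so the caller can chain
// to it or restore it on unload. exchange() is one atomic
// read-modify-write, so a concurrent reader sees either the old pointer
// or the new one, never a torn value.
inline DispatchFn HookSlot(DispatchTable& table, std::size_t index, DispatchFn hook) {
    assert(index < kSlotCount);
    return table.slots[index].exchange(hook);
}

// Demo routines used only by this sketch.
inline int OriginalPrepare(int code) { return code; }   // lets the bugcheck proceed
inline int HookedPrepare(int /*code*/) { return -1; }   // models the interception
```

Keeping the pointer returned by HookSlot is what makes unhooking a second call to the same function with the saved value.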
When KeBugCheckEx is called:
KeBugCheckEx
KeBugCheck2
HalpPrepareForBugcheck <- our hook fires here
... freezes other CPUs via IPI ...
... captures CONTEXT into PRCB+0x8FC0 ...
The hook reads the saved CONTEXT from KPRCB, identifies the trap
RIP/RSP, and decides whether the fault originated in a third-party
driver (recoverable via SEH unwind) or inside ntoskrnl (best-effort
kernel-origin recovery).
The bad-driver path walks frames from the hook upward via
RtlVirtualUnwind. It identifies the "bad-driver frame" by image
classification: the first transition from known-image (ntoskrnl plus
the suppressor) to unknown is the bad driver's faulting RIP. It then
walks one more frame to find the bad driver's caller (typically
IopLoadDriver) and dispatches RtlUnwindEx to that frame with
RAX = STATUS_ACCESS_VIOLATION. Every __finally between the hook frame
and the target runs, releasing locks, dropping references, and
restoring per-thread state.
This is the path the suppressor handles cleanly end-to-end.
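The target selection described above can be sketched over an already-captured list of frames. This is an illustrative model only: Frame, ImageRange and FindUnwindTarget are hypothetical names, the frames are assumed to have been produced by an RtlVirtualUnwind walk beforehand, and the sketch follows the text literally by taking exactly one frame past the first unknown-image frame.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ImageRange { std::uint64_t base; std::uint64_t size; };  // a loaded image
struct Frame      { std::uint64_t rip;  std::uint64_t rsp;  };  // one walked frame

// True if rip falls inside any of the known images
// (ntoskrnl plus the suppressor, per the text).
inline bool InKnownImage(std::uint64_t rip, const std::vector<ImageRange>& known) {
    for (const ImageRange& img : known) {
        if (rip >= img.base && rip - img.base < img.size) return true;
    }
    return false;
}

// frames[0] is the hook frame; later entries are progressively older callers.
// Returns the frame to unwind to, or nullptr when no known -> unknown
// transition exists or the walk ended before reaching the caller.
inline const Frame* FindUnwindTarget(const std::vector<Frame>& frames,
                                     const std::vector<ImageRange>& known) {
    for (std::size_t i = 1; i < frames.size(); ++i) {
        const bool prevKnown = InKnownImage(frames[i - 1].rip, known);
        const bool thisKnown = InKnownImage(frames[i].rip, known);
        if (prevKnown && !thisKnown) {
            // frames[i] is the bad driver's faulting frame;
            // one more step up is its caller.
            return (i + 1 < frames.size()) ? &frames[i + 1] : nullptr;
        }
    }
    return nullptr;
}
```

A nullptr result corresponds to the case where there is no bad-driver image to anchor on, which the suppressor treats as kernel-origin.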
The kernel-origin path is used when the bugcheck originates inside
ntoskrnl (e.g. a fault inside PspProcessDelete). There is no bad
driver image to anchor on, so the cross-stack resolver picks a target
frame from the saved RSP and we RtlUnwindEx towards it, walking as
deep as RtlVirtualUnwind reaches. That maximizes the number of
__finally handlers run. The path is limited by ntoskrnl's lock
acquisition style.
If RtlUnwindEx cannot deliver (.pdata gap, RSP mismatch, etc.) we
fall back to a kCET-aware asm stub (HvciKcetJmpRestoreFixed) that
pops the shadow stack by a precomputed count, restores registers from
CONTEXT, switches RSP, and indirect-JMPs to the target. It works,
but it does not run intermediate __finally blocks, so locks held by
walked-over frames leak.
Before unwinding, the hook:
- Releases KiBugCheckActive / KiHardwareTrigger / KiFreezeExecutionLock so peer CPUs exit their freeze loops.
- Resets per-PRCB DPC watchdog Count = Period so the post-recovery system doesn't trip bugcheck 0x133 10 to 20 seconds later from accumulated decrement.
- Restores nt!HalpTimerWatchdog from a known-good snapshot taken at driver init, so clock-tick servicing returns to normal.
- Pre-stamps every PRCB->IpiFrozen = 5 so the freeze IPI's slave loop sees "already frozen with our marker" and exits immediately.
These cleanups are what allow the system to remain usable after the suppression. Without them, you'd see a brief survival followed by a 0x133 or another freeze a few seconds later.
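The two per-processor steps in the list above (watchdog reset and IpiFrozen pre-stamp) amount to a loop over every PRCB. Here is a rough user-mode model, assuming a hypothetical PrcbModel struct whose field names are illustrative and do not match the real KPRCB layout; only the marker value 5 and the Count = Period rule come from the text.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-processor state. The real fields live at
// build-specific KPRCB offsets that are not reproduced here.
struct PrcbModel {
    std::uint32_t dpcWatchdogCount;   // remaining budget before a 0x133
    std::uint32_t dpcWatchdogPeriod;  // full budget
    std::uint32_t ipiFrozen;          // freeze state seen by the IPI slave loop
};

constexpr std::uint32_t kFrozenMarker = 5;  // marker value from the text

// Apply both per-processor cleanups to every modelled PRCB.
inline void ResetPerCpuState(std::vector<PrcbModel>& prcbs) {
    for (PrcbModel& p : prcbs) {
        p.dpcWatchdogCount = p.dpcWatchdogPeriod;  // Count = Period
        p.ipiFrozen = kFrozenMarker;               // pre-stamp "already frozen"
    }
}
```

Refilling the count to the full period discards whatever budget was consumed while the bugcheck path was running, which is the "accumulated decrement" the list refers to.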
- End-to-end bugcheck suppression on HVCI+kCET (Win11 24H2 26200). Bad-driver-origin faults (write-to-RX, kernel exception, fastfail from third-party driver code) are caught and the system continues running. The bad driver fails to load with STATUS_ACCESS_VIOLATION, the same way it would have if SEH had naturally caught the fault.
- All hooks are data-only. No write to ntoskrnl .text, no patch of any .text byte, nothing for HVCI to reject.
- Recovery uses RtlUnwindEx, the kernel's own SEH unwinder, which is aware of the shadow stack. The 2022 ROP-via-suspended-thread approach is killed by kCET and this approach is not.
- Build-specific KPRCB offsets. The suppressor uses hardcoded offsets for the KPRCB context save area, debugger-saved IRQL, IpiFrozen, DPC watchdog, etc. These are stable within a Windows servicing branch but drift roughly 0x10 to 0x40 between LCUs. Verify against ntoskrnl.pdb when porting.
- Kernel-origin bugchecks at arbitrary points are "best effort", not guaranteed. Many ntoskrnl functions acquire locks via explicit KeAcquireSpinLock / KeReleaseSpinLock pairs without a surrounding __try / __finally. When RtlUnwindEx walks past those frames, the __finally handlers that would release them simply don't exist. The locks leak. Whichever thread next tries to acquire them deadlocks. This affects contrived test cases like injecting a null deref into PspProcessDelete via WinDbg's r rdi=0. Real-world kernel-origin scenarios at well-defined points (PG bugchecks via KeBugCheckEx 0x109, kASAN-style invariant checks) tend to be more recoverable because their callers are designed with "recovery friendly state".
- Bugchecks during very early or very late system state (boot before HAL is ready, late shutdown after the driver is unloaded) are not in scope.
- Bugchecks during extremely deep stack contexts may exceed RtlVirtualUnwind's ability to walk. The asm fallback handles these but leaks intermediate __finally blocks.
- The system will freeze if BugcheckSuppressor cannot recover the exception.
- Unloading the driver is not fully clean. Unhooking restores the dispatch pointers atomically, but if a bugcheck is in flight at the moment of unload, behaviour is undefined. Don't unload while running stress tests.
- One suppressor instance per machine. Multiple instances would race on the dispatch-table swap.
- /CETCOMPAT build is required. Without it, the asm fallback's incsspq instructions trap.
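One way to contain the build-specific offsets noted in the list above is a lookup table that refuses to hook on an unrecognised build. The sketch below is an assumption about how that could be organised, not a description of the driver: BuildOffsets and LookupOffsets are hypothetical names, and the single entry reuses the three offsets quoted earlier in this README on the assumption that they belong to the tested build 26200.

```cpp
#include <cassert>
#include <cstdint>

// Offsets that must match the running kernel. Field names are illustrative.
struct BuildOffsets {
    std::uint32_t build;                     // OS build number
    std::uint32_t prcbContext;               // KPRCB context save area
    std::uint32_t halPrepareForBugcheck;     // slot offset in HalPrivateDispatchTable
    std::uint32_t halNotifyProcessorFreeze;  // slot offset in HalPrivateDispatchTable
};

// Only the build this README was tested on is listed. Because offsets are
// said to drift between LCUs, a real table would likely also need to key
// on the update revision; that key is omitted here.
constexpr BuildOffsets kKnownBuilds[] = {
    {26200, 0x8FC0, 0x108, 0x1A8},
};

// Returns the offsets for a known build, or nullptr so the caller can
// decline to install hooks rather than guess.
inline const BuildOffsets* LookupOffsets(std::uint32_t build) {
    for (const BuildOffsets& entry : kKnownBuilds) {
        if (entry.build == build) return &entry;
    }
    return nullptr;
}
```

Failing closed on an unknown build trades coverage for safety: a wrong KPRCB offset inside the bugcheck path would most likely turn a recoverable fault into a hang.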
The problem space draws on prior public work:
- Connor McGarr. The original kernel-stack ROP suppressor (No Code Execution? No Problem!, 2022) and the kCET teardown that explained why ROP-based recovery stopped working. Also SkBridge (2025) for fuzzing the NT/SK secure-call surface, which informs the MmDbgCopyMemory analysis here.
- can1357. The original ByePG technique that showed bugcheck dispatch could be intercepted at the HAL layer. The IpiFrozen pre-stamp pattern is borrowed from that work.
- Yarden Shafir. Secure Pool Internals (2020) for the ExSecurePoolUpdate analysis that informs how VTL1 makes write decisions about VTL0-visible memory.
- Saar Amar & Daniel King. Breaking VSM by Attacking SecureKernel (Black Hat USA 2020) for the canonical SK attack-surface methodology.
- zer0condition. BusterCall for the PFN-swap technique that this PoC does not use, but is the right reference point for "execute kernel code without writing .text".