XaFF-XaFF/BugcheckSuppressor

BugcheckSuppressor

HVCI/kCET-aware bugcheck suppressor PoC

Catches Windows kernel bugchecks before they BSOD the system. Uses SEH unwind through __finally handlers instead of patching kernel code. Tested on Windows 11 24H2 build 26200 with HVCI + kCET enabled.


Notice

Treat this PoC as a thought experiment: "how far can we bend the kernel to suppress a bugcheck without writing any code into the kernel at runtime under HVCI?" The longer-term motivation is exploring whether a true PatchGuard suppressor is feasible on HVCI+kCET systems.

This implementation is a foothold for that question, not the answer to it. It shows that bugcheck dispatch can be intercepted with data-only hooks, that recovery can be driven entirely through RtlUnwindEx, and that the resulting flow is shadow-stack clean. That's the easy half of the PatchGuard problem: catching the bugcheck. The hard half is surviving after the catch when the bugcheck originates inside ntoskrnl itself (e.g. PG's own KeBugCheckEx 0x109 from a DPC, or the kernel's own list-entry validation int 29h from PspProcessDelete).


What it does

When a kernel bugcheck would normally take the system down, BugcheckSuppressor:

  1. Hooks the dispatch table. HalPrivateDispatchTable[HalpPrepareForBugcheck] and HalPrivateDispatchTable[HalNotifyProcessorFreeze] are data pointers, so we can overwrite them without triggering HVCI.
  2. Catches the bugcheck on entry. When a bugcheck fires, the hook runs before the BSOD presentation.
  3. Releases the bugcheck globals so the slave CPUs that were spinning in KiFreezeTargetExecution exit their wait loop. The system is no longer frozen.
  4. Unwinds the kernel stack via RtlUnwindEx from the hook frame back to the bad driver's caller (typically IopLoadDriver), running every __finally handler on the way. This releases the resources held on the IopLoadDriver side.
  5. Resumes execution at the caller with RAX = STATUS_ACCESS_VIOLATION. From the caller's perspective, the offending function returned with an error, the same way it would have if SEH had naturally caught the violation.

The bad driver fails to load, the user gets an ERROR_NOACCESS-class status, and the system stays running.
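The status the caller sees is an ordinary NTSTATUS. The small user-mode C++ sketch below is a quick reference for it. It decodes STATUS_ACCESS_VIOLATION (0xC0000005) into its NTSTATUS fields and tests it the way the NT_SUCCESS macro does. The helper names are illustrative and not part of the driver. RtlNtStatusToDosError maps this status to ERROR_NOACCESS (998), which is the code sc start reports.

```cpp
#include <cassert>
#include <cstdint>

// NTSTATUS bit layout: 31-30 severity, 29 customer flag, 28 reserved,
// 27-16 facility, 15-0 code.
struct NtStatusFields {
    uint32_t severity;  // 0 success, 1 informational, 2 warning, 3 error
    uint32_t customer;  // 1 for non-Microsoft-defined codes
    uint32_t facility;
    uint32_t code;
};

constexpr uint32_t kStatusAccessViolation = 0xC0000005u;  // STATUS_ACCESS_VIOLATION
constexpr uint32_t kErrorNoAccess = 998u;                  // ERROR_NOACCESS (Win32)

constexpr NtStatusFields DecodeNtStatus(uint32_t status) {
    return NtStatusFields{
        (status >> 30) & 0x3u,
        (status >> 29) & 0x1u,
        (status >> 16) & 0xFFFu,
        status & 0xFFFFu,
    };
}

// Same test as NT_SUCCESS: success and informational codes have the top
// bit clear; warnings and errors have it set.
constexpr bool NtSuccess(uint32_t status) {
    return (status & 0x80000000u) == 0;
}
```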


Repository layout

BugcheckSuppressor/
├── BugcheckSuppressor/           # The suppressor driver
│   ├── source.cpp                # Main hook + kernel state release + DPC watchdog reset
│   ├── hvci_seh_recovery.cpp     # SEH-driven RtlUnwindEx recovery (bad-driver and kernel-origin)
│   ├── cet_stubs.asm             # kCET-aware register restore + indirect JMP (asm fallback)
│   ├── header.h
│   └── BugcheckSuppressor.vcxproj
├── Trigger/                      # Companion test driver
│   ├── Source.cpp                # Demo: write to RX memory section to trigger 0x50 fault
│   └── Trigger.vcxproj
└── README.md

Build

Requirements:

  • Visual Studio 2022 with Spectre-mitigated MSVC (v143)
  • Windows Driver Kit (WDK) 10.0.26100.0 or newer
  • A test-signing certificate, or bcdedit /set TESTSIGNING ON

Prepare the target

Enable Memory Integrity and Kernel-mode Hardware-Enforced Stack Protection in Core Isolation settings.

bcdedit /set TESTSIGNING ON
shutdown /r /t 0

Demo: suppress a bad-driver bugcheck

sc create BugcheckSuppressor type= kernel binPath= C:\path\to\BugcheckSuppressor.sys
sc start BugcheckSuppressor
sc create Trigger     type= kernel binPath= C:\path\to\Trigger.sys
sc start Trigger

Or load the drivers with OSR Loader.

Trigger.sys writes 0xC3 (RET) into the prologue of nt!KeBugCheckEx. Under HVCI, the EPT enforces W=0 on kernel code pages, so the write faults with #PF. KiPageFault routes the unhandled exception into KeBugCheck2. Without BugcheckSuppressor this is an instant BSOD. With BugcheckSuppressor loaded, the bugcheck is caught, the trigger driver fails to load (sc start returns ERROR_NOACCESS), and the system stays up.

Verify in DbgView (with Capture Kernel enabled). You should see lines like:

[Suppressor] Recover: kOrigin=0 tRip=... tRsp=... tRax=00000000C0000005 ...
[Suppressor/SEH] target found: ip=... frame=... (bad-driver caller, depth=N)
[Suppressor/SEH] dispatching unwind to ip=... frame=...
[Suppressor] Bypass!

How it works (technical detail)

The hook is data-only

HalPrivateDispatchTable is a table of function pointers exported by ntoskrnl, and it lives in writable data. HVCI protects code pages (.text), not data pages. We swap two slots (HalpPrepareForBugcheck at offset 0x108, HalNotifyProcessorFreeze at offset 0x1A8) atomically via InterlockedExchangePointer.
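Below is a user-mode C++ model of the data-only swap. An array of atomic function-pointer slots stands in for HalPrivateDispatchTable, and std::atomic::exchange plays the role of InterlockedExchangePointer. The byte offsets are the ones quoted above, divided by 8 to give slot indices on x64. The slot signature, the table size and the two stub routines are hypothetical. The sketch shows the pattern (swap atomically, keep the original for unhooking), not the driver's code.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical slot signature; the real HAL routines have their own prototypes.
using SlotFn = void (*)(unsigned long);

// Stand-in for a writable table of pointers (size chosen for the model only).
constexpr std::size_t kTableSlots = 64;
using DispatchTable = std::array<std::atomic<SlotFn>, kTableSlots>;

// Byte offsets from the README, converted to 8-byte slot indices (x64).
constexpr std::size_t kPrepareForBugcheckOffset = 0x108;
constexpr std::size_t kNotifyProcessorFreezeOffset = 0x1A8;
constexpr std::size_t SlotIndex(std::size_t byteOffset) { return byteOffset / 8; }

// Hypothetical stubs; each writes its own volatile static so the bodies
// differ and a linker cannot fold them into a single address.
inline void OriginalRoutine(unsigned long v) { static volatile unsigned long last; last = v; }
inline void HookRoutine(unsigned long v) { static volatile unsigned long seen; seen = v; }

struct SlotHook {
    std::size_t index;
    SlotFn original;  // kept so RemoveHook can put it back
};

inline void FillTable(DispatchTable& table, SlotFn fn) {
    for (auto& slot : table) slot.store(fn, std::memory_order_relaxed);
}

// Atomic swap, the analogue of InterlockedExchangePointer: a concurrent
// reader sees either the old pointer or the new one, never a torn value.
inline SlotHook InstallHook(DispatchTable& table, std::size_t index, SlotFn hook) {
    SlotFn previous = table[index].exchange(hook, std::memory_order_acq_rel);
    return SlotHook{index, previous};
}

inline void RemoveHook(DispatchTable& table, const SlotHook& h) {
    table[h.index].store(h.original, std::memory_order_release);
}
```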

Bugcheck path entry

When KeBugCheckEx is called:

KeBugCheckEx
  KeBugCheck2
    HalpPrepareForBugcheck     <- our hook fires here
      ... freezes other CPUs via IPI ...
      ... captures CONTEXT into PRCB+0x8FC0 ...

The hook reads the saved CONTEXT from KPRCB, identifies the trap RIP/RSP, and decides whether the fault originated in a third-party driver (recoverable via SEH unwind) or inside ntoskrnl (best-effort kernel-origin recovery).
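The first decision can be modeled in user-mode C++. The sketch reads Rip and Rsp out of a CONTEXT saved at the PRCB offset quoted above, then tests whether Rip falls inside ntoskrnl. In the documented x64 CONTEXT layout, Rsp sits at offset 0x98 and Rip at 0xF8. The 0x8FC0 offset is build-specific (see Limitations). The byte buffer and the single ntoskrnl range are simplifying assumptions; a real implementation works on the live KPRCB and actual image bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// PRCB offset quoted in the README for the tested build (build-specific).
constexpr std::size_t kPrcbContextOffset = 0x8FC0;
// Field offsets inside the documented x64 CONTEXT structure.
constexpr std::size_t kContextRspOffset = 0x98;
constexpr std::size_t kContextRipOffset = 0xF8;

struct TrapState {
    uint64_t rip;
    uint64_t rsp;
};

// Reads Rip/Rsp from a byte image of the PRCB (a stand-in for the real one).
inline TrapState ReadTrapState(const std::vector<uint8_t>& prcb) {
    TrapState s{};
    std::memcpy(&s.rsp, prcb.data() + kPrcbContextOffset + kContextRspOffset, sizeof s.rsp);
    std::memcpy(&s.rip, prcb.data() + kPrcbContextOffset + kContextRipOffset, sizeof s.rip);
    return s;
}

enum class FaultOrigin { Kernel, ThirdParty };

// Kernel-origin if the trap RIP lies inside ntoskrnl, otherwise third-party.
inline FaultOrigin Classify(uint64_t rip, uint64_t ntosBase, uint64_t ntosSize) {
    return (rip >= ntosBase && rip - ntosBase < ntosSize) ? FaultOrigin::Kernel
                                                          : FaultOrigin::ThirdParty;
}
```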

Bad-driver-origin recovery (HvciSehUnwindRecovery)

Walks frames from the hook upward via RtlVirtualUnwind. Identifies the "bad-driver frame" by image classification: the first transition from known-image (ntoskrnl plus the suppressor) to unknown is the bad driver's faulting RIP. Walks one more frame to find the bad driver's caller (typically IopLoadDriver), then dispatches RtlUnwindEx to that frame with RAX = STATUS_ACCESS_VIOLATION. Every __finally between the hook frame and the target runs, releasing locks, dropping references, restoring per-thread state.
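The frame-selection rule can be sketched in user-mode C++ as a scan over the instruction pointers an RtlVirtualUnwind walk would produce. The walk is simulated here as a plain vector, hook frame first. The scan finds the first known-to-unknown transition, calls that the bad-driver frame, and takes the next frame as the unwind target. The ImageRange type and the sample addresses are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct ImageRange {
    uint64_t base;
    uint64_t size;
    bool contains(uint64_t ip) const { return ip >= base && ip - base < size; }
};

struct UnwindTarget {
    std::size_t badDriverFrame;  // index of the faulting third-party frame
    std::size_t callerFrame;     // index of its caller, the RtlUnwindEx target
};

// frameIps[0] is the hook's own frame; each later entry is the IP one more
// unwind step away. knownImages would hold ntoskrnl and the suppressor.
inline std::optional<UnwindTarget> FindBadDriverCaller(
        const std::vector<uint64_t>& frameIps,
        const std::vector<ImageRange>& knownImages) {
    auto isKnown = [&knownImages](uint64_t ip) {
        for (const ImageRange& r : knownImages) {
            if (r.contains(ip)) return true;
        }
        return false;
    };
    for (std::size_t i = 1; i < frameIps.size(); ++i) {
        if (isKnown(frameIps[i - 1]) && !isKnown(frameIps[i])) {
            if (i + 1 < frameIps.size()) return UnwindTarget{i, i + 1};
            return std::nullopt;  // walk ended before the caller was reached
        }
    }
    return std::nullopt;  // no transition: not a bad-driver-origin fault
}
```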

This is the path the suppressor handles cleanly end-to-end.

Kernel-origin recovery (HvciKernelOriginUnwindRecovery)

Used when the bugcheck originates inside ntoskrnl (e.g. a fault inside PspProcessDelete). There is no bad-driver image to anchor on, so the cross-stack resolver picks a target frame from the saved RSP and we RtlUnwindEx toward it, walking as deep as RtlVirtualUnwind reaches. That maximizes the number of __finally handlers that run. How much state this recovers is limited by how ntoskrnl acquires locks (see Recovery boundary under Limitations).

Asm fallback (cet_stubs.asm)

If RtlUnwindEx cannot deliver (.pdata gap, RSP mismatch, etc.) we fall back to a kCET-aware asm stub (HvciKcetJmpRestoreFixed) that pops the shadow stack by a precomputed count, restores registers from CONTEXT, switches RSP, and indirect-JMPs to the target. It works, but it does not run intermediate __finally blocks, so locks held by walked-over frames leak.
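Two pieces of arithmetic sit behind the shadow-stack pop, sketched here in user-mode C++. First, INCSSPQ advances SSP by 8 times the low 8 bits of its operand, so discarding more than 255 entries takes several instructions. Second, the number of entries to discard is whatever sits above the return address the target frame will eventually RET to. The search-by-value in EntriesToDiscard is an assumption for illustration only; the stub described above uses a precomputed count. A count derived from the frame walk also sidesteps the ambiguity when the same return address appears more than once (for example under recursion).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// INCSSPQ uses only bits 7:0 of its register operand, so one instruction
// can discard at most 255 shadow-stack entries.
constexpr unsigned kMaxEntriesPerIncssp = 255;

// Splits a pop count into per-instruction operands.
inline std::vector<uint8_t> IncsspOperands(unsigned entriesToPop) {
    std::vector<uint8_t> operands;
    while (entriesToPop > 0) {
        unsigned n = entriesToPop < kMaxEntriesPerIncssp ? entriesToPop : kMaxEntriesPerIncssp;
        operands.push_back(static_cast<uint8_t>(n));
        entriesToPop -= n;
    }
    return operands;
}

// Simulated shadow stack: back() is the top (most recent CALL's return address).
// Returns how many entries sit above targetReturn, or -1 if it is absent.
inline long EntriesToDiscard(const std::vector<uint64_t>& shadow, uint64_t targetReturn) {
    for (std::size_t k = 0; k < shadow.size(); ++k) {
        if (shadow[shadow.size() - 1 - k] == targetReturn) {
            return static_cast<long>(k);
        }
    }
    return -1;
}
```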

State releases on the way out

Before unwinding, the hook:

  • Releases KiBugCheckActive / KiHardwareTrigger / KiFreezeExecutionLock so peer CPUs exit their freeze loops.
  • Resets each PRCB's DPC watchdog count to its period (Count = Period), so the recovered system doesn't trip bugcheck 0x133 (DPC_WATCHDOG_VIOLATION) 10 to 20 seconds later from the decrements already accumulated.
  • Restores nt!HalpTimerWatchdog from a known-good snapshot taken at driver init, so clock-tick servicing returns to normal.
  • Pre-stamps IpiFrozen = 5 in every PRCB, so the freeze IPI's slave loop sees "already frozen with our marker" and exits immediately.

These cleanups are what allow the system to remain usable after the suppression. Without them, you'd see a brief survival followed by a 0x133 or another freeze a few seconds later.
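The effect of these releases on per-CPU state can be modeled in a few lines of user-mode C++. The structs below are hypothetical stand-ins, not KPRCB layouts. Only the marker value 5 and the Count = Period rule come from the description above. The HalpTimerWatchdog restore is not modeled, and the order of operations here is illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified per-processor state; real KPRCB fields, widths
// and offsets are build-specific and not modeled here.
struct FakePrcb {
    uint32_t dpcWatchdogPeriod;  // ticks allowed before the watchdog fires
    uint32_t dpcWatchdogCount;   // in this model: decremented per tick, 0 => 0x133
    uint32_t ipiFrozen;          // freeze-state marker checked by the IPI slave loop
};

// Hypothetical stand-ins for the bugcheck globals named in the README.
struct FakeBugcheckGlobals {
    uint32_t bugCheckActive;       // KiBugCheckActive
    uint32_t hardwareTrigger;      // KiHardwareTrigger
    uint64_t freezeExecutionLock;  // KiFreezeExecutionLock
};

constexpr uint32_t kIpiFrozenMarker = 5;  // marker value quoted in the README

inline void ReleaseBugcheckState(FakeBugcheckGlobals& g, std::vector<FakePrcb>& prcbs) {
    for (FakePrcb& p : prcbs) {
        p.ipiFrozen = kIpiFrozenMarker;            // slave loop sees "already frozen" and exits
        p.dpcWatchdogCount = p.dpcWatchdogPeriod;  // refill so the stale countdown cannot expire
    }
    g.bugCheckActive = 0;       // peers stop treating a bugcheck as in progress
    g.hardwareTrigger = 0;
    g.freezeExecutionLock = 0;  // lets frozen CPUs leave their wait loops
}
```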


Achievements

  1. End-to-end bugcheck suppression on HVCI+kCET (Win11 24H2 26200). Bad driver origin faults (write-to-RX, kernel exception, fastfail from third-party driver code) are caught and the system continues running. The bad driver fails to load with STATUS_ACCESS_VIOLATION, the same way it would have if SEH had naturally caught the fault.

  2. All hooks are data-only. No write to ntoskrnl .text, no patch of any .text byte, nothing for HVCI to reject.

  3. Recovery uses RtlUnwindEx, the kernel's own SEH unwinder, which is shadow-stack aware. kCET breaks the 2022 ROP-via-suspended-thread approach; this approach is not affected.


Limitations

Production-relevant limits

  • Build-specific KPRCB offsets. The suppressor uses hardcoded offsets for KPRCB context save area, debugger-saved IRQL, IpiFrozen, DPC watchdog, etc. These are stable within a Windows servicing branch but drift roughly 0x10 to 0x40 between LCUs. Verify against ntoskrnl.pdb when porting.
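One defensive pattern for this, sketched in user-mode C++: key the offsets on the exact build and update build revision (UBR), and refuse to install the hooks when the running kernel is not in the verified table. The PrcbOffsets fields and the table contents are assumptions for illustration. Real entries would come from ntoskrnl.pdb for each LCU.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <utility>

// Hypothetical subset of the build-specific offsets the driver depends on.
struct PrcbOffsets {
    uint32_t contextSaveArea;
    uint32_t ipiFrozen;
    uint32_t dpcWatchdog;
};

// (build number, update build revision). Offsets drift between LCUs,
// so the build number alone is not a safe key.
using BuildKey = std::pair<uint32_t, uint32_t>;
using OffsetTable = std::map<BuildKey, PrcbOffsets>;

// Exact-match lookup: an unverified build yields no offsets, and the caller
// should decline to hook rather than guess.
inline std::optional<PrcbOffsets> ResolveOffsets(const OffsetTable& verified,
                                                 uint32_t build, uint32_t ubr) {
    auto it = verified.find(BuildKey{build, ubr});
    if (it == verified.end()) return std::nullopt;
    return it->second;
}
```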

Recovery boundary

  • Kernel-origin bugchecks at arbitrary points are "best effort", not guaranteed. Many ntoskrnl functions acquire locks via explicit KeAcquireSpinLock / KeReleaseSpinLock pairs without a surrounding __try/__finally. When RtlUnwindEx walks past those frames, there is no __finally handler to release the lock. The lock leaks, and the next thread that tries to acquire it deadlocks. This affects contrived test cases such as injecting a null dereference into PspProcessDelete via WinDbg's r rdi=0. Real-world kernel-origin scenarios at well-defined points (PG bugchecks via KeBugCheckEx 0x109, kASAN-style invariant checks) tend to be more recoverable because their callers are designed to leave state recovery-friendly.
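The leak can be reproduced in miniature using C++ exceptions as an analogy. Stack unwinding runs destructors the way RtlUnwindEx runs __finally handlers, and it skips any release call that sits after the faulting point. ToySpinLock and the two work functions are hypothetical stand-ins for a KeAcquireSpinLock / KeReleaseSpinLock pair. C++ exceptions are not SEH, so this illustrates only the control-flow point.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>

// Minimal spin lock standing in for a kernel spin lock.
struct ToySpinLock {
    std::atomic<bool> held{false};
    bool tryAcquire() {
        bool expected = false;
        return held.compare_exchange_strong(expected, true);
    }
    void acquire() { while (!tryAcquire()) {} }
    void release() { held.store(false); }
};

// RAII guard: its destructor runs during unwinding, like a __finally block.
class ScopedSpinLock {
public:
    explicit ScopedSpinLock(ToySpinLock& lock) : lock_(lock) { lock_.acquire(); }
    ~ScopedSpinLock() { lock_.release(); }
    ScopedSpinLock(const ScopedSpinLock&) = delete;
    ScopedSpinLock& operator=(const ScopedSpinLock&) = delete;
private:
    ToySpinLock& lock_;
};

// Lock protected by a cleanup handler: released even though the body faults.
inline void GuardedWork(ToySpinLock& lock) {
    ScopedSpinLock guard(lock);
    throw std::runtime_error("fault while holding the lock");
}

// Explicit acquire with no cleanup handler: the release that would follow
// the faulting point is never executed, so the lock leaks.
inline void UnguardedWork(ToySpinLock& lock) {
    lock.acquire();
    throw std::runtime_error("fault while holding the lock");
}

// Runs work, swallows the "fault", and reports whether the lock is free again.
template <typename Work>
bool LockFreeAfterUnwind(Work work) {
    ToySpinLock lock;
    try {
        work(lock);
    } catch (const std::runtime_error&) {
    }
    return !lock.held.load();
}
```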
  • Bugchecks during very early or very late system state (boot before HAL is ready, late shutdown after the driver is unloaded) are not in scope.
  • Bugchecks in very deep stack contexts may exceed RtlVirtualUnwind's ability to walk. The asm fallback handles these, but it skips the intermediate __finally handlers.
  • The system will freeze if BugcheckSuppressor cannot recover the exception.

Operational

  • Unloading the driver is not fully clean. Unhooking restores the dispatch pointers atomically, but if a bugcheck is in flight at the moment of unload, behaviour is undefined. Don't unload while running stress tests.
  • One suppressor instance per machine. Multiple instances would race on the dispatch-table swap.
  • /CETCOMPAT build is required. Without it, the asm fallback's incsspq instructions trap.
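One way to turn the dispatch-table race into a clean refusal is a compare-and-swap install, sketched here in user-mode C++. It installs a hook only if the slot still holds the original pointer captured at init, and it uninstalls only if its own hook is still in place. This is a hypothetical alternative, not something the PoC does. It also does nothing about a bugcheck that is already executing the hook at unload time.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical slot signature.
using SlotFn = void (*)(unsigned long);

// Install only if the slot still holds the pointer captured at init;
// a second instance that finds someone else's hook gets false.
inline bool TryInstallExclusive(std::atomic<SlotFn>& slot, SlotFn original, SlotFn hook) {
    SlotFn expected = original;
    return slot.compare_exchange_strong(expected, hook, std::memory_order_acq_rel);
}

// Restore only if our hook is still the one installed.
inline bool TryUninstall(std::atomic<SlotFn>& slot, SlotFn hook, SlotFn original) {
    SlotFn expected = hook;
    return slot.compare_exchange_strong(expected, original, std::memory_order_acq_rel);
}

// Hypothetical stubs; distinct volatile statics keep their addresses distinct.
inline void OriginalRoutine(unsigned long v) { static volatile unsigned long a; a = v; }
inline void HookA(unsigned long v) { static volatile unsigned long b; b = v; }
inline void HookB(unsigned long v) { static volatile unsigned long c; c = v; }
```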

Acknowledgements & references

The problem space draws on prior public work:

  • Connor McGarr. The original kernel-stack ROP suppressor (No Code Execution? No Problem!, 2022) and the kCET teardown that explained why ROP-based recovery stopped working. Also SkBridge (2025) for fuzzing the NT/SK secure-call surface, which informs the MmDbgCopyMemory analysis here.
  • can1357. The original ByePG technique that showed bugcheck dispatch could be intercepted at the HAL layer. The IpiFrozen pre-stamp pattern is borrowed from that work.
  • Yarden Shafir. Secure Pool Internals (2020) for the ExSecurePoolUpdate analysis that informs how VTL1 makes write decisions about VTL0-visible memory.
  • Saar Amar & Daniel King. Breaking VSM by Attacking SecureKernel (Black Hat USA 2020) for the canonical SK attack-surface methodology.
  • zer0condition. BusterCall for the PFN-swap technique that this PoC does not use, but is the right reference point for "execute kernel code without writing .text".
