@jhult jhult commented Dec 15, 2025

Fix Memory Leaks in System Extension

Fixes #616, where one or more memory leaks in the system extension could cause memory usage to grow from ~100 MB to 10+ GB, leading to network failures.

Note: This was largely generated by Claude. I ran many passes to try to get the best possible outcome. However, since I can't sign the extension, I can't test it, so I can't vouch for whether this actually works (or is even the proper approach).


Problem

The system extension accumulates memory over time (versions 2.4.3 - 3.1.5), eventually consuming 10+ GB RAM and breaking network connectivity.

Root causes

  1. Retain cycles in XPC reply blocks
  2. Flows never released from pendingAlerts/relatedFlows dictionaries
  3. Flows not resumed before removal (bug in processRelatedFlow)
  4. Orphaned flows not cleaned up when processes exit
  5. Flows not cleaned up on extension shutdown
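
Root cause 1 is the familiar block capture problem: a reply block that references self directly keeps self alive for as long as the block exists, which is indefinitely if the reply never arrives. Below is a minimal sketch of the capture pattern that avoids this, assuming ARC. `AlertOwner`, `replyBlock`, and `handled` are hypothetical names for illustration, not identifiers from the LuLu source.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for the object that hands a reply block to XPC.
@interface AlertOwner : NSObject
@property (nonatomic, copy) void (^replyBlock)(NSDictionary *reply);
@property (nonatomic) NSUInteger handled;
- (void)installWeakReply;
@end

@implementation AlertOwner
- (void)installWeakReply {
    // A block that used `self` directly would retain it for as long as the
    // block lives; capturing a weak reference avoids that.
    __weak __typeof__(self) weakSelf = self;
    self.replyBlock = ^(NSDictionary *reply) {
        // Promote to a strong reference only for the duration of the call.
        __strong __typeof__(weakSelf) strongSelf = weakSelf;
        if (strongSelf == nil) {
            return; // owner already gone: nothing to do
        }
        strongSelf.handled += 1;
    };
}
@end
```

With this shape the owner can be deallocated even while the block is still stored somewhere, and a late reply becomes a no-op.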

Solution

  1. Break Retain Cycles
    A. Use weak-strong dance pattern in alert reply blocks
    B. Prevents self from being retained by async XPC callbacks

  2. Track All Flows
    A. Add pendingAlerts dictionary to track flows awaiting user response
    B. Ensures flows aren't lost when user doesn't respond

  3. Fix processRelatedFlow Bug
    A. Resume flows BEFORE removing from dictionary
    B. Handle pause verdict correctly (keep in dictionary for next round)

  4. Cleanup Orphaned Flows
    A. Add isFlowOrphaned helper and cleanupOrphanedFlows method
    B. Resume orphaned flows with dropVerdict (critical for system to release)
    C. Triggers: every 5 minutes + on XPC failure

  5. Dealloc Cleanup
    A. Resume all flows during dealloc using resumeFlows helper
    B. Prevents flows from remaining paused on extension shutdown

  6. Race Condition Prevention
    A. Check if flow still in pendingAlerts before processing user response
    B. Prevents double-resume if cleanup already handled flow
    C. Ensures user's choice isn't applied to stale flows
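
Steps 3 and 6 can be sketched together as one lookup-and-resume routine. This is an illustration under stated assumptions, not the code in this PR: `FakeFlow`, `FlowTracker`, `Verdict`, and `applyVerdict:forKey:` are hypothetical, and in the real extension the resume call would presumably go through the provider's flow-resume API rather than a method on the flow.

```objc
#import <Foundation/Foundation.h>

typedef NS_ENUM(NSInteger, Verdict) { VerdictAllow, VerdictDrop, VerdictPause };

// Hypothetical stand-in for a paused network flow.
@interface FakeFlow : NSObject
@property (nonatomic) NSUInteger resumeCount;
@property (nonatomic) Verdict lastVerdict;
- (void)resumeWithVerdict:(Verdict)verdict;
@end

@implementation FakeFlow
- (void)resumeWithVerdict:(Verdict)verdict {
    self.resumeCount += 1;
    self.lastVerdict = verdict;
}
@end

// Hypothetical tracker for flows awaiting a user response.
@interface FlowTracker : NSObject
@property (nonatomic, strong) NSMutableDictionary<NSString *, FakeFlow *> *pendingAlerts;
- (void)track:(FakeFlow *)flow forKey:(NSString *)key;
- (BOOL)applyVerdict:(Verdict)verdict forKey:(NSString *)key;
@end

@implementation FlowTracker
- (instancetype)init {
    if ((self = [super init])) {
        _pendingAlerts = [NSMutableDictionary dictionary];
    }
    return self;
}

- (void)track:(FakeFlow *)flow forKey:(NSString *)key {
    @synchronized (self.pendingAlerts) {
        self.pendingAlerts[key] = flow;
    }
}

// Returns YES only if the flow was resumed and removed by this call.
- (BOOL)applyVerdict:(Verdict)verdict forKey:(NSString *)key {
    @synchronized (self.pendingAlerts) {
        FakeFlow *flow = self.pendingAlerts[key];
        if (flow == nil) {
            return NO; // cleanup already handled it: skip the stale response
        }
        if (verdict == VerdictPause) {
            return NO; // still undecided: keep it tracked for the next round
        }
        [flow resumeWithVerdict:verdict];            // resume first...
        [self.pendingAlerts removeObjectForKey:key]; // ...then stop tracking
        return YES;
    }
}
@end
```

Because the lookup, resume, and removal happen under one lock, a second response for the same key finds nothing and cannot resume the flow twice.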

All flow resume operations use a shared resumeFlows helper to eliminate code duplication and ensure consistent error handling.
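
As a rough idea of what such a shared helper could look like, here is a sketch that resumes every tracked flow, logs failures without aborting the loop, and then empties the dictionary. The signature, the `FakeFlow` class, and the exception-based failure mode are all assumptions made for illustration; the actual helper in this PR may differ.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for a paused flow; `broken` simulates a failing resume.
@interface FakeFlow : NSObject
@property (nonatomic) BOOL broken;
@property (nonatomic) BOOL resumed;
- (void)resumeWithDrop;
@end

@implementation FakeFlow
- (void)resumeWithDrop {
    if (self.broken) {
        [NSException raise:NSInternalInconsistencyException
                    format:@"resume failed"];
    }
    self.resumed = YES;
}
@end

// Resumes every flow in `flows`, then clears the dictionary.
// Returns the number of flows that resumed without error.
static NSUInteger ResumeFlows(NSMutableDictionary<NSString *, FakeFlow *> *flows) {
    NSUInteger resumedCount = 0;
    @synchronized (flows) {
        // Iterate over a snapshot of the keys so the dictionary is never
        // mutated while it is being enumerated.
        for (NSString *key in [flows allKeys]) {
            @try {
                [flows[key] resumeWithDrop];
                resumedCount += 1;
            } @catch (NSException *exception) {
                // One failing flow must not keep the others paused.
                NSLog(@"failed to resume flow %@: %@", key, exception.reason);
            }
        }
        [flows removeAllObjects];
    }
    return resumedCount;
}
```

A helper of this shape can be called from the periodic cleanup, the XPC failure path, and dealloc alike, so all three behave the same way.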


Testing

Unit tests

  • Passive mode tests: 9/9
  • Memory leak tests: 10/10

Run tests: cd LuLu/Tests && bash run_tests.sh

Integration testing

Requires system extension runtime (which I can't sign and thus cannot run/test).

# Cleanup activity
log stream --level debug --predicate 'subsystem == "com.objective-see.lulu" && composedMessage CONTAINS "cleanup"'

# Memory usage (RSS in MB); the [l] keeps grep from matching its own command line
watch -n 60 'ps aux | grep "[l]ulu.extension" | awk "{print \$6/1024 \" MB\"}"'

Expected behavior
  • Cleanup runs every 5 minutes
  • Memory stays < 200 MB (no continuous growth)
  • Network remains functional

Edge Cases Handled

  1. Process exits before alert response (timer cleanup)
  2. XPC connection fails (immediate cleanup)
  3. User never responds (timer cleanup)
  4. High volume short-lived processes (periodic cleanup)
  5. Extension shutdown with pending flows (dealloc cleanup)
  6. Process exits during user response window (race prevention)
  7. Pause verdict during processing (flow kept in dictionary)
  8. Concurrent dictionary access (all @synchronized)
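
The orphan-related cases above (1, 3, and 4) depend on being able to tell that a flow's process is gone. One way this could work is sketched below, assuming the extension records a pid per flow and treats a flow as orphaned once that pid no longer exists. `IsProcessGone`, `CleanupOrphans`, and `StartCleanupTimer` are hypothetical names; the PR's isFlowOrphaned may use a different check.

```objc
#import <Foundation/Foundation.h>
#import <signal.h>
#import <errno.h>
#import <unistd.h>

// Assumption: "orphaned" means the owning process no longer exists.
static BOOL IsProcessGone(pid_t pid) {
    if (pid <= 0) {
        return YES; // not a valid single-process id
    }
    // Signal 0 sends nothing; it only performs the existence check.
    if (kill(pid, 0) == 0) {
        return NO;
    }
    // ESRCH: no such process. EPERM would mean it exists but is not ours.
    return errno == ESRCH;
}

// Removes entries whose process is gone and returns the removed keys.
static NSArray<NSString *> *CleanupOrphans(
        NSMutableDictionary<NSString *, NSNumber *> *pidsByFlowKey) {
    NSMutableArray<NSString *> *removed = [NSMutableArray array];
    @synchronized (pidsByFlowKey) {
        for (NSString *key in [pidsByFlowKey allKeys]) {
            if (IsProcessGone(pidsByFlowKey[key].intValue)) {
                // In the extension, the flow would be resumed with a drop
                // verdict here, before it is removed from tracking.
                [pidsByFlowKey removeObjectForKey:key];
                [removed addObject:key];
            }
        }
    }
    return removed;
}

// Runs `work` every `interval` seconds on a background queue.
static dispatch_source_t StartCleanupTimer(NSTimeInterval interval,
                                           dispatch_block_t work) {
    dispatch_source_t timer = dispatch_source_create(
        DISPATCH_SOURCE_TYPE_TIMER, 0, 0,
        dispatch_get_global_queue(QOS_CLASS_UTILITY, 0));
    uint64_t nanos = (uint64_t)(interval * NSEC_PER_SEC);
    dispatch_source_set_timer(timer,
                              dispatch_time(DISPATCH_TIME_NOW, (int64_t)nanos),
                              nanos, NSEC_PER_SEC);
    dispatch_source_set_event_handler(timer, work);
    dispatch_resume(timer);
    return timer; // the caller must keep this reference alive
}
```

For the five-minute schedule described above, the timer would be started with an interval of 300 seconds and a block that calls the cleanup routine.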

jhult commented Dec 18, 2025

@objective-see, when you get a chance, I'd appreciate your review.
