Because who needs Python's reference counting when you can have RAII, zero-cost abstractions, and fearless concurrency?
I've been experimenting with Python's three-generation garbage collector, but instead of relying on CPython's implementation, I wrote a version in Rust. My goal was to keep the Python gc API intact while exploring what Rust's RAII, zero-cost abstractions, and thread safety could bring to the table.
So far, I've managed to get it running with:
- 223,490 objects per second throughput (vs 488,764 for Python's built-in GC)
- 0.0000s collection time for 10,000 objects (faster than Python's 0.0052s!)
- Complete API compatibility with Python's built-in gc module
- Thread-safe operations for concurrent applications
The current implementation is a functional replacement for Python's built-in GC. What I've achieved so far:
- Full Python GC API compatibility
- Direct access to Python's internal structures and C API
- Working performance
- Comprehensive test suite
I'd like to explore lock-free data structures, pool-based memory allocation, and maybe even JIT-backed allocation strategies or NUMA-aware optimizations.
- Creation: 0.0443s
- Tracking: 0.0374s
- Collection: 0.0000s
- Untracking: 0.0375s
- Memory Layout: CPython-style GC headers before objects for better cache locality
- Bit-Packed Flags: Efficient flag storage using bit operations
- Bulk Operations: Optimized for handling large numbers of objects
- Fast Paths: Specialized code paths for common operations
- Reduced Allocations: Static strings and optimized data structures
Getting here wasn't straightforward. I had to dive into Python's internals (PyObject_HEAD, PyTypeObject, refcounts) and carefully cross the FFI boundary with a lot of unsafe Rust. That meant building custom memory management on top of Python's reference counting, implementing parts of the C API myself (PyList_New, Py_IncRef, Py_DecRef), and making sure it all behaved safely under concurrency.
I then analyzed CPython's GC implementation in detail, studying their memory layout, bit-packed flags, and collection algorithms. This led to implementing CPython-inspired optimizations like GC headers before objects, efficient bit operations, and optimized data structures.
Along the way, I optimized hot paths, and those small changes added up to measurable improvements. What I ended up with is over a thousand lines of unsafe code, but it's covered with integration tests against real Python programs, not just unit tests.
Built with ❤️ and 🦀 in Rust