This document details the second phase of memory optimizations implemented in bsdiff, building upon the Streaming Compression introduced in Phase 1.
Even with streaming compression, the bsdiff and bspatch algorithms still required loading the entire old and new files into the process's heap (using malloc + read).
- Memory Model (Phase 1):
old + new + SA + CompressedBuffers≈ 6N. - Issue: For very large files, this
2Noverhead (the raw data itself) creates significant heap pressure and can lead to OOM (Out Of Memory) or excessive system swapping.
Phase 2 replaces explicit heap allocation for file data with Memory-Mapped IO. This leverages the OS's virtual memory subsystem to handle file access.
- mmap Stream: A new stream type (
bsdiff_open_mmap_stream) was implemented usingmmapon Unix andCreateFileMappingon Windows. - Zero-Copy Interface: The
bsdiff_streaminterface was extended with aget_buffermethod. If a stream supports it, the algorithm accesses the file data directly from the mapped address. - Elimination of malloc: The core
bsdiffandbspatchfunctions now skip themalloc(oldsize)andmalloc(newsize)steps ifmmapis available.
Important
Virtual Memory vs. Physical Memory
It is critical to note that mmap does not reduce the Virtual Address Space usage. The process still maps the full old and new file sizes into its address space. The optimization lies in the Physical Memory (RSS) management: Unlike malloc, mmap buffers do not require physical RAM until they are actually accessed (Demand Paging), and the OS maintains the right to reclaim these pages under memory pressure since they are backed by the filesystem.
| Metric | Phase 1 (Streaming) | Phase 2 (mmap) | SAVED |
|---|---|---|---|
| Peak Heap Memory | 426.6 MB | 285.7 MB | 140.9 MB |
| Memory Model | 6N | 4N | 2N |
| Execution Time | 6.918s | 6.252s | ~9.6% Speedup |
-
Heap Reduction: The saving of 140.9 MB corresponds exactly to
oldsize + newsize. This memory is now managed by the OS Page Cache rather than the process heap. - Performance Gain: Total wall-clock time improved by nearly 10%. This is due to the elimination of full-file reads at the start and the efficiency of the OS demand-paging systems.
-
Theoretical Limit: For the standard
bsdiffalgorithm, 4N is the theoretical floor for heap memory as long as the Suffix Array (SA) is stored in memory ($4 \times \text{oldsize}$ ).
include/bsdiff.h: Addedbsdiff_open_mmap_stream.source/stream_mmap.c: Cross-platform implementation of mapped streams.source/bsdiff.c/source/bspatch.c: Refactored to useget_bufferfor zero-copy access.
[ Process Heap (RSS - Hard Limit) ]
- Suffix Array (SA) : 4N (Locked in RAM)
- Compressed Streams : ~0.1N - 0.2N
- Misc. Buffers : Constant
[ OS Page Cache / Virtual Memory (RSS - Managed/Soft Limit) ]
- old file (mapped) : 1N (Mapped VM, RSS scales with algorithm access)
- new file (mapped) : 1N (Mapped VM, RSS stays low due to sequential access)
During execution, the OS maintains mmap buffers as "Clean Pages".
- Demand Paging: Pages are loaded into RAM only when the algorithm reads a specific offset.
- Sequential Access: For the
newfile, OS read-ahead and eviction are highly efficient. - Random Access: For the
oldfile, SA lookups cause random jumps. The OS tries to keep these pages in memory, but if RAM is scarce, it can swap them out without writing to Disk Swap (since they can be re-read from the source file). This prevents the "OOM Death" typical of largemallocbuffers.
Phase 2 successfully reduced the heap memory requirement from 6N to 4N. Combined with the Phase 1 streaming optimization, we have reduced the memory footprint by approximately 50% from the original baseline, while simultaneously improving execution speed.