Alloc profile performance improvements #6

This _drastically_ speeds up the tests, for reasons I don't exactly understand. I wonder if it was messing up some heuristics and deciding to interpret the code instead of compiling it, and had some weird corner cases in the interpreted code or something? I dunno! But anyway, this drastically speeds it up, so 🤷 sounds like not our problem 😊

Instead of allocating a maximum-sized buffer for each backtrace, we keep a single max-sized buffer as a scratch space, write the backtrace to it, and then once we know the size, we allocate a right-sized buffer for the backtrace and copy it over.

Benchmark results (measured time for profiling allocations on internal Arroyo benchmark, with `skip_every=0`):

This only slightly improves the time to record an allocations profile:
- Before: 275.082525 seconds
- After: 245.891006 seconds

But it drastically improves the memory usage once the profiling is completed, according to System Activity Monitor:
- Before: 17.35 GB
- After: 6.92 GB
- (Compared to 350 MB for the same task without profiling)

This is probably about the best we could do in terms of space usage. We could probably slightly improve the time overhead still further by using a single big vector instead of a bunch of individually allocated buffers. That would allow us to eliminate the redundant copying, and would also amortize away the allocations of the buffers, both of which should reduce the performance impact. But I'm guessing the time is mostly dominated by just how long the stack traces are, and there's no getting around that. At best, we could expect maybe like a 2x-3x improvement from those changes, I think.
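For illustration, here is a minimal C++ sketch of the scratch-buffer approach described above. It is not the code from this PR: `kMaxFrames`, `RawBacktrace`, `capture_frames`, and `record_backtrace` are hypothetical names, and `capture_frames` is a stand-in that just fills in fake addresses where the real stack unwinder would go.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>

// Hypothetical cap on frames per backtrace (an assumption, not from the PR).
constexpr std::size_t kMaxFrames = 80000;

// A backtrace stored in a buffer sized to exactly the frames captured.
struct RawBacktrace {
    std::unique_ptr<std::uintptr_t[]> frames;
    std::size_t size = 0;
};

// Hypothetical stand-in for the real unwinder: writes up to `cap` fake
// frame addresses into `out` and returns how many it wrote.
std::size_t capture_frames(std::uintptr_t *out, std::size_t cap, std::size_t depth) {
    std::size_t n = depth < cap ? depth : cap;
    for (std::size_t i = 0; i < n; ++i)
        out[i] = 0x1000 + i;
    return n;
}

RawBacktrace record_backtrace(std::size_t depth) {
    // One max-sized scratch buffer per thread, allocated once and reused
    // for every backtrace instead of allocating kMaxFrames each time.
    static thread_local std::unique_ptr<std::uintptr_t[]> scratch(
        new std::uintptr_t[kMaxFrames]);

    // Write the backtrace into the scratch space to learn its real size.
    std::size_t n = capture_frames(scratch.get(), kMaxFrames, depth);

    // Allocate a right-sized buffer and copy the frames over.
    RawBacktrace bt;
    bt.frames.reset(new std::uintptr_t[n]);
    std::memcpy(bt.frames.get(), scratch.get(), n * sizeof(std::uintptr_t));
    bt.size = n;
    return bt;
}
```

In this sketch each stored backtrace costs only `n` words rather than `kMaxFrames` words, at the price of one extra copy and one small allocation per backtrace. Those are the costs a single big vector would presumably avoid.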

Commits on Dec 21, 2021