
Conversation

@gefjon (Contributor) commented Apr 10, 2025

Description of Changes

Heap profiling performed by @jsdt revealed that we had a memory leak in `InstanceEnv::datastore_index_scan_range_bsatn_chunks`. A bit of spelunking pointed to `ChunkPool`, a pool of buffers, which prior to this commit was allowed to grow without limit. `ChunkPool` is per-database, and so it is easy to imagine a scenario in which a large number of databases each occasionally allocate, use, and never free very large chunks, resulting in the overall system's memory usage increasing over time.

This commit adds some very simple limits on the heap usage of a `ChunkPool`. Specifically, we introduce two constants, `MAX_CHUNKS_IN_POOL` and `MAX_CHUNK_SIZE_IN_BYTES`, and when returning a chunk to the pool, we free it instead if keeping it would violate either limit. The values of these constants, 32 and `4 * ROW_ITER_CHUNK_SIZE` respectively, were chosen entirely arbitrarily. Using a global pool, rather than a per-database one, would also be helpful here.
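The return-path policy described above can be sketched roughly as follows. This is a minimal stand-in, not the actual SpacetimeDB code: the real `ChunkPool` layout and the concrete value of `ROW_ITER_CHUNK_SIZE` are assumptions here.

```rust
// Hedged sketch of the limit policy; field layout and constant values
// are illustrative, not copied from the real implementation.
const ROW_ITER_CHUNK_SIZE: usize = 1024; // placeholder value
const MAX_CHUNKS_IN_POOL: usize = 32;
const MAX_CHUNK_SIZE_IN_BYTES: usize = 4 * ROW_ITER_CHUNK_SIZE;

/// Minimal stand-in for the per-database buffer pool.
pub struct ChunkPool {
    free_chunks: Vec<Vec<u8>>,
}

impl ChunkPool {
    pub fn new() -> Self {
        ChunkPool { free_chunks: Vec::new() }
    }

    /// Return a chunk to the pool, freeing it instead if keeping it
    /// would violate either limit.
    pub fn put(&mut self, mut chunk: Vec<u8>) {
        if self.free_chunks.len() >= MAX_CHUNKS_IN_POOL
            || chunk.capacity() > MAX_CHUNK_SIZE_IN_BYTES
        {
            // Dropping `chunk` here releases its allocation.
            return;
        }
        chunk.clear();
        self.free_chunks.push(chunk);
    }

    /// Take a pooled chunk if one is available, else allocate fresh.
    pub fn take(&mut self) -> Vec<u8> {
        self.free_chunks
            .pop()
            .unwrap_or_else(|| Vec::with_capacity(ROW_ITER_CHUNK_SIZE))
    }
}
```

Note that the check happens on `put`, not `take`: an oversized buffer is still usable for the request that allocated it, but it is never retained afterward, so a single large scan cannot pin memory for the lifetime of the database.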

This also changes `ChunkedWriter` construction to use the `ChunkPool`. Previously, we allocated a fresh chunk every time a `ChunkedWriter` was created, and that chunk would eventually end up in the pool. That is probably why the pool was growing so quickly.
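The `ChunkedWriter` change might look roughly like the sketch below. The type names and fields are hypothetical stand-ins for the real ones; the point is only that the writer's initial buffer now comes from the pool instead of a fresh allocation, so pooled chunks actually get reused rather than accumulating.

```rust
// Hypothetical sketch; `Pool` is a minimal stand-in for the real ChunkPool.
struct Pool(Vec<Vec<u8>>);

impl Pool {
    /// Reuse a pooled buffer when one is available, else allocate fresh.
    fn take(&mut self) -> Vec<u8> {
        self.0.pop().unwrap_or_else(|| Vec::with_capacity(4096))
    }
}

/// Stand-in for the writer that accumulates rows into fixed-size chunks.
struct ChunkedWriter {
    chunks: Vec<Vec<u8>>,
    curr: Vec<u8>,
}

impl ChunkedWriter {
    // Before the change: `curr` was always a fresh `Vec::with_capacity(..)`.
    // After: the initial chunk is drawn from the pool.
    fn new(pool: &mut Pool) -> Self {
        ChunkedWriter { chunks: Vec::new(), curr: pool.take() }
    }
}
```

Without this, every writer created a new buffer while the pool only ever received returns, so the pool could only grow.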

In the future we should at least add some metrics around the size of this pool.

API and ABI breaking changes

None.

Expected complexity level and risk

2-ish? It seems possible that my chosen values for the two constants are somehow pathologically bad and that this PR regresses performance.

Testing

  • Run a BitCraft bot test and observe that performance is not significantly reduced.
  • Run heap profiling and observe that we no longer leak memory.

@gefjon gefjon requested review from jdetter and jsdt April 10, 2025 15:28
@gefjon gefjon self-assigned this Apr 10, 2025
@gefjon gefjon added the release-any To be landed in any release window label Apr 10, 2025
With the goal of determining whether these values for the limits make sense.
Our memory leak was caused by this impl.
It's no longer used, and should not be used again in the future.
@gefjon gefjon added this pull request to the merge queue Apr 11, 2025
Merged via the queue into master with commit 789b74a Apr 11, 2025
16 checks passed
@Centril Centril deleted the phoebe/chunk-pool-limit-size branch April 11, 2025 17:31
