Proposal: a better way to track memory for chunks. #14358
Description
Feature Request
Is your feature request related to a problem? Please describe:
It's hard to estimate the memory usage of a chunk.Chunk during SQL execution.
We often write code like the following, which forces us to think about memory tracking every time we call a method of Chunk:
mSize := output.chk.MemoryUsage()
chk.SwapColumns(output.chk)
e.memTracker.Consume(output.chk.MemoryUsage() - mSize)
Describe the feature you'd like:
I propose a friendlier, more object-oriented way to handle the memory tracker for chunk.Chunk. When a chunk is created, we pass the executor's tracker to it:
chk := chunk.NewChunk(memTrackerFromExecutor)
Then, whenever a method of Chunk is called, it is Chunk's responsibility to track the memory difference. For example,
// AppendRow appends a row to the chunk.
func (c *Chunk) AppendRow(row Row) {
	c.AppendPartialRow(0, row)
	c.numVirtualRows++
	c.memTracker.Consume(row.Size()) // we adjust the memory usage inside this method.
}
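To make the idea concrete, here is a minimal, self-contained sketch of a chunk that reports its own growth. The `Tracker`, `Row`, and `Chunk` types below are simplified, hypothetical stand-ins written for illustration; they are not the real TiDB types, and the fixed 8-bytes-per-value `Size` is an assumption.

```go
package main

import "fmt"

// Tracker is a hypothetical stand-in for the executor's memory tracker.
type Tracker struct{ consumed int64 }

// Consume adds delta bytes to the tracked total (a negative delta releases).
func (t *Tracker) Consume(delta int64) { t.consumed += delta }

// BytesConsumed reports the currently tracked total.
func (t *Tracker) BytesConsumed() int64 { return t.consumed }

// Row is a toy row of fixed-width int64 values (an assumption for this sketch).
type Row []int64

// Size returns the bytes occupied by the row's values, 8 per int64.
func (r Row) Size() int64 { return int64(len(r)) * 8 }

// Chunk holds a reference to the tracker and reports its own growth.
type Chunk struct {
	rows       []Row
	memTracker *Tracker
}

// NewChunk creates a chunk bound to the given tracker.
func NewChunk(t *Tracker) *Chunk { return &Chunk{memTracker: t} }

// AppendRow appends a row and adjusts the tracked memory in the same call,
// so callers no longer diff MemoryUsage() before and after.
func (c *Chunk) AppendRow(row Row) {
	c.rows = append(c.rows, row)
	c.memTracker.Consume(row.Size())
}

func main() {
	tracker := &Tracker{}
	chk := NewChunk(tracker)
	chk.AppendRow(Row{1, 2, 3})
	chk.AppendRow(Row{4, 5})
	fmt.Println(tracker.BytesConsumed()) // 3*8 + 2*8 = 40
}
```

The caller only passes the tracker once, at construction time; every mutating method is then responsible for its own accounting.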
But how do we release the tracked memory when the Chunk is no longer used?
- We can add a Close method and call it manually.
- We can consider using runtime.SetFinalizer to release it when the memory is actually garbage collected (like destructors in C++). But we need to ensure that it does not regress performance.
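The two release options can be combined, as in the rough sketch below: `Close` releases eagerly, and a finalizer acts as a safety net if `Close` is never called. The `Tracker` and `Chunk` types and the `Grow` method are hypothetical simplifications, not the real TiDB code. Finalizers run on a separate goroutine at an unspecified time after the object becomes unreachable, so the sketch uses atomics and does not rely on when (or whether) the finalizer fires.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// Tracker is a hypothetical stand-in for the executor's memory tracker.
type Tracker struct{ consumed int64 }

// Consume is atomic because a finalizer may call it from another goroutine.
func (t *Tracker) Consume(delta int64) { atomic.AddInt64(&t.consumed, delta) }

// BytesConsumed reports the currently tracked total.
func (t *Tracker) BytesConsumed() int64 { return atomic.LoadInt64(&t.consumed) }

// Chunk remembers how many bytes it has reported so it can give them back.
type Chunk struct {
	tracked    int64
	memTracker *Tracker
}

// NewChunk binds the chunk to a tracker and registers a finalizer as a fallback.
func NewChunk(t *Tracker) *Chunk {
	c := &Chunk{memTracker: t}
	runtime.SetFinalizer(c, func(c *Chunk) { c.release() })
	return c
}

// Grow is a placeholder for any method that makes the chunk larger.
func (c *Chunk) Grow(n int64) {
	atomic.AddInt64(&c.tracked, n)
	c.memTracker.Consume(n)
}

// release returns all tracked bytes; the atomic swap makes it idempotent.
func (c *Chunk) release() {
	if n := atomic.SwapInt64(&c.tracked, 0); n != 0 {
		c.memTracker.Consume(-n)
	}
}

// Close releases eagerly and clears the finalizer, which is no longer needed.
func (c *Chunk) Close() {
	c.release()
	runtime.SetFinalizer(c, nil)
}

func main() {
	tracker := &Tracker{}
	chk := NewChunk(tracker)
	chk.Grow(64)
	fmt.Println(tracker.BytesConsumed()) // 64
	chk.Close()
	chk.Close() // a second Close is harmless
	fmt.Println(tracker.BytesConsumed()) // 0
}
```

Objects with finalizers take at least one extra garbage-collection cycle to be freed, which is one reason the performance concern above would need to be measured before adopting this approach.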
Describe alternatives you've considered:
We have many components that need to track memory during SQL execution, and this approach can be generalized to them:
Component | Purpose |
---|---|
chunk.Chunk | Storing a chunk in memory, a basic unit in the executor framework. |
chunk.Column | Storing a column in memory, a basic unit in chunk.Chunk. |
chunk.List | Storing multiple chunks in memory. |
chunk.ListInDisk | Storing multiple chunks in a temporary directory. |
chunk.RowContainer | Storing multiple chunks and handling their spilling. |
chunk.hashRowContainer | Storing a chunk.RowContainer and a hash map. |
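As one possible illustration of how the same pattern might extend to a container, the sketch below shows a list that hands its tracker to every chunk it creates and releases them all in `Close`. All types and methods here (`Tracker`, `Chunk`, `List`, `Grow`, `AddChunk`) are hypothetical simplifications for this example and do not mirror the real chunk.List API.

```go
package main

import "fmt"

// Tracker is a hypothetical stand-in for the executor's memory tracker.
type Tracker struct{ consumed int64 }

// Consume adds delta bytes to the tracked total (a negative delta releases).
func (t *Tracker) Consume(delta int64) { t.consumed += delta }

// Chunk tracks its own usage against the tracker it was created with.
type Chunk struct {
	tracked    int64
	memTracker *Tracker
}

// Grow is a placeholder for any method that makes the chunk larger.
func (c *Chunk) Grow(n int64) {
	c.tracked += n
	c.memTracker.Consume(n)
}

// Close gives back everything this chunk reported.
func (c *Chunk) Close() {
	c.memTracker.Consume(-c.tracked)
	c.tracked = 0
}

// List owns several chunks and shares one tracker with all of them.
type List struct {
	chunks     []*Chunk
	memTracker *Tracker
}

// NewList creates a list bound to the given tracker.
func NewList(t *Tracker) *List { return &List{memTracker: t} }

// AddChunk creates a chunk that reports to the list's tracker.
func (l *List) AddChunk() *Chunk {
	c := &Chunk{memTracker: l.memTracker}
	l.chunks = append(l.chunks, c)
	return c
}

// Close releases every chunk the list owns.
func (l *List) Close() {
	for _, c := range l.chunks {
		c.Close()
	}
	l.chunks = nil
}

func main() {
	tracker := &Tracker{}
	list := NewList(tracker)
	list.AddChunk().Grow(100)
	list.AddChunk().Grow(28)
	fmt.Println(tracker.consumed) // 128
	list.Close()
	fmt.Println(tracker.consumed) // 0
}
```

The container itself does no byte counting; it only propagates the tracker and the `Close` call, which is what would let the same approach apply to the other components in the table.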
Teachability, Documentation, Adoption, Migration Strategy: