Chunk: memory safety #492
I think in this case the use of copy can be delegated to the user of the library, since the most straightforward way to make the implementation safe is to copy the collection values, like this:

```go
func Chunk[T any, Slice ~[]T](collection Slice, size int) []Slice {
	if size <= 0 {
		panic("Second parameter must be greater than 0")
	}

	chunksNum := len(collection) / size
	if len(collection)%size != 0 {
		chunksNum += 1
	}

	collectionCopy := make(Slice, len(collection))
	copy(collectionCopy, collection)

	result := make([]Slice, 0, chunksNum)
	for i := 0; i < chunksNum; i++ {
		last := (i + 1) * size
		if last > len(collectionCopy) {
			last = len(collectionCopy)
		}
		result = append(result, collectionCopy[i*size:last:last])
	}

	return result
}
```
The one thing I think should be improved is the documentation for the function, to reflect this behaviour. Also, the only scenario in which the shared memory becomes a problem is if the user makes a change to either the original slice or the chunks, like the example below.
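For illustration, a minimal sketch of that failure mode with the current copy-free `lo.Chunk` (the values are arbitrary):

```go
package main

import (
	"fmt"

	"github.com/samber/lo"
)

func main() {
	original := []int{1, 2, 3, 4}
	chunks := lo.Chunk(original, 2)

	// The chunks alias the original backing array, so a write
	// through a chunk is visible in the original slice.
	chunks[0][0] = 99
	fmt.Println(original[0]) // 99, not 1
}
```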
This is a case of efficiency versus safety, and we need to choose the better option. I would prefer to prioritize safety, to avoid potential issues with modifying the collection slice.
My point is precisely that there's no 'vs': both options are possible, and the documentation just needs to reflect the potential issues, because there may be many cases in which there's no need for the safety. You could have a separate ChunkSafe function.
I totally agree: create a ChunkSafe function, and the user chooses which one to use based on what they want it for.
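A minimal sketch of what that proposed helper could look like, reusing the copying logic from the implementation above (ChunkSafe is a name proposed in this thread, not an existing lo function):

```go
// ChunkSafe behaves like Chunk, but copies the collection first, so the
// returned chunks do not share memory with the original slice: mutating
// a chunk (or the original) cannot affect the other.
func ChunkSafe[T any, Slice ~[]T](collection Slice, size int) []Slice {
	collectionCopy := make(Slice, len(collection))
	copy(collectionCopy, collection)
	return Chunk(collectionCopy, size)
}
```

That way the documentation of each variant can spell out the memory-sharing trade-off explicitly.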
@samber, what do you think about this? I can create a PR for these suggestions.
Could you please share the code for the original benchmark? In general, copy shouldn't take that long; it has a ton of optimizations.

```
cpu: Apple M1 Pro
Benchmark_Chunk_int
Benchmark_Chunk_int/Chunk_100
Benchmark_Chunk_int/Chunk_100-8                       17508577       58.67 ns/op
Benchmark_Chunk_int/ChunkSafeTheRightWay_100
Benchmark_Chunk_int/ChunkSafeTheRightWay_100-8        13027635       96.56 ns/op
Benchmark_Chunk_int/ChunkSafeTheSlowWay_100
Benchmark_Chunk_int/ChunkSafeTheSlowWay_100-8          5968666      192.8 ns/op
Benchmark_Chunk_int/Chunk_1000
Benchmark_Chunk_int/Chunk_1000-8                       3095260      376.4 ns/op
Benchmark_Chunk_int/ChunkSafeTheRightWay_1000
Benchmark_Chunk_int/ChunkSafeTheRightWay_1000-8        2336620      464.6 ns/op
Benchmark_Chunk_int/ChunkSafeTheSlowWay_1000
Benchmark_Chunk_int/ChunkSafeTheSlowWay_1000-8          709839     1658 ns/op
Benchmark_Chunk_int/Chunk_10000
Benchmark_Chunk_int/Chunk_10000-8                       395826     2939 ns/op
Benchmark_Chunk_int/ChunkSafeTheRightWay_10000
Benchmark_Chunk_int/ChunkSafeTheRightWay_10000-8        309154     3974 ns/op
Benchmark_Chunk_int/ChunkSafeTheSlowWay_10000
Benchmark_Chunk_int/ChunkSafeTheSlowWay_10000-8          76291    15568 ns/op
PASS
```

This can vary dramatically between architectures and slice sizes, but in general a copy is something that happens anyway, e.g. when appending to a slice such that it requires more space. The shared memory can cause an even more awkward leak when appending to a chunked slice before setting an element by index:

```go
s := make([]int, 1000)
chunks := lo.Chunk(s, 10)
firstChunk := chunks[0]
firstChunk = append(firstChunk, 666) // might or might not realloc the chunk, decided internally.
firstChunk[0] = 666                  // might or might not also modify "s"
```

I think the default behavior should be safe, and if someone wants cutting-edge optimizations then they should implement them themselves and take full responsibility. So, given all of that, I vote for making the safe behavior the default.
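For reference, a benchmark of this shape could be written roughly as follows. This is a sketch: the chunkSafeTheRightWay/chunkSafeTheSlowWay helpers are guesses at what the two variants above measure (one up-front copy of the input vs. a separate copy per chunk).

```go
package lo_test

import (
	"fmt"
	"testing"

	"github.com/samber/lo"
)

// One up-front copy of the whole input, then zero-copy chunking.
func chunkSafeTheRightWay[T any](collection []T, size int) [][]T {
	cp := make([]T, len(collection))
	copy(cp, collection)
	return lo.Chunk(cp, size)
}

// A separate allocation and copy for every chunk.
func chunkSafeTheSlowWay[T any](collection []T, size int) [][]T {
	chunks := lo.Chunk(collection, size)
	for i, c := range chunks {
		cp := make([]T, len(c))
		copy(cp, c)
		chunks[i] = cp
	}
	return chunks
}

func Benchmark_Chunk_int(b *testing.B) {
	for _, n := range []int{100, 1000, 10000} {
		s := make([]int, n)
		b.Run(fmt.Sprintf("Chunk_%d", n), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				lo.Chunk(s, 10)
			}
		})
		b.Run(fmt.Sprintf("ChunkSafeTheRightWay_%d", n), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				chunkSafeTheRightWay(s, 10)
			}
		})
		b.Run(fmt.Sprintf("ChunkSafeTheSlowWay_%d", n), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				chunkSafeTheSlowWay(s, 10)
			}
		})
	}
}
```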
Currently, the slices returned by `lo.Chunk` share the same memory space as the collection passed to the helper. That doesn't seem very safe to me, so I'm opening this issue to discuss an improvement to `lo.Chunk`. Using `copy()` would be slower and consume more memory, but would be much safer.

Discussion started here: #491