Chunk: memory safety #492
I think in this case the use of copy can be delegated to the user of the library, since the most straightforward way to make the implementation safe is to copy the collection values, like this:

```go
func Chunk[T any, Slice ~[]T](collection Slice, size int) []Slice {
	if size <= 0 {
		panic("Second parameter must be greater than 0")
	}

	chunksNum := len(collection) / size
	if len(collection)%size != 0 {
		chunksNum += 1
	}

	collectionCopy := make(Slice, len(collection))
	copy(collectionCopy, collection)

	result := make([]Slice, 0, chunksNum)
	for i := 0; i < chunksNum; i++ {
		last := (i + 1) * size
		if last > len(collectionCopy) {
			last = len(collectionCopy)
		}
		result = append(result, collectionCopy[i*size:last:last])
	}

	return result
}
```
The one thing I think should be improved is the documentation for the function, to reflect this behaviour. Also, the only scenario in which the shared memory becomes a problem is if the user makes a change to either the original slice or the chunks, like the example below.
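For illustration, a minimal sketch of that failure mode with the current copy-free `lo.Chunk` (the values are arbitrary):

```go
package main

import (
	"fmt"

	"github.com/samber/lo"
)

func main() {
	original := []int{1, 2, 3, 4}
	chunks := lo.Chunk(original, 2)

	// The chunks alias the original backing array, so a write
	// through a chunk is visible in the original slice.
	chunks[0][0] = 99
	fmt.Println(original[0]) // 99, not 1
}
```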
This is a case of efficiency versus safety, and we need to choose the better option. I would prefer to prioritize safety, to avoid potential issues with modifying the collection slice.
My point is precisely that there's no 'vs': both options are possible, and the documentation just needs to reflect the potential issues, because there may be many cases in which there's no need for the safety. You could have a separate ChunkSafe function.
I totally agree: create a ChunkSafe function, and the user chooses which one to use based on what they want it for.
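A minimal sketch of what that proposed helper could look like, reusing the copying logic from the implementation above (ChunkSafe is a name proposed in this thread, not an existing lo function):

```go
// ChunkSafe behaves like Chunk, but copies the collection first, so the
// returned chunks do not share memory with the original slice: mutating
// a chunk (or the original) cannot affect the other.
func ChunkSafe[T any, Slice ~[]T](collection Slice, size int) []Slice {
	collectionCopy := make(Slice, len(collection))
	copy(collectionCopy, collection)
	return Chunk(collectionCopy, size)
}
```

That way the documentation of each variant can spell out the memory-sharing trade-off explicitly.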
@samber, what do you think about this? I can create a PR for these suggestions.
Could you please share the code for the original benchmark? In general, copy shouldn't take that long; it has a ton of optimizations.

```
cpu: Apple M1 Pro
Benchmark_Chunk_int
Benchmark_Chunk_int/Chunk_100
Benchmark_Chunk_int/Chunk_100-8                       17508577       58.67 ns/op
Benchmark_Chunk_int/ChunkSafeTheRightWay_100
Benchmark_Chunk_int/ChunkSafeTheRightWay_100-8        13027635       96.56 ns/op
Benchmark_Chunk_int/ChunkSafeTheSlowWay_100
Benchmark_Chunk_int/ChunkSafeTheSlowWay_100-8          5968666      192.8 ns/op
Benchmark_Chunk_int/Chunk_1000
Benchmark_Chunk_int/Chunk_1000-8                       3095260      376.4 ns/op
Benchmark_Chunk_int/ChunkSafeTheRightWay_1000
Benchmark_Chunk_int/ChunkSafeTheRightWay_1000-8        2336620      464.6 ns/op
Benchmark_Chunk_int/ChunkSafeTheSlowWay_1000
Benchmark_Chunk_int/ChunkSafeTheSlowWay_1000-8          709839     1658 ns/op
Benchmark_Chunk_int/Chunk_10000
Benchmark_Chunk_int/Chunk_10000-8                       395826     2939 ns/op
Benchmark_Chunk_int/ChunkSafeTheRightWay_10000
Benchmark_Chunk_int/ChunkSafeTheRightWay_10000-8        309154     3974 ns/op
Benchmark_Chunk_int/ChunkSafeTheSlowWay_10000
Benchmark_Chunk_int/ChunkSafeTheSlowWay_10000-8          76291    15568 ns/op
PASS
```

This can vary dramatically between architectures and slice sizes, but in general a copy is something that happens anyway, e.g. when appending to a slice such that it requires more space. The shared memory can cause an even more awkward leak when appending to a chunked slice before setting an element by index:

```go
s := make([]int, 1000)
chunks := lo.Chunk(s, 10)
firstChunk := chunks[0]
firstChunk = append(firstChunk, 666) // might or might not realloc the chunk, decided internally.
firstChunk[0] = 666                  // might or might not also modify "s"
```

I think the default behavior should be safe, and if someone wants cutting-edge optimizations then they should implement them themselves and take full responsibility. So, given all of that, I vote for making the safe behavior the default.
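For reference, a benchmark of this shape could be written roughly as follows. This is a sketch: the chunkSafeTheRightWay/chunkSafeTheSlowWay helpers are guesses at what the two variants above measure (one up-front copy of the input vs. a separate copy per chunk).

```go
package lo_test

import (
	"fmt"
	"testing"

	"github.com/samber/lo"
)

// One up-front copy of the whole input, then zero-copy chunking.
func chunkSafeTheRightWay[T any](collection []T, size int) [][]T {
	cp := make([]T, len(collection))
	copy(cp, collection)
	return lo.Chunk(cp, size)
}

// A separate allocation and copy for every chunk.
func chunkSafeTheSlowWay[T any](collection []T, size int) [][]T {
	chunks := lo.Chunk(collection, size)
	for i, c := range chunks {
		cp := make([]T, len(c))
		copy(cp, c)
		chunks[i] = cp
	}
	return chunks
}

func Benchmark_Chunk_int(b *testing.B) {
	for _, n := range []int{100, 1000, 10000} {
		s := make([]int, n)
		b.Run(fmt.Sprintf("Chunk_%d", n), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				lo.Chunk(s, 10)
			}
		})
		b.Run(fmt.Sprintf("ChunkSafeTheRightWay_%d", n), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				chunkSafeTheRightWay(s, 10)
			}
		})
		b.Run(fmt.Sprintf("ChunkSafeTheSlowWay_%d", n), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				chunkSafeTheSlowWay(s, 10)
			}
		})
	}
}
```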
Currently, the slices returned by `lo.Chunk` share the same memory space as the collection passed to the helper. That doesn't seem very safe to me, so I'm opening this issue to discuss an improvement to `lo.Chunk`. Using `copy()` would be slower and consume more memory, but would be much safer.

Discussion started here: #491