Add SetNbWorkers api to the writer code (see #108) #117

bsergean · 2022-05-05T21:30:15Z

No description provided.

bsergean · 2022-05-05T21:36:17Z

zstd might support parallel decompression, if this is the case maybe we could put the same api on the reader.

Viq111 · 2022-05-06T20:31:26Z

Thanks for your contribution!
Would you mind adding a test that check that the error returned is nil and maybe via ZSTD_CCtx_getParameter that the parameter is correctly set ?

bsergean · 2022-05-06T23:19:34Z

Thanks for the tip, I’ll add a test, and check how ZSTD_CCtx_getParameter works.

…

On May 6, 2022, at 1:31 PM, Vianney Tran ***@***.***> wrote: Thanks for your contribution! Would you mind adding a test that check that the error returned is nil and maybe via ZSTD_CCtx_getParameter that the parameter is correctly set ? — Reply to this email directly, view it on GitHub <#117 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AC2O6UMCU5KBMUTSEW452RTVIV6STANCNFSM5VGKHUPA>. You are receiving this because you authored the thread.

…ssionDecompression internal function, defaulting the value to 1

bsergean · 2022-05-07T04:23:38Z

zstd$ go test -run TestStreamSetNbWorkers
--- FAIL: TestStreamSetNbWorkers (0.00s)
    zstd_stream_test.go:407: Expected SetNbWorkers to succeed, got Unsupported parameter instead

Interestingly my unittest started failing as is, as the parameter was seen as unsupported. This is because the library was not built in threading mode, which I changed. I don't know if that's acceptable, but it's a requirement for this PR.

bsergean · 2022-05-07T04:27:08Z

I'm getting that error in the unittest in the efence build -> https://app.circleci.com/pipelines/github/DataDog/zstd/179/workflows/5f082f6d-b02f-4434-981c-9bacb0297e85/jobs/916

    zstd_stream_test.go:19: Failed writing to compress object: Unsupported parameter

If I try to run efence locally, I'm getting a panic ...

GODEBUG=efence=1 go test -run TestStreamSetNbWorkers

Viq111

Sorry for the delayed answer.

For the dynamic library not built with -DZSTD_MULTITHREAD=1, I think it's fair to assume that we may encounter it from time to time in the wild a library.
I think it's fine but we should make the error more user friendly, what about introducing an error: ErrNoParallelSupport and return it when the caller uses SetNbWorkers but the underlying library doesn't support it ?
In tests you can then do a:

if err := writer.SetNbWorkers(nbWorkers); err == ErrNoParallelSupport {
  t.Skip()
}

What do you think ?

Viq111 · 2022-05-11T19:53:49Z

zstd_stream_test.go

 	writer := NewWriterLevelDict(&w, DefaultCompression, dict)
-	_, err := writer.Write(payload)
+
+	err := writer.SetNbWorkers(nbWorkers)


I would probably only call this is nbWorkers > 1 so you can test the (most common) path where the user doesn't call this method

I think this is addressed.

Viq111 · 2022-05-11T19:54:21Z

zstd_stream_test.go

-	_, err := writer.Write(payload)
+
+	err := writer.SetNbWorkers(nbWorkers)
+	failOnError(t, "Failed writing to compress object", err)


as mentioned, you might want to skip the test (for nbWorkers > 1) if the error is that the library doesn't support multi-threading

I think this is addressed.

zstd_stream_test.go

bsergean · 2022-05-11T21:21:50Z

No worries, thanks for the feedback. Before starting to implement the changes I want to clarify one thing: I didn’t know that a shared library mode was supported, I thought that all the C files in zstd were built with the cgo module. Doesn’t the commit below imply that we build all the C code ? ``` commit af19ae6 Author: Josh Carter ***@***.***> Date: Wed Sep 7 12:24:53 2016 -0600 Adding C sources for zStd 1.0 in favor of shared lib. ```

…

On May 11, 2022, at 12:58 PM, Vianney Tran ***@***.***> wrote: @Viq111 commented on this pull request. Sorry for the delayed answer. For the dynamic library not built with -DZSTD_MULTITHREAD=1, I think it's fair to assume that we may encounter it from time to time in the wild a library. I think it's fine but we should make the error more user friendly, what about introducing an error: ErrNoParallelSupport and return it when the caller uses SetNbWorkers but the underlying library doesn't support it ? In tests you can then do a: if err := writer.SetNbWorkers(nbWorkers); err == ErrNoParallelSupport { t.Skip() } What do you think ? In zstd_stream_test.go <#117 (comment)>: > var w bytes.Buffer writer := NewWriterLevelDict(&w, DefaultCompression, dict) - _, err := writer.Write(payload) + + err := writer.SetNbWorkers(nbWorkers) I would probably only call this is nbWorkers > 1 so you can test the (most common) path where the user doesn't call this method In zstd_stream_test.go <#117 (comment)>: > var w bytes.Buffer writer := NewWriterLevelDict(&w, DefaultCompression, dict) - _, err := writer.Write(payload) + + err := writer.SetNbWorkers(nbWorkers) + failOnError(t, "Failed writing to compress object", err) as mentioned, you might want to skip the test (for nbWorkers > 1) if the error is that the library doesn't support multi-threading In zstd_stream_test.go <#117 (comment)>: > @@ -398,6 +403,27 @@ func TestStreamWriteNoGoPointers(t *testing.T) { }) } +func TestStreamSetNbWorkers(t *testing.T) { + // Build a big string first + sb := strings.Builder{} you are not testing the strings.Builder{} for a one-liner, I would just s := strings.Repeat("foobaa", 1000*1000) — Reply to this email directly, view it on GitHub <#117 (review)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AC2O6UNUBN7IIMCXVNYKCIDVJQGPPANCNFSM5VGKHUPA>. You are receiving this because you authored the thread.

Viq111 · 2022-05-11T21:26:59Z

This PR added support to either compile with the dynamic library or static library (added in the README here).
The default (no compile flags) is to build statically

bsergean · 2022-05-11T21:29:29Z

Got it, thanks for digging that PR. I’ll try to implement the change you suggested, and might poke the PR if I fail to do so.

…

On May 11, 2022, at 2:27 PM, Vianney Tran ***@***.***> wrote: This PR <#109> added support to either compile with the dynamic library or static library (added in the README here <https://github.com/DataDog/zstd#building-against-an-external-libzstd>). The default (no compile flags) is to build statically — Reply to this email directly, view it on GitHub <#117 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AC2O6UJVAMPKZA7CKVDDS43VJQQ25ANCNFSM5VGKHUPA>. You are receiving this because you authored the thread.

bsergean · 2022-05-20T23:33:03Z

I introduced the new error type you suggested (made it global by giving it a starting capital letter).

All the tests are passing now, I wonder if there was a fluke previously, or if one env has an external library that was built without parallel support, which would be great.

Lastly, I have not tested that code very much, to measure the performance impact (I know we use multiple threads, and have to use C to do that).

Viq111

Thanks!
I will do some benchmarking on different hardware and let you know but PR looks good!

bsergean · 2022-05-26T14:43:02Z

Ah excellent. I’ve been too lazy to do the benchmarking myself, I’ll try to do it but I can’t promise anything.

…

On May 26, 2022, at 6:05 AM, Vianney Tran ***@***.***> wrote: @Viq111 approved this pull request. Thanks! I will do some benchmarking on different hardware and let you know but PR looks good! — Reply to this email directly, view it on GitHub <#117 (review)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AC2O6UMOW6HBHZJ2F7E7VXDVL5ZJDANCNFSM5VGKHUPA>. You are receiving this because you authored the thread.

Viq111 · 2022-06-06T17:03:26Z

edit: this comment is incorrect, see latest

I have been benchmarking with the mozilla (unbzip2 then tarred) payload (the biggest one).

On x86, the performance is always better, albeit much more variable.
Without parallelism:

goos: darwin
goarch: amd64
pkg: github.com/DataDog/zstd
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkStreamCompression-12    	      85	 392153074 ns/op	 135.01 MB/s
BenchmarkStreamCompression-12    	      82	 389099060 ns/op	 136.07 MB/s
BenchmarkStreamCompression-12    	      86	 396593782 ns/op	 133.50 MB/s
BenchmarkStreamCompression-12    	      88	 387180328 ns/op	 136.74 MB/s
BenchmarkStreamCompression-12    	      88	 388614650 ns/op	 136.24 MB/s

With parallelism = 8:

goos: darwin
goarch: amd64
pkg: github.com/DataDog/zstd
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkStreamCompression-12    	     100	 323739454 ns/op	 163.54 MB/s
BenchmarkStreamCompression-12    	     132	 260639012 ns/op	 203.13 MB/s
BenchmarkStreamCompression-12    	     130	 266204210 ns/op	 198.89 MB/s
BenchmarkStreamCompression-12    	     139	 263071863 ns/op	 201.25 MB/s
BenchmarkStreamCompression-12    	     159	 442939701 ns/op	 119.53 MB/s

Summary:

name                  old time/op   new time/op   delta
StreamCompression-12    391ms ± 2%    311ms ±42%   ~     (p=0.151 n=5+5)

name                  old speed     new speed     delta
StreamCompression-12  136MB/s ± 1%  177MB/s ±33%   ~     (p=0.151 n=5+5)

Howerver on ARM (testing on a c7g.4xlarge), with parellization=8 on 16 cores, I unfortunately get OOMs so looking into it:

goos: linux
goarch: arm64
pkg: github.com/DataDog/zstd
BenchmarkStreamCompression-16    	Killed

bsergean · 2022-06-06T17:11:16Z

In general I expect multi-threaded work to use more memory, maybe your arm test machine has less RAM than your Intel test machine ? I wonder if the zstd cli has a nb worker option, this could be used also to debug this.

…

On Jun 6, 2022, at 10:03 AM, Vianney Tran ***@***.***> wrote: I have been benchmarking with the mozilla <https://sun.aei.polsl.pl//~sdeor/corpus/mozilla.bz2> (unbzip2 then tarred) payload (the biggest one). On x86, the performance is always better, albeit much more variable. Without parallelism: goos: darwin goarch: amd64 pkg: github.com/DataDog/zstd cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz BenchmarkStreamCompression-12 85 392153074 ns/op 135.01 MB/s BenchmarkStreamCompression-12 82 389099060 ns/op 136.07 MB/s BenchmarkStreamCompression-12 86 396593782 ns/op 133.50 MB/s BenchmarkStreamCompression-12 88 387180328 ns/op 136.74 MB/s BenchmarkStreamCompression-12 88 388614650 ns/op 136.24 MB/s With parallelism = 8: goos: darwin goarch: amd64 pkg: github.com/DataDog/zstd cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz BenchmarkStreamCompression-12 100 323739454 ns/op 163.54 MB/s BenchmarkStreamCompression-12 132 260639012 ns/op 203.13 MB/s BenchmarkStreamCompression-12 130 266204210 ns/op 198.89 MB/s BenchmarkStreamCompression-12 139 263071863 ns/op 201.25 MB/s BenchmarkStreamCompression-12 159 442939701 ns/op 119.53 MB/s Summary: name old time/op new time/op delta StreamCompression-12 391ms ± 2% 311ms ±42% ~ (p=0.151 n=5+5) name old speed new speed delta StreamCompression-12 136MB/s ± 1% 177MB/s ±33% ~ (p=0.151 n=5+5) Howerver on ARM (testing on a c7g.4xlarge), with parellization=8 on 16 cores, I unfortunately get OOMs so looking into it: goos: linux goarch: arm64 pkg: github.com/DataDog/zstd BenchmarkStreamCompression-16 Killed — Reply to this email directly, view it on GitHub <#117 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AC2O6UI2NOQYS7LDVS2RLSLVNYVOZANCNFSM5VGKHUPA>. You are receiving this because you authored the thread.

Viq111 · 2022-06-06T17:20:40Z

Yeah same expectation here but the mozilla payload is ~50Mb and with 8 concurrent threads, I would not expect it to starve the 32GiB of memory a c7g.4xlargehave

bsergean · 2022-06-06T17:28:24Z

BTW I have an apple M1 so I should be able to reproduce the problem possibly. I think my Mac actually has 32G of ram.

…

On Jun 6, 2022, at 10:20 AM, Vianney Tran ***@***.***> wrote: Yeah same expectation here but the mozilla payload is ~50Mb and with 8 concurrent threads, I would not expect it to starve the 32GiB of memory a c7g.4xlargehave — Reply to this email directly, view it on GitHub <#117 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AC2O6UIOLKA5ALPHJ52J7UDVNYXPTANCNFSM5VGKHUPA>. You are receiving this because you authored the thread.

Viq111 · 2022-06-06T17:36:34Z

👍 sounds good, here are my current testing steps:

Add w.SetNbWorkers(8) to the benchmark

then

wget https://sun.aei.polsl.pl//~sdeor/corpus/mozilla.bz2
tar -xf mozilla.bz2 mozilla
tar cf mozilla.tar mozilla
export PAYLOAD=`pwd`/mozilla.tar
go test -c
./zstd.test -test.run None -test.bench BenchmarkStreamCompression -test.benchtime 30s -test.count 5

On a c7.4xlarge, you can see it uses all (32Gib) of the memory:

$ grep ^VmPeak /proc/$(pidof zstd.test)/status
VmPeak:	33758272 kB

bsergean · 2022-06-06T18:00:57Z

I think I can reproduce the high memory usage pb: zstd.test memory usage went as high as 28Gig, I interrupted the test because I didn’t want my main laptop to die ...

…

On Jun 6, 2022, at 10:36 AM, Vianney Tran ***@***.***> wrote: 👍 sounds good, here are my current testing steps: Add w.SetNbWorkers(8) to the benchmark <https://github.com/DataDog/zstd/blob/1.x/zstd_stream_test.go#L406> then wget https://sun.aei.polsl.pl//~sdeor/corpus/mozilla.bz2 tar -xf mozilla.bz2 mozilla tar cf mozilla.tar mozilla export PAYLOAD=`pwd`/mozilla.tar go test -c ./zstd.test -test.run None -test.bench BenchmarkStreamCompression -test.benchtime 30s -test.count 5 On a c7.4xlarge, you can see it uses all (32Gib) of the memory: $ grep ^VmPeak /proc/$(pidof zstd.test)/status VmPeak: 33758272 kB — Reply to this email directly, view it on GitHub <#117 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AC2O6UJDGBV2OOXO4S5LAETVNYZLPANCNFSM5VGKHUPA>. You are receiving this because you authored the thread.

bsergean · 2022-06-06T18:06:40Z

With only one worker, the peak memory usage is 148Mib, so something odd is going on.

>> 30000 / 148.

202.7027027027027 200 is the max number of workers that is set by zstd. Maybe it isn’t a coincidence ?

…

On Jun 6, 2022, at 11:00 AM, Benjamin Sergeant ***@***.***> wrote: I think I can reproduce the high memory usage pb: zstd.test memory usage went as high as 28Gig, I interrupted the test because I didn’t want my main laptop to die ... > On Jun 6, 2022, at 10:36 AM, Vianney Tran ***@***.*** ***@***.***>> wrote: > > > 👍 sounds good, here are my current testing steps: > > Add w.SetNbWorkers(8) to the benchmark <https://github.com/DataDog/zstd/blob/1.x/zstd_stream_test.go#L406> > then > > wget https://sun.aei.polsl.pl//~sdeor/corpus/mozilla.bz2 <https://sun.aei.polsl.pl//~sdeor/corpus/mozilla.bz2> > tar -xf mozilla.bz2 mozilla > tar cf mozilla.tar mozilla > export PAYLOAD=`pwd`/mozilla.tar > go test -c > ./zstd.test -test.run None -test.bench BenchmarkStreamCompression -test.benchtime 30s -test.count 5 > On a c7.4xlarge, you can see it uses all (32Gib) of the memory: > > $ grep ^VmPeak /proc/$(pidof zstd.test)/status > VmPeak: 33758272 kB > — > Reply to this email directly, view it on GitHub <#117 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AC2O6UJDGBV2OOXO4S5LAETVNYZLPANCNFSM5VGKHUPA>. > You are receiving this because you authored the thread. >

Viq111 · 2022-06-06T18:18:04Z

Sorry for my previous message, on a c6i.4xlarge (x86) I have the same consumption:

grep ^VmPeak /proc/$(pidof zstd.test)/status
VmPeak:	33356928 kB

so going to look on the side of the C code indeed

Viq111 · 2022-06-06T19:04:03Z

Ok I think I found the "issue".
See 9dd8a8a
A gotcha was when setting nbWorkers >1, the compress C calls becomes asynchronous instead of synchronous. That means with Go benchmark trying to do benchmarks with an increasing N number, we would buffer a lot of data on the C side.

By doing 9dd8a8a and forcing a flush every 1GiB, we force the C zstd buffer to not have more than 1GiB at a time.

This works:

grep ^VmPeak /proc/12092/status                                                                                                                                                                                           Mon Jun  6 18:54:05 2022

VmPeak:  8307060 kB

Memory usage is 8GiB which is exactly 1GiB for 8 workers.

So there is nothing wrong with the code, just the benchmark did not take the async into account.

The good news is the performance with 8 workers is much better than 1 worker:

name                  old time/op   new time/op   delta
StreamCompression-16    414ms ± 1%     98ms ± 2%   -76.37%  (p=0.008 n=5+5)

name                  old speed     new speed     delta
StreamCompression-16  124MB/s ± 1%  524MB/s ± 2%  +323.25%  (p=0.008 n=5+5)

Maybe just add a comment to SetNbWorkers to warn about this behavior ?

bsergean · 2022-06-06T19:11:35Z

Up to you. Maybe we can add a note to the documentation about this odd arm64 behavior, and a link to a ticket ?

…

On Jun 6, 2022, at 10:36 AM, Vianney Tran ***@***.***> wrote: 👍 sounds good, here are my current testing steps: Add w.SetNbWorkers(8) to the benchmark <https://github.com/DataDog/zstd/blob/1.x/zstd_stream_test.go#L406> then wget https://sun.aei.polsl.pl//~sdeor/corpus/mozilla.bz2 tar -xf mozilla.bz2 mozilla tar cf mozilla.tar mozilla export PAYLOAD=`pwd`/mozilla.tar go test -c ./zstd.test -test.run None -test.bench BenchmarkStreamCompression -test.benchtime 30s -test.count 5 On a c7.4xlarge, you can see it uses all (32Gib) of the memory: $ grep ^VmPeak /proc/$(pidof zstd.test)/status VmPeak: 33758272 kB — Reply to this email directly, view it on GitHub <#117 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AC2O6UJDGBV2OOXO4S5LAETVNYZLPANCNFSM5VGKHUPA>. You are receiving this because you authored the thread.

bsergean · 2022-06-06T19:11:37Z

Also, maybe that file is special, but the speedup of 33% on your benchmark is a bit disappointing. (Maybe because 8 cores isn’t big enough ?) For compressing a 4.6 gig file of mine, I almost get a 6x speedup. 4.63 GiB => 716 MiB zstd 8.58s user 1.11s system 113% cpu 8.528 total zstdmt (multi core) 9.85s user 1.29s system 612% cpu 1.819 total 2. And when using the Mozilla.tar file, it look like a 2x speedup. zstd$ time zstd mozilla.tar mozilla.tar : 36.56% ( 48.8 MiB => 17.9 MiB, mozilla.tar.zst) zstd mozilla.tar 0.15s user 0.05s system 119% cpu 0.166 total zstd$ rm mozilla.tar.zst zstd$ time zstdmt mozilla.tar mozilla.tar : 36.56% ( 48.8 MiB => 17.9 MiB, mozilla.tar.zst) zstdmt mozilla.tar 0.21s user 0.05s system 326% cpu 0.080 total

…

On Jun 6, 2022, at 11:18 AM, Vianney Tran ***@***.***> wrote: Sorry for my previous message, on a c6i.4xlarge (x86) I have the same consumption: grep ^VmPeak /proc/$(pidof zstd.test)/status VmPeak: 33356928 kB so going to look on the side of the C code indeed — Reply to this email directly, view it on GitHub <#117 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AC2O6UKRORTX3TIGF2OHVNLVNY6GPANCNFSM5VGKHUPA>. You are receiving this because you authored the thread.

Viq111 · 2022-06-06T19:21:14Z

Sorry, the one above is actually a benchmark on x86 (editing my message to reflect) it:

x86 (c6i.4xlarge):

name                  old time/op   new time/op   delta
StreamCompression-16    414ms ± 1%     98ms ± 2%   -76.37%  (p=0.008 n=5+5)

name                  old speed     new speed     delta
StreamCompression-16  124MB/s ± 1%  524MB/s ± 2%  +323.25%  (p=0.008 n=5+5)

so 3.2x increase (for 8 workers)

ARM (c7g.4xlarge):

name                  old time/op   new time/op   delta
StreamCompression-16    513ms ± 1%     74ms ± 1%   -85.61%  (p=0.008 n=5+5)

name                  old speed     new speed     delta
StreamCompression-16  100MB/s ± 1%  694MB/s ± 1%  +594.94%  (p=0.008 n=5+5)

So ~6x increase (for 8 workers)

So this is not a behavior only on ARM but that calls to write.Write(...) actually becomes async.
So maybe add a comment of the sort:

// Set the number of workers to run the compression in parallel using multiple threads
// If > 1, the Write() call will become asynchronous. This means data will be buffered until processed.
// If you call Write() too fast, you might incur a memory buffer up to as large as your input.
// Consider calling Flush() periodically if you need to compress a very large file that would not fit all in memory.

bsergean · 2022-06-06T19:23:58Z

I just updated the documentation based on your recommendation.

…

On Jun 6, 2022, at 12:21 PM, Vianney Tran ***@***.***> wrote: Sorry, the one above is actually a benchmark on x86 (editing my message to reflect) it: x86 (c6i.4xlarge): name old time/op new time/op delta StreamCompression-16 414ms ± 1% 98ms ± 2% -76.37% (p=0.008 n=5+5) name old speed new speed delta StreamCompression-16 124MB/s ± 1% 524MB/s ± 2% +323.25% (p=0.008 n=5+5) so 3.2x increase (for 8 workers) ARM (c7g.4xlarge): name old time/op new time/op delta StreamCompression-16 513ms ± 1% 74ms ± 1% -85.61% (p=0.008 n=5+5) name old speed new speed delta StreamCompression-16 100MB/s ± 1% 694MB/s ± 1% +594.94% (p=0.008 n=5+5) So ~6x increase (for 8 workers) So this is not a behavior only on ARM but that calls to write.Write(...) actually becomes async. So maybe add a comment of the sort: // Set the number of workers to run the compression in parallel using multiple threads // If > 1, the Write() call will become asynchronous. This means data will be buffered until processed. // If you call Write() too fast, you might incur a memory buffer up to as large as your input. // Consider calling Flush() periodically if you need to compress a very large file that would not fit all in memory. — Reply to this email directly, view it on GitHub <#117 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AC2O6UIXJMXFVV652KJISULVNZFTNANCNFSM5VGKHUPA>. You are receiving this because you authored the thread.

bsergean · 2022-06-06T19:24:36Z

Also it’s interesting that the speedup is bigger on arm cpus ...

…

On Jun 6, 2022, at 12:23 PM, Benjamin Sergeant ***@***.***> wrote: I just updated the documentation based on your recommendation. > On Jun 6, 2022, at 12:21 PM, Vianney Tran ***@***.*** ***@***.***>> wrote: > > > Sorry, the one above is actually a benchmark on x86 (editing my message to reflect) it: > > x86 (c6i.4xlarge): > > name old time/op new time/op delta > StreamCompression-16 414ms ± 1% 98ms ± 2% -76.37% (p=0.008 n=5+5) > > name old speed new speed delta > StreamCompression-16 124MB/s ± 1% 524MB/s ± 2% +323.25% (p=0.008 n=5+5) > so 3.2x increase (for 8 workers) > > ARM (c7g.4xlarge): > > name old time/op new time/op delta > StreamCompression-16 513ms ± 1% 74ms ± 1% -85.61% (p=0.008 n=5+5) > > name old speed new speed delta > StreamCompression-16 100MB/s ± 1% 694MB/s ± 1% +594.94% (p=0.008 n=5+5) > So ~6x increase (for 8 workers) > > So this is not a behavior only on ARM but that calls to write.Write(...) actually becomes async. > So maybe add a comment of the sort: > > // Set the number of workers to run the compression in parallel using multiple threads > // If > 1, the Write() call will become asynchronous. This means data will be buffered until processed. > // If you call Write() too fast, you might incur a memory buffer up to as large as your input. > // Consider calling Flush() periodically if you need to compress a very large file that would not fit all in memory. > — > Reply to this email directly, view it on GitHub <#117 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AC2O6UIXJMXFVV652KJISULVNZFTNANCNFSM5VGKHUPA>. > You are receiving this because you authored the thread. >

bsergean · 2022-06-06T19:27:32Z

I actually use io.Copy in my own code. writer := zstd.NewWriter(f) w := io.MultiWriter(writer, h) if err != nil { return -1, "", err } if _, err = io.Copy(w, reader); err != nil { return -1, "", err } writer.Flush() checksum = fmt.Sprintf("%x", h.Sum64()) writer.Close() (I’m computing a checksum at the same time) I wonder how we should update this code to periodically call Flush, if io.Copy will deal with that under the hood, or if I’ll run in the same memory problems as the benchmark.

…

On Jun 6, 2022, at 12:24 PM, Benjamin Sergeant ***@***.***> wrote: Also it’s interesting that the speedup is bigger on arm cpus ... > On Jun 6, 2022, at 12:23 PM, Benjamin Sergeant ***@***.*** ***@***.***>> wrote: > > I just updated the documentation based on your recommendation. > >> On Jun 6, 2022, at 12:21 PM, Vianney Tran ***@***.*** ***@***.***>> wrote: >> >> >> Sorry, the one above is actually a benchmark on x86 (editing my message to reflect) it: >> >> x86 (c6i.4xlarge): >> >> name old time/op new time/op delta >> StreamCompression-16 414ms ± 1% 98ms ± 2% -76.37% (p=0.008 n=5+5) >> >> name old speed new speed delta >> StreamCompression-16 124MB/s ± 1% 524MB/s ± 2% +323.25% (p=0.008 n=5+5) >> so 3.2x increase (for 8 workers) >> >> ARM (c7g.4xlarge): >> >> name old time/op new time/op delta >> StreamCompression-16 513ms ± 1% 74ms ± 1% -85.61% (p=0.008 n=5+5) >> >> name old speed new speed delta >> StreamCompression-16 100MB/s ± 1% 694MB/s ± 1% +594.94% (p=0.008 n=5+5) >> So ~6x increase (for 8 workers) >> >> So this is not a behavior only on ARM but that calls to write.Write(...) actually becomes async. >> So maybe add a comment of the sort: >> >> // Set the number of workers to run the compression in parallel using multiple threads >> // If > 1, the Write() call will become asynchronous. This means data will be buffered until processed. >> // If you call Write() too fast, you might incur a memory buffer up to as large as your input. >> // Consider calling Flush() periodically if you need to compress a very large file that would not fit all in memory. >> — >> Reply to this email directly, view it on GitHub <#117 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AC2O6UIXJMXFVV652KJISULVNZFTNANCNFSM5VGKHUPA>. >> You are receiving this because you authored the thread. >> >

Viq111 · 2022-06-06T19:31:08Z

I would leave the API as is because it gives the user a choice to call Flush whenever. As a headsup, calling Flush() makes the payload a bit bigger as you are forcing zstd to flush as much data as possible.
For your code, you need to ensure either f is smaller enough to fit in memory (the pathological case of the benchmark is it was generating massive files) OR you can create an intermediate io.Write (similar to LimitedReader) that periodically calls flush but I think that should live outside of the zstd package

Viq111

Thanks for bearing with me as we benchmarked!
The PR as it stands looks good to me and if you are fine with the API, we can merge in its current state

bsergean · 2022-06-06T20:10:03Z

Yes the API is fine with me, I think you did most of the hard work here, thanks for reviewing / working on the PR !

…

On Jun 6, 2022, at 12:36 PM, Vianney Tran ***@***.***> wrote: @Viq111 approved this pull request. Thanks for bearing with me as we benchmarked! The PR as it stands looks good to me and if you are fine with the API, we can merge in its current state — Reply to this email directly, view it on GitHub <#117 (review)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AC2O6UMWVKGQCSPY6EBPXPTVNZHKBANCNFSM5VGKHUPA>. You are receiving this because you authored the thread.

This PR contains the following updates: | Package | Type | Update | Change | |---|---|---|---| | [github.com/DataDog/zstd](https://togithub.com/DataDog/zstd) | indirect | patch | `v1.5.0` -> `v1.5.2` | --- ### Release Notes <details> <summary>DataDog/zstd</summary> ### [`v1.5.2`](https://togithub.com/DataDog/zstd/releases/tag/v1.5.2): zstd 1.5.2 [Compare Source](https://togithub.com/DataDog/zstd/compare/v1.5.2...v1.5.2) This release updates the upstream zstd version to [1.5.2](https://togithub.com/facebook/zstd/releases/tag/v1.5.2) ([https://github.com/DataDog/zstd/pull/116](https://togithub.com/DataDog/zstd/pull/116)) The update `1.5.0` -> `1.5.2` overall has a similar performance profile. Please note that depending on the workload, performance could vary by -10% / +10% ### [`v1.5.2+patch1`](https://togithub.com/DataDog/zstd/releases/tag/v1.5.2%2Bpatch1): zstd 1.5.2 - wrapper patches 1 [Compare Source](https://togithub.com/DataDog/zstd/compare/v1.5.0...v1.5.2) #### What's Changed - Fix unneededly allocated large decompression buffer by [@XiaochenCui](https://togithub.com/XiaochenCui) ([#118](https://togithub.com/DataDog/zstd/issues/118)) & [@Viq111](https://togithub.com/Viq111) in [https://github.com/DataDog/zstd/pull/120](https://togithub.com/DataDog/zstd/pull/120) - Add SetNbWorkers api to the writer code (see [#108](https://togithub.com/DataDog/zstd/issues/108)) by [@bsergean](https://togithub.com/bsergean) in [https://github.com/DataDog/zstd/pull/117](https://togithub.com/DataDog/zstd/pull/117) - For large workloads, the performance can be improved by 3-6x (see [https://github.com/DataDog/zstd/pull/117#issuecomment-1147812767](https://togithub.com/DataDog/zstd/pull/117#issuecomment-1147812767)) - `Write()` becomes async with workers > 1, make sure you read the method documentation before using #### New Contributors - [@bsergean](https://togithub.com/bsergean) made their first contribution in [https://github.com/DataDog/zstd/pull/117](https://togithub.com/DataDog/zstd/pull/117) - [@XiaochenCui](https://togithub.com/XiaochenCui) for his work on [https://github.com/DataDog/zstd/pull/118](https://togithub.com/DataDog/zstd/pull/118) that led to [#120](https://togithub.com/DataDog/zstd/issues/120) **Full Changelog**: DataDog/zstd@v1.5.2...v1.5.2+patch1 </details> --- ### Configuration 📅 **Schedule**: Branch creation - "before 3am on the first day of the month" (UTC), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] If you want to rebase/retry this PR, check this box --- This PR has been generated by [Renovate Bot](https://togithub.com/renovatebot/renovate).

Add SetNbWorkers api to the writer code.

0ec6068

bsergean added 3 commits May 6, 2022 21:20

enable multi-threaded build of zstd

abd3a62

add a docstring for Writer::SetNbWorkers

0d4dce0

add unitest for SetNbWorkers + add a nbWorker parameter to testCompre…

46f6615

…ssionDecompression internal function, defaulting the value to 1

Viq111 reviewed May 11, 2022

View reviewed changes

bsergean added 2 commits May 20, 2022 16:26

New error code + some unittest fixes

622e79a

Simplify big string creation in TestStreamSetNbWorkers

2ee47be

Viq111 approved these changes May 26, 2022

View reviewed changes

Viq111 mentioned this pull request Jun 6, 2022

[zstd] Fix unneededly allocate large decompression buffer #120

Merged

bsergean mentioned this pull request Jun 6, 2022

Add support for zstd multi-threading #17

Open

update documentation

13d5b10

Merge branch '1.x' into patch-1

c798238

Viq111 approved these changes Jun 6, 2022

View reviewed changes

Viq111 merged commit fd035e5 into DataDog:1.x Jun 6, 2022

Add SetNbWorkers api to the writer code (see #108) #117

Add SetNbWorkers api to the writer code (see #108) #117

Uh oh!

Conversation

bsergean commented May 5, 2022

Uh oh!

bsergean commented May 5, 2022

Uh oh!

Viq111 commented May 6, 2022

Uh oh!

bsergean commented May 6, 2022 via email

Uh oh!

bsergean commented May 7, 2022

Uh oh!

bsergean commented May 7, 2022

Uh oh!

Viq111 left a comment

Choose a reason for hiding this comment

Uh oh!

Viq111 May 11, 2022

Choose a reason for hiding this comment

Uh oh!

bsergean May 20, 2022

Choose a reason for hiding this comment

Uh oh!

Viq111 May 11, 2022

Choose a reason for hiding this comment

Uh oh!

bsergean May 20, 2022

Choose a reason for hiding this comment

Uh oh!

Uh oh!

bsergean commented May 11, 2022 via email

Uh oh!

Viq111 commented May 11, 2022

Uh oh!

bsergean commented May 11, 2022 via email

Uh oh!

bsergean commented May 20, 2022

Uh oh!

Viq111 left a comment

Choose a reason for hiding this comment

Uh oh!

bsergean commented May 26, 2022 via email

Uh oh!

Viq111 commented Jun 6, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

edit: this comment is incorrect, see latest

Uh oh!

bsergean commented Jun 6, 2022 via email

Uh oh!

Viq111 commented Jun 6, 2022

Uh oh!

bsergean commented Jun 6, 2022 via email

Uh oh!

Viq111 commented Jun 6, 2022

Uh oh!

bsergean commented Jun 6, 2022 via email

Uh oh!

bsergean commented Jun 6, 2022 via email

Uh oh!

Viq111 commented Jun 6, 2022

Uh oh!

Viq111 commented Jun 6, 2022

Uh oh!

bsergean commented Jun 6, 2022 via email

Uh oh!

bsergean commented Jun 6, 2022 via email

Uh oh!

Viq111 commented Jun 6, 2022

Uh oh!

bsergean commented Jun 6, 2022 via email

Uh oh!

bsergean commented Jun 6, 2022 via email

Uh oh!

bsergean commented Jun 6, 2022 via email

Uh oh!

Viq111 commented Jun 6, 2022

Uh oh!

Viq111 left a comment

Viq111 commented Jun 6, 2022 •

edited

Loading