bundle: Preallocate buffers for file contents. #6818
Conversation
Force-pushed from `5839c4b` to `bb9f8e1`.
Thanks! 😃
Force-pushed from `c95cb9e` to `85ad3a6`.
This commit adds logic to preallocate buffers when loading files from both tarballs and on-disk bundle directories. The change results in lower max RSS memory usage at runtime, and better garbage collector performance, especially at lower values of `GOMAXPROCS`. For very large bundles (>1 GB in size), this change can lower startup times for OPA by as much as a full second.

The performance analysis was different than for most changes: heap usage increased by about 10% during bundle loading, which made the change look bad at first. Some of the effect appears to be from the Go compiler no longer inlining as far up the call chain during bundle loading (visible in the `pprof` graphs). Running with `GODEBUG=gctrace=1` and varying `GOMAXPROCS` allowed seeing a fuller picture of how performance changes from preallocation, which results in much less garbage for the collector and a noticeable reduction in the wall-clock time the GC burns during bundle loading.

Signed-off-by: Philip Conrad <philipaconrad@gmail.com>
Force-pushed from `85ad3a6` to `bb22028`.
@johanfylling If you think a Go benchmark or something would be valuable to add to this PR to help the case for merging, please LMK, and I'll allocate some time for that. Thanks again for the helpful review comments! 😄
LGTM thanks
LGTM 👍
If you have the bandwidth, a benchmark wouldn't hurt, as you suggest. Performance is the major claim of this PR, after all. The changes look legit, though, so I won't hold you up if you have more on your plate.
The change improves performance when loading large (giant) bundles... I think it might be hard to capture that in a benchmark. The PR's instructions outline the procedure used to verify it; that might have to be sufficient for now, I suppose.
Commit `58e355a`

Force-pushed from `683167e` to `be80a64`.
Force-pushed from `be80a64` to `99d6181`.
@johanfylling @srenatus Benchmarks are up! 😄 It took me a sec to find the right way to exercise the changes; I wound up settling on calling

Example CLI invocations of the new benchmarks:

```shell
go test -benchmem -run=^$ -bench ^BenchmarkTarballLoader$ github.com/open-policy-agent/opa/bundle -count=30 > tar.txt
go test -benchmem -run=^$ -bench ^BenchmarkDirectoryLoader$ github.com/open-policy-agent/opa/bundle -count=30 > dir.txt
```

Benchstat results for
Thank you :)
Why were the changes in this PR made?
This PR reduces OPA's max RSS memory usage during bundle loading by about 5-6%, and greatly improves GC performance at lower values of `GOMAXPROCS`.
This particularly helps container and low-resource use cases by reducing the peak memory spike during bundle loading, and it also reduces garbage collector overheads.
What are the changes in this PR?
This PR adds logic to preallocate buffers when loading files from both tarballs and on-disk bundle directories. The change results in lower max RSS memory usage at runtime and better garbage collector performance, especially at lower values of `GOMAXPROCS`. For very large bundles (>1 GB in size), this change can lower startup times for OPA by as much as a full second.
Benchmarking Details & Instructions
Huge thanks to @charlesdaniels for help getting the Awk script working! It's very handy for summarizing what the GC traces mean.
Instructions and scripts are provided in the expandable Details block below.
Scripts used for generating bundle contents/collecting results

Generating worst-case, large bundles

I used the following Python script to generate arbitrarily large, but highly compressible OPA bundles.

`genbig.py`: run with `python3 genbig.py > data.json` (adjust the `N` variable to dial in the size in KB you want the file to be).

Getting `pprof` results quickly from an OPA server

`profile.sh`: run with `./profile.sh pr-1gb` (generates `allocs-pr-1gb.png` and `heap-pr-1gb.png`).

I ran OPA with the `pprof` endpoints enabled, using the CLI commands shown earlier in the graph sections. After collecting the pprof traces and generating the graphs with `profile.sh`, I then switched over to garbage collector trace work.

Collecting and summarizing GC trace results
With some help from @charlesdaniels, I put together this Awk script for reporting the aggregate timing stats for a GC trace log.

`summarize-gctrace.awk`: the script can be run in-line with OPA by `tee`-ing stderr over to the Awk process, and then `pkill`-ing OPA with a `SIGINT` signal. Example with the PR build of OPA:

Terminal 1:

Terminal 2:
Performance Analysis and Benchmarking Results
The performance analysis was different than for most changes: heap usage increased by about 10% during bundle loading, which made the change look bad at first. Some of the effect appears to be from the Go compiler no longer inlining as far up the call chain during bundle loading (visible in the `pprof` graphs).

Running with `GODEBUG=gctrace=1` and varying `GOMAXPROCS` allows seeing a fuller picture of how performance changes from preallocation. There is much less garbage for the collector, and a noticeable reduction in how much wall-clock time the garbage collector burns during bundle loading.

Max RSS was measured with `/usr/bin/time` (GNU time, not the shell builtin `time` command seen on some platforms!), and the usual stats were collected with `pprof`.

Results from `main`:

CLI command: `/usr/bin/time -v ./opa-main run -s -b bundle-1gb.tar.gz --pprof`
Results
Printout from `time` (note `Maximum resident set size` is ~3.8 GB)

`GODEBUG=gctrace=1` logs from `main`

This is a representative sample of loading a 1 GB bundle with `GODEBUG=gctrace=1` enabled.

Summary stats:

Allocs graph from `main`

Heap graph from `main`
Results from this PR:

CLI command: `/usr/bin/time -v ./opa-pr run -s -b bundle-1gb.tar.gz --pprof`
Results
Printout from `time` (note `Maximum resident set size` is ~3.6 GB)

`GODEBUG=gctrace=1` logs from PR

This is a representative sample of loading a 1 GB bundle with `GODEBUG=gctrace=1` enabled.

Summary stats:

Allocs graph from PR

Heap graph from PR