Skip to content

Comments

Separate jobs for large memory tests with sanitizers#3161

Merged
zuiderkwast merged 6 commits intovalkey-io:unstablefrom
sarthakaggarwal97:skip-tests-asan
Feb 10, 2026
Merged

Separate jobs for large memory tests with sanitizers#3161
zuiderkwast merged 6 commits intovalkey-io:unstablefrom
sarthakaggarwal97:skip-tests-asan

Conversation

@sarthakaggarwal97
Copy link
Contributor

@sarthakaggarwal97 sarthakaggarwal97 commented Feb 4, 2026

We have been seeing github actions runners being OOM when large memory tests are run with ASan. The operation eventually is being canceled during the test.

This change moves the large-memory tests with ASan and UBSan to separate jobs, so we get a dedicated runner with its own timeout. We can tweak the number of simultaneous test clients for these tests without affecting the other test jobs.

Some recent failure examples:

  1. https://github.com/valkey-io/valkey/actions/runs/21573028934/job/62155215331#step:10:425
  2. https://github.com/valkey-io/valkey/actions/runs/21553319484/job/62105370976#step:10:425

@sarthakaggarwal97
Copy link
Contributor Author

@madolson I skipped the tests right now for scheduled runs. Let me know if you think we should skip the tests altogether. Thanks!

@codecov
Copy link

codecov bot commented Feb 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.94%. Comparing base (87caeb7) to head (ae55f09).
⚠️ Report is 4 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3161      +/-   ##
============================================
+ Coverage     74.90%   74.94%   +0.03%     
============================================
  Files           129      129              
  Lines         71327    71329       +2     
============================================
+ Hits          53429    53457      +28     
+ Misses        17898    17872      -26     

see 22 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Member

@dvkashapov dvkashapov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the reasons to keep running large memory with asan is to keep new bugs from being introduced, here #1748 large string bug was fixed because we did not have those runs before. Also if someone will see failure in extra tests they will not have reference to wether this change introduced new bug or it was already present.

@sarthakaggarwal97
Copy link
Contributor Author

@dvkashapov that's a good example but over there it affected a specific platform. But currently ASan tests with daily are only run on single machine / platform. So we are anyways not covering all the platforms.
I am more worried if these large memory tests might hide important failures since the operation just gets canceled due to machines going OOM.

Alternate is that I saw couple of specific tests which were affecting these runs, and maybe we can change the params (like with lesser memory) of those based on if the run is in ASan mode.

What do you think?

@dvkashapov
Copy link
Member

Alternate is that I saw couple of specific tests which were affecting these runs, and maybe we can change the params (like with lesser memory) of those based on if the run is in ASan mode.

Yes, this makes sense, what tests did affect the most?

@zuiderkwast
Copy link
Contributor

What if we skip the large-memory tests and then add a new separate job that only runs the large-memory tests?

To use less memory, we can run with --clients 1 (or another small number) instead of the default 10, to avoid parallelizing the tests.

@sarthakaggarwal97
Copy link
Contributor Author

sarthakaggarwal97 commented Feb 5, 2026

What if we skip the large-memory tests and then add a new separate job that only runs the large-memory tests?

Like as a part of daily itself? In another workflow? Can try it out.

To use less memory, we can run with --clients 1 (or another small number) instead of the default 10, to avoid parallelizing the tests.

This can slow down the execution time significantly. It takes about an hour right now.

Yes, this makes sense, what tests did affect the most?

test_quicklistCompressAndDecompressQuicklistListpackNode

@sarthakaggarwal97
Copy link
Contributor Author

Also, multiple sanitizer runs failed again yesterday: https://github.com/valkey-io/valkey/actions/runs/21693609685/job/62558879812#step:10:423

@zuiderkwast
Copy link
Contributor

To use less memory, we can run with --clients 1 (or another small number) instead of the default 10, to avoid parallelizing the tests.

This can slow down the execution time significantly. It takes about an hour right now.

Yes, but if this new job (a separate job in Daily) only runs the large-memory tests and nothing else, then it shouldn't take too long I guess.

@sarthakaggarwal97
Copy link
Contributor Author

@zuiderkwast I can try doing this and maybe run the tests and observe.

@sarthakaggarwal97
Copy link
Contributor Author

@zuiderkwast I have separated them into different jobs. Let me give it a few tries to run daily in my forked repo and see if it works.

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
@sarthakaggarwal97
Copy link
Contributor Author

@zuiderkwast the asan tests passed in my local forked repo after separating the jobs: https://github.com/sarthakaggarwal97/valkey/actions/runs/21768844363

@sarthakaggarwal97 sarthakaggarwal97 changed the title Skip large memory tests in ASan Separate Jobs for large memory tests in ASan Feb 9, 2026
Copy link
Member

@dvkashapov dvkashapov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thank you!

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@zuiderkwast
Copy link
Contributor

It looks like unit test was running but regular tests and module API tests were skipped in your repo:

image

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
@sarthakaggarwal97
Copy link
Contributor Author

@zuiderkwast thanks for the call out. I think as I am running these large memory tests we are seeing multiple errors. (#3184, #3174). Good to get this green before release.

@sarthakaggarwal97
Copy link
Contributor Author

@zuiderkwast zuiderkwast changed the title Separate Jobs for large memory tests in ASan Separate jobs for large memory tests with sanitizers Feb 10, 2026
@zuiderkwast zuiderkwast merged commit 42e6a0b into valkey-io:unstable Feb 10, 2026
24 checks passed
harrylin98 pushed a commit to harrylin98/valkey_forked that referenced this pull request Feb 19, 2026
We have been seeing github actions runners being OOM when large memory
tests are run with ASan. The operation eventually is being canceled
during the test.

This change moves the large-memory tests with ASan and UBSan to separate
jobs, so we get a dedicated runner with its own timeout. We can tweak
the number of simultaneous test clients for these tests without
affecting the other test jobs.

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants