Skip to content

chore: Reduce flakiness in tests#1716

Open
Pijukatel wants to merge 2 commits intomasterfrom
run-resource-sensitive-tests-alone-on-mac
Open

chore: Reduce flakiness in tests#1716
Pijukatel wants to merge 2 commits intomasterfrom
run-resource-sensitive-tests-alone-on-mac

Conversation

@Pijukatel
Copy link
Collaborator

@Pijukatel Pijukatel commented Feb 5, 2026

Description

  • Properly consume the first SystemInfoEvent in the fixture to ensure no interference with tests
  • Add extra wait buffer to flaky test sensitive to timing issues
  • Take into account that asyncio.sleep time can be slightly shorter than expected
  • Run resource-sensitive tests alone on Mac. Mac executor is the most sensitive to resource-dependent tests. Reduce flakiness by running some of them alone on Mac only. This should reduce the flakiness, without hiding issues too much. (Mac executor on GitHub has its own issues, which we do not need to deal with. If the test is flaky on other platforms, it should be investigated. If flaky on Mac only, try to run it alone first. )

Issues

@github-actions github-actions bot added this to the 133rd sprint - Tooling team milestone Feb 5, 2026
@github-actions github-actions bot added t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics. labels Feb 5, 2026
@codecov
Copy link

codecov bot commented Feb 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.44%. Comparing base (44bc85a) to head (7a72bad).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1716      +/-   ##
==========================================
- Coverage   92.46%   92.44%   -0.03%     
==========================================
  Files         156      156              
  Lines       10464    10464              
==========================================
- Hits         9676     9673       -3     
- Misses        788      791       +3     
Flag Coverage Δ
unit 92.44% <ø> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Pijukatel Pijukatel force-pushed the run-resource-sensitive-tests-alone-on-mac branch from 6df4ada to af6889a Compare February 5, 2026 14:32
@Pijukatel Pijukatel added the adhoc Ad-hoc unplanned task added during the sprint. label Feb 5, 2026
@Pijukatel Pijukatel force-pushed the run-resource-sensitive-tests-alone-on-mac branch 3 times, most recently from 20ae690 to 48f27cf Compare February 6, 2026 09:06
@Pijukatel Pijukatel changed the title chore: Run resource sensititve tests alone on Mac chore: Reduce flakiness in some tests Feb 6, 2026
@Pijukatel Pijukatel changed the title chore: Reduce flakiness in some tests chore: Reduce flakiness in tests Feb 6, 2026
@Pijukatel Pijukatel force-pushed the run-resource-sensitive-tests-alone-on-mac branch from 48f27cf to 6734fa0 Compare February 6, 2026 09:20
@Pijukatel Pijukatel marked this pull request as ready for review February 6, 2026 09:24
@Pijukatel Pijukatel force-pushed the run-resource-sensitive-tests-alone-on-mac branch from 6734fa0 to 710e437 Compare February 6, 2026 10:55
Add extra wait buffer to flaky test sensitive to timing issues
Properly consume first `SystemInfoEvent` in fixture to ensure no interference with tests
Take into account that `asyncio.sleep` time can sleep shorter than expected
@Pijukatel Pijukatel force-pushed the run-resource-sensitive-tests-alone-on-mac branch from 710e437 to ad10f55 Compare February 6, 2026 10:59
@Pijukatel Pijukatel requested a review from Mantisus February 6, 2026 10:59
Copy link
Collaborator

@Mantisus Mantisus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one suggestion. Otherwise, LGTM.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reduces unit test flakiness by isolating resource-sensitive tests on macOS and adding timing tolerances/buffers where the tests depend on scheduler timing or periodic background events.

Changes:

  • Added a run_alone_on_mac helper to mark specific tests as run_alone only on macOS.
  • Increased the asyncio.timeout(...) buffer in a flaky BasicCrawler timeout test.
  • Made timing-based tests more tolerant (asyncio sleep tolerance) and updated the Snapshotter fixture to consume the initial system info event before yielding.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/unit/utils.py Adds macOS-only decorator helper for run_alone tests.
tests/unit/crawlers/_basic/test_basic_crawler.py Increases timeout buffer to reduce flakiness in test_timeout_in_handler.
tests/unit/browsers/test_browser_pool.py Marks one BrowserPool test to run alone on macOS.
tests/unit/_utils/test_recurring_task.py Marks RecurringTask execution test to run alone on macOS.
tests/unit/_statistics/test_request_max_duration.py Adds tolerance for asyncio.sleep undersleep in duration assertion.
tests/unit/_autoscaling/test_snapshotter.py Consumes initial SYSTEM_INFO event in fixture to avoid cross-test interference.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@vdusek vdusek self-requested a review February 6, 2026 14:41
Copy link
Collaborator

@vdusek vdusek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have 3 things to ask.

  1. Aren't these tests flaky on Windows as well? I think they are, and I'd suggest running them in isolation on Windows too.

  2. AFAIK the following tests are also flaky:

  • test_system.py::test_memory_estimation_does_not_overestimate_due_to_shared_memory
  • test_sitemap_request_loader.py::test_sitemap_traversal[httpx / impit]
  • test_playwright_crawler.py::test_isolation_cookies[without use_incognito_pages]

Do you plan to investigate these later or take any action on them? I'd be in favor of adding them to the "run alone" group as well.

  1. Max recently added the pytest-rerunfailures package with @pytest.mark.flaky decorators to six tests. This means we now have two different approaches to handling flakiness. Are you aware of this? Personally, I'd prefer removing the automatic retries and relying only on isolated execution (or I would give it a try for now).

@Pijukatel
Copy link
Collaborator Author

Pijukatel commented Feb 6, 2026

I have 3 things to ask.

1. Aren't these tests flaky on Windows as well? I think they are, and I'd suggest running them in isolation on Windows too.

I looked only at some recent failures. I tried to solve those that seem to be flaky everywhere and use the marker where I saw only Mac failures.

AFAIK the following tests are also flaky:

test_system.py::test_memory_estimation_does_not_overestimate_due_to_shared_memory
test_sitemap_request_loader.py::test_sitemap_traversal[httpx / impit]
test_playwright_crawler.py::test_isolation_cookies[without use_incognito_pages]

Maybe. This is not an attempt to fix all, just the ones I was aware of from my recent runs. It can be incremental. If some test is flaky too often, I probably look into it.

test_sitemap_request_loader was fixed by Max yesterday

Max recently added the pytest-rerunfailures package with @pytest.mark.flaky decorators to six tests. This means we now have two different approaches to handling flakiness. Are you aware of this? Personally, I'd prefer removing the automatic retries and relying only on isolated execution (or I would give it a try for now).

I see it like this:
@pytest.mark.flaky is for unknown generic flakiness, and I think it should be used as a last resort before skipping the test. run_alone should be used for tests:

  • That, by design, can't run in parallel
  • For resource-sensitive tests. Since I saw how the Mac executor is the worst affected by those resource issues, I think it will be more common to use it than Windows or Linux.

Maybe we could do something like this in case of flaky tests (and go to next step if the first did not help) run_alone_on_mac-> run_alone -> @pytest.mark.flaky --> skip
(not a rule, but a good enough recommendation?)

@Mantisus
Copy link
Collaborator

Mantisus commented Feb 6, 2026

test_playwright_crawler.py::test_isolation_cookies[without use_incognito_pages]

I will study this test separately, as it should not be linked to resources. There may be a problem with the test design or some other issue.

Max recently added the pytest-rerunfailures package with @pytest.mark.flaky decorators to six tests. This means we now have two different approaches to handling flakiness.

Some of the tests I marked with @pytest.mark.flake are resource-sensitive. And I assume that on a Mac, they may even exhibit instability with run_alone_on_mac.

Maybe we could do something like this in case of flaky tests (and go to next step if the first did not help) run_alone_on_mac-> run_alone -> @pytest.mark.flaky --> skip

I like this strategy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

adhoc Ad-hoc unplanned task added during the sprint. t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Test test_timeout_in_handler is flaky

3 participants