Skip to content

Conversation

tony19
Copy link
Contributor

@tony19 tony19 commented Aug 3, 2022

related #9492

Description

This updates the CI test jobs to use retry.ts, which reruns the given command (up to 3 times by default) if its output contains one of the errors indicated in #9492. I picked 3 for the default retry count because in my experiments, the same error can appear twice in a row.

The script also adds a warning annotation to the GitHub workflow run to easily identify the flakiness from the Summary view:

Screen Shot 2022-08-03 at 12 45 41 PM

This script can be run locally like this:

# retry 3 times
npx tsx ./scripts/retry.ts pnpm --color=always test

# retry 5 times
npx tsx ./scripts/retry.ts -r 5 pnpm --color=always test

Testing retry.ts

Here's a script that runs pnpm --color=always test-build 10 times in a row, which causes one of the known errors in Node 16:

#!/usr/bin/env bash -e

for i in `seq 10`; do
  echo "************* test run ${i} *************"
  npx tsx ./scripts/retry.ts pnpm --color=always test-build
done

(Tested on macOS Big Sur with Node 16.16.0)

Screenshot:

Screen Shot 2022-08-03 at 12 24 26 AM

Additional context


What is the purpose of this pull request?

  • Bug fix
  • New Feature
  • Documentation update
  • Other

Before submitting the PR, please make sure you do the following

  • Read the Contributing Guidelines.
  • Read the Pull Request Guidelines and follow the Commit Convention.
  • Check that there isn't already a PR that solves the problem the same way to avoid creating a duplicate.
  • Provide a description in this PR that addresses what the PR is solving, or reference the issue that it solves (e.g. fixes #123).
  • Ideally, include relevant tests that fail without this PR but pass with it.

@sapphi-red sapphi-red added p1-chore Doesn't change code behavior (priority) test labels Aug 3, 2022
bluwy
bluwy previously approved these changes Aug 6, 2022
Copy link
Member

@bluwy bluwy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! 💚

patak-dev
patak-dev previously approved these changes Aug 8, 2022
Copy link
Member

@patak-dev patak-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great @tony19! I think this is a good idea, at least for the node segfault we can't use something from Vitest side (even if there is a retry mechanism). @antfu @sheremet-va looping you to see if this is a good approach (at least for the moment) for the Vite repo, as I think other projects may end up copying what we do here.

@dominikg maybe we should do the same on Vite ecosystem CI?

@dominikg
Copy link
Contributor

dominikg commented Aug 8, 2022

not a fan of a bash script wrapper in general

  • hard to maintain
  • breaks easily
  • potentially abusable if someone snuck in output that it is looking for

other remarks

  • ci timeout for the job would need to be increased
  • really love the warning annotation, going to add that one to my toolbox 👍
  • instead of working around it, we should put more preassure on upstream to fix it. random crashes are not acceptable

@bluwy bluwy linked an issue Aug 9, 2022 that may be closed by this pull request
7 tasks
@tony19
Copy link
Contributor Author

tony19 commented Aug 9, 2022

@dominikg Thanks for the feedback!

I did actually experiment with a Node/zx script, but the startup speed with the shell script seemed superior to me. But points taken, I can switch from shell script to TypeScript to address your concerns.

I also agree with pushing for an upstream fix, but it doesn't look like the Node team is anywhere close to a fix, and they're not sure the fix is even feasible:

@bnoordhuis commented on Jun 29

For reference: the fix is v8/v8@fcdf35e but that commit can't be back-ported because it sits on top of a bigger, backwards incompatible change, v8/v8@578f6be.

For the time being, an automated retry seems like a reasonable workaround to keep the bots going without babysitting them. I'm completely open to other ideas though!


Update: Fix for Node 16 identified here

@tony19 tony19 dismissed stale reviews from patak-dev and bluwy via 3ad9750 August 9, 2022 05:45
@antfu
Copy link
Member

antfu commented Aug 9, 2022

Looks great!

A side note, I guess it might be good to create a CLI wrapper like vitest-retry for others to reuse. Or even we could have it built-in in Vitest's CLI, (cc @sheremet-va WDYT?)

@sheremet-va
Copy link
Member

Or even we could have it built-in in Vitest's CLI, (cc @sheremet-va WDYT?)

Sure, something like --segfault-retry? We have an issue for .flaky and --retry 3 for regular tests, so I don't want to use it for this case.

@antfu
Copy link
Member

antfu commented Aug 9, 2022

Yeah --segfault-retry sounds good to me. And maybe we could have a default to 3 or so, aware of CI env etc.

@tony19
Copy link
Contributor Author

tony19 commented Aug 9, 2022

@antfu @sheremet-va Yes, please! 😄 A native vitest fix would be so helpful! Let me know if I could help.

@dominikg
Copy link
Contributor

dominikg commented Aug 9, 2022

A flag on vitest bin makes it reusable which is great!

Not sure how i feel about it being enabled by default, tbh i havn't seen this segfault happening in vite-plugin-svelte suite (which is a bit smaller but uses the same setup as vite).

It should be something users conciously opt-in to, so as they are aware of it. If it is enabled and can detect github action event to log the warning card, that would be awesome.

@antfu
Copy link
Member

antfu commented Aug 15, 2022

https://github.com/vitest-dev/vitest/releases/tag/v0.22.0

Shipped in Vitest v0.22.0.

Could enable via a env: https://github.com/vitest-dev/vitest/blob/8f24c2f65e436aa98a9e1c77eddf5d082c2ca60d/.github/workflows/ci.yml#L12-L13

@tony19
Copy link
Contributor Author

tony19 commented Aug 17, 2022

Node published a v16 fix in 16.17.0. I verified using that version eliminates the segfaults in CI.

@tony19 tony19 deleted the fix/retry-flaky-tests-in-ci branch October 23, 2022 16:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
p1-chore Doesn't change code behavior (priority) test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky tests
7 participants