Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: only use cypress.io when triggered manually #29077

Merged
merged 17 commits into from
Jun 5, 2024

Conversation

mistercrunch
Copy link
Member

@mistercrunch mistercrunch commented Jun 4, 2024

Preset is picking up the bill for Cypress.io and it's getting pricey for the usage we have for it. In this PR, I'm moving normal pull_request and push events to run in local GHA mode without the --record flag. This means we can't use the --parallel feature, but I wrote a little python script that segments the jobs and make them run in parallel.

I'm also providing a workflow_dispatch option where you can trigger a cypress.io run in dashboard mode, so you can still go use their stuff on a given branch if/where needed.

@github-actions github-actions bot added the github_actions Pull requests that update GitHub Actions code label Jun 4, 2024
@mistercrunch mistercrunch changed the title chore: only use cypress.io when triggered manually chore: only use cypress.io when triggered manually Jun 4, 2024
@mistercrunch
Copy link
Member Author

Screenshot 2024-06-04 at 11 24 27 AM

ok so --parallel is a paid feature, seems we'll have to find some other way to segment/parallelize

@pull-request-size pull-request-size bot added size/L and removed size/M labels Jun 4, 2024
@rusackas
Copy link
Member

rusackas commented Jun 4, 2024

Doing it on master only might make sense too.

@mistercrunch
Copy link
Member Author

I think I'm finding a solution. Turns out cypress doesn't do much of the compute/work here, and the dashboard is not super useful, so I'm moving to trigger the dashboard manually using workflow_dispatch, only when useful.

The other thing is the parallelization is also part of the service, and it's "intelligent" meaning it know about recent runs and tries to balance / segment the jobs properly. I've been workign with GPT at segmenting the files based on a hash of the filename, so as long as we don't have some files much bigger than other it should distribute all write. If not we can just break down the slower files into smaller chunks.

@mistercrunch mistercrunch marked this pull request as ready for review June 5, 2024 00:40
# Run using cypress.io service
cypress_record_key = (
subprocess.check_output(
["echo", os.getenv("CYPRESS_KEY") or "DUMMY", "|", "base64", "--decode"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whenever there's an action that runs a script, and the script uses a KEY, it raises my security flag now :) CC @dpgaspar to make sure this is OK :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is triggered by pull_request GHA so should be ok, and I'm assuming CYPRESS_KEY is not a secret right?
yes script can be "abused" by a PR but will it have privileges to do anything destructive?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think not, I didn't realize the key is public.

# leaving the Dashboard hanging ...
# https://github.com/cypress-io/github-action/issues/48
fail-fast: false
matrix:
containers: [1, 2, 3]
parallel_id: [0, 1, 2, 3, 4, 5]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, since the setup for these runners takes a while, I was keeping the number of workers low enough to see a higher ROI on parallelization. This might shave off a minute or two, but will burn twice the coal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, there's like 7 minutes of setup and in aggregate maybe 45 minutes of parallelizable stuff. At 6 workers it's about 50/50 of the time on overhead VS work. I thought bumping this up made some sense as it has now become the bottleneck as the longest CI workflow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that could be nice would be to have a single worker prepare some stuff for the matrix as a whole, so that overhead could remain 7 minutes, but not be done on N workers... Not sure what GHA offers around that.

Copy link
Member

@rusackas rusackas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

submodules: recursive
- name: Checkout using PR ID (workflow_dispatch)
if: github.event_name == 'workflow_dispatch' && github.event.inputs.pr_id != ''
uses: actions/checkout@v2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might wanna bump this?

@mistercrunch mistercrunch merged commit b5d9ac0 into master Jun 5, 2024
34 of 39 checks passed
@mistercrunch mistercrunch deleted the cypress_when_needed branch June 5, 2024 18:10
vinothkumar66 pushed a commit to vinothkumar66/superset that referenced this pull request Nov 11, 2024
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 4.1.0 labels Nov 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels github_actions Pull requests that update GitHub Actions code preset-io size/L 🚢 4.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants