
Conversation

@ijpulidos
Contributor

Solves #126

The only difference I saw was the micromamba setup version, updating it in these changes.
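For context, a minimal sketch of what the bumped step could look like, assuming the workflow uses `mamba-org/setup-micromamba`; the exact version pin, environment file path, and inputs in this PR may differ:

```yaml
# Illustrative excerpt of a GPU CI workflow step; the version pin and
# environment-file path here are assumptions, not the PR's actual values.
- name: Setup micromamba
  uses: mamba-org/setup-micromamba@v2      # bumped action version (illustrative)
  with:
    environment-file: devtools/conda-envs/test_env.yaml  # assumed path
    cache-environment: true
```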

@codecov

codecov bot commented Oct 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.87%. Comparing base (639b258) to head (5e2602f).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #129   +/-   ##
=======================================
  Coverage   86.86%   86.87%           
=======================================
  Files          11       11           
  Lines        1439     1440    +1     
=======================================
+ Hits         1250     1251    +1     
  Misses        189      189           

☔ View full report in Codecov by Sentry.

@ijpulidos ijpulidos linked an issue Oct 17, 2025 that may be closed by this pull request
@ijpulidos
Contributor Author

Looking at the latest actions with the self-hosted runners, they started running again all of a sudden. However, we are getting some JSON decoding errors. Either way, I think the current changes in this branch are worth keeping (for the micromamba setup action).

I'll let the self-hosted test finish in this PR and then change it back to a cron-scheduled automated test.

@ijpulidos ijpulidos changed the title Make self-hosted GPU workflow run again [DNM] Make self-hosted GPU workflow run again Oct 17, 2025
@mikemhenry
Contributor

I am a bit worried that the EC2 instance is running but will never pick up the job: https://github.com/OpenFreeEnergy/feflow/actions/runs/18601132394/job/53046333313?pr=129

@mikemhenry
Contributor

Try updating the AMI to ami-076a54ed41e67782d, which has the latest CUDA drivers.
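For reference, a sketch of where that AMI ID would go, assuming the GPU runner is launched on demand with something like `machulav/ec2-github-runner`; the action choice, secret names, and instance type below are assumptions, only the AMI ID comes from the comment above:

```yaml
# Sketch only: assumes an on-demand EC2 runner started with
# machulav/ec2-github-runner; this repo's workflow may use a different
# action or different input/secret names.
- name: Start EC2 GPU runner
  uses: machulav/ec2-github-runner@v2
  with:
    mode: start
    github-token: ${{ secrets.GH_RUNNER_TOKEN }}   # hypothetical secret name
    ec2-image-id: ami-076a54ed41e67782d            # AMI with recent CUDA drivers
    ec2-instance-type: g4dn.xlarge                 # hypothetical instance type
    subnet-id: ${{ secrets.AWS_SUBNET_ID }}        # hypothetical
    security-group-id: ${{ secrets.AWS_SG_ID }}    # hypothetical
```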

@mikemhenry
Contributor

[screenshot: EC2 instance list]

We actually don't have any running, so it must have crashed.

@ijpulidos
Contributor Author

ijpulidos commented Oct 18, 2025

@mikemhenry Thanks, that's a good tip. Will try running with that AMI in the future.

From the latest re-run https://github.com/OpenFreeEnergy/feflow/actions/runs/18601132394/job/53091442796?pr=129 I believe the instance is getting stopped abruptly; my guess is that it's hitting some resource limit. A quick test would be to run the tests serially (not in parallel) and see, as sketched below.
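A sketch of that quick test, assuming the GPU job currently parallelizes pytest with pytest-xdist (`-n auto`): dropping the `-n` flag runs the suite on a single worker, which keeps peak GPU/CPU memory lower. The environment name and test path below are placeholders:

```yaml
# Illustrative only: "feflow-test" and the test path are assumptions.
- name: Run GPU tests serially
  run: |
    micromamba run -n feflow-test pytest -v --cov=feflow feflow/tests
```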



Development

Successfully merging this pull request may close these issues.

Fix/recover self-hosted GPU CI
