Fix eval in regression test #1305
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1305
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 56ce980 with merge base 00bbd53.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
```diff
@@ -50,7 +50,7 @@ def test_finetune_and_eval(self, tmpdir, caplog, monkeypatch):
     runpy.run_path(TUNE_PATH, run_name="__main__")
     eval_cmd = f"""
     tune run eleuther_eval \
-    --config eleuther_eval \
+    --config eleuther_evaluation \
```
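For context, here is a minimal sketch of the pattern this test uses to drive the CLI in-process (the fixture wiring and the corrected config name come from the diff above; the value of TUNE_PATH and the test name are assumptions for illustration):

```python
import runpy
import sys

# Hypothetical path to the `tune` CLI entrypoint script.
TUNE_PATH = "torchtune/_cli/tune.py"

def test_eval_runs_in_process(monkeypatch):
    # Trailing backslashes continue the string across lines, and
    # .split() turns it into an argv-style token list.
    eval_cmd = """
    tune run eleuther_eval \
        --config eleuther_evaluation \
    """.split()
    # Replace sys.argv and execute the entrypoint as if invoked from a shell.
    monkeypatch.setattr(sys, "argv", eval_cmd)
    runpy.run_path(TUNE_PATH, run_name="__main__")
```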
How was this test working before??
It wasn’t; it’s been failing in CI for a while now. But it doesn’t run on PRs or anything, only nightly.
I missed this, but we're no longer running integration tests in our CI?
So we actually have two types of integration tests: recipe tests and regression tests. Recipe tests always run on PRs but only use small checkpoints. Regression tests use the full-size model and run nightly. Currently this is the only regression test we have, but we’ve been wanting to add more and just haven’t had time (e.g. it’d be nice if we could test memory or perf of some of our models too).
I'll raise an issue : )
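As a hedged illustration of that two-tier split, tests can be separated with pytest markers so the PR job and the nightly job select different subsets. The marker names below are assumptions for illustration, not necessarily the ones torchtune actually uses.

```python
import pytest

# Recipe test: small checkpoints, cheap enough to run on every PR
# (e.g., selected via `pytest -m integration_test`).
@pytest.mark.integration_test
def test_recipe_with_small_checkpoint():
    ...

# Regression test: full-size model, run only by the nightly job
# (e.g., selected via `pytest -m slow_integration_test`).
@pytest.mark.slow_integration_test
def test_finetune_and_eval_full_model():
    ...
```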
The config name and the results parsing in our regression test job are incorrect; this PR fixes both (a sketch of the results-parsing pattern follows below).
This is actually a bit awkward to test now that we (a) don't allow creating PRs from a fork, and (b) don't let forks access the S3 bucket containing regression test artifacts.
So for now I've tested it locally, which I guess is better than nothing?
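For reference, a minimal sketch of what the results-parsing side of such a regression test can look like: scan the captured eval log for the reported metric and assert it clears a threshold. The log format, metric name, sample values, and threshold here are all assumptions for illustration, not torchtune's actual output.

```python
import re

# Hypothetical fragment of eval log output (assumed format, not torchtune's).
SAMPLE_LOG = "... | acc: 0.512 | acc_stderr: 0.004 | ..."

def parse_metric(log_text: str, metric: str) -> float:
    """Extract a named numeric metric from eval log output."""
    match = re.search(rf"{re.escape(metric)}:\s*([\d.]+)", log_text)
    if match is None:
        raise ValueError(f"Metric {metric!r} not found in eval output")
    return float(match.group(1))

# In the actual test this would read from pytest's caplog instead.
accuracy = parse_metric(SAMPLE_LOG, "acc")
assert accuracy > 0.5, f"Regression: accuracy dropped to {accuracy}"
```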