
Switch to native traceback in cuml #6078

Merged
merged 12 commits into rapidsai:branch-25.02 on Dec 18, 2024
Conversation

galipremsagar
Contributor

In cudf we have observed a ~10% speed-up of pytest suite execution by switching the pytest traceback to `--tb=native`:

currently:

102474 passed, 2117 skipped, 902 xfailed in 892.16s (0:14:52)

--tb=short:

102474 passed, 2117 skipped, 902 xfailed in 898.99s (0:14:58)

--tb=no:

102474 passed, 2117 skipped, 902 xfailed in 815.98s (0:13:35)

--tb=native:

102474 passed, 2117 skipped, 902 xfailed in 820.92s (0:13:40)

This PR makes a similar change to cuml.

xref: rapidsai/cudf#16851
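
For concreteness, the change amounts to telling pytest to format failures with Python's stdlib-style traceback instead of its own pretty-printed one. A minimal sketch of the configuration, assuming it lives in a `pytest.ini` as this PR introduces (the PR's actual file may set more options):

```ini
# pytest.ini -- minimal sketch; the real file in this PR may carry more options
[pytest]
addopts = --tb=native
```

The same effect can be had ad hoc on the command line with `pytest --tb=native`.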

@galipremsagar galipremsagar requested a review from a team as a code owner September 23, 2024 20:35
@github-actions github-actions bot added the Cython / Python label Sep 23, 2024
@galipremsagar galipremsagar added the improvement and non-breaking labels Sep 23, 2024
@dantegd
Member

dantegd commented Sep 23, 2024

@galipremsagar are those times for the cuDF pytest suite? It might be worth checking the time savings for cuML; I have a fresh container, so I can check those times in the next couple of days if so.

@galipremsagar
Contributor Author

@dantegd Yes, the timing improvements I posted above are from the cudf pytest suite. cuml seems to see an even better speedup, almost ~20% faster:

This PR:

==== 466 skipped, 9 xfailed, 201 xpassed, 25 warnings in 1432.43s (0:23:52) ====

link: https://github.com/rapidsai/cuml/actions/runs/11002114744/job/30550052027?pr=6078#step:7:1802

The most recently merged PR, for comparison:

==== 466 skipped, 9 xfailed, 201 xpassed, 3 warnings in 1696.39s (0:28:16) =====

link: https://github.com/rapidsai/cuml/actions/runs/10999896809/job/30543198516#step:7:1622

@trivialfis
Member

Super nice! I should try this for XGBoost as well.

@dantegd dantegd changed the base branch from branch-24.10 to branch-24.12 October 6, 2024 18:59
@dantegd
Member

dantegd commented Oct 17, 2024

/merge

@divyegala
Member

@galipremsagar could you please check what's going wrong with the pytests in this PR? I can't quite figure out what the error is exactly.

@bdice bdice changed the base branch from branch-24.12 to branch-25.02 December 11, 2024 14:20
@bdice
Contributor

bdice commented Dec 11, 2024

I merged in the upstream. Dask tests (and maybe others) still appear to be failing without a clear indication of the problem. It will probably be necessary to try a local reproduction of CI with more pytest verbosity.

Maybe we can skip this change to Dask tests if it's just Dask tests failing. Still waiting on CI to find out.

@betatim
Member

betatim commented Dec 12, 2024

I am less excited about saving five minutes out of 30 minutes than everyone else here. Both those times are basically "infinite" in terms of developer experience, which means my workflow is something like "make a change, start tests, context switch to something else, eventually come back to it later; reconstruct context, act". This means you need a lot of information when you return to the test output to be able to reconstruct the context and figure out the failure. I had a quick Google around but couldn't find examples of what the default (long?) vs native tracebacks look like in terms of the information you get.

Do we have some examples for cuml or cudf?

The five minute saving is quickly wiped out if you need to re-run the test suite more often because you are missing information.

@bdice
Contributor

bdice commented Dec 12, 2024

@betatim We had a conversation about this on Slack and there was consensus that there is effectively no loss of information by making this change. It takes a lot of time for pytest to walk through all the traces and pretty-print things. I’ll link you to the discussions in Slack.

See also: rapidsai/cudf#16851 (comment)
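
For anyone who wants to compare the two styles locally, a trivial failing test is enough; the file name and test below are hypothetical, purely for illustration:

```python
# test_tb_demo.py -- hypothetical file for comparing traceback styles.
# Run it twice and compare the output:
#   pytest --tb=long test_tb_demo.py    # pytest's pretty-printed traceback
#   pytest --tb=native test_tb_demo.py  # plain stdlib-style Python traceback
def helper():
    raise ValueError("boom")


def test_fails():
    helper()
```

Both outputs name the same frames and the same exception; the native form simply skips pytest's source-context pretty-printing, which is the per-frame walk-through cost described above.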

@betatim
Member

betatim commented Dec 12, 2024

I guess it is simple enough to switch back to full tracebacks if it turns out to be annoying.

Though it seems the real solution is to not have so many xfailed tests. If anyone wants to tackle those :D

@vyasr
Contributor

vyasr commented Dec 12, 2024

> Both those times are basically "infinite" in terms of developer experience, which means my workflow is something like "Make a change, start tests, context switch to something else, eventually come back to it later. Reconstruct context, act".

Keep in mind that unfortunately this isn't really true when CI is so backed up that you come back and it hasn't run yet, which has historically been a fairly common occurrence. It's especially not true when we're up against a release timeline and can't get things done fast enough, or when we have a global breaking change that requires rerunning CI on all PRs and it takes days to get caught up.

@bdice
Contributor

bdice commented Dec 17, 2024

> Maybe we can skip this change to Dask tests if it's just Dask tests failing.

I made this change in bf65363 and merged in the upstream. Hopefully CI will pass now.

@galipremsagar
Contributor Author

> Maybe we can skip this change to Dask tests if it's just Dask tests failing.
>
> I made this change in bf65363 and merged in the upstream. Hopefully CI will pass now.

I tried reproducing this locally without timeouts, but the tests are still running after 8+ hours. If these tests are designed to run under 2 hours on CI and they finish, we should be good.

@rapids-bot rapids-bot bot merged commit 191f7ef into rapidsai:branch-25.02 Dec 18, 2024
61 of 62 checks passed
rapids-bot bot pushed a commit that referenced this pull request Jan 9, 2025
fixes #6194

Wheel tests in this project are emitting tons of warnings like this:

> test_random_forest.py:1247
  /__w/cuml/cuml/python/cuml/cuml/tests/test_random_forest.py:1247: PytestUnknownMarkWarning: Unknown pytest.mark.memleak - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
    @pytest.mark.memleak

I think that's because the introduction of a `pytest.ini` file in #6078 resulted in all of the `pytest` options from `pyproject.toml` being ignored.

From https://docs.pytest.org/en/stable/reference/customize.html#pytest-ini

> pytest.ini files take precedence over other files, even when empty.

I think "take precedence" there means that if `pytest` finds a `pytest.ini`, it stops searching for other configuration files.

Authors:
  - James Lamb (https://github.com/jameslamb)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Tim Head (https://github.com/betatim)
  - Jake Awe (https://github.com/AyodeAwe)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #6201