[air] Add xgboost release test for silver tier(10-node case). #26460

Merged
18 commits merged into ray-project:master
Jul 15, 2022

Conversation

xwjiang2010
Contributor

@xwjiang2010 xwjiang2010 commented Jul 12, 2022

Why are these changes needed?

Add a release test for the AIR XGBoost benchmark.
I will first look at the numbers from a run in the release test environment, then add a proper triggering threshold (time * 1.1).
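
A minimal sketch of the "time * 1.1" rule mentioned above, assuming the threshold is a measured baseline plus a 10% margin. The function names and the baseline value in the usage note are hypothetical and are not taken from the release test code.

```python
def threshold_seconds(baseline_seconds: float, margin: float = 0.1) -> float:
    """Return the slowest acceptable run time: baseline plus a relative margin."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return baseline_seconds * (1.0 + margin)


def passes(measured_seconds: float, baseline_seconds: float, margin: float = 0.1) -> bool:
    """True if the measured time stays within the allowed margin over the baseline."""
    return measured_seconds <= threshold_seconds(baseline_seconds, margin)
```

With a hypothetical baseline of 100 seconds, `passes(105.0, 100.0)` is true and `passes(120.0, 100.0)` is false. A relative margin like this only works if run-to-run variance is smaller than the margin.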

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@xwjiang2010 xwjiang2010 added this to the Ray AIR milestone Jul 12, 2022

Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Member

@Yard1 Yard1 left a comment

LGTM thanks!

@richardliaw
Contributor

nice! How do we get this onto the dashboard?

@xwjiang2010
Contributor Author

xwjiang2010 commented Jul 12, 2022

@Yard1 Do you know why the result shows the following?

training_time = 1850.115315906 --> this seems too slow to me
prediction_time = 86.38269096899967 --> this seems too fast to me

There is a ray.shutdown call in the middle.

@xwjiang2010
Contributor Author

> nice! How do we get this onto the dashboard?

Waiting for Kai to set something up in his PR; I will just follow the same practice.

@Yard1
Member

Yard1 commented Jul 12, 2022

Actually, to be 100% sure, can we run the training part and the prediction part as separate subprocesses?
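
One way the suggestion above could look, as a sketch only: each stage runs in its own fresh Python process and is timed from the outside, so no Ray state carries over between training and prediction. The helper name `run_stage` is hypothetical, and the script name and flags in the usage note are placeholders, not the benchmark's real entry points.

```python
import subprocess
import sys
import time
from typing import Sequence


def run_stage(args: Sequence[str]) -> float:
    """Run one stage in a fresh Python process and return its wall-clock seconds.

    `args` are the arguments passed to the Python interpreter, for example a
    script path followed by its flags. A non-zero exit code raises
    subprocess.CalledProcessError, so a failed stage is never reported as a timing.
    """
    start = time.monotonic()
    subprocess.run([sys.executable, *args], check=True)
    return time.monotonic() - start
```

For example, `training_time = run_stage(["workload.py", "--stage", "train"])` followed by `prediction_time = run_stage(["workload.py", "--stage", "predict"])`, where `workload.py` and `--stage` are assumed names. The measured time includes interpreter startup and cluster connection for each stage, which is part of the point of isolating them.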

@Yard1
Member

Yard1 commented Jul 12, 2022

Also, @xwjiang2010, we are missing the EBS config; I think that may be a cause of the slowdown.

@Yard1
Member

Yard1 commented Jul 12, 2022

@xwjiang2010 we can probably bring the disk size down to about 200 GB to save a little money!

@xwjiang2010
Contributor Author

@Yard1 haha sounds good :)
Do you know how I can find out how much is being spilled?

@Yard1
Member

Yard1 commented Jul 12, 2022

@xwjiang2010
Contributor Author

training_time = 1663.45228263
prediction_time = 77.18809864599962

@Yard1

@xwjiang2010
Contributor Author

xwjiang2010 commented Jul 14, 2022

pretty big deviation here
run1:

2022-07-13 11:21:17,897      INFO main.py:1519 -- [RayXGBoost] Finished XGBoost training on training data with total N=260,000,000 in 822.28 seconds (701.52 pure XGBoost training time).
2022-07-13 11:21:19,560 INFO tune.py:738 -- Total run time: 827.66 seconds (826.38 seconds for the tuning loop).

run2:

[RayXGBoost] Finished XGBoost training on training data with total N=260,000,000 in 950.20 seconds (718.92 pure XGBoost training time).
2022-07-13 22:20:12,407 INFO tune.py:738 -- Total run time: 955.67 seconds (954.53 seconds for the tuning loop).

The data loading part is the cause of the slowdown in the second run. @Yard1, if you are OK with it, I am going to relax the criterion a bit more, say to 1000 s.
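
The two log lines above can be compared with a short script. This is a sketch that assumes the summary line keeps the exact wording quoted in this comment; the regular expression and the function name are illustrative. The difference it computes is everything outside pure XGBoost training, which includes data loading but may include other overhead as well.

```python
import re

# Matches the tail of the "[RayXGBoost] Finished XGBoost training ..." lines quoted above.
SUMMARY = re.compile(
    r"in (?P<total>[\d.]+) seconds \((?P<pure>[\d.]+) pure XGBoost training time\)"
)


def non_training_seconds(log_line: str) -> float:
    """Return total time minus pure XGBoost training time for one summary line."""
    match = SUMMARY.search(log_line)
    if match is None:
        raise ValueError("line does not look like a RayXGBoost summary")
    return float(match.group("total")) - float(match.group("pure"))


run1 = (
    "[RayXGBoost] Finished XGBoost training on training data with total "
    "N=260,000,000 in 822.28 seconds (701.52 pure XGBoost training time)."
)
run2 = (
    "[RayXGBoost] Finished XGBoost training on training data with total "
    "N=260,000,000 in 950.20 seconds (718.92 pure XGBoost training time)."
)

for name, line in (("run1", run1), ("run2", run2)):
    print(f"{name}: {non_training_seconds(line):.2f} s outside pure training")
```

Pure training time differs by about 17 seconds between the runs (701.52 vs 718.92), while the time outside training roughly doubles (about 121 vs 231 seconds). That is consistent with the slowdown coming from outside the training loop.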

Relax the threshold.
@Yard1
Member

Yard1 commented Jul 14, 2022

That should be fine, thanks

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
@richardliaw richardliaw merged commit a241e6a into ray-project:master Jul 15, 2022
xwjiang2010 added a commit to xwjiang2010/ray that referenced this pull request Jul 19, 2022
…oject#26460)

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Stefan-1313 pushed a commit to Stefan-1313/ray_mod that referenced this pull request Aug 18, 2022
…oject#26460)

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Signed-off-by: Stefan van der Kleij <s.vanderkleij@viroteq.com>
@xwjiang2010 xwjiang2010 deleted the xgboost_release branch July 26, 2023 19:56

3 participants