Use runtime profiling to replace manual memory analyzers #81
@WoosukKwon This PR is ready for review.
Thanks @zhuohan123 for this PR. Please check my comments.
Additionally, I've found that the PR changes the output of the examples in |
@zhuohan123 I will start merging the other PRs first, so that this PR does not block any of them.
@WoosukKwon I fixed all the review comments and merged with the latest master. Please review again. Feel free to merge it. One thing I noticed is this bug in the sampler. With this bug, I discovered that these two sorts (link1, link2) take a lot of memory (4-5 GB) when sampling 2560 tokens at the same time. In this case, they sort a tensor of shape (2560, 51200). I'm not sure why these two sorts take that much memory, or whether there is a more memory-efficient way to implement top-p and top-k.
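A rough size estimate may help put the 4-5 GB figure in context. The sketch below only does arithmetic on the shape quoted above. It assumes the sorted values are float32 (the dtype is not stated in this thread) and relies on `torch.sort` returning int64 indices. It counts only the two output tensors of each sort, not any temporary workspace the sort kernel may allocate.

```python
# Back-of-the-envelope size of sorting a (2560, 51200) tensor.
# Assumption: values are float32; torch.sort returns int64 indices.
rows, vocab = 2560, 51200
num_elements = rows * vocab

BYTES_FP32 = 4   # assumed dtype of the sorted values
BYTES_INT64 = 8  # dtype of the indices returned by torch.sort

values_bytes = num_elements * BYTES_FP32
indices_bytes = num_elements * BYTES_INT64
per_sort_bytes = values_bytes + indices_bytes

MIB = 2**20
GIB = 2**30
two_sorts_gib = 2 * per_sort_bytes / GIB

print(f"elements per tensor: {num_elements:,}")
print(f"sorted values:  {values_bytes // MIB} MiB")
print(f"sorted indices: {indices_bytes // MIB} MiB")
print(f"outputs of two sorts: {two_sorts_gib:.2f} GiB")
```

Under these assumptions the outputs of the two sorts alone come to roughly 2.9 GiB, so a 4-5 GB peak is plausible once the input tensor and any sort workspace are included.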
I haven't added it in the current PR since it looks pretty redundant to add an additional "set_random_seed" call to all the workers after profiling. It would be a nested Ray call and would complicate the code. I think this would confuse future readers of the code.
What do you mean by "nested Ray call"? I didn't quite get it. I think the purpose of resetting the random state is to make sure that nothing that happens before serving starts affects the outputs. In this sense, I think we should reset the random states after the profiling run, while we don't have to reset them before the profiling run.
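The point about resetting after profiling can be illustrated with a toy example. This is a minimal sketch using Python's standard `random` module as a stand-in for the real random state; `profiling_run` and `serve_one_request` are hypothetical names, and `set_random_seed` here is a local helper, not the project's implementation.

```python
import random


def set_random_seed(seed):
    # Stand-in for seeding all random number generators on a worker.
    random.seed(seed)


def profiling_run():
    # Hypothetical profiling pass that happens to consume random numbers.
    for _ in range(10):
        random.random()


def serve_one_request():
    # Hypothetical sampling step: draw five token ids.
    return [random.randrange(51200) for _ in range(5)]


SEED = 0

# Reference output: seed, then serve, with no profiling in between.
set_random_seed(SEED)
baseline = serve_one_request()

# Seeding only before profiling: the profiling run shifts the random state.
set_random_seed(SEED)
profiling_run()
without_reset = serve_one_request()

# Seeding again after profiling restores the reference output.
set_random_seed(SEED)
profiling_run()
set_random_seed(SEED)
with_reset = serve_one_request()

print("reset after profiling matches baseline:", with_reset == baseline)
print("no reset matches baseline:", without_reset == baseline)
```

In this toy setup only the reset performed after the profiling run makes the served output independent of what profiling did.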
LGTM. Thanks!
Fix #59.
Previously we used a manual memory profiler, which had to be implemented separately for each model. This PR replaces it with a general memory profiler.
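To make the idea concrete, here is a minimal sketch of how a model-agnostic, runtime-based profiler could size a cache: measure the peak memory of one representative run, then divide what is left of the memory budget into fixed-size blocks. The function name, parameters, and numbers are all hypothetical and are not taken from this PR's code.

```python
def num_cache_blocks(total_memory_bytes, peak_profiled_bytes,
                     block_bytes, memory_utilization=0.75):
    """Return how many cache blocks fit in the memory left after profiling.

    All names and the formula are illustrative assumptions:
    - total_memory_bytes: total device memory
    - peak_profiled_bytes: peak memory observed during a profiling run
    - block_bytes: size of one cache block
    - memory_utilization: fraction of device memory the process may use
    """
    budget = int(total_memory_bytes * memory_utilization)
    available = budget - peak_profiled_bytes
    # Never report a negative block count if the model alone exceeds the budget.
    return max(available // block_bytes, 0)


GIB = 2**30
MIB = 2**20

# Example numbers (made up): 40 GiB device, 16 GiB peak, 2 MiB blocks.
blocks = num_cache_blocks(40 * GIB, 16 * GIB, 2 * MIB)
print(blocks)
```

Because the peak is measured rather than derived from a per-model formula, the same calculation would apply to any model, which is the property the description above is after.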