Monitor - Fix the cgroup version checking logic. #502

Merged
guoshzhao merged 2 commits into release/0.8 from guzhao/fix_monitor_path
Apr 3, 2023

Conversation

guoshzhao
Contributor

Description
It looks like `grep cgroup /proc/filesystems` does not work on NDv4: the cgroup version on NDv4 is v1, but the command reports v2. Instead, check for file existence to determine the cgroup version.
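
A minimal sketch of the file-existence approach, not the code from this PR. The description does not name the file that is checked, so the sketch assumes the common convention that a cgroup v2 (unified) hierarchy exposes `cgroup.controllers` at the cgroup root, while a v1 host does not. The function name `detect_cgroup_version` and the `cgroup_root` parameter are hypothetical. For context, `/proc/filesystems` lists the filesystem types the kernel supports rather than what is actually mounted, which would explain why grepping it can report cgroup2 on a host that runs v1.

```python
from pathlib import Path


def detect_cgroup_version(cgroup_root="/sys/fs/cgroup"):
    """Guess the cgroup version from the files present under cgroup_root.

    Assumption (not taken from the PR): a cgroup v2 unified hierarchy has a
    `cgroup.controllers` file at its root; if it is missing, treat the host
    as cgroup v1.
    """
    root = Path(cgroup_root)
    if (root / "cgroup.controllers").is_file():
        return 2
    return 1


if __name__ == "__main__":
    print(f"cgroup version: v{detect_cgroup_version()}")
```

Taking the root as a parameter keeps the check testable against a temporary directory instead of the live `/sys/fs/cgroup`.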

guoshzhao added the bug and monitor labels on Mar 30, 2023
guoshzhao requested a review from a team as a code owner on March 30, 2023 09:09

codecov bot commented Mar 30, 2023

Codecov Report

Merging #502 (56542f7) into release/0.8 (97c9a41) will decrease coverage by 0.03%.
The diff coverage is 37.50%.

@@               Coverage Diff               @@
##           release/0.8     #502      +/-   ##
===============================================
- Coverage        87.33%   87.31%   -0.03%     
===============================================
  Files               89       89              
  Lines             5946     5944       -2     
===============================================
- Hits              5193     5190       -3     
- Misses             753      754       +1     
| Flag | Coverage Δ | |
|---|---|---|
| cpu-python3.6-unit-test | 73.58% <0.00%> (+0.02%) | ⬆️ |
| cpu-python3.7-unit-test | 73.58% <0.00%> (+0.02%) | ⬆️ |
| cpu-python3.8-unit-test | 74.05% <0.00%> (+0.02%) | ⬆️ |
| cuda-unit-test | 87.24% <37.50%> (-0.03%) | ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| superbench/monitor/monitor.py | 62.65% <37.50%> (-1.04%) | ⬇️ |

cp5555 mentioned this pull request on Mar 30, 2023 (23 tasks)
guoshzhao requested a review from RyoYang on March 30, 2023 10:56
guoshzhao merged commit 26373ed into release/0.8 on Apr 3, 2023
guoshzhao deleted the guzhao/fix_monitor_path branch on April 3, 2023 01:22
abuccts pushed a commit that referenced this pull request Apr 14, 2023
**Description**
It looks like `grep cgroup /proc/filesystems` does not work on NDv4: the
cgroup version on NDv4 is v1, but the command reports v2. Instead,
check for file existence to determine the cgroup version.
abuccts added a commit that referenced this pull request Apr 14, 2023
**Description**

Cherry-pick bug fixes from v0.8.0 to main.

**Major Revisions**

* Monitor - Fix the cgroup version checking logic (#502)
* Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503)
* Fix wrong torch usage in communication wrapper for Distributed Inference Benchmark (#505)
* Analyzer: Fix bug in python3.8 due to pandas api change (#504)
* Bug - Fix bug to get metric from cmd when error happens (#506)
* Monitor - Collect realtime GPU power when benchmarking (#507)
* Add num_workers argument in model benchmark (#511)
* Remove unreachable condition when write host list (#512)
* Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513)
* Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515)
* Docs - Upgrade version and release note (#508)

Co-authored-by: guoshzhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
Labels: bug, monitor
3 participants