-
Notifications
You must be signed in to change notification settings - Fork 5.9k
Add the parsing part for the profiling tool #7043
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The profiling report looks like: And this pull request is ready for review. @qingqing01 @reyoung |
paddle/platform/profiler.cc
Outdated
| void PopEvent(const std::string& name, DeviceContext* dev_ctx) { | ||
| GetEventList().Record(EventKind::kPopRange, std::move(name), g_thread_id, | ||
| dev_ctx); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to refine RecordEvent::RecordEvent() and RecordEvent::~RecordEvent() to reuse these two functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
paddle/platform/profiler.cc
Outdated
| std::cout << "Place: GPU" << std::endl; | ||
| #else | ||
| std::cout << "Place: CPU" << std::endl; | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If compiling with GPU, the users also can run tasks on CPU only. So here should not use the PADDLE_WITH_CUDA macro. You can use g_state to determine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Use another global variable g_profiler_place because g_state has been set to kDisabled when parsing events.
paddle/platform/profiler.cc
Outdated
| double event_time = rit->CudaElapsedMs(events[i][j]); | ||
| #else | ||
| double event_time = rit->CpuElapsedMs(events[i][j]); | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same problem with above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
paddle/platform/profiler.cc
Outdated
| pushed_events.erase((++rit).base()); | ||
| } else { | ||
| std::cout << "Warning: can not find the start marker of event " | ||
| << events[i][j].name(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOG(WARNING)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| << std::setw(data_width) << event_item.max_time | ||
| << std::setw(data_width) << event_item.ave_time << std::endl; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the ParseEvents and print can be two functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Followed all the comments. Thanks!
paddle/platform/profiler.cc
Outdated
| void PopEvent(const std::string& name, DeviceContext* dev_ctx) { | ||
| GetEventList().Record(EventKind::kPopRange, std::move(name), g_thread_id, | ||
| dev_ctx); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
paddle/platform/profiler.cc
Outdated
| std::cout << "Place: GPU" << std::endl; | ||
| #else | ||
| std::cout << "Place: CPU" << std::endl; | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Use another global variable g_profiler_place because g_state has been set to kDisabled when parsing events.
paddle/platform/profiler.cc
Outdated
| double event_time = rit->CudaElapsedMs(events[i][j]); | ||
| #else | ||
| double event_time = rit->CpuElapsedMs(events[i][j]); | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
paddle/platform/profiler.cc
Outdated
| pushed_events.erase((++rit).base()); | ||
| } else { | ||
| std::cout << "Warning: can not find the start marker of event " | ||
| << events[i][j].name(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| << std::setw(data_width) << event_item.max_time | ||
| << std::setw(data_width) << event_item.ave_time << std::endl; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
paddle/platform/profiler.cc
Outdated
| #include "paddle/platform/profiler.h" | ||
| #include <iomanip> | ||
| #include <map> | ||
| #include "gflags/gflags.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The gflags is not used. This header can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
paddle/platform/profiler.cc
Outdated
| return a.max_time > b.max_time; | ||
| default: | ||
| return a.ave_time > b.ave_time; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this lambda expression can be selected before the for loop in line 187, and the sort_domain can be saved at the same time. Then pass the sort_domain as a string argument to PrintProfilingReport function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified. Thanks!
paddle/platform/profiler.cc
Outdated
| #include "paddle/platform/profiler.h" | ||
| #include <iomanip> | ||
| #include <map> | ||
| #include "gflags/gflags.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
paddle/platform/profiler.cc
Outdated
| return a.max_time > b.max_time; | ||
| default: | ||
| return a.ave_time > b.ave_time; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
| return ms; | ||
| #else | ||
| PADDLE_THROW("CUDA is not enabled"); | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can not comment on line 110 but only write the comments here.
line 110 should be before line 109.
| // The profiler state, the initial value is ProfilerState::kDisabled | ||
| static ProfilerState g_state = ProfilerState::kDisabled; | ||
| // To record which timer the profiler used, CUDA or CPU. | ||
| static std::string g_profiler_place = ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
g_profiler_place and g_state should be thread_local, just like g_thread_id.
| name_(std::move(name)), | ||
| thread_id_(thread_id), | ||
| has_cuda_(false) { | ||
| : kind_(kind), name_(name), thread_id_(thread_id), has_cuda_(false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do not you use std::move?
| double Event::CpuElapsedUs(const Event& e) const { | ||
| return (e.cpu_ns_ - cpu_ns_) / (1000.0); | ||
| double Event::CpuElapsedMs(const Event& e) const { | ||
| return (e.cpu_ns_ - cpu_ns_) / (1000000.0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ii will be better to replace / (1000000.0) with *0.000001.
See Part I in #6701
Please review this part after the Part I merged.