Skip to content

Conversation

@kuke
Copy link
Contributor

@kuke kuke commented Dec 26, 2017

See Part I in #6701

Please review this part after the Part I merged.

@kuke kuke requested review from qingqing01 and reyoung December 26, 2017 12:39
@kuke
Copy link
Contributor Author

kuke commented Jan 5, 2018

The profiling report looks like:

W0105 14:48:05.080145  1552 profiler.cc:232] Cannot find the push marker of event 'event_without_push', which will be ignored in profiling report.
W0105 14:48:05.080199  1552 profiler.cc:266] Cannot find the pop marker of event 'event_without_pop', which will be ignored in profiling report.

------------------------->     Profiling Report     <-------------------------

Place: CUDA
Time unit: ms
Sorted by total time in descending order in the same thread

Event                Calls       Total       Min.        Max.        Ave.
thread0::op_1        3           0.006565    0.001816    0.002444    0.00218833
thread0::op_3        3           0.005658    0.001816    0.001956    0.001886
thread0::op_4        3           0.005518    0.001816    0.001886    0.00183933
thread0::op_2        3           0.005446    0.001815    0.001816    0.00181533
thread0::evs_op_3    1           0.002165    0.002165    0.002165    0.002165
thread0::evs_op_2    1           0.002026    0.002026    0.002026    0.002026
thread0::evs_op_1    1           0.001956    0.001956    0.001956    0.001956
thread0::evs_op_4    1           0.001816    0.001816    0.001816    0.001816

And this pull request is ready for review. @qingqing01 @reyoung

void PopEvent(const std::string& name, DeviceContext* dev_ctx) {
GetEventList().Record(EventKind::kPopRange, std::move(name), g_thread_id,
dev_ctx);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to refine RecordEvent::RecordEvent() and RecordEvent::~RecordEvent() to reuse these two functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

std::cout << "Place: GPU" << std::endl;
#else
std::cout << "Place: CPU" << std::endl;
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If compiling with GPU, the users also can run tasks on CPU only. So here should not use the PADDLE_WITH_CUDA macro. You can use g_state to determine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Use another global variable g_profiler_place because g_state has been set to kDisabled when parsing events.

double event_time = rit->CudaElapsedMs(events[i][j]);
#else
double event_time = rit->CpuElapsedMs(events[i][j]);
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same problem with above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

pushed_events.erase((++rit).base());
} else {
std::cout << "Warning: can not find the start marker of event "
<< events[i][j].name();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOG(WARNING)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

<< std::setw(data_width) << event_item.max_time
<< std::setw(data_width) << event_item.ave_time << std::endl;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the ParseEvents and print can be two functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

@kuke kuke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Followed all the comments. Thanks!

void PopEvent(const std::string& name, DeviceContext* dev_ctx) {
GetEventList().Record(EventKind::kPopRange, std::move(name), g_thread_id,
dev_ctx);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

std::cout << "Place: GPU" << std::endl;
#else
std::cout << "Place: CPU" << std::endl;
#endif
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Use another global variable g_profiler_place because g_state has been set to kDisabled when parsing events.

double event_time = rit->CudaElapsedMs(events[i][j]);
#else
double event_time = rit->CpuElapsedMs(events[i][j]);
#endif
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

pushed_events.erase((++rit).base());
} else {
std::cout << "Warning: can not find the start marker of event "
<< events[i][j].name();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

<< std::setw(data_width) << event_item.max_time
<< std::setw(data_width) << event_item.ave_time << std::endl;
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

#include "paddle/platform/profiler.h"
#include <iomanip>
#include <map>
#include "gflags/gflags.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The gflags is not used. This header can be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return a.max_time > b.max_time;
default:
return a.ave_time > b.ave_time;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this lambda expression can be selected before the for loop in line 187, and the sort_domain can be saved at the same time. Then pass the sort_domain as a string argument to PrintProfilingReport function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

@kuke kuke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified. Thanks!

#include "paddle/platform/profiler.h"
#include <iomanip>
#include <map>
#include "gflags/gflags.h"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return a.max_time > b.max_time;
default:
return a.ave_time > b.ave_time;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@qingqing01 qingqing01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@kuke kuke merged commit 5e90f5e into PaddlePaddle:develop Jan 8, 2018
return ms;
#else
PADDLE_THROW("CUDA is not enabled");
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can not comment on line 110 but only write the comments here.
line 110 should be before line 109.

// The profiler state, the initial value is ProfilerState::kDisabled
static ProfilerState g_state = ProfilerState::kDisabled;
// To record which timer the profiler used, CUDA or CPU.
static std::string g_profiler_place = "";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

g_profiler_place and g_state should be thread_local, just like g_thread_id.

name_(std::move(name)),
thread_id_(thread_id),
has_cuda_(false) {
: kind_(kind), name_(name), thread_id_(thread_id), has_cuda_(false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do not you use std::move?

double Event::CpuElapsedUs(const Event& e) const {
return (e.cpu_ns_ - cpu_ns_) / (1000.0);
double Event::CpuElapsedMs(const Event& e) const {
return (e.cpu_ns_ - cpu_ns_) / (1000000.0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ii will be better to replace / (1000000.0) with *0.000001.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants