enable a oneDNN ITT feature #73
Closed
There is a oneDNN feature that enables ITT tagging for oneDNN primitives in Intel VTune Profiler.
Here is the RFC for this feature:
https://github.com/oneapi-src/oneDNN/tree/rfcs/rfcs/20201014-VTune-ITT-tagging
Intel VTune Profiler is a performance analysis tool for x86-based machines and Intel Data Center GPUs.
VTune helps find performance bottlenecks and provides detailed Intel platform information.
We are working with the VTune team on BKMs and a TensorBoard plugin for this feature.
We would like to enable this feature by default in TensorFlow, so users can identify platform bottlenecks with detailed information such as L1 cache misses or the level of AVX-512 vectorization.
We manually built the TF package with the ITT feature, and then benchmarked it with Intel Model Zoo both with and without this feature.
Based on our performance benchmarking across different Intel Model Zoo models, this feature has no impact on performance.
If users don't want this feature, they can also disable it at runtime via the environment variable "DNNL_ITT_TASK_LEVEL".
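As a minimal sketch of the runtime opt-out mentioned above, setting the environment variable before launching the workload disables the ITT tagging (assuming a TF build with this feature compiled in; the Python invocation is an illustrative placeholder for any TF workload):

```shell
# Disable oneDNN ITT task tagging at runtime by setting the
# DNNL_ITT_TASK_LEVEL environment variable to 0 before the run.
export DNNL_ITT_TASK_LEVEL=0
echo "$DNNL_ITT_TASK_LEVEL"

# Then launch the TensorFlow workload as usual, e.g.:
#   python my_tf_benchmark.py   # hypothetical script name
```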