Skip to content

Conversation

jczaja
Copy link
Contributor

@jczaja jczaja commented Jun 10, 2021

PR types

Performance optimization

PR changes

OPs

Describe

  1. Refactoring of BatchNorm and Activation to comply with Acquire API.
  2. isCached was removed and  isCachedNonBlocking was renamed to isCached. Same for Acquire functions.
  3. All ops are now to use simplified isCached().

@paddle-bot-old
Copy link

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

arogowie-intel
arogowie-intel previously approved these changes Jun 11, 2021
arlesniak
arlesniak previously approved these changes Jun 11, 2021
Copy link
Contributor

@arlesniak arlesniak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


bool isBwdCached() {
const std::string key_pd = key_ + "@bwd_pd";
bwd_pd_ = std::static_pointer_cast<typename TBackward::primitive_desc>(
Copy link
Contributor

@lidanqing-vv lidanqing-vv Jun 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi I saw you name key_fpd, Maube you could name here backward pd key as key_bpd ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally key_pd is used everywhere it just at that particular moment I needed two keys so I added key_fpd.

const std::string key_fpd = key_ + "@fwd_pd";
fwd_pd_ = std::static_pointer_cast<typename TForward::primitive_desc>(
dev_ctx_.GetBlob(key_fpd));
return true;
Copy link
Contributor

@lidanqing-vv lidanqing-vv Jun 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwd_pd_ get value by dev_ctx_.GetBlob, then could it be null_ptr? If it is null will it still return true or something like "return bwd_pd_ == nullptr ? false : true" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if bwd_pd_ is valid so then should be fwd_pd_ . But it is a valid point to add some PADDLE_ENFORCE making sure that fwd_pd_ is also valid .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lidanqing-intel I added PADDLE_ENFORCE to make sure fwd_pd_ is set when bwd_pd_ is cached. Please sontinue your review

template <typename Arg, typename... Args>
void AcquireForwardPrimitiveDescriptorNonBlocking(Arg&& first_arg,
Args&&... args) {
// This is used when we can recreate FWD PD in BWD so
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two lines comments are describing void AcquireForwardPrimitiveDescriptor(Arg&& first_arg, Args&&... args) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is still a relevant comment to this function

void AcquireBackwardPrimitiveDescriptorNonBlocking(Args&&... args) {
// fwd_pd_ is set during grad by calling
// AcquireForwardPrimitiveDescriptorNonBlocking
// AcquireForwardPrimitiveDescriptor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does these comments "// fwd_pd_ is set during grad by calling // AcquireForwardPrimitiveDescriptor" still suit here? Here it looks only bwd_pd_ is created and cached.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AcquireBackwardPrimitiveDescriptor is creating BWD_PD only that is correct , but when implementing Handler for grad(backward) you also call AcquireForwardPrimitiveDescriptor which will create FWD_PD. So in grad handler we call AcquireForwardPrimitiveDescriptor and AcquireBackwardPrimitiveDescriptor

@jczaja jczaja dismissed stale reviews from arlesniak and arogowie-intel via 5c19bb2 June 11, 2021 14:35
lidanqing-vv
lidanqing-vv previously approved these changes Jun 11, 2021
Copy link
Contributor

@lidanqing-vv lidanqing-vv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks a lot for explanation !

@jczaja
Copy link
Contributor Author

jczaja commented Jun 14, 2021

@juncaipeng Hi, There is a problem PR-CI-Windows build. I had it rerunning three times and still it failes due to not related to PR changes errors:

2021-06-12 22:50:51 failed with:
2021-06-12 22:50:51 ninja: error: failed recompaction: Permission denied
2021-06-12 22:50:51 CMake Error at C:/Program Files/CMake/share/cmake-3.17/Modules/CheckIncludeFileCXX.cmake:92 (try_compile):
2021-06-12 22:50:51 Failed to generate test project build system.

Please help

arlesniak
arlesniak previously approved these changes Jun 14, 2021
Copy link
Contributor

@arlesniak arlesniak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

jakpiase
jakpiase previously approved these changes Jun 14, 2021
Copy link
Contributor

@jakpiase jakpiase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lidanqing-vv
Copy link
Contributor

lidanqing-vv commented Jun 15, 2021

@jczaja Wangzhen said it was Windows machine issue last week and now it's recovered.

But me and wanghzen both can not rerun it because CI says: "This commit exceeds rerun limitation."

Please change this PR a bit and submit a new commit so CI could restart.

@jczaja jczaja dismissed stale reviews from jakpiase, arlesniak, and lidanqing-vv via 891032f June 15, 2021 08:57
Copy link
Contributor

@lidanqing-vv lidanqing-vv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jczaja jczaja closed this Jun 16, 2021
@jczaja jczaja reopened this Jun 16, 2021
@jczaja
Copy link
Contributor Author

jczaja commented Jun 16, 2021

@juncaipeng This CheckPRTemplate is being running for many hours . Could you please advice how to make it finished?

@jczaja jczaja closed this Jun 16, 2021
@jczaja jczaja reopened this Jun 16, 2021
@jczaja jczaja changed the title [oneDNN] Further refactoring of oneDNN cache accesses [oneDNN] Further ops refactoring of oneDNN cache accessses Jun 16, 2021
@jczaja jczaja closed this Jun 16, 2021
@jczaja jczaja reopened this Jun 16, 2021
@lidanqing-vv lidanqing-vv changed the title [oneDNN] Further ops refactoring of oneDNN cache accessses Further ops refactoring of oneDNN cache access Jun 16, 2021
@lidanqing-vv lidanqing-vv changed the title Further ops refactoring of oneDNN cache access [oneDNN] Further ops refactoring of oneDNN cache access Jun 16, 2021
@lidanqing-vv lidanqing-vv reopened this Jun 16, 2021
@juncaipeng juncaipeng merged commit f9ce1b1 into PaddlePaddle:develop Jun 16, 2021
@juncaipeng
Copy link
Contributor

@juncaipeng This CheckPRTemplate is being running for many hours . Could you please advice how to make it finished?

In this case, it is necessary to modify the describe of the PR, so the CheckPRTemplate will rerun.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants