
Problem about use_mkldnn flag #10765

Closed
luotao1 opened this issue May 18, 2018 · 15 comments · Fixed by #11319

@luotao1
Contributor

luotao1 commented May 18, 2018

This issue is from the Intel side.

The problem we have with the use_mkldnn flag is that we need to modify every model script to enable newly integrated MKL-DNN ops. It would be great if we could just add the integration and, when Fluid is run with a use_mkldnn parameter, have all available ops used without modifying the model scripts to pass the option.

@luotao1 luotao1 added the Intel label May 18, 2018
@Superjomn
Contributor

I think that makes sense.

@luotao1
Contributor Author

luotao1 commented May 21, 2018

How about using MKLDNNPlace() to enable all available MKL-DNN ops?

place=fluid.MKLDNNPlace()
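
A minimal sketch of how such a place could slot into the existing Executor pattern; MKLDNNPlace is only a proposal here, so it is left commented out and CPUPlace stands in for the current behavior:

import paddle.fluid as fluid

# Existing pattern: the place passed to the Executor selects the device and kernels.
place = fluid.CPUPlace()
# place = fluid.MKLDNNPlace()  # proposed: would enable MKL-DNN kernels globally

exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())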

@kbinias
Contributor

kbinias commented May 21, 2018

It would be great to have a global use_mkldnn flag. Right now it is very difficult to pass this flag down to some ops, e.g. elementwise_add, because the Python code is auto-generated with generate_layer_fn(_OP). It would also be nice to have another flag, not_use_mkldnn, at the layer level, to switch MKL-DNN off for specific layers/ops.

@mozga-intel
Contributor

mozga-intel commented May 21, 2018

@luotao1 Thank you for this issue.

I will use the sum operator as an example to describe this problem. During the implementation of this operator I encountered a few difficulties. The current MKL-DNN version of the sum operator is used inside other operators, so the flag cannot be passed in this way. For instance:
Backward.py

For that reason, I agree with @kbinias.

@Superjomn
Contributor

it would be nice to have another flag not_use_mkldnn on layer level, to switch off MKL-DNN from some layers/ops.

That makes sense; the user can

  • globally set a default config, ON or OFF
  • set the MKL-DNN config separately for a specific operator

For example, a use case like this (the API names are not important):

set_default_use_mkldnn(True)

v1 = op0(v0) # use mkldnn by default if supported
# ...
v2 = op1(v1, use_mkldnn=False) # turn off MKLDNN for this operator
# ...

The approach above seems better.
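
A minimal sketch of how such a global default plus per-operator override could be resolved; the helper names follow the use case above and are hypothetical, not an existing Paddle API:

# Hypothetical helpers: a module-level default plus a per-op override.
_default_use_mkldnn = False

def set_default_use_mkldnn(value):
    """Set the global default for the use_mkldnn attribute."""
    global _default_use_mkldnn
    _default_use_mkldnn = value

def resolve_use_mkldnn(use_mkldnn=None):
    """A per-op argument wins; otherwise fall back to the global default."""
    return _default_use_mkldnn if use_mkldnn is None else use_mkldnn

# A layer wrapper would pass resolve_use_mkldnn(use_mkldnn) into the op's attrs.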

@luotao1
Contributor Author

luotao1 commented May 22, 2018

We will discuss how to add this global flag to our framework ASAP.

@mrysztow

mrysztow commented Jun 6, 2018

@luotao1 Any ETA for this?

@luotao1
Contributor Author

luotao1 commented Jun 6, 2018

@mrysztow After discussing with @Superjomn and @tensor-tang, the latest design is:

  • Use an environment variable, export FLAGS_use_mkldnn=True, to control it.
  • Add an EnableMKLDNN function (it may be renamed) in paddle/fluid/platform/mkldnn_helper.h:

    void EnableMKLDNN(
        const std::unique_ptr<paddle::framework::ProgramDesc>& program) {
      for (size_t bid = 0; bid < program->Size(); ++bid) {
        auto* block = program->MutableBlock(bid);
        for (auto* op : block->AllOps()) {
          if (op->HasAttr("use_mkldnn")) {
            op->SetAttr("use_mkldnn", true);
          }
        }
      }
    }

  • Call the EnableMKLDNN function in Executor::Run (paddle/fluid/framework/executor.cc) to transpile the program.

We will finish it ASAP, before the next meeting.
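
For reference, a Python-level sketch of the same traversal; it assumes the fluid.framework Operator exposes has_attr/set_attr helpers mirroring the C++ OpDesc interface, so adjust the names to the actual API:

import paddle.fluid as fluid

def enable_mkldnn(program):
    # Walk every op in every block and flip use_mkldnn where the op supports it.
    for block in program.blocks:
        for op in block.ops:
            if op.has_attr("use_mkldnn"):
                op.set_attr("use_mkldnn", True)

enable_mkldnn(fluid.default_main_program())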

@tpatejko

tpatejko commented Jun 8, 2018

@luotao1 What's the reason for using an environment variable? Does it mean that some parameters will be passed as script's arguments and some of them will be passed as environment variables?

@Superjomn
Contributor

Does it mean that some parameters will be passed as script's arguments and some of them will be passed as environment variables?

I think an environment variable might be more general, since it works for both Python scripts and C++ binaries.

Or do you have any suggestion?
@tpatejko

@mrysztow

mrysztow commented Jun 8, 2018

@Superjomn I think it is a good idea, as right now we need to change all model scripts to add this option. But will it be consistent with forcing the usage of other backends, like GPU?

@Superjomn
Contributor

Superjomn commented Jun 9, 2018

That depends; MKL-DNN might be the first backend to have a global switch at runtime.

I suggest we use the environment variable first, wrapped by two singleton functions, one in C++ and one in Python: static bool DoUseGlobalMKLDNN() and def do_use_global_mkldnn() (the names need further consideration).

If the environment variable's name or this plan changes, updating those two functions is easy and keeps things flexible.
@mrysztow
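
A minimal sketch of what the Python-side wrapper could look like; the function name comes from the suggestion above and is not an existing API:

import os

_use_mkldnn_cache = None

def do_use_global_mkldnn():
    """Read FLAGS_use_mkldnn from the environment once and cache the result."""
    global _use_mkldnn_cache
    if _use_mkldnn_cache is None:
        value = os.environ.get("FLAGS_use_mkldnn", "false").lower()
        _use_mkldnn_cache = value in ("1", "true")
    return _use_mkldnn_cache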

@luotao1
Contributor Author

luotao1 commented Jun 11, 2018

@mrysztow You can use FLAGS_use_mkldnn=true to control it now, see #11319 for details.

@tpatejko

@Superjomn My main concern was consistency and intuitiveness in how parameters are passed to the model scripts. I mean, a user will have to set an environment variable to run something on CPU with MKL-DNN, which is different from running something on GPU, which requires passing an argument to the model script.

However, what you're saying is true: this approach is more flexible, as it doesn't require changing the model scripts.

@Superjomn
Contributor

Yeah, I understand your concern.

the MKLDNN might be the first one that has a global switch in runtime.

The environment variable is a low-level API and can easily be hidden behind other interfaces later.

We can concentrate on the implementation of the MKL-DNN functions first, and have another discussion about this API after the MKL-DNN functions are ready and released to users.
