Use symbol dtype #5641

Merged
merged 139 commits into master from use_symbol_dtype
Aug 16, 2021

@MARD1NO MARD1NO commented Jul 28, 2021

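Use Symbol DType at the API level.

A known issue

Using Symbol alone means that even when x.dtype == oneflow.float32, id(x.dtype) != id(oneflow.float32).

This problem did not exist in the earlier DType* version. Compared with other frameworks, Paddle has a similar problem, while PyTorch does not.

houjiang's solution

(Special thanks to houjiang for pair programming with me late into the night.)
Make the following change in api/python/framework/tensor.cpp: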

const Symbol<DType>* GetTensorDType(const Tensor& tensor) {
  return &CHECK_JUST(DType::Get(tensor.dtype()->data_type()));
}

In api/python/framework/dtype.cpp, change the return value to std::shared_ptr<Symbol>,
and change the exported types to:

m.attr("char") = &CHECK_JUST(DType::Get(DataType::kChar));
m.attr("float16") = &CHECK_JUST(DType::Get(DataType::kFloat16));
m.attr("float") = &CHECK_JUST(DType::Get(DataType::kFloat));

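Get is called directly to return the Symbol object, and its address is taken, which guarantees that the object is unique.

If only a reference were used, a copy might be made somewhere else, which would cause the identity mismatch described earlier.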
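To illustrate the difference between handing out an address and handing out a copy, here is a minimal standalone sketch. `Handle`, `CanonicalHandle`, `ByAddress`, `ByValue` and `IdentityDemo` are hypothetical names made up for this example; `Handle` only stands in for Symbol<DType> and is not OneFlow's class.

```cpp
#include <cassert>

// Hypothetical stand-in for Symbol<DType>: a thin handle around an interned value.
struct Handle {
  const int* interned;  // points at the shared, interned value
};

// One canonical handle per process, kept in a function-local static.
const Handle& CanonicalHandle() {
  static const int value = 32;
  static const Handle handle{&value};
  return handle;
}

// Returns the address of the canonical handle: stable across calls.
const Handle* ByAddress() { return &CanonicalHandle(); }

// Returns a copy: same contents, but a distinct object each time.
Handle ByValue() { return CanonicalHandle(); }

// Returns true when all three expectations hold.
bool IdentityDemo() {
  Handle a = ByValue();
  Handle b = ByValue();
  bool copies_differ = (&a != &b);                       // two separate objects
  bool contents_match = (a.interned == b.interned);      // same interned value
  bool addresses_stable = (ByAddress() == ByAddress());  // same canonical handle
  return copies_differ && contents_match && addresses_stable;
}
```

Any check based on object identity, such as Python's id(), sees the two copies as different objects even though they wrap the same interned value. That is presumably why the change above returns the address of the object stored by Get.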

In python_args.cpp, we cast the object to Symbol*:

template<>
Maybe<Symbol<DType>> PythonArg::ObjectAs<Symbol<DType>>() const {
  return *JUST(detail::cast<Symbol<DType>*>(Borrow()));
}

Code walkthrough

DType::data_type constructor function

The current DType constructor functions are generated by a macro that builds a Symbol object:

#define DEFINE_GET_DATA_TYPE_FUNCTION(data_type)                                   \
  const Symbol<DType>& DType::data_type() {                                        \
    static const auto& dtype = SymbolOf(DType(OF_PP_CAT(DataType::k, data_type))); \
    return dtype;                                                                  \
  }

The macro takes a data_type. If that is Float32, for example, the constructor function it defines returns a const reference:

const Symbol<DType>& DType::Float32() {
  static const auto& dtype = SymbolOf(DType(DataType::kFloat32));
  return dtype;
}
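The macro pattern above can be tried in isolation. The following is a simplified sketch, not OneFlow's code: it drops the Symbol wrapper and SymbolOf, replaces OF_PP_CAT with plain `##` token pasting, and uses a two-value DataType enum invented for the example.

```cpp
#include <cassert>

// Assumed two-value enum for illustration only.
enum class DataType { kFloat32, kInt32 };

// Simplified stand-in for DType; the real class has more members.
class DType {
 public:
  explicit DType(DataType data_type) : data_type_(data_type) {}
  DataType data_type() const { return data_type_; }
  static const DType& Float32();
  static const DType& Int32();

 private:
  DataType data_type_;
};

// Token pasting turns the bare name Float32 into DataType::kFloat32.
// The function-local static is initialized once, on first call, so every
// call returns a reference to the same object.
#define DEFINE_GET_DATA_TYPE_FUNCTION(name)      \
  const DType& DType::name() {                   \
    static const DType dtype(DataType::k##name); \
    return dtype;                                \
  }

DEFINE_GET_DATA_TYPE_FUNCTION(Float32)
DEFINE_GET_DATA_TYPE_FUNCTION(Int32)
#undef DEFINE_GET_DATA_TYPE_FUNCTION
```

Since C++11, initialization of a function-local static is thread-safe, so concurrent first calls still observe a single object.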

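DType::Get function

This function builds a static HashMap. Callers look up the entry with Get instead of constructing an object each time, which also guarantees uniqueness.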

Maybe<const Symbol<DType>&> DType::Get(DataType data_type) {
  static HashMap<DataType, const Symbol<DType>> data_type2dtype{
#define MAKE_ENTRY(data_type) {OF_PP_CAT(DataType::k, data_type), data_type()},
      OF_PP_FOR_EACH_TUPLE(MAKE_ENTRY, DTYPE_SEQ)
#undef MAKE_ENTRY

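Each key is a DataType::kxxx, and the corresponding value is the object returned by the DType::xxx() constructor function. Taking Float32 as an example, the key-value pair is:
DataType::kFloat32 : DType::Float32()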
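A standalone sketch of the same lookup-table idea follows. It makes several assumptions that differ from the PR: std::unordered_map replaces HashMap, plain pointers replace Symbol<DType>, the entries are written out by hand rather than generated from DTYPE_SEQ, and a nullptr return replaces Maybe for unregistered types. All names here are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Assumed enum for illustration; kInt8 is deliberately left unregistered below.
enum class DataType { kFloat32, kInt32, kInt8 };

struct DType {
  DataType data_type;
};

// Explicit hash so the sketch also builds with pre-C++14 standard libraries.
struct DataTypeHash {
  std::size_t operator()(DataType t) const { return static_cast<std::size_t>(t); }
};

// One function-local static object per data type.
const DType& Float32() {
  static const DType d{DataType::kFloat32};
  return d;
}
const DType& Int32() {
  static const DType d{DataType::kInt32};
  return d;
}

// Looks up the unique object for a data type; nullptr if it is not registered.
const DType* Get(DataType data_type) {
  static const std::unordered_map<DataType, const DType*, DataTypeHash> table{
      {DataType::kFloat32, &Float32()},
      {DataType::kInt32, &Int32()},
  };
  auto it = table.find(data_type);
  return it == table.end() ? nullptr : it->second;
}
```

The table is built once on the first call, and every later lookup for the same key returns the same address.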

Usage

Tensor now uses Symbol<DType> throughout, while the other lower-level implementations keep the original DataType.

Tensor's dtype method now also returns a Symbol. To get the DataType from a Tensor:

x->dtype()->data_type(); 
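To show how that call chain resolves, here is a minimal sketch with hypothetical stand-ins for Symbol, DType and Tensor. It assumes Symbol behaves like a pointer-style handle exposing operator->; the classes below are invented for illustration and are not OneFlow's definitions.

```cpp
#include <cassert>
#include <memory>

enum class DataType { kFloat32, kInt32 };

class DType {
 public:
  explicit DType(DataType t) : data_type_(t) {}
  DataType data_type() const { return data_type_; }

 private:
  DataType data_type_;
};

// Hypothetical minimal Symbol: a pointer-like handle compared by address.
template <typename T>
class Symbol {
 public:
  explicit Symbol(const T* ptr) : ptr_(ptr) {}
  const T* operator->() const { return ptr_; }
  bool operator==(const Symbol& other) const { return ptr_ == other.ptr_; }

 private:
  const T* ptr_;
};

class Tensor {
 public:
  explicit Tensor(Symbol<DType> dtype) : dtype_(dtype) {}
  Symbol<DType> dtype() const { return dtype_; }

 private:
  Symbol<DType> dtype_;
};

// Illustrative helper: builds a tensor whose dtype handle points at one shared DType.
std::shared_ptr<Tensor> MakeFloatTensor() {
  static const DType float32(DataType::kFloat32);
  return std::make_shared<Tensor>(Symbol<DType>(&float32));
}
```

With `auto x = MakeFloatTensor();`, the expression `x->dtype()->data_type()` first dereferences the shared_ptr, then goes through the handle's operator-> to reach the underlying DType.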

@MARD1NO MARD1NO marked this pull request as ready for review July 29, 2021 06:16
@MARD1NO MARD1NO requested a review from lixinqi July 29, 2021 06:17
@github-actions

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 142.2ms (= 7108.1ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 128.3ms (= 6416.8ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.11 (= 142.2ms / 128.3ms)

PyTorch resnet50 time: 85.7ms (= 4285.8ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 74.4ms (= 3719.7ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.15 (= 85.7ms / 74.4ms)

PyTorch resnet50 time: 57.5ms (= 2872.9ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 47.4ms (= 2372.1ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.21 (= 57.5ms / 47.4ms)

PyTorch resnet50 time: 50.6ms (= 2527.7ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 39.0ms (= 1949.4ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.30 (= 50.6ms / 39.0ms)

PyTorch resnet50 time: 51.3ms (= 2564.0ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 38.2ms (= 1909.5ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 1.34 (= 51.3ms / 38.2ms)

@oneflow-ci-bot oneflow-ci-bot merged commit a30d9fc into master Aug 16, 2021
@oneflow-ci-bot oneflow-ci-bot deleted the use_symbol_dtype branch August 16, 2021 14:57
5 participants