This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

fix dot(csr.T, dns)=dns can't be called on cpu and gpu #11087

Merged
merged 1 commit into from
May 31, 2018

Conversation

XiaotaoChen
Contributor

@XiaotaoChen XiaotaoChen commented May 29, 2018

Description

This PR fixes a bug that prevents the implementation of dot(csr.T, dns)=dns from being called on CPU and GPU. With the fix, dot(csr.T, dns)=dns is 10 times faster than before when tested with density=0.0001.

The cause of the bug is that the function DotForwardInferStorageType in dot-inl.h does not map the case of input stype=[csr, default], output stype=[default], and transpose_a=True to the implementation of dot(csr.T, dns)=dns.
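To illustrate the kind of mapping that was missing, here is a minimal toy model of storage-type dispatch in plain Python. It is not the MXNet code: `dispatch_dot`, the rule sets, and the returned labels are hypothetical names introduced only for this sketch, and the rule contents are simplified assumptions.

```python
# Toy model of storage-type dispatch for dot. This is NOT MXNet's
# DotForwardInferStorageType; all names here are hypothetical.

SPARSE = "sparse kernel"
FALLBACK = "fallback: densify inputs and run dot(dns, dns)"


def dispatch_dot(in_stypes, out_stype, transpose_a, rules):
    """Return which path would run for the given storage types."""
    key = (tuple(in_stypes), out_stype, transpose_a)
    return SPARSE if key in rules else FALLBACK


# Assumed rule set before the fix: only the non-transposed case is mapped.
rules_before = {(("csr", "default"), "default", False)}

# Assumed rule set after the fix: the transpose_a=True case is mapped too.
rules_after = rules_before | {(("csr", "default"), "default", True)}

before = dispatch_dot(["csr", "default"], "default", True, rules_before)
after = dispatch_dot(["csr", "default"], "default", True, rules_after)
print("before fix:", before)
print("after fix: ", after)
```

In this model, a combination with no matching rule silently takes the dense path, which is why the bug showed up as a fallback warning rather than an error.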

Details

The following code can reproduce this bug.

import mxnet as mx
from mxnet.test_utils import rand_ndarray

def testBug(dev):
    shape_lhs = (200, 200)
    shape_rhs = (200, 200)
    mx_sparse = rand_ndarray(shape_lhs, 'csr', density=0.01).as_in_context(dev)
    mx_dns = rand_ndarray(shape_rhs, 'default', density=1.0).as_in_context(dev)
    mx.nd.dot(mx_sparse, mx_dns, transpose_a=True, transpose_b=False,
              forward_stype='default')
    mx.nd.waitall()
    
if __name__ == "__main__":
    print('test dot(csr.T, dns)=dns on cpu')
    testBug(mx.cpu())
    print('test dot(csr.T, dns)=dns on gpu')
    testBug(mx.gpu())

Here is the log output. It shows that the storage types of dot(csr.T, dns)=dns fall back, so the code that actually runs is dot(dns, dns)=dns.

test dot(csr.T, dns)=dns on cpu
[21:56:09] src/operator/nn/./../../common/utils.h:416: 
Storage type fallback detected:
operator = dot
input storage types = [csr, default, ]
output storage types = [default, ]
params = {"forward_stype" : default, "transpose_b" : False, "transpose_a" : True, }
context.dev_mask = cpu
The operator with default storage type will be dispatched for execution. You're seeing this warning message because the operator above is unable to process the given ndarrays with specified storage types, context and parameter. Temporary dense ndarrays are generated in order to execute the operator. You can set environment variable MXNET_STORAGE_FALLBACK_LOG_VERBOSE to 0 to suppress this warning.
test dot(csr.T, dns)=dns on gpu
[21:57:01] src/operator/nn/./../../common/utils.h:416: 
Storage type fallback detected:
operator = dot
input storage types = [csr, default, ]
output storage types = [default, ]
params = {"forward_stype" : default, "transpose_b" : False, "transpose_a" : True, }
context.dev_mask = gpu
The operator with default storage type will be dispatched for execution. You're seeing this warning message because the operator above is unable to process the given ndarrays with specified storage types, context and parameter. Temporary dense ndarrays are generated in order to execute the operator. You can set environment variable MXNET_STORAGE_FALLBACK_LOG_VERBOSE to 0 to suppress this warning.
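For readers unfamiliar with what the sparse path computes, the following is a rough pure-Python sketch of dot(csr.T, dns)=dns that works directly on the CSR arrays without building a dense copy of the left-hand side. It is only an illustration of the operation, not the MXNet kernel; the function name `csr_t_dot_dense` and its argument layout are assumptions made for this example.

```python
# Illustrative only: computes lhs.T @ rhs where lhs is given in CSR form
# (data, indices, indptr). The function name and signature are hypothetical.

def csr_t_dot_dense(data, indices, indptr, num_cols, rhs):
    """Return lhs.T @ rhs as a list of lists, with lhs stored as CSR."""
    out_width = len(rhs[0])
    out = [[0.0] * out_width for _ in range(num_cols)]
    for row in range(len(indptr) - 1):
        # Visit only the stored (non-zero) entries of this row of lhs.
        for k in range(indptr[row], indptr[row + 1]):
            col, val = indices[k], data[k]
            # lhs[row][col] becomes lhs.T[col][row], so it scales rhs[row]
            # and accumulates into output row `col`.
            for j in range(out_width):
                out[col][j] += val * rhs[row][j]
    return out


# lhs = [[1, 0, 2],
#        [0, 3, 0]]   (shape 2x3)
data, indices, indptr = [1.0, 2.0, 3.0], [0, 2, 1], [0, 2, 3]
rhs = [[1.0, 2.0],
       [3.0, 4.0]]    # shape 2x2

result = csr_t_dot_dense(data, indices, indptr, num_cols=3, rhs=rhs)
print(result)
```

The work done is proportional to the number of stored entries, whereas the fallback path first creates a temporary dense array and then runs a dense product.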

Speedup after fixing the bug

test dot(csr.T, dns)=dns on cpu
1.00 % with fallback: 4.566107, without fallback: 4.191508
0.50 % with fallback: 8.580608, without fallback: 8.105466
0.10 % with fallback: 8.982658, without fallback: 8.352470
0.05 % with fallback: 9.716504, without fallback: 9.003553
0.01 % with fallback: 11.551130, without fallback: 10.684023
test dot(csr.T, dns)=dns on gpu
1.00 % with fallback: 0.574556, without fallback: 0.559032
0.50 % with fallback: 1.073330, without fallback: 1.044696
0.10 % with fallback: 3.619832, without fallback: 3.529394
0.05 % with fallback: 4.883098, without fallback: 4.761823
0.01 % with fallback: 9.730866, without fallback: 9.499284

The benchmark script is here: https://github.com/XiaotaoChen/incubator-mxnet/blob/Mytest/example/sparse/temp_test/testBug.py
@pengzhao-intel @TaoLv

@pengzhao-intel
Contributor

@haojin2

@eric-haibin-lin eric-haibin-lin merged commit 8be4b8e into apache:master May 31, 2018
rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018
zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018