This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
Wrong gradients on Windows-GPU #20471
Open
Description
sym.zip
I only see this on Windows. Download the symbol file and run this script:
import mxnet as mx

json_path = 'sym.json'
sym = mx.sym.load(json_path)

def run_example(ctx, reqs):
    ex = sym._bind(
        ctx,
        {
            '.Inputs.Input': mx.ndarray.array([[1, 2, 3]], ctx=ctx),
            '.Inputs.Target': mx.ndarray.array([[4, 5, 6]], ctx=ctx),
            'seq_715248120': mx.ndarray.array([3], ctx=ctx)
        },
        args_grad={
            '.Inputs.Input': mx.ndarray.zeros([1, 3], ctx=ctx),
            '.Inputs.Target': mx.ndarray.zeros([1, 3], ctx=ctx),
            'seq_715248120': mx.ndarray.zeros([1], ctx=ctx)
        },
        grad_req=dict(zip(['.Inputs.Input', '.Inputs.Target', 'seq_715248120'], reqs))
    )
    ex.forward()
    ex.backward(out_grads=[mx.ndarray.array([1], ctx=ctx), mx.ndarray.array([1], ctx=ctx)])
    print(ex.grad_dict)

print('Input + Target gradient, CPU (OK):')
run_example(mx.cpu(), ['write', 'write', 'null'])
print('\n')

print('Input + Target gradient, GPU (OK):')
run_example(mx.gpu(), ['write', 'write', 'null'])
print('\n')

print('Target gradient only, CPU (OK):')
run_example(mx.cpu(), ['null', 'write', 'null'])
print('\n')

print('Target gradient only, GPU (WRONG):')
run_example(mx.gpu(), ['null', 'write', 'null'])
Output is:
Input + Target gradient, CPU (OK):
{'.Inputs.Input':
[[-0.33333334 -0.33333334 -0.33333334]]
<NDArray 1x3 @cpu(0)>, '.Inputs.Target':
[[0.33333334 0.33333334 0.33333334]]
<NDArray 1x3 @cpu(0)>, 'seq_715248120': None}
Input + Target gradient, GPU (OK):
{'.Inputs.Input':
[[-0.33333334 -0.33333334 -0.33333334]]
<NDArray 1x3 @gpu(0)>, '.Inputs.Target':
[[0.33333334 0.33333334 0.33333334]]
<NDArray 1x3 @gpu(0)>, 'seq_715248120': None}
Target gradient only, CPU (OK):
{'.Inputs.Input': None, '.Inputs.Target':
[[0.33333334 0.33333334 0.33333334]]
<NDArray 1x3 @cpu(0)>, 'seq_715248120': None}
Target gradient only, GPU (WRONG):
{'.Inputs.Input': None, '.Inputs.Target':
[[-0.33333334 -0.33333334 -0.33333334]]
<NDArray 1x3 @gpu(0)>, 'seq_715248120': None}
The Target gradient has the sign flipped in the last example.
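To check for this without reading the printed arrays by eye, the CPU and GPU results could be compared programmatically. The following is only a sketch and is not part of the original script: `compare_grads` and `flatten` are hypothetical helpers, and it assumes `run_example` is changed to return its gradients as plain nested lists (for example via `grad.asnumpy().tolist()`), with `None` for arguments whose `grad_req` is `'null'`. The sample values are copied from the "Target gradient only" output above.

```python
import math

def flatten(values):
    """Flatten arbitrarily nested lists/tuples of numbers into a flat list of floats."""
    if isinstance(values, (list, tuple)):
        flat = []
        for item in values:
            flat.extend(flatten(item))
        return flat
    return [float(values)]

def compare_grads(reference, candidate, rel_tol=1e-5, abs_tol=1e-8):
    """Label each gradient as 'match', 'sign-flipped', 'mismatch', or 'missing'.

    Hypothetical helper: both arguments map argument names to nested lists
    of numbers, or to None when no gradient was requested.
    """
    report = {}
    for name, ref in reference.items():
        cand = candidate.get(name)
        if ref is None and cand is None:
            report[name] = 'match'  # gradient not requested on either side
            continue
        if ref is None or cand is None:
            report[name] = 'missing'  # present on one side only
            continue
        ref_flat, cand_flat = flatten(ref), flatten(cand)
        if len(ref_flat) != len(cand_flat):
            report[name] = 'mismatch'
        elif all(math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
                 for r, c in zip(ref_flat, cand_flat)):
            report[name] = 'match'
        elif all(math.isclose(r, -c, rel_tol=rel_tol, abs_tol=abs_tol)
                 for r, c in zip(ref_flat, cand_flat)):
            report[name] = 'sign-flipped'  # every element equals the negated reference
        else:
            report[name] = 'mismatch'
    return report

# Values copied from the "Target gradient only" runs shown above.
cpu_grads = {
    '.Inputs.Input': None,
    '.Inputs.Target': [[0.33333334, 0.33333334, 0.33333334]],
    'seq_715248120': None,
}
gpu_grads = {
    '.Inputs.Input': None,
    '.Inputs.Target': [[-0.33333334, -0.33333334, -0.33333334]],
    'seq_715248120': None,
}

print(compare_grads(cpu_grads, gpu_grads))
```

The CPU result is treated as the reference here because it agrees with the runs where both gradients were requested.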