
Incorrect Inference.infer results when running with multiple GPUs. #3073

@xinghai-sun

Description


Inference.infer produces incorrect results when running with multiple GPUs.

Below is the output of the first convolution layer for 4 example instances, with trainer_count=1 (first screenshot) and trainer_count=2 (second screenshot).

[Screenshot: first-convolution-layer output with trainer_count=1]

[Screenshot: first-convolution-layer output with trainer_count=2]

The outputs are wrong whenever trainer_count > 1 (with use_gpu=True).

If we print only the input data layer, there is no difference between the two cases, which suggests the problem lies in the model computation rather than in how data is allocated across GPUs.
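For reference, the invariant being violated here can be sketched with a small NumPy stand-in (conv_forward below is a hypothetical per-sample layer, not Paddle's actual convolution): sharding a batch across trainers and concatenating the per-shard outputs should reproduce the single-trainer result exactly, since a forward pass has no cross-sample interaction.

```python
import numpy as np

# Hypothetical stand-in for the first convolution layer: any deterministic,
# per-sample function should give identical results regardless of how the
# batch is sharded across trainers/GPUs.
def conv_forward(batch):
    # fixed "kernel": a per-sample linear transform, no cross-sample mixing
    kernel = np.arange(6, dtype=float).reshape(2, 3)
    return batch @ kernel

batch = np.random.RandomState(0).rand(4, 2)  # 4 example instances

# trainer_count=1: run the whole batch at once
single = conv_forward(batch)

# trainer_count=2: split the batch into shards, run each shard,
# then concatenate -- mimics splitting data across two GPUs
shards = np.array_split(batch, 2)
multi = np.vstack([conv_forward(s) for s in shards])

# Expected invariant: per-sample outputs identical either way
print("outputs match across sharding:", np.allclose(single, multi))
```

In this issue the input layer satisfies the invariant (the data shards match) but the convolution output does not, which is why the report points at the model path rather than the data dispatcher.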
