Fix PE Unit Test Failure, test=develop (PaddlePaddle#25693)
Based on the comment at https://github.com/PaddlePaddle/Paddle/blob/b5f8784cab94eae785659787fc529870c87b254c/paddle/fluid/framework/details/build_strategy.h#L49, the unit test that compares Reduce and AllReduce is expected to see small differences between the two.

The PR_CI_Night job runs on a P40 machine whose GPU has 8 GB of memory, less than the 16 GB on the normal CI machines. We previously decreased the batch size so that the test could run there (https://github.com/PaddlePaddle/Paddle/pull/24651/files). Decreasing the batch size makes the difference exceed the tolerance more often, so this PR replaces the absolute delta (delta2) with a relative delta (loss[0] * delta2) in the assertions that compare the last losses. The assertions on the first losses keep their absolute delta of 1e-5.

Before this PR, the unit test failed with a probability of about 1/100. After this PR, the failure no longer occurs.
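The switch from an absolute to a relative delta can be illustrated with a small sketch. The loss values below are hypothetical, chosen only for illustration and not taken from the CI runs, and the helper names within_absolute and within_relative are likewise hypothetical. A gap of 3e-5 between two final losses near 6.25 fails a fixed bound of 1e-5 but passes once the bound is scaled by the loss.

```python
import math


def within_absolute(expected, actual, delta):
    """Absolute rule (before this PR): |expected - actual| <= delta."""
    return abs(expected - actual) <= delta


def within_relative(expected, actual, rel_delta):
    """Relative rule (after this PR): the bound grows with the expected loss.

    abs(expected) keeps the bound non-negative; the PR itself uses
    loss[0] * delta2 directly, which is equivalent for non-negative losses.
    """
    return abs(expected - actual) <= abs(expected) * rel_delta


# Hypothetical final losses from an AllReduce run and a Reduce run.
all_reduce_loss = 6.25
reduce_loss = 6.25003  # about 3e-5 away

print(within_absolute(all_reduce_loss, reduce_loss, 1e-5))  # False
print(within_relative(all_reduce_loss, reduce_loss, 1e-5))  # True (bound is 6.25e-5)

# The standard library offers a similar check; math.isclose scales rel_tol
# by the larger of the two magnitudes instead of by the first argument.
print(math.isclose(all_reduce_loss, reduce_loss, rel_tol=1e-5))  # True
```

A purely relative bound becomes very strict when the expected loss is close to zero; math.isclose accepts an abs_tol argument for that case.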
zhhsplendid authored Jul 24, 2020
1 parent cea5086 commit 4fd5585
Showing 1 changed file with 4 additions and 4 deletions.
@@ -44,7 +44,7 @@ def _compare_reduce_and_allreduce(self, use_cuda, delta2=1e-5):
         for loss in zip(all_reduce_first_loss, reduce_first_loss):
             self.assertAlmostEquals(loss[0], loss[1], delta=1e-5)
         for loss in zip(all_reduce_last_loss, reduce_last_loss):
-            self.assertAlmostEquals(loss[0], loss[1], delta=delta2)
+            self.assertAlmostEquals(loss[0], loss[1], delta=loss[0] * delta2)

         if not use_cuda:
             return
@@ -72,17 +72,17 @@ def _compare_reduce_and_allreduce(self, use_cuda, delta2=1e-5):
         for loss in zip(all_reduce_first_loss, all_reduce_first_loss_seq):
             self.assertAlmostEquals(loss[0], loss[1], delta=1e-5)
         for loss in zip(all_reduce_last_loss, all_reduce_last_loss_seq):
-            self.assertAlmostEquals(loss[0], loss[1], delta=delta2)
+            self.assertAlmostEquals(loss[0], loss[1], delta=loss[0] * delta2)

         for loss in zip(reduce_first_loss, reduce_first_loss_seq):
             self.assertAlmostEquals(loss[0], loss[1], delta=1e-5)
         for loss in zip(reduce_last_loss, reduce_last_loss_seq):
-            self.assertAlmostEquals(loss[0], loss[1], delta=delta2)
+            self.assertAlmostEquals(loss[0], loss[1], delta=loss[0] * delta2)

         for loss in zip(all_reduce_first_loss_seq, reduce_first_loss_seq):
             self.assertAlmostEquals(loss[0], loss[1], delta=1e-5)
         for loss in zip(all_reduce_last_loss_seq, reduce_last_loss_seq):
-            self.assertAlmostEquals(loss[0], loss[1], delta=delta2)
+            self.assertAlmostEquals(loss[0], loss[1], delta=loss[0] * delta2)


 class TestResnetWithReduceCPU(TestResnetWithReduceBase):
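The changed loops all repeat the same element-wise comparison. A minimal sketch of a shared helper follows; RelativeLossAssertMixin, assert_losses_close and ExampleTest are hypothetical names, not part of Paddle's test base classes, and the loss values are made up. The sketch calls assertAlmostEqual because assertAlmostEquals, used in the diff, is a deprecated alias that was removed in Python 3.12.

```python
import unittest


class RelativeLossAssertMixin:
    """Hypothetical helper mixin; not part of Paddle's test utilities."""

    def assert_losses_close(self, losses_a, losses_b, rel_delta):
        # Extra length check (not in the original loops): zip() would
        # silently ignore trailing losses if the two runs differ in length.
        self.assertEqual(len(losses_a), len(losses_b))
        for expected, actual in zip(losses_a, losses_b):
            # Same rule as the PR (loss[0] * delta2), with abs() as a guard
            # so the bound cannot become negative.
            self.assertAlmostEqual(
                expected, actual, delta=abs(expected) * rel_delta)


class ExampleTest(RelativeLossAssertMixin, unittest.TestCase):
    def test_last_losses_within_relative_delta(self):
        # Hypothetical last losses from the two strategies.
        all_reduce_last_loss = [6.25, 4.10]
        reduce_last_loss = [6.25003, 4.10002]
        self.assert_losses_close(
            all_reduce_last_loss, reduce_last_loss, rel_delta=1e-5)
```

Saved to a module, the example runs with python -m unittest followed by the module name. Keeping the tolerance rule in one helper means a later change to it touches one line instead of every loop.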
