Fix pipeline parallel error of using loss in if statement and fix error of numel (#66980)

* Fix pipeline parallel error of using loss in an if statement, and fix an error in numel

* Return False and add an explanation
jeff41404 authored Aug 13, 2024
1 parent 3aff5b1 commit 29204c6
Showing 2 changed files with 6 additions and 1 deletion.
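
For context, a minimal sketch of the failure the first fix addresses (the model, loss, and threshold below are illustrative placeholders, not from the commit): under pipeline parallelism, only some stages hold the loss tensor's data, so converting it to bool inside an if statement used to trip the "tensor not initialized" assertion on the other stages.

    # Hypothetical pipeline-parallel training step; names are placeholders.
    loss = model(batch)          # only the last pipeline stage holds real data
    if loss > 1.0:               # bool(...) calls Tensor.__nonzero__
        adjust_lr()              # stages without the data used to assert here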
2 changes: 1 addition & 1 deletion paddle/phi/infermeta/spmd_rules/numel.cc

@@ -35,7 +35,7 @@ SpmdInfo NumelInferSpmd(const DistMetaTensor& x) {
           "dims_mapping size [%d] are not matched.",
           x_ndim,
           x_dims_mapping.size()));
-  TensorDistAttr out_dist_attr;
+  TensorDistAttr out_dist_attr = CopyTensorDistAttrForOutput(x_dist_attr_src);
   out_dist_attr.set_dims_mapping({});
   std::vector<int64_t> partial_on_dims;
   const auto& dim_mapping = x_dims_mapping;
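
The numel fix replaces a default-constructed TensorDistAttr, which carried no process mesh, with one copied from the input's dist attr via CopyTensorDistAttrForOutput, so the scalar output stays on the same mesh as the input. A minimal sketch of how this path is exercised from Python, assuming Paddle's auto-parallel primitives (dist.ProcessMesh, dist.shard_tensor, dist.Shard) and a two-device launch:

    import paddle
    import paddle.distributed as dist

    # Shard a tensor along dim 0 across a two-device mesh.
    mesh = dist.ProcessMesh([0, 1], dim_names=["x"])
    x = dist.shard_tensor(paddle.randn([8, 4]), mesh, [dist.Shard(0)])
    n = x.numel()  # output dist attr now inherits the process mesh from x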
5 changes: 5 additions & 0 deletions python/paddle/base/dygraph/tensor_patch_methods.py

@@ -942,6 +942,11 @@ def __nonzero__(self: Tensor) -> bool:
         assert (
             numel == 1
         ), "When Variable is used as the condition of if/while , Variable can only contain one element."
+        # Resolve the error in the pipeline parallel scenario where some
+        # devices do not hold this data. Returning True or False does not
+        # affect the execution result on those devices, so we return False.
+        if self.is_dist() and not self._is_initialized():
+            return False
         assert self._is_initialized(), "tensor not initialized"
         return bool(np.array(self) > 0)
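Returning False rather than True is a deterministic tie-break: as the in-code comment notes, stages without the data produce the same execution result either way, so the conditional branch is simply skipped there. A sketch of the behavior after the fix, for a hypothetical loss tensor on a stage with no local data:

    if loss:             # is_dist() and not _is_initialized() -> False
        log_metrics()    # skipped on stages that do not hold the data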
