Skip to content

Conversation

dzhwinter
Copy link
Contributor

@dzhwinter dzhwinter commented Jul 3, 2017

In client.go:SendGrads:

for err := range errCh {
		if err != nil {
			return err
		}

		recv++
		if recv == len(grads) {
			break
		}
	}

should not return err, or the function will stop waiting for other goroutines to finish, causing unexpected errors, and should log the error.

@typhoonzero found that return error immediately will lead to some goroutine get out of control, this PR is a fix.

Copy link
Contributor

@typhoonzero typhoonzero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Merge after @helinwang's review!

Copy link
Contributor

@helinwang helinwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! one comment.
It's good to log the errors, but I don't get why there is problem before?
The result channel is buffered channel rCh := make(chan result, len(names)), I don't get why go-rountine will never be released?

}
}
if len(errs) != 0 {
return errs[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we log all the errors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought they are the same errors because there is only one source of error.

@dzhwinter
Copy link
Contributor Author

@typhoonzero 认为原来的代码中buffered channel在收到第一个err后,return将直接返回,该函数运行结束。由于errCh所在函数已经消失,其余的goroutine全部失控了。

  errCh := make(chan error, len(grads))
  for _, g := range grads {
    go func(g Gradient) {
      err := c.pservers[c.partition(g.Name)].Call("Service.SendGrad", g, nil)
      errCh <- err
    }(g)
  }

要不,我写个程序测试一下?

@helinwang
Copy link
Contributor

@dzhwinter 嗯辛苦测试一下吧~我理解这里errCh即使生成他的function返回,因为被其他go-rountine引用,是不会被释放的,所以不会有问题。也可能是我理解错了。

@luotao1
Copy link
Contributor

luotao1 commented Feb 1, 2019

感谢您给PaddlePaddle贡献代码。由于Paddle V1/V2版本已不再维护,相关代码也已从develop分支上删除,因此关闭您的PR,欢迎您向Paddle最新版-Fluid贡献代码。
Thanks for contributing to PaddlePaddle! Since V1/V2 will not be maintained anymore, and related codes have been deleted from develop branch as well, we close this PR. Welcome to contribute to Fluid——the latest version of PaddlePaddle.

@luotao1 luotao1 closed this Feb 1, 2019
heavengate pushed a commit to heavengate/Paddle that referenced this pull request Aug 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants