Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

【Hackathon 5th No.17】 为 Paddle 新增 pdist API -part #57869

Merged
merged 26 commits into from
Dec 14, 2023
Merged

【Hackathon 5th No.17】 为 Paddle 新增 pdist API -part #57869

merged 26 commits into from
Dec 14, 2023

Conversation

cocoshe
Copy link
Contributor

@cocoshe cocoshe commented Oct 3, 2023

@paddle-bot
Copy link

paddle-bot bot commented Oct 3, 2023

你的PR提交成功,感谢你对开源项目的贡献!
请关注后续CI自动化测试结果,详情请参考Paddle-CI手册
Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Oct 3, 2023
@paddle-ci-bot
Copy link

paddle-ci-bot bot commented Oct 15, 2023

Sorry to inform you that 9c651d9's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@paddle-ci-bot
Copy link

paddle-ci-bot bot commented Oct 29, 2023

Sorry to inform you that 13d04ad's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@cocoshe
Copy link
Contributor Author

cocoshe commented Nov 16, 2023

@zxcd 可以辛苦帮忙review一下嘛?看看有没有啥问题~

paddle.enable_static()


class TestpdistAPICase1(TestpdistAPI):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

规范一下测试命名

exe = paddle.static.Executor(self.place)
res = exe.run(feed={'x': self.x}, fetch_list=[out])
out_ref = ref_pdist(self.x, self.p)
np.testing.assert_allclose(out_ref, res[0], rtol=1e-5, atol=1e-5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里应该不需要使用rtol, atol?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>>> x = np.random.rand(3, 4).astype('float64')
>>> t_x = paddle.to_tensor(x)
>>> x == t_x.numpy()
array([[ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]])
>>> np_norm = np.linalg.norm(x, axis=-1)
>>> pd_norm = paddle.linalg.norm(t_x, axis=-1)
>>> np_norm == pd_norm.numpy()
array([ True,  True,  True])


>>> x = np.random.rand(5, 6).astype('float64')
>>> t_x = paddle.to_tensor(x)
>>> np_norm = np.linalg.norm(x, axis=-1)
>>> pd_norm = paddle.linalg.norm(t_x, axis=-1)
>>> np_norm == pd_norm.numpy()
array([ True,  True,  True,  True,  True])


>>> x = np.random.rand(5, 6).astype('float64')
>>> t_x = paddle.to_tensor(x)
>>> np_norm = np.linalg.norm(x, axis=-1)
>>> pd_norm = paddle.linalg.norm(t_x, axis=-1)
>>> np_norm == pd_norm.numpy()
array([ True,  True,  True,  True, False])
>>> 
>>> x = np.random.rand(5, 6).astype('float64')
>>> t_x = paddle.to_tensor(x)
>>> np_norm = np.linalg.norm(x, axis=-1)
>>> pd_norm = paddle.linalg.norm(t_x, axis=-1)
>>> np_norm == pd_norm.numpy()
array([ True,  True,  True,  True, False])
>>> 
>>> x = np.random.rand(5, 6).astype('float64')
>>> t_x = paddle.to_tensor(x)
>>> np_norm = np.linalg.norm(x, axis=-1)
>>> pd_norm = paddle.linalg.norm(t_x, axis=-1)
>>> np_norm == pd_norm.numpy()
array([ True,  True,  True,  True, False])
>>> 
>>> x = np.random.rand(5, 6).astype('float64')
>>> t_x = paddle.to_tensor(x)
>>> np_norm = np.linalg.norm(x, axis=-1)
>>> pd_norm = paddle.linalg.norm(t_x, axis=-1)
>>> np_norm == pd_norm.numpy()
array([False, False, False,  True,  True])
>>> 
>>> x = np.random.rand(5, 6).astype('float64')
>>> t_x = paddle.to_tensor(x)
>>> np_norm = np.linalg.norm(x, axis=-1)
>>> pd_norm = paddle.linalg.norm(t_x, axis=-1)
>>> np_norm == pd_norm.numpy()
array([ True,  True, False, False,  True])
>>> 

如果不用rtol atol的话精度过不了。我进行了一些尝试,发现norm看样子像是没和numpy对齐,在cdist的单测中也是放宽了精度

np.testing.assert_allclose(out_ref, res[0], rtol=1e-5, atol=1e-5)

python/paddle/nn/functional/distance.py Outdated Show resolved Hide resolved
compute_mode (str, optional): The mode for compute distance.

- ``use_mm_for_euclid_dist_if_necessary`` , for p = 2.0 and (P > 25 or R > 25), it will use matrix multiplication to calculate euclid distance if possible.
- ``use_mm_for_euclid_dist`` , for p = 2.0, it will use matrix multiplication to calculate euclid distance.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

仅针对pdist这个API来说,cdist中使用到的这几个mode是否会用到?如果完全用不到我理解这个地方可能不需要添加mode参数。

Copy link
Contributor Author

@cocoshe cocoshe Nov 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

仅针对pdist这个API来说,cdist中使用到的这几个mode是否会用到?如果完全用不到我理解这个地方可能不需要添加mode参数。

我这里是把pdist当作一种特殊的cdist来实现:cdist输入x:[P,M], y:[R,M], 输出shape: [P,R],所以我在这里就是通过计算两份相同输入的cdist来实现pdist(令y=x),也就是cdist(x, x), 输入shape:[N, M],输出shape:[N, N],然后取上三角(去除对角线)得到结果。

对照起来看的话,cdist中的P和R对应现在输入[N,M]中的N,那当N较大的时候(大于25)我感觉是不是也可以采取cdist中的矩阵策略,当然不采用矩阵方法的话,应该就是直接norm然后取三角了,不知道这么想对不对

Copy link
Contributor

@zxcd zxcd Nov 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

从竞品对照来看,pdist的结果都是比较统一的,不存在N大于25时需要进行额外操作。另外从公式来说cdist和pdist的意义并不完全一致,我建议与竞品保持一致,不需要额外添加该参数。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

从竞品对照来看,pdist的结果都是比较统一的,不存在N大于25时需要进行额外操作。我建议与竞品保持一致,不需要额外添加该参数。

OKK,辛苦review,我之后稍作修改~

Copy link

paddle-ci-bot bot commented Nov 19, 2023

Sorry to inform you that 8bcbe47's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@cocoshe cocoshe requested a review from zxcd November 24, 2023 18:40
self.x = np.random.rand(50, 20).astype('float64')


class TestPdistAPICase8(TestPdistAPI):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

建议给出有意义的测试命名

.. code-block:: python

>>> import paddle
>>> a = paddle.randn([4, 5])
Copy link
Contributor

@zxcd zxcd Nov 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc中给出seed,不然PR-CI-Static-Check过不了
参考:

>>> paddle.seed(2023)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc中给出seed,不然PR-CI-Static-Check过不了 参考:

>>> paddle.seed(2023)

现在添加了,但是好像没作用嘛?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

添加了seed之后,你的print的结果也会有变化,这块的输出你可以参考报错的内容

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

添加了seed之后,你的print的结果也会有变化,这块的输出你可以参考报错的内容

明白了,稍后修改~

@luotao1
Copy link
Contributor

luotao1 commented Dec 6, 2023

2023-12-06 09:50:20 ----------------Check results--------------------
2023-12-06 09:50:20 >>> Sample code test capacity: {'\''cpu'\''}
2023-12-06 09:50:20 >>> 1 sample codes ran failed in env: {'\''cpu'\''}
2023-12-06 09:50:20 <DocTest(<modname?> paddle.nn.functional.pdist:1:0 ln 1)>, running time: 0.974s
2023-12-06 09:50:20 >>> Mistakes found in sample codes in env: {'\''cpu'\''}!

Copy link
Contributor

@zxcd zxcd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -162,6 +163,7 @@
'conv3d',
'conv3d_transpose',
'pairwise_distance',
'pdist',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which path do we recommend users to use? If paddle.pdist is recommended, it cannot be added to this __all__ list. if paddle.nn.functional.pdist, it cannot be added to the __all__ list in python/paddle/__init__.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tks for you review, I think I prefer paddle.pdist path, I removed this line then.

jeff41404
jeff41404 previously approved these changes Dec 13, 2023
Copy link
Contributor

@jeff41404 jeff41404 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@luotao1
Copy link
Contributor

luotao1 commented Dec 13, 2023

对应中文文档可以提上来

cocoshe and others added 2 commits December 13, 2023 15:59
Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>
Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>
Copy link
Contributor

@sunzhongkai588 sunzhongkai588 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@luotao1 luotao1 merged commit fa440e9 into PaddlePaddle:develop Dec 14, 2023
29 checks passed
@luotao1 luotao1 changed the title 【Hackathon 5th No.17】 为 Paddle 新增 pdist API 【Hackathon 5th No.17】 为 Paddle 新增 pdist API -part Dec 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants