Skip to content

support DataLoader with multi-process mode on MacOs and Windows basically #35854

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from

Conversation

wwqgtxx
Copy link

@wwqgtxx wwqgtxx commented Sep 18, 2021

PR types

New features

PR changes

APIs

Describe

After reading code, I found we can simply comment out some platform-related code to make DataLoader with multi-process mode work on MacOs and Windows.

@paddle-bot-old
Copy link

❌ The PR is not created using PR's template. You can refer to this Demo.
Please use PR's template, it helps save our maintainers' time so that more developers get helped.

@CLAassistant
Copy link

CLAassistant commented Sep 18, 2021

CLA assistant check
All committers have signed the CLA.

@paddle-bot-old
Copy link

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot-old
Copy link

Sorry to inform you that 109ca9d's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@paddle-bot-old paddle-bot-old bot added the contributor External developers label Sep 1, 2022
@luotao1 luotao1 self-assigned this Sep 2, 2022
@luotao1
Copy link
Contributor

luotao1 commented Sep 2, 2022

Sorry for too late to notice this PR, could you resolve the conflict files to pass the CI first? @wwqgtxx

@luotao1 luotao1 requested review from heavengate and zhiqiu and removed request for zhiqiu September 2, 2022 09:10
@heavengate
Copy link
Contributor

DataLoader unittest is disabled for Mac/Windows currently, could you please remove these lines to reopen DataLoader unites for Mac/Windows to run unitests CI

list(REMOVE_ITEM TEST_OPS test_multiprocess_dataloader_static)

@wwqgtxx
Copy link
Author

wwqgtxx commented Sep 2, 2022

maybe test_multiprocess_dataloader_dataset (Timeout) cause by #44170 #36819

@luotao1
Copy link
Contributor

luotao1 commented Sep 2, 2022

Do you try test_multiprocess_dataloader_dataset successfully on your local mac or windows? You can adjust the elapsed time such as set_tests_properties(jit_kernel_test PROPERTIES TIMEOUT 120) to pass the CI.

@wwqgtxx
Copy link
Author

wwqgtxx commented Sep 2, 2022

Do you try test_multiprocess_dataloader_dataset successfully on your local mac or windows? You can adjust the elapsed time such as set_tests_properties(jit_kernel_test PROPERTIES TIMEOUT 120) to pass the CI.

In fact, when trying to use multiprocessing to pass Tensor, there must be an error.
I ran the test in test_multiprocess_dataloader_dataset locally, and found that the test of passing pure numpy arrays can pass normally. Once a Tensor appears in the return value of the DataSet.__iter__, it will fail.
This behavior is the same as #44170. So this should not be a problem caused by this PR.

@luotao1
Copy link
Contributor

luotao1 commented Sep 6, 2022

@wwqgtxx Discussed with @heavengate:

  1. You can remove list(REMOVE_ITEM TEST_OPS test_multiprocess_dataloader_dataset) to pass the CI
  2. After all CI passed, we will merge this PR
  3. Maybe you could help us to fix Dataset里面使用multiprocessing会报错 #44170

or sys.platform == 'win32'):
warnings.warn(
"DataLoader with multi-process mode is not fully supported on MacOs and Windows currently." \
" Please use multi-process mode with use_shared_memory = False instead")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wwqgtxx
现在CI还有多个单测失败,可以继续修一下么?比如这段就会触发test_dataloader_autotune失败
https://xly.bce.baidu.com/paddlepaddle/paddle/newipipe/detail/6548861/job/18199166

2022-09-06 16:54:46 C:\home\workspace\Paddle\build\python\paddle\fluid\reader.py:492: UserWarning: DataLoader with multi-process mode is not fully supported on MacOs and Windows currently. Please use multi-process mode with use_shared_memory = False instead
2022-09-06 16:54:46   "DataLoader with multi-process mode is not fully supported on MacOs and Windows currently." \
2022-09-06 16:54:46 test_dataloader_autotune failed

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

其实我也看过ci的日志,大部分单测失败都是因为某个class不能被pickle,就像之前提到的Tensor一样,这个问题可能是因为paddle的一些c/c++代码实现的基础类型就是不能被pickle导致的。
个人感觉作为一个basically的支持,目前的实现也勉强够用(我自己是用来跑PaddleDetection的训练,没遇到过什么错误)
如果要完全解决上述问题,应该要为那些native类型补充缺失的pickle相关接口,这个工程量就相对较大了

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wwqgtxx

如果要完全解决上述问题,应该要为那些native类型补充缺失的pickle相关接口,这个工程量就相对较大了

  • 确实这个工程量相对比较大了,如果有兴趣的话,可以帮我们来修复,非常欢迎。
  • 我们正在运营一个Paddle Framework Contributor Club (PFCC) 组织,会通过定期分享技术知识与发布开发者主导任务的形式持续为飞桨框架做贡献,详情可见 https://github.com/luotao1 主页说明。
  • 如何加入:可以提交 PR 至 Paddle,代码合入后,我们会邀请你加入 PFCC。非常期待你的加入~

来自飞桨PFCC

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@luotao1 @wwqgtxx 这个PR还有希望继续更新吗?目前pickle已经初步支持,需要补充 mac 和win上的一些传输处理。这块我可以提供一定指导。

Copy link
Collaborator

@ZHUI ZHUI Jan 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#37302 这个pr是支持 multiprocessing 支持的pr,可以根据这个修改一些支持一下 mac or win 平台。

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这块我可以提供一定指导。

@ZHUI 太好啦

这个PR还有希望继续更新吗?

@wwqgtxx 请问你还有时间和兴趣继续更新么,如果可以的话,稍后我可以拉个群

@paddle-bot
Copy link

paddle-bot bot commented Jan 11, 2023

很抱歉,经过我们的反复讨论,你的PR暂未达到合入标准,请阅读飞桨原生算子开发规范,你可以重新提交新的PR,我们先将此PR关闭,感谢你的贡献。
Sorry to inform you that through our discussion, your PR fails to meet the merging standard (Reference: Paddle Custom Operator Design Doc). You can also submit an new one. Thank you.

@ZHUI ZHUI reopened this Jan 11, 2023
@paddle-bot
Copy link

paddle-bot bot commented Jan 11, 2023

很抱歉,经过我们的反复讨论,你的PR暂未达到合入标准,请阅读飞桨原生算子开发规范,你可以重新提交新的PR,我们先将此PR关闭,感谢你的贡献。
Sorry to inform you that through our discussion, your PR fails to meet the merging standard (Reference: Paddle Custom Operator Design Doc). You can also submit an new one. Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contributor External developers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants