
[DistDataloader] Update implementation, add nested.py #8380

Merged: 7 commits merged into PaddlePaddle:develop on May 9, 2024

DesmonDay (Contributor)

PR types

Bug fixes

PR changes

Others

Description

  1. Optimize the implementation of the distributed dataloader's `next` function.
  2. Add `paddlenlp.utils.nested` (paddlenlp/utils/nested.py).
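The description does not show what the new nested module contains, but the helper names imported later in the review (`nested_copy_place`, `nested_empty_tensor`, `nested_reduce_tensor`) suggest utilities that walk an arbitrarily nested batch of dicts, lists and tuples and transform each leaf. A minimal pure-Python sketch of that traversal, assuming this reading of the names; `map_nested` is a hypothetical helper, not the module's actual API:

```python
from typing import Any, Callable


def map_nested(fn: Callable[[Any], Any], obj: Any) -> Any:
    """Apply fn to every leaf of a nested dict/list/tuple, keeping the structure.

    Hypothetical helper for illustration; not the API of paddlenlp.utils.nested.
    """
    if isinstance(obj, dict):
        return {key: map_nested(fn, value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Rebuild a list or plain tuple of the same type
        # (namedtuples would need their fields unpacked; not handled here).
        return type(obj)(map_nested(fn, item) for item in obj)
    # Anything else is treated as a leaf, e.g. a tensor in a real batch.
    return fn(obj)


batch = {"input_ids": [1, 2, 3], "labels": (4, 5), "meta": {"step": 7}}
print(map_nested(lambda x: x * 10, batch))
# {'input_ids': [10, 20, 30], 'labels': (40, 50), 'meta': {'step': 70}}
```

Here integers stand in for tensors; in a real batch the leaf function would act on tensors, for example copying each one to a device place.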


paddle-bot bot commented May 7, 2024

Thanks for your contribution!

ZHUI previously approved these changes May 7, 2024
```python
data = None
if self._need_data:
    try:
        data = next(self._dataloader_iter)
```
Review comment (Collaborator):

The copy (nested_copy_place) can be moved here.

Suggested change

```diff
-data = next(self._dataloader_iter)
+data = next(self._dataloader_iter)
+data = nested_copy_place(data, place=paddle.framework._current_expected_place())
```
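The suggested change moves the device copy into the branch that actually fetched data, so ranks that do not load data (`self._need_data` is false) skip it and keep `data = None`. A minimal sketch of that control flow, where the hypothetical `to_device` callable stands in for `nested_copy_place(..., place=paddle.framework._current_expected_place())`. The excerpt does not show how the real code handles an exhausted iterator, so letting `StopIteration` propagate here is an assumption:

```python
from typing import Any, Callable, Iterator, Optional


def fetch_batch(
    data_iter: Iterator[Any],
    need_data: bool,
    to_device: Callable[[Any], Any],
) -> Optional[Any]:
    """Fetch one batch on data-loading ranks only (hypothetical helper)."""
    data = None
    if need_data:
        # StopIteration propagates to the caller, ending the epoch (assumed).
        data = next(data_iter)
        # Copy right after fetching, so only ranks that loaded data pay for it.
        data = to_device(data)
    return data


it = iter([{"x": 1}, {"x": 2}])
print(fetch_batch(it, need_data=False, to_device=lambda d: d))  # None
print(fetch_batch(it, need_data=True, to_device=lambda d: {**d, "dev": True}))
```

Ranks with `need_data=False` neither advance the iterator nor perform a copy; in the distributed dataloader they would presumably receive the batch from a data-loading rank instead.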

```python
    nested_copy_place,
    nested_empty_tensor,
    nested_reduce_tensor,
)

_MAX_DATA_DIM = 64
```
Review comment (Collaborator):

Let's delete this; it's probably unused.
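The imported `nested_reduce_tensor` and `nested_empty_tensor` helpers suggest a two-phase broadcast: the loading rank first shares a lightweight description of the batch (structure, shapes, dtypes), and receiving ranks allocate matching empty buffers before the tensor data itself is broadcast. This reading is an assumption based on the names only. The sketch below models the metadata round trip in pure Python; `FakeTensor`, `TensorMeta`, `reduce_to_meta` and `empty_from_meta` are all hypothetical stand-ins, and no real communication happens:

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a framework tensor."""
    shape: Tuple[int, ...]
    dtype: str
    data: Optional[list] = None  # None means "allocated but not yet filled"


@dataclass(frozen=True)
class TensorMeta:
    """Small, picklable description of a tensor: enough to allocate a buffer."""
    shape: Tuple[int, ...]
    dtype: str


def _map(fn: Callable[[Any], Any], obj: Any) -> Any:
    # Recurse through dicts, lists and tuples; apply fn to every leaf.
    if isinstance(obj, dict):
        return {k: _map(fn, v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(_map(fn, v) for v in obj)
    return fn(obj)


def reduce_to_meta(batch: Any) -> Any:
    """Sender side: replace each tensor leaf with its shape/dtype metadata."""
    def to_meta(x: Any) -> Any:
        return TensorMeta(x.shape, x.dtype) if isinstance(x, FakeTensor) else x
    return _map(to_meta, batch)


def empty_from_meta(meta: Any) -> Any:
    """Receiver side: allocate an unfilled tensor for each metadata leaf."""
    def to_empty(x: Any) -> Any:
        return FakeTensor(x.shape, x.dtype) if isinstance(x, TensorMeta) else x
    return _map(to_empty, meta)


batch = {"input_ids": FakeTensor((2, 8), "int64", data=[0] * 16), "step": 3}
meta = reduce_to_meta(batch)     # broadcast this small object first
buffers = empty_from_meta(meta)  # receivers allocate, then receive the data
```

Splitting the exchange this way means receiving ranks never need to know the batch layout in advance, which may be why a fixed bound such as `_MAX_DATA_DIM` is no longer needed; that connection is also an inference.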

```python
    num_workers=self.args.dataloader_num_workers,
)
if self.args.distributed_dataloader:
    return _DataLoader(
```
Review comment (Collaborator):

Since the code already branches here, just use DistDataLoader directly.

Remove `_DataLoader = DistDataLoader if self.args.distributed_dataloader else DataLoader`.
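The reviewer's point is that once the code already branches on `self.args.distributed_dataloader`, the `_DataLoader` alias adds nothing: each branch can name its concrete class directly. A sketch of that shape using stub classes; the real constructors take many more arguments, and `build_loader`, `Args` and both stubs are hypothetical:

```python
from dataclasses import dataclass


class DataLoader:
    """Stub for the regular dataloader (hypothetical)."""

    def __init__(self, dataset, num_workers=0):
        self.dataset = dataset
        self.num_workers = num_workers


class DistDataLoader:
    """Stub for the distributed dataloader (hypothetical)."""

    def __init__(self, dataset, num_workers=0):
        self.dataset = dataset
        self.num_workers = num_workers


@dataclass
class Args:
    distributed_dataloader: bool = False
    dataloader_num_workers: int = 0


def build_loader(args: Args, dataset):
    # Branch once and construct the concrete class in each branch,
    # instead of aliasing `_DataLoader = DistDataLoader if ... else DataLoader`.
    if args.distributed_dataloader:
        return DistDataLoader(dataset, num_workers=args.dataloader_num_workers)
    return DataLoader(dataset, num_workers=args.dataloader_num_workers)
```

Naming the class explicitly in each branch keeps the distributed path greppable and avoids a module-level alias that only one branch uses.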

ZHUI previously approved these changes May 8, 2024

codecov bot commented May 8, 2024

Codecov Report

Attention: Patch coverage is 13.39286%, with 97 lines in your changes missing coverage. Please review.

Project coverage is 55.43%. Comparing base (d6ac1bd) to head (14148f2).

| Files | Patch % | Lines |
|---|---|---|
| paddlenlp/data/dist_dataloader.py | 6.00% | 47 Missing ⚠️ |
| paddlenlp/utils/nested.py | 18.00% | 41 Missing ⚠️ |
| paddlenlp/trainer/trainer.py | 10.00% | 9 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8380      +/-   ##
===========================================
+ Coverage    55.42%   55.43%   +0.01%     
===========================================
  Files          615      616       +1     
  Lines        96235    96209      -26     
===========================================
+ Hits         53335    53336       +1     
+ Misses       42900    42873      -27     
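The Codecov figures are internally consistent, and a short check makes the relationships explicit: hits plus misses equals total lines, and the project percentage is hits over lines. The displayed values match if the ratio is rounded down to two decimals, which corresponds to Codecov's configurable `precision`/`round` settings (treating round-down as the setting in effect is an assumption). The patch line count of 112 is inferred from 97 missing lines at 13.39286% coverage; it does not appear in the report. `pct_round_down` is a hypothetical helper:

```python
def pct_round_down(hits: int, lines: int, precision: int = 2) -> float:
    """Coverage percentage, rounded down to `precision` decimal places."""
    scale = 10 ** precision
    # Integer floor division avoids floating-point surprises before scaling back.
    return (hits * 100 * scale // lines) / scale


base = {"lines": 96235, "hits": 53335, "misses": 42900}
head = {"lines": 96209, "hits": 53336, "misses": 42873}

for report in (base, head):
    # Every counted line is either a hit or a miss.
    assert report["hits"] + report["misses"] == report["lines"]

print(pct_round_down(base["hits"], base["lines"]))  # 55.42
print(pct_round_down(head["hits"], head["lines"]))  # 55.43

# Patch coverage: the per-file misses sum to the reported 97,
# and 97 misses at 13.39286% imply 112 patch lines, 15 of them covered.
patch_missing = 47 + 41 + 9
patch_lines = round(patch_missing / (1 - 0.1339286))
patch_hits = patch_lines - patch_missing
print(patch_missing, patch_lines, patch_hits)  # 97 112 15
print(round(patch_hits / patch_lines * 100, 5))  # 13.39286
```

The per-file rows account for 110 patch lines (50 + 50 + 10) and 13 hits, so the remaining 2 covered lines presumably sit in files that Codecov did not list because they have no missing lines.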


wawltor merged commit a139758 into PaddlePaddle:develop May 9, 2024
8 of 11 checks passed
DesmonDay added two commits to DesmonDay/PaddleNLP that referenced this pull request May 13, 2024, both with the following message:

* Fix sharding overlap bug
* [DistDataloader] Update implementation, add nested.py
* Fix pipeline
* add first try
* update dataloader

Co-authored-by: lugimzzz <zhenglujing@baidu.com>
DesmonDay added a commit that referenced this pull request May 13, 2024
* [DistDataloader] Update implementation, add nested.py (#8380)
* fix distdataloader, fix eval with dp group (#8420)
4 participants