-
Notifications
You must be signed in to change notification settings - Fork 31.3k
Make check_repository_consistency run faster by MP
#36175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Other scripts in (For completeness: A few months ago we had a conversation somewhere about adding multiprocessing or not. I remember the general consensus was to leave it as a last resort, as it may lead to other issues. Personally, I think the benefits outweigh the risks :D) |
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
|
Other scripts might be also addressed by multiprocessing, but it depends on if a script has a linear logic: |
|
@gante I just made it as an argument |
| with multiprocessing.Pool(4) as p: | ||
| outputs = p.map(compare_files, new_ordered_files) | ||
| for output in outputs: | ||
| non_matching_files += output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious couldn't we get unexpected results with multiprocessing here, for example
new_ordered_files = [modular_B(A), modular_C(B)]
With multiprocessing both of them would be converted "at the same time", but should be converted sequentially because C depends on the converted B model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new_ordered_files is computed sequentially. And the MP is only used (after new_ordered_files is computed) in the case where fix_and_overwrite=False, so there is no file change in this case :-)
|
|
||
| with multiprocessing.Pool(4) as p: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was just going though that quickly and noticed that 4 is hardcoded here. It is expected @ydshieh? Probably should use args.num_workers no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahhh.... I forget to update this line after introducing args.num_workers. Thank you for spotting !
What does this PR do?
It takes 3 minutes on
main. This PR simply usesmultiprocessingto make it faster.