testing multiprocessing for faster finds! #63
Signed-off-by: vsoch <vsoch@users.noreply.github.com>
holy crap @SuperKogito this went from 4-5 minutes to 44 seconds!! Major improvement!! https://github.com/USRSE/usrse.github.io/runs/5701139638?check_suite_focus=true
@SuperKogito I know you've been busy or not able to respond in the past - I've tested this (even via the action) on external repos, so I'm going to merge for the super speed-up, and we can open further issues/PRs if any new problems arise! I hope you are doing well!
I am really sorry for taking some time; I intended to look at it today but couldn't. Overall the structure is nice (kudos for the new class), and the improvements are great ;) I only wonder if this might hit a request rate limit when requests are simultaneous. However, since the automatic tests didn't fail, I think it is safe to merge.
Oh yay, you are around! I think we should actually be OK because URLs are parsed at the level of the file (so checks are unique within a process run), and if we hit a case of shared URLs across files, it will either still work off the bat or retry. I tested here and on our USRSE repo and saw no issues, so I think it's a huge improvement worth merging! But I thought about this, and if we do need to handle duplication across jobs, we could always parse the URLs first, deduplicate them, and then run the multiprocessing with no duplicates. That might further optimize things, actually - OK if I try this out and open another PR? And since I know you are around, I will indeed wait for your review this time!
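The dedup idea above could look something like this minimal sketch - `deduplicate_urls` and the shape of `urls_by_file` are illustrative assumptions, not the project's actual code:

```python
def deduplicate_urls(urls_by_file):
    """Flatten per-file URL lists into one ordered list of unique URLs.

    urls_by_file is a hypothetical mapping of file path -> list of URLs
    parsed from that file. A URL shared by several files is kept once,
    so workers never check the same URL twice.
    """
    seen = set()
    unique = []
    for urls in urls_by_file.values():
        for url in urls:
            if url not in seen:
                seen.add(url)
                unique.append(url)
    return unique


if __name__ == "__main__":
    urls = deduplicate_urls({
        "index.md": ["https://a.io", "https://b.io"],
        "jobs.md": ["https://b.io", "https://c.io"],
    })
    print(urls)
```

The unique list could then be chunked and handed to the multiprocessing workers, so duplication across files no longer matters.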
I agree, I think it is a great improvement 👏 I feel silly that we didn't think of it before 😝 ( just an idea )
okay - I'm on it! Agreed - let me get in a PR for optimizing the urls we check, and then let's rebase #52 and we can time the master branch against the same with async/aiohttp. If it's an improvement that is noticeable, it's definitely worth considering! If it's a trivial change then probably not worth the pain of async 😆 |
This might be a terrible idea, but in repos where we have a LOT of files to check, it's getting much slower to do the check. So here I'm going to test using multiprocessing, meaning we can check ~9 files in parallel. I'll try to open a custom action branch so I can test this on a repo I know is rather large (given it passes here of course!).
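The approach described above can be sketched with a `multiprocessing.Pool` - note that `check_file`, the file list, and the worker count here are illustrative placeholders, not the PR's actual implementation:

```python
from multiprocessing import Pool


def check_file(path):
    # Placeholder for the real per-file URL check: parse the file,
    # request each URL, and report whether all of them resolved.
    # Here we just return the path with a stubbed "passed" result.
    return (path, True)


def check_all(files, workers=9):
    # A pool of 9 workers mirrors the ~9 files checked in parallel
    # mentioned in the PR description; each file is an independent job.
    with Pool(processes=workers) as pool:
        return pool.map(check_file, files)


if __name__ == "__main__":
    results = check_all(["README.md", "docs/index.md", "pages/jobs.md"])
    print(results)
```

Because each file is checked in its own process, the total runtime approaches that of the slowest single file rather than the sum of all files.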