
testing multiprocessing for faster finds! #63


Merged: 2 commits merged into master on Mar 27, 2022

Conversation

vsoch
Copy link
Collaborator

@vsoch vsoch commented Mar 26, 2022

This might be a terrible idea, but in repos where we have a LOT of files to check, the run is getting much slower. So here I'm going to test using multiprocessing, meaning we can check ~9 files in parallel. I'll try to open a custom action branch so I can test this on a repo I know is rather large (given it passes here, of course!).

Signed-off-by: vsoch <vsoch@users.noreply.github.com>

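The per-file parallelism described above can be sketched roughly as follows. This is a minimal illustration, not urlchecker's actual code: `check_file` and its fake "check" are hypothetical stand-ins for the real per-file URL checking.

```python
from multiprocessing import Pool

def check_file(path):
    # Stand-in check: count the http(s) links in a file. The real checker
    # would extract each URL and issue a request (with retries).
    with open(path) as fh:
        urls = [w for w in fh.read().split() if w.startswith("http")]
    return path, len(urls)

def check_files_parallel(file_paths, workers=9):
    # ~9 workers mirrors the parallelism mentioned in the PR description;
    # each file is handed to a worker process and checked independently.
    with Pool(processes=workers) as pool:
        return dict(pool.map(check_file, file_paths))
```

Because files are checked independently, the work parallelizes cleanly, which is why the wall-clock time drops so sharply for large repos.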
@vsoch
Copy link
Collaborator Author

vsoch commented Mar 26, 2022

holy crap @SuperKogito this went from 4-5 minutes to 44 seconds!! Major improvement!! https://github.com/USRSE/usrse.github.io/runs/5701139638?check_suite_focus=true

@vsoch vsoch requested a review from SuperKogito March 26, 2022 04:44
@vsoch
Copy link
Collaborator Author

vsoch commented Mar 27, 2022

@SuperKogito I know you've been busy or not able to respond in the past - I've tested this (even via the action) on external repos, so I'm going to merge for the super speed up, and we can open further issues/PRs if anything new arises! I hope you are doing well!

@vsoch vsoch merged commit 1f9bd5c into master Mar 27, 2022
@vsoch vsoch deleted the test/multiprocessing branch March 27, 2022 21:37
@SuperKogito
Copy link
Member

I am really sorry for taking some time; I intended to look at it today but couldn't. Overall the structure is nice (kudos for the new class), and the improvements are great ;) I only wonder if this might hit some request limit when requests are simultaneous. However, since the automatic tests didn't fail, I think it is safe to merge.

@vsoch
Copy link
Collaborator Author

vsoch commented Mar 27, 2022

Oh yay, you are around! I think we should actually be OK because parsing happens at the level of the file (so checks are unique within a process run), and if we hit a case of shared URLs across files, it will either still work off the bat or retry. I tested here and on our USRSE repo and saw no issues, so I think it's a huge improvement worth it!

But I thought about this, and if we do need to handle duplication across jobs, we could always parse the URLs first, taking this into account, and then run the multiprocessing with no duplicates.

That might further optimize things, actually - ok if I try this out and open another PR? And since I know you are around, I will indeed wait for your review this time!
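The dedupe-first idea above could look something like this hypothetical sketch; the names here are illustrative, not urlchecker's API. Collect URLs per file, deduplicate across files, check each unique URL once, then map results back:

```python
def unique_urls(urls_by_file):
    """{filename: [url, ...]} -> sorted list of URLs with duplicates removed."""
    seen = set()
    for urls in urls_by_file.values():
        seen.update(urls)
    return sorted(seen)

def map_results_back(urls_by_file, results):
    """Given {url: passed} for the unique URLs, rebuild per-file verdicts."""
    return {f: {u: results[u] for u in urls} for f, urls in urls_by_file.items()}
```

The unique list from `unique_urls` is what the multiprocessing pool would consume, so no URL is ever requested twice in one run.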

@SuperKogito
Copy link
Member

SuperKogito commented Mar 27, 2022

I agree, I think it is a great improvement 👏 I feel silly that we didn't think of it before 😝
Yes, filtering out the duplicates before multiprocessing should make it even faster. Of course, give it a shot, and if it is stable enough let's merge ;)

(just an idea)
I think if you have lengthy checks and are looking for even faster processing, it might be worthwhile to revive #52, since that would reach the limits of acceleration. But asynchronous processing has its drawbacks, and I think if we ever pursue that, we will need to create different Python and action folders (say: urltechie_super_speed :p), because using it would force us to drop a lot of our useful generic features.

@vsoch
Copy link
Collaborator Author

vsoch commented Mar 27, 2022

okay - I'm on it!

Agreed - let me get in a PR for optimizing the urls we check, and then let's rebase #52 and we can time the master branch against the same with async/aiohttp. If it's an improvement that is noticeable, it's definitely worth considering! If it's a trivial change then probably not worth the pain of async 😆
