use of jobs:threads during upload #708

Open
satra opened this issue Jul 12, 2021 · 12 comments

Comments

@satra
Member

satra commented Jul 12, 2021

@jwodder - is there a limit on the number of jobs/threads during upload? when i use -J 16:10 for example, i see only 10 max uploads taking place. is there a max limit set based on number of procs available on a system?

@jwodder
Member

jwodder commented Jul 12, 2021

@satra I believe that, no matter what the number is set to, you're not going to get more than one thread per CPU at once.

@satra
Member Author

satra commented Jul 12, 2021

thank you @jwodder - the question is more about the total number of jobs running in dandi-cli.

this machine has 96 vCPUs, hence at most 96 concurrent threads. using other tools i was running over 9600 threads (obviously not executed at the same time). thus in dandi-cli, if i request 100 jobs with -J 100:1 or -J 100:10, i'd expect 100 uploads to start. is there some specific internal choice that the CLI is making that limits the number of uploads to 10 when i do -J 20:10?
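For readers following the `-J` values quoted above, here is a minimal sketch of how a `jobs:threads` string could be split into its two parts. It assumes the first number is the count of files handled in parallel and the optional second number is the count of threads per file. `parse_jobs` is a hypothetical helper written for illustration, not the dandi-cli implementation.

```python
from typing import Optional, Tuple


def parse_jobs(value: str) -> Tuple[int, Optional[int]]:
    """Split 'N' or 'N:M' into (jobs, threads_per_file).

    Hypothetical helper: assumes N is the number of parallel files and
    M (optional) is the number of threads used for each file.
    """
    jobs_part, sep, threads_part = value.partition(":")
    jobs = int(jobs_part)
    # Only parse the second field when a colon was actually given.
    threads = int(threads_part) if sep else None
    if jobs < 1 or (threads is not None and threads < 1):
        raise ValueError(f"jobs and threads must be positive: {value!r}")
    return jobs, threads


if __name__ == "__main__":
    for spec in ("16:10", "100:1", "20"):
        print(spec, "->", parse_jobs(spec))
```

Under this reading, `-J 100:10` asks for 100 parallel files with 10 threads each, so a ceiling of 10 concurrent uploads would have to come from somewhere other than the option value.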

@jwodder
Member

jwodder commented Jul 12, 2021

@satra I don't believe either dandi-cli or pyout (which controls the general upload threads) sets a maximum.

@satra
Member Author

satra commented Jul 12, 2021

> I don't believe either dandi-cli or pyout (which controls the general upload threads) sets a maximum.

then the question is how do you think i should debug this? it definitely does show number of uploads to be limited at 10 and the number of concurrent digesters to also be limited at 10.

@jwodder
Member

jwodder commented Jul 12, 2021

@satra This line limits the CLI to processing no more than 10 uploads at once.
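To illustrate the effect described here, the following is a rough sketch (not the dandi-cli code) of how a hard-coded cap on in-flight work can override a larger requested job count. `HARD_CAP`, `run_uploads`, and `fake_upload` are hypothetical names, and the upload itself is simulated with a short sleep.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

HARD_CAP = 10  # hypothetical hard-coded limit, mirroring the "10" in this thread


def run_uploads(n_files: int, jobs: int, hard_cap: int = HARD_CAP) -> int:
    """Simulate n_files uploads; return the peak number running at once."""
    slots = threading.Semaphore(hard_cap)  # one slot per allowed in-flight upload
    lock = threading.Lock()
    running = 0
    peak = 0

    def fake_upload() -> None:
        nonlocal running, peak
        try:
            with lock:
                running += 1
                peak = max(peak, running)
            time.sleep(0.05)  # stand-in for network I/O
            with lock:
                running -= 1
        finally:
            slots.release()  # free the slot so the next file can be submitted

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        for _ in range(n_files):
            slots.acquire()  # blocks once hard_cap uploads are in flight
            pool.submit(fake_upload)
    return peak


if __name__ == "__main__":
    print("jobs=20, cap=10 -> peak", run_uploads(40, jobs=20))
    print("jobs=20, cap=20 -> peak", run_uploads(40, jobs=20, hard_cap=20))
```

In this sketch the pool is sized from the requested job count, but the peak never exceeds `hard_cap`; raising the cap to match the job count is what lets the extra workers be used.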

@satra
Member Author

satra commented Jul 12, 2021

any particular reason for that? can we remove the limit?

@jwodder
Member

jwodder commented Jul 12, 2021

@satra I don't know; it was there before I started.

@satra
Member Author

satra commented Jul 12, 2021

@jwodder - ok. i'll hack it for now while i upload. @yarikoptic - any reasons why 10? or can that be updated to be the number of jobs?

@yarikoptic
Member

NB Yarik got a moment while waiting for a nice gas station fella to look at Yarik's poor minivan ;)

originally the limit was added in 04f980f (with no more than 6). I think it was largely due to the interaction with pyout (which in effect would now limit it to how many rows it could display on the screen), and back then we were talking about relatively large files, so going higher didn't make much sense anyways :-/ the whole upload logic is, I guess, yet to be refactored to separate the display (pyout or just pure json dumps etc) from the logic and to not rely on pyout's parallelization via threads.

@jwodder
Member

jwodder commented Jul 27, 2022

@yarikoptic @satra Is there anything to be done for this?

@satra
Member Author

satra commented Jul 27, 2022

has the pyout limitation been resolved, i.e. is the number of jobs:threads now decoupled from pyout limits? if i understand correctly, that was the piece limiting the number of jobs/threads. for example, can i use an arbitrary number of jobs now?

@jwodder
Member

jwodder commented Jul 27, 2022

@satra The upload code still "manually" enforces a limit of ten concurrent uploads.

@yarikoptic What, if anything, should be done about this?

3 participants