Skip to content

Parallel download #85

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 22 additions & 5 deletions src/entitysdk/client.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
"""Identifiable SDK client."""

import concurrent.futures
import io
import os
from pathlib import Path
Expand Down Expand Up @@ -349,8 +350,9 @@
output_path: os.PathLike,
project_context: ProjectContext | None = None,
ignore_directory_name: bool = False,
max_concurrent: int = 4,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make it serial by default to avoid surprises?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of having max_concurrent not use threads at all, in case it becomes a problem, but I don't forsee it being one (famous last words.)
I think doing it in parallel by default is worthwhile, personally.

Copy link
Collaborator

@GianlucaFicarelli GianlucaFicarelli Jun 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be aware, if we integrate this in a service using fastapi (obi-one), and we have N concurrent requests served by sync endpoints, then we'll create up to N*max_concurrent additional threads (besides the N needed by the fastapi pool, 40 by default according to https://www.starlette.io/threadpool/).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think that's a problem?
obi-one can limit the max_concurrent, if it wants

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but bringing a service to a halt/crash in order to realize that max_concurrent has to be lowered in download_directory is not great, is it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be surprised if threads doing downloads is what pushes it over the edge; assuming all 40 threads are spawning another 4, that's a limit of 160 - which is high, but not unheard of.

) -> list[Path]:
"""List directory existing entity's endpoint from a directory path."""
"""Download directory of assets."""
output_path = Path(output_path)

if output_path.exists() and output_path.is_file():
Expand Down Expand Up @@ -386,9 +388,8 @@
project_context=project_context,
)

paths = []
for path in contents.files:
paths.append(
if max_concurrent == 1:
paths = [

Check warning on line 392 in src/entitysdk/client.py

View check run for this annotation

Codecov / codecov/patch

src/entitysdk/client.py#L392

Added line #L392 was not covered by tests
self.download_file(
entity_id=entity_id,
entity_type=entity_type,
Expand All @@ -397,7 +398,23 @@
asset_path=path,
project_context=context,
)
)
for path in contents.files
]
else:
with concurrent.futures.ThreadPoolExecutor(max_workers=max_concurrent) as executor:
futures = [
executor.submit(
self.download_file,
entity_id=entity_id,
entity_type=entity_type,
asset_id=asset if asset else asset_id,
output_path=output_path / path,
asset_path=path,
project_context=context,
)
for path in contents.files
]
paths = [future.result() for future in futures]

return paths

Expand Down