LitData doesn't support s3 bucket connection outside server #183

Open
sanyalsunny111 opened this issue Jun 25, 2024 · 11 comments
Labels: enhancement, help wanted

Comments

@sanyalsunny111

sanyalsunny111 commented Jun 25, 2024

🚀 Feature

LitData should support S3 bucket connections for streaming data from outside the same server.

Motivation

Currently, LitData supports S3 bucket connections from within the public prod server but not from outside it, for instance from a GCP server.

Additional context

Sebastian and Adrian motivated me to raise this issue.

sanyalsunny111 added the enhancement and help wanted labels Jun 25, 2024

Hi! Thanks for your contribution! Great first issue!

@tchaton
Collaborator

tchaton commented Jun 25, 2024

Hey @sanyalsunny111,

I am not sure I fully understand the issue.

@rasbt
Contributor

rasbt commented Jun 25, 2024

@sanyalsunny111 Could you provide concrete code snippets and file paths (and Studio names) so that @tchaton has a concrete example to follow?

@sanyalsunny111
Author

Acknowledged, I will do it shortly.

@sanyalsunny111
Author

@tchaton So, a dataset is uploaded to a publicly accessible S3 bucket and also to the data prep of a teamspace. I have tried accessing this data using the Studio's public prod profile. However, when I try to use the same data through S3 (yes, I have configured it through the AWS CLI) or through the teamspace, I cannot access it. Below is a screenshot where it is asking for an access key.

[Screenshot: prompt asking for an access key]

@tchaton
Collaborator

tchaton commented Jun 27, 2024

Hey @sanyalsunny111. Can you share a reproducible script?

@sanyalsunny111
Author

Sure @tchaton, I am using litgpt with no changes. Here is a Loom video I recorded:
https://www.loom.com/share/5b55bc4c23e3403ea3257cdf34ceab2e?sid=761c670b-d52d-465e-bafe-d86be5d239cb

@tchaton
Collaborator

tchaton commented Jun 27, 2024

Hey @sanyalsunny111. Is there a Studio I can duplicate?

@sanyalsunny111
Author

Here: /thunder/Experiments-Sunny2024

@sanyalsunny111
Author

@tchaton Luca made some modifications and it is working fine for me now, so I thought I would update you. He changed the lines below in /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/litdata/streaming/client.py:

# "or True" forces this branch: always build an unsigned (anonymous) S3 client.
if has_shared_credentials_file or not _IS_IN_STUDIO or True:
    self._client = boto3.client(
        "s3",
        config=botocore.config.Config(
            retries={"max_attempts": 1000, "mode": "adaptive"}, signature_version=botocore.UNSIGNED
        ),
    )
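
For anyone who wants to check the same idea outside of LitData, here is a minimal standalone sketch of an unsigned (anonymous) boto3 client. It is not LitData's implementation. The helper names `parse_s3_url`, `make_anonymous_s3_client`, and `download_public_object`, the default region, the retry count, and the example bucket and key are all assumptions for illustration. It needs boto3 installed and only works for objects that are publicly readable.

```python
from urllib.parse import urlparse


def parse_s3_url(url):
    """Split an s3://bucket/key URL into (bucket, key). Hypothetical helper."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URL: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def make_anonymous_s3_client(region_name="us-east-1"):
    """Build an S3 client that does not sign requests (no credentials needed).

    The region default is an assumption; use the region of your bucket.
    """
    # Imported here so the URL helper above works without boto3 installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    config = Config(
        signature_version=UNSIGNED,  # skip request signing entirely
        retries={"max_attempts": 10, "mode": "adaptive"},
    )
    return boto3.client("s3", region_name=region_name, config=config)


def download_public_object(url, destination):
    """Download one publicly readable object to a local file."""
    bucket, key = parse_s3_url(url)
    make_anonymous_s3_client().download_file(bucket, key, destination)
```

With a hypothetical public bucket, usage would look like `download_public_object("s3://my-public-bucket/data/index.json", "index.json")`. If this succeeds from a machine with no AWS credentials, the bucket itself is reachable anonymously, and the access key prompt comes from the client trying to sign requests.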

@tchaton
Collaborator

tchaton commented Jun 30, 2024

Hey @sanyalsunny111. Can you make a PR with the fix?
