fix: boto3 session options #604

Merged
merged 4 commits into from
May 24, 2025

Conversation

deependujha
Collaborator

@deependujha deependujha commented May 22, 2025

Before submitting
  • Was this discussed/agreed via a GitHub issue? (not needed for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Fixes #603

Adds support for passing s3_session_options when creating the S3 client.

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@deependujha
Collaborator Author

deependujha commented May 22, 2025

Hi @robmarkcole

Can you try with this PR, setting:

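import litdata as ld

s3_uri = "s3://my-dummy-bucket-litdata/simple_data/"

# s3_session_options is used when creating the boto3 session (added by this PR)
ds = ld.StreamingDataset(s3_uri, s3_session_options={"profile_name": "default"})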

storage_options: the options needed when creating the client.
s3_session_options: the options needed when creating the session (specific to S3). In other cases they are ignored.
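
A minimal sketch of how the two dictionaries could map onto boto3, assuming session options go to boto3.Session(...) and storage options go to session.client("s3", ...). The helper names split_s3_options and make_s3_client are hypothetical and are not litdata's actual implementation.

```python
from typing import Any, Dict, Optional, Tuple


def split_s3_options(
    storage_options: Optional[Dict[str, Any]] = None,
    s3_session_options: Optional[Dict[str, Any]] = None,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (session_kwargs, client_kwargs) as independent copies."""
    session_kwargs = dict(s3_session_options or {})
    client_kwargs = dict(storage_options or {})
    return session_kwargs, client_kwargs


def make_s3_client(storage_options=None, s3_session_options=None):
    """Hypothetical helper: build an S3 client from both option dicts."""
    import boto3  # imported lazily so split_s3_options works without boto3

    session_kwargs, client_kwargs = split_s3_options(storage_options, s3_session_options)
    session = boto3.Session(**session_kwargs)  # e.g. profile_name, region_name
    return session.client("s3", **client_kwargs)  # e.g. endpoint_url, config
```

Keeping the two dictionaries separate avoids passing a session-only key such as profile_name to the client constructor, which does not accept it.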

Make sure to either uninstall s5cmd, or set DISABLE_S5CMD=1
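
If you prefer to set the variable from Python, a sketch like the following may work. It assumes (not confirmed in this thread) that litdata reads DISABLE_S5CMD at import time, so the variable is set before litdata is imported.

```python
import os
import shutil

# Assumption: litdata checks DISABLE_S5CMD when it is imported, so set it first.
os.environ["DISABLE_S5CMD"] = "1"

# shutil.which returns None when s5cmd is not on PATH (i.e. not installed).
s5cmd_path = shutil.which("s5cmd")
print("s5cmd on PATH:", s5cmd_path is not None)

# import litdata as ld  # import only after the variable is set
```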

codecov bot commented May 22, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79%. Comparing base (7194311) to head (2b0b816).
Report is 1 commits behind head on main.

Additional details and impacted files
@@         Coverage Diff         @@
##           main   #604   +/-   ##
===================================
  Coverage    79%    79%           
===================================
  Files        41     41           
  Lines      6138   6143    +5     
===================================
+ Hits       4838   4843    +5     
  Misses     1300   1300           

@robmarkcole
Contributor

robmarkcole commented May 22, 2025

@deependujha now my test script returns botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden

⚡ ~ python test_litdata.py
Traceback (most recent call last):
  File "/teamspace/studios/this_studio/test_litdata.py", line 3, in <module>
    dataset = StreamingDataset(
              ^^^^^^^^^^^^^^^^^
  File "/teamspace/studios/this_studio/litdata/src/litdata/streaming/dataset.py", line 125, in __init__
    self.subsampled_files, self.region_of_interest = subsample_streaming_dataset(
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/teamspace/studios/this_studio/litdata/src/litdata/utilities/dataset_utilities.py", line 46, in subsample_streaming_dataset
    cache_path = _try_create_cache_dir(
                 ^^^^^^^^^^^^^^^^^^^^^^
  File "/teamspace/studios/this_studio/litdata/src/litdata/utilities/dataset_utilities.py", line 224, in _try_create_cache_dir
    updated_at = _read_updated_at(resolved_input_dir, storage_options, s3_session_options, index_path)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/teamspace/studios/this_studio/litdata/src/litdata/utilities/dataset_utilities.py", line 167, in _read_updated_at
    downloader.download_file(os.path.join(input_dir.url, _INDEX_FILENAME), temp_index_filepath)
  File "/teamspace/studios/this_studio/litdata/src/litdata/streaming/downloader.py", line 160, in download_file
    self._client.client.download_file(
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.11/site-packages/boto3/s3/inject.py", line 192, in download_file
    return transfer.download_file(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.11/site-packages/boto3/s3/transfer.py", line 406, in download_file
    future.result()
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.11/site-packages/s3transfer/futures.py", line 103, in result
    return self._coordinator.result()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.11/site-packages/s3transfer/futures.py", line 264, in result
    raise self._exception
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.11/site-packages/s3transfer/tasks.py", line 265, in _main
    self._submit(transfer_future=transfer_future, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.11/site-packages/s3transfer/download.py", line 352, in _submit
    response = client.head_object(
               ^^^^^^^^^^^^^^^^^^^
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.11/site-packages/botocore/client.py", line 569, in _api_call
    return self._make_api_call(operation_name, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.11/site-packages/botocore/client.py", line 1023, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
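
For debugging errors like the one above, a small helper can turn the error code into a hint. This is a sketch only: explain_s3_error is a hypothetical name, the hints are generic suggestions, and the bucket and key in the usage comment are placeholders. It relies on the response dict that botocore attaches to ClientError.

```python
from typing import Any, Dict


def explain_s3_error(response: Dict[str, Any]) -> str:
    """Map the `response` dict of a botocore ClientError to a short hint."""
    code = str(response.get("Error", {}).get("Code", ""))
    if code in ("403", "AccessDenied"):
        return "forbidden: check which credentials or profile the client resolved"
    if code in ("404", "NoSuchKey", "NotFound"):
        return "not found: check the bucket name and key prefix"
    return f"unhandled error code: {code or 'unknown'}"


# Usage with botocore (not executed here; bucket and key are placeholders):
# try:
#     client.head_object(Bucket="my-bucket", Key="index.json")
# except botocore.exceptions.ClientError as err:
#     print(explain_s3_error(err.response))
```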

@deependujha
Collaborator Author

Can you provide the traceback?

@deependujha
Collaborator Author

deependujha commented May 22, 2025

btw, can you try using the AWS CLI to check whether things actually work, like downloading or listing files? This is weird.

@robmarkcole
Contributor

@deependujha the CLI has no issues, it's boto3 not picking up the SSO creds somehow

⚡ ~ aws s3 ls s3://bucketi/datasets/v3/
                           PRE test/
                           PRE train/
                           PRE val/

@deependujha
Collaborator Author

from chatgpt:


🧠 TL;DR

Your AWS CLI works fine because it supports AWS SSO out of the box, but boto3 isn't picking up the SSO credentials automatically unless extra setup is done. That’s why you're seeing:

boto3 → ❌ 403 Forbidden
aws cli → ✅ Works fine


🪵 What's really going on?

When you use SSO-based profiles (credential_process or sso_start_url in your AWS config), the AWS CLI knows how to refresh and cache tokens.

But:

🔥 boto3.Session(profile_name="your-sso-profile") does not log in to SSO for you. It only works with an SSO profile if:

  • you have explicitly run aws sso login --profile your-sso-profile first, and
  • boto3 is launched in an environment where the cached SSO token can be found.

✅ How to Fix It

✅ 1. Make sure you've run:

aws sso login --profile your-sso-profile

This will cache your SSO token at:

~/.aws/sso/cache/*.json

✅ 2. Use code like this so boto3 picks up the SSO-based profile:

import boto3

session = boto3.Session(profile_name="your-sso-profile")
s3 = session.client("s3")

# Try listing to see if it works
response = s3.list_objects_v2(Bucket="bucketi", Prefix="datasets/v3/")
print(response)

💡 Pro Tip

If you're running this from an environment like:

  • VSCode terminals
  • Notebooks
  • Lambda
  • Docker
  • Background services

You must ensure the environment has access to the local AWS SSO cache files.
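
One way to check whether the current environment can see a usable SSO cache is to scan the cache directory directly. The sketch below assumes each cache file is JSON with an expiresAt timestamp ending in "Z" or "UTC"; the exact file layout is an assumption, and unexpired_sso_tokens is a hypothetical helper name.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional


def parse_expiry(value: str) -> datetime:
    """Parse an expiry timestamp, treating it as UTC (assumed formats)."""
    cleaned = value.replace("UTC", "").rstrip("Z")
    return datetime.fromisoformat(cleaned).replace(tzinfo=timezone.utc)


def unexpired_sso_tokens(cache_dir: Path, now: Optional[datetime] = None) -> List[str]:
    """Return names of cache files whose expiresAt is still in the future."""
    now = now or datetime.now(timezone.utc)
    valid = []
    for path in sorted(cache_dir.glob("*.json")):
        try:
            data = json.loads(path.read_text())
            expires = data.get("expiresAt") if isinstance(data, dict) else None
            if expires and parse_expiry(expires) > now:
                valid.append(path.name)
        except (OSError, ValueError):
            continue  # unreadable file, invalid JSON, or unparsable timestamp
    return valid


# Usage against the default cache location:
# print(unexpired_sso_tokens(Path.home() / ".aws" / "sso" / "cache"))
```

An empty result suggests running aws sso login again, or that the process runs under a different home directory than the one where you logged in.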


🧪 Confirm Boto3 is using the right profile

import boto3

session = boto3.Session(profile_name="your-sso-profile")
creds = session.get_credentials()  # None when no credentials are resolved
print(creds.get_frozen_credentials() if creds else "no credentials found")

This should show your access key, secret, and token — if it doesn't, boto3 isn’t picking up the creds.

@deependujha deependujha changed the title fix: boto session options fix: boto3 session options May 22, 2025
@robmarkcole
Contributor

@deependujha I've another test script which includes

import boto3

session = boto3.Session(profile_name='my_profile')

this one IS successful. As I say, for some reason litdata is not using the profile

@deependujha
Collaborator Author

I tested it on Lightning Studio with a profile and the updated condition. It works for me.

Sorry for the inconvenience. Can I get a status update on this?

@robmarkcole
Contributor

Same error: An error occurred (404) when calling the HeadObject operation: Not Found. Did you test on a profile that uses SSO?
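
To check whether a given profile is SSO-based before testing, the shared AWS config can be inspected. This is a sketch under the assumption that an SSO profile carries an sso_start_url or sso_session key; profile_uses_sso is a hypothetical helper and the profile name in the usage comment is taken from the test script above.

```python
import configparser


def profile_uses_sso(config_text: str, profile: str) -> bool:
    """Return True if the named profile in AWS config text has SSO settings."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    # The shared config file names sections "[profile NAME]", except "[default]".
    section = "default" if profile == "default" else f"profile {profile}"
    if not parser.has_section(section):
        return False
    return any(key in parser[section] for key in ("sso_start_url", "sso_session"))


# Usage against the conventional config location:
# from pathlib import Path
# text = (Path.home() / ".aws" / "config").read_text()
# print(profile_uses_sso(text, "my_profile"))
```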

@robmarkcole
Contributor

@tchaton I assume you have a company AWS account and could test out SSO auth?

Collaborator

@tchaton tchaton left a comment

Can we add an example?

@tchaton tchaton merged commit 0f4f9b2 into Lightning-AI:main May 24, 2025
32 checks passed
@deependujha deependujha deleted the fix/s3-session-options branch May 24, 2025 10:00
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidd
4 participants