adlfs user agent #501

anjaliratnam-msft · 2025-06-23T18:08:32Z

Added adlfs to the user agent to differentiate between adlfs and azure python sdk usage.

Request headers example:
'x-ms-version': 'REDACTED'
'Accept': 'application/xml'
'User-Agent': 'adlfs/2024.12.0 azsdk-python-storage-blob/12.25.1 Python/3.13.5 (Windows-11-10.0.26100-SP0)'
'x-ms-date': 'REDACTED'
'x-ms-client-request-id': '0fbad80d-5059-11f0-9c34-000d3a6d20b2'
'Authorization': 'REDACTED'

kyleknap

Looks good. I just had a couple of suggestions.

kyleknap · 2025-06-25T22:59:15Z

adlfs/spec.py

@@ -2055,21 +2062,26 @@ def connect_client(self):
                        account_url=self.fs.account_url,
                        credential=cred,
                        _location_mode=self.fs.location_mode,
+                        user_agent=self._user_agent,


Ah I missed that we need to set the user_agent in this class as well as in the file system class when we talked about it offline...

Given that we need to duplicate this logic across classes along with if branch statements, it is probably worth we try to consolidate client creation to module level helper function at this point. To do this, let's:

Define a top-level _USER_AGENT that we can use as the constant for the adlfs user agent.

Define a top-level _create_aio_blob_service_client() function that requires an account_url and optional accepts credential and location_mode. It will use _USER_AGENT as part of creating the client

Define a top-level _create_aio_blob_service_client_from_connection_string() function that requires just a connection_string. It will also set _USER_AGENT as part of creating the client.

I think in making this refactor it is going to much more manageable adding the user agent in and plumb in more configurations in the future.

kyleknap · 2025-06-25T23:05:48Z

adlfs/tests/test_spec.py

@@ -2045,3 +2048,15 @@ def test_open_file_x(storage: azure.storage.blob.BlobServiceClient, tmpdir):
        with fs.open("data/afile", "xb") as f:
            pass
    assert fs.cat_file("data/afile") == b"data"
+
+
+def test_user_agent(storage: azure.storage.blob.BlobServiceClient, mocker):


It would be great if we could expand + parameterize this test case to assert user agent being set for:

Both the file system and file classes

Both client creation approaches i.e., using from_connection_string and using the initializer

I think this is important given the amount of branching logic when it comes to client creation so that there is coverage for the various ways clients are created in adlfs

kyleknap

This is looking good. I mainly focused on the implementation. I still want to take a deeper pass on the tests to understand how the scaffolding and fixtures are set up with adlfs as part of reviewing the PR's tests.

kyleknap · 2025-07-09T17:36:40Z

adlfs/spec.py

+    if credential is not None:
+        return AIOBlobServiceClient(
+            account_url=account_url,
+            credential=credential,
+            _location_mode=location_mode,
+            user_agent=_get_user_agent(),
+        )
+    elif location_mode is not None:
+        return AIOBlobServiceClient(
+            account_url=account_url,
+            credential=None,
+            _location_mode=location_mode,
+            user_agent=_get_user_agent(),
+        )
+    else:
+        return AIOBlobServiceClient(
+            account_url=account_url,
+            user_agent=_get_user_agent(),
+        )


For this factory, it would be interesting to see if we can consolidate the actual client instantiation to one instance. To do this we should be able to leverage keyword arguments, for example:

service_client_kwargs = { "account_url": account_url, "user_agent": _get_user_agent(), } if credential is not None: service_client_kwargs["credentials"] = credentials if location_mode is not None: service_client_kwargs["_location_mode"] = location_mode return AIOBlobServiceClient(**service_client_kwargs)

The main reason that I'm suggesting we do this is it reduces the verbosity and makes it easier to follow/add to the logic when there is only one path to instantiating the client.

kyleknap · 2025-07-09T17:41:30Z

adlfs/spec.py

+def _create_aio_blob_service_client(
+    account_url: str,
+    location_mode: Optional[str] = None,
+    credential: Optional[str] = None,


Hmm technically this is not the correct type; it can be more than a string. But it looks like the initializer for the filesystem use this type as well. I think it is fine to keep it as is but it may be worth doing a pass at a later time to add correct typing to the project and integrate mypy into the CI. Mainly the value add is add another level of safety when adding to the project and it allows consumers to use correct types when consuming adlfs.

kyleknap · 2025-07-09T17:46:18Z

adlfs/spec.py

@@ -73,6 +73,12 @@
 _SOCKET_TIMEOUT_DEFAULT = object()


+def _get_user_agent():


In order to break the circular loop here, I think it would be worth moving this logic: https://github.com/anjaliratnam-msft/adlfs/blob/f64dff2b2ce62dd2d194e0be02804104b4505d7d/adlfs/__init__.py#L6-L11 to the utils.py module and then have the __init__.py import __version__ and version_tuple and this module can import __version__. Then we can go back to having a module level _USER_AGENT constant.

Generally, a utils.py module will have all of the shared code and not have any intra project imports which is usually a safe approach to avoiding import errors.

kyleknap · 2025-07-09T17:54:19Z

adlfs/tests/test_spec.py

+    assert mock_client.call_args.kwargs["user_agent"] == f"adlfs/{__version__}"
+
+
+@patch("adlfs.spec.AIOBlobServiceClient")


Could you elaborate on why this patch outside of the mocker is needed? I'm not sure how well the patch will behave when we patch the class here and then patch the from_connection_string of the already patched class later on. I'd assume we'd only need one of the patches for the purpose introspection and mocking.

When I run it on its own the test passes but when I run the whole test directory it fails without patching the class. For some reason it's only this test that's causing issues. I'm pretty sure it's happening because the AIOBlobServiceClient class is getting patched in another test and it's affecting this, but I couldn't figure out where it's coming from.

Yeah that sounds like a patch that is not properly being cleaned up. I'll take a look to see if I can find anything

Actually, in terms of state remaining, we may be running up against the caching feature of fsspec: https://filesystem-spec.readthedocs.io/en/latest/features.html#instance-caching. For any of the calls we make to instantiating an AzureBlobFileSystem, we should probably throw in an skip_instance_cache=True so that we actually create a new file system and try to recreate a new service client.

kyleknap

Looks good. Just leaving comments related to the tests.

kyleknap · 2025-07-14T20:28:24Z

adlfs/tests/test_spec.py

@@ -2045,3 +2047,100 @@ def test_open_file_x(storage: azure.storage.blob.BlobServiceClient, tmpdir):
        with fs.open("data/afile", "xb") as f:
            pass
    assert fs.cat_file("data/afile") == b"data"
+
+
+def test_user_agent_blob_file_connection_str(


For all of these test cases that we are adding, I'm thinking we just move them to a test_user_agent.py file. This follows the pattern of other features in the project that have several related test cases. This is beneficial in that it will:

Reduce adding to the cluttering of test_spec.py

Makes it easier to run just the user agent tests (e.g. py.test test_user_agent.py)

kyleknap · 2025-07-14T20:30:04Z

adlfs/tests/test_spec.py

+    mock_client.assert_called_once()
+    assert "user_agent" in mock_client.call_args.kwargs
+    assert mock_client.call_args.kwargs["user_agent"] == f"adlfs/{__version__}"


It would probably make sense to consolidate these assertions to a helper function (e.g., assert_sets_adlfs_user_agent) and we can just provide it the mock client method whether it be the mock initializer or the from_connection_string mock.

kyleknap · 2025-07-14T20:37:17Z

adlfs/tests/test_spec.py

+    mock_client = mocker.patch.object(
+        AIOBlobServiceClient, "from_connection_string", return_value=mocker.MagicMock()
+    )


Playing around with these mocks, I think it would be worth just having two new patch fixtures: one for the __init__ and one for the from_connection_string() method e.g.,:

@pytest.fixture() def mock_from_connection_string(mocker): return mocker.patch.object( AIOBlobServiceClient, "from_connection_string", autospec=True, side_effect=AIOBlobServiceClient.from_connection_string ) @pytest.fixture() def mock_service_client_init(mocker): return mocker.patch.object( AIOBlobServiceClient, "__init__", autospec=True, side_effect=AIOBlobServiceClient.__init__ )

This is a pattern used in another test case in test_spec.py where it effective will allow us to spy the arguments passed in but delegated to the actual method. Right now, in patching the entire class or building up mocks seems like a lot of scaffolding would be required to be able to get a test passing and I'm not sure if the scaffolding makes sense when we are already leveraging Azurite for the fake. With this approach, we should just be able to get what we want, which is capturing what was passed to the various class instantiation methods.

kyleknap · 2025-07-14T20:41:43Z

adlfs/tests/test_spec.py

+    fs = AzureBlobFileSystem(
+        account_name=storage.account_name,
+    )
+    AzureBlobFile(fs, "data/root/a/file.txt", mode="rb")


For this test case, I'm thinking we either do the following to test the case where we are not using from_connection_string():

Try to generate a SAS and provide it as a credential to the file and file system to make Azurite's authentication happy

Create a new container that allows anonymous reads in Azurite that we reference instead of the data container.

The main reason that I think we should explore that approach is that because we are already using Azurite, it seems ideal to try to avoid building up mocks/fakes if we can avoid it to reduce the overall amount of scaffolding needed.

kyleknap · 2025-07-14T20:42:43Z

adlfs/spec.py

@@ -73,6 +73,12 @@
 _SOCKET_TIMEOUT_DEFAULT = object()


We should also make sure to add a changelog entry for this update here: https://github.com/fsspec/adlfs/blob/main/CHANGELOG.md

kyleknap · 2025-07-14T20:44:35Z

adlfs/tests/test_spec.py

+        return_value=mock_client_instance,
+    )
+
+    AzureBlobFileSystem(account_name=storage.account_name, connection_string=CONN_STR)


Whenever creating an AzureBlobFileSystem, we should definitely be setting skip_instance_cache=True so that we make sure we always construct a new file system for these cases around instantiation

kyleknap

It's looking good. Just had some more comments with the most focus on the tests, but we should be getting close here.

kyleknap · 2025-07-21T22:59:53Z

adlfs/tests/test_user_agent.py

+_USER_AGENT = f"adlfs/{__version__}"
+
+
+def assert_sets_adlfs_user_agent(mock_client):


It might be good to rename mock_client to mock_client_create_method to better indicate it accepts mocks of client creation methods as opposed to a mock of the client class.

kyleknap · 2025-07-21T23:04:33Z

adlfs/tests/test_user_agent.py

+
+
+def assert_sets_adlfs_user_agent(mock_client):
+    mock_client.assert_called_once()


It would be good to double check whether our test cases are actually hitting the AzureBlobFile.connect_client() method and if we are actually asserting against the client creation calls in the AzureBlobFile class. Mainly looking through the adlfs logic, it seems like for the AzureBlobFileSystem, it will always create a client and I'm curious if we are ever creating a client in the AzureBlobFile class and actually have coverage for the lines we updated in that class we changed.

kyleknap · 2025-07-21T23:05:35Z

adlfs/tests/test_user_agent.py

+URL = "http://127.0.0.1:10000"
+ACCOUNT_NAME = "devstoreaccount1"
+KEY = "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw=="  # NOQA
+CONN_STR = f"DefaultEndpointsProtocol=http;AccountName={ACCOUNT_NAME};AccountKey={KEY};BlobEndpoint={URL}/{ACCOUNT_NAME};"  # NOQA
+DEFAULT_VERSION_ID = "1970-01-01T00:00:00.0000000Z"
+LATEST_VERSION_ID = "2022-01-01T00:00:00.0000000Z"


It might make sense to just create a constants.py in the tests/ directory that the conftest.py and the other test files can import from to reduce the amount of copies we have of these settings.

kyleknap · 2025-07-21T23:07:25Z

adlfs/tests/test_user_agent.py

+def assert_sets_adlfs_user_agent(mock_client):
+    mock_client.assert_called_once()
+    assert "user_agent" in mock_client.call_args.kwargs
+    assert mock_client.call_args.kwargs["user_agent"] == _USER_AGENT


We can probably assert directly against f"adlfs/{__version__}" instead of creating the module constant _USER_AGENT since there is only one instance where we use the user agent in this file.

kyleknap · 2025-07-21T23:09:10Z

CHANGELOG.md

@@ -3,7 +3,7 @@
 Unreleased
 ----------

- .
+- Add adlfs user agent


Let's expand this out to be a little more specific of how the user agent changed to help users understand the change better. Maybe something like?

Added "adlfs" to library's default user agent

kyleknap · 2025-07-21T23:11:58Z

adlfs/tests/test_user_agent.py

+@pytest.fixture(scope="function")
+def mock_container(host):
+    conn_str = f"DefaultEndpointsProtocol=http;AccountName={ACCOUNT_NAME};AccountKey={KEY};BlobEndpoint={URL}/{ACCOUNT_NAME};"  # NOQA
+
+    bbs = BlobServiceClient.from_connection_string(conn_str=conn_str)
+    if "data2" not in [c["name"] for c in bbs.list_containers()]:
+        bbs.create_container("data2", public_access=PublicAccess.Container)
+    container_client = bbs.get_container_client(container="data2")
+    bbs.insert_time = datetime.datetime.now(tz=datetime.timezone.utc).replace(
+        microsecond=0
+    )
+    container_client.upload_blob("root/a/file.txt", b"0123456789")
+
+    yield bbs
+
+    bbs.delete_container("data2")


Do we need this fixture if we are using connection strings and account keys in all of these tests? I'm assuming we'd only need this if we were using anonymous credentials.

kyleknap · 2025-07-21T23:24:17Z

adlfs/tests/test_user_agent.py

+    AzureBlobFile(fs, "data2/root/a/file.txt", mode="rb")
+    assert_sets_adlfs_user_agent(mock_service_client_init)


Instead of updating the library code to support http, another work around that I found is instead of rb we can leverage wb mode to delay making API calls at initializer time e.g.:

f = AzureBlobFile(fs, "data2/root/a/file.txt", mode="wb") assert_sets_adlfs_user_agent(mock_service_client_init) # Probably should add comment on why we close it here f.closed = True

The sort of hacky part is at the end where, if we set the file-like object closed to True it will make sure we do not make any API calls as part of garbage collection when finalizing the blob and cause the test to hang because of ssl connection issues. We should make sure to add a comment explaining that if we adopt this approach.

Unfortunately, it does not look like there are any easy work arounds to be able to avoid SSL errors when working with Azurite and the longer term options would be much larger scope than the intention of this PR, but something we could consider later on:

Run Azurite using HTTPS. This would require us setting up certs for the azurite container

Support HTTP outside of connection strings. I think this is something we can consider, but should be hesitant unless there is a strong reason because of the security implications in not using HTTPS.

kyleknap

Looks good. Just had one more comment on the test cases.

kyleknap · 2025-07-22T22:00:28Z

adlfs/tests/test_user_agent.py

+
+def assert_sets_adlfs_user_agent(mock_client_create_method):
+    mock_client_create_method.assert_called()
+    assert "user_agent" in mock_client_create_method.call_args.kwargs


The call_args property returns only the last call to the mock. So, we may still have the problem where we are only asserting the call on the client created in the file system instead of the file object if the file object just reuses the client from the file system. Instead, we should probably:

Leverage call_args_list to ensure all calls made include the user agent.

Add an expected_call_count with a default of 1 to this helper assertion so that for the file-related tests we can assert that the client was created twice; once in the filesystem and once in the file-like object as there is no way to avoid avoid creating the client in the file system object.

kyleknap · 2025-07-23T18:14:44Z

adlfs/spec.py

+    }
+    if credential is not None:
+        service_client_kwargs["credential"] = credential
+    elif location_mode is not None:


Something really subtle that we missed and I found out after pulling down the PR and trying it locally. This elif needs to be an if. Otherwise, the location_mode does not get passed in when using credentials.

I think it is worth probably adding a test cases for location_mode, but I'm fine in making that a follow up PR after this one is merged. Specifically, I'm think we can:

Rename the test_user_agent.py test suite to test_connect_client.py

Add parameterized tests to make sure that location_mode is respected for only SAS, credential, and account keys using the same mocks and repurposing the assertion method used for the user agent.

kyleknap

Looks good! 🚢 @martindurant this should be set ready for a look on your end. Let us know if you have any questions. A significant portion of the changes is just around updating the tests since there were not many existing test cases around constructing an SDK client. The test cases added focused around setting the user agent, but we plan to follow up with more cases around how adlfs passes parameters to SDK client constructors.

kyleknap reviewed Jun 25, 2025

View reviewed changes

anjaliratnam-msft force-pushed the add_user_agent branch from 1a52ede to 0b9ec60 Compare July 1, 2025 15:57

kyleknap reviewed Jul 9, 2025

View reviewed changes

kyleknap reviewed Jul 14, 2025

View reviewed changes

anjaliratnam-msft added 6 commits July 21, 2025 15:00

added adlfs user agent

9f890a5

fixed test and flake8 errors

d261085

pre-commit fixes

871890c

test/linting fixes

9002f5e

test fixes

eec8b18

updates

f386454

anjaliratnam-msft force-pushed the add_user_agent branch from 634d28b to f386454 Compare July 21, 2025 22:02

kyleknap reviewed Jul 21, 2025

View reviewed changes

kyleknap mentioned this pull request Jul 21, 2025

Refactor do_connect for better logging #504

Open

anjaliratnam-msft added 2 commits July 22, 2025 11:36

updates

c5b799e

linting error

2b93c24

kyleknap reviewed Jul 22, 2025

View reviewed changes

updates

273b32c

kyleknap reviewed Jul 23, 2025

View reviewed changes

updates

fd7275b

kyleknap reviewed Jul 23, 2025

View reviewed changes

kyleknap approved these changes Jul 23, 2025

View reviewed changes

		@@ -73,6 +73,12 @@
		_SOCKET_TIMEOUT_DEFAULT = object()


		def _get_user_agent():

		assert mock_client.call_args.kwargs["user_agent"] == f"adlfs/{__version__}"


		@patch("adlfs.spec.AIOBlobServiceClient")

		_USER_AGENT = f"adlfs/{__version__}"


		def assert_sets_adlfs_user_agent(mock_client):



		def assert_sets_adlfs_user_agent(mock_client):
		mock_client.assert_called_once()

		AzureBlobFile(fs, "data2/root/a/file.txt", mode="rb")
		assert_sets_adlfs_user_agent(mock_service_client_init)

adlfs user agent #501

Are you sure you want to change the base?

adlfs user agent #501

Uh oh!

Conversation

anjaliratnam-msft commented Jun 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kyleknap left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kyleknap left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kyleknap left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kyleknap left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kyleknap left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kyleknap left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

anjaliratnam-msft commented Jun 23, 2025 •

edited

Loading