Nc/1oct/py prepare multipart #1054

Merged: 19 commits merged into main on Oct 2, 2024
Conversation

nfcampos (Contributor) commented Oct 1, 2024

  • Add requests-toolbelt dep
  • Use a custom HTTP adapter (see the adapter sketch after this list)
    • subclass so that we can override blocksize (the default is too small to maximize performance on modern network stacks)
    • override the default pool size to match the max number of batch tracing threads (otherwise some threads would be left waiting for connections to become available)
    • actually mount the adapter (the previous behavior never mounted it, because sessions always come with default http and https adapters)
    • note: before this change the retry policy was being ignored and no transport-level retries were being performed
  • Add attachments field to run dict/model (see the attachments sketch below)
    • update _run_transform to collect/drop attachments
    • warn if attachments are passed to an endpoint that can't accept them, and in that case ingest the runs without attachments
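Two hedged sketches of the ideas above follow. First, the adapter sketch: a minimal illustration of subclassing requests' HTTPAdapter to raise the socket blocksize, size the connection pool, and actually mount the adapter. The class name, constant values, and pool size here are illustrative assumptions, not the PR's exact code.

import requests
from requests import adapters as requests_adapters

# Illustrative values, not the PR's: a larger socket blocksize and a pool sized
# to the number of batch tracing threads.
BLOCKSIZE_BYTES = 1024 * 1024
POOL_SIZE = 40


class _BlocksizeHTTPAdapter(requests_adapters.HTTPAdapter):
    """HTTPAdapter subclass that forwards a custom blocksize to urllib3."""

    def __init__(self, *args, blocksize: int = BLOCKSIZE_BYTES, **kwargs):
        self._blocksize = blocksize
        super().__init__(*args, **kwargs)

    def init_poolmanager(self, connections, maxsize, block=False, **pool_kwargs):
        # urllib3 (>= 1.26) accepts `blocksize` as a connection keyword, passed
        # through the pool manager's connection kwargs.
        pool_kwargs["blocksize"] = self._blocksize
        super().init_poolmanager(connections, maxsize, block=block, **pool_kwargs)


session = requests.Session()
adapter = _BlocksizeHTTPAdapter(pool_connections=POOL_SIZE, pool_maxsize=POOL_SIZE)
# A Session always ships with default adapters for both schemes, so a custom
# adapter does nothing until it is explicitly mounted over them.
session.mount("http://", adapter)
session.mount("https://", adapter)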

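Second, the attachments sketch: a rough illustration of the attachments handling described above. Function, field, and parameter names are illustrative, not the PR's exact code.

import warnings
from typing import Any, Dict, List, Optional, Tuple


def _split_attachments(
    run: Dict[str, Any],
    attachments_collector: Optional[List[Tuple[str, str, Any]]] = None,
) -> Dict[str, Any]:
    """Pop `attachments` off the run dict before ingest.

    When a collector is given (multipart-capable endpoint), keep the attachments
    so they can be encoded as separate multipart parts; otherwise warn and drop
    them, sending the run itself unchanged.
    """
    attachments: Dict[str, Any] = run.pop("attachments", None) or {}
    if not attachments:
        return run
    if attachments_collector is not None:
        attachments_collector.extend(
            (str(run["id"]), name, data) for name, data in attachments.items()
        )
    else:
        warnings.warn(
            "Attachments are only supported by the multipart ingest endpoint; "
            "this run will be sent without its attachments."
        )
    return run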
nfcampos and others added 13 commits October 1, 2024 13:39
- Use the streaming multipart encoder from requests_toolbelt
- Currently dump each part to JSON before sending the request, as that's the only way to enforce the payload size limit
- When we lift the payload size limit we should implement true streaming encoding, where each part is encoded only immediately before being sent over the connection, and use Transfer-Encoding: chunked
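A minimal sketch of this approach, assuming the standard requests_toolbelt MultipartEncoder API; the part naming, URL, and payload are illustrative, not the PR's exact wire format.

import json

import requests
from requests_toolbelt import MultipartEncoder

runs = [{"id": "run-1", "name": "example"}]  # illustrative payload

parts = []
for run in runs:
    # Each part is encoded eagerly so the total payload size can be checked
    # against the ingest size limit before anything is sent.
    body = json.dumps(run).encode("utf-8")
    parts.append((f"post.{run['id']}", (None, body, "application/json")))

encoder = MultipartEncoder(fields=parts)
response = requests.post(
    "https://api.example.com/runs/multipart",  # illustrative URL
    data=encoder,  # streamed by the encoder rather than built as one big bytes object
    headers={"Content-Type": encoder.content_type},
)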
@@ -435,6 +443,34 @@ class TracingQueueItem:
item: Any = field(compare=False)


class _LangSmithHttpAdapter(requests_adapters.HTTPAdapter):
Collaborator:
Not for this PR, but I wonder if we should be running simple tests on our min/max version bounds for the requests lib here, since we are taking on some maintenance burden by subclassing; just one unit test to check the bounds.
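A hedged sketch of the kind of bounds check this suggests; that init_poolmanager is the hook the subclass relies on is an assumption here.

import inspect

from requests import adapters as requests_adapters


def test_httpadapter_init_poolmanager_accepts_pool_kwargs():
    # The subclass forwards extra pool kwargs (e.g. a custom blocksize), so the
    # base hook must still accept **pool_kwargs within our pinned version range.
    sig = inspect.signature(requests_adapters.HTTPAdapter.init_poolmanager)
    assert any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in sig.parameters.values()
    ), "HTTPAdapter.init_poolmanager no longer accepts **pool_kwargs"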

python/langsmith/client.py (outdated review thread, resolved)
@@ -781,7 +821,7 @@ def request_with_retries(
ls_utils.FilterPoolFullWarning(host=str(self._host)),
]
retry_on_: Tuple[Type[BaseException], ...] = (
*(retry_on or []),
*(retry_on or ()),
Collaborator:
Suggested change
*(retry_on or ()),
*(retry_on or EMPTY_SEQ),

Contributor (author):
I don't think this one works, because it's the wrong type.

Collaborator:
okok cool
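For context on the type mismatch above, an illustration only: EMPTY_SEQ's real annotation in client.py is an assumption here, namely that it is typed for a different element type than exception classes.

from typing import Dict, Tuple, Type

# Assumption: the module's shared sentinel is annotated for another element type.
EMPTY_SEQ: Tuple[Dict, ...] = ()

retry_on: Tuple[Type[BaseException], ...] = ()

# Fine: an empty tuple literal carries no element type to conflict with.
retry_on_: Tuple[Type[BaseException], ...] = (*(retry_on or ()), ConnectionError)

# Would not type-check, even though both are empty at runtime:
# retry_on_ = (*(retry_on or EMPTY_SEQ), ConnectionError)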

python/langsmith/client.py (three outdated review threads, resolved)
# Check that no duplicate run_ids are present in the request bodies
assert len(request_bodies) == len(set([body["id"] for body in request_bodies]))

Collaborator:
Likely stupid, but maybe add a quick test calling multipart_ingest_runs with two empty lists (or just one empty list), if that's not done elsewhere (see the sketch below).
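A hedged sketch of such a test; the multipart_ingest_runs signature and the assumption that it short-circuits on empty input are not confirmed by the PR.

from unittest import mock

import langsmith


def test_multipart_ingest_runs_empty_input_is_a_noop():
    client = langsmith.Client(api_key="test", auto_batch_tracing=False)
    with mock.patch.object(client, "request_with_retries") as mocked_request:
        client.multipart_ingest_runs(create=[], update=[])
    # With nothing to send, the client should make no HTTP call at all.
    mocked_request.assert_not_called()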

python/langsmith/schemas.py (outdated review thread, resolved)
self._insert_runtime_env(create_dicts)
self._insert_runtime_env(update_dicts)
# check size limit
size_limit_bytes = (self.info.batch_ingest_config or {}).get(
Collaborator:
Should we also have an EMPTY_DICT? (or just a frozen dict-like)

from types import MappingProxyType

EMPTY_DICT = MappingProxyType({})
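For illustration, how a frozen sentinel like that would slot into the quoted line; the config key name is assumed.

from types import MappingProxyType

EMPTY_DICT = MappingProxyType({})  # frozen, shared, allocation-free fallback

config = None  # stands in for self.info.batch_ingest_config being unset
size_limit_bytes = (config or EMPTY_DICT).get("size_limit_bytes")  # key name assumed
assert size_limit_bytes is None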

python/langsmith/client.py (review thread resolved)
nfcampos and others added 2 commits October 1, 2024 17:10
Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
@nfcampos nfcampos merged commit d08d392 into main Oct 2, 2024
7 of 9 checks passed
@nfcampos nfcampos deleted the nc/1oct/py-prepare-multipart branch October 2, 2024 00:39