-
Notifications
You must be signed in to change notification settings - Fork 808
fix(langchain): include content attribute when assistant messages have tool calls #3287
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughSpan attribute emission was changed to always include message content (LLM_PROMPTS.{i}.content), JSON-encoding non-strings; tool_calls continue to be logged. New tests validate combinations of content and tool_calls. Tests remove an environment default in conftest and agent tests now use a local prompt constant. CI adds disk cleanup steps before checkout. Changes
Sequence Diagram(s)sequenceDiagram
participant Tests
participant set_chat_request
participant Span
Tests->>set_chat_request: set_chat_request(messages, span_holder)
loop for each message i
set_chat_request->>Span: set_attribute(LLM_PROMPTS.i.role, role)
set_chat_request->>Span: set_attribute(LLM_PROMPTS.i.content, content)
alt message has tool_calls
loop for each tool_call j
set_chat_request->>Span: set_attribute(LLM_PROMPTS.i.tool_calls.j.<field>, value)
end
set_chat_request->>Span: set_attribute(LLM_PROMPTS.i.tool_call_id, id)
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
🧪 Generate unit tests
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed everything up to 220137b in 1 minute and 9 seconds. Click for details.
- Reviewed
222
lines of code in2
files - Skipped
0
files when reviewing. - Skipped posting
2
draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py:178
- Draft comment:
Good change: always setting the content attribute regardless of tool_calls. Consider adding a test for cases where msg.content is None or a non-string to verify JSON serialization behavior. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The first part is just acknowledging the change, which isn't actionable. The second part suggests tests, but suggesting tests violates our rules - we should assume the author has tested their changes appropriately. The JSON serialization is a straightforward implementation that doesn't seem to need special verification. The suggestion for tests could be valuable since handling None and complex objects in JSON serialization can be tricky. The code change does introduce new serialization behavior that could have edge cases. While testing edge cases is generally good practice, our rules state we should not make purely informative comments or ask authors to verify/test their changes. The implementation is straightforward enough that we should trust the author's testing. The comment should be deleted as it's partly just acknowledging the change and partly suggesting additional testing, neither of which require code changes or provide actionable value.
2. packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py:156
- Draft comment:
Test coverage is solid. Consider adding an edge-case test for non-string content (e.g., a dict or None) to ensure correct JSON dumping. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% While the suggestion seems reasonable at first glance, I need to consider: 1) Is this a real edge case that could occur? 2) Do we have enough context about the AIMessage class to know if non-string content is even possible? 3) Is this speculative or addressing a real gap? 4) Looking at the test file's purpose, it's specifically focused on fixing a bug where content was missing when tool_calls were present. The comment may be identifying a valid edge case, but without seeing the AIMessage class definition or understanding if non-string content is even possible, this could be a speculative suggestion. The test file has a clear, focused purpose of fixing a specific bug around missing content with tool calls. Adding edge cases for content types goes beyond the scope of this bug fix. The comment should be deleted as it's speculative without evidence that non-string content is possible, and goes beyond the focused scope of this bug fix.
Workflow ID: wflow_YIRgqPQWu3os9QMg
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed 43a49d5 in 47 seconds. Click for details.
- Reviewed
253
lines of code in1
files - Skipped
0
files when reviewing. - Skipped posting
4
draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py:27
- Draft comment:
Good refactoring of the messages array: The explicit list format and removal of extraneous commented code improve readability of the test setup. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
2. packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py:60
- Draft comment:
Thorough assertions verify that for an assistant message with tool calls, both the 'content' and 'tool_calls' attributes are captured. Note the use of index-based keys; ensure the message ordering remains consistent in future changes. - Reason this comment was not posted:
Confidence changes required:33%
<= threshold50%
None
3. packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py:125
- Draft comment:
The test for an assistant message with only tool calls ensures that empty content is not set. This is a clear and effective way to prevent unnecessary span clutter. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
4. packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py:152
- Draft comment:
The test for messages with only content (and no tool calls) correctly verifies that no tool_calls attributes are present by filtering keys. This ensures clarity in the span data. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
Workflow ID: wflow_Svg9dXRqRxc9XzYc
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (1)
222-235
: Fix type check: use isinstance instead of identity comparison
if generation.message.content is str
is never true for string instances and prevents the intended branch from executing. Useisinstance(...)
instead.- if generation.message.content is str: + if isinstance(generation.message.content, str): _set_span_attribute( span, f"{prefix}.content", generation.message.content, ) else: _set_span_attribute( span, f"{prefix}.content", json.dumps( generation.message.content, cls=CallbackFilteredJSONEncoder ), )
🧹 Nitpick comments (6)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (1)
179-189
: Avoid serializing None to the string "null" for contentCurrent logic JSON-encodes any non-string content. If
msg.content
isNone
, this results in the literal string"null"
being recorded, which is misleading and inconsistent with_set_span_attribute
’s semantics (skip None/empty). TreatNone
explicitly to avoid emitting a"null"
string.Apply this diff to preserve intended behavior:
- # Always set content if it exists, regardless of tool_calls presence - content = ( - msg.content - if isinstance(msg.content, str) - else json.dumps(msg.content, cls=CallbackFilteredJSONEncoder) - ) - _set_span_attribute( - span, - f"{SpanAttributes.LLM_PROMPTS}.{i}.content", - content, - ) + # Always set content if it exists (skip None/empty), regardless of tool_calls presence + raw_content = msg.content + if isinstance(raw_content, str): + content = raw_content + elif raw_content is None: + content = None + else: + content = json.dumps(raw_content, cls=CallbackFilteredJSONEncoder) + _set_span_attribute( + span, + f"{SpanAttributes.LLM_PROMPTS}.{i}.content", + content, + )packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py (5)
9-9
: Remove unused import to satisfy Flake8/Ruff
pytest
is imported but unused.-import pytest
24-26
: Explicitly set span.is_recording() to True for deterministic testsRelying on Mock’s truthiness can be flaky. Make
is_recording()
explicitly return True.# Create a mock span mock_span = Mock() mock_span.set_attribute = Mock() + mock_span.is_recording.return_value = True
102-104
: Set is_recording() to True here as wellMirror the first test to ensure attributes are emitted.
# Create a mock span mock_span = Mock() mock_span.set_attribute = Mock() + mock_span.is_recording.return_value = True
151-153
: Set is_recording() to True for consistency across testsEnsure consistent behavior across all tests.
# Create a mock span mock_span = Mock() mock_span.set_attribute = Mock() + mock_span.is_recording.return_value = True
181-181
: Prefer iterating dict directly over calling .keys()Use
for k in dict
instead offor k in dict.keys()
.- tool_call_attributes = [attr for attr in attributes.keys() if "tool_calls" in attr] + tool_call_attributes = [attr for attr in attributes if "tool_calls" in attr]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py
(1 hunks)packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit Inference Engine (CLAUDE.md)
**/*.py
: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py
packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py
🧬 Code Graph Analysis (2)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (2)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/utils.py (1)
CallbackFilteredJSONEncoder
(22-46)packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
SpanAttributes
(64-257)
packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py (1)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (1)
set_chat_request
(137-198)
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py
9-9: pytest
imported but unused
Remove unused import: pytest
(F401)
181-181: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
🪛 Flake8 (7.2.0)
packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py
[error] 9-9: 'pytest' imported but unused
(F401)
🔇 Additional comments (1)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (1)
179-189
: LGTM: Always setting content alongside tool_calls improves observabilityUnconditionally setting
gen_ai.prompt.{i}.content
(while still emitting tool_calls) addresses the missing-content gap and aligns with the PR goal. The JSON-encoding path usingCallbackFilteredJSONEncoder
is appropriate for structured content.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (3)
packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py (3)
50-52
: Use mock Call.args for clarity and simplicityAccessing the args tuple via Call.args improves readability and avoids relying on Call’s indexing behavior.
- call_args = [call[0] for call in mock_span.set_attribute.call_args_list] - attributes = {args[0]: args[1] for args in call_args} + attributes = { + c.args[0]: c.args[1] for c in mock_span.set_attribute.call_args_list + }
148-151
: Consistently use Call.args to build the attributes mapSame readability improvement as above.
- call_args = [call[0] for call in mock_span.set_attribute.call_args_list] - - attributes = {args[0]: args[1] for args in call_args} + attributes = { + c.args[0]: c.args[1] for c in mock_span.set_attribute.call_args_list + }
160-160
: Iterate dict directly instead of using .keys() (SIM118)Minor style/perf nit; aligns with Ruff/flake8 suggestions.
- tool_call_attributes = [attr for attr in attributes.keys() if "tool_calls" in attr] + tool_call_attributes = [attr for attr in attributes if "tool_calls" in attr]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit Inference Engine (CLAUDE.md)
**/*.py
: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py
🧬 Code Graph Analysis (1)
packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py (2)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (1)
set_chat_request
(137-198)packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
SpanAttributes
(64-257)
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py
9-9: pytest
imported but unused
Remove unused import: pytest
(F401)
160-160: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
🪛 Flake8 (7.2.0)
packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py
[error] 9-9: 'pytest' imported but unused
(F401)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Lint
🔇 Additional comments (2)
packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py (2)
16-46
: Good coverage for assistant message with both content and tool_callsSolid, readable scenario; verifies role, content, tool_call id/name, tool message tool_call_id, and final assistant reply. This will catch regressions in the new unconditional content logging.
134-159
: LGTM for content-only assistant messageCorrectly validates role and content presence, and ensures no tool_calls attributes leak into this scenario.
packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py
Outdated
Show resolved
Hide resolved
def test_assistant_message_with_only_tool_calls_no_content(): | ||
""" | ||
Test that when an assistant message has only tool_calls and no content, | ||
the tool_calls are still included and no content attribute is set. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Docstring contradicts the intended fix; content should be present even with tool_calls
The implementation now always sets .content
for assistant messages, including when tool_calls exist (empty string when content is empty). Update the test description accordingly.
- """
- Test that when an assistant message has only tool_calls and no content,
- the tool_calls are still included and no content attribute is set.
- """
+ """
+ Test that when an assistant message has only tool_calls and empty content,
+ both the tool_calls are included and the content attribute is set to an
+ empty string (per the fix to always include content).
+ """
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def test_assistant_message_with_only_tool_calls_no_content(): | |
""" | |
Test that when an assistant message has only tool_calls and no content, | |
the tool_calls are still included and no content attribute is set. | |
""" | |
def test_assistant_message_with_only_tool_calls_no_content(): | |
""" | |
Test that when an assistant message has only tool_calls and empty content, | |
both the tool_calls are included and the content attribute is set to an | |
empty string (per the fix to always include content). | |
""" | |
# ... rest of the test ... |
🤖 Prompt for AI Agents
In
packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py
around lines 97 to 101, the test docstring incorrectly states that assistant
messages with only tool_calls have no content; update the docstring to state
that .content is still set (possibly as an empty string) even when tool_calls
exist so it matches the current implementation which always sets .content for
assistant messages. Ensure the wording clarifies that tool_calls are included
and content may be empty rather than absent.
call_args = [call[0] for call in mock_span.set_attribute.call_args_list] | ||
attributes = {args[0]: args[1] for args in call_args} | ||
|
||
assert f"{SpanAttributes.LLM_PROMPTS}.0.role" in attributes | ||
assert attributes[f"{SpanAttributes.LLM_PROMPTS}.0.role"] == "assistant" | ||
assert f"{SpanAttributes.LLM_PROMPTS}.0.content" not in attributes | ||
assert f"{SpanAttributes.LLM_PROMPTS}.0.tool_calls.0.id" in attributes | ||
assert attributes[f"{SpanAttributes.LLM_PROMPTS}.0.tool_calls.0.id"] == "call_123" | ||
assert f"{SpanAttributes.LLM_PROMPTS}.0.tool_calls.0.name" in attributes | ||
assert ( | ||
attributes[f"{SpanAttributes.LLM_PROMPTS}.0.tool_calls.0.name"] == "some_tool" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix test expectation: content attribute should exist (empty string) when only tool_calls are present
The updated set_chat_request always emits .content
. The current assertion expects the opposite and will fail.
- call_args = [call[0] for call in mock_span.set_attribute.call_args_list]
- attributes = {args[0]: args[1] for args in call_args}
+ attributes = {
+ c.args[0]: c.args[1] for c in mock_span.set_attribute.call_args_list
+ }
@@
- assert f"{SpanAttributes.LLM_PROMPTS}.0.content" not in attributes
+ assert f"{SpanAttributes.LLM_PROMPTS}.0.content" in attributes
+ assert attributes[f"{SpanAttributes.LLM_PROMPTS}.0.content"] == ""
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
call_args = [call[0] for call in mock_span.set_attribute.call_args_list] | |
attributes = {args[0]: args[1] for args in call_args} | |
assert f"{SpanAttributes.LLM_PROMPTS}.0.role" in attributes | |
assert attributes[f"{SpanAttributes.LLM_PROMPTS}.0.role"] == "assistant" | |
assert f"{SpanAttributes.LLM_PROMPTS}.0.content" not in attributes | |
assert f"{SpanAttributes.LLM_PROMPTS}.0.tool_calls.0.id" in attributes | |
assert attributes[f"{SpanAttributes.LLM_PROMPTS}.0.tool_calls.0.id"] == "call_123" | |
assert f"{SpanAttributes.LLM_PROMPTS}.0.tool_calls.0.name" in attributes | |
assert ( | |
attributes[f"{SpanAttributes.LLM_PROMPTS}.0.tool_calls.0.name"] == "some_tool" | |
) | |
# Replace the old call_args extraction with a direct comprehension | |
attributes = { | |
c.args[0]: c.args[1] for c in mock_span.set_attribute.call_args_list | |
} | |
assert f"{SpanAttributes.LLM_PROMPTS}.0.role" in attributes | |
assert attributes[f"{SpanAttributes.LLM_PROMPTS}.0.role"] == "assistant" | |
# Content should now always be emitted (empty string when no content) | |
assert f"{SpanAttributes.LLM_PROMPTS}.0.content" in attributes | |
assert attributes[f"{SpanAttributes.LLM_PROMPTS}.0.content"] == "" | |
assert f"{SpanAttributes.LLM_PROMPTS}.0.tool_calls.0.id" in attributes | |
assert attributes[f"{SpanAttributes.LLM_PROMPTS}.0.tool_calls.0.id"] == "call_123" | |
assert f"{SpanAttributes.LLM_PROMPTS}.0.tool_calls.0.name" in attributes | |
assert ( | |
attributes[f"{SpanAttributes.LLM_PROMPTS}.0.tool_calls.0.name"] == "some_tool" | |
) |
🤖 Prompt for AI Agents
In
packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py
around lines 120 to 131, the test currently asserts that
f"{SpanAttributes.LLM_PROMPTS}.0.content" is not in attributes but the updated
set_chat_request always emits the .content attribute even when only tool_calls
exist; change the test to assert the .content attribute is present and its value
is an empty string (i.e., replace the "not in" assertion with presence and
equality to ""), keeping the other assertions about tool_calls intact.
…lmetry into gk/fix-langgraph-prompts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed f0b1619 in 38 seconds. Click for details.
- Reviewed
12
lines of code in1
files - Skipped
0
files when reviewing. - Skipped posting
1
draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py:6
- Draft comment:
Remove unused import of pytest; it's not used in this file. - Reason this comment was not posted:
Comment looked like it was already resolved.
Workflow ID: wflow_WrhO45L8ebSQTpr1
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (4)
packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py (4)
119-121
: Prefer Mock.call.args over tuple indexing for call args (repeat of earlier suggestion)Same refinement as above; avoids non-idiomatic tuple indexing on call objects.
Apply this diff:
- call_args = [call[0] for call in mock_span.set_attribute.call_args_list] - attributes = {args[0]: args[1] for args in call_args} + attributes = { + c.args[0]: c.args[1] for c in mock_span.set_attribute.call_args_list + }
147-150
: Repeat: simplify call args extraction with .argsSame improvement as in the first test for consistency and clarity.
Apply this diff:
- call_args = [call[0] for call in mock_span.set_attribute.call_args_list] - - attributes = {args[0]: args[1] for args in call_args} + attributes = { + c.args[0]: c.args[1] for c in mock_span.set_attribute.call_args_list + }
96-101
: Docstring contradicts intended behavior; content should be present (empty string) with tool_callsPer the fix, assistant messages always include a content attribute. Update the docstring to match behavior.
Apply this diff:
- """ - Test that when an assistant message has only tool_calls and no content, - the tool_calls are still included and no content attribute is set. - """ + """ + Test that when an assistant message has only tool_calls and empty content, + both the tool_calls are included and the content attribute is set to an + empty string. + """
122-130
: Fix test expectation: content attribute should exist (empty string) when only tool_calls are presentThe instrumentation now always emits
.content
. This assertion will fail against the new behavior.Apply this diff:
- assert f"{SpanAttributes.LLM_PROMPTS}.0.content" not in attributes + assert f"{SpanAttributes.LLM_PROMPTS}.0.content" in attributes + assert attributes[f"{SpanAttributes.LLM_PROMPTS}.0.content"] == ""
🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py (2)
49-51
: Simplify call argument extraction using Mock.call.argsUse the public .args API on call objects and avoid building an intermediate list. More readable and robust across Python/mock versions.
Apply this diff:
- call_args = [call[0] for call in mock_span.set_attribute.call_args_list] - attributes = {args[0]: args[1] for args in call_args} + attributes = { + c.args[0]: c.args[1] for c in mock_span.set_attribute.call_args_list + }
159-160
: Nit: avoid unnecessary.keys()
Use key in dict instead of key in dict.keys().
Apply this diff:
- tool_call_attributes = [attr for attr in attributes.keys() if "tool_calls" in attr] + tool_call_attributes = [attr for attr in attributes if "tool_calls" in attr]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit Inference Engine (CLAUDE.md)
**/*.py
: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py
🧬 Code Graph Analysis (1)
packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py (2)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (1)
set_chat_request
(137-198)packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
SpanAttributes
(64-257)
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py
159-159: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Lint
- GitHub Check: Test Packages (3.10)
🔇 Additional comments (3)
packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py (3)
15-21
: LGTM: Solid coverage for content + tool_calls scenarioThis test clearly validates both content and tool_calls for assistant messages and protects against regressions in the intended fix.
47-47
: Guard against config gating: ensure prompts are enabled during testsset_chat_request is gated by should_send_prompts(). If configuration disables prompts, these tests will silently fail. Consider patching should_send_prompts to True within each test to make them robust to environment/config changes.
Optionally, update tests to patch the gating function:
+from unittest.mock import Mock, patch @@ - set_chat_request(mock_span, {}, messages, {}, mock_span_holder) + with patch( + "opentelemetry.instrumentation.langchain.span_utils.should_send_prompts", + return_value=True, + ): + set_chat_request(mock_span, {}, messages, {}, mock_span_holder)Repeat the with patch block in the other two tests as well.
Also applies to: 117-117, 145-145
9-13
: LGTM: Imports are minimal and compliantNo unused imports (e.g., pytest) remain; should satisfy Flake8/Ruff F401.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Changes requested ❌
Reviewed df3ad15 in 1 minute and 27 seconds. Click for details.
- Reviewed
52
lines of code in1
files - Skipped
0
files when reviewing. - Skipped posting
1
draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. .github/workflows/ci.yml:35
- Draft comment:
Consider using 'docker system prune --all --force' for a more robust and clear cleanup of Docker images. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% While 'docker system prune' is more comprehensive as it removes all unused containers, networks, and images, the current command 'docker rmi $(docker images -aq)' specifically targets removing all images, which might be exactly what's intended here. The current solution works and is clear in its intent. The suggestion isn't fixing a bug or addressing a critical issue. The suggested command might be more thorough in cleanup, potentially freeing up more space. The current command might leave some Docker artifacts behind. The current command is focused specifically on images which appears to be the intent, and works reliably. The broader cleanup might be unnecessary and could potentially remove things we want to keep. While the suggestion is valid, the current solution is working as intended and the change would be more of a preference than a necessary improvement.
Workflow ID: wflow_iq5SoGfDdetMxyGj
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
runs-on: ubuntu-latest | ||
|
||
steps: | ||
- name: Free up disk space |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Repeated 'Free up disk space' step found in multiple jobs. Consider refactoring this block into a reusable composite action or workflow for DRY code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed d06a826 in 59 seconds. Click for details.
- Reviewed
31
lines of code in1
files - Skipped
0
files when reviewing. - Skipped posting
3
draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. .github/workflows/ci.yml:31
- Draft comment:
Good use of '|| true' to prevent failures if apt-get removal fails. Note that the package 'hhvm' has been removed from the list compared to the original command—please ensure that this omission is intentional. - Reason this comment was not posted:
Comment looked like it was already resolved.
2. .github/workflows/ci.yml:69
- Draft comment:
The '|| true' addition prevents the step from failing if apt-get removal errors out. Notice that 'hhvm' is no longer included in the removal list; please verify that this change is intended. - Reason this comment was not posted:
Comment looked like it was already resolved.
3. .github/workflows/ci.yml:111
- Draft comment:
Appending '|| true' is a safe practice to avoid CI failures. Also, 'hhvm' has been removed from the package list compared to before—confirm that this is an expected change. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =50%
<= threshold50%
The comment contains two parts. The first part is purely informative about appending '|| true', which violates the rules. The second part asks for confirmation about the removal of 'hhvm' from the package list, which is a valid request for confirmation of an intentional change.
Workflow ID: wflow_xIqrZXiHYi6KJil8
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
.github/workflows/ci.yml (3)
29-37
: Deduplicate the “Free up disk space” step via YAML anchor or reusable actionThis block is repeated across jobs. Define it once and reuse to keep CI DRY and easier to maintain. Also make it more resilient with continue-on-error and a more thorough Docker prune.
Apply this diff to define an anchor and harden the step:
- - name: Free up disk space - run: | - sudo apt-get remove -y '^dotnet-.*' '^llvm-.*' 'php.*' '^mongodb-.*' '^mysql-.*' azure-cli google-cloud-cli google-chrome-stable firefox powershell mono-devel libgl1-mesa-dri || true - sudo apt-get autoremove -y - sudo apt-get clean - # Remove Docker images - docker rmi $(docker images -aq) || true - # Show available space - df -h + - &free_disk_space + name: Free up disk space + continue-on-error: true + run: | + sudo apt-get remove -y '^dotnet-.*' '^llvm-.*' 'php.*' '^mongodb-.*' '^mysql-.*' azure-cli google-cloud-cli google-chrome-stable firefox powershell mono-devel libgl1-mesa-dri || true + sudo apt-get autoremove -y || true + sudo apt-get clean || true + # Remove Docker images and unused data + sudo docker system prune -af || true + # Show available space + df -hThen, in the other jobs, replace the repeated block with
- *free_disk_space
(see suggestions on those ranges). Alternatively, factor this into a composite action under .github/actions/free-disk-space. Would you like me to scaffold that?
67-75
: Reuse the anchored step instead of duplicating the cleanup scriptReference the previously defined
&free_disk_space
step to avoid repeating shell blocks in multiple jobs.Apply this diff:
- - name: Free up disk space - run: | - sudo apt-get remove -y '^dotnet-.*' '^llvm-.*' 'php.*' '^mongodb-.*' '^mysql-.*' azure-cli google-cloud-cli google-chrome-stable firefox powershell mono-devel libgl1-mesa-dri || true - sudo apt-get autoremove -y - sudo apt-get clean - # Remove Docker images - docker rmi $(docker images -aq) || true - # Show available space - df -h + - *free_disk_space
112-120
: Reuse the anchored step and verify we don’t need browsersUse the anchor to dedupe the cleanup step here. Also, since this removes Chrome/Firefox, ensure no tests rely on Playwright/Selenium/Puppeteer.
Apply this diff:
- - name: Free up disk space - run: | - sudo apt-get remove -y '^dotnet-.*' '^llvm-.*' 'php.*' '^mongodb-.*' '^mysql-.*' azure-cli google-cloud-cli google-chrome-stable firefox powershell mono-devel libgl1-mesa-dri || true - sudo apt-get autoremove -y - sudo apt-get clean - # Remove Docker images - docker rmi $(docker images -aq) || true - # Show available space - df -h + - *free_disk_spaceRun this script to confirm no browser-based tooling is used:
#!/bin/bash set -euo pipefail # Search for common browser testing libs and usage rg -n -C2 -iP '\b(playwright|selenium|puppeteer|webdriver|cypress)\b' \ || echo "No direct browser testing libraries referenced." # Look for typical CI env vars or commands indicating browser use rg -n -C2 -iP '\b(Xvfb|google-chrome|chromium|firefox|geckodriver|chromedriver)\b' \ || echo "No explicit browser invocations found."
🧹 Nitpick comments (1)
.github/workflows/ci.yml (1)
29-37
: Consider using a maintained “free disk space” action instead of custom shellThe quoted regex patterns for apt may not match as intended on all runners, and apt/dpkg locks can intermittently fail builds. A purpose-built action is more robust and maintained.
Example replacement:
- uses: jlumbroso/free-disk-space@v1.3.1 with: tool-cache: true android: true dotnet: true haskell: true large-packages: true docker-images: true swap-storage: trueIf you prefer to keep the inline script, adding continue-on-error (as suggested) plus
sudo docker system prune -af
generally frees more space than removing images by ID.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
.github/workflows/ci.yml
(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Lint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed 0e1108d in 3 minutes and 12 seconds. Click for details.
- Reviewed
56
lines of code in1
files - Skipped
0
files when reviewing. - Skipped posting
3
draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-langchain/tests/conftest.py:164
- Draft comment:
Monkey-patched init is not restored. Consider adding teardown logic to revert the patch after tests to avoid side effects. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50% This is a test configuration file and the monkey patch is specifically for test compatibility. The patch only applies for Python 3.11+ and is scoped to tests. While it's generally good practice to restore monkey patches, in this case it's a test-only modification that won't affect production code. The patch is also simple enough that it's unlikely to cause issues between tests. The comment raises a valid point about test hygiene. Not restoring monkey patches could potentially affect other tests or cause hard-to-debug issues if test order matters. However, this is a very controlled test environment where the patch is intentionally meant to apply across all tests for Python 3.11+. The benefit of restoring it is minimal compared to its purpose. While technically correct, this comment is not important enough to keep given the test-specific context and intentional design of the patch.
2. packages/opentelemetry-instrumentation-langchain/tests/conftest.py:198
- Draft comment:
Removal of 'query' from the 'match_on' list appears intentional. Confirm that omitting query parameters won't lead to mismatches in tests. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This is a test configuration change. The comment violates the rule about not asking authors to confirm their intentions. The author deliberately made this change as part of a larger VCR configuration update. There's no strong evidence that removing 'query' is incorrect - it's a valid configuration choice that may have been made to make tests less brittle. The query parameter matching could be important for some tests, and removing it might cause tests to pass when they shouldn't. The author likely removed query matching intentionally to make tests more robust. If this causes test issues, they would be caught when running the tests. We shouldn't second-guess configuration choices without evidence of problems. Delete the comment. It violates the rule about asking authors to confirm their intentions, and there's no strong evidence that the configuration change is incorrect.
3. packages/opentelemetry-instrumentation-langchain/tests/conftest.py:204
- Draft comment:
Using a relative path for 'cassette_library_dir' may be fragile. Consider using an absolute path (e.g. via file) to ensure cassette files are reliably located. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This is a test configuration file, and pytest has well-defined behavior for relative paths in test files. The relative path "tests/cassettes" is standard practice in pytest projects. The comment suggests a change that would actually make the code more complex without clear benefit. VCR.py works fine with relative paths in pytest contexts. The comment raises a valid concern about path reliability in general. In some contexts, absolute paths are indeed more reliable. However, in the specific context of pytest and VCR.py, relative paths are the conventional approach and work reliably due to pytest's path handling. The suggested change would add unnecessary complexity. The comment should be deleted as it suggests a change that would complicate the code without providing real benefit in this specific context.
Workflow ID: wflow_dZeQmaR5qwyNM6Nr
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-langchain/tests/conftest.py (1)
152-175
: Make the langsmith monkey-patch reversible and robust via pytest’s monkeypatch; also treat session=None.The current autouse fixture permanently overwrites Client.init (not restored after each test) and skips the case where callers pass session=None. Prefer using pytest’s monkeypatch so it auto-restores per test and handle a None value as “unset”.
Apply this diff:
-@pytest.fixture(autouse=True) -def force_langsmith_requests(): +@pytest.fixture(autouse=True) +def force_langsmith_requests(monkeypatch): """Force langsmith to use requests instead of httpx for VCR compatibility.""" # For Python 3.11+, we need to ensure langsmith uses requests instead of httpx # since VCR has better support for requests - if sys.version_info >= (3, 11): - import requests - - # Monkey patch langsmith to always use requests session - try: - import langsmith.client - - original_init = langsmith.client.Client.__init__ - - def patched_init(self, *args, **kwargs): - if "session" not in kwargs: - kwargs["session"] = requests.Session() - return original_init(self, *args, **kwargs) - - langsmith.client.Client.__init__ = patched_init - except ImportError: - pass # langsmith not available + if sys.version_info >= (3, 11): + try: + import requests + import langsmith.client + except ImportError: + return # langsmith not available + + original_init = langsmith.client.Client.__init__ + + def patched_init(self, *args, **kwargs): + # Treat missing or None session as unset + if not kwargs.get("session"): + kwargs["session"] = requests.Session() + return original_init(self, *args, **kwargs) + + # Auto-restore after each test via monkeypatch + monkeypatch.setattr(langsmith.client.Client, "__init__", patched_init, raising=True)Notes:
- If any Client instances are constructed at import time (before tests run), a fixture-level patch won’t catch them. If that’s a concern in this repo, we can move the patch to module import time in conftest instead. Want me to draft that variant?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/opentelemetry-instrumentation-langchain/tests/conftest.py
(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit Inference Engine (CLAUDE.md)
**/*.py
: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/opentelemetry-instrumentation-langchain/tests/conftest.py
🧠 Learnings (2)
📚 Learning: 2025-08-17T15:06:48.093Z
Learnt from: CR
PR: traceloop/openllmetry#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.093Z
Learning: Applies to tests/**/*.py : Tests that make API calls must utilize VCR cassettes
Applied to files:
packages/opentelemetry-instrumentation-langchain/tests/conftest.py
📚 Learning: 2025-08-17T15:06:48.093Z
Learnt from: CR
PR: traceloop/openllmetry#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.093Z
Learning: Applies to tests/**/conftest.py : Use VCR filters (e.g., filter_headers, before_record) or framework equivalents to scrub secrets/PII during recording
Applied to files:
packages/opentelemetry-instrumentation-langchain/tests/conftest.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Lint
🔇 Additional comments (2)
packages/opentelemetry-instrumentation-langchain/tests/conftest.py (2)
4-4
: LGTM: sys import for version guard is appropriate.Needed for the Python 3.11+ conditional in the new autouse fixture.
199-206
: Please verify the intent behind dropping “query” from VCR’s matchersOur automated scan didn’t find any
requests
orhttpx
calls using theparams=
argument in the test suite, but omittingquery
frommatch_on
can still lead to cassette collisions when paths are reused with different query strings. If the goal is to avoid spurious mismatches from volatile query values, consider instead:
- Restoring
"query"
in thematch_on
list- Scrubbing only the ephemeral query keys via
filter_query_parameters
- Extending
filter_headers
to include both lowercase and uppercase variants- Clarifying the YAML serializer comment
Suggested diff:
--- a/packages/opentelemetry-instrumentation-langchain/tests/conftest.py +++ b/packages/opentelemetry-instrumentation-langchain/tests/conftest.py @@ -199,12 +199,16 @@ def vcr_config(): "filter_headers": ["authorization", "x-api-key"], - "match_on": ["method", "scheme", "host", "port", "path"], + "filter_headers": ["authorization", "Authorization", "x-api-key", "X-API-Key"], + "match_on": ["method", "scheme", "host", "port", "path", "query"], "before_record_request": before_record_request, "ignore_localhost": True, "decode_compressed_response": True, - # Ensure VCR captures both requests and httpx "serializer": "yaml", "cassette_library_dir": "tests/cassettes", + "filter_query_parameters": [ + ("api_key", "FILTERED"), + ("key", "FILTERED"), + ("token", "FILTERED") + ],Please confirm that no existing tests rely on unfiltered query strings and that re-adding
"query"
won’t break stable recordings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed 50919c1 in 49 seconds. Click for details.
- Reviewed
38
lines of code in1
files - Skipped
0
files when reviewing. - Skipped posting
2
draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-langchain/tests/conftest.py:162
- Draft comment:
Renamed variables ('original_langsmith_init' and 'patched_langsmith_init') improve clarity in the monkey-patching of langsmith.client.Client.init. Ensure that tests cover this patch. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
2. packages/opentelemetry-instrumentation-langchain/tests/conftest.py:196
- Draft comment:
VCR config update: Adding 'user-agent' to filter_headers is useful, but removal of 'serializer' and 'cassette_library_dir' in favor of 'allow_playback_repeats' may affect cassette management. Confirm tests and documentation reflect these changes. - Reason this comment was not posted:
Confidence changes required:33%
<= threshold50%
None
Workflow ID: wflow_7u3OvHgTqRlQLoxT
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
packages/opentelemetry-instrumentation-langchain/tests/conftest.py (1)
197-205
: Prevent API key leakage via query strings and form bodies in VCR cassettesCurrent redaction handles headers and top-level JSON api_key but misses:
- Query string parameters (e.g., ?api_key=..., ?key=..., ?access_token=...)
- application/x-www-form-urlencoded bodies with sensitive params
These can leak real secrets if env vars are set locally/CI. Add VCR’s filter_query_parameters and filter_post_data_parameters.
Apply this diff:
return { "filter_headers": ["authorization", "x-api-key", "user-agent", "cookie", "set-cookie"], - "match_on": ["method", "scheme", "host", "port", "path"], + "match_on": ["method", "scheme", "host", "port", "path"], "before_record_request": before_record_request, "ignore_localhost": True, "decode_compressed_response": True, "allow_playback_repeats": True, + # Prevent secrets in URLs/bodies from being recorded + "filter_query_parameters": [ + ("api_key", "FILTERED"), + ("key", "FILTERED"), + ("access_token", "FILTERED"), + ("token", "FILTERED"), + ], + "filter_post_data_parameters": [ + ("api_key", "FILTERED"), + ("key", "FILTERED"), + ("access_token", "FILTERED"), + ("token", "FILTERED"), + ], }
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-langchain/tests/conftest.py (1)
198-198
: Redact cookies as well to avoid session leakage into cassettesGood call adding user-agent to filtered headers. Consider also filtering cookies, which can carry sensitive session data.
Apply this diff:
- "filter_headers": ["authorization", "x-api-key", "user-agent"], + "filter_headers": ["authorization", "x-api-key", "user-agent", "cookie", "set-cookie"],
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/opentelemetry-instrumentation-langchain/tests/conftest.py
(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit Inference Engine (CLAUDE.md)
**/*.py
: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/opentelemetry-instrumentation-langchain/tests/conftest.py
🧠 Learnings (1)
📚 Learning: 2025-08-17T15:06:48.093Z
Learnt from: CR
PR: traceloop/openllmetry#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.093Z
Learning: Applies to tests/**/*.py : Tests that make API calls must utilize VCR cassettes
Applied to files:
packages/opentelemetry-instrumentation-langchain/tests/conftest.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Lint
🔇 Additional comments (2)
packages/opentelemetry-instrumentation-langchain/tests/conftest.py (2)
4-4
: LGTM: sys import for version-gated behaviorImporting sys to gate behavior by Python version is appropriate here.
199-199
: No query-sensitive calls detected—keep current VCR match_on
Verified that LangChain instrumentation tests do not issue requests with query parameters (noparams=
calls or?
URLs in this codepath), so omitting"query"
won’t introduce false positives.
@pytest.fixture(autouse=True) | ||
def force_langsmith_requests(): | ||
"""Force langsmith to use requests instead of httpx for VCR compatibility.""" | ||
# For Python 3.11+, we need to ensure langsmith uses requests instead of httpx | ||
# since VCR has better support for requests | ||
if sys.version_info >= (3, 11): | ||
import requests | ||
|
||
# Monkey patch langsmith to always use requests session | ||
try: | ||
import langsmith.client | ||
|
||
original_langsmith_init = langsmith.client.Client.__init__ | ||
|
||
def patched_langsmith_init(self, *args, **kwargs): | ||
if "session" not in kwargs: | ||
kwargs["session"] = requests.Session() | ||
return original_langsmith_init(self, *args, **kwargs) | ||
|
||
langsmith.client.Client.__init__ = patched_langsmith_init | ||
except ImportError: | ||
pass # langsmith not available |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Avoid global monkey-patch leaks and wrapper stacking; use pytest’s monkeypatch and handle session=None
The autouse fixture patches langsmith.client.Client.init but never restores it. Because this is function-scoped and autouse, each test wraps the already-wrapped init, leading to stacked wrappers over the entire test run. This can cause confusing call stacks and performance degradation. Also, if a caller passes session=None, the intent (force requests) won’t be applied.
Patch with pytest’s monkeypatch (auto-restores per test) and set a requests.Session when session is missing or None.
Apply this diff:
-@pytest.fixture(autouse=True)
-def force_langsmith_requests():
+@pytest.fixture(autouse=True)
+def force_langsmith_requests(monkeypatch):
"""Force langsmith to use requests instead of httpx for VCR compatibility."""
# For Python 3.11+, we need to ensure langsmith uses requests instead of httpx
# since VCR has better support for requests
- if sys.version_info >= (3, 11):
- import requests
-
- # Monkey patch langsmith to always use requests session
- try:
- import langsmith.client
-
- original_langsmith_init = langsmith.client.Client.__init__
-
- def patched_langsmith_init(self, *args, **kwargs):
- if "session" not in kwargs:
- kwargs["session"] = requests.Session()
- return original_langsmith_init(self, *args, **kwargs)
-
- langsmith.client.Client.__init__ = patched_langsmith_init
- except ImportError:
- pass # langsmith not available
+ if sys.version_info < (3, 11):
+ return
+ try:
+ import requests
+ import langsmith.client
+ except ImportError:
+ return # langsmith not available
+
+ original_langsmith_init = langsmith.client.Client.__init__
+
+ def patched_langsmith_init(self, *args, **kwargs):
+ if kwargs.get("session") is None:
+ kwargs["session"] = requests.Session()
+ return original_langsmith_init(self, *args, **kwargs)
+
+ # Ensure restoration after each test to avoid accumulating wrappers
+ monkeypatch.setattr(langsmith.client.Client, "__init__", patched_langsmith_init)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
@pytest.fixture(autouse=True) | |
def force_langsmith_requests(): | |
"""Force langsmith to use requests instead of httpx for VCR compatibility.""" | |
# For Python 3.11+, we need to ensure langsmith uses requests instead of httpx | |
# since VCR has better support for requests | |
if sys.version_info >= (3, 11): | |
import requests | |
# Monkey patch langsmith to always use requests session | |
try: | |
import langsmith.client | |
original_langsmith_init = langsmith.client.Client.__init__ | |
def patched_langsmith_init(self, *args, **kwargs): | |
if "session" not in kwargs: | |
kwargs["session"] = requests.Session() | |
return original_langsmith_init(self, *args, **kwargs) | |
langsmith.client.Client.__init__ = patched_langsmith_init | |
except ImportError: | |
pass # langsmith not available | |
@pytest.fixture(autouse=True) | |
def force_langsmith_requests(monkeypatch): | |
"""Force langsmith to use requests instead of httpx for VCR compatibility.""" | |
# For Python 3.11+, we need to ensure langsmith uses requests instead of httpx | |
# since VCR has better support for requests | |
if sys.version_info < (3, 11): | |
return | |
try: | |
import requests | |
import langsmith.client | |
except ImportError: | |
return # langsmith not available | |
original_langsmith_init = langsmith.client.Client.__init__ | |
def patched_langsmith_init(self, *args, **kwargs): | |
if kwargs.get("session") is None: | |
kwargs["session"] = requests.Session() | |
return original_langsmith_init(self, *args, **kwargs) | |
# Ensure restoration after each test to avoid accumulating wrappers | |
monkeypatch.setattr(langsmith.client.Client, "__init__", patched_langsmith_init) |
🤖 Prompt for AI Agents
In packages/opentelemetry-instrumentation-langchain/tests/conftest.py around
lines 152 to 173, the autouse fixture currently monkey-patches
langsmith.client.Client.__init__ in-place which never gets restored and causes
stacked wrappers across tests and fails to handle callers passing session=None;
replace the manual in-place patch with pytest's monkeypatch fixture so the
change is automatically reverted after each test, and in the patched initializer
ensure you set kwargs["session"] = requests.Session() when "session" is not in
kwargs or kwargs.get("session") is None before delegating to the original
__init__ (store the original via getattr then use monkeypatch.setattr to set the
wrapped initializer).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed ef8e8d1 in 54 seconds. Click for details.
- Reviewed
82
lines of code in2
files - Skipped
0
files when reviewing. - Skipped posting
3
draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-langchain/tests/conftest.py:142
- Draft comment:
Removal of the LANGSMITH_API_KEY environment setup is intentional since tests now use a constant prompt. Ensure this cleanup doesn't inadvertently affect scenarios that might previously have relied on that key. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
2. packages/opentelemetry-instrumentation-langchain/tests/test_agents.py:16
- Draft comment:
Replacing the hub.pull call with a constant OPENAI_FUNCTIONS_AGENT_PROMPT improves test determinism by removing an external dependency and the need for LANGSMITH_API_KEY. This refactor is a solid improvement. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
3. packages/opentelemetry-instrumentation-langchain/tests/test_agents.py:248
- Draft comment:
The assert_message_in_logs helper uses strict equality on the log record body. In future, if additional attributes are added to logs, considering a partial (subset) match could make tests more resilient. - Reason this comment was not posted:
Confidence changes required:33%
<= threshold50%
None
Workflow ID: wflow_HIGnVXXyq1IRCGxQ
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (7)
packages/opentelemetry-instrumentation-langchain/tests/test_agents.py (7)
34-36
: Nit: Name the template variable explicitly for clarityRename prompt to prompt_template to avoid confusion with user input strings used later in other tests.
Apply this minimal rename here:
- prompt = OPENAI_FUNCTIONS_AGENT_PROMPT - agent = create_tool_calling_agent(model, tools, prompt) + prompt_template = OPENAI_FUNCTIONS_AGENT_PROMPT + agent = create_tool_calling_agent(model, tools, prompt_template)
77-112
: Avoid variable shadowing: separate template from user inputUsing prompt for both the ChatPromptTemplate and the user’s input string is easy to misread and invites mistakes. Use prompt_template for the template and user_input for the string.
Apply:
- prompt = OPENAI_FUNCTIONS_AGENT_PROMPT - agent = create_tool_calling_agent(model, tools, prompt) + prompt_template = OPENAI_FUNCTIONS_AGENT_PROMPT + agent = create_tool_calling_agent(model, tools, prompt_template) - prompt = "What is OpenLLMetry?" - response = agent_executor.invoke({"input": prompt}) + user_input = "What is OpenLLMetry?" + response = agent_executor.invoke({"input": user_input}) @@ - assert_message_in_logs(logs, "gen_ai.user.message", {"content": prompt}) + assert_message_in_logs(logs, "gen_ai.user.message", {"content": user_input})
172-176
: Nit: Consistent naming for the prompt templateMirror the earlier rename to keep tests consistent.
- prompt = OPENAI_FUNCTIONS_AGENT_PROMPT - agent = create_tool_calling_agent(model, tools, prompt) + prompt_template = OPENAI_FUNCTIONS_AGENT_PROMPT + agent = create_tool_calling_agent(model, tools, prompt_template)
43-60
: Simplify span name assertions; remove redundant duplicatesset([...]) and duplicate items inside the expected set are unnecessary. A set comprehension reads cleaner, and duplicates in a set literal are misleading.
Example change for each assertion block:
- assert set([span.name for span in spans]) == { - "RunnableLambda.task", - "RunnableParallel<agent_scratchpad>.task", - "RunnableAssign<agent_scratchpad>.task", - "ChatPromptTemplate.task", - "ChatOpenAI.chat", - "ToolsAgentOutputParser.task", - "RunnableSequence.task", - "tavily_search_results_json.tool", - "RunnableLambda.task", - "RunnableParallel<agent_scratchpad>.task", - "RunnableAssign<agent_scratchpad>.task", - "ChatPromptTemplate.task", - "ChatOpenAI.chat", - "ToolsAgentOutputParser.task", - "RunnableSequence.task", - "AgentExecutor.workflow", - } + assert {span.name for span in spans} == { + "AgentExecutor.workflow", + "ChatOpenAI.chat", + "ChatPromptTemplate.task", + "RunnableAssign<agent_scratchpad>.task", + "RunnableLambda.task", + "RunnableParallel<agent_scratchpad>.task", + "RunnableSequence.task", + "ToolsAgentOutputParser.task", + "tavily_search_results_json.tool", + }Repeat similarly for the other two blocks.
Also applies to: 87-104, 181-198
106-108
: Reduce brittleness of exact log count assertionsExact counts often break when adding new observability fields/events. Since you already assert presence and content of specific events, consider asserting a lower bound instead.
- assert len(logs) == 8 + assert len(logs) >= 8You can keep the detailed event assertions below to guard against regressions.
Also applies to: 200-205
1-1
: Make log body matching resilient to additional fields; generalize typeTests currently require exact equality with the log body. As instrumentation evolves (e.g., always including content), additional fields may appear and cause false negatives. Match expected as a subset of the body and accept lists/dicts as-is. Also, the exporter commonly returns a list; Sequence is a better type than Tuple.
-from typing import Tuple +from typing import Sequence-def assert_message_in_logs( - logs: Tuple[LogData], event_name: str, expected_content: dict -): +def assert_message_in_logs( + logs: Sequence[LogData], event_name: str, expected_content: dict +): @@ - assert any(dict(log.log_record.body) == expected_content for log in logs) + # Allow extra keys in the body; require at least the expected subset to match. + assert any( + dict(log.log_record.body).items() >= expected_content.items() for log in logs + )Also applies to: 248-256
16-24
: No lingering hub references; consider updating outdated test commentThe grep commands confirm there are no actual
hub.pull
calls orhub
imports remaining. Only the comment in the test still referenceshub.pull
, which can be updated for clarity:
- packages/opentelemetry-instrumentation-langchain/tests/test_agents.py: line 16 – update or remove the
# Constant prompt template to replace hub.pull("hwchase17/openai-functions-agent")
comment.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
packages/opentelemetry-instrumentation-langchain/tests/conftest.py
(0 hunks)packages/opentelemetry-instrumentation-langchain/tests/test_agents.py
(5 hunks)
💤 Files with no reviewable changes (1)
- packages/opentelemetry-instrumentation-langchain/tests/conftest.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit Inference Engine (CLAUDE.md)
**/*.py
: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/opentelemetry-instrumentation-langchain/tests/test_agents.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Lint
🔇 Additional comments (1)
packages/opentelemetry-instrumentation-langchain/tests/test_agents.py (1)
6-6
: LGTM: Explicit prompt imports are correct and future-proofSwitching to ChatPromptTemplate and MessagesPlaceholder is the right call for deterministic, hub-free tests.
feat(instrumentation): ...
orfix(instrumentation): ...
.Important
Fixes missing content in telemetry for assistant messages with tool calls in LangChain instrumentation and adds tests to validate this behavior.
set_chat_request()
inspan_utils.py
now always includes message content for assistant messages, even when tool calls are present.span_utils.py
.test_tool_call_content.py
to validate assistant message content and tool-call attributes.test_agents.py
to use a constant prompt template and validate log events.ci.yml
to prevent low-disk-space failures during CI runs.This description was created by
for ef8e8d1. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores