
Conversation

Collaborator

@ronensc ronensc commented Jul 29, 2025

Continuation of #3094

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Fixes #2271


Important

Ensure LLM spans are created for sync cases in Langchain instrumentation, with updated context propagation and tests.

  • Behavior:
    • Ensures LLM spans are created for synchronous cases in TraceloopCallbackHandler.
    • Adds context propagation for sync callbacks in _BaseCallbackManagerInitWrapper.
    • Suppresses model instrumentation in legacy chains using SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY in __init__.py and callback_handler.py.
  • Testing:
    • Adds tests for langgraph in test_langgraph.py and test_langchain_metrics.py.
    • Includes VCR cassettes for various langgraph scenarios.
  • Dependencies:
    • Updates pyproject.toml to include langgraph dependency.

This description was created by Ellipsis for b0dd14e. You can customize this summary. It will automatically update as commits are pushed.

Summary by CodeRabbit

  • New Features

    • Added integration and tests for LangGraph workflows with OpenAI's GPT-4o model, enabling state graph-based processing of mathematical requests.
    • Introduced a sample application demonstrating LangGraph and OpenAI integration.
    • Enhanced metrics and tracing for LangGraph workflows, including token usage and operation duration.
  • Bug Fixes

    • Improved robustness in span attribute setting to ensure non-empty model names.
    • Refined context token management for better tracing and suppression of nested instrumentation.
  • Tests

    • Added comprehensive tests and cassettes for LangGraph workflows, covering both synchronous and asynchronous invocations.
    • Introduced new metrics tests to validate token and duration metrics for LangGraph operations.
  • Chores

    • Updated test dependencies to include support for LangGraph.


coderabbitai bot commented Jul 29, 2025

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-langchain/poetry.lock is excluded by !**/*.lock

CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including **/dist/** will override the default block on the dist directory by removing the pattern from both lists.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The changes introduce improved context propagation and suppression logic in the Langchain OpenTelemetry instrumentation to ensure OpenAI requests made within LangGraph nodes are traced correctly. New test cases and cassettes validate tracing and metrics for OpenAI calls in LangGraph workflows, and a sample application demonstrates the integration.

Changes

Cohort / File(s) Change Summary
Instrumentation Logic Updates
opentelemetry/instrumentation/langchain/__init__.py, opentelemetry/instrumentation/langchain/callback_handler.py
Refactored callback handler/context management, added suppression key handling, improved context token attach/detach, removed unused helper, and ensured robust span attribute setting.
Test Cassettes for LangGraph
tests/cassettes/test_langgraph/*, tests/metrics/cassettes/test_langchain_metrics/test_langgraph_metrics.yaml
Added new VCR cassettes recording OpenAI API interactions for LangGraph-based tests.
LangGraph Workflow Instrumentation Tests
tests/test_langgraph.py
Added tests to verify tracing spans, attributes, parent-child relationships, and context management for OpenAI calls in LangGraph workflows (sync and async).
LangGraph Metrics Tests
tests/metrics/test_langchain_metrics.py
Added test for LangGraph metrics, verifying token usage, operation duration, and generation choices metrics for OpenAI calls.
Test Infrastructure
tests/conftest.py
Modified fixture to pass both tracer and meter providers to OpenAI instrumentor for consistency.
Sample Application
sample_app/langgraph_openai.py
Added a sample app demonstrating OpenAI client usage within a LangGraph workflow and Traceloop SDK integration.
Test Dependency Update
pyproject.toml
Added langgraph as a test dependency for the instrumentation package.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant LangGraph Workflow
    participant LangGraph Node (calculate)
    participant OpenAI Client
    participant OpenTelemetry Instrumentation

    User->>LangGraph Workflow: invoke(request)
    LangGraph Workflow->>LangGraph Node (calculate): execute(state)
    LangGraph Node (calculate)->>OpenAI Client: create chat completion
    OpenAI Client->>OpenTelemetry Instrumentation: (instrumented) send request
    OpenTelemetry Instrumentation->>OpenAI Client: inject tracing headers, manage context
    OpenAI Client-->>LangGraph Node (calculate): response
    LangGraph Node (calculate)-->>LangGraph Workflow: result
    LangGraph Workflow-->>User: result
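
For concreteness, here is a minimal sketch of the kind of workflow exercised by these changes, modelled on the sample app and the calculate node discussed later in this review. The exact names, model, and prompt are assumptions, not a verbatim copy of the PR.

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END
from openai import OpenAI
from traceloop.sdk import Traceloop

Traceloop.init(app_name="langgraph_openai_sample")

openai_client = OpenAI()


class State(TypedDict):
    request: str
    result: str


def calculate(state: State):
    # A single LangGraph node that calls OpenAI directly; the PR ensures this
    # call is traced as a child of the LangGraph workflow span.
    completion = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a mathematician."},
            {"role": "user", "content": state["request"]},
        ],
    )
    return {"result": completion.choices[0].message.content}


workflow = StateGraph(State)
workflow.add_node("calculate", calculate)
workflow.add_edge(START, "calculate")
workflow.add_edge("calculate", END)
langgraph = workflow.compile()

# Synchronous invocation -- the previously untraced case this PR addresses.
langgraph.invoke({"request": "What's 5 + 5?"})
```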

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Assessment against linked issues

Objective Addressed Explanation
OpenAI requests in LangGraph nodes are traced with spans and LLM/gen_ai metrics (#2271)
Tracing spans include detailed attributes: model info, prompt, completion, token usage (#2271)
Works for both sync and async LangGraph node invocations (#2271)
Context propagation and suppression logic are robust for OpenAI client usage in LangGraph (#2271)

Poem

In LangGraph’s warren, the traces were lost,
OpenAI calls wandered, a rabbit tempest-tossed.
But now with fresh context and tokens in tow,
Each span hops along where the workflows go.
Metrics and traces, all neatly aligned—
The bug is now burrowed and left far behind!
🐇✨



Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Caution

Changes requested ❌

Reviewed everything up to b0dd14e in 3 minutes and 57 seconds.
  • Reviewed 1013 lines of code in 12 files
  • Skipped 1 file when reviewing.
  • Skipped posting 13 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py:189
  • Draft comment:
    Renaming the parameter from '_callback_manager' to '_callback_handler' in _BaseCallbackManagerInitWrapper is clear. Just ensure that all type annotations and usage match the intended handler type consistently.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the author to ensure that type annotations and usage match the intended handler type consistently. This falls under asking the author to ensure something, which is against the rules. The comment does not provide a specific suggestion or point out a specific issue, making it not useful according to the guidelines.
2. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py:226
  • Draft comment:
    In _create_span, the token is attached conditionally for synchronous callbacks. This is good; ensure that downstream code (e.g. _end_span) always detaches these tokens when present to avoid context accumulation. The current use in _end_span appears correct.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 40% <= threshold 50%. The comment is suggesting a specific action to ensure that tokens are detached in downstream code, which is a valid concern for preventing context accumulation. However, it also includes a directive to "ensure" something, which is against the rules. The comment also states that the current use appears correct, which is informative. Overall, the comment is not fully compliant with the rules.
3. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py:302
  • Draft comment:
    In _create_llm_span, the suppression key is attached via context_api.set_value with SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY. This fallback is appropriate for legacy chains, but double-check that its token is detached later (e.g., in on_chain_end) to prevent unintended propagation of the suppression flag.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the PR author to double-check that a token is detached later in the code. This falls under the rule of not asking the author to double-check things or ensure behavior is intended. Therefore, this comment should be removed.
4. packages/opentelemetry-instrumentation-langchain/pyproject.toml:60
  • Draft comment:
    The addition of the 'langgraph' dependency is appropriate for the new requirements. Ensure version '^0.4' is compatible with your other dependencies.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is discussing a dependency change and advising to ensure compatibility, which is against the rules. It doesn't provide a specific suggestion or ask for a specific test to be written.
5. packages/sample-app/sample_app/langgraph_openai.py:6
  • Draft comment:
    The sample app demonstrates a correct usage of LangGraph instrumentation with OpenAI. The initialization with Traceloop is clear, and the flow properly compiles and invokes the langgraph workflow.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
6. packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py:36
  • Draft comment:
    The tests verifying span names and parent-child relationships in the langgraph workflow look comprehensive. The hard-coded attribute assertions (e.g. token counts) are acceptable if they reflect stable responses, but be aware that upstream changes might require updates.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It mentions that the tests look comprehensive and warns about potential future updates due to upstream changes, which is not allowed by the rules.
7. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py:244
  • Draft comment:
    Typo: In the comment, "this should helps as a fallback" should be "this should help as a fallback".
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%. While the comment is technically correct about the grammar, it's an extremely minor issue that doesn't affect code functionality or readability. The meaning is perfectly clear despite the small grammatical error. Comments about typos in comments are generally not important enough to warrant a PR comment. The grammar error could be seen as unprofessional and might be worth fixing since it's a simple change. Poor grammar in comments could be considered a code quality issue. While clean documentation is good, this is such a minor issue that it creates more noise than value in the PR review process. The comment is still perfectly understandable. Delete this comment as it points out an extremely minor grammatical issue that doesn't impact code quality or understanding.
8. packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_langgraph/test_langgraph_ainvoke.yaml:4
  • Draft comment:
    Typo: The request body contains "What''s 5 + 5?" with a double apostrophe. This is likely a typo and should be "What's 5 + 5?".
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%. This is a YAML file containing JSON data. Double apostrophes in JSON strings within YAML are actually correct escaping - it's not a typo. The double apostrophe is necessary because the string is enclosed in single quotes in YAML. This is standard YAML/JSON serialization. The comment is incorrect in suggesting this needs to be fixed. Could there be a specific reason why the automated tool flagged this? Maybe there's a different YAML serialization format that would be preferred? No - this is a test cassette file that's likely automatically generated from real API interactions. The current format is perfectly valid and changing it would serve no purpose. The comment should be deleted as it incorrectly identifies proper YAML/JSON escaping as a typo. The current format is correct and requires no changes.
9. packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_langgraph/test_langgraph_ainvoke.yaml:4
  • Draft comment:
    Typo: The model string is given as "gpt-4o" which appears to be a typo. It might be intended to be "gpt-4".
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%. Since this is a test cassette file, it's likely recording actual API interactions that occurred during testing. The "typo" might actually be intentional for testing purposes, or it could be capturing real behavior. Test fixtures often contain edge cases or error scenarios on purpose. Without knowing the test's intention, we can't be sure this is actually a problem that needs fixing. The model name could be intentionally wrong to test error handling. Also, this might be a valid model name we're not familiar with. Even if it is a typo, changing it would make the test fixture no longer match the actual recorded interaction, potentially breaking tests. The comment should be deleted as it makes assumptions about test data without understanding the test's purpose, and changing test fixtures could break tests.
10. packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_langgraph/test_langgraph_invoke.yaml:4
  • Draft comment:
    It looks like there might be a typographical error in the request body: the string "What''s 5 + 5?" has two consecutive apostrophes. Was this intended, or should it be "What's 5 + 5?"?
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%. This is a test cassette file that records API interactions. The double apostrophes are not a typo - they are correct YAML syntax for escaping single quotes within a single-quoted string. The comment is incorrect in suggesting this might be a problem. Additionally, this is test data that's likely automatically generated/recorded, not hand-written code. Could there be a case where double apostrophes actually indicate a real problem in the API request? Could this affect the OpenAI API call? No - the double apostrophes are purely YAML syntax and will be correctly parsed into a single apostrophe when the YAML is loaded. The actual API request will receive the correct single apostrophe. The comment should be deleted because it incorrectly suggests there's a problem with valid YAML syntax.
11. packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_langgraph/test_langgraph_invoke.yaml:4
  • Draft comment:
    The model field value 'gpt-4o' appears on the same line. This might be a typo—did you intend to use 'gpt-4' instead?
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%. Since this is a test cassette file, it's recording an actual API interaction that happened. The fact that the request succeeded (status code 200) suggests that either "gpt-4o" is a valid model name, or this is a mock/test scenario where the exact model name doesn't matter. Test cassettes are typically used for recording and replaying test interactions, so being precise about the model name may not be important for the test's purpose. The model name could be a legitimate test value chosen deliberately, or it could be used to test error handling. We shouldn't assume it's a typo without more context. Given this is a test file and the interaction succeeded, the exact model name is likely not critical. The comment is making assumptions about the intended model name without strong evidence of an actual issue. Delete the comment. In a test cassette, the exact model name may be intentional, and the successful API response suggests this isn't causing any actual problems.
12. packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py:192
  • Draft comment:
    Typographical check: The function name is test_langgraph_metrics. Considering the file is for LangChain instrumentation, should this be test_langchain_metrics instead?
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
13. packages/sample-app/sample_app/langgraph_openai.py:20
  • Draft comment:
    Typo notice: The model parameter is set to "gpt-4o". Please confirm if this is intended or if it should be "gpt-4".
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_QGZ5STBV1tSkeiHU

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.


# In legacy chains like LLMChain, suppressing model instrumentations
# within create_llm_span doesn't work, so this should helps as a fallback
context_api.attach(
Contributor


In _OpenAITracingWrapper.call, a new suppression context is attached via context_api.attach(). Consider whether this token should be explicitly detached after the wrapped call to avoid any context leakage. The fallback looks intentional, but ensuring cleanup may prevent potential side effects.
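
For reference, a minimal sketch of the attach/detach discipline this comment is asking about, using the public opentelemetry.context API. The wrapper function and the key name below are illustrative stand-ins, not the PR's actual code; the real constant is SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY from the instrumentation package.

```python
from opentelemetry import context as context_api

# Illustrative stand-in for SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY.
SUPPRESS_KEY = "suppress_language_model_instrumentation"


def call_with_suppression(wrapped, *args, **kwargs):
    # Attach a context carrying the suppression flag, and always detach the
    # token afterwards so the flag cannot leak into unrelated spans.
    token = context_api.attach(context_api.set_value(SUPPRESS_KEY, True))
    try:
        return wrapped(*args, **kwargs)
    finally:
        context_api.detach(token)
```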

def calculate(state: State):
    request = state["request"]
    completion = openai_client.chat.completions.create(
        model="gpt-4o",
Contributor


Typo found: the model name is "gpt-4o". Did you mean "gpt-4"?

Suggested change
model="gpt-4o",
model="gpt-4",


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
packages/sample-app/sample_app/langgraph_openai.py (2)

12-26: Consider adding error handling and result processing.

The calculate function implementation is correct but could be more robust. Consider adding error handling for the OpenAI API call and validating the response structure.

 def calculate(state: State):
     request = state["request"]
-    completion = client.chat.completions.create(
-        model="gpt-4o",
-        messages=[
-            {"role": "system", "content": "You are a mathematician."},
-            {"role": "user", "content": request}
-        ]
-    )
-    return {"result": completion.choices[0].message.content}
+    try:
+        completion = client.chat.completions.create(
+            model="gpt-4o",
+            messages=[
+                {"role": "system", "content": "You are a mathematician."},
+                {"role": "user", "content": request}
+            ]
+        )
+        if completion.choices and completion.choices[0].message.content:
+            return {"result": completion.choices[0].message.content}
+        else:
+            return {"result": "No response received"}
+    except Exception as e:
+        return {"result": f"Error: {str(e)}"}

35-36: Capture and display the workflow result.

The workflow is invoked but the result is not captured or used. Consider storing and displaying the result to make the sample more complete.

 user_request = "What's 5 + 5?"
-langgraph.invoke(input={"request": user_request})
+result = langgraph.invoke(input={"request": user_request})
+print(f"Question: {user_request}")
+print(f"Answer: {result.get('result', 'No result')}")
packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py (1)

191-267: Consider making metric assertions more robust.

The test logic is sound but could be more resilient to changes. The assertion assert len(metric_data) == 3 is brittle and could break if additional metrics are added.

-    metric_data = resource_metrics[0].scope_metrics[-1].metrics
-    assert len(metric_data) == 3
+    metric_data = resource_metrics[0].scope_metrics[-1].metrics
+    assert len(metric_data) >= 3, f"Expected at least 3 metrics, got {len(metric_data)}"

Also consider using the existing verify_langchain_metrics helper function to reduce code duplication:

+    found_token_metric, found_duration_metric = verify_langchain_metrics(reader)
+    assert found_token_metric, "Token usage metrics not found"
+    assert found_duration_metric, "Operation duration metrics not found"
+    
+    # Additional check for generation choices metric
     generation_choices_metric = next(
         (
             m
             for m in metric_data
             if m.name == Meters.LLM_GENERATION_CHOICES
         ),
         None
     )
     assert generation_choices_metric is not None
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (1)

67-70: Consider making token count assertions more flexible.

Hardcoded token counts (24, 11, 35) may cause test brittleness if OpenAI's tokenization changes. Consider using assertions that verify token counts are positive integers rather than exact values, or document why these specific values are expected.

-    assert openai_span.attributes[SpanAttributes.LLM_USAGE_PROMPT_TOKENS] == 24
-    assert openai_span.attributes[SpanAttributes.LLM_USAGE_COMPLETION_TOKENS] == 11
-    assert openai_span.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS] == 35
+    # Verify token counts are present and reasonable
+    prompt_tokens = openai_span.attributes[SpanAttributes.LLM_USAGE_PROMPT_TOKENS]
+    completion_tokens = openai_span.attributes[SpanAttributes.LLM_USAGE_COMPLETION_TOKENS]
+    total_tokens = openai_span.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS]
+    
+    assert isinstance(prompt_tokens, int) and prompt_tokens > 0
+    assert isinstance(completion_tokens, int) and completion_tokens > 0
+    assert total_tokens == prompt_tokens + completion_tokens
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7d529be and b0dd14e.

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-langchain/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py (5 hunks)
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (6 hunks)
  • packages/opentelemetry-instrumentation-langchain/pyproject.toml (1 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_langgraph/test_langgraph_ainvoke.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_langgraph/test_langgraph_double_ainvoke.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_langgraph/test_langgraph_double_invoke.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_langgraph/test_langgraph_invoke.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/conftest.py (1 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/metrics/cassettes/test_langchain_metrics/test_langgraph_metrics.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py (2 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (1 hunks)
  • packages/sample-app/sample_app/langgraph_openai.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
packages/sample-app/sample_app/langgraph_openai.py (2)
packages/traceloop-sdk/traceloop/sdk/__init__.py (2)
  • Traceloop (37-227)
  • init (49-202)
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (4)
  • State (14-16)
  • State (79-81)
  • calculate (18-27)
  • calculate (83-92)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (2)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (2)
  • SpanHolder (27-36)
  • _set_span_attribute (52-54)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-257)
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py

139-142: Yoda condition detected

(SIM300)


148-153: Yoda condition detected

(SIM300)


180-183: Yoda condition detected

(SIM300)


189-194: Yoda condition detected

(SIM300)
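
For readers unfamiliar with the rule: SIM300 flags "Yoda conditions", i.e. comparisons written constant-first. A self-contained illustration (the real assertions in the test compare span attributes against expected token counts):

```python
# Ruff SIM300 ("Yoda condition") flags comparisons with the constant on the left.
total_tokens = 35  # stand-in for a span attribute read in the real test

assert 35 == total_tokens  # flagged by SIM300: constant on the left
assert total_tokens == 35  # preferred: value under test on the left
```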

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build Packages (3.11)
  • GitHub Check: Lint
🔇 Additional comments (19)
packages/opentelemetry-instrumentation-langchain/pyproject.toml (1)

61-61: LGTM! Dependency addition supports LangGraph instrumentation testing.

The addition of langgraph = "^0.4" to test dependencies is appropriate and necessary for the new test cases that validate OpenTelemetry instrumentation integration with LangGraph workflows.

packages/opentelemetry-instrumentation-langchain/tests/conftest.py (1)

79-79: LGTM! Consistent instrumentation setup across providers.

Adding meter_provider=meter_provider to the OpenAI instrumentor call ensures consistent instrumentation setup and aligns with the LangchainInstrumentor configuration. This supports proper metrics collection in the test environment.
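
A sketch of what that fixture call can look like; the instrumentor class and keyword arguments shown here are the commonly used ones for this repo's OpenAI instrumentation and should be treated as assumptions rather than the exact conftest.py code.

```python
from opentelemetry.instrumentation.openai import OpenAIInstrumentor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.trace import TracerProvider

tracer_provider = TracerProvider()
meter_provider = MeterProvider()

# Pass both providers so OpenAI spans and metrics land in the same test-scoped
# providers used for the Langchain instrumentor.
OpenAIInstrumentor().instrument(
    tracer_provider=tracer_provider,
    meter_provider=meter_provider,
)
```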

packages/opentelemetry-instrumentation-langchain/tests/metrics/cassettes/test_langchain_metrics/test_langgraph_metrics.yaml (1)

1-91: LGTM! Test cassette supports LangGraph metrics validation.

This VCR cassette properly records an OpenAI API interaction with appropriate request/response structure for testing LangGraph workflow instrumentation and metrics collection. The simple math query provides predictable test behavior.

packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_langgraph/test_langgraph_ainvoke.yaml (1)

1-91: LGTM! Test cassette supports async LangGraph invocation testing.

This cassette records the same OpenAI interaction as other test cassettes, which provides consistency for testing async LangGraph invocation scenarios while isolating instrumentation behavior differences.

packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_langgraph/test_langgraph_invoke.yaml (1)

1-91: LGTM! Test cassette supports sync LangGraph invocation testing.

This cassette provides consistent recorded OpenAI interactions for testing synchronous LangGraph invocations, directly supporting the PR objective of ensuring proper LLM span creation for sync cases.

packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_langgraph/test_langgraph_double_ainvoke.yaml (1)

1-91: LGTM! Test cassette properly records OpenAI API interaction.

The cassette correctly captures the HTTP interaction with OpenAI's chat completions endpoint for the double async invocation test scenario. The request structure, headers, and gzipped response format are all appropriate for VCR test replay.

packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_langgraph/test_langgraph_double_invoke.yaml (1)

1-91: LGTM! Synchronous test cassette matches expected format.

The cassette properly records the synchronous OpenAI API interaction for the double invoke test scenario. The structure is consistent with the async version, with the appropriate x-stainless-async: 'false' header indicating synchronous execution.

packages/sample-app/sample_app/langgraph_openai.py (1)

1-10: LGTM! Proper imports and SDK initialization.

The imports are appropriate for the LangGraph-OpenAI integration, and the Traceloop SDK initialization correctly sets up tracing for the sample application.

packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py (2)

2-2: LGTM! Appropriate import for TypedDict.

The TypedDict import is correctly added to support the State class definition in the new test.


8-14: LGTM! New imports and fixture for LangGraph testing.

The imports for LangGraph and OpenAI are appropriate, and the openai_client fixture provides a clean way to inject the OpenAI client into tests.

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py (4)

6-7: LGTM! Necessary imports for context management.

The import of context as context_api provides access to OpenTelemetry context management functionality needed for the suppression mechanism.


19-19: LGTM! Import for instrumentation suppression.

The SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY import is essential for the suppression mechanism that prevents duplicate instrumentation in nested LLM calls.


189-208: LGTM! Improved callback manager wrapper with proper handler management.

The refactoring from callback_manager to callback_handler improves code clarity. The addition of the _callback_manager attribute to the handler enables proper context propagation by allowing the handler to determine whether it's dealing with sync or async callback managers.


243-247: Effective fallback suppression mechanism for legacy chains.

The context attachment with SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY provides a crucial fallback for legacy chains where the primary suppression mechanism in create_llm_span may not work effectively. This directly addresses the sync tracing issue described in the PR objectives.

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (5)

185-188: LGTM! Proper context cleanup.

The addition of context token detachment ensures proper cleanup when spans end, preventing context leakage between operations.


226-233: Well-documented workaround for async context propagation.

The conditional context attachment based on callback type is a reasonable workaround for the known async context propagation issues. The TODO comment and GitHub issue references provide excellent context for future maintainers.
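
A minimal sketch of the conditional attachment pattern being described, assuming the handler can inspect the callback-manager type. Class and function names are illustrative, and the returned token would be detached when the span ends.

```python
from langchain_core.callbacks import AsyncCallbackManager
from opentelemetry import context as context_api
from opentelemetry.trace import Span, set_span_in_context


def attach_if_sync(callback_manager, span: Span):
    """Attach the new span to the current context only for synchronous callbacks."""
    if isinstance(callback_manager, AsyncCallbackManager):
        # Async callbacks may run on another task/thread, where attaching here
        # could leak context, so the async path is left untouched for now.
        return None
    # Sync callbacks run on the caller's thread, so attaching makes the span the
    # active parent for nested instrumentation (e.g. the OpenAI client call).
    return context_api.attach(set_span_in_context(span))
```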


300-309: Correct placement of instrumentation suppression.

Moving the suppression token attachment to _create_llm_span ensures it only affects LLM operations, preventing double instrumentation while allowing other span types to function normally.


474-474: Good defensive programming for model name attribute.

Ensuring the model name is never empty by using "unknown" as a fallback helps maintain data consistency for metrics and filtering operations.
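
A minimal sketch of such a fallback; the helper name and the attribute in the trailing comment are illustrative, not the PR's exact call site.

```python
from typing import Optional


def safe_model_name(model_name: Optional[str]) -> str:
    # An empty or missing model name breaks metric aggregation and filtering,
    # so fall back to a sentinel instead of setting an empty attribute.
    return model_name or "unknown"


# e.g. _set_span_attribute(span, SpanAttributes.LLM_RESPONSE_MODEL, safe_model_name(name))
```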


155-155: Verify Python 3.10+ support for PEP 604 union syntax

I wasn’t able to find any pyproject.toml, setup.cfg, or setup.py declaring a minimum Python version. Since the annotation

self._callback_manager: CallbackManager | AsyncCallbackManager = None

uses the A | B union syntax (PEP 604), please confirm that this project targets Python 3.10 or later.
• File:
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
Line 155

Command used to update:
nx run opentelemetry-instrumentation-langchain:lock
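
Regarding the PEP 604 note above, a sketch of the equivalent annotation written with typing constructs, in case the package needs to keep supporting Python < 3.10; Optional is included because the attribute is initialized to None.

```python
from typing import Optional, Union

from langchain_core.callbacks import AsyncCallbackManager, CallbackManager

# Equivalent to `CallbackManager | AsyncCallbackManager | None` without PEP 604
# syntax, so it also parses on Python < 3.10.
_callback_manager: Optional[Union[CallbackManager, AsyncCallbackManager]] = None
```
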
Member

@nirga nirga left a comment


Thanks @ronensc!

@nirga nirga merged commit 168236a into traceloop:main Jul 30, 2025
10 checks passed
Comment on lines -88 to -98
def _message_type_to_role(message_type: str) -> str:
    if message_type == "human":
        return "user"
    elif message_type == "system":
        return "system"
    elif message_type == "ai":
        return "assistant"
    elif message_type == "tool":
        return "tool"
    else:
        return "unknown"
Collaborator Author


I removed this function from this file, as it was moved to span_utils.py in PR #2889

@ronensc ronensc deleted the droidnxs/fix-langchain-missing-llm-spans branch July 30, 2025 12:51
Contributor

obs-gh-abhishekrao commented Jul 30, 2025

Thank you for the collaboration and for getting this across the finish line @nirga @ronensc! I'll keep a close eye on this to see if we can hopefully solve the async case sometime.



Development

Successfully merging this pull request may close these issues.

🐛 Bug Report: OpenAI Requests Not Traced When Sent from a LangGraph Node

3 participants