fix(openai): record exception as span events as well #3067
Merged
Important
Looks good to me! 👍
Reviewed everything up to 6e396bc in 2 minutes and 20 seconds. Click for details.
- Reviewed 524 lines of code in 10 files
- Skipped 0 files when reviewing.
- Skipped posting 6 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py:39
- Draft comment: Hard-coded token usage (8) may be brittle if encoding changes. Consider dynamically computing or loosening this assertion.
- Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py:87
- Draft comment: Using hard-coded expected log event content (for 'gen_ai.choice') might become fragile if response formatting changes. Consider verifying key fields or using regex to allow some flexibility.
- Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py:277
- Draft comment: Repeated assertions on hard-coded token values (e.g., prompt tokens = 8) and fixed API base URLs can be brittle. Consider centralizing expected constants or adding comments to clarify these expectations.
- Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py:400
- Draft comment: Assertions comparing fixed response IDs (e.g. 'cmpl-8wq43c8U5ZZCQBX5lrSpsANwcd3OF') may be brittle with VCR responses. Consider matching against a pattern or documenting why these values are stable.
- Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. packages/opentelemetry-instrumentation-openai/tests/traces/test_chat_parse.py:540
- Draft comment: Typo alert: The model parameter is set to "gpt-4o". Please verify if this is intentional or if it should be "gpt-4".
- Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%. The model name "gpt-4o" appears to be intentionally used for testing purposes. It's used consistently across multiple test cases, including tests that expect authentication errors. The tests are passing with this model name. The comment assumes this is a typo but there's no evidence to support that; in fact, the evidence suggests it's intentional. Could this be a real typo that was accidentally copied across all test cases? Could using an invalid model name affect the test coverage? The tests specifically check for authentication errors, not model validation errors. Using an invalid model name is actually good for testing as it ensures the error handling works correctly without making real API calls. The comment should be deleted. The model name appears to be intentionally set for testing purposes, and changing it could actually make the tests less effective.
6. packages/opentelemetry-instrumentation-openai/tests/traces/test_chat_parse.py:574
- Draft comment: Typo alert: The model parameter is set to "gpt-4o". Please verify if this is intentional or if it should be "gpt-4".
- Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%. The consistent use of "gpt-4o" across all test cases strongly suggests this is intentional. These are tests for error handling and API behavior, and using an invalid model name could be part of the test design. In fact, looking at the test cases where this appears, they're testing error handling scenarios with invalid API keys, which makes an invalid model name even more likely to be intentional. Could this be a genuine typo that was copy-pasted throughout the test file? The model name "gpt-4o" does look unusual. While "gpt-4o" is unusual, the fact that these are tests specifically designed to handle errors and invalid inputs, combined with the consistent usage across all test cases, strongly suggests this is intentional rather than a copy-pasted typo. Delete the comment. The unusual model name appears to be intentionally used for testing error scenarios.
Workflow ID: wflow_omj7pCLGmyK7tvtb
You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
amitalokbera pushed a commit to amitalokbera/openllmetry that referenced this pull request Jul 15, 2025
nina-kollman pushed a commit that referenced this pull request Aug 11, 2025
#3066
feat(instrumentation): ... or fix(instrumentation): ....
Important
This PR adds exception recording to span events in OpenAI instrumentation wrappers and updates tests to verify this behavior.
- Adds span.record_exception(e) to exception handling in chat_wrapper(), completion_wrapper(), embeddings_wrapper(), and runs_create_wrapper() to log exceptions as span events.
- Updates EventHandleWrapper.on_exception() to record exceptions in spans.
- Updates test_chat.py, test_chat_parse.py, test_completions.py, and test_embeddings.py to verify exceptions are recorded as span events.
This description was created by Ellipsis for 6e396bc. You can customize this summary. It will automatically update as commits are pushed.
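The pattern described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `StubSpan`, `call_with_span`, and `failing_request` are hypothetical names, and `StubSpan` only imitates the `record_exception` method of an OpenTelemetry span so the example runs without the SDK installed. The event name `exception` and the `exception.type` / `exception.message` attribute keys follow the OpenTelemetry semantic conventions.

```python
from dataclasses import dataclass, field


@dataclass
class StubSpan:
    """Hypothetical stand-in for an OpenTelemetry span (not the real SDK class)."""

    name: str
    events: list = field(default_factory=list)

    def record_exception(self, exception: BaseException) -> None:
        # A real span adds an event named "exception" with these attribute keys.
        # Simplification: the real SDK records the qualified type name.
        self.events.append(
            {
                "name": "exception",
                "attributes": {
                    "exception.type": type(exception).__name__,
                    "exception.message": str(exception),
                },
            }
        )


def call_with_span(span, wrapped, *args, **kwargs):
    """Run `wrapped`; on failure, record the exception on the span and re-raise."""
    try:
        return wrapped(*args, **kwargs)
    except Exception as e:
        span.record_exception(e)  # the call this PR adds to the wrappers
        raise


def failing_request():
    # Hypothetical stand-in for an OpenAI call that fails.
    raise RuntimeError("invalid api key")


span = StubSpan("openai.chat")
try:
    call_with_span(span, failing_request)
except RuntimeError:
    pass  # the caller still sees the original exception
```

A test in the spirit of the updated test files would then inspect `span.events` and check that exactly one event named `exception` was recorded with the expected type and message.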