
Conversation


@Hansehart Hansehart commented Oct 18, 2025

Fixes #2322

Summary

Adds token usage tracking for embedding operations in the Haystack instrumentation package. This enables cost monitoring and usage analytics for embedding API calls when using observability platforms like Langfuse.

Changes

  • Added _get_embedding_token_count_attributes() function to extract token counts from embedder responses
  • Integrated token tracking into the EMBEDDER component handler in _ComponentRunWrapper
  • Added test assertions in test_openai_document_embedder_embedding_span_has_expected_attributes to verify token count attributes
  • Follows the same implementation pattern as existing _get_llm_token_count_attributes()

Implementation Details

The new function supports two response formats:

  1. Standard embedding API format: response["usage"]
  2. Custom components format: response["meta"]["usage"]

Token counts tracked:

  • LLM_TOKEN_COUNT_PROMPT - Input tokens consumed
  • LLM_TOKEN_COUNT_TOTAL - Total tokens used
  • Note: LLM_TOKEN_COUNT_COMPLETION is not tracked for embeddings (no output generation)
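The extraction logic described above can be sketched as follows. This is a hedged illustration, not the PR's exact code: the function and attribute-key names mirror OpenInference conventions but are written out here for self-containment.

```python
# Illustrative sketch of extracting token usage from an embedder response,
# supporting both formats described above. Not the merged implementation;
# names are spelled out here rather than imported from the package.
from typing import Any, Dict, Iterator, Tuple

LLM_TOKEN_COUNT_PROMPT = "llm.token_count.prompt"
LLM_TOKEN_COUNT_TOTAL = "llm.token_count.total"


def get_embedding_token_count_attributes(
    response: Dict[str, Any],
) -> Iterator[Tuple[str, Any]]:
    # Standard embedding API format: response["usage"];
    # custom components may nest it under response["meta"]["usage"].
    usage = response.get("usage") or response.get("meta", {}).get("usage")
    if not isinstance(usage, dict):
        return
    if (prompt_tokens := usage.get("prompt_tokens")) is not None:
        yield LLM_TOKEN_COUNT_PROMPT, prompt_tokens
    if (total_tokens := usage.get("total_tokens")) is not None:
        yield LLM_TOKEN_COUNT_TOTAL, total_tokens
    # No completion-token attribute: embeddings produce no generated output.
```

Returning an iterator of key/value pairs lets the caller merge the result into a span-attribute dict with `dict(...)`, matching the pattern of `_get_llm_token_count_attributes()`.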

Test Coverage

Updated existing test to verify token count attributes are correctly captured from OpenAI embedding responses.


Note

Adds token usage extraction for embedding responses and records model/tokens on EMBEDDER spans, with tests.

  • Instrumentation (Haystack):
    • EMBEDDER spans now include token usage and model details.
      • New _get_embedding_token_count_attributes() extracts prompt_tokens, total_tokens, and model from response["usage"] or response["meta"]["usage"].
      • Integrated into _ComponentRunWrapper for ComponentType.EMBEDDER, merging with existing embedding attributes and setting both LLM_MODEL_NAME and EMBEDDING_MODEL_NAME when available.
  • Tests:
  • Updated test_openai_document_embedder_embedding_span_has_expected_attributes to assert LLM_MODEL_NAME, LLM_TOKEN_COUNT_PROMPT, and LLM_TOKEN_COUNT_TOTAL for OpenAI embeddings.
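The attribute merge described in the integration bullet can be sketched as below. This is an assumption-labeled illustration of the shape of the change, not the package's actual code: the helper name `embedder_span_attributes` and its signature are hypothetical.

```python
# Hedged sketch of the EMBEDDER attribute merge: token-usage attributes are
# merged into the existing embedding attributes, and the model name (when
# reported in usage metadata) is recorded under both the LLM and embedding
# model-name keys. Helper name and signature are illustrative.
from typing import Any, Dict, Optional

LLM_MODEL_NAME = "llm.model_name"
EMBEDDING_MODEL_NAME = "embedding.model_name"


def embedder_span_attributes(
    embedding_attrs: Dict[str, Any],
    token_count_attrs: Dict[str, Any],
    model: Optional[str],
) -> Dict[str, Any]:
    # Later sources win in a dict merge, so token-count attributes
    # take precedence over any overlapping embedding attributes.
    attrs = {**embedding_attrs, **token_count_attrs}
    if model is not None:
        attrs[LLM_MODEL_NAME] = model
        attrs[EMBEDDING_MODEL_NAME] = model
    return attrs
```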

Written by Cursor Bugbot for commit 835cec0.


github-actions bot commented Oct 18, 2025

CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅

@Hansehart (Author)

I have read the CLA Document and I hereby sign the CLA

@Hansehart Hansehart marked this pull request as ready for review October 18, 2025 15:06
@Hansehart Hansehart requested a review from a team as a code owner October 18, 2025 15:06
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Oct 18, 2025
github-actions bot added a commit that referenced this pull request Oct 18, 2025
cursor[bot]

This comment was marked as outdated.

cursor[bot]

This comment was marked as outdated.

Diff fragment under review (attribute merge for the EMBEDDER handler):

**dict(_get_embedding_attributes(arguments, response)),
**dict(_get_embedding_token_count_attributes(response)),
}
)

Bug: Model Name Overwritten by Metadata

The EMBEDDING_MODEL_NAME attribute is set twice for embedder components: once from the component's configuration and again from the response metadata. The response metadata value overwrites the component's configured value, potentially causing inconsistent model name reporting.


@Hansehart (Author)

I've configured the response metadata as primary and the component config as fallback. Is this acceptable?
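The fallback ordering described in this reply can be sketched as a one-liner. This is a hedged illustration of the precedence, not the PR's code: the function name and dict shapes are hypothetical.

```python
# Illustrative precedence: prefer the model name reported in response
# metadata, falling back to the component's configured model. Names are
# hypothetical, not the package's actual identifiers.
from typing import Any, Dict, Optional


def resolve_model_name(
    response_meta: Dict[str, Any],
    configured_model: Optional[str],
) -> Optional[str]:
    # Truthy metadata value wins; otherwise fall back to the config.
    return response_meta.get("model") or configured_model
```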

@axiomofjoy axiomofjoy self-requested a review October 21, 2025 15:49
@nate-mar (Contributor)

@Hansehart Thanks so much for this! We very much appreciate it. Let us know if you need help resolving the failed CI checks. My colleague @axiomofjoy will take a look at this when he gets a chance.

@Hansehart (Author)

I’ll try to get this fixed in the next few days. Hints or pointers are very welcome though.

@caroger (Collaborator) commented Oct 23, 2025

I’ll try to get this fixed in the next few days. Hints or pointers are very welcome though.

Hi @Hansehart, you can see the details of the CI/testing results by clicking into the checks. For instance: https://github.com/Arize-ai/openinference/actions/runs/18636229373/job/53302442562?pr=2323

It seems the CI failed at the mypy type-checking step.

For faster iteration, you can re-run the CI locally with `uvx --with tox-uv tox run -e py313-ci-haystack-latest`.


Labels

size:M This PR changes 30-99 lines, ignoring generated files.

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

[feature request] Add token usage tracking for Haystack embedder components

3 participants