feat: Plumb kwargs through to evaluate and evaluate_dataframe #9786

anticorrelator · 2025-10-03T19:12:51Z

Allows passing arbitrary kwargs into an evaluator's evaluate and async_evaluate methods.
When appropriate, these are passed on to LLM.generate_classification as invocation parameters.
Per-evaluator kwargs can be passed into evaluate_dataframe with eval_kwargs, a mapping from evaluator names to kwarg dictionaries.

Note

Adds **kwargs passthrough to the evaluator evaluate/async_evaluate methods and to the dataframe runners (with per-evaluator eval_kwargs), forwarding them to LLM classification calls.

Evaluators:
- Evaluator.evaluate/async_evaluate now accept **kwargs and forward to _evaluate/_async_evaluate (including thread-wrapper path).
- _evaluate/_async_evaluate signatures updated across base, LLM, and decorator-generated evaluators to accept **kwargs.
LLM Evaluation:
- ClassificationEvaluator forwards **kwargs to LLM.generate_classification/async_generate_classification.
DataFrame Runners:
- evaluate_dataframe and async_evaluate_dataframe accept eval_kwargs (per-evaluator) and **kwargs (global); merged and passed to each evaluator during task execution.
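
The passthrough described in the summary above can be sketched as follows. This is a minimal illustration, not the phoenix-evals implementation: `FakeLLM` and `SketchEvaluator` are hypothetical stand-ins, and the dictionary-based input and return types are assumptions made to keep the example self-contained.

```python
from typing import Any, Dict, List


class FakeLLM:
    """Hypothetical stand-in for an LLM client; it only echoes what it receives."""

    def generate_classification(self, prompt: str, **kwargs: Any) -> Dict[str, Any]:
        # A real client would send kwargs along as invocation parameters.
        return {"label": "ok", "invocation_parameters": dict(kwargs)}


class SketchEvaluator:
    """Illustrative only; not the phoenix-evals Evaluator class."""

    def __init__(self, name: str, llm: FakeLLM) -> None:
        self.name = name
        self.llm = llm

    def evaluate(self, eval_input: Dict[str, Any], **kwargs: Any) -> List[Dict[str, Any]]:
        # Public entry point: forward kwargs unchanged to the private hook.
        return self._evaluate(eval_input, **kwargs)

    def _evaluate(self, eval_input: Dict[str, Any], **kwargs: Any) -> List[Dict[str, Any]]:
        prompt = f"Classify: {eval_input['text']}"
        # The kwargs end up at the LLM call.
        return [self.llm.generate_classification(prompt, **kwargs)]


scores = SketchEvaluator("toxicity", FakeLLM()).evaluate({"text": "hello"}, temperature=0.0)
```

In this sketch, `temperature=0.0` passed to `evaluate` arrives at `generate_classification` without either method naming it explicitly.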

Written by Cursor Bugbot for commit 3d1dd97. This will update automatically on new commits.

cursor · 2025-10-03T19:14:00Z

packages/phoenix-evals/src/phoenix/evals/evaluators.py

                    self._docstring = original_docstring

-                def _evaluate(self, eval_input: EvalInput) -> List[Score]:
+                def _evaluate(self, eval_input: EvalInput, **kwargs: Any) -> List[Score]:


Bug: Decorator Ignores Function Parameters

The _evaluate and _async_evaluate methods generated by the create_evaluator decorator accept **kwargs but don't forward them to the underlying user-defined function. This means any kwargs provided to function-based evaluators are silently ignored, which can cause unexpected behavior for users expecting their functions to receive these parameters.
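
One possible way to address this is sketched below. It assumes the generated `_evaluate` should hand the user function only the kwargs it declares, unless the function itself takes `**kwargs`. `make_evaluate` and `length_score` are hypothetical names for illustration and are not part of phoenix-evals; the actual decorator may need a different policy.

```python
import inspect
from typing import Any, Callable, Dict


def make_evaluate(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical helper: build an _evaluate-style wrapper that forwards kwargs to fn."""
    params = inspect.signature(fn).parameters
    accepts_var_kwargs = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )

    def _evaluate(eval_input: Dict[str, Any], **kwargs: Any) -> Any:
        if accepts_var_kwargs:
            extra = dict(kwargs)
        else:
            # Keep only the kwargs that fn declares; the rest are dropped.
            extra = {k: v for k, v in kwargs.items() if k in params}
        # Values from eval_input win if a key appears in both mappings.
        call_args = {**extra, **eval_input}
        return fn(**call_args)

    return _evaluate


def length_score(text: str, max_len: int = 10) -> float:
    """Toy user-defined evaluator function."""
    return min(len(text), max_len) / max_len


evaluate = make_evaluate(length_score)
result = evaluate({"text": "hello"}, max_len=5, temperature=0.0)
```

Here `max_len=5` reaches `length_score`, while `temperature` is filtered out because the function does not declare it. Dropping unknown kwargs is a design choice in this sketch; raising an error instead would make mistakes more visible.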

graphite-app · 2025-10-03T19:14:24Z

packages/phoenix-evals/src/phoenix/evals/evaluators.py

+        task_kwargs: Dict[str, Any] = (
+            eval_kwargs.get(evaluator.name, {}) if eval_kwargs else {}
+        )
+        task_kwargs.update(kwargs)


The order of merging kwargs needs to be reversed to match the documented behavior. Currently, general kwargs will override evaluator-specific kwargs, but the docstring states that eval_kwargs should take precedence.

To fix this, change:

    task_kwargs: Dict[str, Any] = (
        eval_kwargs.get(evaluator.name, {}) if eval_kwargs else {}
    )
    task_kwargs.update(kwargs)

to:

    task_kwargs = kwargs.copy()
    task_kwargs.update(eval_kwargs.get(evaluator.name, {}) if eval_kwargs else {})

This ensures that evaluator-specific settings from eval_kwargs will override the general kwargs as intended.
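
The merge order can be checked in isolation with a small sketch. `merge_task_kwargs` is a hypothetical helper, not a function in the PR. It starts from a copy of the general kwargs and layers the evaluator-specific entry on top, which also avoids mutating the caller's `eval_kwargs` (the original snippet calls `update` on the dictionary stored inside `eval_kwargs` whenever an entry for that evaluator exists).

```python
from typing import Any, Dict, Mapping, Optional


def merge_task_kwargs(
    evaluator_name: str,
    eval_kwargs: Optional[Mapping[str, Dict[str, Any]]],
    global_kwargs: Dict[str, Any],
) -> Dict[str, Any]:
    """Hypothetical helper: evaluator-specific kwargs override general kwargs."""
    merged = dict(global_kwargs)  # copy, so the caller's dict is left untouched
    if eval_kwargs:
        merged.update(eval_kwargs.get(evaluator_name, {}))
    return merged


per_evaluator = {"toxicity": {"temperature": 0.0}}
merged = merge_task_kwargs(
    "toxicity", per_evaluator, {"temperature": 0.7, "max_tokens": 64}
)
```

With these inputs, `merged` keeps `max_tokens` from the general kwargs and takes `temperature` from the per-evaluator entry, and `per_evaluator` is left unchanged.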

Suggested change

-        task_kwargs: Dict[str, Any] = (
-            eval_kwargs.get(evaluator.name, {}) if eval_kwargs else {}
-        )
-        task_kwargs.update(kwargs)
+        task_kwargs: Dict[str, Any] = kwargs.copy()
+        task_kwargs.update(
+            eval_kwargs.get(evaluator.name, {}) if eval_kwargs else {}
+        )

Spotted by Diamond


feat: Plumb kwargs through to evaluate and evaluate_dataframe

3d1dd97

anticorrelator requested a review from a team as a code owner October 3, 2025 19:12

github-project-automation bot moved this to 📘 Todo in phoenix Oct 3, 2025

github-project-automation bot added this to phoenix Oct 3, 2025

dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Oct 3, 2025

cursor bot reviewed Oct 3, 2025


graphite-app bot reviewed Oct 3, 2025

