@anticorrelator anticorrelator commented Oct 3, 2025

  • allows passing arbitrary kwargs into an evaluator's evaluate and async_evaluate methods
  • when appropriate, these are passed on to LLM.generate_classification as invocation parameters
  • per-evaluator kwargs can be passed into evaluate_dataframe with eval_kwargs, a mapping from evaluator names to kwarg dictionaries
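
The passthrough described in the bullets above can be sketched with toy stand-ins. This is a minimal illustration, not the Phoenix implementation: FakeLLM and ToyClassificationEvaluator are hypothetical classes, and only the method names evaluate, _evaluate, and generate_classification are taken from this PR.

```python
from typing import Any, Dict, List


class FakeLLM:
    """Hypothetical stand-in for an LLM wrapper; it records the kwargs it receives."""

    def __init__(self) -> None:
        self.last_invocation_params: Dict[str, Any] = {}

    def generate_classification(self, prompt: str, labels: List[str], **kwargs: Any) -> str:
        # Extra keyword arguments are treated as invocation parameters.
        self.last_invocation_params = dict(kwargs)
        # A real model would choose a label; this fake always returns the first one.
        return labels[0]


class ToyClassificationEvaluator:
    """Toy evaluator (assumption, not the Phoenix class) that forwards **kwargs."""

    def __init__(self, name: str, llm: FakeLLM, labels: List[str]) -> None:
        self.name = name
        self.llm = llm
        self.labels = labels

    def evaluate(self, eval_input: Dict[str, Any], **kwargs: Any) -> str:
        # Public entry point: hand the kwargs to the internal method unchanged.
        return self._evaluate(eval_input, **kwargs)

    def _evaluate(self, eval_input: Dict[str, Any], **kwargs: Any) -> str:
        prompt = f"Classify: {eval_input['text']}"
        return self.llm.generate_classification(prompt, self.labels, **kwargs)


llm = FakeLLM()
evaluator = ToyClassificationEvaluator("toxicity", llm, ["safe", "toxic"])
label = evaluator.evaluate({"text": "hello"}, temperature=0.0)
```

In this sketch the temperature argument given to evaluate ends up in llm.last_invocation_params without the evaluator having to know about it.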

Note

Adds **kwargs passthrough to evaluator evaluate/async methods and dataframe evaluators (with per-evaluator eval_kwargs), forwarding them to LLM classification calls.

  • Evaluators:
    • Evaluator.evaluate/async_evaluate now accept **kwargs and forward to _evaluate/_async_evaluate (including thread-wrapper path).
    • _evaluate/_async_evaluate signatures updated across base, LLM, and decorator-generated evaluators to accept **kwargs.
  • LLM Evaluation:
    • ClassificationEvaluator forwards **kwargs to LLM.generate_classification/async_generate_classification.
  • DataFrame Runners:
    • evaluate_dataframe and async_evaluate_dataframe accept eval_kwargs (per-evaluator) and **kwargs (global); merged and passed to each evaluator during task execution.

Written by Cursor Bugbot for commit 3d1dd97.

@anticorrelator anticorrelator requested a review from a team as a code owner October 3, 2025 19:12
@github-project-automation github-project-automation bot moved this to 📘 Todo in phoenix Oct 3, 2025
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Oct 3, 2025
  self._docstring = original_docstring

- def _evaluate(self, eval_input: EvalInput) -> List[Score]:
+ def _evaluate(self, eval_input: EvalInput, **kwargs: Any) -> List[Score]:

Bug: Decorator Ignores Function Parameters

The _evaluate and _async_evaluate methods generated by the create_evaluator decorator accept **kwargs but don't forward them to the underlying user-defined function. This means any kwargs provided to function-based evaluators are silently ignored, which can cause unexpected behavior for users expecting their functions to receive these parameters.
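
One possible way to address this can be sketched as follows. It is an assumption about a reasonable fix, not the change made in this PR: make_toy_evaluator is a hypothetical stand-in for the wrapper that create_evaluator generates, and it forwards only the keyword arguments the wrapped function can accept, so unrelated kwargs (for example LLM invocation parameters) are dropped rather than raising a TypeError.

```python
import inspect
from typing import Any, Callable, Dict


def make_toy_evaluator(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical wrapper: forward the kwargs that the wrapped function accepts."""
    params = inspect.signature(fn).parameters
    accepts_var_kwargs = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )

    def _evaluate(eval_input: Dict[str, Any], **kwargs: Any) -> Any:
        # Keep a kwarg only if it does not collide with an input field and the
        # function either declares it by name or takes **kwargs itself.
        extra = {
            key: value
            for key, value in kwargs.items()
            if key not in eval_input and (accepts_var_kwargs or key in params)
        }
        return fn(**eval_input, **extra)

    return _evaluate


def exact_match(output: str, expected: str, case_sensitive: bool = True) -> bool:
    """Example user-defined evaluation function with one optional parameter."""
    if not case_sensitive:
        output, expected = output.lower(), expected.lower()
    return output == expected


evaluate_exact_match = make_toy_evaluator(exact_match)
strict = evaluate_exact_match({"output": "Paris", "expected": "paris"})
relaxed = evaluate_exact_match(
    {"output": "Paris", "expected": "paris"},
    case_sensitive=False,  # reaches exact_match
    temperature=0.0,       # not a parameter of exact_match, so it is dropped
)
```

Filtering by signature is a design choice. Forwarding every kwarg unconditionally would be simpler, but it would fail for functions that do not declare the extra parameters.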


Comment on lines +1312 to +1315
task_kwargs: Dict[str, Any] = (
    eval_kwargs.get(evaluator.name, {}) if eval_kwargs else {}
)
task_kwargs.update(kwargs)

The order of merging kwargs needs to be reversed to match the documented behavior. Currently, general kwargs will override evaluator-specific kwargs, but the docstring states that eval_kwargs should take precedence.

To fix this, change:

task_kwargs: Dict[str, Any] = (
    eval_kwargs.get(evaluator.name, {}) if eval_kwargs else {}
)
task_kwargs.update(kwargs)

to:

task_kwargs = kwargs.copy()
task_kwargs.update(eval_kwargs.get(evaluator.name, {}) if eval_kwargs else {})

This ensures that evaluator-specific settings from eval_kwargs will override the general kwargs as intended.
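
The precedence rule can be checked in isolation with a small helper. This is a sketch under assumptions: merge_task_kwargs is a hypothetical function name, and the evaluator name is passed as a plain string instead of being read from evaluator.name.

```python
from typing import Any, Dict, Mapping, Optional


def merge_task_kwargs(
    evaluator_name: str,
    eval_kwargs: Optional[Mapping[str, Dict[str, Any]]],
    global_kwargs: Dict[str, Any],
) -> Dict[str, Any]:
    """Start from the global kwargs, then let per-evaluator kwargs override them."""
    merged = dict(global_kwargs)  # copy, so the caller's dict is never mutated
    if eval_kwargs:
        merged.update(eval_kwargs.get(evaluator_name, {}))
    return merged


per_evaluator = {"toxicity": {"temperature": 0.0}}
global_settings = {"temperature": 0.7, "max_tokens": 64}

toxicity_kwargs = merge_task_kwargs("toxicity", per_evaluator, global_settings)
other_kwargs = merge_task_kwargs("relevance", per_evaluator, global_settings)
```

Here toxicity_kwargs keeps the per-evaluator temperature of 0.0 and inherits max_tokens, while other_kwargs is just a copy of the global settings. Copying first also avoids a side effect of the original snippet: eval_kwargs.get(evaluator.name, {}) returns the caller's own dictionary when the key exists, so calling update(kwargs) on it modifies eval_kwargs in place.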

Suggested change
- task_kwargs: Dict[str, Any] = (
-     eval_kwargs.get(evaluator.name, {}) if eval_kwargs else {}
- )
- task_kwargs.update(kwargs)
+ task_kwargs: Dict[str, Any] = kwargs.copy()
+ task_kwargs.update(
+     eval_kwargs.get(evaluator.name, {}) if eval_kwargs else {}
+ )

Spotted by Diamond
