
[Feature] Give evaluators access to all columns in the test set #1395

Closed
2 tasks done
Tracked by #1393
mmabrouk opened this issue Feb 19, 2024 · 2 comments

mmabrouk commented Feb 19, 2024

Context:

In entity recognition tasks, the user needs to evaluate multiple outputs. For instance, say the task is to extract the author and the date from a PDF. The user would create an LLM application that returns a JSON object with an author and a date field.
To evaluate this application, they would like to create three evaluators: one that evaluates the precision of the whole prediction (the fraction of fields that have been predicted correctly), another that evaluates the prediction for author, and a last one for date.

The challenge arises when the test set contains the correct answers in two different columns: one for author and one for date.

The user would want to create an evaluator by specifying the field in the LLM output and the column in the test set holding the correct answer. Currently this is not possible, since the evaluator worker only has access to the correct_answer column.

This issue is not only relevant for entity recognition tasks, but also for other evaluators such as RAGAS, which require a context column in addition to the ground truth. For these cases, we need to provide the evaluator function with a custom list of columns.
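To make the requirement concrete, here is a hedged sketch of what such an evaluator interface could look like: instead of a single correct_answer string, the evaluator receives the full test-set row as a dict and picks out the columns it needs. All names here (the function, the settings keys) are hypothetical, not part of the current codebase.

```python
from typing import Any, Dict


def rag_style_evaluator(
    app_output: str,
    data_point: Dict[str, Any],  # the full test-set row, not just correct_answer
    settings: Dict[str, Any],
) -> float:
    """Hypothetical evaluator that needs several test-set columns."""
    ground_truth = data_point.get(settings.get("ground_truth_key", "correct_answer"))
    context = data_point.get(settings.get("context_key", "context"), "")
    # Toy scoring: 1.0 if the output matches the ground truth and a context exists.
    return 1.0 if app_output == ground_truth and context else 0.0
```

Because the whole row is passed in, the same signature serves both the entity-recognition case (several correct-answer columns) and the RAGAS case (ground truth plus context).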

After finishing this issue, we will be able to:

Description:

Design a solution to enable the evaluators (in agenta-backend/agenta_backend/resources/evaluators/evaluators.py) to access any column in the test set.


aakrem commented Apr 29, 2024

Suggested solution:

What we can do here is

1. Testsets

Have the user specify the correct-answer columns in the test set from the UI.
A first benefit: no schema limitation.

from typing import Dict, List

from beanie import Document, Link

class TestSetDB(Document):
    name: str
    app: Link[AppDB]
    csvdata: List[Dict[str, str]]
    ground_truth_columns: List[str]  # new field
    # ...
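Under this schema, a csvdata row is a plain dict, so pulling out the user-labeled ground-truth values is a simple lookup. A minimal sketch (the helper name is hypothetical, not part of the codebase):

```python
from typing import Dict, List


def extract_ground_truths(
    row: Dict[str, str], ground_truth_columns: List[str]
) -> Dict[str, str]:
    """Pick only the labeled ground-truth columns out of a test-set row."""
    return {col: row[col] for col in ground_truth_columns if col in row}
```

For example, with a row `{"author": "Ada", "date": "1843", "pdf_url": "..."}` and `ground_truth_columns=["author", "date"]`, the helper returns only the author and date entries.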

2. Evaluators

The user will specify the evaluator-config settings for each ground-truth column,
and we would need to create an evaluator config for each ground-truth column.


Again, a benefit here: no schema limitation.

from typing import Any, Dict

from beanie import Document, Link
from pydantic import Field

class EvaluatorConfigDB(Document):
    app: Link[AppDB]
    user: Link[UserDB]
    name: str
    evaluator_key: str
    ground_truth_column: str  # new field
    settings_values: Dict[str, Any] = Field(default_factory=dict)

3. Evaluation

Adjust the code here to pass the correct answer from the evaluator config
https://github.com/Agenta-AI/agenta/blob/main/agenta-backend/agenta_backend/tasks/evaluations.py#L220
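The change at that call site could look roughly like this (a sketch against a simplified config dict, not the actual evaluations.py code): look up the configured column in the data point instead of hard-coding correct_answer.

```python
from typing import Any, Dict


def resolve_correct_answer(
    data_point: Dict[str, Any], evaluator_config: Dict[str, Any]
) -> Any:
    """Fetch the correct answer using the column name stored on the evaluator config.

    Falls back to the legacy 'correct_answer' column for backwards compatibility
    with configs created before the new field existed.
    """
    column = evaluator_config.get("ground_truth_column", "correct_answer")
    return data_point.get(column)
```

Keeping the legacy fallback means existing evaluator configs keep working without a migration.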

4. Evaluation Results

The results will be shown in a packed (combined) view.


Maybe we can improve point 2, but I can't find a better solution there.
Or maybe it's fine as it is.

mmabrouk (Member, Author) commented:

@aakrem, RAG evaluators don't have a single correct-answer column, but multiple columns. So in the end there is no single correct-answer column that the user could label in the test set.
The same applies to point 2: the RAG evaluator does not have one ground_truth column but several.

I think you are trying to keep the logic in the evaluators as is while changing everything around it. I don't think that can work. I think the solution is to give the evaluators access to all the columns in the test set, not only one correct_answer value/column.
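Passing the whole row also makes the multi-field precision case from the issue description straightforward. A hedged sketch of such a field-precision evaluator (all names are hypothetical):

```python
import json
from typing import Dict, List


def field_precision(
    app_output: str, data_point: Dict[str, str], fields: List[str]
) -> float:
    """Fraction of the requested fields the LLM predicted correctly.

    Compares each field of the JSON output against the same-named
    test-set column, so no single correct_answer column is needed.
    """
    try:
        prediction = json.loads(app_output)
    except json.JSONDecodeError:
        return 0.0
    correct = sum(1 for f in fields if prediction.get(f) == data_point.get(f))
    return correct / len(fields) if fields else 0.0
```

With `fields=["author"]` or `fields=["date"]` the same function doubles as the per-field evaluator, which is exactly the three-evaluator setup described in the issue.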
