In entity recognition tasks, the user needs to evaluate multiple outputs. For instance, say the task is to extract an author and a date from a PDF. The user would create an LLM application that returns a JSON object with an author field and a date field.
To evaluate this application, they would like to create three evaluators: one that evaluates the precision of the whole prediction (the fraction of fields that have been predicted correctly), one that evaluates the prediction for the author, and a last one for the date.
The challenge arises when the test set contains the correct answers in two different columns: one for the author and one for the date.
The user would want to create an evaluator by specifying the field in the LLM output and the column in the test set that holds the correct answer. Currently this is not possible, since the evaluator worker only has access to the correct_answer column.
This issue is not only relevant for entity recognition tasks, but also for other evaluators such as RAGAS, which require a context column in addition to the ground truth. For these cases, we need to provide the evaluator function with a custom list of columns.
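To make the limitation concrete, here is a minimal sketch of the scenario (column names, field names, and function names are hypothetical, not the actual Agenta schema):

```python
# Hypothetical test-set row: the ground truth is spread over several columns.
test_set_row = {
    "pdf_text": "...the extracted text of the PDF...",  # input to the LLM application
    "author": "Jane Doe",                               # ground truth for the author field
    "date": "2023-05-12",                                # ground truth for the date field
}

# Output of the LLM application: a JSON object with one field per entity.
app_output = {"author": "Jane Doe", "date": "2023-06-12"}

# The "whole prediction" evaluator needs *both* ground-truth columns:
def field_precision(output: dict, row: dict, fields=("author", "date")) -> float:
    """Fraction of the fields that were predicted correctly."""
    correct = sum(output.get(f) == row.get(f) for f in fields)
    return correct / len(fields)

# The per-field evaluators each need a *different* column:
def exact_match(output: dict, row: dict, field: str) -> float:
    return float(output.get(field) == row.get(field))

# Today the evaluator worker only receives row["correct_answer"], so none of
# the three evaluators above can be expressed against the current interface.
```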
1. The user will specify the evaluator config settings for each ground-truth column (as sketched below),
2. and we would need to create an evaluator config for each ground-truth column.
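A rough sketch of what such per-column evaluator configs could look like (the keys below are assumptions for illustration, not the existing evaluator-config schema):

```python
# One evaluator config per ground-truth column (hypothetical settings keys).
evaluator_configs = [
    {
        "name": "author_match",
        "evaluator_key": "auto_exact_match",
        "settings_values": {
            "ground_truth_column": "author",   # column in the test set
            "output_field": "author",          # field in the LLM output JSON
        },
    },
    {
        "name": "date_match",
        "evaluator_key": "auto_exact_match",
        "settings_values": {
            "ground_truth_column": "date",
            "output_field": "date",
        },
    },
]
```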
@aakrem, the RAG evaluators don't have one correct-answer column but multiple columns. So in the end there is no single correct_answer column that the user would label in the test set.
Same for 2: the RAG evaluator does not have one ground_truth column but multiple.
I think you are trying to keep the logic in the evaluators as it is while changing the rest. I don't think that can work. I think the solution is to give the evaluators access to all the columns in the test set, not only one correct_answer value/column.
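A minimal sketch of what that change could look like for the evaluator functions (the signatures below are assumptions about a possible design, not the current agenta-backend code):

```python
# Current shape (simplified): the worker hands each evaluator a single string.
def evaluate_old(app_output: str, correct_answer: str, settings: dict) -> float:
    return float(app_output == correct_answer)

# Proposed shape: the evaluator receives the full test-set row, and the settings
# say which column(s) and which output field it should compare.
def evaluate_new(app_output: dict, data_point: dict, settings: dict) -> float:
    column = settings["ground_truth_column"]    # e.g. "author", "date", "context"
    field = settings.get("output_field")        # e.g. "author" in the output JSON
    predicted = app_output.get(field) if field else app_output
    return float(predicted == data_point[column])
```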
After finishing this issue, we will be able to create evaluators that reference any column in the test set (for example, separate ground-truth columns for the author and the date, or a context column for RAGAS).
Description:
Design a solution to enable the evaluators (in agenta-backend/agenta_backend/resources/evaluators/evaluators.py) to access any column in the test set.
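One possible shape for the solution, sketched below: instead of extracting a single correct_answer value, the evaluation worker passes the whole test-set row to each evaluator and lets the evaluator's settings decide which columns to read. All function, registry, and parameter names here are assumptions for illustration, not the actual agenta-backend code:

```python
from typing import Callable, Dict

# Hypothetical registry of evaluator functions that all accept the full row.
EVALUATORS: Dict[str, Callable[[dict, dict, dict], float]] = {}

def run_evaluator(app_output: dict, data_point: dict, evaluator_config: dict) -> float:
    """Worker side: forward the entire test-set row, not just row['correct_answer']."""
    evaluate = EVALUATORS[evaluator_config["evaluator_key"]]
    return evaluate(app_output, data_point, evaluator_config["settings_values"])

# An evaluator that needs several columns (e.g. a RAGAS-style metric) can then
# pick whatever it needs out of the row:
def ragas_style_evaluator(app_output: dict, data_point: dict, settings: dict) -> float:
    ground_truth = data_point[settings["ground_truth_column"]]
    context = data_point[settings["context_column"]]
    # ...compute the metric from (app_output, ground_truth, context)...
    raise NotImplementedError
```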