docs/concepts/metrics/available_metrics/general_purpose.md
General purpose evaluation metrics are used to evaluate any given task.

`AspectCritic` is an evaluation metric that can be used to evaluate responses based on predefined aspects in free form natural language. The output of aspect critiques is binary, indicating whether the submission aligns with the defined aspect or not.

**Without reference**

### Example
```python
from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import AspectCritic

sample = SingleTurnSample(
    user_input="Where is the Eiffel Tower located?",
    response="The Eiffel Tower is located in Paris.",
)

scorer = AspectCritic(
    ...,  # name and definition omitted in this excerpt; see the "With reference" example below
)
scorer.llm = openai_model
await scorer.single_turn_ascore(sample)
```

**With reference**

### Example

```python
from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import AspectCriticWithReference

sample = SingleTurnSample(
    user_input="Where is the Eiffel Tower located?",
    response="The Eiffel Tower is located in Paris.",
    reference="The Eiffel Tower is located in Paris.",
)

scorer = AspectCriticWithReference(
    name="correctness",
    definition="Is the response factually similar to the reference?",
)
scorer.llm = openai_model
await scorer.single_turn_ascore(sample)
```
### How it works

Critics are essentially basic LLM calls using the defined criteria. For example, let's see how the harmfulness critic works:

- Step 1: The definition of the critic prompts the LLM (once per self consistency check) to verify whether the response contains anything harmful, collecting one verdict per call.
- Step 2: The majority vote from the returned verdicts determines the binary output (see the sketch below).
- Output: Yes
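To make the voting step concrete, here is a minimal sketch (not the actual ragas implementation) of reducing the verdicts from `strictness` independent LLM calls to one binary result; the helper name `majority_vote` is illustrative only.

```python
# Illustrative only: reduce `strictness` independent binary verdicts to a
# single result by majority vote.
def majority_vote(verdicts: list[int]) -> int:
    """Return 1 if more than half of the verdicts are 1, else 0."""
    return 1 if sum(verdicts) > len(verdicts) / 2 else 0

# e.g. with strictness=3 the critic might collect verdicts [1, 0, 1] -> "Yes"
assert majority_vote([1, 0, 1]) == 1
assert majority_vote([0, 0, 1]) == 0
```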
## Simple Criteria Scoring

Coarse-grained evaluation is an evaluation metric that can be used to score responses with an integer, based on a single predefined free-form scoring criterion. The output of coarse-grained evaluation is an integer score within the range specified in the criteria.
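As a rough sketch of what this looks like in code, assuming a `SimpleCriteriaScore` metric that follows the same single-turn scoring interface as `AspectCritic` above (the metric name and definition string below are illustrative, and `openai_model` is the same LLM wrapper used in the earlier examples):

```python
from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import SimpleCriteriaScore  # assumed import path; adjust to your ragas version

sample = SingleTurnSample(
    user_input="Where is the Eiffel Tower located?",
    response="The Eiffel Tower is located in Paris.",
    reference="The Eiffel Tower is located in Egypt.",
)

# Ask for an integer score in the 0-5 range defined by the criterion.
scorer = SimpleCriteriaScore(
    name="coarse_grained_score",
    definition="Score 0 to 5 by similarity between the response and the reference.",
)
scorer.llm = openai_model
await scorer.single_turn_ascore(sample)
```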
instruction="Given an input, response, and reference. Evaluate the submission only using the given criteria. Use only 'Yes' (1) and 'No' (0) as verdict."
227
+
input_model=AspectCriticInputWithReference
228
+
output_model=AspectCriticOutputWithReference
229
+
examples= [
230
+
(
231
+
AspectCriticInputWithReference(
232
+
user_input="Who was the director of Los Alamos Laboratory?",
233
+
response="Einstein was the director of Los Alamos Laboratory.",
234
+
reference="J. Robert Oppenheimer was the director of Los Alamos Laboratory.",
235
+
criteria="Is the output written in perfect grammar",
236
+
),
237
+
AspectCriticOutputWithReference(
238
+
reason="The criteria for evaluation is whether the output is written in perfect grammar. In this case, the output is grammatically correct.",
239
+
verdict=1,
240
+
),
241
+
)
242
+
]
243
+
244
+
245
+
@dataclass
246
+
classAspectCriticWithReference(AspectCritic):
247
+
"""
248
+
AspectCriticWithReference judges the submission to give binary results using the criteria specified
249
+
It uses user_input, response and reference to evaluate the submission.
250
+
251
+
Attributes
252
+
----------
253
+
name: str
254
+
name of the metrics
255
+
definition: str
256
+
criteria to judge the submission, example "Is the submission spreading
257
+
fake information?"
258
+
strictness: int
259
+
The number of times self consistency checks is made. Final judgement is
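For orientation, here is a minimal usage sketch of this class, reusing the `sample` and `openai_model` objects from the documentation examples above; `strictness=3` is only an illustration of the self consistency attribute described in the docstring.

```python
# Sketch only: strictness controls how many self consistency LLM calls are
# made; the final verdict is the majority vote over those calls.
scorer = AspectCriticWithReference(
    name="correctness",
    definition="Is the response factually similar to the reference?",
    strictness=3,
)
scorer.llm = openai_model
await scorer.single_turn_ascore(sample)
```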