
[Bug]: JSON format errors when using built-in metrics: Hallucination and Answer Relevance #506

Closed
SrBliss opened this issue Oct 30, 2024 · 2 comments
Labels: bug (Something isn't working), work in progress

Comments

SrBliss commented Oct 30, 2024

Willingness to contribute

No. I can't contribute a fix for this bug at this time.

What component(s) are affected?

  • Python SDK
  • Opik UI
  • Opik Server
  • Documentation

Opik version

  • Opik version: 1.0.2

Describe the problem

When calling evaluate with the built-in metrics Hallucination or AnswerRelevance, I get a JSON format error and the evaluation fails.

Reproduction steps

Snippet:

from opik.evaluation import evaluate
from opik.evaluation.metrics import Hallucination, AnswerRelevance

# dataset, evaluation_task and system_prompts are defined elsewhere in the notebook

# Define the metrics
hallucination_metric = Hallucination(name="Hallucination")
answerrelevance_metric = AnswerRelevance(name="AnswerRelevance")

SWEEP_ID = "03"

for i, prompt in enumerate(system_prompts):
    SYSTEM_PROMPT = prompt
    experiment_config = {"system_prompt": SYSTEM_PROMPT, "model": "gpt-3.5-turbo"}
    experiment_name = f"comet-chatbot-{SWEEP_ID}-{i}"

    res = evaluate(
        experiment_name=experiment_name,
        dataset=dataset,
        experiment_config=experiment_config,
        task=evaluation_task,
        scoring_metrics=[hallucination_metric,
                         answerrelevance_metric]
    )

Error:

Evaluation:   0%|          | 0/5 [00:00<?, ?it/s]OPIK: Failed to compute metric Hallucination. Score result will be marked as failed.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/opik/evaluation/metrics/llm_judges/hallucination/metric.py", line 121, in _parse_model_output
    dict_content = json.loads(content)
  File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/opik/evaluation/tasks_scorer.py", line 29, in _score_test_case
    result = metric.score(**score_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/opik/evaluation/metrics/llm_judges/hallucination/metric.py", line 87, in score
    return self._parse_model_output(model_output)
  File "/usr/local/lib/python3.10/dist-packages/opik/evaluation/metrics/llm_judges/hallucination/metric.py", line 130, in _parse_model_output
    raise exceptions.MetricComputationError(
opik.evaluation.metrics.exceptions.MetricComputationError: Failed hallucination detection
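
For context, a minimal sketch of the failure mode (the model_output value is hypothetical, and this is not Opik's actual parsing code): when the judge model wraps its answer in prose or a markdown fence, json.loads on the raw completion raises exactly this error.

import json

# Hypothetical completion: the JSON is wrapped in a markdown fence
model_output = '```json\n{"score": 0.0, "reason": "No hallucination found."}\n```'

try:
    json.loads(model_output)  # the leading backtick is not a valid JSON value
except json.JSONDecodeError as err:
    print(f"Parse failed: {err}")  # Expecting value: line 1 column 1 (char 0)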
SrBliss added the bug label Oct 30, 2024
Collaborator

jverre commented Oct 30, 2024

@SrBliss This is because the LLM does not consistently return valid JSON. I've opened a PR to add support for structured outputs: #506
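
For reference, enforcing structured outputs typically looks like the sketch below. This is an illustrative example against the OpenAI client (model name and prompts are placeholders), not the code from the PR.

import json
from openai import OpenAI

client = OpenAI()

# JSON mode guarantees the completion parses as JSON, avoiding the
# json.JSONDecodeError seen in the traceback above.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Reply with a JSON object containing 'score' and 'reason'."},
        {"role": "user", "content": "Judge whether the answer below is hallucinated: ..."},
    ],
)

result = json.loads(response.choices[0].message.content)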

Collaborator

jverre commented Nov 3, 2024

We are now enforcing structured outputs in our evaluation metrics, so you shouldn't face this issue anymore.

@jverre jverre closed this as completed Nov 3, 2024