From f4641494eeef97422def91538277dac4cded1380 Mon Sep 17 00:00:00 2001
From: Ian Webster
Date: Sat, 16 Dec 2023 11:20:36 -0800
Subject: [PATCH] chore: update readme with new assertions

---
 README.md                                   | 60 ++++++++++---------
 .../configuration/expected-outputs/index.md |  6 +-
 2 files changed, 34 insertions(+), 32 deletions(-)

diff --git a/README.md b/README.md
index ac0bee2c923..1141422d6f7 100644
--- a/README.md
+++ b/README.md
@@ -104,38 +104,40 @@ See [Test assertions](https://promptfoo.dev/docs/configuration/expected-outputs)
 
 Deterministic eval metrics
 
-| Assertion Type  | Returns true if...                                             |
-| --------------- | -------------------------------------------------------------- |
-| `equals`        | output matches exactly                                         |
-| `contains`      | output contains substring                                      |
-| `icontains`     | output contains substring, case insensitive                    |
-| `regex`         | output matches regex                                           |
-| `starts-with`   | output starts with string                                      |
-| `contains-any`  | output contains any of the listed substrings                   |
-| `contains-all`  | output contains all list of substrings                         |
-| `icontains-any` | output contains any of the listed substrings, case insensitive |
-| `icontains-all` | output contains all list of substrings, case insensitive       |
-| `is-json`       | output is valid json (optional json schema validation)         |
-| `contains-json` | output contains valid json (optional json schema validation)   |
-| `javascript`    | provided Javascript function validates the output              |
-| `python`        | provided Python function validates the output                  |
-| `webhook`       | provided webhook returns `{pass: true}`                        |
-| `rouge-n`       | Rouge-N score is above a given threshold                       |
-| `levenshtein`   | Levenshtein distance is below a threshold                      |
+| Assertion Type                  | Returns true if...                                               |
+| ------------------------------- | ---------------------------------------------------------------- |
+| `equals`                        | output matches exactly                                           |
+| `contains`                      | output contains substring                                        |
+| `icontains`                     | output contains substring, case insensitive                      |
+| `regex`                         | output matches regex                                             |
+| `starts-with`                   | output starts with string                                        |
+| `contains-any`                  | output contains any of the listed substrings                     |
+| `contains-all`                  | output contains all of the listed substrings                     |
+| `icontains-any`                 | output contains any of the listed substrings, case insensitive   |
+| `icontains-all`                 | output contains all of the listed substrings, case insensitive   |
+| `is-json`                       | output is valid JSON (optional JSON schema validation)           |
+| `contains-json`                 | output contains valid JSON (optional JSON schema validation)     |
+| `javascript`                    | provided JavaScript function validates the output                |
+| `python`                        | provided Python function validates the output                    |
+| `webhook`                       | provided webhook returns `{pass: true}`                          |
+| `rouge-n`                       | Rouge-N score is above a given threshold                         |
+| `levenshtein`                   | Levenshtein distance is below a threshold                        |
+| `latency`                       | Latency is below a threshold (milliseconds)                      |
+| `is-valid-openai-function-call` | Ensure that the function call matches the function's JSON schema |
 
 Model-assisted eval metrics
 
-| Assertion Type          | Method                                                                                           |
-| ----------------------- | ------------------------------------------------------------------------------------------------ |
-| `similar`               | embeddings and cosine similarity are above a threshold                                           |
-| `classifer`             | Grade using a [classifer](https://promptfoo.dev/docs/configuration/expected-outputs/classifier/) |
-| `llm-rubric`            | LLM output matches a given rubric, using a Language Model to grade output                        |
-| `factuality`            | LLM output adheres to the given facts, using Factuality method from OpenAI eval                  |
-| `answer-relevance`      | Ensure that LLM output is related to original query                                              |
-| `context-recall`        | Ensure that ground truth appears in context                                                      |
-| `context-relevance`     | Ensure that context is relevant to original query                                                |
-| `context-faithfulness`  | Ensure that LLM output uses the context                                                          |
-| `model-graded-closedqa` | LLM output adheres to given criteria, using Closed QA method from OpenAI eval                    |
+| Assertion Type                                                                                  | Method                                                                          |
+| ----------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------- |
+| [similar](https://promptfoo.dev/docs/configuration/expected-outputs/similar)                    | Embeddings and cosine similarity are above a threshold                          |
+| [classifier](https://promptfoo.dev/docs/configuration/expected-outputs/classifier)              | Run LLM output through a classifier                                             |
+| [llm-rubric](https://promptfoo.dev/docs/configuration/expected-outputs/model-graded)            | LLM output matches a given rubric, using a Language Model to grade output       |
+| [answer-relevance](https://promptfoo.dev/docs/configuration/expected-outputs/model-graded)      | Ensure that LLM output is related to original query                             |
+| [context-faithfulness](https://promptfoo.dev/docs/configuration/expected-outputs/model-graded)  | Ensure that LLM output uses the context                                         |
+| [context-recall](https://promptfoo.dev/docs/configuration/expected-outputs/model-graded)        | Ensure that ground truth appears in context                                     |
+| [context-relevance](https://promptfoo.dev/docs/configuration/expected-outputs/model-graded)     | Ensure that context is relevant to original query                               |
+| [factuality](https://promptfoo.dev/docs/configuration/expected-outputs/model-graded)            | LLM output adheres to the given facts, using Factuality method from OpenAI eval |
+| [model-graded-closedqa](https://promptfoo.dev/docs/configuration/expected-outputs/model-graded) | LLM output adheres to given criteria, using Closed QA method from OpenAI eval   |
 
 Every test type can be negated by prepending `not-`. For example, `not-equals` or `not-regex`.
 
diff --git a/site/docs/configuration/expected-outputs/index.md b/site/docs/configuration/expected-outputs/index.md
index e1c1a7acd5e..de6c5ec18c2 100644
--- a/site/docs/configuration/expected-outputs/index.md
+++ b/site/docs/configuration/expected-outputs/index.md
@@ -67,15 +67,15 @@ Model-assisted eval metrics
 
 | Assertion Type                                                             | Method                                                                          |
 | -------------------------------------------------------------------------- | ------------------------------------------------------------------------------- |
-| [answer-relevance](/docs/configuration/expected-outputs/model-graded)      | Ensure that LLM output is related to original query                             |
+| [similar](/docs/configuration/expected-outputs/similar)                    | Embeddings and cosine similarity are above a threshold                          |
 | [classifier](/docs/configuration/expected-outputs/classifier)              | Run LLM output through a classifier                                             |
+| [llm-rubric](/docs/configuration/expected-outputs/model-graded)            | LLM output matches a given rubric, using a Language Model to grade output       |
+| [answer-relevance](/docs/configuration/expected-outputs/model-graded)      | Ensure that LLM output is related to original query                             |
 | [context-faithfulness](/docs/configuration/expected-outputs/model-graded)  | Ensure that LLM output uses the context                                         |
 | [context-recall](/docs/configuration/expected-outputs/model-graded)        | Ensure that ground truth appears in context                                     |
 | [context-relevance](/docs/configuration/expected-outputs/model-graded)     | Ensure that context is relevant to original query                               |
 | [factuality](/docs/configuration/expected-outputs/model-graded)            | LLM output adheres to the given facts, using Factuality method from OpenAI eval |
-| [llm-rubric](/docs/configuration/expected-outputs/model-graded)            | LLM output matches a given rubric, using a Language Model to grade output       |
 | [model-graded-closedqa](/docs/configuration/expected-outputs/model-graded) | LLM output adheres to given criteria, using Closed QA method from OpenAI eval   |
-| [similar](/docs/configuration/expected-outputs/similar)                    | embeddings and cosine similarity are above a threshold                          |
 
 :::tip
 Every test type can be negated by prepending `not-`. For example, `not-equals` or `not-regex`.
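
The `latency` assertion added by this patch slots into a promptfoo config like any other assertion type. A minimal sketch of how it might be used (the prompt, provider, topic variable, and threshold values here are illustrative, not taken from the patch):

```yaml
# promptfooconfig.yaml (illustrative example)
prompts:
  - 'Tell me a joke about {{topic}}'

providers:
  - openai:gpt-3.5-turbo

tests:
  - vars:
      topic: bananas
    assert:
      # New in this patch: fail if the response takes longer than 5000 ms
      - type: latency
        threshold: 5000
      # Deterministic check, case insensitive substring match
      - type: icontains
        value: banana
```

As with the table entries above, either assertion can be negated by prefixing `not-` to its `type`.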