4 changes: 4 additions & 0 deletions meta-evals/README.md
# Meta-Evaluations

This folder contains scripts and tests for evaluating LangChain's default evaluators.
17 changes: 17 additions & 0 deletions meta-evals/correctness/README.md
# Correctness Meta-Evals

This folder contains a test script to check the aggregate performance of the "correctness"-related evaluators.

To upload the dataset to LangSmith, run:

```bash
python meta-evals/correctness/_upload_dataset.py
```

To test, run:

```bash
pytest --capture=no meta-evals/correctness/test_correctness_evaluator.py
```

Then navigate to the Web Q&A dataset in LangSmith to review the results.
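
For reference, a single correctness check looks roughly like the following (a minimal sketch using `load_evaluator` from `langchain.evaluation`; the `"qa"` evaluator type and the example strings are illustrative, not taken from the test script):

```python
# Illustrative sketch of one correctness check; the actual test script
# aggregates results over the whole uploaded dataset.
from langchain.evaluation import load_evaluator

# "qa" is one of LangChain's correctness-style evaluator types.
evaluator = load_evaluator("qa")

result = evaluator.evaluate_strings(
    input="What is the capital of France?",
    prediction="Paris",
    reference="Paris",
)
print(result)  # e.g. {"reasoning": ..., "value": "CORRECT", "score": 1}
```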
30 changes: 30 additions & 0 deletions meta-evals/correctness/_upload_dataset.py
```python
import json
import logging
from pathlib import Path

from langsmith import Client

logging.basicConfig(level=logging.INFO)

# Synthetic dataset adapted from https://aclanthology.org/D13-1160/

_DATA_REPO = Path(__file__).parent / "data"
_CLIENT = Client()


def _upload_dataset(path: Path) -> None:
    with open(path, "r") as f:
        data = json.load(f)
    dataset_name = data["name"]
    examples = data["examples"]
    try:
        dataset = _CLIENT.create_dataset(dataset_name)
    except Exception as e:
        # Most likely the dataset already exists; skip rather than overwrite.
        logging.warning(f"Skipping {dataset_name}: {e}")
        return
    logging.info(f"Uploading dataset: {dataset_name}")
    for i, example in enumerate(examples):
        _CLIENT.create_example(
            example["inputs"],
            dataset_id=dataset.id,
            outputs=example["outputs"],
        )
        print(f"Uploaded {i + 1}/{len(examples)}", end="\r")


if __name__ == "__main__":
    for dataset_path in _DATA_REPO.glob("*.json"):
        _upload_dataset(dataset_path)
```
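
From the way the script reads each file, a dataset file under `data/` presumably looks like the sketch below. The top-level `name`/`examples` keys and the per-example `inputs`/`outputs` keys come straight from the script; the file name `web_qa.json` and the `question`/`answer` fields are hypothetical placeholders:

```python
# Hypothetical example of the dataset layout _upload_dataset() expects.
import json

sample = {
    "name": "Web Q&A",
    "examples": [
        {
            "inputs": {"question": "Who wrote 'War and Peace'?"},
            "outputs": {"answer": "Leo Tolstoy"},
        },
    ],
}

with open("meta-evals/correctness/data/web_qa.json", "w") as f:
    json.dump(sample, f, indent=2)
```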
