Releases: rdnfn/feedback-forensics
v0.4.7
What's Changed
- Various improvements by @rdnfn in #72
  - Fixes to testing (pin to Python 3.13)
  - New experimental notebooks and configurations (human etc.)
  - Fixes for plotting issues
  - Add new models to ff-model-personality config
- Update README with paper release information by @rdnfn in #71
Full Changelog: v0.4.6...v0.4.7
v0.4.6
What's Changed
- Add model personality comparison dataset to web app by @rdnfn in #69
- Add filter of "nan" and empty value comparisons (as these can distort results) by @rdnfn in #69
- Add new models to model comparison config (gpt-oss-20b, gpt-5) by @rdnfn in #69
- Fix issue with example links in app (new arena dataset name with year) by @rdnfn in #70
Full Changelog: v0.4.5...v0.4.6
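
The "nan" and empty-value filter mentioned above can be pictured with a minimal sketch. This is not the package's actual implementation: the function names `is_missing` and `drop_invalid_comparisons` and the record layout (dicts with a `preference` key) are assumptions made purely for illustration.

```python
import math


def is_missing(value):
    """Return True for None, float NaN, empty/whitespace strings, and the string "nan"."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, str) and value.strip().lower() in ("", "nan"):
        return True
    return False


def drop_invalid_comparisons(comparisons, key="preference"):
    """Keep only comparisons whose `key` field holds a usable value.

    `key` is a hypothetical field name, not one confirmed by the release notes.
    """
    return [c for c in comparisons if not is_missing(c.get(key))]


# Hypothetical example records
comparisons = [
    {"id": 1, "preference": "text_a"},
    {"id": 2, "preference": "nan"},
    {"id": 3, "preference": ""},
    {"id": 4, "preference": float("nan")},
    {"id": 5, "preference": "text_b"},
]
kept = drop_invalid_comparisons(comparisons)
```

Dropping such rows before computing metrics avoids counting a missing annotation as if it were a real preference, which is the distortion the release note refers to.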
v0.4.5
What's Changed
- Improve data loading by @rdnfn in #68
  - Breaking change: Due to changes in the underlying loading mechanism, `-d` no longer accepts the older ICAI results format of folders (rather than AP JSON files)
  - Add support for loading multiple datasets by providing a directory alone (via the `--dir` CLI argument)
  - Add support for specifying multiple datasets via `-d`/`--datapath` rather than just a single one
  - Add support for specifying dataset metadata (name, description, etc.) inside the AnnotatedPairs JSON, rather than separately inside the package
  - Add support for loading pre-annotated results from HuggingFace (without token) via the `--load-web-datasets` CLI flag (dataset link)
Full Changelog: v0.4.4...v0.4.5
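
As a rough illustration of what directory-based loading with in-file metadata could look like, here is a minimal sketch. It does not reproduce the package's loader: the helper names `find_datasets` and `read_dataset_name`, the `metadata` and `dataset_name` keys, and the fallback to the file name are all assumptions.

```python
import json
import tempfile
from pathlib import Path


def find_datasets(directory):
    """Return the .json files directly inside `directory`, sorted by path."""
    return sorted(Path(directory).glob("*.json"))


def read_dataset_name(path):
    """Read a display name from the file's own metadata.

    Assumes (hypothetically) a top-level "metadata" object with a
    "dataset_name" field; falls back to the file stem if it is absent.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data.get("metadata", {}).get("dataset_name", Path(path).stem)


# Demo with two throwaway files: one with metadata, one without
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.json").write_text(
        json.dumps({"metadata": {"dataset_name": "Dataset A"}}), encoding="utf-8"
    )
    (Path(tmp) / "b.json").write_text(json.dumps({"comparisons": []}), encoding="utf-8")
    names = [read_dataset_name(p) for p in find_datasets(tmp)]
```

Keeping the name and description inside each file means a directory of results is self-describing, so no separate registry inside the package needs updating when a dataset is added.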
v0.4.4
What's Changed
- Add Python CLI for model comparison
- Add GitHub action for automatic addition of new model to HuggingFace dataset
- Add advanced tutorials to docs
  - Correlation analysis
  - Confidence intervals
Full Changelog: v0.4.3...v0.4.4
v0.4.3
What's Changed
- Improvements to results presentation reducing clutter (by @rdnfn in #60)
  - Make metrics shown configurable via env var, and reduce default number of shown metrics
  - Hide prompt in example viewer if none available
- Improvement to docs:
Full Changelog: v0.4.2...v0.4.3