Releases: rdnfn/feedback-forensics

v0.4.7

20 Nov 16:02
94aec89

What's Changed

  • Various improvements by @rdnfn in #72
    • Fixes to testing (pin to Python 3.13)
    • New experimental notebooks and configurations (human etc.)
    • Fixing issues in plotting
    • Add new models to ff-model-personality config
  • Update README with paper release information by @rdnfn in #71

Full Changelog: v0.4.6...v0.4.7

v0.4.6

07 Aug 20:58
d288781

What's Changed

  • Add model personality comparison dataset to web app by @rdnfn in #69
  • Add filter of "nan" and empty value comparisons (as these can distort results) by @rdnfn in #69
  • Add new models to model comparison config (gpt-oss-20b, gpt-5) by @rdnfn in #69
  • Fix issue with example links in app (new arena dataset name with year) by @rdnfn in #70
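
The "nan" and empty value filter mentioned above can be illustrated with a minimal sketch. This is not the package's actual implementation: the function names and the `preferred_text` key are hypothetical, chosen only to show why such rows are dropped before results are computed.

```python
import math


def is_missing(value):
    """Return True for None, float NaN, empty strings, and the literal string "nan"."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, str) and value.strip().lower() in ("", "nan"):
        return True
    return False


def drop_missing_comparisons(comparisons, key="preferred_text"):
    """Keep only comparisons whose value under `key` is usable (hypothetical key name)."""
    return [c for c in comparisons if not is_missing(c.get(key))]


comparisons = [
    {"id": 1, "preferred_text": "text_a"},
    {"id": 2, "preferred_text": "nan"},         # string produced by a failed parse
    {"id": 3, "preferred_text": ""},            # empty annotation
    {"id": 4, "preferred_text": float("nan")},  # numeric NaN
    {"id": 5, "preferred_text": "text_b"},
]

kept = drop_missing_comparisons(comparisons)
kept_ids = [c["id"] for c in kept]
print(kept_ids)
```

Counting such rows as valid comparisons would inflate the denominator of any agreement metric, which is one way they can distort results.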

Full Changelog: v0.4.5...v0.4.6

v0.4.5

03 Aug 11:05
460dc1e

What's Changed

  • Improve data loading by @rdnfn in #68
    • Breaking change: due to changes in the underlying loading mechanism, -d no longer accepts the older folder-based ICAI results format, only AnnotatedPairs (AP) JSON files
    • Add support for loading multiple datasets by providing a directory alone (via --dir CLI argument)
    • Add support for specifying multiple datasets via -d/--datapath rather than just a single one
    • Add support for specifying dataset metadata (name, description, etc.) inside AnnotatedPairs json, rather than separately inside package
    • Add support for loading pre-annotated results from HuggingFace (without token) via --load-web-datasets CLI flag (dataset link)
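
As a rough sketch of how explicitly listed dataset files (as with -d/--datapath) and a directory of datasets (as with --dir) could be combined into one list, consider the following. The function `collect_dataset_files` is hypothetical and is not the package's loader; it assumes every `*.json` file directly inside the directory is a dataset.

```python
import tempfile
from pathlib import Path


def collect_dataset_files(datapaths=(), directory=None):
    """Combine explicitly listed files with every *.json file found in a directory."""
    files = [Path(p) for p in datapaths]
    if directory is not None:
        # Sorted so the load order is deterministic across platforms.
        files.extend(sorted(Path(directory).glob("*.json")))
    unique, seen = [], set()
    for f in files:
        key = f.resolve()
        if key not in seen:  # skip a file that was listed twice
            seen.add(key)
            unique.append(f)
    return unique


# Demo on a temporary directory with two JSON files and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.json", "a.json", "notes.txt"):
        (Path(tmp) / name).write_text("{}")
    explicit = Path(tmp) / "a.json"
    found = [f.name for f in collect_dataset_files([explicit], directory=tmp)]

print(found)
```

Explicit paths come first and duplicates are skipped, so passing a file that also lives in the directory does not load it twice.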

Full Changelog: v0.4.4...v0.4.5

v0.4.4

31 Jul 12:11
4e7d725

What's Changed

  • Add Python CLI for model comparison
  • Add GitHub action for automatic addition of new model to HuggingFace dataset
  • Add advanced tutorials to docs
    • Correlation analysis
    • Confidence intervals

Full Changelog: v0.4.3...v0.4.4

v0.4.3

03 Jul 18:06
5b63373

What's Changed

  • Improvements to results presentation, reducing clutter, by @rdnfn in #60
    • Make metrics shown configurable via env var, and reduce default number of shown metrics
    • Hide prompt in example viewer if none available
    • Hide prompt in example viewer if none available
  • Improvement to docs:

Full Changelog: v0.4.2...v0.4.3

v0.4.2

05 Jun 14:38
7d41ef4

What's Changed

  • Update model annotator name by @rdnfn in #51
  • Update README.md by @rdnfn in #52
  • Fix interface issues and improve docs by @rdnfn in #53

Full Changelog: v0.4.1...v0.4.2

v0.4.1

31 May 21:59
bc398a1

What's Changed

Full Changelog: v0.4.0...v0.4.1

v0.4.0

28 May 21:38
5272a3b

What's Changed

  • Add a simplified settings view alongside the advanced settings to make the app easier to use, by @rdnfn in #41
  • Add new datapoint viewer (experimental) by @rdnfn in #41

Full Changelog: v0.3.2...v0.4.0

v0.3.2

20 May 15:18
15abf97

What's Changed

  • Revert change of special arena dataset source by @rdnfn in #40

Full Changelog: v0.3.1...v0.3.2

v0.3.1

20 May 14:32
e3c7902

What's Changed

  • Fixes and dataset improvements by @rdnfn in #37
    • Fix reference model setting (on already loaded datasets)
    • Update datasets available in online interface

Full Changelog: v0.3.0...v0.3.1