Releases: VectorInstitute/eval-agents

eval-agents v0.3.0

25 Feb 16:05
f0ff198

What's Changed

  • Report Generation: Adding notebooks to better explain the usage by @lotif in #61
  • Add integration test before workspace start by @amrit110 in #62
  • Report Generation: Adding unit tests by @lotif in #64
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #65
  • Add notebooks for AML investigation use case by @fcogidi in #63
  • Report Generation: Adding a system diagram by @lotif in #66
  • Add knowledgeqa agent with refactored code by @amrit110 in #67
  • Fix notebooks, agent code by @amrit110 in #68
  • Update and fix config by @amrit110 in #69
  • Bump up to 0.3.0 by @amrit110 in #70

Full Changelog: v0.2.1...v0.3.0

eval-agents v0.2.1

19 Feb 14:24

aieng-eval-agents v0.2.0

19 Feb 14:14

What's Changed

  • Bump astral-sh/setup-uv from 7.1.2 to 7.1.3 by @dependabot[bot] in #1
  • Bump actions/checkout from 5.0.0 to 6.0.0 by @dependabot[bot] in #3
  • Bump astral-sh/setup-uv from 7.1.3 to 7.1.4 by @dependabot[bot] in #2
  • Bump astral-sh/setup-uv from 7.1.4 to 7.1.5 by @dependabot[bot] in #6
  • Bump actions/setup-python from 6.0.0 to 6.1.0 by @dependabot[bot] in #5
  • Bump actions/checkout from 6.0.0 to 6.0.1 by @dependabot[bot] in #4
  • Bump astral-sh/setup-uv from 7.1.5 to 7.1.6 by @dependabot[bot] in #7
  • Refactor package structure and update dependencies by @fcogidi in #9
  • Bump astral-sh/setup-uv from 7.1.6 to 7.2.0 by @dependabot[bot] in #10
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #11
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #12
  • Bump actions/setup-python from 6.1.0 to 6.2.0 by @dependabot[bot] in #14
  • Bump actions/checkout from 6.0.1 to 6.0.2 by @dependabot[bot] in #15
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #16
  • First version of the Report Generation Agent by @lotif in #13
  • Add initial working implementation using search grounding by @amrit110 in #17
  • Feature/knowledge agent by @amrit110 in #18
  • Bump astral-sh/setup-uv from 7.2.0 to 7.2.1 by @dependabot[bot] in #25
  • Remove unused weaviate and vertex config, add evaluator model env variable by @amrit110 in #23
  • First Langfuse evaluations by @lotif in #19
  • Move grounding tool to reusable parent tools dir by @amrit110 in #24
  • Add code coverage reporting to unit test workflow by @amrit110 in #26
  • Report Generation Agent: Adding Trajectory Evaluation by @lotif in #27
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #29
  • Refactoring the Report Generation Agent demo UI and evaluation script by @lotif in #30
  • Add AML investigation agent by @fcogidi in #21
  • Extract langfuse tracing setup for google-adk to more general location by @fcogidi in #31
  • Improve search tool to extract resolved urls by @amrit110 in #28
  • Refactor Langfuse dataset upload by @fcogidi in #32
  • Rename modules to knowledge_qa by @amrit110 in #33
  • Add dataset upload script by @amrit110 in #34
  • Add web fetch tool by @amrit110 in #36
  • Add evaluation harness by @fcogidi in #37
  • Bump astral-sh/setup-uv from 7.2.1 to 7.3.0 by @dependabot[bot] in #40
  • Rename fields in CaseRecord for consistency with langfuse evaluators by @fcogidi in #38
  • Report Generation: Switching from OpenAI SDK to Google ADK by @lotif in #35
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #41
  • Upgrading cryptography and pillow to address security issues by @lotif in #45
  • Report Generation Agent: switching to SQLAlchemy by @lotif in #42
  • Add LLM judge evaluator factory by @fcogidi in #39
  • Add separate DB manager by @amrit110 in #50
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #51
  • Add trace groundedness evaluator by @fcogidi in #44
  • Remove legacy evaluation module by @amrit110 in #53
  • Add evaluation script for knowledgeqa by @amrit110 in #43
  • Report Generation: Adding basic online evaluation scores by @lotif in #46
  • Report Generation: Adding user feedback support in the form of thumbs up and thumbs down buttons in the UI by @lotif in #54
  • Refactor AML investigation agent by @fcogidi in #48
  • Add deterministic graders for AML investigation agent by @fcogidi in #49
  • Add tools for reading and grepping csv/xls files by @amrit110 in #52
  • Improve search tool code structure by @amrit110 in #55
  • Add agent planner and event extraction by @amrit110 in #56
  • Remove gradio file for knowledge_qa implementation by @amrit110 in #57
  • Add small fix to grader and evaluate scripts by @amrit110 in #58
  • Add additional tests for processing fns by @amrit110 in #59
  • Add AML agent evaluation script by @fcogidi in #60

Full Changelog: https://github.com/VectorInstitute/eval-agents/commits/v0.2.0