Releases · lightspeed-core/lightspeed-evaluation
LightSpeed Evaluation v0.4.0
What's Changed
Key Changes
- Flexible Tool Evaluation: Configurable ordered/unordered & full/partial match modes for tool call validation (see the first sketch after this list)
- Classical Evaluation Metrics: Support for traditional evaluation metrics (BLEU, ROUGE, distance metrics; sketched below)
- Alternate Expected Response: Ability to set alternate ground-truth responses for static evaluation metrics
- Eval Configuration Tracking: Evaluation configuration details now included in generated reports for better reproducibility
- API Latency Metrics: Latency tracking and reporting for API performance analysis (for the API streaming endpoint; sketched below)
- Data Grouping: Tag-based grouping of evaluation conversations for better organization
- Data Filtering: Filter evaluation datasets by tags and conversation IDs (CLI arguments) for targeted testing
- Cache Warmup: New optional CLI argument to pre-warm (clear) caches before evaluation runs
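The ordered/unordered and full/partial match modes can be pictured with a small, self-contained Python sketch. This is an illustration only: the function name, arguments, and exact semantics are assumptions, not the framework's actual implementation.

```python
# Illustrative only: a generic ordered/unordered, full/partial comparison of
# expected vs. actual tool-call names. Not the framework's real code.
from collections import Counter


def tool_calls_match(expected, actual, ordered=True, full_match=True):
    """Compare two lists of tool-call names under the selected match mode."""
    if ordered:
        if full_match:
            return expected == actual          # exact sequence match
        it = iter(actual)                      # partial: in-order subsequence
        return all(call in it for call in expected)
    if full_match:
        return Counter(expected) == Counter(actual)   # same calls, any order
    return not (Counter(expected) - Counter(actual))  # expected is a subset


print(tool_calls_match(["search", "fetch"], ["fetch", "search"], ordered=False))  # True
print(tool_calls_match(["search"], ["search", "fetch"], full_match=False))        # True
```

In this sketch, `full_match=False` reduces to subset matching, which mirrors the behavior described in #145.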
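For the classical metrics, the flavor of the computation can be shown with dependency-free stand-ins for an edit-distance metric and a ROUGE-1-style overlap score; the framework itself may delegate BLEU/ROUGE to dedicated libraries, so treat these purely as sketches.

```python
# Dependency-free sketches of classical string metrics: Levenshtein edit
# distance and a ROUGE-1-style unigram F-score. Illustrative, not the
# framework's implementation.
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def rouge1_f(reference: str, candidate: str) -> float:
    """F-score over unigram overlap between reference and candidate."""
    ref, cand = reference.split(), candidate.split()
    overlap = sum((Counter(ref) & Counter(cand)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(levenshtein("kitten", "sitting"))                       # 3
print(round(rouge1_f("the cat sat", "the cat sat down"), 3))  # 0.857
```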
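The latency numbers for a streaming endpoint typically come down to time-to-first-chunk and total response time; a generic way to capture both while consuming a stream is sketched below. The function and field names are placeholders, not the project's API.

```python
# Generic latency capture for a streaming response: time to first chunk and
# total duration. The input is any iterable of text chunks; this is a
# stand-in, not the framework's API client.
import time


def consume_with_latency(chunks):
    start = time.perf_counter()
    first_chunk_latency = None
    parts = []
    for chunk in chunks:
        if first_chunk_latency is None:
            first_chunk_latency = time.perf_counter() - start
        parts.append(chunk)
    total_latency = time.perf_counter() - start
    return "".join(parts), {
        "time_to_first_chunk_s": first_chunk_latency,
        "total_latency_s": total_latency,
    }
```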
Pull Requests
- bump eval to v0.4.0 by @asamal4 in #128
- fix: azure env variable names for judgeLLM by @asamal4 in #129
- [LEADS-141] Add Latency Metrics to Evaluation Reports by @bsatapat-jpg in #127
- chore: consolidate test_data models by @asamal4 in #131
- chore: refactor generator & statistics module by @asamal4 in #132
- Add optional property tag to group eval conversations by @asamal4 in #134
- add git hooks by @VladimirKadlec in #133
- [LEADS-172] Support classical evaluation metrics by @bsatapat-jpg in #130
- fix: align docs for updated make targets by @asamal4 in #135
- [LEADS-153] Adding the ordered matching logic in tool eval by @bsatapat-jpg in #136
- [LEADS-153] Implement match logic (full/partial) by @bsatapat-jpg in #137
- Remove duplicate data validation in pipeline by @asamal4 in #141
- chore: refactor evaluation runner by @asamal4 in #140
- feat: add data filter by tags & conv_ids by @asamal4 in #143
- [LEADS-153] Wiring the configuration and adding the config in system.yaml by @bsatapat-jpg in #139
- [LEADS-182] - Add eval config data to the report by @arin-deloatch in #142
- Leads 6 set expected responses by @xmican10 in #138
- map max_tokens to max_completion_tokens internally by @asamal4 in #144
- fix: Do subset matching for full_match=false by @saswatamcode in #145
- Enhance test quality by @xmican10 in #146
- use .model_dump instead of .dict by @asamal4 in #147
- add cache-warmup flag by @VladimirKadlec in #149
- Leads 212 remove unittest mocking by @xmican10 in #148
New Contributors
- @saswatamcode made their first contribution in #145
Full Changelog: v0.3.0...v0.4.0
LightSpeed Evaluation v0.3.0
What's Changed
Key Changes
- Token Usage Statistics: Track and report token consumption during evaluations (both API and JudgeLLM usage; sketched below)
- Certificate Support for JudgeLLM: Configure custom certificates when connecting to Judge LLM endpoints
- Skip on Failure: Optional config to skip the remaining evaluations in a conversation group when any evaluation criterion fails (see the sketch after this list)
- Optional Packages: torch and nvidia-* packages are now optional, significantly reducing install size for use cases that don't require them
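Token usage reporting boils down to accumulating prompt and completion counts per source (API calls vs. judge LLM calls). The sketch below is a generic illustration; the class and field names are assumptions, not the framework's schema.

```python
# Generic token-usage accumulator; names are illustrative only.
from dataclasses import dataclass


@dataclass
class TokenUsage:
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def add(self, prompt: int, completion: int) -> None:
        self.prompt_tokens += prompt
        self.completion_tokens += completion

    @property
    def total(self) -> int:
        return self.prompt_tokens + self.completion_tokens


# Separate counters let a report show API and judge-LLM usage side by side.
api_usage, judge_usage = TokenUsage(), TokenUsage()
api_usage.add(prompt=1200, completion=350)
judge_usage.add(prompt=800, completion=40)
print(api_usage.total, judge_usage.total)  # 1550 840
```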
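The skip-on-failure behavior is a simple control-flow change in the runner: once any metric in a conversation group fails, the remaining evaluations in that group are marked as skipped instead of executed. A minimal sketch follows; the function and result labels are assumptions, not the framework's.

```python
# Minimal sketch of "skip on failure" within one conversation group.
def run_group(evaluations, skip_on_failure=True):
    results, failed = [], False
    for name, check in evaluations:
        if skip_on_failure and failed:
            results.append((name, "skipped"))
            continue
        passed = check()
        failed = failed or not passed
        results.append((name, "pass" if passed else "fail"))
    return results


print(run_group([
    ("answer_correctness", lambda: True),
    ("answer_relevancy", lambda: False),
    ("faithfulness", lambda: True),   # skipped because an earlier metric failed
]))
```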
Pull Requests
- bump eval version to 0.3.0 by @asamal4 in #113
- docs: reorganize docs, add configuration docs by @VladimirKadlec in #111
- Configuration base url update by @yangcao77 in #110
- [LEADS-40]: Get statistics about the token usage for lightspeed-evaluation by @bsatapat-jpg in #112
- LEADS-160: Adding python 3.13 compatibility by @bsatapat-jpg in #115
- add additional fields to output for non-error scenarios by @asamal4 in #114
- remove dynamic all by @asamal4 in #116
- make agents.md more concise by @asamal4 in #117
- add bandit to make target by @asamal4 in #118
- chore: refactor processor & errors.py by @asamal4 in #119
- [LEADS-119] code scanning found multiple security problems by @bsatapat-jpg in #122
- Skip rest of the eval for a metric failure within a conversation group by @asamal4 in #121
- Leads 44 certificates for judge llm by @xmican10 in #120
- [LEADS-140] lightspeed-evaluation has dependency on torch and nvidia* packages that are not required for all usecases by @bsatapat-jpg in #123
- doc: note for rhaiis, models.corp judgellm by @asamal4 in #124
- chore: update docs/key features by @asamal4 in #125
- doc: Add troubleshooting for known issues by @asamal4 in #126
New Contributors
- @yangcao77 made their first contribution in #110
- @xmican10 made their first contribution in #120
Full Changelog: v0.2.0...v0.3.0
LightSpeed Evaluation v0.2.0
What's Changed
- bump lightspeed evaluation version by @asamal4 in #78
- LCORE-723: Added statistical comparison between two evaluation result files by @bsatapat-jpg in #74
- remove unused LightspeedStackClient module by @asamal4 in #81
- add agents.md by @asamal4 in #82
- LCORE-417 Convert unittest mocking to pytest mocking by @max-svistunov in #84
- Concurrent eval by @VladimirKadlec in #85
- LCORE-834: Added script to run evaluation across multiple providers and models by @bsatapat-jpg in #83
- add .caches/ folder to gitignore by @asamal4 in #87
- LCORE-899: created the evaluation methodology by @bsatapat-jpg in #88
- remove archived OLS eval tool by @asamal4 in #86
- add CLAUDE.md by @asamal4 in #89
- add agent-eval deprecation note by @asamal4 in #91
- LCORE-900: Added the parallel execution for multi-modal evaluation in… by @bsatapat-jpg in #92
- Ability to set alternate tool calls for eval by @asamal4 in #90
- LCORE-748: Added unit test cases coverage for the evaluation framework by @bsatapat-jpg in #95
- LEADS-113: Added support for gemini embedding models by @bsatapat-jpg in #99
- LEADS-2: Fix Path Object Serialization in Amended YAML Files by @bsatapat-jpg in #100
- handle no tool call alternative by @asamal4 in #101
- LCORE-916: configuration for CodeRabbitAI by @tisnik in #103
- GEval Integration by @arin-deloatch in #97
- Add keyword eval metric by @asamal4 in #93
- fix: run turn evaluation immediately after api call by @asamal4 in #105
- LCORE-664: Section about AI tools by @tisnik in #107
- LCORE-974: fixed issues found by Pyright by @tisnik in #108
- LEADS-8: Lazy imports for eval tool by @bsatapat-jpg in #106
- add support for fail_on_invalid_data option by @VladimirKadlec in #94
- LEADS-26: Increased Unit test cases coverage by @bsatapat-jpg in #109
New Contributors
- @max-svistunov made their first contribution in #84
- @arin-deloatch made their first contribution in #97
Full Changelog: v0.1.0...v0.2.0
LightSpeed Evaluation v0.1.0
What's Changed
- initial copy of OLS eval by @asamal4 in #1
- merge ols and road-core, first working version by @VladimirKadlec in #2
- delete old scripts/evaluation, add README by @VladimirKadlec in #3
- add evaluation datasets by @VladimirKadlec in #4
- LCORE-162: Setup all CI all linters/checkers by @matysek in #5
- Add some type hints into rag_eval.py by @tisnik in #6
- Fixed docstrings by @tisnik in #7
- Added type hints for functions without return value by @tisnik in #8
- LCORE-276: Pin HTTPX version for now by @tisnik in #9
- add generate answers tool by @VladimirKadlec in #10
- Update dependencies by @tisnik in #12
- Fix error: missing argument by @tisnik in #13
- Check provider models by @tisnik in #14
- fix readme reference post migration by @asamal4 in #11
- LCORE-210: Added Contribution Guide by @jrobertboos in #15
- fix empty question, change retry strategy by @VladimirKadlec in #17
- fix few lint issues by @asamal4 in #18
- feat: add agent e2e eval by @asamal4 in #19
- agent eval: verbose print and fixes by @asamal4 in #20
- temp-fix: fix/suppress pyright issues by @asamal4 in #21
- agent eval: multi-turn & refactoring by @asamal4 in #22
- agent-eval: py version by @asamal4 in #23
- Agent eval: add tool call comparison by @asamal4 in #24
- update dependencies by @VladimirKadlec in #25
- fix: streaming error handling by @asamal4 in #26
- Generic eval tool by @asamal4 in #28
- fix runner by @asamal4 in #31
- use uv instead of pdm by @Anxhela21 in #30
- Fix Bandit checker on CI by @tisnik in #32
- archive old eval and make lsc eval as primary by @asamal4 in #35
- switch to regex check for tool arg value by @asamal4 in #41
- docs: Add input data to generate answers documentation by @are-ces in #36
- fix rule for black & pydocstyle by @asamal4 in #45
- Added Unit test cases as well as integration test cases by @bsatapat-jpg in #42
- Add client for query endpoint by @Anxhela21 in #43
- Feature: Add response_eval:intent evaluation type for LLM response intent assessment by @ItzikEzra-rh in #46
- API integration & refactoring by @asamal4 in #47
- [nit] Clean up evaluation_data.yaml by @lpiwowar in #52
- fix: use uv pip instead of pip by @are-ces in #50
- [LCORE-646] Disable default tracking in RAGAS by @lpiwowar in #49
- [LCORE-648] Fix processing of `float('NaN')` values when OutputParserException by @lpiwowar in #48
- allow none llm for LS API by @asamal4 in #53
- feat: Added parallelism for answer generation by @are-ces in #39
- update readme by @asamal4 in #54
- fix: propagate arg output dir by @asamal4 in #57
- Turn metric override by @asamal4 in #55
- feat: add support for custom embedding model by @VladimirKadlec in #56
- keep original input file intact by @asamal4 in #59
- docs: add links to metrics docs by @VladimirKadlec in #60
- Retrieved RAG context from lightspeed-stack API by @bsatapat-jpg in #58
- Setting the execution bit only if it's not set by @andrej1991 in #61
- provider vertex support for judge llm by @andrej1991 in #29
- update tool call property by @asamal4 in #64
- add vertex to main eval & refactor by @asamal4 in #63
- Env setup/cleanup ability and verify through script by @asamal4 in #62
- add example & check for vLLM hosted inference server by @asamal4 in #66
- fix sample data by @asamal4 in #69
- use absolute imports by @asamal4 in #68
- fix: propagate api error message by @asamal4 in #72
- add common custom llm by @asamal4 in #70
- LCORE-723: Compute correct confidence interval by @bsatapat-jpg in #71
- Simplify custom prompt handling & re-organize by @asamal4 in #73
- add support for caching llm and api responses by @VladimirKadlec in #75
- standardize file name as per framework name in metric by @asamal4 in #76
- add intent eval by @asamal4 in #77
New Contributors
- @asamal4 made their first contribution in #1
- @VladimirKadlec made their first contribution in #2
- @matysek made their first contribution in #5
- @tisnik made their first contribution in #6
- @jrobertboos made their first contribution in #15
- @Anxhela21 made their first contribution in #30
- @are-ces made their first contribution in #36
- @bsatapat-jpg made their first contribution in #42
- @ItzikEzra-rh made their first contribution in #46
- @lpiwowar made their first contribution in #52
- @andrej1991 made their first contribution in #61
Full Changelog: https://github.com/lightspeed-core/lightspeed-evaluation/commits/v0.1.0