Add optional property tag to group eval conversations #134

asamal4 merged 2 commits into lightspeed-core:main from
Conversation
Walkthrough: Adds a new optional string field `tag` to the evaluation data and result models for grouping eval conversations; the tag is propagated through evaluation and error handling, aggregated into by_tag statistics, and included in the generated output.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Conv as Conversation Data (config / input)
    participant Eval as Evaluator
    participant Err as ErrorHandler
    participant Result as EvaluationResult (model)
    participant Stats as Statistics
    participant Out as Output Generator
    Conv->>Eval: request with conv_data.tag
    Eval->>Result: create EvaluationResult(tag=conv_data.tag)
    Eval->>Err: (on errors) create_error_result(tag=conv_data.tag)
    Result->>Stats: accumulate by_metric / by_conversation / by_tag
    Stats->>Out: provide detailed_stats (includes by_tag)
    Out->>Client: render JSON/text including tag and by_tag breakdown
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/lightspeed_evaluation/core/models/data.py (1)
397-399: Consider adding min_length=1 validation for consistency. The tag field in EvaluationData (Line 331) has min_length=1 validation, but EvaluationResult.tag does not. This inconsistency could allow empty tags in results while preventing them in data models, potentially causing issues with tag-based grouping and statistics.

🔧 Suggested fix for consistency

```diff
 tag: str = Field(
-    default="eval", description="Tag for grouping and filtering results"
+    default="eval",
+    min_length=1,
+    description="Tag for grouping and filtering results"
 )
```
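For context, here is a minimal standalone sketch of what the suggested constraint does (ResultSketch is a hypothetical model used only for illustration, not the project's EvaluationResult):

```python
# Hypothetical model showing the effect of min_length=1 on a tag field.
from pydantic import BaseModel, Field, ValidationError

class ResultSketch(BaseModel):
    tag: str = Field(default="eval", min_length=1)

assert ResultSketch().tag == "eval"           # the default still applies
assert ResultSketch(tag="prod").tag == "prod"
try:
    ResultSketch(tag="")                      # an empty tag now fails validation
except ValidationError as exc:
    print(f"rejected with {len(exc.errors())} error(s)")
```

With the constraint in place, empty tags are caught at model construction instead of surfacing later as empty keys in the by_tag breakdown.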
tests/unit/core/models/test_data.py (1)

297-302: Consider adding a test for empty tag validation. While the current tests cover default and explicit tag values well, there's no test verifying that empty tags are rejected for EvaluationData (which has min_length=1 validation on Line 331 of data.py). This would ensure the validation works as expected.

🧪 Suggested test addition

Add this test to the TestEvaluationData class:

```python
def test_empty_tag_rejected(self):
    """Test that empty tag is rejected."""
    turn = TurnData(turn_id="turn1", query="Query")
    with pytest.raises(ValidationError):
        EvaluationData(conversation_group_id="conv1", turns=[turn], tag="")
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
- README.md
- config/evaluation_data.yaml
- docs/configuration.md
- src/lightspeed_evaluation/core/constants.py
- src/lightspeed_evaluation/core/models/data.py
- src/lightspeed_evaluation/core/output/generator.py
- src/lightspeed_evaluation/core/output/statistics.py
- src/lightspeed_evaluation/pipeline/evaluation/errors.py
- src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
- tests/unit/core/models/test_data.py
- tests/unit/core/output/test_statistics.py
🧰 Additional context used
📓 Path-based instructions (3)
src/lightspeed_evaluation/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/lightspeed_evaluation/**/*.py: Provide type hints for all public functions and methods
Use Google-style docstrings for all public APIs
Use custom exceptions from core.system.exceptions for error handling
Use structured logging with appropriate levels in lightspeed_evaluation modules
Files:
- src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
- src/lightspeed_evaluation/core/output/statistics.py
- src/lightspeed_evaluation/core/models/data.py
- src/lightspeed_evaluation/pipeline/evaluation/errors.py
- src/lightspeed_evaluation/core/constants.py
- src/lightspeed_evaluation/core/output/generator.py
**/test_*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use pytest mocking (mocker fixture) instead of unittest.mock
Files:
- tests/unit/core/models/test_data.py
- tests/unit/core/output/test_statistics.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Mirror test structure to source code organization, with test files named test_*.py and test functions/classes named test_* and Test*
Files:
- tests/unit/core/models/test_data.py
- tests/unit/core/output/test_statistics.py
🧠 Learnings (8)
📓 Common learnings
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 22
File: lsc_agent_eval/src/lsc_agent_eval/core/agent_goal_eval/eval_data.py:146-153
Timestamp: 2025-07-28T14:26:03.119Z
Learning: In the lsc_agent_eval framework, evaluations are identified by a composite key of (conversation_group, eval_id). This design allows the same eval_id to exist across different conversation groups (logged as warning) but prevents duplicates within the same conversation group (validation error). This supports logical separation and reusable eval_ids across different conversation contexts.
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 47
File: src/lightspeed_evaluation/core/output/generator.py:140-145
Timestamp: 2025-09-11T12:47:06.747Z
Learning: User asamal4 prefers that non-critical comments are sent when actual code changes are pushed, not on unrelated commits.
📚 Learning: 2025-07-28T14:26:03.119Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 22
File: lsc_agent_eval/src/lsc_agent_eval/core/agent_goal_eval/eval_data.py:146-153
Timestamp: 2025-07-28T14:26:03.119Z
Learning: In the lsc_agent_eval framework, evaluations are identified by a composite key of (conversation_group, eval_id). This design allows the same eval_id to exist across different conversation groups (logged as warning) but prevents duplicates within the same conversation group (validation error). This supports logical separation and reusable eval_ids across different conversation contexts.
Applied to files:
- docs/configuration.md
- config/evaluation_data.yaml
- README.md
📚 Learning: 2025-08-13T14:07:44.195Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 24
File: lsc_agent_eval/README.md:116-136
Timestamp: 2025-08-13T14:07:44.195Z
Learning: In the lsc_agent_eval framework, the expected_tool_calls configuration uses "tool_name" as the key for tool names, not "name". The tool call evaluation implementation specifically looks for the "tool_name" field when comparing expected vs actual tool calls.
Applied to files:
config/evaluation_data.yaml
📚 Learning: 2025-07-16T10:41:09.399Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 19
File: lsc_agent_eval/tests/core/agent_goal_eval/test_evaluator.py:274-297
Timestamp: 2025-07-16T10:41:09.399Z
Learning: In the lsc_agent_eval package, the team prefers to focus on core functionality testing first and considers testing cleanup script execution after setup failure as early optimization, noting that there's no guarantee cleanup scripts will run successfully anyway.
Applied to files:
config/evaluation_data.yaml
📚 Learning: 2025-07-16T09:42:00.691Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 19
File: lsc_agent_eval/data/script/eval3/setup.sh:1-3
Timestamp: 2025-07-16T09:42:00.691Z
Learning: Scripts in the lsc_agent_eval/data directory are meant to be simple examples/samples for teams to customize according to their needs, not production-ready code.
Applied to files:
config/evaluation_data.yaml
📚 Learning: 2025-12-11T10:05:06.422Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-evaluation PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T10:05:06.422Z
Learning: Do not add new features to lsc_agent_eval/ directory - add to src/lightspeed_evaluation/ instead
Applied to files:
src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
📚 Learning: 2025-09-19T12:32:06.403Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/pipeline/evaluation/errors.py:18-31
Timestamp: 2025-09-19T12:32:06.403Z
Learning: When analyzing method calls, always examine the complete call site including all parameters before suggesting fixes. In the lightspeed-evaluation codebase, mark_all_metrics_as_error in processor.py correctly passes both resolved_turn_metrics and resolved_conversation_metrics parameters.
Applied to files:
src/lightspeed_evaluation/pipeline/evaluation/errors.py
📚 Learning: 2025-09-19T00:37:23.798Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/pipeline/evaluation/errors.py:33-36
Timestamp: 2025-09-19T00:37:23.798Z
Learning: In the lightspeed-evaluation codebase, metric resolution (including applying defaults when turn_metrics is None) happens upstream in ConversationProcessor.process_conversation() using MetricManager.resolve_metrics(), not in the EvaluationErrorHandler. The error handler only marks explicitly defined metrics as ERROR.
Applied to files:
src/lightspeed_evaluation/pipeline/evaluation/errors.py
🧬 Code graph analysis (3)
src/lightspeed_evaluation/pipeline/evaluation/errors.py (1)
src/lightspeed_evaluation/core/models/data.py (1)
EvaluationResult(389-461)
tests/unit/core/models/test_data.py (1)
src/lightspeed_evaluation/core/models/data.py (3)
TurnData (36-315), EvaluationData (318-386), EvaluationResult (389-461)
tests/unit/core/output/test_statistics.py (2)
src/lightspeed_evaluation/core/models/data.py (1)
EvaluationResult (389-461)

src/lightspeed_evaluation/core/output/statistics.py (1)
calculate_detailed_stats(81-114)
🔇 Additional comments (20)
docs/configuration.md (1)
216-216: LGTM! Documentation is clear and consistent. The documentation accurately describes the new tag CSV column and its purpose for grouping eval conversations. The placement after conversation_group_id aligns with the implementation in the constants file.

Also applies to: 249-249
README.md (1)
157-157: LGTM! Documentation is accurate and complete. The README correctly documents the new tag field with its optional nature, default value, and purpose. The documentation is consistent with the implementation in the data models.

Also applies to: 214-214
src/lightspeed_evaluation/core/constants.py (1)
29-29: LGTM! Appropriate placement of the new column. The tag column is logically positioned after conversation_group_id, which makes sense given that tags are used for grouping conversations.

tests/unit/core/models/test_data.py (1)
336-347: LGTM! Comprehensive test coverage for the tag field. The tests effectively verify both default and explicit tag values for EvaluationResult, ensuring the new feature works as intended.

src/lightspeed_evaluation/pipeline/evaluation/evaluator.py (1)
168-168: LGTM! Tag propagation implemented correctly. The tag field is correctly propagated from request.conv_data.tag to both successful evaluation results (line 168) and error results (line 222), ensuring consistent tagging across all evaluation outcomes.

Also applies to: 222-222
config/evaluation_data.yaml (1)
5-5: LGTM! Tag field examples are clear and well-documented. The configuration examples demonstrate meaningful tag usage (basic, tools, infrastructure) with helpful inline documentation of the default value. This provides good guidance for users adopting the new tag feature.
Also applies to: 60-60, 99-99
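For reference, a minimal sketch of how a tagged conversation maps onto the data model (the group id and query text are made up, and the import path is inferred from the file layout; the constructor usage mirrors the unit tests reviewed below):

```python
from lightspeed_evaluation.core.models.data import EvaluationData, TurnData

turn = TurnData(turn_id="turn1", query="What is OpenShift?")
conv = EvaluationData(
    conversation_group_id="conv_tools_1",
    turns=[turn],
    tag="tools",  # optional; falls back to the default "eval" when omitted
)
print(conv.tag)  # -> "tools"
```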
src/lightspeed_evaluation/pipeline/evaluation/errors.py (6)
18-48: LGTM! Tag parameter properly added to result creation. The _create_result method now correctly accepts and propagates the tag parameter with an appropriate default value ("eval"), proper documentation, and correct parameter handling (keyword-only with *).
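A rough illustration of that keyword-only pattern follows; the parameter list and the stand-in result type are assumptions for the sketch, not the handler's real signature:

```python
from dataclasses import dataclass

@dataclass
class _ResultSketch:
    """Stand-in for EvaluationResult; the real model has more fields."""
    conversation_group_id: str
    metric_id: str
    reason: str
    tag: str = "eval"

def create_error_result(conv_data, metric_id: str, reason: str, *, tag: str = "eval") -> _ResultSketch:
    """Keyword-only `tag` with an "eval" default, as described above."""
    return _ResultSketch(
        conversation_group_id=conv_data.conversation_group_id,
        metric_id=metric_id,
        reason=reason,
        tag=tag,
    )

# Callers propagate the conversation's tag explicitly, e.g.:
#   create_error_result(conv_data, "some_metric", "API call failed", tag=conv_data.tag)
```

Keeping tag keyword-only makes the call sites explicit (tag=conv_data.tag), which is what the comments below verify for each error path.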
50-72: LGTM! Error result creation includes tag. The create_error_result method properly propagates the tag parameter through to _create_result, maintaining consistency in error handling.
74-96: LGTM! Skipped result creation includes tag. The create_skipped_result method correctly handles tag propagation, ensuring skipped evaluations are also properly tagged.
98-149: LGTM! All metrics error marking propagates tag correctly. The mark_all_metrics_as_error method correctly passes tag=conv_data.tag to all error results for both turn-level and conversation-level metrics, ensuring comprehensive tag propagation during error scenarios.
151-189: LGTM! Turn metrics error marking includes tag. The mark_turn_metrics_as_error method properly propagates tag=conv_data.tag when creating error results for individual turn metrics.
191-244: LGTM! Cascade marking propagates tag correctly. The _mark_cascade helper method correctly passes tag=conv_data.tag when marking remaining turns and conversation-level metrics, ensuring tag consistency in cascading error/skip scenarios.

tests/unit/core/output/test_statistics.py (2)
683-740: LGTM! Comprehensive test coverage for by_tag statistics. The test validates that calculate_detailed_stats correctly aggregates results by tag, checking pass/fail counts, rates, and score statistics for multiple tags (production, staging). This provides good coverage of the new tag-based aggregation functionality.
741-759: LGTM! Default tag behavior properly tested. The test validates that when no explicit tag is provided, the default "eval" tag is used in the by_tag statistics. This ensures backward compatibility and correct default behavior.
src/lightspeed_evaluation/core/output/statistics.py (1)
81-114: LGTM! By-tag statistics properly integrated. The calculate_detailed_stats function now correctly:

- Initializes and populates by_tag statistics alongside existing categories
- Includes score statistics for tag-level aggregation (consistent with by_metric)
- Properly finalizes tag statistics with confidence intervals
- Returns the expanded structure including by_tag

The implementation follows the same pattern as the existing by_metric statistics, ensuring consistency.
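For readers who have not opened statistics.py, here is a condensed sketch of the kind of by_tag aggregation described above; the bucket keys, the PASS value, and the result/score attributes are assumptions, only the grouping-by-tag idea comes from the review:

```python
from collections import defaultdict
from typing import Any

def aggregate_by_tag(results: list[Any]) -> dict[str, dict[str, Any]]:
    """Group pass/fail counts and scores per result.tag (sketch only)."""
    by_tag: dict[str, dict[str, Any]] = defaultdict(
        lambda: {"pass": 0, "fail": 0, "scores": []}
    )
    for r in results:
        bucket = by_tag[r.tag]
        bucket["pass" if r.result == "PASS" else "fail"] += 1
        if getattr(r, "score", None) is not None:
            bucket["scores"].append(r.score)
    for bucket in by_tag.values():
        total = bucket["pass"] + bucket["fail"]
        bucket["pass_rate"] = bucket["pass"] / total if total else 0.0
    return dict(by_tag)
```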
src/lightspeed_evaluation/core/output/generator.py (5)

100-100: LGTM! Consistent default structure. Adding "by_tag": {} to the default empty detailed stats structure ensures consistency with the other breakdown categories and prevents potential KeyError issues downstream.
227-232: LGTM! Good defensive coding. The use of .get("by_tag", {}) provides a safe fallback and is consistent with how other statistics categories are handled.
241-259: LGTM! Proper inclusion of tag field in JSON output. The tag field is correctly added to each result object in the JSON output, consistent with the serialization pattern for other fields.

302-304: LGTM! Consistent text output formatting. The "By Tag" breakdown section is properly implemented with defensive coding and score statistics support, following the same pattern as other breakdown categories.
166-202: Tag field is already included in CSV columns configuration. Verification confirms that SUPPORTED_CSV_COLUMNS in src/lightspeed_evaluation/core/constants.py includes "tag" at line 29. The CSV generation code correctly handles dynamic column inclusion using hasattr(), so no action is required.
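A small sketch of that hasattr-based dynamic column handling; the column list and writer setup shown here are assumptions, only the presence of "tag" in SUPPORTED_CSV_COLUMNS is confirmed by the review:

```python
import csv

SUPPORTED_CSV_COLUMNS = ["conversation_group_id", "tag", "metric_id", "result"]  # assumed subset

def write_results_csv(path: str, results: list) -> None:
    """Write one row per result, skipping columns a result object lacks."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=SUPPORTED_CSV_COLUMNS)
        writer.writeheader()
        for r in results:
            writer.writerow(
                {col: getattr(r, col) for col in SUPPORTED_CSV_COLUMNS if hasattr(r, col)}
            )
```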
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/lightspeed_evaluation/core/output/statistics.py (1)
81-114: LGTM! Consistent implementation of by_tag statistics. The by_tag aggregation follows the same pattern as by_metric, appropriately using include_scores=True and properly finalizing statistics with confidence intervals. The empty-result handling is correct.

📝 Optional: Enhance docstring for clarity

Consider making the docstring more explicit about the returned structure:

```diff
 def calculate_detailed_stats(results: list[EvaluationResult]) -> dict[str, Any]:
-    """Calculate detailed statistics broken down by different categories."""
+    """Calculate detailed statistics broken down by different categories.
+
+    Returns:
+        Dictionary with keys 'by_metric', 'by_conversation', and 'by_tag',
+        each containing aggregated pass/fail/error counts, rates, and score statistics.
+    """
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
- README.md
- config/evaluation_data.yaml
- docs/configuration.md
- src/lightspeed_evaluation/core/constants.py
- src/lightspeed_evaluation/core/models/data.py
- src/lightspeed_evaluation/core/output/generator.py
- src/lightspeed_evaluation/core/output/statistics.py
- src/lightspeed_evaluation/pipeline/evaluation/errors.py
- src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
- tests/unit/core/models/test_data.py
- tests/unit/core/output/test_statistics.py
🚧 Files skipped from review as they are similar to previous changes (4)
- config/evaluation_data.yaml
- src/lightspeed_evaluation/pipeline/evaluation/errors.py
- src/lightspeed_evaluation/core/constants.py
- src/lightspeed_evaluation/core/models/data.py
🧰 Additional context used
📓 Path-based instructions (3)
**/test_*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use pytest mocking (mocker fixture) instead of unittest.mock
Files:
- tests/unit/core/models/test_data.py
- tests/unit/core/output/test_statistics.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Mirror test structure to source code organization, with test files named test_*.py and test functions/classes named test_* and Test*
Files:
- tests/unit/core/models/test_data.py
- tests/unit/core/output/test_statistics.py
src/lightspeed_evaluation/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/lightspeed_evaluation/**/*.py: Provide type hints for all public functions and methods
Use Google-style docstrings for all public APIs
Use custom exceptions from core.system.exceptions for error handling
Use structured logging with appropriate levels in lightspeed_evaluation modules
Files:
- src/lightspeed_evaluation/core/output/statistics.py
- src/lightspeed_evaluation/core/output/generator.py
- src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 47
File: src/lightspeed_evaluation/core/output/generator.py:140-145
Timestamp: 2025-09-11T12:47:06.747Z
Learning: User asamal4 prefers that non-critical comments are sent when actual code changes are pushed, not on unrelated commits.
📚 Learning: 2025-07-28T14:26:03.119Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 22
File: lsc_agent_eval/src/lsc_agent_eval/core/agent_goal_eval/eval_data.py:146-153
Timestamp: 2025-07-28T14:26:03.119Z
Learning: In the lsc_agent_eval framework, evaluations are identified by a composite key of (conversation_group, eval_id). This design allows the same eval_id to exist across different conversation groups (logged as warning) but prevents duplicates within the same conversation group (validation error). This supports logical separation and reusable eval_ids across different conversation contexts.
Applied to files:
- README.md
- docs/configuration.md
📚 Learning: 2025-12-11T10:05:06.422Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-evaluation PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T10:05:06.422Z
Learning: Do not add new features to lsc_agent_eval/ directory - add to src/lightspeed_evaluation/ instead
Applied to files:
src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
🧬 Code graph analysis (2)
tests/unit/core/models/test_data.py (1)
src/lightspeed_evaluation/core/models/data.py (3)
TurnData (36-315), EvaluationData (318-386), EvaluationResult (389-463)
tests/unit/core/output/test_statistics.py (2)
src/lightspeed_evaluation/core/models/data.py (1)
EvaluationResult (389-463)

src/lightspeed_evaluation/core/output/statistics.py (1)
calculate_detailed_stats(81-114)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: tests (3.13)
- GitHub Check: tests (3.12)
- GitHub Check: tests (3.11)
🔇 Additional comments (13)
docs/configuration.md (1)
216-216: LGTM! Documentation is clear and consistent. The documentation properly describes the new "tag" CSV column and includes it in the example configuration. This aligns well with the field description in README.md.
Also applies to: 249-249
README.md (1)
157-157: LGTM! Clear documentation of the new tag field. The documentation properly describes the tag field as optional with a default value of "eval", making it easy for users to understand and use this feature for grouping evaluation conversations.
Also applies to: 214-214
tests/unit/core/output/test_statistics.py (1)
683-758: LGTM! Comprehensive test coverage for by_tag functionality. The tests properly validate:
- Tag-based aggregation with explicit tag values ("production", "staging")
- Default tag behavior (defaulting to "eval")
- Pass/fail/error counts and rates per tag
- Score statistics inclusion for tag breakdowns
This provides solid coverage for the new feature.
src/lightspeed_evaluation/core/output/generator.py (4)
100-100: LGTM! Proper default for empty results. The empty detailed stats correctly includes the by_tag key, ensuring consistent structure across all code paths.

227-232: LGTM! Proper inclusion of by_tag in JSON summary. The by_tag statistics are correctly added to the summary_stats structure with appropriate fallback to an empty dict.

241-259: LGTM! Tag field properly included in result output. Each evaluation result now includes the tag field (line 244), enabling downstream consumers to filter or group results by tag.
302-304: LGTM! Text summary includes by_tag breakdown. The "By Tag" section is appropriately added to the text output with include_scores=True, providing score statistics for tag-based groupings consistent with the by_metric section.

src/lightspeed_evaluation/pipeline/evaluation/evaluator.py (2)
166-204: LGTM! Tag propagation implemented correctly. The tag field is properly propagated from request.conv_data.tag to the EvaluationResult in the success path, enabling conversation grouping by tag.

215-238: LGTM! Tag propagation in error path is consistent. The tag field is correctly propagated in the error result path, ensuring all evaluation results include the tag regardless of success or failure status.
tests/unit/core/models/test_data.py (4)
276-296: LGTM! Explicit tag value test is comprehensive. The test correctly verifies that explicit tag values are properly stored in EvaluationData.

297-309: LGTM! Tag validation tests are thorough. The tests comprehensively cover both the default tag value ("eval") and the validation constraint that rejects empty tags.

327-342: LGTM! Default tag assertion added correctly. The test properly verifies that EvaluationResult has the default tag value "eval".

343-366: LGTM! EvaluationResult tag tests are comprehensive. The tests thoroughly verify both explicit tag values and validation constraints for EvaluationResult, maintaining consistency with the EvaluationData tests.
VladimirKadlec
left a comment
LGTM, one comment for the future refactor.
```python
turn_data = request.turn_data
return EvaluationResult(
    conversation_group_id=request.conv_data.conversation_group_id,
    tag=request.conv_data.tag,
```
EvaluationResult is becoming really crazy :-/
Not in this PR, but for some future refactor, I'd recommend:
- storing the whole request
- storing the whole turn_data
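A rough sketch of what that suggested refactor could look like; this is purely illustrative, the class and field names are assumptions, and nothing here is part of this PR:

```python
from typing import Any, Optional
from pydantic import BaseModel

class EvaluationResultSketch(BaseModel):
    """Nest the originals instead of copying individual fields onto the result."""
    request: Any                     # the whole evaluation request, incl. conv_data.tag
    turn_data: Optional[Any] = None  # the whole TurnData for turn-level metrics
    result: str
    reason: str = ""

# Consumers would then read result.request.conv_data.tag rather than result.tag.
```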
Description
Adds an optional property tag (default: eval) to group eval conversations. Some refactoring/modularization was already done to keep this a minimal change (#132 & #131).

Additional refactoring can be done for errors.py (will be covered in a separate PR).
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Documentation
Tests