docs: Add input data to generate answers documentation by are-ces · Pull Request #36 · lightspeed-core/lightspeed-evaluation

are-ces · 2025-09-01T06:20:25Z

Summary by CodeRabbit

Documentation
- Added an "Input Data" section describing supported evaluation formats: CSV (requires two columns: id and question), Parquet, and JSON.
- Included a sample CSV path and example rows to illustrate the expected schema.
- Added a usage note explaining how the models_to_evaluate list selects which models will be used.
- Expanded the "Running" section to include full CLI usage and options for generate_answers.
- No changes to command logic or runtime behavior.
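
The CSV requirement above (two columns, `id` and `question`) can be checked before a run. The following is a minimal sketch using only the Python standard library; `read_questions` and `REQUIRED_COLUMNS` are hypothetical names for illustration and are not part of `generate_answers`.

```python
import csv
import io

# Column names taken from the README text; the helper itself is hypothetical.
REQUIRED_COLUMNS = ("id", "question")


def read_questions(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text and return rows, failing if a required column is absent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")
    # Keep only the two documented columns; any extra columns are ignored.
    return [{"id": row["id"], "question": row["question"]} for row in reader]


SAMPLE = """id,question
1,How do I enable VM high availability in my cluster?
2,How do I migrate a VM to a different project?
"""

rows = read_questions(SAMPLE)
```

A file without the header row would fail here, because `csv.DictReader` treats the first line as the column names.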

coderabbitai · 2025-09-01T06:20:31Z

Walkthrough

Adds an “Input Data” section to README-generate-answers.md describing supported evaluation data formats (CSV with required id and question columns, Parquet, JSON) and expands the Running section with full CLI usage for generate_answers. No code or runtime behavior changed.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation update `README-generate-answers.md` | Added “Input Data” section detailing supported formats: CSV (required columns: `id`, `question`, with sample file path and example rows), Parquet, and JSON. Expanded “Running” with full CLI usage and options for `generate_answers`. No code or command behavior changes. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Poem

I twitch my nose at tidy prose,
New data paths the README shows;
CSV, Parquet, JSON gleam—
I hop through formats like a dream. 🥕

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

README-generate-answers.md (4)
40-42: Fix list style (markdownlint) and tighten wording.
Switch asterisks to dashes (MD004) and remove awkward spacing.
-* **CSV** – must contain two columns: `id` and `question`.
-  Example file: [`eval_data/questions.csv`](eval_data/questions.csv)
+- **CSV** — must contain two columns: `id` and `question`.
+  Example file: [`eval_data/questions.csv`](eval_data/questions.csv)
+  A header row with `id` and `question` is required (UTF-8).
43-49: Make the CSV sample copy-pasteable.
Add the missing question mark and remove the ellipsis line (invalid CSV).

```diff
   id,question
   1,How do I enable VM high availability in my cluster?
   2,How do I migrate a VM to a different project?
-  3,How do I manage RBAC in OpenShift Virtualization
-  ...
+  3,How do I manage RBAC in OpenShift Virtualization?
```

50-52: Align list style and capitalization; clarify phrasing.
Use dashes (MD004), capitalize Parquet, and add the article “the.”

```diff
-* **Parquet** – Lightspeed evaluation parquet format is supported.
-* **JSON** – Lightspeed evaluation JSON format is supported.
+- **Parquet** — The Lightspeed evaluation Parquet format is supported.
+- **JSON** — The Lightspeed evaluation JSON format is supported.
```

50-52: Document the schema or detection rules for Parquet/JSON.
Readers won’t know expected fields or how formats are detected. Please add a one-liner (fields: id, question) and whether detection is by file extension or auto-sniffing. Also standardize product casing across the doc (“LightSpeed” vs “lightspeed-core” vs “Lightspeed”).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ca9a863 and f2ba4ed.

📒 Files selected for processing (1)

README-generate-answers.md (1 hunks)

🧰 Additional context used

🪛 LanguageTool

README-generate-answers.md

[grammar] ~40-~40: There might be a mistake here.
Context: ...ontain two columns: id and question. Example file: [`eval_data/questions.cs...

(QB_NEW_EN)

[grammar] ~50-~50: There might be a mistake here.
Context: ... evaluation parquet format is supported. * JSON – Lightspeed evaluation JSON form...

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.17.2)

README-generate-answers.md

40-40: Unordered list style
Expected: dash; Actual: asterisk

(MD004, ul-style)

50-50: Unordered list style
Expected: dash; Actual: asterisk

(MD004, ul-style)

51-51: Unordered list style
Expected: dash; Actual: asterisk

(MD004, ul-style)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Pylinter
GitHub Check: mypy

🔇 Additional comments (1)

README-generate-answers.md (1)

36-39: Good addition: clear new “Input Data” section.
No issues with placement or intent.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

README-generate-answers.md (2)
36-38: Tighten wording; consider documenting failure behavior when a model is unavailable.

Minor phrasing fix and clearer imperative voice. Also, please document whether generate_answers fails fast or skips entries when a model in models_to_evaluate is missing/unavailable.
-You use the `models_to_evaluate` list to select which of the available models will be used for answer generation.
-All models included in this list must also be defined in the `models` section **and** properly configured and available in the running lightspeed-core service.
+Use the `models_to_evaluate` list to choose which of the available models are used for answer generation.
+All models in this list must also be defined in the `models` section and be available in the running lightspeed-core service.
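
The constraint described in this suggestion (every entry in `models_to_evaluate` must also be defined under `models`) could be checked up front. The sketch below assumes the configuration has already been loaded into a dictionary in which `models` is a mapping keyed by model name; that layout, the model names, and the helper `undefined_models` are illustrative assumptions, not taken from the repository. It does not check whether a model is available in the running lightspeed-core service.

```python
def undefined_models(config: dict) -> list[str]:
    """Return names listed in models_to_evaluate that have no entry under models."""
    # Assumption: `models` is a mapping keyed by model name.
    defined = set(config.get("models", {}))
    return [name for name in config.get("models_to_evaluate", []) if name not in defined]


# Hypothetical configuration used only to exercise the check.
config = {
    "models": {"model-a": {}, "model-b": {}},
    "models_to_evaluate": ["model-a", "model-c"],
}

problems = undefined_models(config)
if problems:
    print(f"not defined in the models section: {', '.join(problems)}")
```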
39-56: Unify list markers; tighten grammar; fix CSV example and punctuation; add minimal schema guidance.

- Use dashes to match earlier lists (MD004).
- Add “The” before “Lightspeed evaluation Parquet/JSON format”.
- Add “Header row is required” to CSV; add missing “?”; replace the “...” row with a valid example.
- If JSON/Parquet schemas exist, link them here.
-## Input Data
+## Input Data

-The tool supports multiple input formats for evaluation data:
+The tool supports multiple input formats for evaluation data:

-* **CSV** – must contain two columns: `id` and `question`.
-  Example file: [`eval_data/questions.csv`](eval_data/questions.csv)
+- **CSV** – must contain two columns: `id` and `question`. Header row is required.
+  Example file: [`eval_data/questions.csv`](eval_data/questions.csv)

-  ```csv
-  id,question
-  1,How do I enable VM high availability in my cluster?
-  2,How do I migrate a VM to a different project?
-  3,How do I manage RBAC in OpenShift Virtualization
-  ...
-  ```
-* **Parquet** – Lightspeed evaluation parquet format is supported.
-* **JSON** – Lightspeed evaluation JSON format is supported.
+  ```csv
+  id,question
+  1,How do I enable VM high availability in my cluster?
+  2,How do I migrate a VM to a different project?
+  3,How do I manage RBAC in OpenShift Virtualization?
+  4,How do I back up a VM?
+  ```
+- **Parquet** – The Lightspeed evaluation Parquet format is supported.
+- **JSON** – The Lightspeed evaluation JSON format is supported.
Follow-up:

Please add links or a brief field schema for the Parquet/JSON formats, or note where they’re defined (e.g., a pydantic/dataclass schema).
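
As an illustration of what such a brief field schema might look like, here is a dataclass sketch. It assumes a JSON input is a top-level array of objects carrying the same `id` and `question` fields as the CSV format; the actual Lightspeed evaluation JSON and Parquet schemas are not shown in this PR, so `QuestionRecord` and `parse_records` are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionRecord:
    """Hypothetical record shape mirroring the documented CSV columns."""

    id: str
    question: str


def parse_records(json_text: str) -> list[QuestionRecord]:
    """Parse a JSON array of objects into records, reporting missing fields."""
    # Assumption: the top-level JSON value is a list of objects.
    items = json.loads(json_text)
    records = []
    for index, item in enumerate(items):
        missing = [key for key in ("id", "question") if key not in item]
        if missing:
            raise ValueError(f"record {index} is missing: {', '.join(missing)}")
        records.append(QuestionRecord(id=str(item["id"]), question=str(item["question"])))
    return records


records = parse_records('[{"id": 1, "question": "How do I back up a VM?"}]')
```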

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between f2ba4ed and 12311b0.

📒 Files selected for processing (1)

README-generate-answers.md (1 hunks)

🧰 Additional context used

🪛 LanguageTool

README-generate-answers.md

[grammar] ~43-~43: There might be a mistake here.
Context: ...ontain two columns: id and question. Example file: [`eval_data/questions.cs...

(QB_NEW_EN)

[grammar] ~53-~53: There might be a mistake here.
Context: ... evaluation parquet format is supported. * JSON – Lightspeed evaluation JSON form...

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.17.2)

README-generate-answers.md

43-43: Unordered list style
Expected: dash; Actual: asterisk

(MD004, ul-style)

53-53: Unordered list style
Expected: dash; Actual: asterisk

(MD004, ul-style)

54-54: Unordered list style
Expected: dash; Actual: asterisk

(MD004, ul-style)

VladimirKadlec

LGTM

tisnik

LGTM

coderabbitai bot reviewed Sep 1, 2025

docs: Add input data to generate answers documentation (12311b0)

are-ces force-pushed the main branch from 144f7ba to 12311b0 on September 1, 2025 06:40

coderabbitai bot reviewed Sep 1, 2025

VladimirKadlec approved these changes Sep 1, 2025

tisnik approved these changes Sep 2, 2025

tisnik merged commit a10547d into lightspeed-core:main Sep 2, 2025
13 of 15 checks passed

tisnik merged 1 commit into lightspeed-core:main from are-ces:main
