docs: Add input data to generate answers documentation #36

Merged
tisnik merged 1 commit into lightspeed-core:main from are-ces:main on Sep 2, 2025

@are-ces are-ces commented Sep 1, 2025

Summary by CodeRabbit

  • Documentation
    • Added an "Input Data" section describing supported evaluation formats: CSV (requires two columns: id and question), Parquet, and JSON.
    • Included a sample CSV path and example rows to illustrate the expected schema.
    • Added a usage note explaining how the models_to_evaluate list selects which models will be used.
    • Expanded the "Running" section to include full CLI usage and options for generate_answers.
    • No changes to command logic or runtime behavior.

coderabbitai bot commented Sep 1, 2025

Walkthrough

Adds an “Input Data” section to README-generate-answers.md describing supported evaluation data formats (CSV with required id and question columns, Parquet, JSON) and expands the Running section with full CLI usage for generate_answers. No code or runtime behavior changed.
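
For illustration, the CSV requirement described above (a header row with `id` and `question`) could be checked with a minimal sketch like the following. The function name `parse_questions` and the error messages are hypothetical; nothing here is taken from the `generate_answers` implementation.

```python
import csv
import io

# Column names taken from the README text; everything else is an assumption.
REQUIRED_COLUMNS = ("id", "question")


def parse_questions(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text and return rows holding only the required columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")

    rows = []
    for row_no, row in enumerate(reader, start=1):
        # A short row (for example a bare "...") leaves `question` as None.
        if any(not row[name] for name in REQUIRED_COLUMNS):
            raise ValueError(f"data row {row_no}: empty or missing value")
        rows.append({name: row[name] for name in REQUIRED_COLUMNS})
    return rows


sample = (
    "id,question\n"
    "1,How do I enable VM high availability in my cluster?\n"
    "2,How do I migrate a VM to a different project?\n"
)
rows = parse_questions(sample)
```

A check of this kind rejects a placeholder row such as `...`, because that row has no value for the `question` column.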

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation update: README-generate-answers.md | Added “Input Data” section detailing supported formats: CSV (required columns: id, question, with sample file path and example rows), Parquet, and JSON. Expanded “Running” with full CLI usage and options for generate_answers. No code or command behavior changes. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Poem

I twitch my nose at tidy prose,
New data paths the README shows;
CSV, Parquet, JSON gleam—
I hop through formats like a dream. 🥕

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
README-generate-answers.md (4)

40-42: Fix list style (markdownlint) and tighten wording.
Switch asterisks to dashes (MD004) and remove awkward spacing.

-* **CSV** – must contain two columns: `id` and `question`.
-  Example file: [`eval_data/questions.csv`](eval_data/questions.csv)
+- **CSV** — must contain two columns: `id` and `question`.
+  Example file: [`eval_data/questions.csv`](eval_data/questions.csv)
+  A header row with `id` and `question` is required (UTF-8).

43-49: Make the CSV sample copy-pasteable.
Add the missing question mark and remove the ellipsis line (invalid CSV).

   ```csv
   id,question
   1,How do I enable VM high availability in my cluster?
   2,How do I migrate a VM to a different project?
-  3,How do I manage RBAC in OpenShift Virtualization
-  ...
+  3,How do I manage RBAC in OpenShift Virtualization?
   ```

50-52: Align list style and capitalization; clarify phrasing.
Use dashes (MD004), capitalize Parquet, and add the article "the."

-* **Parquet** – Lightspeed evaluation parquet format is supported.
-* **JSON** – Lightspeed evaluation JSON format is supported.
+- **Parquet** — The Lightspeed evaluation Parquet format is supported.
+- **JSON** — The Lightspeed evaluation JSON format is supported.

50-52: Document the schema or detection rules for Parquet/JSON.
Readers won’t know expected fields or how formats are detected. Please add a one-liner (fields: id, question) and whether detection is by file extension or auto-sniffing. Also standardize product casing across the doc (“LightSpeed” vs “lightspeed-core” vs “Lightspeed”).
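
To make the detection question concrete, here is a minimal sketch of detection by file extension. This is only one possible rule: the README under review does not say how the tool picks a format, and the name `detect_format` and the mapping below are hypothetical.

```python
from pathlib import Path

# Hypothetical mapping; the README does not state how formats are detected.
FORMAT_BY_SUFFIX = {".csv": "csv", ".parquet": "parquet", ".json": "json"}


def detect_format(path: str) -> str:
    """Return the input format implied by the file extension."""
    suffix = Path(path).suffix.lower()
    try:
        return FORMAT_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unsupported input file extension: {suffix!r}") from None


detected = detect_format("eval_data/questions.csv")
```

If the tool sniffs file content instead, the README would need to say so, since a rule like the one above fails for files without a recognized extension.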

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ca9a863 and f2ba4ed.

📒 Files selected for processing (1)
  • README-generate-answers.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
README-generate-answers.md

[grammar] ~40-~40: There might be a mistake here.
Context: ...ontain two columns: id and question. Example file: [`eval_data/questions.cs...

(QB_NEW_EN)


[grammar] ~50-~50: There might be a mistake here.
Context: ... evaluation parquet format is supported. * JSON – Lightspeed evaluation JSON form...

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.17.2)
README-generate-answers.md

40-40: Unordered list style
Expected: dash; Actual: asterisk

(MD004, ul-style)


50-50: Unordered list style
Expected: dash; Actual: asterisk

(MD004, ul-style)


51-51: Unordered list style
Expected: dash; Actual: asterisk

(MD004, ul-style)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Pylinter
  • GitHub Check: mypy
🔇 Additional comments (1)
README-generate-answers.md (1)

36-39: Good addition: clear new “Input Data” section.
No issues with placement or intent.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
README-generate-answers.md (2)

36-38: Tighten wording; consider documenting failure behavior when a model is unavailable.

Minor phrasing fix and clearer imperative voice. Also, please document whether generate_answers fails fast or skips entries when a model in models_to_evaluate is missing/unavailable.

-You use the `models_to_evaluate` list to select which of the available models will be used for answer generation.
-All models included in this list must also be defined in the `models` section **and** properly configured and available in the running lightspeed-core service.
+Use the `models_to_evaluate` list to choose which of the available models are used for answer generation.
+All models in this list must also be defined in the `models` section and be available in the running lightspeed-core service.
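
As an illustration of the fail-fast option raised above, the constraint that every entry in `models_to_evaluate` is also defined in `models` could be checked up front with a sketch like this. The helper names and the example model names are hypothetical, and whether `generate_answers` actually fails fast or skips entries is not stated in the README.

```python
def missing_models(defined: list[str], models_to_evaluate: list[str]) -> list[str]:
    """Return selected model names that are not among the defined ones."""
    defined_set = set(defined)
    return [name for name in models_to_evaluate if name not in defined_set]


def require_defined(defined: list[str], models_to_evaluate: list[str]) -> None:
    """Fail-fast variant: raise before any answers are generated."""
    missing = missing_models(defined, models_to_evaluate)
    if missing:
        raise ValueError(f"models_to_evaluate not defined in models: {', '.join(missing)}")


# Hypothetical model names, for illustration only.
defined_models = ["model-a", "model-b"]
unknown = missing_models(defined_models, ["model-a", "model-c"])
```

This checks only the configuration. Availability in the running lightspeed-core service would need a separate check against the service itself.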

39-56: Unify list markers; tighten grammar; fix CSV example and punctuation; add minimal schema guidance.

  • Use dashes to match earlier lists (MD004).
  • Add “The” before “Lightspeed evaluation Parquet/JSON format”.
  • Add “Header row is required” to CSV; add missing “?”; replace the “...” row with a valid example.
  • If JSON/Parquet schemas exist, link them here.

-## Input Data
+## Input Data

-The tool supports multiple input formats for evaluation data:
+The tool supports multiple input formats for evaluation data:

-* **CSV** – must contain two columns: `id` and `question`.
-  Example file: [`eval_data/questions.csv`](eval_data/questions.csv)
+- **CSV** – must contain two columns: `id` and `question`. Header row is required.
+  Example file: [`eval_data/questions.csv`](eval_data/questions.csv)

-  ```csv
-  id,question
-  1,How do I enable VM high availability in my cluster?
-  2,How do I migrate a VM to a different project?
-  3,How do I manage RBAC in OpenShift Virtualization
-  ...
-  ```
-* **Parquet** – Lightspeed evaluation parquet format is supported.
-* **JSON** – Lightspeed evaluation JSON format is supported.
+  ```csv
+  id,question
+  1,How do I enable VM high availability in my cluster?
+  2,How do I migrate a VM to a different project?
+  3,How do I manage RBAC in OpenShift Virtualization?
+  4,How do I back up a VM?
+  ```
+- **Parquet** – The Lightspeed evaluation Parquet format is supported.
+- **JSON** – The Lightspeed evaluation JSON format is supported.

Follow-up:

  • Please add links or a brief field schema for the Parquet/JSON formats, or note where they’re defined (e.g., a pydantic/dataclass schema).
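
As a rough idea of what a brief field schema might look like, here is a dataclass sketch. It assumes the Parquet/JSON records carry the same `id` and `question` fields as the CSV, which the README does not confirm; the class name `QuestionRecord` is hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class QuestionRecord:
    """One evaluation question; the field set is an assumption based on the CSV columns."""

    id: str
    question: str

    @classmethod
    def from_mapping(cls, data: Mapping[str, Any]) -> "QuestionRecord":
        """Build a record from a decoded JSON object or a table row."""
        for name in ("id", "question"):
            if name not in data:
                raise KeyError(f"missing field: {name}")
        return cls(id=str(data["id"]), question=str(data["question"]))


record = QuestionRecord.from_mapping(
    {"id": 2, "question": "How do I migrate a VM to a different project?"}
)
```

Linking the README to a definition of this kind, wherever the real one lives, would tell readers which fields each format must provide.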
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between f2ba4ed and 12311b0.

📒 Files selected for processing (1)
  • README-generate-answers.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
README-generate-answers.md

[grammar] ~43-~43: There might be a mistake here.
Context: ...ontain two columns: id and question. Example file: [`eval_data/questions.cs...

(QB_NEW_EN)


[grammar] ~53-~53: There might be a mistake here.
Context: ... evaluation parquet format is supported. * JSON – Lightspeed evaluation JSON form...

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.17.2)
README-generate-answers.md

43-43: Unordered list style
Expected: dash; Actual: asterisk

(MD004, ul-style)


53-53: Unordered list style
Expected: dash; Actual: asterisk

(MD004, ul-style)


54-54: Unordered list style
Expected: dash; Actual: asterisk

(MD004, ul-style)

@VladimirKadlec VladimirKadlec left a comment

LGTM

@tisnik tisnik left a comment

LGTM

@tisnik tisnik merged commit a10547d into lightspeed-core:main Sep 2, 2025
13 of 15 checks passed

3 participants