docs: Add input data to generate answers documentation #36
tisnik merged 1 commit into lightspeed-core:main from
Walkthrough
Adds an “Input Data” section to README-generate-answers.md describing supported evaluation data formats (CSV with required …
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~2 minutes
Actionable comments posted: 0
🧹 Nitpick comments (4)
README-generate-answers.md (4)
40-42: Fix list style (markdownlint) and tighten wording.
Switch asterisks to dashes (MD004) and remove awkward spacing.

```diff
-* **CSV** – must contain two columns: `id` and `question`.
-  Example file: [`eval_data/questions.csv`](eval_data/questions.csv)
+- **CSV** — must contain two columns: `id` and `question`.
+  Example file: [`eval_data/questions.csv`](eval_data/questions.csv)
+  A header row with `id` and `question` is required (UTF-8).
```
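The header-row requirement suggested above could be checked roughly as follows. This is a minimal sketch, not the tool's actual loader: the function name `read_questions` and the sample text are hypothetical, and only the column names `id` and `question` come from the README.

```python
import csv
import io

# Column names taken from the README text; everything else here is illustrative.
REQUIRED_COLUMNS = ("id", "question")


def read_questions(text: str) -> list[dict[str, str]]:
    """Parse CSV text and return its rows, raising if the header lacks a required column."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"CSV header is missing required columns: {missing}")
    # Keep only the required columns so extra columns do not leak through.
    return [{col: row[col] for col in REQUIRED_COLUMNS} for row in reader]


SAMPLE = """id,question
1,How do I enable VM high availability in my cluster?
2,How do I migrate a VM to a different project?
"""

rows = read_questions(SAMPLE)
```

A file without the header row fails the check immediately, because its first data row would be read as the header.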
43-49: Make the CSV sample copy-pasteable.
Add the missing question mark and remove the ellipsis line (invalid CSV).

```diff
 ```csv
 id,question
 1,How do I enable VM high availability in my cluster?
 2,How do I migrate a VM to a different project?
-3,How do I manage RBAC in OpenShift Virtualization
-...
+3,How do I manage RBAC in OpenShift Virtualization?
```

50-52: Align list style and capitalization; clarify phrasing.
Use dashes (MD004), capitalize Parquet, and add the article “the.”

```diff
-* **Parquet** – Lightspeed evaluation parquet format is supported.
-* **JSON** – Lightspeed evaluation JSON format is supported.
+- **Parquet** — The Lightspeed evaluation Parquet format is supported.
+- **JSON** — The Lightspeed evaluation JSON format is supported.
```
50-52: Document the schema or detection rules for Parquet/JSON.
Readers won’t know the expected fields or how formats are detected. Please add a one-liner listing the fields (`id`, `question`) and stating whether detection is by file extension or auto-sniffing. Also standardize product casing across the doc (“LightSpeed” vs “lightspeed-core” vs “Lightspeed”).
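For illustration, extension-based detection (one of the two options named above) might look like the sketch below. The README does not say how the tool actually selects a loader, so `FORMAT_BY_SUFFIX` and `detect_format` are hypothetical names and the mapping itself is an assumption.

```python
from pathlib import Path

# Hypothetical mapping; the three formats are the ones the README lists.
FORMAT_BY_SUFFIX = {".csv": "csv", ".parquet": "parquet", ".json": "json"}


def detect_format(path: str) -> str:
    """Return the input format implied by the file extension, case-insensitively."""
    suffix = Path(path).suffix.lower()
    try:
        return FORMAT_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"Unsupported input format: {suffix!r}") from None


detected = detect_format("eval_data/questions.csv")
```

Whichever rule the tool uses, stating it in the README in one line would answer the question raised in this comment.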
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
README-generate-answers.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
README-generate-answers.md
[grammar] ~40-~40: There might be a mistake here.
Context: ...ontain two columns: id and question. Example file: [`eval_data/questions.cs...
(QB_NEW_EN)
[grammar] ~50-~50: There might be a mistake here.
Context: ... evaluation parquet format is supported. * JSON – Lightspeed evaluation JSON form...
(QB_NEW_EN)
🪛 markdownlint-cli2 (0.17.2)
README-generate-answers.md
40-40: Unordered list style
Expected: dash; Actual: asterisk
(MD004, ul-style)
50-50: Unordered list style
Expected: dash; Actual: asterisk
(MD004, ul-style)
51-51: Unordered list style
Expected: dash; Actual: asterisk
(MD004, ul-style)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Pylinter
- GitHub Check: mypy
🔇 Additional comments (1)
README-generate-answers.md (1)
36-39: Good addition: clear new “Input Data” section.
No issues with placement or intent.
Actionable comments posted: 0
🧹 Nitpick comments (2)
README-generate-answers.md (2)
36-38: Tighten wording; consider documenting failure behavior when a model is unavailable.
Minor phrasing fix and clearer imperative voice. Also, please document whether `generate_answers` fails fast or skips entries when a model in `models_to_evaluate` is missing/unavailable.

```diff
-You use the `models_to_evaluate` list to select which of the available models will be used for answer generation.
-All models included in this list must also be defined in the `models` section **and** properly configured and available in the running lightspeed-core service.
+Use the `models_to_evaluate` list to choose which of the available models are used for answer generation.
+All models in this list must also be defined in the `models` section and be available in the running lightspeed-core service.
```
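A fail-fast variant of the behavior asked about above could be sketched as follows. This is an assumption for illustration only: the real configuration schema is not shown in this PR, so the plain lists of model names, the placeholder names `model-a`, `model-b`, `model-c`, and the helper `check_models` are all hypothetical.

```python
def check_models(models_to_evaluate: list[str], defined_models: list[str]) -> None:
    """Raise before any answers are generated if a selected model is not defined."""
    defined = set(defined_models)
    missing = [name for name in models_to_evaluate if name not in defined]
    if missing:
        raise ValueError(f"models_to_evaluate references undefined models: {missing}")


# Hypothetical configuration values, not taken from the repository.
defined_models = ["model-a", "model-b"]
check_models(["model-a"], defined_models)  # passes silently

try:
    check_models(["model-a", "model-c"], defined_models)
    failed = False
except ValueError:
    failed = True
```

The alternative, skipping unavailable models and continuing, trades early feedback for partial results; either choice is worth stating explicitly in the README.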
39-56: Unify list markers; tighten grammar; fix CSV example and punctuation; add minimal schema guidance.
- Use dashes to match earlier lists (MD004).
- Add “The” before “Lightspeed evaluation Parquet/JSON format”.
- Add “Header row is required” to CSV; add missing “?”; replace the “...” row with a valid example.
- If JSON/Parquet schemas exist, link them here.
```diff
-## Input Data
+## Input Data

-The tool supports multiple input formats for evaluation data:
+The tool supports multiple input formats for evaluation data:

-* **CSV** – must contain two columns: `id` and `question`.
-  Example file: [`eval_data/questions.csv`](eval_data/questions.csv)
+- **CSV** – must contain two columns: `id` and `question`. Header row is required.
+  Example file: [`eval_data/questions.csv`](eval_data/questions.csv)

-  ```csv
-  id,question
-  1,How do I enable VM high availability in my cluster?
-  2,How do I migrate a VM to a different project?
-  3,How do I manage RBAC in OpenShift Virtualization
-  ...
-  ```
-* **Parquet** – Lightspeed evaluation parquet format is supported.
-* **JSON** – Lightspeed evaluation JSON format is supported.
+  ```csv
+  id,question
+  1,How do I enable VM high availability in my cluster?
+  2,How do I migrate a VM to a different project?
+  3,How do I manage RBAC in OpenShift Virtualization?
+  4,How do I back up a VM?
+  ```
+- **Parquet** – The Lightspeed evaluation Parquet format is supported.
+- **JSON** – The Lightspeed evaluation JSON format is supported.
```

Follow-up:
- Please add links or a brief field schema for the Parquet/JSON formats, or note where they’re defined (e.g., a pydantic/dataclass schema).
🧰 Additional context used
🪛 LanguageTool
README-generate-answers.md
[grammar] ~43-~43: There might be a mistake here.
Context: ...ontain two columns: id and question. Example file: [`eval_data/questions.cs...
(QB_NEW_EN)
[grammar] ~53-~53: There might be a mistake here.
Context: ... evaluation parquet format is supported. * JSON – Lightspeed evaluation JSON form...
(QB_NEW_EN)
🪛 markdownlint-cli2 (0.17.2)
README-generate-answers.md
43-43: Unordered list style
Expected: dash; Actual: asterisk
(MD004, ul-style)
53-53: Unordered list style
Expected: dash; Actual: asterisk
(MD004, ul-style)
54-54: Unordered list style
Expected: dash; Actual: asterisk
(MD004, ul-style)
Summary by CodeRabbit