Add RAG FiQA MRR optimization community notebook #176
base: main
Conversation
Adds rag_fiqa_mrr_optimization.ipynb from the AI Winter 2025 competition notebooks repo (RapidFireAI/ai-winter-2025-competition-notebooks).

Co-authored-by: Cursor <cursoragent@cursor.com>
Cursor Bugbot has reviewed your changes and found 2 potential issues.
| " recalls.append(recall)\n", | ||
| " f1_scores.append(f1)\n", | ||
| " ndcgs.append(compute_ndcg_at_k(retrieved_set, expected_set, k=5))\n", | ||
| " rrs.append(compute_rr(retrieved_set, expected_set))\n", |
Set conversion destroys ordering for rank-sensitive metrics
High Severity
The ordered list of retrieved documents (pred) is converted to a Python set via set(pred), which destroys the ranking order. This retrieved_set is then passed to compute_ndcg_at_k and compute_rr, both of which are rank-sensitive metrics that depend on document position. Iterating a set yields arbitrary order, so NDCG and MRR — the notebook's primary optimization target — produce meaningless, non-deterministic values.
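A minimal sketch of one possible fix (the helper names mirror the notebook's, but the exact signatures are assumed): keep `pred` as an ordered list for the rank-sensitive metrics, and use the set only for membership tests.

```python
def compute_rr(retrieved, expected_set):
    """Reciprocal rank over an ORDERED list of retrieved doc IDs."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in expected_set:
            return 1.0 / rank  # rank of the first relevant document
    return 0.0  # no relevant document retrieved

# At the call sites, pass the ordered list instead of set(pred);
# set(pred) remains fine for order-insensitive metrics like recall/F1.
# ndcgs.append(compute_ndcg_at_k(pred, expected_set, k=5))
# rrs.append(compute_rr(pred, expected_set))
```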
| " ideal_relevance = [3] * ideal_length + [0] * (k - ideal_length)\n", | ||
| " idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal_relevance))\n", | ||
| "\n", | ||
| " return dcg / idcg if idcg > 0 else 0.0\n", |
NDCG uses mismatched relevance scales in DCG vs IDCG
Medium Severity
In compute_ndcg_at_k, the actual DCG is computed with binary relevance values (0 or 1), but the ideal DCG (idcg) uses a relevance value of 3 for each relevant document. This mismatch means the NDCG score is systematically scaled down by a factor of ~3, making the metric incorrect. Both DCG and IDCG need to use the same relevance scale.
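One way to fix this, assuming binary relevance throughout (the function's exact signature in the notebook is assumed): build the ideal relevance vector from 1s rather than 3s, so DCG and IDCG share a scale. This sketch also takes the ordered retrieval list, addressing the ordering issue above.

```python
import math

def compute_ndcg_at_k(retrieved, expected_set, k=5):
    """NDCG@k with the same binary (0/1) relevance scale in DCG and IDCG."""
    # Actual DCG: binary relevance over the ordered top-k retrieved docs.
    relevance = [1 if doc_id in expected_set else 0 for doc_id in retrieved[:k]]
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevance))
    # Ideal DCG: the same 0/1 scale, with all relevant docs ranked first.
    ideal_length = min(len(expected_set), k)
    ideal_relevance = [1] * ideal_length + [0] * (k - ideal_length)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal_relevance))
    return dcg / idcg if idcg > 0 else 0.0
```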


Summary

Adds rag_fiqa_mrr_optimization.ipynb to the community_notebooks/ folder.

Test plan
Made with Cursor
Note
Low Risk
Adds a standalone notebook only; no library/runtime code paths are modified, with risk limited to notebook execution/dependency assumptions.
Overview
Adds a new community Colab notebook, community_notebooks/rag_fiqa_mrr_optimization.ipynb, that runs a RapidFire AI multi-config RAG evaluation on the FiQA dataset. The notebook installs and initializes RapidFire AI, downsamples and filters the FiQA queries/corpus, grid-searches over RAG chunking and reranker top_n settings with a vLLM Qwen generator, computes retrieval metrics (including MRR), and outputs a results DataFrame plus simple metric plots and log-viewing helpers.
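For orientation, a minimal sketch of the shape of such a sweep in plain Python. Here run_rag_pipeline is a hypothetical stand-in for one notebook config run (not the RapidFire AI API), the grid values are illustrative, and compute_rr is the helper sketched above.

```python
from itertools import product
from statistics import mean

def run_rag_pipeline(query, chunk_size, top_n):
    """Hypothetical stand-in: returns an ordered list of retrieved doc IDs
    for one query under one (chunk_size, top_n) config."""
    raise NotImplementedError

def sweep_configs(queries, qrels, chunk_sizes=(256, 512), top_ns=(3, 5)):
    """Grid-search chunk_size x top_n and rank configs by mean MRR."""
    results = []
    for chunk_size, top_n in product(chunk_sizes, top_ns):
        rrs = []
        for query_id, query in queries.items():
            retrieved = run_rag_pipeline(query, chunk_size, top_n)
            rrs.append(compute_rr(retrieved, set(qrels[query_id])))
        results.append({"chunk_size": chunk_size, "top_n": top_n,
                        "mrr": mean(rrs)})
    # Best-performing config first.
    return sorted(results, key=lambda r: r["mrr"], reverse=True)
```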
Written by Cursor Bugbot for commit 4099c2b. This comment will update automatically on new commits.