
Conversation

kamran-rapidfireAI (Collaborator) commented Feb 10, 2026

Summary

Test plan

  • Verify notebook renders correctly on GitHub
  • Confirm notebook cells execute without errors

Made with Cursor


Note

Low Risk
Adds a standalone notebook only; no library or runtime code paths are modified. Risk is limited to the notebook's execution and dependency assumptions.

Overview
Adds a new community Colab notebook, community_notebooks/rag_fiqa_mrr_optimization.ipynb, that runs a RapidFire AI multi-config RAG evaluation on the FiQA dataset.

The notebook installs and initializes RapidFire AI, downsamples and filters the FiQA queries/corpus, grid-searches over RAG chunking and reranker top_n settings with a vLLM Qwen generator, computes retrieval metrics (including MRR), and outputs a results DataFrame plus simple metric plots and log-viewing helpers.
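For reference, MRR is the mean over queries of the reciprocal rank of the first relevant document. A minimal sketch, assuming binary relevance over a ranked list of document IDs (the helper name compute_rr matches the review snippets below; everything else is illustrative):

    def compute_rr(retrieved, expected_set):
        # Reciprocal rank: 1 / position of the first relevant document, else 0.
        for i, doc in enumerate(retrieved):  # retrieved must stay in ranked order
            if doc in expected_set:
                return 1.0 / (i + 1)
        return 0.0

    # MRR for one config is then the mean of per-query reciprocal ranks:
    # mrr = sum(rrs) / len(rrs)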

Written by Cursor Bugbot for commit 4099c2b. This will update automatically on new commits.

Adds rag_fiqa_mrr_optimization.ipynb from the AI Winter 2025 competition notebooks repo (RapidFireAI/ai-winter-2025-competition-notebooks).

Co-authored-by: Cursor <cursoragent@cursor.com>

cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

" recalls.append(recall)\n",
" f1_scores.append(f1)\n",
" ndcgs.append(compute_ndcg_at_k(retrieved_set, expected_set, k=5))\n",
" rrs.append(compute_rr(retrieved_set, expected_set))\n",

Set conversion destroys ordering for rank-sensitive metrics

High Severity

The ordered list of retrieved documents (pred) is converted to a Python set via set(pred), which destroys the ranking order. This retrieved_set is then passed to compute_ndcg_at_k and compute_rr, both of which are rank-sensitive metrics that depend on document position. Iterating a set yields arbitrary order, so NDCG and MRR — the notebook's primary optimization target — produce meaningless, non-deterministic values.
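One way to address this, sketched under the assumption that pred is the ranked list and the helper signatures match the snippet above (keep the set only for the order-insensitive overlap metrics):

    retrieved_list = list(pred)          # preserves ranking order
    retrieved_set = set(retrieved_list)  # fine for precision/recall/F1 only

    recalls.append(recall)
    f1_scores.append(f1)
    ndcgs.append(compute_ndcg_at_k(retrieved_list, expected_set, k=5))  # rank-sensitive
    rrs.append(compute_rr(retrieved_list, expected_set))                # rank-sensitive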

Additional Locations (1)


" ideal_relevance = [3] * ideal_length + [0] * (k - ideal_length)\n",
" idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal_relevance))\n",
"\n",
" return dcg / idcg if idcg > 0 else 0.0\n",

NDCG uses mismatched relevance scales in DCG vs IDCG

Medium Severity

In compute_ndcg_at_k, the actual DCG is computed with binary relevance values (0 or 1), but the ideal DCG (idcg) uses a relevance value of 3 for each relevant document. This mismatch means the NDCG score is systematically scaled down by a factor of ~3, making the metric incorrect. Both DCG and IDCG need to use the same relevance scale.
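A sketch of a consistent binary-relevance version; the function name and the idcg/return lines mirror the snippet above, while the rest of the body is an assumption about the surrounding code:

    import math

    def compute_ndcg_at_k(retrieved, expected_set, k=5):
        # Binary gains for the actual ranking (1 if relevant, else 0).
        rels = [1 if doc in expected_set else 0 for doc in retrieved[:k]]
        dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))

        # Ideal ranking uses the same binary scale, so NDCG stays in [0, 1].
        ideal_length = min(len(expected_set), k)
        ideal_relevance = [1] * ideal_length + [0] * (k - ideal_length)
        idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal_relevance))

        return dcg / idcg if idcg > 0 else 0.0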


…lation code and cleaning up plotting logic. This simplifies the notebook and enhances readability.
kamran-rapidfireAI marked this pull request as draft February 10, 2026 06:49