Added Batch Integration Benchmarking and Auto Benchmarking Logs #8
Pull Request Overview
This PR enhances the benchmarking framework with integration-quality metrics, automatic result persistence, and refines the interactive agent testing loop. It also introduces a structured multi-agent configuration and updates dependencies and VCS ignores.
- Add SCIB-based integration metrics and persist benchmark outputs with code snippets.
- Refactor `input_loop` to support recursive continuation and graceful exit.
- Define agent roles and delegation in a JSON system file; update `.gitignore` and replace `scib` with `scib-metrics`.
Reviewed Changes
Copilot reviewed 8 out of 13 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| benchmarking/sandbox/requirements.txt | Swapped scib for scib-metrics in dependencies. |
| benchmarking/prompt_testing/MultiAgentTester.py | Refactored user input loop; added recursive benchmarking. |
| benchmarking/prompt_testing/MultiAgentAutoTester.py | Implemented JSONL persistence, code‐snippet dumping. |
| benchmarking/auto_metrics/IntegrationMetrics.py | New IntegrationMetric class using scib-metrics. |
| benchmarking/agents/integration_system.json | Configured three specialized agents with delegation rules. |
| benchmarking/agents/AgentSystem.py | Appended strict delegation formatting to prompts. |
| benchmarking/.gitignore | Added *.pyc to ignore Python bytecode files. |
Comments suppressed due to low confidence (1)
benchmarking/auto_metrics/IntegrationMetrics.py:11
`AutoMetric` is not imported, causing a `NameError`. Add the appropriate import (e.g., `from benchmarking.auto_metrics.BaseMetric import AutoMetric`).
class IntegrationMetric(AutoMetric):
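For illustration only, here is a minimal sketch of the failure mode described above. It does not import the real `benchmarking` package: the `AutoMetric` class in `FIXED` is a hypothetical stand-in for whatever base class the project actually defines, and `try_define` is a helper invented for this example. The point is that the base class name is evaluated when the `class` statement runs, so the error surfaces at import time rather than on first use.

```python
# Source text for a module that subclasses a name it never imported.
BROKEN = "class IntegrationMetric(AutoMetric):\n    pass\n"

# Same module with the base class available. A stand-in class is defined
# here instead of the suggested import, because the real module path is
# only an example in the review comment.
FIXED = "class AutoMetric:\n    pass\n\n" + BROKEN


def try_define(source: str) -> str:
    """Execute module source in a fresh namespace and report the outcome."""
    namespace: dict = {}
    try:
        exec(source, namespace)
    except NameError as exc:
        return f"NameError: {exc}"
    return "ok"


print(try_define(BROKEN))  # NameError: name 'AutoMetric' is not defined
print(try_define(FIXED))   # ok
```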
| return "break" | ||
| if user_in.lower() == "benchmark" and benchmark_module: | ||
| run_benchmark(mgr, benchmark_module) | ||
| input_loop() # Recurse to continue the loop after benchmarks |
Copilot
AI
Jul 8, 2025
The recursive call to `input_loop()` is missing a `return`, so the result isn't propagated back and may lead to incorrect control flow. Change to `return input_loop()`.
| input_loop() # Recurse to continue the loop after benchmarks | |
| return input_loop() # Recurse to continue the loop after benchmarks |
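To make the control-flow issue concrete, here is a minimal sketch under stated assumptions. It is not the code in `MultiAgentTester.py`: the three functions below are hypothetical, they read from a scripted iterator instead of `input()`, and the benchmark run itself is elided. The first two variants differ only in the `return` on the recursive call; the third shows an iterative alternative that avoids growing the call stack after each benchmark.

```python
from typing import Iterator, Optional


def input_loop_no_return(inputs: Iterator[str]) -> Optional[str]:
    """Recursive variant WITHOUT `return` on the recursive call."""
    user_in = next(inputs)
    if user_in.lower() in ("exit", "quit"):
        return "break"
    if user_in.lower() == "benchmark":
        # (benchmark would run here)
        input_loop_no_return(inputs)  # result is discarded
    return None


def input_loop_with_return(inputs: Iterator[str]) -> Optional[str]:
    """Recursive variant that propagates the inner result to the caller."""
    user_in = next(inputs)
    if user_in.lower() in ("exit", "quit"):
        return "break"
    if user_in.lower() == "benchmark":
        # (benchmark would run here)
        return input_loop_with_return(inputs)
    return None


def input_loop_iterative(inputs: Iterator[str]) -> Optional[str]:
    """Iterative variant: same results as the fixed version, no recursion."""
    for user_in in inputs:
        if user_in.lower() in ("exit", "quit"):
            return "break"
        if user_in.lower() == "benchmark":
            continue  # (benchmark would run here), then prompt again
        return None
    return None


script = ["benchmark", "exit"]
print(input_loop_no_return(iter(script)))    # None: the "break" is lost
print(input_loop_with_return(iter(script)))  # break
print(input_loop_iterative(iter(script)))    # break
```

Whether the lost return value matters in the PR depends on how the caller uses the result of `input_loop()`, which is not visible in the diff excerpt above.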
| import anndata | ||
| import numpy as np | ||
|
|
Copilot
AI
Jul 8, 2025
[nitpick] Neither `anndata` nor `numpy` is used in this module; consider removing these imports to reduce unused dependencies.
| import anndata | |
| import numpy as np |
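As a rough way to check a claim like this one locally, the sketch below uses the standard-library `ast` module to list imported names that are never referenced. `unused_imports` is a hypothetical helper written for this example, not part of the PR, and it is deliberately simple: it ignores `__all__`, star imports, and names used only inside string annotations, so a dedicated linter is the better choice for real use.

```python
import ast
from typing import Dict, List


def unused_imports(source: str) -> List[str]:
    """Return names bound by import statements that are never referenced."""
    tree = ast.parse(source)
    imported: Dict[str, int] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds `a`; `import a as x` binds `x`.
                bound = alias.asname or alias.name.split(".")[0]
                imported[bound] = node.lineno
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported[alias.asname or alias.name] = node.lineno
    # Attribute access such as `np.array` still contains a Name node for `np`.
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    return sorted(name for name in imported if name not in used)


example = (
    "import anndata\n"
    "import numpy as np\n"
    "import json\n"
    "print(json.dumps({}))\n"
)
print(unused_imports(example))  # ['anndata', 'np']
```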
| } | ||
| }, | ||
| "integration_expert": { | ||
| "prompt": "You are the *integration expert*. You specialize in combining multiple single-cell datasets and correcting for batch effects using scvi-tools.\n\nExample of a task you would perform:\n```python\nimport scvi\nimport scanpy as sc\n\n# Assume 'adata' is loaded and preprocessed with a 'batch' column\n# Find highly variable genes across batches for integration\nsc.pp.highly_variable_genes(\n adata,\n n_top_genes=2000,\n subset=True,\n layer='counts',\n flavor='seurat_v3',\n batch_key='batch'\n)\n\n# Set up the AnnData object for the scVI model\nscvi.model.SCVI.setup_anndata(adata, layer='counts', batch_key='batch')\n\n# Create and train the scVI model\nmodel = scvi.model.SCVI(adata, n_layers=2, n_latent=30)\nmodel.train()\n\n# Store the integrated latent representation in the AnnData object\nadata.obsm['X_scVI'] = model.get_latent_representation()\n\nprint('Integration complete. Integrated embedding is in adata.obsm[\"X_scVI\"].')\n``` you remeber to wrap your code in triple backticks and python", |
Copilot
AI
Jul 8, 2025
Typo: `remeber` should be `remember`.
| "prompt": "You are the *integration expert*. You specialize in combining multiple single-cell datasets and correcting for batch effects using scvi-tools.\n\nExample of a task you would perform:\n```python\nimport scvi\nimport scanpy as sc\n\n# Assume 'adata' is loaded and preprocessed with a 'batch' column\n# Find highly variable genes across batches for integration\nsc.pp.highly_variable_genes(\n adata,\n n_top_genes=2000,\n subset=True,\n layer='counts',\n flavor='seurat_v3',\n batch_key='batch'\n)\n\n# Set up the AnnData object for the scVI model\nscvi.model.SCVI.setup_anndata(adata, layer='counts', batch_key='batch')\n\n# Create and train the scVI model\nmodel = scvi.model.SCVI(adata, n_layers=2, n_latent=30)\nmodel.train()\n\n# Store the integrated latent representation in the AnnData object\nadata.obsm['X_scVI'] = model.get_latent_representation()\n\nprint('Integration complete. Integrated embedding is in adata.obsm[\"X_scVI\"].')\n``` you remeber to wrap your code in triple backticks and python", | |
| "prompt": "You are the *integration expert*. You specialize in combining multiple single-cell datasets and correcting for batch effects using scvi-tools.\n\nExample of a task you would perform:\n```python\nimport scvi\nimport scanpy as sc\n\n# Assume 'adata' is loaded and preprocessed with a 'batch' column\n# Find highly variable genes across batches for integration\nsc.pp.highly_variable_genes(\n adata,\n n_top_genes=2000,\n subset=True,\n layer='counts',\n flavor='seurat_v3',\n batch_key='batch'\n)\n\n# Set up the AnnData object for the scVI model\nscvi.model.SCVI.setup_anndata(adata, layer='counts', batch_key='batch')\n\n# Create and train the scVI model\nmodel = scvi.model.SCVI(adata, n_layers=2, n_latent=30)\nmodel.train()\n\n# Store the integrated latent representation in the AnnData object\nadata.obsm['X_scVI'] = model.get_latent_representation()\n\nprint('Integration complete. Integrated embedding is in adata.obsm[\"X_scVI\"].')\n``` you remember to wrap your code in triple backticks and python", |
| full_prompt += f"\n- Command: `{name}`" | ||
| full_prompt += f"\n - Description: {command.description}" | ||
| full_prompt += f"\n - Target Agent: {command.target_agent}" | ||
| full_prompt += "YOU MUST USE THESE EXACT COMMANDS TO DELEGATE TASKS. NO OTHER FORMATTING OR COMMANDS ARE ALLOWED." |
Copilot
AI
Jul 8, 2025
This line is inside the loop that appends each command, causing it to repeat multiple times. Move it outside the loop so it appears just once.
| full_prompt += "YOU MUST USE THESE EXACT COMMANDS TO DELEGATE TASKS. NO OTHER FORMATTING OR COMMANDS ARE ALLOWED." | |
| full_prompt += "YOU MUST USE THESE EXACT COMMANDS TO DELEGATE TASKS. NO OTHER FORMATTING OR COMMANDS ARE ALLOWED." |
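The suggestion above can be sketched as follows. This is an illustration rather than the code in `AgentSystem.py`: the `Command` dataclass, the `build_prompt` function, and the sample command names are assumptions made for the example, and the blank line inserted before the notice is a formatting choice that the original line does not make.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Command:
    """Hypothetical stand-in for the project's delegation command object."""
    description: str
    target_agent: str


NOTICE = (
    "YOU MUST USE THESE EXACT COMMANDS TO DELEGATE TASKS. "
    "NO OTHER FORMATTING OR COMMANDS ARE ALLOWED."
)


def build_prompt(base_prompt: str, commands: Dict[str, Command]) -> str:
    """Append one entry per command, then the notice exactly once."""
    full_prompt = base_prompt
    for name, command in commands.items():
        full_prompt += f"\n- Command: `{name}`"
        full_prompt += f"\n  - Description: {command.description}"
        full_prompt += f"\n  - Target Agent: {command.target_agent}"
    # Outside the loop: emitted once, and only if there is something to delegate.
    if commands:
        full_prompt += "\n\n" + NOTICE
    return full_prompt


commands = {
    "delegate_to_coder": Command("General coding tasks", "general_coder"),
    "delegate_to_expert": Command("Batch integration", "integration_expert"),
}
prompt = build_prompt("You are the master agent.", commands)
print(prompt.count(NOTICE))  # 1
```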
This pull request introduces significant enhancements to the benchmarking framework, including new agent configurations, integration quality metrics, and benchmarking persistence. Additionally, it includes minor updates to `.gitignore` and dependencies. Below is a breakdown of the most important changes:

Enhancements to Benchmarking Framework
- Added an `IntegrationMetric` class in `benchmarking/auto_metrics/IntegrationMetrics.py` to compute SCIB integration quality metrics (e.g., batch silhouette, cell type silhouette, isolated label F1) using `scib-metrics`. This provides a detailed evaluation of single-cell data integration quality.
- Updated `benchmarking/prompt_testing/MultiAgentAutoTester.py` to persist benchmarking results, including metadata, metrics, and code snippets. Results are stored in JSONL format, and code snippets are saved as separate files for reproducibility.

Multi-Agent System Enhancements
- Added an `integration_system.json` file to define three agents (`master_agent`, `general_coder`, `integration_expert`) with specialized roles and delegation commands for single-cell analysis tasks. This establishes a clear hierarchy and task delegation mechanism.

Codebase Improvements
- Refactored `benchmarking/prompt_testing/MultiAgentTester.py` to handle user input more cleanly, including recursive continuation after benchmarks and graceful exit handling.

Miscellaneous Updates
- Replaced `scib` with `scib-metrics` in `benchmarking/sandbox/requirements.txt` to align with the new integration metrics implementation.
- `.gitignore` update: added `*.pyc` to the `.gitignore` file to exclude Python bytecode files from version control.

These changes collectively extend the benchmarking system's capabilities, improve maintainability, and keep results and agent configurations better organized.
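The JSONL persistence described above might look roughly like the sketch below. It is an assumption-laden illustration, not the implementation in `MultiAgentAutoTester.py`: the function names, the record fields (`run_id`, `timestamp`, `metrics`, `code_file`), and the `benchmark_results.jsonl` file name are all hypothetical. The idea is that each run appends one self-contained JSON object per line and stores the generated code next to the log, so a result can be traced back to the exact snippet that produced it.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List


def persist_result(out_dir: Path, run_id: str,
                   metrics: Dict[str, Any], code: str) -> Path:
    """Save the code snippet to its own file and append one JSONL record."""
    out_dir.mkdir(parents=True, exist_ok=True)

    # Code snippet goes into a separate file for reproducibility.
    snippet_path = out_dir / f"{run_id}.py"
    snippet_path.write_text(code, encoding="utf-8")

    record = {
        "run_id": run_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "metrics": metrics,
        "code_file": snippet_path.name,
    }
    # Append mode keeps earlier runs; one JSON object per line.
    log_path = out_dir / "benchmark_results.jsonl"
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return log_path


def load_results(log_path: Path) -> List[Dict[str, Any]]:
    """Read every record back from a JSONL log, skipping blank lines."""
    with log_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because each line is an independent JSON document, an interrupted run leaves earlier records intact, and the log can be loaded line by line without parsing the whole file at once.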