@djriffle
Member

@djriffle djriffle commented Jul 8, 2025

This pull request introduces significant enhancements to the benchmarking framework, including new agent configurations, integration quality metrics, and benchmarking persistence. Additionally, it includes minor updates to .gitignore and dependencies. Below is a breakdown of the most important changes:

Enhancements to Benchmarking Framework

  • Integration Quality Metrics: Added a new IntegrationMetric class in benchmarking/auto_metrics/IntegrationMetrics.py to compute SCIB integration quality metrics (e.g., batch silhouette, cell type silhouette, isolated label F1) using scib-metrics. This provides a detailed evaluation of single-cell data integration quality.
  • Benchmarking Persistence: Introduced functionality in benchmarking/prompt_testing/MultiAgentAutoTester.py to persist benchmarking results, including metadata, metrics, and code snippets. Results are stored in JSONL format, and code snippets are saved as separate files for reproducibility. [1] [2] [3] [4] [5] [6]
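The persistence scheme described above (one JSONL record per run, with code snippets written to separate files) could look roughly like the following. This is a minimal sketch under stated assumptions, not the code in MultiAgentAutoTester.py: the function name `persist_result`, the file name `results.jsonl`, the `snippets/` folder, and the record keys are all hypothetical.

```python
import json
import time
from pathlib import Path


def persist_result(out_dir, metadata, metrics, code_snippets):
    """Append one benchmark record to results.jsonl and save its code snippets.

    Hypothetical layout: <out_dir>/results.jsonl plus <out_dir>/snippets/*.py.
    """
    out_dir = Path(out_dir)
    snippet_dir = out_dir / "snippets"
    snippet_dir.mkdir(parents=True, exist_ok=True)

    # Fall back to a timestamp when the caller supplies no run id.
    run_id = metadata.get("run_id") or time.strftime("%Y%m%dT%H%M%S")

    # Write each snippet to its own file so it can be re-run later.
    snippet_paths = []
    for i, code in enumerate(code_snippets):
        path = snippet_dir / f"{run_id}_{i}.py"
        path.write_text(code, encoding="utf-8")
        snippet_paths.append(str(path.relative_to(out_dir)))

    record = {"metadata": metadata, "metrics": metrics, "snippets": snippet_paths}

    # JSONL: one JSON object per line, appended so earlier runs are preserved.
    with open(out_dir / "results.jsonl", "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

Appending rather than rewriting the file means a crashed run cannot corrupt earlier records, and each line can be parsed independently.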

Multi-Agent System Enhancements

  • Agent Configuration: Added a new integration_system.json file to define three agents (master_agent, general_coder, integration_expert) with specialized roles and delegation commands for single-cell analysis tasks. This establishes a clear hierarchy and task delegation mechanism.
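To illustrate the hierarchy, here is a sketch of how such a configuration might be loaded and sanity-checked. The agent names come from this PR, but the JSON layout (the `agents` and `commands` keys), the command names, the prompt texts, and the helper `find_dangling_targets` are assumptions for illustration only; the real integration_system.json may be structured differently.

```python
import json

# Hypothetical, heavily shortened stand-in for integration_system.json.
CONFIG_TEXT = """
{
  "agents": {
    "master_agent": {
      "prompt": "You route tasks to specialists.",
      "commands": {
        "delegate_to_coder": {
          "target_agent": "general_coder",
          "description": "General analysis and coding tasks."
        },
        "delegate_to_integrator": {
          "target_agent": "integration_expert",
          "description": "Batch correction and dataset integration."
        }
      }
    },
    "general_coder": {"prompt": "You write analysis code.", "commands": {}},
    "integration_expert": {"prompt": "You integrate datasets.", "commands": {}}
  }
}
"""


def find_dangling_targets(config):
    """Return (agent, command, target) triples whose target agent is undefined."""
    agents = config["agents"]
    dangling = []
    for agent_name, spec in agents.items():
        for command, details in spec.get("commands", {}).items():
            if details["target_agent"] not in agents:
                dangling.append((agent_name, command, details["target_agent"]))
    return dangling


config = json.loads(CONFIG_TEXT)
print(find_dangling_targets(config))
```

A check like this catches a delegation command that points at a misspelled or missing agent before any prompt is sent.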

Codebase Improvements

  • Input Loop Refactoring: Refactored the input loop in benchmarking/prompt_testing/MultiAgentTester.py to handle user input more cleanly, including recursive continuation after benchmarks and graceful exit handling. [1] [2]

Miscellaneous Updates

  • Dependency Update: Replaced scib with scib-metrics in benchmarking/sandbox/requirements.txt to align with the new integration metrics implementation.
  • .gitignore Update: Added *.pyc to the .gitignore file to exclude Python bytecode files from version control.

These changes collectively enhance the benchmarking system's capabilities, improve maintainability, and ensure better organization of results and agent configurations.

@djriffle djriffle requested a review from Copilot July 8, 2025 18:17
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds integration-quality metrics and automatic result persistence to the benchmarking framework and refines the interactive agent testing loop. It also introduces a structured multi-agent configuration and updates dependencies and VCS ignores.

  • Add SCIB-based integration metrics and persist benchmark outputs with code snippets.
  • Refactor input_loop to support recursive continuation and graceful exit.
  • Define agent roles and delegation in a JSON system file; update .gitignore and replace scib with scib-metrics.

Reviewed Changes

Copilot reviewed 8 out of 13 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| benchmarking/sandbox/requirements.txt | Swapped scib for scib-metrics in dependencies. |
| benchmarking/prompt_testing/MultiAgentTester.py | Refactored user input loop; added recursive benchmarking. |
| benchmarking/prompt_testing/MultiAgentAutoTester.py | Implemented JSONL persistence, code-snippet dumping. |
| benchmarking/auto_metrics/IntegrationMetrics.py | New IntegrationMetric class using scib-metrics. |
| benchmarking/agents/integration_system.json | Configured three specialized agents with delegation rules. |
| benchmarking/agents/AgentSystem.py | Appended strict delegation formatting to prompts. |
| benchmarking/.gitignore | Added *.pyc to ignore Python bytecode files. |

Comments suppressed due to low confidence (1)

benchmarking/auto_metrics/IntegrationMetrics.py:11

  • AutoMetric is not imported, causing a NameError. Add the appropriate import (e.g., from benchmarking.auto_metrics.BaseMetric import AutoMetric).
class IntegrationMetric(AutoMetric):
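A base class name is resolved when the `class` statement runs, so a missing import would fail as soon as the module is imported, not later when the metric is used. The sketch below demonstrates that with a stand-in `AutoMetric`; the helper `defines_cleanly` is hypothetical and the real base class lives elsewhere in the repository.

```python
SOURCE = "class IntegrationMetric(AutoMetric):\n    pass\n"


def defines_cleanly(src, namespace):
    """Execute a class definition in the given namespace and report any NameError."""
    try:
        exec(src, namespace)
    except NameError as err:
        return False, str(err)
    return True, ""


# Without the base class in scope, the class statement itself fails.
ok_without, message = defines_cleanly(SOURCE, {})


class AutoMetric:
    """Stand-in for the project's real base class (assumption)."""


# With the name available, which is what an import provides, the definition succeeds.
ok_with, _ = defines_cleanly(SOURCE, {"AutoMetric": AutoMetric})

print(ok_without, message)
print(ok_with)
```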

    return "break"
if user_in.lower() == "benchmark" and benchmark_module:
    run_benchmark(mgr, benchmark_module)
    input_loop() # Recurse to continue the loop after benchmarks

Copilot AI Jul 8, 2025

The recursive call to input_loop() is missing a return, so its result is not propagated back to the caller, which may lead to incorrect control flow. Change it to return input_loop().

Suggested change
- input_loop() # Recurse to continue the loop after benchmarks
+ return input_loop() # Recurse to continue the loop after benchmarks
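The effect of the missing return can be reproduced with a toy loop. This is a sketch only: it reads from a list instead of calling input(), the `propagate` flag exists purely to compare both variants, and the surrounding structure of the real input_loop in MultiAgentTester.py is assumed rather than known.

```python
def input_loop(inputs, log, propagate):
    """Toy stand-in for the interactive loop, driven by a list of scripted inputs."""
    while inputs:
        user_in = inputs.pop(0)
        if user_in.lower() in ("exit", "quit"):
            return "break"
        if user_in.lower() == "benchmark":
            log.append("benchmark ran")
            if propagate:
                # Suggested form: the inner loop's result reaches the caller.
                return input_loop(inputs, log, propagate)
            # Original form: the inner loop's "break" is silently discarded.
            input_loop(inputs, log, propagate)
        else:
            log.append(f"handled {user_in}")
    return None


without_return = input_loop(["benchmark", "exit"], [], propagate=False)
with_return = input_loop(["benchmark", "exit"], [], propagate=True)
print(without_return, with_return)
```

In the first call the caller never sees the exit signal, because the nested call consumed it. A plain iterative loop that continues after the benchmark would avoid the recursion entirely, though that is a design alternative and not what the PR does.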

Copilot uses AI. Check for mistakes.
Comment on lines +4 to +6
import anndata
import numpy as np


Copilot AI Jul 8, 2025

[nitpick] Neither anndata nor numpy is used in this module; consider removing these imports.

Suggested change
- import anndata
- import numpy as np
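A rough way to confirm that imports like these are unused is to compare the names a module binds through imports with the names it actually references. The sketch below uses only the standard-library `ast` module; the function `unused_imports` and the sample source are hypothetical, and the check is deliberately naive (it ignores `__all__`, re-exports, and names used only inside strings), so a dedicated linter is the better tool in practice.

```python
import ast


def unused_imports(source):
    """Return imported names that are never referenced as a bare name."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds `a`; `import a as x` binds `x`.
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)

    # Every bare identifier in the module, including the base of `json.dumps`.
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    return sorted(imported - used)


EXAMPLE = (
    "import anndata\n"
    "import numpy as np\n"
    "import json\n"
    "\n"
    "print(json.dumps({}))\n"
)
print(unused_imports(EXAMPLE))  # ['anndata', 'np']
```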

}
},
"integration_expert": {
"prompt": "You are the *integration expert*. You specialize in combining multiple single-cell datasets and correcting for batch effects using scvi-tools.\n\nExample of a task you would perform:\n```python\nimport scvi\nimport scanpy as sc\n\n# Assume 'adata' is loaded and preprocessed with a 'batch' column\n# Find highly variable genes across batches for integration\nsc.pp.highly_variable_genes(\n adata,\n n_top_genes=2000,\n subset=True,\n layer='counts',\n flavor='seurat_v3',\n batch_key='batch'\n)\n\n# Set up the AnnData object for the scVI model\nscvi.model.SCVI.setup_anndata(adata, layer='counts', batch_key='batch')\n\n# Create and train the scVI model\nmodel = scvi.model.SCVI(adata, n_layers=2, n_latent=30)\nmodel.train()\n\n# Store the integrated latent representation in the AnnData object\nadata.obsm['X_scVI'] = model.get_latent_representation()\n\nprint('Integration complete. Integrated embedding is in adata.obsm[\"X_scVI\"].')\n``` you remeber to wrap your code in triple backticks and python",

Copilot AI Jul 8, 2025

Typo: remeber should be remember.

Suggested change
- "prompt": "You are the *integration expert*. You specialize in combining multiple single-cell datasets and correcting for batch effects using scvi-tools.\n\nExample of a task you would perform:\n```python\nimport scvi\nimport scanpy as sc\n\n# Assume 'adata' is loaded and preprocessed with a 'batch' column\n# Find highly variable genes across batches for integration\nsc.pp.highly_variable_genes(\n adata,\n n_top_genes=2000,\n subset=True,\n layer='counts',\n flavor='seurat_v3',\n batch_key='batch'\n)\n\n# Set up the AnnData object for the scVI model\nscvi.model.SCVI.setup_anndata(adata, layer='counts', batch_key='batch')\n\n# Create and train the scVI model\nmodel = scvi.model.SCVI(adata, n_layers=2, n_latent=30)\nmodel.train()\n\n# Store the integrated latent representation in the AnnData object\nadata.obsm['X_scVI'] = model.get_latent_representation()\n\nprint('Integration complete. Integrated embedding is in adata.obsm[\"X_scVI\"].')\n``` you remeber to wrap your code in triple backticks and python",
+ "prompt": "You are the *integration expert*. You specialize in combining multiple single-cell datasets and correcting for batch effects using scvi-tools.\n\nExample of a task you would perform:\n```python\nimport scvi\nimport scanpy as sc\n\n# Assume 'adata' is loaded and preprocessed with a 'batch' column\n# Find highly variable genes across batches for integration\nsc.pp.highly_variable_genes(\n adata,\n n_top_genes=2000,\n subset=True,\n layer='counts',\n flavor='seurat_v3',\n batch_key='batch'\n)\n\n# Set up the AnnData object for the scVI model\nscvi.model.SCVI.setup_anndata(adata, layer='counts', batch_key='batch')\n\n# Create and train the scVI model\nmodel = scvi.model.SCVI(adata, n_layers=2, n_latent=30)\nmodel.train()\n\n# Store the integrated latent representation in the AnnData object\nadata.obsm['X_scVI'] = model.get_latent_representation()\n\nprint('Integration complete. Integrated embedding is in adata.obsm[\"X_scVI\"].')\n``` you remember to wrap your code in triple backticks and python",

full_prompt += f"\n- Command: `{name}`"
full_prompt += f"\n - Description: {command.description}"
full_prompt += f"\n - Target Agent: {command.target_agent}"
full_prompt += "YOU MUST USE THESE EXACT COMMANDS TO DELEGATE TASKS. NO OTHER FORMATTING OR COMMANDS ARE ALLOWED."

Copilot AI Jul 8, 2025

This line is inside the loop that appends each command, so the directive is repeated once per command. Move it outside the loop so it appears just once.

Suggested change
-         full_prompt += "YOU MUST USE THESE EXACT COMMANDS TO DELEGATE TASKS. NO OTHER FORMATTING OR COMMANDS ARE ALLOWED."
+     full_prompt += "YOU MUST USE THESE EXACT COMMANDS TO DELEGATE TASKS. NO OTHER FORMATTING OR COMMANDS ARE ALLOWED."
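For context, a prompt builder with the directive appended once after the loop might look like the sketch below. The `Command` dataclass, the `build_prompt` function, and the sample command names are hypothetical stand-ins for whatever AgentSystem.py defines. The leading newline before the directive and the `if commands` guard are additions made here for readability, not part of the reviewed suggestion.

```python
from dataclasses import dataclass


@dataclass
class Command:
    """Hypothetical stand-in for the project's command objects."""
    description: str
    target_agent: str


DIRECTIVE = (
    "YOU MUST USE THESE EXACT COMMANDS TO DELEGATE TASKS. "
    "NO OTHER FORMATTING OR COMMANDS ARE ALLOWED."
)


def build_prompt(base_prompt, commands):
    """List every delegation command, then append the directive a single time."""
    full_prompt = base_prompt
    for name, command in commands.items():
        full_prompt += f"\n- Command: `{name}`"
        full_prompt += f"\n  - Description: {command.description}"
        full_prompt += f"\n  - Target Agent: {command.target_agent}"
    if commands:
        # Outside the loop: appended once, however many commands were listed.
        full_prompt += "\n" + DIRECTIVE
    return full_prompt


commands = {
    "delegate_to_coder": Command("General coding tasks.", "general_coder"),
    "delegate_to_integrator": Command("Integration tasks.", "integration_expert"),
}
prompt = build_prompt("You are the master agent.", commands)
print(prompt.count(DIRECTIVE))  # 1
```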

@djriffle djriffle closed this Jul 8, 2025
@djriffle djriffle deleted the IntegrationBenchmarking branch July 8, 2025 19:03
@djriffle djriffle restored the IntegrationBenchmarking branch July 8, 2025 19:11
@djriffle djriffle reopened this Jul 8, 2025
@djriffle djriffle merged commit 0112634 into main Jul 8, 2025
3 checks passed
@djriffle djriffle deleted the IntegrationBenchmarking branch July 9, 2025 15:26
2 participants