Add provider for CUA Cloud V2 batch job execution #1079
r33drichards wants to merge 3 commits into main
Add a new session provider that calls the /v1/batch-jobs API to run CUABench evaluations on Incus VMs via the CloudV2 infrastructure. Supports arbitrary solver images, configurable parallelism, and per-task timeouts. Register it as the 'incus' or 'cloudv2' provider in the session manager. https://claude.ai/code/session_01N62q5oNTPtXfTNZXsqiCyH
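The three knobs above (solver image, parallelism, per-task timeout) could travel together as one job request. The sketch below is a minimal illustration under stated assumptions: the class `BatchJobRequest` and the field names `solver_image`, `task_ids`, `parallelism`, and `task_timeout_s` are hypothetical, the PR does not show the actual `/v1/batch-jobs` schema, and the defaults are illustrative only.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class BatchJobRequest:
    """Hypothetical body for POST /v1/batch-jobs; field names are assumptions."""

    solver_image: str           # arbitrary solver container image
    task_ids: List[str]         # CUABench tasks to evaluate
    parallelism: int = 4        # illustrative default: max tasks running at once
    task_timeout_s: int = 1800  # illustrative default: per-task limit in seconds

    def __post_init__(self) -> None:
        # Reject values the service would presumably refuse anyway.
        if not self.solver_image:
            raise ValueError("solver_image is required")
        if self.parallelism < 1:
            raise ValueError("parallelism must be >= 1")
        if self.task_timeout_s <= 0:
            raise ValueError("task_timeout_s must be positive")

    def to_json(self) -> str:
        # Serialize all fields in declaration order.
        return json.dumps(asdict(self))
```

Validating in `__post_init__` keeps obviously invalid jobs from ever reaching the network.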
📦 Publishable packages changed
Rename the provider file from incus.py to cua_cloud.py and the class from IncusProvider to CuaCloudProvider. Register as 'cua_cloud' or 'cloudv2' in the session manager. https://claude.ai/code/session_01N62q5oNTPtXfTNZXsqiCyH
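One way two names could resolve to the same class is a small registry consulted by a factory such as `make()`. This is a hypothetical sketch: `_PROVIDERS`, the case-insensitive lookup, and the placeholder `CuaCloudProvider` stand-in are illustrative, and the real factory in `manager.py` may be organized differently.

```python
from typing import Any, Callable, Dict


class CuaCloudProvider:
    """Placeholder stand-in for the real provider class."""

    def __init__(self, **options: Any) -> None:
        self.options = options


# Hypothetical registry: both names map to the same class.
_PROVIDERS: Dict[str, Callable[..., Any]] = {
    "cua_cloud": CuaCloudProvider,
    "cloudv2": CuaCloudProvider,  # alias
}


def make(provider: str, **options: Any) -> Any:
    """Build a provider by name (case-insensitive in this sketch)."""
    try:
        factory = _PROVIDERS[provider.lower()]
    except KeyError:
        known = ", ".join(sorted(_PROVIDERS))
        raise ValueError(f"unknown provider {provider!r}; expected one of: {known}") from None
    return factory(**options)
```

Keeping aliases in a single table means adding or retiring a name is a one-line change.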
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@libs/cua-bench/cua_bench/sessions/providers/cua_cloud.py`:
- Around lines 58-75: The provider opens an aiohttp.ClientSession in
_get_http_client but never ensures _close_http_client is called, leaking
connections; add async context manager support on the provider (implement
__aenter__ to call/return self and __aexit__ to await self._close_http_client())
or expose a public async close() that awaits _close_http_client and update
callers to use "async with <Provider>()" or call await provider.close();
reference the existing methods _get_http_client and _close_http_client when
adding the lifecycle methods so the session is always closed.
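A minimal sketch of the suggested lifecycle methods, assuming the provider lazily caches one client in an attribute named `_http_client`. The class name `CuaCloudProviderSketch` and the `client_factory` hook are hypothetical additions so the pattern can be exercised without network access; when no factory is given, it falls back to `aiohttp.ClientSession()`.

```python
from typing import Any, Callable, Optional


class CuaCloudProviderSketch:
    """Hypothetical sketch of the HTTP-session lifecycle, not the PR's class."""

    def __init__(self, client_factory: Optional[Callable[[], Any]] = None) -> None:
        # client_factory is an illustrative hook; None means aiohttp.ClientSession.
        self._client_factory = client_factory
        self._http_client: Optional[Any] = None

    async def _get_http_client(self) -> Any:
        # Lazily create a single shared session and reuse it afterwards.
        if self._http_client is None:
            if self._client_factory is None:
                import aiohttp  # imported lazily so the sketch loads without it

                self._http_client = aiohttp.ClientSession()
            else:
                self._http_client = self._client_factory()
        return self._http_client

    async def _close_http_client(self) -> None:
        # Close the session if one was opened, then drop the reference.
        if self._http_client is not None:
            await self._http_client.close()
            self._http_client = None

    async def close(self) -> None:
        """Public close() for callers that cannot use `async with`."""
        await self._close_http_client()

    async def __aenter__(self) -> "CuaCloudProviderSketch":
        return self

    async def __aexit__(self, exc_type: Any, exc: Any, tb: Any) -> None:
        # Runs on normal exit and on exceptions, so the session is not leaked.
        await self._close_http_client()
```

Callers would then write `async with CuaCloudProviderSketch() as provider: ...`, or call `await provider.close()` in a `finally` block when the provider outlives a single scope.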
🧹 Nitpick comments (4)
libs/cua-bench/cua_bench/sessions/providers/cua_cloud.py (4)
40-51: SQLite connection should use a context manager; silent exception swallowing may hide issues. The connection isn't guaranteed to close if an exception occurs between `connect()` and `close()`. Additionally, catching all exceptions silently could mask important errors (e.g., permission issues, a corrupted database).
♻️ Proposed fix using a context manager
```diff
 if creds_path.exists():
     try:
         import sqlite3
-        conn = sqlite3.connect(str(creds_path))
-        cursor = conn.cursor()
-        cursor.execute("SELECT value FROM credentials WHERE key = 'api_key'")
-        row = cursor.fetchone()
-        conn.close()
-        if row:
-            return row[0]
-    except Exception:
-        pass
+        from contextlib import closing
+
+        # sqlite3's own context manager only commits or rolls back; closing() closes.
+        with closing(sqlite3.connect(str(creds_path))) as conn:
+            cursor = conn.cursor()
+            cursor.execute("SELECT value FROM credentials WHERE key = 'api_key'")
+            row = cursor.fetchone()
+            if row:
+                return row[0]
+    except (sqlite3.Error, OSError):
+        pass  # Fall through to raise ValueError below
```
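Note that a `sqlite3.Connection` used directly as a context manager only commits or rolls back the transaction; it does not close the connection, so `contextlib.closing` is what guarantees `close()`. The sketch below combines that with the `CUA_API_KEY` environment-variable lookup described in the PR summary. `load_api_key` and its `creds_path`/`env` parameters are hypothetical names, and the `credentials` table layout is inferred from the query in this review comment.

```python
import os
import sqlite3
from contextlib import closing
from pathlib import Path
from typing import Mapping, Optional


def load_api_key(
    creds_path: Optional[Path] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: CUA_API_KEY first, then the SQLite credential store."""
    env = os.environ if env is None else env
    key = env.get("CUA_API_KEY")
    if key:
        return key

    # Default path from the PR summary; table/column names from the review's query.
    path = creds_path or Path.home() / ".cua" / "cli.sqlite"
    if path.exists():
        try:
            # closing() guarantees conn.close(), even if the query raises.
            with closing(sqlite3.connect(str(path))) as conn:
                row = conn.execute(
                    "SELECT value FROM credentials WHERE key = 'api_key'"
                ).fetchone()
            if row:
                return row[0]
        except (sqlite3.Error, OSError):
            pass  # unreadable or malformed store: fall through to the error below

    raise ValueError("no API key: set CUA_API_KEY or store an 'api_key' credential")
```

Catching only `sqlite3.Error` and `OSError` keeps programming errors (e.g. a `TypeError`) visible instead of silently falling through.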
220-236: Phase mapping is case-sensitive; consider normalizing. If the API returns phases with different casing (e.g., "Pending" vs "pending"), the mapping will fall through to the raw value, potentially causing inconsistent status handling downstream.
♻️ Normalize phase to lowercase
```diff
 # Map batch job phase to local status
-phase = result.get("phase", "unknown")
+phase = result.get("phase", "unknown").lower()
 status_map = {
```
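A sketch of the normalization, assuming a hypothetical `STATUS_MAP` (the PR's actual mapping values are not shown in this comment). Using `or` instead of a `.get()` default also handles a `phase` key that is present but `null`, which `result.get("phase", "unknown").lower()` would turn into an `AttributeError`.

```python
from typing import Any, Dict, Mapping

# Hypothetical phase -> local status table; the PR's real status_map may differ.
STATUS_MAP: Dict[str, str] = {
    "pending": "starting",
    "running": "running",
    "succeeded": "completed",
    "failed": "failed",
}


def map_phase(result: Mapping[str, Any]) -> str:
    # `or` covers a missing key, an empty string, and an explicit null.
    phase = str(result.get("phase") or "unknown").strip().lower()
    # Unmapped phases fall through as their normalized value.
    return STATUS_MAP.get(phase, phase)
```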
267-291: Method returns a status summary, not logs; consider clarifying. The method name `get_session_logs` suggests log retrieval, but it returns a status summary. This is likely adapting to the `SessionProvider` interface, where actual logs aren't available from the batch API. The docstring correctly describes the behavior, but consider adding a note explaining why logs aren't available.
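One way to make this explicit is to put the explanation in the docstring of the helper that builds the summary. In this sketch, `summarize_batch_status` and the `phase`, `completed`, `failed`, and `total` fields are hypothetical; the real batch job payload may use different names.

```python
from typing import Any, Mapping


def summarize_batch_status(status: Mapping[str, Any]) -> str:
    """Return a one-line status summary for a batch job.

    This is not log output: the batch-jobs API is assumed not to expose
    per-task container logs, so a log-style SessionProvider method can
    only report progress.
    """
    # Field names are hypothetical placeholders for the batch job payload.
    phase = status.get("phase", "unknown")
    total = status.get("total", 0)
    completed = status.get("completed", 0)
    failed = status.get("failed", 0)
    return f"phase={phase} completed={completed}/{total} failed={failed}"
```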
347-367: Client-side pagination could be inefficient for large task lists. All results are fetched before applying the status filter and pagination locally. For batch jobs with many tasks, this fetches more data than needed. If the API supports server-side filtering/pagination, consider using it.
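The sketch below contrasts the client-side approach with a server-side query. The endpoint path `/v1/batch-jobs/{id}/tasks` and the `status`, `offset`, and `limit` query parameters are assumptions; the PR does not say whether the API supports them.

```python
from typing import Any, Dict, Iterable, List, Optional
from urllib.parse import quote, urlencode


def paginate_locally(
    tasks: Iterable[Dict[str, Any]],
    status: Optional[str] = None,
    offset: int = 0,
    limit: int = 50,
) -> List[Dict[str, Any]]:
    # Client-side: every task is already downloaded, then filtered and sliced.
    matching = [t for t in tasks if status is None or t.get("status") == status]
    return matching[offset : offset + limit]


def server_side_query(
    batch_job_id: str, status: Optional[str] = None, offset: int = 0, limit: int = 50
) -> str:
    # Hypothetical: assumes the tasks endpoint accepts these query parameters.
    params: Dict[str, Any] = {"offset": offset, "limit": limit}
    if status is not None:
        params["status"] = status
    return f"/v1/batch-jobs/{quote(batch_job_id, safe='')}/tasks?{urlencode(params)}"
```

If the server honors such parameters, only one page crosses the network instead of the full task list.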
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Summary
This PR adds support for running CUABench evaluations on Incus VMs through the CUA Cloud V2 API. The new `IncusProvider` enables batch job execution with automatic VM provisioning and solver container orchestration.
Key Changes
- New `IncusProvider` class (`libs/cua-bench/cua_bench/sessions/providers/incus.py`):
  - Implements the `SessionProvider` interface for Incus VM-based benchmark execution
  - Uses the `/v1/batch-jobs` API for creating and managing batch jobs
  - Authenticates with the `CUA_API_KEY` environment variable or stored credentials
  - Provides `start_session`, `get_session_status`, `stop_session`, `get_session_logs`, `get_results`, and `list_tasks`
- Updated provider factory (`libs/cua-bench/cua_bench/sessions/manager.py`):
  - Registers the new provider in the `make()` factory function
Notable Implementation Details
- Authentication through an environment variable (`CUA_API_KEY`) and SQLite credential storage (`~/.cua/cli.sqlite`)
https://claude.ai/code/session_01N62q5oNTPtXfTNZXsqiCyH
Summary by CodeRabbit