
Add provider for CUA Cloud V2 batch job execution #1079

Open

r33drichards wants to merge 3 commits into main from claude/connect-cuabench-cloud-vm-tEIzq

Conversation

@r33drichards
Collaborator

@r33drichards r33drichards commented Feb 12, 2026

Summary

This PR adds support for running CUABench evaluations on Incus VMs through the CUA Cloud V2 API. The new IncusProvider enables batch job execution with automatic VM provisioning and solver container orchestration.

Key Changes

  • New IncusProvider class (libs/cua-bench/cua_bench/sessions/providers/incus.py):

    • Implements SessionProvider interface for Incus VM-based benchmark execution
    • Supports CUA Cloud V2 /v1/batch-jobs API for creating and managing batch jobs
    • Each batch job provisions N Incus VMs (one per task) running cua-xfce desktop + solver container
    • Handles API authentication via CUA_API_KEY environment variable or stored credentials
    • Implements core session lifecycle methods: start_session, get_session_status, stop_session, get_session_logs, get_results, and list_tasks
  • Updated provider factory (libs/cua-bench/cua_bench/sessions/manager.py):

    • Added support for the "incus" and "cloudv2" provider names in the make() factory function (a minimal sketch follows this list)
    • Updated the error message to document all supported providers
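
A minimal sketch of what the factory registration could look like; the exact make() signature, the import path, and the IncusProvider constructor arguments are assumptions for illustration, not the code as merged:

    # Hypothetical sketch of the provider registration in
    # cua_bench/sessions/manager.py; signatures are illustrative.
    from cua_bench.sessions.providers.incus import IncusProvider


    def make(provider: str, **kwargs):
        """Return a session provider instance for the given provider name."""
        if provider in ("incus", "cloudv2"):
            # Both names resolve to the CUA Cloud V2 batch-job backed provider.
            return IncusProvider(**kwargs)
        raise ValueError(
            f"Unknown provider '{provider}'. Supported providers: incus, cloudv2, ..."
        )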

Notable Implementation Details

  • Authentication: Supports both environment variable (CUA_API_KEY) and SQLite credential storage (~/.cua/cli.sqlite)
  • Configuration: Accepts flexible solver configuration including agent name, model, max steps, parallelism, and VM image selection
  • Environment variables: Automatically passes API keys (Anthropic, OpenAI, Google) through to solver containers
  • Status mapping: Maps CUA Cloud batch job phases to local status values (pending, starting, running, completed, failed, stopped); a sketch of this mapping and the env-var passthrough follows this list
  • Async HTTP client: Uses aiohttp with proper session management and timeout handling
  • Error handling: Provides specific error messages for authentication failures, rate limiting, and connection issues
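
As a rough illustration of the status mapping and API-key passthrough described above: the local status values come from the list, but the cloud-side phase names and the exact environment-variable names are assumptions, not taken from the PR:

    # Illustrative only: phase names and env-var names are assumed.
    import os

    # Map CUA Cloud batch-job phases onto the provider's local status values.
    PHASE_TO_STATUS = {
        "pending": "pending",
        "provisioning": "starting",
        "running": "running",
        "succeeded": "completed",
        "failed": "failed",
        "cancelled": "stopped",
    }

    # Forward well-known model-provider API keys into the solver containers.
    PASSTHROUGH_VARS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GOOGLE_API_KEY")


    def solver_env() -> dict:
        """Collect the API keys to inject into solver containers."""
        return {name: os.environ[name] for name in PASSTHROUGH_VARS if name in os.environ}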

https://claude.ai/code/session_01N62q5oNTPtXfTNZXsqiCyH

Summary by CodeRabbit

  • New Features
    • Introduced a new CUA Cloud provider enabling benchmark execution on cloud virtual machines, with support for both API key and local credential authentication
    • The cloud provider supports full session management, including batch job submission, status tracking, log retrieval, and results collection with optional pagination
    • Integrates seamlessly with the existing provider selection mechanism

Add a new session provider that hits the /v1/batch-jobs API to run CUABench
evaluations on Incus VMs via the CloudV2 infrastructure. Supports arbitrary
solver images, configurable parallelism, and per-task timeouts.

Register as 'incus' or 'cloudv2' provider in the session manager.

https://claude.ai/code/session_01N62q5oNTPtXfTNZXsqiCyH
@vercel
Contributor

vercel bot commented Feb 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions          | Updated (UTC)
docs    | Ready      | Preview, Comment | Feb 12, 2026 4:57am


@coderabbitai

coderabbitai bot commented Feb 12, 2026

Important

Review skipped

Auto reviews are limited based on label configuration.

🏷️ Required labels (at least one)
  • rabbit

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Contributor

github-actions bot commented Feb 12, 2026

📦 Publishable packages changed

  • pypi/bench

Add release:<service> labels to auto-release on merge (+ optional bump:minor or bump:major, default is patch).
Or add no-release to skip.

@r33drichards r33drichards changed the title from "Add Incus provider for CUA Cloud V2 batch job execution" to "Add provider for CUA Cloud V2 batch job execution" on Feb 12, 2026
Rename the provider file from incus.py to cua_cloud.py and the class
from IncusProvider to CuaCloudProvider. Register as 'cua_cloud' or
'cloudv2' in the session manager.

https://claude.ai/code/session_01N62q5oNTPtXfTNZXsqiCyH

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In libs/cua-bench/cua_bench/sessions/providers/cua_cloud.py:
- Around lines 58-75: The provider opens an aiohttp.ClientSession in
_get_http_client but never ensures _close_http_client is called, leaking
connections. Add async context manager support on the provider (implement
__aenter__ to return self and __aexit__ to await self._close_http_client()),
or expose a public async close() that awaits _close_http_client, and update
callers to use "async with <Provider>()" or call await provider.close().
Reference the existing _get_http_client and _close_http_client methods when
adding the lifecycle methods so the session is always closed.
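
A minimal sketch of the suggested lifecycle change, assuming the provider keeps a single aiohttp.ClientSession and already has the _get_http_client/_close_http_client helpers the review mentions; the class name, attribute names, and timeout value are illustrative:

    # Sketch only: class name, attribute names, and timeout are assumptions.
    from typing import Optional

    import aiohttp


    class CuaCloudProvider:
        def __init__(self) -> None:
            self._http: Optional[aiohttp.ClientSession] = None

        async def _get_http_client(self) -> aiohttp.ClientSession:
            # Lazily create one shared session with a default timeout.
            if self._http is None or self._http.closed:
                self._http = aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=60))
            return self._http

        async def _close_http_client(self) -> None:
            if self._http is not None and not self._http.closed:
                await self._http.close()
            self._http = None

        async def close(self) -> None:
            """Public cleanup hook so callers can release the HTTP session."""
            await self._close_http_client()

        async def __aenter__(self) -> "CuaCloudProvider":
            return self

        async def __aexit__(self, exc_type, exc, tb) -> None:
            await self._close_http_client()

Callers can then wrap usage in "async with CuaCloudProvider() as provider: ..." or call "await provider.close()" in a finally block, so the session is always released.
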
🧹 Nitpick comments (4)
libs/cua-bench/cua_bench/sessions/providers/cua_cloud.py (4)

40-51: SQLite connection should use context manager; silent exception swallowing may hide issues.

The connection isn't guaranteed to close if an exception occurs between connect() and close(). Additionally, catching all exceptions silently could mask important errors (e.g., permission issues, corrupted database).

♻️ Proposed fix using context manager
         if creds_path.exists():
             try:
                 import sqlite3

-                conn = sqlite3.connect(str(creds_path))
-                cursor = conn.cursor()
-                cursor.execute("SELECT value FROM credentials WHERE key = 'api_key'")
-                row = cursor.fetchone()
-                conn.close()
-                if row:
-                    return row[0]
-            except Exception:
-                pass
+                with sqlite3.connect(str(creds_path)) as conn:
+                    cursor = conn.cursor()
+                    cursor.execute("SELECT value FROM credentials WHERE key = 'api_key'")
+                    row = cursor.fetchone()
+                    if row:
+                        return row[0]
+            except (sqlite3.Error, OSError):
+                pass  # Fall through to raise ValueError below
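
One caveat on the proposed fix: sqlite3's connection context manager only manages transactions and does not close the connection, so guaranteeing the close needs contextlib.closing (or an explicit try/finally). A sketch, reusing the path and query from the diff above:

    import sqlite3
    from contextlib import closing
    from pathlib import Path

    creds_path = Path.home() / ".cua" / "cli.sqlite"
    api_key = None
    if creds_path.exists():
        try:
            # closing() guarantees conn.close() even if the query raises.
            with closing(sqlite3.connect(str(creds_path))) as conn:
                row = conn.execute(
                    "SELECT value FROM credentials WHERE key = 'api_key'"
                ).fetchone()
                if row:
                    api_key = row[0]
        except (sqlite3.Error, OSError):
            pass  # fall through to other credential sources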

220-236: Phase mapping is case-sensitive; consider normalizing.

If the API returns phases with different casing (e.g., "Pending" vs "pending"), the mapping will fall through to the raw value, potentially causing inconsistent status handling downstream.

♻️ Normalize phase to lowercase
         # Map batch job phase to local status
-        phase = result.get("phase", "unknown")
+        phase = result.get("phase", "unknown").lower()
         status_map = {

267-291: Method returns status summary, not logs; consider clarifying.

The method name get_session_logs suggests log retrieval, but it returns a status summary. This is likely adapting to the SessionProvider interface where actual logs aren't available from the batch API. The docstring correctly describes the behavior, but consider adding a note explaining why logs aren't available.


347-367: Client-side pagination could be inefficient for large task lists.

All results are fetched before applying the status filter and pagination locally. For batch jobs with many tasks, this fetches more data than needed. If the API supports server-side filtering/pagination, consider using it.
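
For reference, the client-side behavior described here amounts to a filter-then-slice over the fully fetched result list, roughly like the helper below (names are illustrative, not the PR's actual code):

    from typing import Optional


    def paginate_results(
        results: list[dict],
        status: Optional[str] = None,
        offset: int = 0,
        limit: Optional[int] = None,
    ) -> list[dict]:
        """Client-side filter + slice applied after all results are fetched."""
        if status is not None:
            results = [r for r in results if r.get("status") == status]
        end = offset + limit if limit is not None else None
        return results[offset:end]

If the batch-jobs API accepts filter or pagination parameters, pushing them into the request would avoid transferring results that are discarded locally.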

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@github-actions
Contributor

github-actions bot commented Feb 12, 2026

📦 Publishable packages changed

  • pypi/bench — will auto-release on merge


Labels

release:pypi/bench Release pypi/bench on merge

3 participants