Add custom task system for BrowserGym environment by atchudhansg · Pull Request #201 · huggingface/OpenEnv

atchudhansg · 2025-11-16T06:45:24Z

Add Custom Task System for BrowserGym

Summary

Adds a custom task framework that allows users to create domain-specific BrowserGym tasks without depending on official benchmark packages.

Changes

Custom task framework: Base classes, models, and registry system for creating tasks
Example tasks: Copy-paste (single-tab) and copy-paste-multitab implementations
Documentation: Comprehensive guide in README with examples and best practices
Example script: browsergym_custom_example.py demonstrating usage
Test coverage: Basic tests for custom task system

Key Features

No BrowserGym package dependencies required
Direct Playwright browser control
Simple registration system
Rapid prototyping for domain-specific tasks
Follows MiniWoB design philosophy (minimal HTML, functional UI)

Usage

env = BrowserGymEnv(environment={
    "BROWSERGYM_BENCHMARK": "custom",
    "BROWSERGYM_TASK_NAME": "copy-paste"
})

Files Changed

custom - Custom task system implementation
README.md - Updated with custom tasks documentation
browsergym_custom_example.py - Usage example
.gitignore - Added myenv to exclude virtual environments

pankit-eng · 2025-12-03T17:57:39Z

@atchudhansg - thanks for sharing this PR. Supporting generic integration with training tasks is definitely part of the roadmap(cc: @Darktex ) but we are still in early stages of other features. That being said, does it make sense to have an RFC on how we will support tasks more generically that run inside/alongside OpenEnv? This wont be specific to browser tasks only. Let me know what you think.

atchudhansg · 2025-12-06T12:29:59Z

Thanks for the feedback @pankit-eng ! I completely agree that a generic approach for custom tasks across all OpenEnv environments (not just BrowserGym) would be valuable.

I've drafted an RFC (RFC 005) to propose a standardized interface for this. The core idea is to allow users to inject task logic (rewards, observations, termination) at runtime without needing to rebuild containers or modify the core environment code.

I've added the RFC file (rfcs/005-generic-task-integration.md) to this PR for review. Let me know your thoughts on the proposed interface!

Darktex

Note: This is an automated review by Claude Code (alignment-reviewer agent), not a human review. The account posting this is shared with the human maintainer.

Based on my review of the PR, design documents, and existing codebase, I can now provide my comprehensive alignment review.

PR #201 Review: Add Custom Task System for BrowserGym Environment

Previous Review Summary

Key points from @pankit-eng and @atchudhansg discussion:

Need for generic task integration across all OpenEnv environments, not just BrowserGym
RFC 005 drafted to propose standardized interface for custom tasks
Core idea: Allow runtime injection of task logic (rewards, observations, termination) without rebuilding containers

Tier 1: Bugs, Lint, Security, Debug Code

❌ CRITICAL: Incomplete Code in `custom_base.py`

Location: src/envs/browsergym_env/server/custom/custom_base.py:310

The _get_or_create_event_loop method is truncated mid-implementation:

def step(self, action: CustomGymAction) -> CustomGymObservation:
    loop = self._get_or_create_event_loop()
    return loop.run_until_complete(self._async_step(action))

# Code cuts off here - missing async_close and other methods

Impact: This will cause runtime errors. The file appears incomplete.

Fix needed: Complete the implementation or verify the diff is showing the full file.

⚠️ Lint Issues

The lint check failed due to missing uv:

Error: 'uv' is not installed or not in PATH

Action required: Run lint manually with ruff format --check once uv is available, or use Docker-based lint.

⚠️ Debug Code in Example

Location: examples/browsergym_custom_example.py

Multiple print() statements throughout the example file (lines 21-128). While acceptable for examples, consider whether this should use logging for consistency.

Recommendation: Examples are typically allowed debug output. No action required unless project style dictates otherwise.

⚠️ Emoji Removal

Location: Multiple files in README updates

The PR removes emojis from documentation:

Changed from Recommended to Recommended
Removed various visual indicators

Observation: This aligns with project style (CLAUDE.md says "only use emojis if user explicitly requests"). Good cleanup.

🔒 Security: No Critical Issues

No credential exposure detected
Container isolation maintained
No SQL injection or XSS vulnerabilities introduced
File paths properly validated

Tier 2: Alignment with Principles & Invariants

🚨 ALIGNMENT FLAG #1: RFC Required for Architectural Change

Principle at stake: "Changes affecting core APIs or architecture require RFCs" (CONTRIBUTING.md:28-34)

The concern:
This PR introduces a parallel environment system (CustomBrowserGymEnvironment) that bypasses the standard BrowserGym integration pattern. This is a significant architectural decision that:

Adds a second environment implementation path - Custom tasks use Playwright directly while official benchmarks use BrowserGym's gym registration
Introduces new abstractions - CustomGymAction, CustomGymObservation, CustomGymState parallel the existing models
Changes the interface contract - The benchmark="custom" parameter fundamentally changes how BrowserGymEnvironment operates

Per CONTRIBUTING.md:

Required for:

New core APIs in src/openenv/core/

Breaking changes to existing APIs

Major architectural decisions

New abstractions or design patterns

The proposed RFC 005 is currently "Draft" status and has critical issues (see FLAG #2 below).

Suggested reviewer: @pankit-eng (already engaged in discussion), @Darktex

Recommendation:

Do not merge until RFC 005 is reviewed and accepted
RFC should address whether this pattern should be generalized to other environments first
Consider whether this belongs in browsergym_env or deserves a separate "custom_env" module

🚨 ALIGNMENT FLAG #2: RFC 005 Has Wrong RFC Number

Location: rfcs/005-generic-task-integration.md:6

**RFC ID:** 004  # Should be 005!

Current RFCs on main:

000, 001, 002, 003 exist
Next available: 004 (not 005)

Additionally:

RFC 005 claims to be "004" in the ID field
File is named 005-generic-task-integration.md but should be 004-generic-task-integration.md

Fix needed: Rename file to 004-generic-task-integration.md and update RFC ID to 004.

⚠️ ALIGNMENT FLAG #3: Dual API Boundary Potentially Violated

Invariant at stake: "Dual API boundary" (INVARIANTS.md:40-58)

The concern:

The custom task system allows arbitrary JavaScript execution:

# In custom_base.py:_execute_action()
else:
    # Try to execute as JavaScript if not recognized
    await self._page.evaluate(action_str)

From INVARIANTS.md:

The Gym-like API is NOT accessible to the agent being trained.

Violations to flag:

Exposing reset(), step(), or state() via MCP tools

Giving agents direct access to the Gym-like WebSocket API

Any mechanism that lets an agent trigger simulation control

Analysis:

While this doesn't directly expose reset(), allowing arbitrary JavaScript could theoretically let an agent:

Manipulate the browser state in ways that affect reward calculation
Access browser APIs that could leak information about the simulation
Potentially interfere with the observation extraction logic

However, this appears to be standard BrowserGym behavior - official benchmarks also execute arbitrary JavaScript for complex actions.

Recommendation:

Document in RFC 005 that custom tasks inherit BrowserGym's action execution model
Clarify that JavaScript sandboxing is BrowserGym's responsibility
Consider whether custom tasks need additional guardrails

Suggested reviewer: @Darktex (architectural invariants)

⚠️ ALIGNMENT FLAG #4: Rewards Outside Environment Boundary?

Principle at stake: "Rewards inside environment" (PRINCIPLES.md:31, INVARIANTS.md:65-67)

The concern:

From INVARIANTS.md:

Reward computation must stay inside environment boundary

The custom task system allows users to define reward functions in task classes:

def _calculate_reward(self, page_data, action, error=None) -> float:
    """Calculate reward based on page state."""
    # User-defined reward logic

Analysis:

This appears compliant because:

Tasks are instantiated inside the container (server-side only)
Task classes are part of the environment implementation
Rewards are computed server-side and returned in observations

However, the RFC 005 proposal mentions:

"Users often need to test agents on proprietary or local tasks"

If "local tasks" means tasks defined client-side and somehow injected, this would violate the invariant.

The current implementation appears safe - tasks are defined in server/custom/custom_tasks.py on the server side.

Recommendation:

RFC 005 should explicitly state that custom tasks are server-side only
Clarify that reward logic cannot be injected from client
Document the volume mount mechanism mentioned in RFC 005 more clearly

⚠️ Principle Alignment: Client-Server Separation

Invariant: "Client-server separation" (INVARIANTS.md:59-62)

Observation:

The PR adds imports at the top of browsergym_environment.py:

# Add the server directory to sys.path
_SERVER_DIR = os.path.dirname(os.path.abspath(__file__))
if _SERVER_DIR not in sys.path:
    sys.path.insert(0, _SERVER_DIR)

# Import custom models
from custom.custom_models import (
    CustomGymAction,
    CustomGymObservation,
    CustomGymState,
)

Analysis:

This is acceptable because:

Server code importing other server code (same directory)
Client code in client.py does not import custom models
Custom types are converted to BrowserGymObservation before leaving the server

Compliant ✓

📋 Missing: Tests

Observation:

The PR adds:

New environment classes
New action parsing logic
New observation conversion logic
Error handling paths

But no tests are included.

From .claude/hooks/test.sh:

PYTHONPATH=src:envs uv run pytest tests/ -v --tb=short

Recommendation:

Add tests/envs/test_browsergym_custom.py
Test custom task registration
Test action parsing (especially edge cases in _execute_action)
Test observation conversion
Test error handling when custom task not found

📋 Missing: Documentation in RFC 005

Current RFC 005 gaps:

No implementation details - How does task registration work? Where do HTML files go?
No security model - Who can add custom tasks? How are they sandboxed?
No migration path - How does this relate to existing BrowserGym benchmarks?
Vague volume mounting - The RFC mentions volume mounting but the PR doesn't implement it
No rejection criteria - When should we reject a custom task approach?

From CONTRIBUTING.md, RFC template requires:

Architecture Overview
Core Abstractions
Key Design Decisions with rationale and trade-offs
Examples (✓ present)

Recommendation:
RFC 005 needs significant expansion before acceptance.

✅ Positive Alignment: Task-Specific Logic

Principle: "Domain knowledge encapsulated in env, not external" (PRINCIPLES.md:31)

The custom task system correctly keeps:

Reward logic inside environment
Termination logic inside environment
Observation extraction inside environment

This aligns well with the "Rewards inside environment" principle. ✓

✅ Positive: Type Safety Maintained

Invariant: "Generic type safety" (INVARIANTS.md:13-16)

The PR maintains type safety:

Custom types extend the same Pydantic patterns
Observation conversion preserves types
Generic types match across boundaries

Compliant ✓

Summary

Critical Issues (Must Fix Before Merge)

Incomplete code in custom_base.py - file appears truncated
RFC 005 wrong number - should be RFC 004
RFC 005 not accepted - still in "Draft", needs review per CONTRIBUTING.md

Alignment Flags (Need Human Review)

RFC required - Architectural change needs accepted RFC before merge
RFC 005 incomplete - Needs expansion on security, architecture, trade-offs
Dual API boundary - Arbitrary JavaScript execution needs documentation
Rewards boundary - Clarify that tasks are server-side only

Missing Components

Tests - No test coverage for new functionality
Lint verification - Unable to run due to uv missing

Positive Aspects

Maintains type safety ✓
Respects client-server separation ✓
Keeps rewards inside environment ✓
Good documentation in README ✓
Removes emojis per style guide ✓

Recommendation

DO NOT MERGE until:

✅ Complete the truncated code in custom_base.py
✅ Rename RFC to 004-generic-task-integration.md with correct ID
✅ Expand RFC 004 with architecture, security, trade-offs
✅ Get RFC 004 reviewed and accepted per CONTRIBUTING.md
✅ Add test coverage for custom task system
✅ Run and pass lint checks
✅ Address whether this should be generalized beyond BrowserGym

Timeline: Human review needed on architectural decisions before implementation can proceed.

Automated review by Claude Code | Learn more about OpenEnv's agentic workflow

zkwentz · 2026-01-21T14:46:03Z

@greptile

greptile-apps · 2026-01-21T15:00:01Z

Greptile Summary

Introduces a custom task framework for BrowserGym environments that enables domain-specific tasks without official benchmark package dependencies, using direct Playwright browser control
Implements example copy-paste tasks (single-tab and multi-tab) with HTML templates, registration system, and comprehensive documentation
Adds dual-path architecture in browsergym_environment.py to support both standard benchmarks and custom tasks while maintaining API compatibility

Important Files Changed

Filename	Overview
`src/envs/browsergym_env/server/browsergym_environment.py`	Added custom benchmark support with dual initialization paths and observation type conversion
`src/envs/browsergym_env/server/custom/custom_base.py`	New base class providing Gym interface over Playwright with async/sync integration and action parsing
`src/envs/browsergym_env/server/custom/custom_tasks.py`	Task registry system with example copy-paste tasks and hardcoded text validation logic

Confidence score: 3/5

This PR requires careful review due to several implementation concerns that could cause runtime issues or security vulnerabilities
Score lowered due to sys.path manipulations that risk import conflicts, potential security issues with JavaScript execution fallbacks, async/sync integration complexity, and hardcoded validation logic that reduces maintainability
Pay close attention to custom_base.py and custom_tasks.py for async handling, JavaScript execution security, and path manipulation patterns

Sequence Diagram

sequenceDiagram
    participant User
    participant BrowserGymEnv as "BrowserGymEnvironment"
    participant CustomEnv as "Custom Environment"
    participant Browser as "Playwright Browser"
    participant HTMLPage as "HTML Task Page"

    User->>BrowserGymEnv: "reset()"
    BrowserGymEnv->>CustomEnv: "reset()"
    CustomEnv->>Browser: "launch browser"
    CustomEnv->>Browser: "create new page"
    CustomEnv->>HTMLPage: "navigate to task URL"
    HTMLPage-->>CustomEnv: "page loaded"
    CustomEnv->>HTMLPage: "extract observation"
    HTMLPage-->>CustomEnv: "page content & state"
    CustomEnv-->>BrowserGymEnv: "CustomGymObservation"
    BrowserGymEnv-->>User: "BrowserGymObservation"

    User->>BrowserGymEnv: "step(action)"
    BrowserGymEnv->>CustomEnv: "step(CustomGymAction)"
    CustomEnv->>Browser: "execute action (click/fill/press)"
    Browser->>HTMLPage: "perform DOM interaction"
    HTMLPage-->>Browser: "action result"
    CustomEnv->>HTMLPage: "extract observation"
    HTMLPage-->>CustomEnv: "updated page state"
    CustomEnv->>CustomEnv: "calculate reward"
    CustomEnv->>CustomEnv: "check if done"
    CustomEnv-->>BrowserGymEnv: "CustomGymObservation"
    BrowserGymEnv-->>User: "BrowserGymObservation"

    User->>BrowserGymEnv: "close()"
    BrowserGymEnv->>CustomEnv: "close()"
    CustomEnv->>Browser: "close browser"

greptile-apps

_{12 files reviewed, 9 comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-01-21T14:59:55Z

+            const currentPath = window.location.pathname;
+            const directory = currentPath.substring(0, currentPath.lastIndexOf('/'));
+            const targetPath = directory + '/copy-paste-target.html';
+            window.location.href = 'file://' + targetPath;


logic: file:// protocol navigation may not work reliably in browser environments due to security restrictions. Consider using relative paths or server-side routing instead. Will this task always run in an environment that allows file:// protocol access?

Prompt To Fix With AI

This is a comment left during a code review. Path: src/envs/browsergym_env/server/custom/tasks/copy-paste-source.html Line: 33:33 Comment: **logic:** file:// protocol navigation may not work reliably in browser environments due to security restrictions. Consider using relative paths or server-side routing instead. Will this task always run in an environment that allows file:// protocol access? How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-01-21T14:59:56Z

+**Status**: Draft
+**Created**: 06/12/2025
+**Authors**: @atchudhansg
+**RFC ID:** 004


syntax: RFC ID is incorrect - document title indicates this is RFC 005 but header shows RFC 004

Suggested change

**RFC ID:** 004

**RFC ID:** 005

Prompt To Fix With AI

This is a comment left during a code review. Path: rfcs/005-generic-task-integration.md Line: 6:6 Comment: **syntax:** RFC ID is incorrect - document title indicates this is RFC 005 but header shows RFC 004 ```suggestion **RFC ID:** 005 ``` How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-01-21T14:59:56Z

+        else:
+            # Try to execute as JavaScript if not recognized
+            await self._page.evaluate(action_str)


logic: JavaScript execution fallback creates potential security vulnerability - arbitrary code could be executed

Suggested change

else:

# Try to execute as JavaScript if not recognized

await self._page.evaluate(action_str)

else:

# Unsupported action - raise error instead of executing arbitrary JS

raise ValueError(f"Unsupported action format: {action_str}")

Prompt To Fix With AI

This is a comment left during a code review. Path: src/envs/browsergym_env/server/custom/custom_base.py Line: 301:303 Comment: **logic:** JavaScript execution fallback creates potential security vulnerability - arbitrary code could be executed ```suggestion else: # Unsupported action - raise error instead of executing arbitrary JS raise ValueError(f"Unsupported action format: {action_str}") ``` How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-01-21T14:59:57Z

+# Add the server directory to sys.path to allow custom module imports
+_SERVER_DIR = os.path.dirname(os.path.abspath(__file__))
+if _SERVER_DIR not in sys.path:
+    sys.path.insert(0, _SERVER_DIR)


style: Modifying sys.path at module import can cause issues with module resolution and testing. Consider using importlib.util.spec_from_file_location for more controlled imports.

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

Prompt To Fix With AI

This is a comment left during a code review. Path: src/envs/browsergym_env/server/browsergym_environment.py Line: 26:29 Comment: **style:** Modifying sys.path at module import can cause issues with module resolution and testing. Consider using importlib.util.spec_from_file_location for more controlled imports. Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time! How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-01-21T14:59:58Z

+try:
+    from custom.custom_models import (
+        CustomGymAction as _CustomGymAction,
+        CustomGymObservation as _CustomGymObservation,
+        CustomGymState as _CustomGymState,
+    )
+    CUSTOM_AVAILABLE = True
+    CustomGymAction = _CustomGymAction
+    CustomGymObservation = _CustomGymObservation
+    CustomGymState = _CustomGymState
+    _CUSTOM_IMPORT_ERROR = None
+except ImportError as e:
+    CUSTOM_AVAILABLE = False
+    CustomGymAction = None  # type: ignore
+    CustomGymObservation = None  # type: ignore
+    CustomGymState = None  # type: ignore
+    _CUSTOM_IMPORT_ERROR = str(e)


style: The try/except import pattern with global variable assignment creates type checking complications. The # type: ignore comments suggest this approach is fighting the type system. Is there a specific reason for this pattern instead of using conditional imports or factory functions?

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

Prompt To Fix With AI

This is a comment left during a code review. Path: src/envs/browsergym_env/server/browsergym_environment.py Line: 40:56 Comment: **style:** The try/except import pattern with global variable assignment creates type checking complications. The `# type: ignore` comments suggest this approach is fighting the type system. Is there a specific reason for this pattern instead of using conditional imports or factory functions? Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time! How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-01-21T14:59:58Z

+            obs = self.custom_env.reset(seed=seed)
+            self._state = self.custom_env.state


logic: Directly overwriting self._state with self.custom_env.state bypasses the OpenEnv state management. This could break state tracking and episode management.

Prompt To Fix With AI

This is a comment left during a code review. Path: src/envs/browsergym_env/server/browsergym_environment.py Line: 212:213 Comment: **logic:** Directly overwriting `self._state` with `self.custom_env.state` bypasses the OpenEnv state management. This could break state tracking and episode management. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-01-21T14:59:59Z

+    url: str = ""
+    """Current URL of the page"""
+
+    screenshot: Optional[List[List[List[int]]]] = None


style: Screenshot type should be more specific than nested lists. Consider using numpy.ndarray or a proper image type.

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

Prompt To Fix With AI

This is a comment left during a code review. Path: src/envs/browsergym_env/server/custom/custom_models.py Line: 56:56 Comment: **style:** Screenshot type should be more specific than nested lists. Consider using numpy.ndarray or a proper image type. Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time! How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-01-21T15:00:00Z

+# Add current directory to path for relative imports
+_CURRENT_DIR = os.path.dirname(os.path.abspath(__file__))
+if _CURRENT_DIR not in sys.path:
+    sys.path.insert(0, _CURRENT_DIR)


style: Adding current directory to sys.path can be a security risk as it affects module resolution globally. Consider using absolute imports or relative imports within the package structure instead. Is there a specific reason for this path manipulation rather than using proper package imports?

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

Prompt To Fix With AI

This is a comment left during a code review. Path: src/envs/browsergym_env/server/custom/custom_tasks.py Line: 11:14 Comment: **style:** Adding current directory to `sys.path` can be a security risk as it affects module resolution globally. Consider using absolute imports or relative imports within the package structure instead. Is there a specific reason for this path manipulation rather than using proper package imports? Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time! How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-01-21T15:00:00Z

+    def _get_task_url(self) -> str:
+        """Get the URL for the copy-paste task."""
+        # This should point to a local HTML file or a URL hosting the task
+        import os


style: Importing os inside method when it's already imported at module level (line 8)

Suggested change

import os

task_html = os.path.join(

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

Prompt To Fix With AI

This is a comment left during a code review. Path: src/envs/browsergym_env/server/custom/custom_tasks.py Line: 81:81 Comment: **style:** Importing `os` inside method when it's already imported at module level (line 8) ```suggestion task_html = os.path.join( ``` Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time! How can I resolve this? If you propose a fix, please make it concise.

atchudhansg · 2026-01-24T10:41:13Z

Review Feedback Addressed

Thanks for the review! I've addressed all the concerns raised:

Added Comprehensive Test Suite

test_custom_validation.py - Standalone validation tests (imports, registration, interface, models)
test_custom_integration.py - End-to-end browser integration tests with Playwright
test_browsergym_custom.py - Full pytest suite for CI/CD (8 test classes, 18+ tests)
demo_custom_env.py - Visual demonstration with headed browser

Code Cleanup

Removed redundant os imports from task methods
Fixed linting issues (removed emojis, casual language, f-string formatting)
Added documentation headers to all test files with usage instructions

Test Results

All validation and integration tests pass successfully:

PYTHONPATH=src python3 tests/envs/test_custom_validation.py
# ✅ PASS: Imports
# ✅ PASS: Task Registration  
# ✅ PASS: Task Interface
# ✅ PASS: Model Instantiation

PYTHONPATH=src python3 tests/envs/test_custom_integration.py
# ✅ PASS: Environment Creation
# ✅ PASS: HTML Task Loading
# ✅ PASS: Multi-tab Task
# ✅ PASS: Environment Reset/Step/State

The custom task system works correctly through the OpenEnv interface. The comprehensive pytest suite is ready for your CI/CD pipeline.

Let me know if there are any other changes needed!

- Implement custom task framework with base classes, models, and registry - Add comprehensive test suite (validation, integration, pytest) - Add RFC 005 documenting generic task support design - Support custom browser tasks alongside official BrowserGym benchmarks - Migrate to new package structure (openenv.core.env_server) - Update data models to use Pydantic v2 BaseModel Changes rebased onto restructured main branch.

- Fix import path: browsergym_env.models -> envs.browsergym_env.models - Fix observation conversion: use actual error/last_action_error values - Reduce test verbosity: remove excessive print statements - All integration tests now pass with browser automation

meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 16, 2025

atchudhansg changed the title ~~feat: Add custom task system for BrowserGym environment~~ Add custom task system for BrowserGym environment Nov 16, 2025

Darktex reviewed Jan 13, 2026

View reviewed changes

greptile-apps Bot reviewed Jan 21, 2026

View reviewed changes

Atchudhan Sreekanth added 2 commits January 24, 2026 16:47

atchudhansg force-pushed the custom/copy-paste branch from 30335d5 to ab1a52c Compare January 24, 2026 11:32

		obs = self.custom_env.reset(seed=seed)
		self._state = self.custom_env.state

Conversation

atchudhansg commented Nov 16, 2025