Skip to content

Add custom task system for BrowserGym environment#201

Open
atchudhansg wants to merge 2 commits into
huggingface:mainfrom
atchudhansg:custom/copy-paste
Open

Add custom task system for BrowserGym environment#201
atchudhansg wants to merge 2 commits into
huggingface:mainfrom
atchudhansg:custom/copy-paste

Conversation

@atchudhansg

Copy link
Copy Markdown

Add Custom Task System for BrowserGym

Summary

Adds a custom task framework that allows users to create domain-specific BrowserGym tasks without depending on official benchmark packages.

Changes

  • Custom task framework: Base classes, models, and registry system for creating tasks
  • Example tasks: Copy-paste (single-tab) and copy-paste-multitab implementations
  • Documentation: Comprehensive guide in README with examples and best practices
  • Example script: browsergym_custom_example.py demonstrating usage
  • Test coverage: Basic tests for custom task system

Key Features

  • No BrowserGym package dependencies required
  • Direct Playwright browser control
  • Simple registration system
  • Rapid prototyping for domain-specific tasks
  • Follows MiniWoB design philosophy (minimal HTML, functional UI)

Usage

env = BrowserGymEnv(environment={
    "BROWSERGYM_BENCHMARK": "custom",
    "BROWSERGYM_TASK_NAME": "copy-paste"
})

Files Changed

  • custom - Custom task system implementation
  • README.md - Updated with custom tasks documentation
  • browsergym_custom_example.py - Usage example
  • .gitignore - Added myenv to exclude virtual environments

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 16, 2025
@atchudhansg atchudhansg changed the title feat: Add custom task system for BrowserGym environment Add custom task system for BrowserGym environment Nov 16, 2025
@pankit-eng

Copy link
Copy Markdown
Contributor

@atchudhansg - thanks for sharing this PR. Supporting generic integration with training tasks is definitely part of the roadmap(cc: @Darktex ) but we are still in early stages of other features. That being said, does it make sense to have an RFC on how we will support tasks more generically that run inside/alongside OpenEnv? This wont be specific to browser tasks only. Let me know what you think.

@atchudhansg

atchudhansg commented Dec 6, 2025

Copy link
Copy Markdown
Author

Thanks for the feedback @pankit-eng ! I completely agree that a generic approach for custom tasks across all OpenEnv environments (not just BrowserGym) would be valuable.

I've drafted an RFC (RFC 005) to propose a standardized interface for this. The core idea is to allow users to inject task logic (rewards, observations, termination) at runtime without needing to rebuild containers or modify the core environment code.

I've added the RFC file (rfcs/005-generic-task-integration.md) to this PR for review. Let me know your thoughts on the proposed interface!

@Darktex Darktex left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is an automated review by Claude Code (alignment-reviewer agent), not a human review. The account posting this is shared with the human maintainer.


Based on my review of the PR, design documents, and existing codebase, I can now provide my comprehensive alignment review.


PR #201 Review: Add Custom Task System for BrowserGym Environment

Previous Review Summary

Key points from @pankit-eng and @atchudhansg discussion:

  • Need for generic task integration across all OpenEnv environments, not just BrowserGym
  • RFC 005 drafted to propose standardized interface for custom tasks
  • Core idea: Allow runtime injection of task logic (rewards, observations, termination) without rebuilding containers

Tier 1: Bugs, Lint, Security, Debug Code

❌ CRITICAL: Incomplete Code in custom_base.py

Location: src/envs/browsergym_env/server/custom/custom_base.py:310

The _get_or_create_event_loop method is truncated mid-implementation:

def step(self, action: CustomGymAction) -> CustomGymObservation:
    loop = self._get_or_create_event_loop()
    return loop.run_until_complete(self._async_step(action))

# Code cuts off here - missing async_close and other methods

Impact: This will cause runtime errors. The file appears incomplete.

Fix needed: Complete the implementation or verify the diff is showing the full file.


⚠️ Lint Issues

The lint check failed due to missing uv:

Error: 'uv' is not installed or not in PATH

Action required: Run lint manually with ruff format --check once uv is available, or use Docker-based lint.


⚠️ Debug Code in Example

Location: examples/browsergym_custom_example.py

Multiple print() statements throughout the example file (lines 21-128). While acceptable for examples, consider whether this should use logging for consistency.

Recommendation: Examples are typically allowed debug output. No action required unless project style dictates otherwise.


⚠️ Emoji Removal

Location: Multiple files in README updates

The PR removes emojis from documentation:

  • Changed from Recommended to Recommended
  • Removed various visual indicators

Observation: This aligns with project style (CLAUDE.md says "only use emojis if user explicitly requests"). Good cleanup.


🔒 Security: No Critical Issues

  • No credential exposure detected
  • Container isolation maintained
  • No SQL injection or XSS vulnerabilities introduced
  • File paths properly validated

Tier 2: Alignment with Principles & Invariants

🚨 ALIGNMENT FLAG #1: RFC Required for Architectural Change

Principle at stake: "Changes affecting core APIs or architecture require RFCs" (CONTRIBUTING.md:28-34)

The concern:
This PR introduces a parallel environment system (CustomBrowserGymEnvironment) that bypasses the standard BrowserGym integration pattern. This is a significant architectural decision that:

  1. Adds a second environment implementation path - Custom tasks use Playwright directly while official benchmarks use BrowserGym's gym registration
  2. Introduces new abstractions - CustomGymAction, CustomGymObservation, CustomGymState parallel the existing models
  3. Changes the interface contract - The benchmark="custom" parameter fundamentally changes how BrowserGymEnvironment operates

Per CONTRIBUTING.md:

Required for:

  • New core APIs in src/openenv/core/
  • Breaking changes to existing APIs
  • Major architectural decisions
  • New abstractions or design patterns

The proposed RFC 005 is currently "Draft" status and has critical issues (see FLAG #2 below).

Suggested reviewer: @pankit-eng (already engaged in discussion), @Darktex

Recommendation:

  1. Do not merge until RFC 005 is reviewed and accepted
  2. RFC should address whether this pattern should be generalized to other environments first
  3. Consider whether this belongs in browsergym_env or deserves a separate "custom_env" module

🚨 ALIGNMENT FLAG #2: RFC 005 Has Wrong RFC Number

Location: rfcs/005-generic-task-integration.md:6

**RFC ID:** 004  # Should be 005!

Current RFCs on main:

  • 000, 001, 002, 003 exist
  • Next available: 004 (not 005)

Additionally:

  • RFC 005 claims to be "004" in the ID field
  • File is named 005-generic-task-integration.md but should be 004-generic-task-integration.md

Fix needed: Rename file to 004-generic-task-integration.md and update RFC ID to 004.


⚠️ ALIGNMENT FLAG #3: Dual API Boundary Potentially Violated

Invariant at stake: "Dual API boundary" (INVARIANTS.md:40-58)

The concern:

The custom task system allows arbitrary JavaScript execution:

# In custom_base.py:_execute_action()
else:
    # Try to execute as JavaScript if not recognized
    await self._page.evaluate(action_str)

From INVARIANTS.md:

The Gym-like API is NOT accessible to the agent being trained.

Violations to flag:

  • Exposing reset(), step(), or state() via MCP tools
  • Giving agents direct access to the Gym-like WebSocket API
  • Any mechanism that lets an agent trigger simulation control

Analysis:

While this doesn't directly expose reset(), allowing arbitrary JavaScript could theoretically let an agent:

  1. Manipulate the browser state in ways that affect reward calculation
  2. Access browser APIs that could leak information about the simulation
  3. Potentially interfere with the observation extraction logic

However, this appears to be standard BrowserGym behavior - official benchmarks also execute arbitrary JavaScript for complex actions.

Recommendation:

  • Document in RFC 005 that custom tasks inherit BrowserGym's action execution model
  • Clarify that JavaScript sandboxing is BrowserGym's responsibility
  • Consider whether custom tasks need additional guardrails

Suggested reviewer: @Darktex (architectural invariants)


⚠️ ALIGNMENT FLAG #4: Rewards Outside Environment Boundary?

Principle at stake: "Rewards inside environment" (PRINCIPLES.md:31, INVARIANTS.md:65-67)

The concern:

From INVARIANTS.md:

Reward computation must stay inside environment boundary

The custom task system allows users to define reward functions in task classes:

def _calculate_reward(self, page_data, action, error=None) -> float:
    """Calculate reward based on page state."""
    # User-defined reward logic

Analysis:

This appears compliant because:

  1. Tasks are instantiated inside the container (server-side only)
  2. Task classes are part of the environment implementation
  3. Rewards are computed server-side and returned in observations

However, the RFC 005 proposal mentions:

"Users often need to test agents on proprietary or local tasks"

If "local tasks" means tasks defined client-side and somehow injected, this would violate the invariant.

The current implementation appears safe - tasks are defined in server/custom/custom_tasks.py on the server side.

Recommendation:

  • RFC 005 should explicitly state that custom tasks are server-side only
  • Clarify that reward logic cannot be injected from client
  • Document the volume mount mechanism mentioned in RFC 005 more clearly

⚠️ Principle Alignment: Client-Server Separation

Invariant: "Client-server separation" (INVARIANTS.md:59-62)

Observation:

The PR adds imports at the top of browsergym_environment.py:

# Add the server directory to sys.path
_SERVER_DIR = os.path.dirname(os.path.abspath(__file__))
if _SERVER_DIR not in sys.path:
    sys.path.insert(0, _SERVER_DIR)

# Import custom models
from custom.custom_models import (
    CustomGymAction,
    CustomGymObservation,
    CustomGymState,
)

Analysis:

This is acceptable because:

  • Server code importing other server code (same directory)
  • Client code in client.py does not import custom models
  • Custom types are converted to BrowserGymObservation before leaving the server

Compliant


📋 Missing: Tests

Observation:

The PR adds:

  • New environment classes
  • New action parsing logic
  • New observation conversion logic
  • Error handling paths

But no tests are included.

From .claude/hooks/test.sh:

PYTHONPATH=src:envs uv run pytest tests/ -v --tb=short

Recommendation:

  • Add tests/envs/test_browsergym_custom.py
  • Test custom task registration
  • Test action parsing (especially edge cases in _execute_action)
  • Test observation conversion
  • Test error handling when custom task not found

📋 Missing: Documentation in RFC 005

Current RFC 005 gaps:

  1. No implementation details - How does task registration work? Where do HTML files go?
  2. No security model - Who can add custom tasks? How are they sandboxed?
  3. No migration path - How does this relate to existing BrowserGym benchmarks?
  4. Vague volume mounting - The RFC mentions volume mounting but the PR doesn't implement it
  5. No rejection criteria - When should we reject a custom task approach?

From CONTRIBUTING.md, RFC template requires:

  • Architecture Overview
  • Core Abstractions
  • Key Design Decisions with rationale and trade-offs
  • Examples (✓ present)

Recommendation:
RFC 005 needs significant expansion before acceptance.


✅ Positive Alignment: Task-Specific Logic

Principle: "Domain knowledge encapsulated in env, not external" (PRINCIPLES.md:31)

The custom task system correctly keeps:

  • Reward logic inside environment
  • Termination logic inside environment
  • Observation extraction inside environment

This aligns well with the "Rewards inside environment" principle. ✓


✅ Positive: Type Safety Maintained

Invariant: "Generic type safety" (INVARIANTS.md:13-16)

The PR maintains type safety:

  • Custom types extend the same Pydantic patterns
  • Observation conversion preserves types
  • Generic types match across boundaries

Compliant


Summary

Critical Issues (Must Fix Before Merge)

  1. Incomplete code in custom_base.py - file appears truncated
  2. RFC 005 wrong number - should be RFC 004
  3. RFC 005 not accepted - still in "Draft", needs review per CONTRIBUTING.md

Alignment Flags (Need Human Review)

  1. RFC required - Architectural change needs accepted RFC before merge
  2. RFC 005 incomplete - Needs expansion on security, architecture, trade-offs
  3. Dual API boundary - Arbitrary JavaScript execution needs documentation
  4. Rewards boundary - Clarify that tasks are server-side only

Missing Components

  1. Tests - No test coverage for new functionality
  2. Lint verification - Unable to run due to uv missing

Positive Aspects

  • Maintains type safety ✓
  • Respects client-server separation ✓
  • Keeps rewards inside environment ✓
  • Good documentation in README ✓
  • Removes emojis per style guide ✓

Recommendation

DO NOT MERGE until:

  1. ✅ Complete the truncated code in custom_base.py
  2. ✅ Rename RFC to 004-generic-task-integration.md with correct ID
  3. ✅ Expand RFC 004 with architecture, security, trade-offs
  4. ✅ Get RFC 004 reviewed and accepted per CONTRIBUTING.md
  5. ✅ Add test coverage for custom task system
  6. ✅ Run and pass lint checks
  7. ✅ Address whether this should be generalized beyond BrowserGym

Timeline: Human review needed on architectural decisions before implementation can proceed.


Automated review by Claude Code | Learn more about OpenEnv's agentic workflow

@zkwentz

zkwentz commented Jan 21, 2026

Copy link
Copy Markdown
Contributor

@greptile

@greptile-apps

greptile-apps Bot commented Jan 21, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

  • Introduces a custom task framework for BrowserGym environments that enables domain-specific tasks without official benchmark package dependencies, using direct Playwright browser control
  • Implements example copy-paste tasks (single-tab and multi-tab) with HTML templates, registration system, and comprehensive documentation
  • Adds dual-path architecture in browsergym_environment.py to support both standard benchmarks and custom tasks while maintaining API compatibility

Important Files Changed

Filename Overview
src/envs/browsergym_env/server/browsergym_environment.py Added custom benchmark support with dual initialization paths and observation type conversion
src/envs/browsergym_env/server/custom/custom_base.py New base class providing Gym interface over Playwright with async/sync integration and action parsing
src/envs/browsergym_env/server/custom/custom_tasks.py Task registry system with example copy-paste tasks and hardcoded text validation logic

Confidence score: 3/5

  • This PR requires careful review due to several implementation concerns that could cause runtime issues or security vulnerabilities
  • Score lowered due to sys.path manipulations that risk import conflicts, potential security issues with JavaScript execution fallbacks, async/sync integration complexity, and hardcoded validation logic that reduces maintainability
  • Pay close attention to custom_base.py and custom_tasks.py for async handling, JavaScript execution security, and path manipulation patterns

Sequence Diagram

sequenceDiagram
    participant User
    participant BrowserGymEnv as "BrowserGymEnvironment"
    participant CustomEnv as "Custom Environment"
    participant Browser as "Playwright Browser"
    participant HTMLPage as "HTML Task Page"

    User->>BrowserGymEnv: "reset()"
    BrowserGymEnv->>CustomEnv: "reset()"
    CustomEnv->>Browser: "launch browser"
    CustomEnv->>Browser: "create new page"
    CustomEnv->>HTMLPage: "navigate to task URL"
    HTMLPage-->>CustomEnv: "page loaded"
    CustomEnv->>HTMLPage: "extract observation"
    HTMLPage-->>CustomEnv: "page content & state"
    CustomEnv-->>BrowserGymEnv: "CustomGymObservation"
    BrowserGymEnv-->>User: "BrowserGymObservation"

    User->>BrowserGymEnv: "step(action)"
    BrowserGymEnv->>CustomEnv: "step(CustomGymAction)"
    CustomEnv->>Browser: "execute action (click/fill/press)"
    Browser->>HTMLPage: "perform DOM interaction"
    HTMLPage-->>Browser: "action result"
    CustomEnv->>HTMLPage: "extract observation"
    HTMLPage-->>CustomEnv: "updated page state"
    CustomEnv->>CustomEnv: "calculate reward"
    CustomEnv->>CustomEnv: "check if done"
    CustomEnv-->>BrowserGymEnv: "CustomGymObservation"
    BrowserGymEnv-->>User: "BrowserGymObservation"

    User->>BrowserGymEnv: "close()"
    BrowserGymEnv->>CustomEnv: "close()"
    CustomEnv->>Browser: "close browser"
Loading

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

12 files reviewed, 9 comments

Edit Code Review Agent Settings | Greptile

const currentPath = window.location.pathname;
const directory = currentPath.substring(0, currentPath.lastIndexOf('/'));
const targetPath = directory + '/copy-paste-target.html';
window.location.href = 'file://' + targetPath;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: file:// protocol navigation may not work reliably in browser environments due to security restrictions. Consider using relative paths or server-side routing instead. Will this task always run in an environment that allows file:// protocol access?

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/envs/browsergym_env/server/custom/tasks/copy-paste-source.html
Line: 33:33

Comment:
**logic:** file:// protocol navigation may not work reliably in browser environments due to security restrictions. Consider using relative paths or server-side routing instead. Will this task always run in an environment that allows file:// protocol access?

How can I resolve this? If you propose a fix, please make it concise.

Comment thread rfcs/005-generic-task-integration.md Outdated
**Status**: Draft
**Created**: 06/12/2025
**Authors**: @atchudhansg
**RFC ID:** 004

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syntax: RFC ID is incorrect - document title indicates this is RFC 005 but header shows RFC 004

Suggested change
**RFC ID:** 004
**RFC ID:** 005
Prompt To Fix With AI
This is a comment left during a code review.
Path: rfcs/005-generic-task-integration.md
Line: 6:6

Comment:
**syntax:** RFC ID is incorrect - document title indicates this is RFC 005 but header shows RFC 004

```suggestion
**RFC ID:** 005
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +301 to +303
else:
# Try to execute as JavaScript if not recognized
await self._page.evaluate(action_str)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: JavaScript execution fallback creates potential security vulnerability - arbitrary code could be executed

Suggested change
else:
# Try to execute as JavaScript if not recognized
await self._page.evaluate(action_str)
else:
# Unsupported action - raise error instead of executing arbitrary JS
raise ValueError(f"Unsupported action format: {action_str}")
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/envs/browsergym_env/server/custom/custom_base.py
Line: 301:303

Comment:
**logic:** JavaScript execution fallback creates potential security vulnerability - arbitrary code could be executed

```suggestion
        else:
            # Unsupported action - raise error instead of executing arbitrary JS
            raise ValueError(f"Unsupported action format: {action_str}")
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +26 to +29
# Add the server directory to sys.path to allow custom module imports
_SERVER_DIR = os.path.dirname(os.path.abspath(__file__))
if _SERVER_DIR not in sys.path:
sys.path.insert(0, _SERVER_DIR)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Modifying sys.path at module import can cause issues with module resolution and testing. Consider using importlib.util.spec_from_file_location for more controlled imports.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/envs/browsergym_env/server/browsergym_environment.py
Line: 26:29

Comment:
**style:** Modifying sys.path at module import can cause issues with module resolution and testing. Consider using importlib.util.spec_from_file_location for more controlled imports.

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +40 to +56
try:
from custom.custom_models import (
CustomGymAction as _CustomGymAction,
CustomGymObservation as _CustomGymObservation,
CustomGymState as _CustomGymState,
)
CUSTOM_AVAILABLE = True
CustomGymAction = _CustomGymAction
CustomGymObservation = _CustomGymObservation
CustomGymState = _CustomGymState
_CUSTOM_IMPORT_ERROR = None
except ImportError as e:
CUSTOM_AVAILABLE = False
CustomGymAction = None # type: ignore
CustomGymObservation = None # type: ignore
CustomGymState = None # type: ignore
_CUSTOM_IMPORT_ERROR = str(e)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: The try/except import pattern with global variable assignment creates type checking complications. The # type: ignore comments suggest this approach is fighting the type system. Is there a specific reason for this pattern instead of using conditional imports or factory functions?

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/envs/browsergym_env/server/browsergym_environment.py
Line: 40:56

Comment:
**style:** The try/except import pattern with global variable assignment creates type checking complications. The `# type: ignore` comments suggest this approach is fighting the type system. Is there a specific reason for this pattern instead of using conditional imports or factory functions?

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +212 to +213
obs = self.custom_env.reset(seed=seed)
self._state = self.custom_env.state

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Directly overwriting self._state with self.custom_env.state bypasses the OpenEnv state management. This could break state tracking and episode management.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/envs/browsergym_env/server/browsergym_environment.py
Line: 212:213

Comment:
**logic:** Directly overwriting `self._state` with `self.custom_env.state` bypasses the OpenEnv state management. This could break state tracking and episode management.

How can I resolve this? If you propose a fix, please make it concise.

url: str = ""
"""Current URL of the page"""

screenshot: Optional[List[List[List[int]]]] = None

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Screenshot type should be more specific than nested lists. Consider using numpy.ndarray or a proper image type.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/envs/browsergym_env/server/custom/custom_models.py
Line: 56:56

Comment:
**style:** Screenshot type should be more specific than nested lists. Consider using numpy.ndarray or a proper image type.

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +11 to +14
# Add current directory to path for relative imports
_CURRENT_DIR = os.path.dirname(os.path.abspath(__file__))
if _CURRENT_DIR not in sys.path:
sys.path.insert(0, _CURRENT_DIR)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Adding current directory to sys.path can be a security risk as it affects module resolution globally. Consider using absolute imports or relative imports within the package structure instead. Is there a specific reason for this path manipulation rather than using proper package imports?

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/envs/browsergym_env/server/custom/custom_tasks.py
Line: 11:14

Comment:
**style:** Adding current directory to `sys.path` can be a security risk as it affects module resolution globally. Consider using absolute imports or relative imports within the package structure instead. Is there a specific reason for this path manipulation rather than using proper package imports?

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

def _get_task_url(self) -> str:
"""Get the URL for the copy-paste task."""
# This should point to a local HTML file or a URL hosting the task
import os

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Importing os inside method when it's already imported at module level (line 8)

Suggested change
import os
task_html = os.path.join(

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/envs/browsergym_env/server/custom/custom_tasks.py
Line: 81:81

Comment:
**style:** Importing `os` inside method when it's already imported at module level (line 8)

```suggestion
        task_html = os.path.join(
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

@atchudhansg

Copy link
Copy Markdown
Author

Review Feedback Addressed

Thanks for the review! I've addressed all the concerns raised:

Added Comprehensive Test Suite

  • test_custom_validation.py - Standalone validation tests (imports, registration, interface, models)
  • test_custom_integration.py - End-to-end browser integration tests with Playwright
  • test_browsergym_custom.py - Full pytest suite for CI/CD (8 test classes, 18+ tests)
  • demo_custom_env.py - Visual demonstration with headed browser

Code Cleanup

  • Removed redundant os imports from task methods
  • Fixed linting issues (removed emojis, casual language, f-string formatting)
  • Added documentation headers to all test files with usage instructions

Test Results

All validation and integration tests pass successfully:

PYTHONPATH=src python3 tests/envs/test_custom_validation.py
# ✅ PASS: Imports
# ✅ PASS: Task Registration  
# ✅ PASS: Task Interface
# ✅ PASS: Model Instantiation

PYTHONPATH=src python3 tests/envs/test_custom_integration.py
# ✅ PASS: Environment Creation
# ✅ PASS: HTML Task Loading
# ✅ PASS: Multi-tab Task
# ✅ PASS: Environment Reset/Step/State

The custom task system works correctly through the OpenEnv interface. The comprehensive pytest suite is ready for your CI/CD pipeline.

Let me know if there are any other changes needed!


Atchudhan Sreekanth added 2 commits January 24, 2026 16:47
- Implement custom task framework with base classes, models, and registry
- Add comprehensive test suite (validation, integration, pytest)
- Add RFC 005 documenting generic task support design
- Support custom browser tasks alongside official BrowserGym benchmarks
- Migrate to new package structure (openenv.core.env_server)
- Update data models to use Pydantic v2 BaseModel

Changes rebased onto restructured main branch.
- Fix import path: browsergym_env.models -> envs.browsergym_env.models
- Fix observation conversion: use actual error/last_action_error values
- Reduce test verbosity: remove excessive print statements
- All integration tests now pass with browser automation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants