feat(grpo): add pluggable action format (DSL vs JSON)

The GRPO trainer hardcodes DSL format (CLICK/TYPE/WAIT/DONE) in prompt construction, parsing, and formatting. External RL training use cases need JSON format (`{"type": "click", "x": 0.461, "y": 0.021}`).

Introduce an `ActionCodec` protocol with `encode`/`decode`/`build_prompt` methods and DSL/JSON implementations.

**Affected functions:**
- `_build_agent_messages`
- `_parse_vlm_output_to_action`
- `_format_action_as_text`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(grpo): add pluggable action format (DSL vs JSON) #36

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

feat(grpo): add pluggable action format (DSL vs JSON) #36

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions