The GRPO trainer hardcodes the DSL action format (CLICK/TYPE/WAIT/DONE) in prompt construction, output parsing, and action formatting. External RL training use cases need a JSON action format, e.g. {"type": "click", "x": 0.461, "y": 0.021}.
Introduce an ActionCodec protocol with encode/decode/build_prompt methods and DSL/JSON implementations.
Affected functions:
_build_agent_messages
_parse_vlm_output_to_action
_format_action_as_text
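
A minimal sketch of what the proposed ActionCodec could look like. Only the ActionCodec name and the encode/decode/build_prompt method names come from the description above; the class names JsonActionCodec and DslActionCodec, the dict-based action shape, the method signatures, and the concrete DSL syntax (CLICK(x=..., y=...), TYPE(text="..."), WAIT(), DONE()) are assumptions made for illustration and may not match the trainer's actual DSL.

```python
import json
import re
from typing import Any, Dict, Protocol

# Assumed action shape: a plain dict such as {"type": "click", "x": 0.461, "y": 0.021}.
Action = Dict[str, Any]


class ActionCodec(Protocol):
    """Hypothetical signatures; only the three method names come from the proposal."""

    def encode(self, action: Action) -> str: ...
    def decode(self, text: str) -> Action: ...
    def build_prompt(self, instruction: str) -> str: ...


class JsonActionCodec:
    def encode(self, action: Action) -> str:
        return json.dumps(action)

    def decode(self, text: str) -> Action:
        # Take the outermost {...} span so surrounding model chatter is ignored.
        match = re.search(r"\{.*\}", text, re.DOTALL)
        if match is None:
            raise ValueError(f"no JSON object in model output: {text!r}")
        action = json.loads(match.group(0))
        if "type" not in action:
            raise ValueError("JSON action is missing the 'type' field")
        return action

    def build_prompt(self, instruction: str) -> str:
        example = '{"type": "click", "x": 0.461, "y": 0.021}'
        return f"{instruction}\nRespond with one JSON action, e.g. {example}"


class DslActionCodec:
    # Assumed DSL syntax; the real trainer's DSL may differ.
    _CALL = re.compile(r"(CLICK|TYPE|WAIT|DONE)\((.*)\)", re.DOTALL)

    def encode(self, action: Action) -> str:
        kind = action["type"].upper()
        if kind == "CLICK":
            return f"CLICK(x={action['x']}, y={action['y']})"
        if kind == "TYPE":
            # json.dumps gives a quoted, escaped string literal.
            return f"TYPE(text={json.dumps(action['text'])})"
        if kind in ("WAIT", "DONE"):
            return f"{kind}()"
        raise ValueError(f"unsupported action type: {action['type']!r}")

    def decode(self, text: str) -> Action:
        call = self._CALL.search(text)
        if call is None:
            raise ValueError(f"no DSL action in model output: {text!r}")
        kind, args = call.group(1), call.group(2)
        if kind == "CLICK":
            xy = re.fullmatch(r"\s*x=([\d.]+),\s*y=([\d.]+)\s*", args)
            if xy is None:
                raise ValueError(f"malformed CLICK arguments: {args!r}")
            return {"type": "click", "x": float(xy.group(1)), "y": float(xy.group(2))}
        if kind == "TYPE":
            quoted = re.fullmatch(r'\s*text=(".*")\s*', args, re.DOTALL)
            if quoted is None:
                raise ValueError(f"malformed TYPE arguments: {args!r}")
            return {"type": "type", "text": json.loads(quoted.group(1))}
        return {"type": kind.lower()}

    def build_prompt(self, instruction: str) -> str:
        return f"{instruction}\nRespond with one action: CLICK, TYPE, WAIT, or DONE."
```

Under this sketch, _build_agent_messages would presumably delegate to build_prompt, _parse_vlm_output_to_action to decode, and _format_action_as_text to encode, with the codec passed in by the caller so the trainer no longer depends on a specific format.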