feat(grpo): extend action types (scroll, key, noop)

The GRPO DSL supports only CLICK, TYPE, WAIT, and DONE, while the schema (`episode.py`) defines 24 action types. RL training needs scroll, key, and noop at minimum.

`_format_action_as_text` silently converts unknown action types to DONE, so the gradient signal for those steps is lost. Extend the parser, formatter, and prompt to support the full action vocabulary.
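
A minimal sketch of the intended behavior, not the actual implementation: the formatter raises on an unknown type instead of falling back to DONE, and the parser round-trips the new actions. The `Action` class, the `format_action` and `parse_action` names, and the `SCROLL(direction="...")` / `KEY(key="...")` / `NOOP()` text syntax are all assumptions made for illustration; the real DSL and the schema in `episode.py` may differ.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    # Hypothetical stand-in for the schema's action class in episode.py.
    type: str
    direction: Optional[str] = None
    key: Optional[str] = None


def format_action(action: Action) -> str:
    """Render an action as DSL text; raise instead of silently emitting DONE."""
    kind = action.type.lower()
    if kind == "scroll":
        return f'SCROLL(direction="{action.direction}")'
    if kind == "key":
        return f'KEY(key="{action.key}")'
    if kind == "noop":
        return "NOOP()"
    if kind == "done":
        return "DONE()"
    # Failing loudly keeps a mislabeled DONE out of the training data.
    raise ValueError(f"unsupported action type: {action.type!r}")


# Assumed syntax: NAME() or NAME(arg="value"), with at most one argument.
_CALL = re.compile(r'^(SCROLL|KEY|NOOP|DONE)\((?:(\w+)="([^"]*)")?\)$')

# Which argument each action expects (None means no argument).
_EXPECTED_ARG = {"scroll": "direction", "key": "key", "noop": None, "done": None}


def parse_action(text: str) -> Action:
    """Parse DSL text back into an Action, rejecting malformed input."""
    match = _CALL.match(text.strip())
    if match is None:
        raise ValueError(f"unparseable action: {text!r}")
    name, arg, value = match.groups()
    kind = name.lower()
    if arg != _EXPECTED_ARG[kind]:
        raise ValueError(f"unexpected argument for {name}: {arg!r}")
    fields = {arg: value} if arg else {}
    return Action(type=kind, **fields)
```

With this shape, `parse_action(format_action(a))` returns an action equal to `a` for the supported types, and any type outside the vocabulary surfaces as a `ValueError` rather than a silent DONE.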