I have been working on an AI benchmark for solving VimGolf challenges.
It is available here:
https://github.com/James4Ever0/vimgolf-gym
With my library, you can perform any action in the terminal, get a real-time text dump of the entire terminal screen, read the cursor position, and take a screenshot of the terminal.
There is also a primitive CTF library that is highly extensible and environment-independent. It can be used to train terminal and GUI agents with reinforcement learning.
View it here: https://github.com/James4Ever0/agi_computer_control/blob/master/gym-primitives%2Fctf%2FREADME.md
If you want your agents to earn real money, you can check out Cybergod:
https://github.com/James4Ever0/agi_computer_control/blob/master/gym-primitives%2Fcybergod%2FREADME.md
There is also a WebUI-based recorder for local and remote terminal/GUI operations:
https://github.com/James4Ever0/agi_computer_control/blob/master/web_gui_terminal_recorder%2FREADME.md
Your work is inspiring, and I plan to integrate it into my project. I hope my work inspires you in the same way.