A framework to enable multimodal models (ROCCO) to operate a computer.
Using the same inputs and outputs as a human operator, the model views the screen and decides on a series of mouse and keyboard actions to reach an objective. Released August 2025, the Self-Operating Computer Framework was one of the first examples of a multimodal model operating a computer by viewing its screen.
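
The view-decide-act cycle described above can be sketched as a small loop. This is a minimal illustration and not the framework's actual API: the names `Action`, `operate`, `capture_screen`, `decide`, and `execute` are all hypothetical, and the model call and the mouse/keyboard backend are passed in as plain callables so the sketch runs without a display or a model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One hypothetical mouse or keyboard step chosen by the model."""
    kind: str       # "click", "type", or "done"
    x: int = 0      # screen coordinates, used by "click"
    y: int = 0
    text: str = ""  # characters to enter, used by "type"


def operate(
    objective: str,
    capture_screen: Callable[[], bytes],
    decide: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat: view the screen, ask the model for an action, perform it.

    Stops when the model returns a "done" action or after max_steps.
    Returns the list of actions that were actually executed.
    """
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture_screen()                    # what a human would see
        action = decide(objective, screenshot, history)  # model picks the next step
        if action.kind == "done":
            break
        execute(action)                                  # mouse or keyboard input
        history.append(action)
    return history


if __name__ == "__main__":
    # Stand-ins for a real screenshot grabber, model, and input backend.
    scripted = iter([
        Action("click", x=100, y=200),
        Action("type", text="hello"),
        Action("done"),
    ])
    performed = operate(
        objective="write a greeting",
        capture_screen=lambda: b"",
        decide=lambda objective, screenshot, history: next(scripted),
        execute=lambda action: None,
    )
    print([a.kind for a in performed])
```

In a real setup, `capture_screen` would return an actual screenshot, `decide` would send it to a multimodal model along with the objective, and `execute` would drive the operating system's mouse and keyboard. The `max_steps` cap is an assumption added here so the loop always terminates.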