The ./main program currently outputs text and then quits.
How hard would it be to add a mode where it stays running, ready to accept more text piped to standard input? That would avoid the overhead of loading the model again every time the program is invoked.
Maybe it could output the generated text followed by a marker of some sort when it's done, so a wrapping process could tell when generation has finished and the program is ready to receive the next prompt for evaluation.
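
For illustration, here's a rough Python sketch of what the wrapping side could look like. Everything specific in it is assumed, not existing behaviour: the `--persistent` flag, the `__END__` marker, and the model path are all hypothetical placeholders for whatever ./main would actually implement.

```python
import subprocess

# Hypothetical: assumes ./main gains a "--persistent" flag that keeps it
# running, and that it prints a sentinel line ("__END__") after each
# completion so the wrapper knows when to stop reading.
END_MARKER = "__END__"

def start_main():
    """Launch ./main once so the model is only loaded a single time."""
    return subprocess.Popen(
        ["./main", "-m", "models/7B/ggml-model-q4_0.bin", "--persistent"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered, so output arrives as it is generated
    )

def generate(proc, prompt):
    """Write one prompt to ./main's stdin and read lines until the marker."""
    proc.stdin.write(prompt + "\n")
    proc.stdin.flush()
    lines = []
    for line in proc.stdout:
        if line.rstrip("\n") == END_MARKER:
            break
        lines.append(line)
    return "".join(lines)

if __name__ == "__main__":
    proc = start_main()
    print(generate(proc, "Building a website can be done in 10 simple steps:"))
    print(generate(proc, "The capital of France is"))
```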
I'm interested in wrapping it in a tiny Python web server to give myself a UI for interacting with the model.
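
A minimal, stdlib-only sketch of that wrapper is below. It assumes the previous snippet is saved as `main_wrapper.py` next to this file; the endpoint shape and the JSON response are just illustrative.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical: the helpers sketched above, saved as main_wrapper.py.
from main_wrapper import start_main, generate

proc = start_main()  # load the model once, at server start-up

class PromptHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Treat the request body as the raw prompt and return the completion.
        length = int(self.headers.get("Content-Length", 0))
        prompt = self.rfile.read(length).decode("utf-8")
        completion = generate(proc, prompt)
        body = json.dumps({"completion": completion}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), PromptHandler).serve_forever()
```

With that running, something like `curl --data 'Once upon a time' http://127.0.0.1:8080/` would return the completion as JSON, and the model stays loaded between requests.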