🐥 devlooper

devlooper is a program synthesis agent that autonomously fixes its output by running tests!

Here's devlooper in action, taking 11 iterations to create a Python library that generates Voronoi diagrams:

[devlooper demo GIF]

⚙️ How it works

This project extends smol developer by giving it access to a sandbox to run tests in. The agent iterates until all tests pass, updating the code and fixing the environment (e.g. installing missing packages) along the way.
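To make the shape of that loop concrete, here is a minimal sketch in Python. It is illustrative only: the helper names (run_tests, fix) are hypothetical stand-ins, not devlooper's actual functions.

```python
# Illustrative sketch of the outer loop; run_tests and fix are hypothetical
# stand-ins, not devlooper's actual functions.
from typing import Callable

def debug_loop(
    run_tests: Callable[[], tuple[int, str, str]],  # returns (exit_code, stdout, stderr)
    fix: Callable[[str, str], None],                # update the code and/or the environment
    max_iterations: int = 10,
) -> bool:
    for _ in range(max_iterations):
        exit_code, stdout, stderr = run_tests()
        if exit_code == 0:
            return True          # all tests pass, we're done
        fix(stdout, stderr)      # let the LLM diagnose the failure and apply changes
    return False                 # still failing after max_iterations
```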

📦 Environment templates

The project uses environment "templates" to define the basic setup and test harness for a given language/framework. For now, three templates are provided:

  • React + Jest
  • Python
  • Rust

However, any language/framework should work, as long as it can be installed within a container. Contributions for more templates are welcome (see env_templates.py).
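As a rough illustration, a template mostly needs to capture the packages baked into the sandbox image and the command that runs the tests. The sketch below is hypothetical; the actual definitions live in env_templates.py and may look different.

```python
# Hypothetical sketch of an environment template; see env_templates.py for
# the real definitions, which may differ.
from dataclasses import dataclass

@dataclass
class EnvTemplate:
    name: str
    packages: list[str]     # packages baked into the sandbox image
    test_command: str       # command the debug loop runs every iteration

PYTHON_TEMPLATE = EnvTemplate(
    name="python",
    packages=["pytest"],
    test_command="pytest -q",
)
```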

🏖️ Sandbox

We use Modal's new Sandbox primitive to run tests in an isolated environment and fetch the output. The Sandbox also lets us construct the image incrementally, similar to building up a Dockerfile in layers that are cached.
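For illustration, here is roughly what running a test command in a Sandbox and reading its output looks like. This is a hedged sketch based on Modal's documented Sandbox API rather than code from this repo, and the exact API surface may differ between Modal versions; in devlooper the generated project also has to be added to the image before the tests can run.

```python
import modal

# Hedged sketch: run a test command in a Modal Sandbox and read its output.
# Based on Modal's documented Sandbox API; details may vary by Modal version.
app = modal.App.lookup("devlooper-example", create_if_missing=True)

# Images are built up in cached layers, so installing another package later
# only adds a new layer on top (much like a Dockerfile).
image = modal.Image.debian_slim().pip_install("pytest")

sb = modal.Sandbox.create("pytest", "-q", image=image, app=app)
sb.wait()
print(sb.returncode)      # a non-zero exit code kicks off the debug loop
print(sb.stdout.read())
print(sb.stderr.read())
```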

🤖 Debug loop

In each iteration, the agent runs the test command for the environment. If a non-zero exit code is received, the agent passes the stdout and stderr from the sandbox to the LLM to diagnose the error. This diagnosis is used in a separate step to generate a DebugPlan consisting of three types of actions:

  1. Inspect and fix a file
  2. Install a package in the image
  3. Run commands in the image

More types of actions can be implemented pretty easily — once again, contributions are welcome!

Running the diagnosis as a separate step (rather than having the model predict the DebugPlan directly) seems to boost accuracy by quite a bit. We suspect the benefit is similar to why chain-of-thought prompting works so well.
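For reference, here is a hypothetical sketch of what a DebugPlan with those three action types could look like as plain dataclasses; the actual classes in this repo may be named and structured differently.

```python
# Hypothetical sketch of a DebugPlan; the real classes in this repo may differ.
from dataclasses import dataclass, field

@dataclass
class FileFix:
    path: str
    reason: str                 # the diagnosed problem with this file

@dataclass
class PackageInstall:
    package: str                # package to install into the image

@dataclass
class RunCommand:
    command: str                # shell command to run in the image

@dataclass
class DebugPlan:
    file_fixes: list[FileFix] = field(default_factory=list)
    package_installs: list[PackageInstall] = field(default_factory=list)
    commands: list[RunCommand] = field(default_factory=list)
```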

🧑‍🚀 Usage

Set up

  • Create a Modal account (reach out to us if you are still on the waitlist!)
  • Install modal in your current Python environment:

    pip install modal

  • Create a Modal token:

    modal token new

Generate!

You're ready to generate! From the root directory of this repo, modal run the program with your choice of prompt and template:

modal run src.main --prompt="a simple 2D graphics library" --template="rust"
modal run src.main --prompt="a todo-list app" --template="react"
modal run src.main --prompt="a webscraper that checks if there are new reservations for a given restaurant on Resy" --template="python"

Once all tests pass, the output will be written to output/ in the same directory by default. This can be overridden using --output-path.

✨ Showcase

Coming soon

🔮 Future directions

This project is mostly a proof of concept, and there are a lot of cool additions that would make it better. Here are some ideas:

  • Allowing feedback from users in the loop, or accepting an existing project + plan as input and making suggested changes to it.
  • Making the debugging prompt better with relevant parts of the code, retrieved using embeddings.
  • Fetching the documentation for a package when needed.
  • Including previous edits in the prompt to prevent the model from going into a loop.
  • Synthesizing EnvTemplates from scratch.
  • Generalizing this to more LLMs, including open-source ones!
