# GAIA Framework - Code Interpreter Tool

An open-source version of the flow used to create a basic Python code interpreter tool, similar to the one used within the currently unreleased `gaia-framework`, which can be run and tested locally using Docker.

This tool aims to provide basic functionality similar to the `code interpreter` tool that ChatGPT uses to execute Python code. In this example, it runs as a non-stateful Jupyter Notebook environment that executes Python code with internet access and allows files to be persisted to local, blob, or other storage locations depending on the setup. It can easily be extended to support stateful execution as needed, but is kept non-stateful here for simplicity.

**Models tested**:

- `GPT` models that support tool execution, though it works best with `gpt-4` models

- `claude-3` models that support tool execution using the new Tools Beta, though some additional prompting in the system instructions is required to get Claude to perform similarly to GPT
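
As an illustrative assumption only (not the framework's actual prompt), that additional system guidance for Claude might look something like:

```typescript
// Hypothetical system-instruction snippet for Claude; the wording and the
// code_interpreter tool name are illustrative assumptions
const claudeSystemHint =
  "You have access to a code_interpreter tool that executes Python code in a " +
  "Jupyter Notebook environment. When the user asks to run or test code, call " +
  "the tool rather than answering directly, and save any generated files to /mnt/data/.";
```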

**Examples using the tool**:

- Currently only an example with `GPT` using the [OpenAI Node API](https://www.npmjs.com/package/openai)

- See the [Running Examples](#running-examples) section

**How the tool works**:

An LLM such as GPT or Claude decides to call the code-interpreter tool and passes either generated or user-provided code as an argument. The tool then executes that code as a script in a short-lived Docker container, which is spun up only to run the script and removed afterwards.
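
The tool is exposed to the LLM as a function definition that the model can choose to call. A minimal sketch of such a definition in the OpenAI function-calling format is shown below; the `code_interpreter` name, description, and schema are illustrative assumptions, not the framework's exact definition:

```typescript
// Hypothetical tool definition in the OpenAI function-calling format;
// the name, description, and schema are illustrative assumptions
const codeInterpreterTool = {
  type: "function" as const,
  function: {
    name: "code_interpreter",
    description:
      "Executes the provided Python code in a sandboxed Jupyter Notebook " +
      "environment and returns the execution output",
    parameters: {
      type: "object",
      properties: {
        code: { type: "string", description: "The Python code to execute" },
      },
      required: ["code"],
    },
  },
};
```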

The executed script is written in Python and saved to a temp directory, which is mounted to the container on startup; the script is then run in a Jupyter Notebook environment like below:

```typescript
const imageName = 'jupyter-runtime';
const executionPath = `/app/${notebookName}.ipynb`;
const outputPath = `/app/${notebookName}_output.ipynb`;
const dockerCommand = [
  "docker run --rm", // Remove the container once execution finishes
  `-v "${tmpDir}:/app"`, // Temp directory containing the generated notebook
  `-v "${outputDir}:/mnt/data"`, // Used to save and persist user files & output
  imageName,
  `/bin/bash -c "xvfb-run -a jupyter nbconvert --to notebook --execute ${executionPath} --output ${outputPath} && cat ${outputPath}"`
].join(" ");
```
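
A minimal sketch of how that command might then be executed from Node, assuming only the built-in `child_process` module (the `runNotebook` helper is illustrative, not the framework's API). Hardening flags such as `--memory`, `--cpus`, or `--network none` could also be appended to the `docker run` portion to enforce the resource limits discussed under the security considerations below:

```typescript
import { exec } from "child_process";
import { promisify } from "util";

const execAsync = promisify(exec);

// Illustrative helper: runs the container and parses the executed notebook,
// which the command above prints to stdout via `cat ${outputPath}`
async function runNotebook(dockerCommand: string): Promise<any> {
  const { stdout } = await execAsync(dockerCommand, {
    maxBuffer: 64 * 1024 * 1024, // notebooks containing base64 images can be large
  });
  return JSON.parse(stdout); // .ipynb files are JSON
}
```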

- A Jupyter Notebook environment is used for executing the script because it provides a convenient and automated way to run Python code, retrieve outputs, and persist files using an LLM

  - Because .ipynb files are JSON and their outputs are structured after execution, we can use the notebook as an execution environment and parse its output to understand the results and perform post-execution processing

  - This allows use cases such as accessing persisted user files, image data, or other output data, similar to ChatGPT's code interpreter, and returning the results back to the LLM

    - Depending on the `output_type` of a notebook cell, we can access paths to persisted files such as `/mnt/data/test.txt` or raw base64 image data directly, which can then be retrieved and uploaded to blob storage or elsewhere (see the parsing sketch after this list)

- Docker must already be running on the machine where the code executes

  - Depending on the system/architecture where the tool is used, options like Azure Container Instances (ACI) or others can be used in place of Docker

  - One approach would be to have an orchestration service/tool that determines whether to use Docker, ACI, or another provider depending on a parameter or on local vs. production environment

- Any files persisted to `/mnt/data/` are shared with the output directory, and since this example is a non-stateful Jupyter environment, that output directory is used to retrieve and upload persisted items to external storage (or they can be accessed from the mounted output directory itself if accessible)

  - If the tool is changed to run in a stateful environment, when and how files persisted to `/mnt/data/` are uploaded to external storage can be adjusted as preferred

- **Security Considerations**:

  When implementing this tool in your own projects, consider the following security measures:

  - **Input Sanitization:** Ensure all user inputs are sanitized to prevent injection attacks

  - **Execution Environment:** Execute code within a secure, isolated sandbox environment

  - **Resource Limits:** Set strict limits on CPU, memory, and execution time to avoid system strain

  - **Feature Restrictions:** Disable unnecessary features to minimize potential attack surfaces

  - **Error Handling:** Configure error handling to avoid revealing sensitive information
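
As referenced in the list above, a minimal sketch of parsing the executed notebook's cells might look like the following. The field names come from the standard `.ipynb` (nbformat) JSON structure; the `collectOutputs` helper itself is illustrative, not the framework's exact implementation:

```typescript
// Shape of the relevant parts of an executed .ipynb file (standard nbformat JSON)
interface NotebookCell {
  cell_type: string;
  outputs?: Array<{
    output_type: string;        // "stream", "execute_result", "display_data", or "error"
    text?: string | string[];   // stream text, e.g. printed paths like /mnt/data/test.txt
    data?: Record<string, any>; // rich output, e.g. { "image/png": "<base64>", "text/plain": "..." }
  }>;
}

// Illustrative helper: walk the cells, collecting text results and base64 images
function collectOutputs(notebook: { cells: NotebookCell[] }) {
  const results: string[] = [];
  const images: string[] = []; // raw base64 image data, ready to upload to storage
  for (const cell of notebook.cells) {
    for (const output of cell.outputs ?? []) {
      if (output.output_type === "stream" && output.text) {
        results.push(Array.isArray(output.text) ? output.text.join("") : output.text);
      } else if (output.data?.["image/png"]) {
        images.push(output.data["image/png"]);
      } else if (output.data?.["text/plain"]) {
        const text = output.data["text/plain"];
        results.push(Array.isArray(text) ? text.join("") : text);
      }
    }
  }
  return { results, images };
}
```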

## Example flow

1. User prompts the LLM to create a script and execute it

2. LLM decides to call the tool

3. LLM executes the tool with the code as input, and a result is created based on the notebook response

4. (Optional) Before sending results to the LLM, parse the notebook output and upload any persisted files to local or external storage (see the sketch after this list)

    - Alternatively, the LLM can be prompted in its system instructions or user message to call another tool that uploads files persisted in the environment to local/external storage

      - This could be done using the mounted output folder or, if stateful, by accessing what's in `/mnt/data/` directly to upload

    - In this example, we parse the notebook output after execution and return the local output filePath

      - If we were to upload to blob storage instead of returning the output filePath, we'd return the blob URL generated from the upload instead

5. LLM receives and processes the results, then returns a response to the user, or tries to fix errors with the tool call (if any) with up to 3 retries

    - The *runTools(...)* method from the [OpenAI Node API](https://www.npmjs.com/package/openai) simplifies tool calling and feeding errors back to the model to fix, and is used in the `example_openai.ts` code along with system instructions for GPT

    - At this time, a custom feedback loop is needed when attempting this flow with Claude

6. User receives the response, including the link to the path where files were persisted (if any) or the results of the execution in general

    - If files were generated during the execution but weren't uploaded to local/external storage in an intermediary step, the response will include the inaccessible file path within the environment, such as `/mnt/data/test.txt`, instead of a publicly accessible URL like `someblobstorageurl.com/path/to/file/test.txt`.
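
For step 4, a minimal sketch of the optional upload pass, assuming only Node's built-in `fs` and `path` modules (the `collectPersistedFiles` helper is illustrative; a real implementation might upload each file to blob storage and return the resulting URLs instead of local paths):

```typescript
import * as fs from "fs";
import * as path from "path";

// Illustrative helper: anything the script wrote to /mnt/data inside the
// container shows up in the mounted output directory on the host
function collectPersistedFiles(outputDir: string): string[] {
  if (!fs.existsSync(outputDir)) return [];
  return fs
    .readdirSync(outputDir)
    .map((name) => path.join(outputDir, name)); // local filePaths returned to the LLM
}
```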

See [Example Run Outputs](#example-run-outputs)

## Running Examples

### Build the Image

For the tool to execute properly, the Docker image must be built first:

1. Start Docker

2. Navigate to the `environments/jupyter` folder in a terminal

3. Run the following command:

      ```bash
      docker build -t jupyter-runtime .
      ```

### Run Typescript Example

1. Navigate to `examples/typescript` in a terminal

2. Create a `.env` file based on the `.env.example` and add your value for the `OPENAI_API_KEY`

3. Run the following commands sequentially:

    ```bash
    yarn install
    yarn build
    yarn start
    ```

4. Enter a prompt that would make GPT choose the tool
   - e.g., `execute a python script to add two numbers together and show the result`

## Example Run Outputs

View [output examples](docs/output_examples.md) to see example run outputs using the tool with `.runTools(...)` from the [OpenAI Node API](https://www.npmjs.com/package/openai) for easy usage and handling of tool errors

- If there's an error in the tool call, returning a string of the error back as the tool response can enable GPT to try to fix errors on its own
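
As a minimal sketch of wiring the tool into `.runTools(...)` with the `openai` v4 Node SDK (the `executeCode` and `runCodeInterpreter` names are illustrative assumptions, not the exact code in `example_openai.ts`), note how errors are returned as the tool result so GPT can attempt fixes:

```typescript
import OpenAI from "openai";

// Hypothetical wrapper around the Docker execution flow described above
declare function runCodeInterpreter(code: string): Promise<string>;

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Illustrative tool body: return the execution result, or the error text
// so GPT can try to fix the code on a retry
async function executeCode(args: { code: string }): Promise<string> {
  try {
    return await runCodeInterpreter(args.code);
  } catch (err: any) {
    return `Error: ${err.message}`;
  }
}

const runner = openai.beta.chat.completions.runTools({
  model: "gpt-4",
  messages: [{ role: "user", content: "execute a python script to add two numbers" }],
  tools: [
    {
      type: "function",
      function: {
        function: executeCode,
        parse: JSON.parse, // parse the model's JSON arguments before calling executeCode
        description: "Executes Python code in a sandboxed Jupyter Notebook environment",
        parameters: {
          type: "object",
          properties: { code: { type: "string" } },
          required: ["code"],
        },
      },
    },
  ],
});

console.log(await runner.finalContent()); // final assistant response after tool calls
```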

## Ethical Use Guidelines

This open-source tool is provided with the intent to foster innovation and aid in development, particularly in educational, research, and development contexts. Users are urged to utilize the tool responsibly and ethically. Here are some guidelines to consider:

- **Responsible Usage**: Ensure that the use of this tool does not harm individuals or groups. This includes avoiding the processing or analysis of data in ways that infringe on privacy or propagate bias.

- **Prohibited Uses**: Do not use this tool for:
  - Illegal activities
  - Creating or spreading malware
  - Conducting surveillance or gathering sensitive data without consent
  - Activities that could cause harm, such as cyberbullying or online harassment

- **Transparency**: Users should be transparent about how scripts are generated and used, particularly when the outputs are shared publicly or used in decision-making processes.

- **Data Privacy**: Be mindful of data privacy laws and regulations. Ensure that any data used with this tool complies with relevant legal standards, such as GDPR in Europe, CCPA in California, etc.

- **Intellectual Property**: Respect the intellectual property rights of others. Ensure that all content processed by or generated with this tool does not violate copyrights or other intellectual property laws.

- **Quality Control**: Regularly review and test the code executed by this tool to ensure its accuracy and reliability, especially when used in critical or production environments.

## Reporting Issues

If you encounter any issues or bugs while using this tool, please report them via [GitHub Issues](https://github.com/gaia-framework-ai/code-interpreter-tool/issues).

## License

This project is licensed under the MIT license; see the [LICENSE](LICENSE) file included with the project.

## Contributions

Coming soon