Convert all files in a git repository to .txt files. This is useful for training LLMs on your codebase.
- Create a new .env file by copying example.env
cp example.env .env
- Set the necessary fields. The defaults are a good starting point.
GIT_PROJECT_DIRECTORY=/path/to/git/repo
IGNORE_FILES=.env,package-lock.json
IGNORE_DIRS=.git,.vscode,node_modules
SAVE_DIRECTORY=training_data
SKIP_EMPTY_FILES=true
- Install dependencies. Using a virtual environment is recommended.
python -m pip install -r requirements.txt
- Run the program
python main.py
- You'll see your data files in the training_data/ directory. This will be different if you changed the path via SAVE_DIRECTORY in the .env file.
- This program requires Python version 3.6 or later. It uses the f-string formatting technique introduced in Python 3.6.
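
To make the settings above concrete, here is a minimal sketch of how a converter could apply them. It is not the actual code in main.py. The helper names parse_list and convert_repo, their signatures, and the output naming scheme (path parts joined with underscores) are assumptions made for illustration.

```python
import os
from pathlib import Path


def parse_list(value):
    """Split a comma-separated setting such as IGNORE_DIRS into a set of names."""
    return {item.strip() for item in value.split(",") if item.strip()}


def convert_repo(project_dir, save_dir, ignore_files=(), ignore_dirs=(), skip_empty=True):
    """Copy every UTF-8 readable file under project_dir into save_dir as a .txt file.

    Hypothetical helper: the name, signature, and output naming scheme are
    assumptions for illustration, not the tool's actual implementation.
    """
    project_dir = Path(project_dir)
    save_dir = Path(save_dir)
    save_dir.mkdir(parents=True, exist_ok=True)
    written = []

    for root, dirs, files in os.walk(project_dir):
        # Prune ignored directories in place so os.walk never descends into them.
        dirs[:] = [d for d in dirs if d not in ignore_dirs]
        for name in files:
            if name in ignore_files:
                continue
            source = Path(root) / name
            try:
                text = source.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # Skip binary or unreadable files.
            if skip_empty and not text.strip():
                continue
            # Flatten the relative path into a single file name,
            # e.g. src/app.py becomes src_app.py.txt (assumed scheme).
            relative = source.relative_to(project_dir)
            target = save_dir / ("_".join(relative.parts) + ".txt")
            target.write_text(text, encoding="utf-8")
            written.append(target.name)
    return sorted(written)
```

With the example values, a call might look like convert_repo("/path/to/git/repo", "training_data", parse_list(".env,package-lock.json"), parse_list(".git,.vscode,node_modules"), True). The sketch assumes the save directory sits outside the repository being walked; otherwise previously written output files would be picked up on the next run.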