A desktop app that scrapes websites, clones Git repositories, or packages local files into a single file optimized for consumption by LLMs.
- Web Crawling: Scrape a website, convert pages to Markdown, and package them into one file.
- Git Repository Cloning: Enter a Git URL to automatically clone the repository and switch to local packaging mode.
- Local Packaging: Package a local directory (e.g., a code repository) into a single file.
- Multiple Output Formats: Package files as .md, .txt, or .xml.
- Smart Filtering: Automatically respects .gitignore rules and provides an option to hide common binary and image files from the list.
- Customizable: Scraping options (depth, paths, speed) and file exclusions can be configured.
- External Configuration: Key settings can be modified in a config.json file.
- Cross-Platform: Light and Dark theme support (detects system theme on Windows, macOS, and Linux).
- Install a Web Browser: The web crawling feature requires one of the following browsers to be installed:
  - Microsoft Edge
  - Google Chrome
  - Mozilla Firefox
- Install Git: The Git repository cloning feature requires git to be installed and accessible in your system's PATH.
- Clone the repository or download the source code.
- It is recommended to create a virtual environment:
    python -m venv venv
    # On Windows
    venv\Scripts\activate
    # On macOS/Linux
    source venv/bin/activate
- Install the required dependencies inside the virtual environment:
    pip install -r requirements.txt
- Run the application:
    python app.py
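
Before launching the app, it can help to confirm that the prerequisites above are actually reachable. The following is a minimal sketch, not part of the project: it uses `shutil.which` to look for `git` and for a few browser executables on PATH. The browser executable names are assumptions and vary by platform; on Windows and macOS a browser is often installed without being on PATH, so a miss here does not prove the browser is absent.

```python
import shutil

# Hypothetical executable names; real names differ across platforms and installs.
BROWSER_CANDIDATES = [
    "msedge", "microsoft-edge",
    "google-chrome", "chrome", "chromium",
    "firefox",
]


def find_on_path(names):
    """Return {name: resolved_path} for every name that resolves on PATH."""
    found = {}
    for name in names:
        path = shutil.which(name)
        if path is not None:
            found[name] = path
    return found


def check_prerequisites():
    """Report where git was found (or None) and which browsers are on PATH."""
    return {
        "git": shutil.which("git"),
        "browsers": find_on_path(BROWSER_CANDIDATES),
    }


if __name__ == "__main__":
    report = check_prerequisites()
    print("git:", report["git"] or "not found on PATH")
    if report["browsers"]:
        for name, path in report["browsers"].items():
            print(f"browser {name}: {path}")
    else:
        print("no browser executable found on PATH (it may still be installed)")
```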
The application has two main modes, selectable via radio buttons.
Web Crawl mode is for scraping online content or cloning Git repositories.
- Select the "Web Crawl" radio button.
- Enter the Start URL.
  - For a website, enter the full URL to begin scraping.
  - For a Git repository, enter the repository's clone URL (e.g., https://github.com/user/repo.git). The app will detect it, clone the repo, and switch to Local Directory mode.
- Adjust the crawling options as needed (these are ignored for Git URLs).
- Click "Download & Convert".
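
How the app tells a Git clone URL from an ordinary website is not documented here. As an illustration only, the sketch below shows one plausible heuristic: treat SSH-style remotes and HTTP(S) URLs whose path ends in `.git` as repositories. The function names `looks_like_git_url` and `clone_repo` are hypothetical, and the shallow clone (`--depth 1`) is an assumption rather than the app's confirmed behavior.

```python
import subprocess
from urllib.parse import urlparse


def looks_like_git_url(url: str) -> bool:
    """Heuristic check: SSH-style remotes, or http(s) URLs whose path ends in .git."""
    url = url.strip()
    if url.startswith(("git@", "ssh://", "git://")):
        return True
    parsed = urlparse(url)
    return (
        parsed.scheme in ("http", "https")
        and parsed.path.rstrip("/").endswith(".git")
    )


def clone_repo(url: str, dest: str) -> None:
    """Shallow-clone url into dest; requires git on PATH, raises on failure."""
    subprocess.run(["git", "clone", "--depth", "1", url, dest], check=True)


if __name__ == "__main__":
    for candidate in (
        "https://github.com/user/repo.git",
        "https://example.com/docs/",
    ):
        kind = "git repository" if looks_like_git_url(candidate) else "website"
        print(candidate, "->", kind)
```

A heuristic like this misses repositories whose clone URL has no `.git` suffix, so a real implementation might need an additional check.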
Local Directory mode is for packaging local files.
- Select the "Local Directory" radio button.
- Choose the Input Directory you want to package.
- Use the Excludes text area to list any files or directories you wish to exclude. These patterns are combined with the rules in your .gitignore file.
- Use the checkboxes to include subdirectories or hide common binary and image files from the list.
- Click "Package". The application will package all visible files into a single file in your Downloads folder, using the format selected in the output dropdown.
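
The filtering and packaging steps above can be sketched roughly as follows. This is a simplified illustration, not the app's implementation: it matches exclude patterns with `fnmatch` globs, which only approximates real .gitignore semantics (no negation, no anchoring), and it writes a plain Markdown layout of its own choosing. The function names `is_excluded`, `collect_files`, and `package_as_markdown` are hypothetical.

```python
import fnmatch
from pathlib import Path


def is_excluded(rel_path, patterns):
    """True if the relative path, or any single component of it, matches a pattern."""
    parts = rel_path.split("/")
    for pattern in patterns:
        pattern = pattern.rstrip("/")  # "node_modules/" matches the directory name
        if not pattern:
            continue
        if fnmatch.fnmatch(rel_path, pattern):
            return True
        if any(fnmatch.fnmatch(part, pattern) for part in parts):
            return True
    return False


def collect_files(root, patterns, include_subdirs=True):
    """Return sorted POSIX-style relative paths of files that survive the excludes."""
    root = Path(root)
    candidates = root.rglob("*") if include_subdirs else root.glob("*")
    kept = []
    for path in sorted(candidates):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if not is_excluded(rel, patterns):
            kept.append(rel)
    return kept


def package_as_markdown(root, rel_paths):
    """Concatenate the files into one Markdown string, one heading per file."""
    root = Path(root)
    chunks = []
    for rel in rel_paths:
        text = (root / rel).read_text(encoding="utf-8", errors="replace")
        chunks.append(f"## {rel}\n\n{text}\n")
    return "\n".join(chunks)


if __name__ == "__main__":
    excludes = ["*.png", "*.jpg", "node_modules/", ".git/"]
    files = collect_files(".", excludes)
    print(f"{len(files)} files would be packaged")
```

Matching each path component separately is what lets a directory pattern such as `node_modules/` exclude everything beneath it.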
On first run, the application creates a config.json file in the same directory. You can edit this file to customize:
- user_agents: The list of user-agents available in the dropdown.
- default_local_excludes: The default patterns that appear in the "Excludes" text box.
- binary_file_patterns: The list of file patterns to hide when "Hide Images + Binaries" is checked.
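
To make the shape of the file concrete, here is a minimal sketch of a create-on-first-run loader. The three key names come from the list above; the values in `DEFAULT_CONFIG` are placeholders and not the application's real defaults, and the `load_config` function is hypothetical.

```python
import json
from pathlib import Path

# Key names are from the README; the values are illustrative placeholders only.
DEFAULT_CONFIG = {
    "user_agents": ["ExampleBot/1.0 (placeholder)"],
    "default_local_excludes": [".git/", "node_modules/"],
    "binary_file_patterns": ["*.png", "*.jpg", "*.exe"],
}


def load_config(path):
    """Load the config file, writing the defaults to disk if it does not exist yet."""
    path = Path(path)
    if not path.exists():
        path.write_text(json.dumps(DEFAULT_CONFIG, indent=2), encoding="utf-8")
        return dict(DEFAULT_CONFIG)
    loaded = json.loads(path.read_text(encoding="utf-8"))
    # Keys missing from the user's file fall back to the defaults.
    return {**DEFAULT_CONFIG, **loaded}


if __name__ == "__main__":
    config = load_config("config.json")
    print(json.dumps(config, indent=2))
```

Merging the loaded values over the defaults means a user can keep only the keys they care about in the file.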
You can create a standalone executable using PyInstaller.
- Install PyInstaller:
    pip install pyinstaller
- The repository includes a pre-configured ContextPacker.spec file and a runtime hook (pyi_rth_selenium.py) to correctly handle Selenium's dependencies.
- Run the build command from the project root:
    pyinstaller --clean ContextPacker.spec
- The final single-file executable will be located in the dist folder.
- UI: Python with wxPython
- Web Scraping: Selenium + Beautiful Soup
- HTML to Markdown: markdownify
- File Packaging: repomix