LLMs often ignore instructions to avoid smart quotes, em/en dashes, and other typographic symbols. This macOS menu bar app combines spaCy NLP for context-aware processing with a rule-based system to scrub typographic characters from LLM (or any other) output.
See TODO.md for planned improvements.
- Menu Bar: Runs as a menu bar app
- NLP Processing: Uses spaCy for context detection
- Configurable: All character replacements can be customized via JSON config
- Smart Quotes: Replaces “ ” ‘ ’ with straight quotes " and '
- Smart Dashes: Converts em dashes (—) and en dashes (–) to hyphens (-) with context-aware logic
- Ellipsis: Replaces … with three dots (...)
- Symbols: Converts typographic symbols to ASCII equivalents
- Unicode: Handles accented characters by removing diacritics
- Various Others: Supports trademarks, fractions, mathematical symbols, currency, units, and more
- Notifications: Shows success/error notifications
- NLP Stats: Built-in performance monitoring and statistics
# Clone the repository
git clone https://github.com/nisc/LLM-output-scrub.git
cd LLM-output-scrub
# Build and install the app
make build
make install
# Clone the repository
git clone https://github.com/nisc/LLM-output-scrub.git
cd LLM-output-scrub
# Set up environment (handles Python version compatibility and spaCy model)
make setup
# Run the app
make run
# Clone the repository
git clone https://github.com/nisc/LLM-output-scrub.git
cd LLM-output-scrub
# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install dependencies (includes spaCy and English language model)
pip install -e ".[dev,build]"
# Run the app
PYTHONPATH=src python src/run_app.py
- Copy LLM output with smart quotes or typographic characters
- Click the robot icon 🤖 in your menu bar
- Select "Scrub Clipboard" from the menu
- Paste anywhere - now with plain ASCII characters!
The app uses spaCy's natural language processing for context-aware em dash replacement. It relies on spaCy's linguistic analysis instead of hardcoded wordlists:
- Part-of-Speech (POS) Analysis: Identifies nouns, verbs, adjectives, etc.
- Dependency Parsing: Understands grammatical relationships
- Sentence Structure Analysis: Detects boundaries and context
- Token-level Processing: Analyzes individual words and their roles
The system detects and handles these em dash contexts:
- Compound Words: self—driving → self-driving
- Parenthetical/Appositive: text—additional info—more text → text, additional info, more text
- Emphasis: The result—amazingly—was perfect → The result, amazingly, was perfect
- Dialogue: "Hello"—she said → "Hello", she said
- Conjunctions: A—or B → A, or B
- Default Cases: simple—text → simple-text
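The mappings above can be approximated with a few plain rules. The sketch below is a simplified stand-in and not the app's spaCy-based logic: the function name `replace_em_dashes`, the conjunction list, and the "exactly two dashes means a parenthetical" heuristic are all assumptions made for illustration. It expects a single sentence as input.

```python
import re

EM_DASH = "\u2014"
# Hypothetical conjunction list; the real app derives this from spaCy's analysis.
CONJUNCTIONS = ("and", "or", "but", "nor", "yet", "so")


def replace_em_dashes(sentence: str) -> str:
    """Replace em dashes in one sentence using simple, hand-written rules."""
    # Two dashes in one sentence are treated as a parenthetical or emphasis pair.
    if sentence.count(EM_DASH) == 2:
        return sentence.replace(EM_DASH, ", ")

    def pick(match: re.Match) -> str:
        before = sentence[: match.start()]
        after = sentence[match.end():]
        # Dialogue: the dash directly follows a closing quote.
        if before.endswith(('"', "'")):
            return ", "
        # Conjunction: the next word is "or", "and", and so on.
        next_word = re.match(r"\s*(\w+)", after)
        if next_word and next_word.group(1).lower() in CONJUNCTIONS:
            return ", "
        # Compound words and the default case both become a hyphen.
        return "-"

    return re.sub(EM_DASH, pick, sentence)
```

Rules like these misfire on text that a parser would handle correctly (for example, a sentence with two unrelated dashes), which is the motivation for using part-of-speech and dependency information instead.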
All settings can be managed via the app's menu:
- Click the menu bar icon 🤖 and select "Configuration"
- Toggle any setting or sub-setting by number
- Restore defaults with option 0
A JSON config file is also stored at ~/.llm_output_scrub/config.json for advanced/manual editing.
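To inspect the file before editing it by hand, something like the following may help. It is a minimal sketch: only the file path comes from this README, while the helper name `load_config` and its fallback behavior are assumptions, and no particular key names are assumed.

```python
import json
from pathlib import Path

# Path taken from the README; everything else here is illustrative.
CONFIG_PATH = Path.home() / ".llm_output_scrub" / "config.json"


def load_config(path: Path = CONFIG_PATH) -> dict:
    """Return the parsed config, or an empty dict if it is missing or invalid."""
    try:
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
    # Only a JSON object makes sense as a settings container.
    return data if isinstance(data, dict) else {}
```

Calling `sorted(load_config())` lists the top-level setting names found in the file, which is a quick way to see what can be edited.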
Setting | Effect |
---|---|
Decompose Unicode | Converts composed chars (é) to base + accent (e + ́) |
Remove Accent Marks | Removes combining marks (e + ́ → e) |
Remove All Non-ASCII | Removes any character not in standard ASCII |
Clean Up Extra Spacing | Normalizes whitespace, trims excess, removes extra blank lines |
Enable Debug Mode | Shows "NLP Stats" menu item for performance monitoring |
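
The first four settings correspond to standard text operations. The sketch below shows one way each could be done with the Python standard library; the function names are hypothetical and the app's actual implementation may differ.

```python
import re
import unicodedata


def decompose(text: str) -> str:
    """Split composed characters into base character plus combining marks (NFD)."""
    return unicodedata.normalize("NFD", text)


def remove_accent_marks(text: str) -> str:
    """Drop combining marks; only has an effect on decomposed text."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


def remove_non_ascii(text: str) -> str:
    """Discard every character outside the ASCII range."""
    return text.encode("ascii", "ignore").decode("ascii")


def clean_spacing(text: str) -> str:
    """Collapse runs of spaces/tabs, limit blank lines to one, trim the ends."""
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Order matters here: `remove_accent_marks(decompose("café"))` yields `cafe`, whereas removing non-ASCII characters first would delete the `é` entirely.
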
Category | Replacement |
---|---|
Smart Quotes | “ ” ‘ ’ → " ' |
Em Dashes | — → - (context-aware, see below) |
En Dashes | – → - |
Ellipsis | … → ... |
Angle Quotes | ‹ › « » → < > << >> |
Trademarks | ™ ® → (TM) (R) |
Mathematical | ≤ ≥ ≠ ≈ ± → <= >= != ~ +/- |
Fractions | ¼ ½ ¾ → 1/4 1/2 3/4 |
Footnotes | † ‡ → * ** |
Units | × ÷ ‰ ‱ → * / per thousand per ten thousand |
Currency | € £ ¥ ¢ → EUR GBP JPY cents |
Em Dashes → Contextual/NLP mode: When enabled (default), em dashes are replaced using spaCy NLP for context-aware output. When disabled, every em dash becomes a plain hyphen. Toggle this in the menu.
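
As an illustration of the rule-based side, a subset of the table can be expressed as a character translation map. This is a sketch only: the names `REPLACEMENTS` and `scrub` are hypothetical, the map covers just a few rows, and the em dash entry reflects the behavior with contextual mode turned off.

```python
# A few rows from the replacement table, keyed by Unicode code point.
REPLACEMENTS = {
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote
    "\u2014": "-",    # em dash (simple mode, no NLP)
    "\u2013": "-",    # en dash
    "\u2026": "...",  # ellipsis
    "\u2122": "(TM)",
    "\u00ae": "(R)",
    "\u2264": "<=",
    "\u2265": ">=",
    "\u2260": "!=",
    "\u00bd": "1/2",
    "\u20ac": "EUR",
}

# str.maketrans accepts single-character keys mapped to strings of any length.
TABLE = str.maketrans(REPLACEMENTS)


def scrub(text: str) -> str:
    """Apply the fixed character replacements in a single pass."""
    return text.translate(TABLE)
```

A translation table keeps the replacements data-driven, which fits well with loading them from a JSON config.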
make setup # Set up environment
make build # Build the standalone macOS app
make install # Install the app to /Applications
make run # Run the app
make test-unit # Unit tests
make test # Integration tests
make clean # Clean build artifacts
make distclean # Remove all build artifacts and the virtual environment
make uninstall # Remove the app from /Applications
- Virtual environment issues: Run make clean-venv && make setup to recreate the environment.
- Import errors: The app uses package-style imports. Run with make run or manually with PYTHONPATH=src python src/run_app.py.
Follow existing code style, add tests for new features, and run make test-unit before submitting PRs.
llm_output_scrub/
├── src/llm_output_scrub/      # Source code
│   ├── __init__.py            # Python init
│   ├── app.py                 # Main application
│   ├── config_manager.py      # Configuration management
│   ├── nlp.py                 # spaCy-based NLP processing
│   └── py.typed               # Type hints marker
├── src/run_app.py             # Entry point script
├── tests/                     # Test suite
│   ├── test_scrub.py          # Unit tests
│   ├── integration-test.sh    # Integration test script
│   └── input.txt              # Test input data
├── assets/                    # App assets (icons, spaCy model)
├── typings/                   # Type stubs (e.g., rumps.pyi)
├── pyproject.toml             # Project configuration & dependencies
├── setup.py                   # py2app build configuration
├── Makefile                   # Build commands
├── TODO.md                    # Development roadmap
└── LICENSE                    # MIT license
Key dependencies: rumps (menu bar), pyperclip (clipboard), spacy (NLP), py2app (bundling). See pyproject.toml for the full list.
This project is licensed under the MIT License - see the LICENSE file for details.