🧹 LLM Output Scrub

LLMs often ignore instructions to avoid smart quotes, EM/EN dashes, and other symbols. This macOS menu bar app combines spaCy NLP for context-aware processing with a rule-based system to scrub typographic characters from LLM (or any other) output.

See TODO.md for planned improvements.

CI | License | Python | macOS | Code style: black | Linting: flake8 | Type checking: mypy | Imports: isort

✨ Features

  • Menu Bar: Runs as a menu bar app
  • NLP Processing: Uses spaCy for context detection
  • Configurable: All character replacements can be customized via JSON config
    • Smart Quotes: Replaces “ ” ‘ ’ with straight quotes " '
    • Smart Dashes: Converts em dashes — and en dashes – to hyphens - with context-aware logic
    • Ellipsis: Replaces … with three dots ...
    • Symbols: Converts typographic symbols to ASCII equivalents
    • Unicode: Handles accented characters by removing diacritics
    • Various Others: Supports trademarks, fractions, mathematical symbols, currency, units, and more
  • Notifications: Shows success/error notifications
  • NLP Stats: Built-in performance monitoring and statistics

🚀 Quick Start

Option 1: Build Standalone App (Recommended)

# Clone the repository
git clone https://github.com/nisc/LLM-output-scrub.git
cd LLM-output-scrub

# Build and install the app
make build
make install

Option 2: Automated Development Setup

# Clone the repository
git clone https://github.com/nisc/LLM-output-scrub.git
cd LLM-output-scrub

# Set up environment (handles Python version compatibility and spaCy model)
make setup

# Run the app
make run

Option 3: Manual Development Setup

# Clone the repository
git clone https://github.com/nisc/LLM-output-scrub.git
cd LLM-output-scrub

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies (includes spaCy and English language model)
pip install -e ".[dev,build]"

# Run the app
PYTHONPATH=src python src/run_app.py

📖 Usage

  1. Copy LLM output with smart quotes or typographic characters
  2. Click the robot icon 🤖 in your menu bar
  3. Select "Scrub Clipboard" from the menu
  4. Paste anywhere - now with plain ASCII characters!

🧠 Advanced EM Dash Processing

The app uses spaCy's natural language processing for context-aware EM dash replacement:

NLP-Based Approach

The system uses spaCy's linguistic analysis instead of hardcoded wordlists:

  • Part-of-Speech (POS) Analysis: Identifies nouns, verbs, adjectives, etc.
  • Dependency Parsing: Understands grammatical relationships
  • Sentence Structure Analysis: Detects boundaries and context
  • Token-level Processing: Analyzes individual words and their roles

Context Detection

The system detects and handles these EM dash contexts:

  • Compound Words: self—driving → self-driving
  • Parenthetical/Appositive: text—additional info—more text → text, additional info, more text
  • Emphasis: The result—amazingly—was perfect → The result, amazingly, was perfect
  • Dialogue: "Hello"—she said → "Hello", she said
  • Conjunctions: A—or B → A, or B
  • Default Cases: simple—text → simple-text
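
The contexts listed above can be roughly approximated with regular expressions alone. The sketch below is not the app's spaCy-based logic; it is a simplified stand-in that reproduces the documented examples. The function name `replace_em_dashes` and the short conjunction list are assumptions made for illustration.

```python
import re

EM_DASH = "\u2014"
# Hypothetical short list for illustration; the app uses spaCy analysis, not wordlists.
CONJUNCTIONS = ("and", "or", "but", "nor", "so", "yet")
_CONJ = "|".join(CONJUNCTIONS)

# Two dashes enclosing a short aside within one sentence.
PAIRED = re.compile(rf"\s*{EM_DASH}\s*([^{EM_DASH}.!?\n]+?)\s*{EM_DASH}\s*")
# A dash immediately after a closing double quote.
AFTER_QUOTE = re.compile(rf'"\s*{EM_DASH}\s*')
# A dash immediately before a coordinating conjunction.
BEFORE_CONJUNCTION = re.compile(rf"\s*{EM_DASH}\s*(?=(?:{_CONJ})\b)")


def replace_em_dashes(text: str) -> str:
    """Regex-only approximation of the documented em dash contexts."""
    # Parenthetical/emphasis: the pair of dashes becomes a pair of commas.
    text = PAIRED.sub(lambda m: f", {m.group(1)}, ", text)
    # Dialogue: the dash after the quote becomes a comma.
    text = AFTER_QUOTE.sub('", ', text)
    # Conjunction: the dash before "or", "and", ... becomes a comma.
    text = BEFORE_CONJUNCTION.sub(", ", text)
    # Compound words and the default case: any remaining dash becomes a hyphen.
    return text.replace(EM_DASH, "-")
```

A pattern-based version like this cannot tell an aside from two unrelated dashes in a long sentence, which is the kind of ambiguity the POS and dependency analysis is meant to resolve.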

⚙️ Configuration

All settings can be managed via the app's menu:

  • Click the menu bar icon 🤖 and select "Configuration"
  • Toggle any setting or sub-setting by number
  • Restore defaults with option 0

A JSON config file is also stored at ~/.llm_output_scrub/config.json for advanced/manual editing.
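
For scripts that need to read this file, a minimal sketch of loading it with a fallback to defaults might look like the following. The key names in `DEFAULTS` are hypothetical placeholders; the real schema is defined by the app (see config_manager.py), so check an existing config.json before relying on any key.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".llm_output_scrub" / "config.json"

# Hypothetical keys for illustration only; the app defines the actual schema.
DEFAULTS = {
    "normalize_whitespace": True,
    "remove_non_ascii": False,
    "debug_mode": False,
}


def load_config(path: Path = CONFIG_PATH) -> dict:
    """Return DEFAULTS overlaid with whatever settings the file provides."""
    try:
        loaded = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing, unreadable, or malformed file: fall back to defaults.
        return dict(DEFAULTS)
    if not isinstance(loaded, dict):
        return dict(DEFAULTS)
    return {**DEFAULTS, **loaded}
```

Falling back to defaults on any read or parse error keeps a hand-edited file with a typo from breaking whatever consumes the settings.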

General Settings

| Setting | Effect |
| --- | --- |
| Decompose Unicode | Converts composed chars (é) to base + accent (e + ́) |
| Remove Accent Marks | Removes combining marks (e + ́ → e) |
| Remove All Non-ASCII | Removes any character not in standard ASCII |
| Clean Up Extra Spacing | Normalizes whitespace, trims excess, removes extra blank lines |
| Enable Debug Mode | Shows "NLP Stats" menu item for performance monitoring |
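
The first four settings map onto standard-library operations. The sketch below shows one plausible way to implement each using `unicodedata` and `re`; the function names are assumptions, and the app's own implementation may differ in detail.

```python
import re
import unicodedata


def decompose(text: str) -> str:
    """Split composed characters into base character + combining mark (NFD)."""
    return unicodedata.normalize("NFD", text)


def remove_accent_marks(text: str) -> str:
    """Drop combining marks, such as the U+0301 left behind by decompose()."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


def remove_non_ascii(text: str) -> str:
    """Drop every character outside the 7-bit ASCII range."""
    return text.encode("ascii", errors="ignore").decode("ascii")


def clean_spacing(text: str) -> str:
    """Collapse spaces/tabs, trim around newlines, and cap blank lines at one."""
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r" ?\n ?", "\n", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Order matters when combining them: removing accent marks only turns é into e if the text was decomposed first, and removing non-ASCII characters before decomposing would delete é entirely.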

Character Replacement Categories

| Category | Replacement |
| --- | --- |
| Smart Quotes | “ ” ‘ ’ → " ' |
| Em Dashes | — → - (context-aware, see below) |
| En Dashes | – → - |
| Ellipsis | … → ... |
| Angle Quotes | ‹ › « » → < > << >> |
| Trademarks | ™ ® → (TM) (R) |
| Mathematical | ≤ ≥ ≠ ≈ ± → <= >= != ~ +/- |
| Fractions | ¼ ½ ¾ → 1/4 1/2 3/4 |
| Footnotes | † ‡ → * ** |
| Units | × ÷ ‰ ‱ → * / per thousand per ten thousand |
| Currency | € £ ¥ ¢ → EUR GBP JPY cents |

Em Dashes (Contextual/NLP mode): When enabled (default), em dashes are replaced using spaCy NLP for context-aware output. When disabled, every em dash becomes a plain hyphen. Toggle this in the menu.
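
The rule-based part of the replacement categories above can be sketched as a single translation table. The mapping values come from the table; the names `REPLACEMENTS` and `scrub` are assumptions, and em dashes use the simple hyphen rule here rather than the contextual mode.

```python
# Character-to-ASCII mapping, one line per category from the table.
REPLACEMENTS = {
    "\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'",  # smart quotes
    "\u2014": "-",  # em dash (non-contextual fallback)
    "\u2013": "-",  # en dash
    "\u2026": "...",  # ellipsis
    "\u2039": "<", "\u203a": ">", "\u00ab": "<<", "\u00bb": ">>",  # angle quotes
    "\u2122": "(TM)", "\u00ae": "(R)",  # trademarks
    "\u2264": "<=", "\u2265": ">=", "\u2260": "!=", "\u2248": "~", "\u00b1": "+/-",
    "\u00bc": "1/4", "\u00bd": "1/2", "\u00be": "3/4",  # fractions
    "\u2020": "*", "\u2021": "**",  # footnotes
    "\u00d7": "*", "\u00f7": "/",  # units
    "\u2030": "per thousand", "\u2031": "per ten thousand",
    "\u20ac": "EUR", "\u00a3": "GBP", "\u00a5": "JPY", "\u00a2": "cents",  # currency
}

# str.maketrans accepts single-character keys mapped to strings of any length.
_TABLE = str.maketrans(REPLACEMENTS)


def scrub(text: str) -> str:
    """Replace every mapped typographic character in one pass."""
    return text.translate(_TABLE)
```

Because `str.translate` works one character at a time, disabling a category amounts to leaving its keys out of the dictionary before building the table.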

🛠️ Development and Testing

make setup       # Set up environment
make build       # Build the standalone macOS app
make install     # Install the app to /Applications
make run         # Run the app
make test-unit   # Unit tests
make test        # Integration tests
make clean       # Clean build artifacts
make distclean   # Remove all build artifacts and the virtual environment
make uninstall   # Remove the app from /Applications

Common Issues

  • Virtual environment issues: Run make clean-venv && make setup to recreate the environment.
  • Import errors: The app uses package-style imports. Run with make run or manually with PYTHONPATH=src python src/run_app.py.

Contributing

Follow existing code style, add tests for new features, and run make test-unit before submitting PRs.

📁 Project Structure

llm_output_scrub/
├── src/llm_output_scrub/     # Source code
│   ├── __init__.py           # Python init
│   ├── app.py                # Main application
│   ├── config_manager.py     # Configuration management
│   ├── nlp.py                # spaCy-based NLP processing
│   └── py.typed              # Type hints marker
├── src/run_app.py            # Entry point script
├── tests/                    # Test suite
│   ├── test_scrub.py         # Unit tests
│   ├── integration-test.sh   # Integration test script
│   └── input.txt             # Test input data
├── assets/                   # App assets (icons, spaCy model)
├── typings/                  # Type stubs (e.g., rumps.pyi)
├── pyproject.toml            # Project configuration & dependencies
├── setup.py                  # py2app build configuration
├── Makefile                  # Build commands
├── TODO.md                   # Development roadmap
└── LICENSE                   # MIT license

📦 Dependencies

Key dependencies: rumps (menu bar), pyperclip (clipboard), spacy (NLP), py2app (bundling). See pyproject.toml for full list.

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.
