Normalizes Unicode to ASCII equivalents and removes Unicode from AI-generated text produced by ChatGPT, Anthropic, Google, and more.

UnicodeFix

Normalizes Unicode to ASCII equivalents.

I'm getting this out quickly because people need it. More polish will follow soon.

Installation

Clone the repository somewhere on your system. You will need to pop open a terminal window to do this.

Then copy and paste the following commands into the terminal:

git clone https://github.com/unixwzrd/UnicodeFix.git
cd UnicodeFix
bash setup.sh

Setup will create a virtual environment to keep your system Python clean. I also maintain a separate Virtual Environment Utilities repo. It is likely overkill for most people, but it contains many useful utilities and tools for managing Python virtual environments with Pip and Conda, along with other handy tools for AI and Unix.

It will also add the items needed to start the script into your .bashrc.

Look at the setup.sh file to see exactly what it does if you like — it's very simple.

The .bashrc items are necessary because I have a Shortcut you may use from the macOS context menu to run the script directly.
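
Since the script is meant to run from the virtual environment that setup creates, it can help to confirm which interpreter is active before running it. The following is a minimal sketch, not part of this repo; the function name `in_virtual_env` is hypothetical, and the Conda check assumes Conda exports `CONDA_PREFIX` for the active environment (which also holds for the base environment).

```python
import os
import sys


def in_virtual_env() -> bool:
    """Return True if the current interpreter appears to run in an isolated environment."""
    # venv and virtualenv point sys.prefix at the environment directory,
    # while sys.base_prefix keeps pointing at the system installation.
    if sys.prefix != getattr(sys, "base_prefix", sys.prefix):
        return True
    # Assumption: an activated Conda environment exports CONDA_PREFIX.
    return bool(os.environ.get("CONDA_PREFIX"))


# Example use: warn before installing or running anything.
# if not in_virtual_env():
#     print("Warning: no virtual environment appears to be active.")
```

If this reports no environment, activate the one created by setup.sh before running the script.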

Usage

(python-3.10-PA-dev) [unixwzrd@xanax: UnicodeFix]$ python bin/cleanup-text.py --help
usage: cleanup-text.py [-h] [infile ...]

Clean Unicode quirks from text.

positional arguments:
  infile                Input file(s)

options:
  -h, --help            Show this help message and exit

Example:
python bin/cleanup-text.py <input_file>

The output file will be named the same as the input file, but with a .clean.txt extension.

You can pass multiple files at once.
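
This README does not list the exact replacements the script performs, so the following is only a rough illustration of what "normalizing Unicode to ASCII equivalents" can look like. It is a sketch under stated assumptions, not the code in bin/cleanup-text.py: the replacement table and the function name `to_ascii` are hypothetical, and the real script may handle more or different characters.

```python
import unicodedata

# Hypothetical replacement table; the real script's mappings may differ.
ASCII_REPLACEMENTS = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # no-break space
    "\u200b": None,   # zero-width space (deleted)
})


def to_ascii(text: str) -> str:
    """Replace common typographic characters, then drop anything still non-ASCII."""
    text = text.translate(ASCII_REPLACEMENTS)
    # NFKD splits accented letters into a base letter plus combining marks,
    # so the ASCII encode step below keeps the base letter and drops the mark.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("\u201cHello\u201d \u2014 caf\u00e9\u2026"))
```

Note that the final encode step in this sketch silently discards any character with no ASCII decomposition, so it is only appropriate for text that is expected to be plain English.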

Shortcut for macOS

There is a "Shortcut" file in the macOS/ directory which may be imported into the Shortcuts app.
It will allow the script to be run as a Quick Action from the Finder "Right Click" menu.
This allows selecting multiple files and scrubbing the Unicode quirks from them in bulk.

To add the shortcut:

  1. Open the "Shortcuts" app.

  2. Go to File -> Import...

    (Screenshot: Shortcuts App Menu)

  3. Navigate to the macOS directory in this repository and select the Strip Unicode.shortcut file.

    (Screenshot: Import Shortcut)

  4. You will need to open the shortcut and change the location path of the cleanup-text.py script.

    (Screenshot: Edit Shortcut Script Path)

  5. You may have to restart Finder (use Command+Option+Esc, select Finder, and click "Relaunch").

  6. Once set up, right-click on a file or multiple files in Finder, go to Quick Actions, and select Strip Unicode.

    (Screenshot: Select Shortcut File)

    This will invoke the script on the selected files and create .clean.txt versions.

Strip all the Unicode quirks out of your text files right in the Finder using a Quick Action!

If you know a better way for Linux or Windows users, feel free to submit a PR with your improvements.
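
Until a native option exists for Linux or Windows, bulk cleanup can be approximated by passing many files to the script in one call, since the usage above accepts multiple input files. The sketch below is one possible approach, not something shipped in this repo: the helper name `build_batch_command` and the `*.txt` pattern are hypothetical. It skips files ending in .clean.txt so earlier output is not processed again.

```python
import sys
from pathlib import Path


def build_batch_command(script: Path, folder: Path, pattern: str = "*.txt") -> list[str]:
    """Build a command line that runs the cleanup script on every matching file under folder."""
    files = sorted(
        path
        for path in folder.rglob(pattern)
        # Skip output from earlier runs, which the README says ends in .clean.txt.
        if path.is_file() and not path.name.endswith(".clean.txt")
    )
    # Use the current interpreter so the active virtual environment is respected.
    return [sys.executable, str(script), *(str(path) for path in files)]


# Example use (paths are placeholders):
# import subprocess
# cmd = build_batch_command(Path("bin/cleanup-text.py"), Path("data"))
# if len(cmd) > 2:  # only run when at least one file matched
#     subprocess.run(cmd, check=True)
```

Building the command separately from running it makes it easy to print and inspect the file list before anything is processed.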

What's in This Repo:

  • bin/cleanup-text.py — The script that cleans up the text.
  • bin/cleanup-text — A symlink without the .py extension for prettier usage in scripts.
  • setup.sh — A script that sets up the virtual environment.
  • LICENSE — The license for the project.
  • README.md — This file.
  • requirements.txt — The dependencies needed to run.
  • data/ — Sample files full of Unicode issues for testing.
  • docs/ — Supporting documentation for the project.
  • macOS/ — The Shortcut file for macOS users.

Contributing

If you have suggestions, enhancements, or fixes, feel free to open an issue or pull request!
Testing and feedback are also very welcome.

Support This and My Other Projects

AI and Unix are my passions — but I need to pay the bills too.

If you find this project useful, please tell others, and consider supporting my work.

Thank you!

Changelog

2025-04-27

  • Bug fix for filtering STDIO pipes.
  • Added a shell script wrapper to source in your .bashrc, presumably with the virtual environment activated.

2025-04-26

  • Initial release.

License

Copyright 2025
unixwzrd@unixwzrd.ai

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.