Normalizes Unicode to ASCII equivalents.
I'm getting this out quickly because people need it. Updates to polish it up will follow soon.
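For the curious, the general idea behind normalizing Unicode to ASCII equivalents can be sketched in a few lines of Python. This is a minimal illustration, not the code in `bin/cleanup-text.py`. The `REPLACEMENTS` table and the `to_ascii` function are hypothetical, and the real script may handle more characters, or handle them differently.

```python
import unicodedata

# Hypothetical table of common "Unicode quirks" and plain-ASCII stand-ins.
# This is an illustrative assumption, not the table UnicodeFix uses.
REPLACEMENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # no-break space
    "\u200b": "",     # zero-width space
}

_TABLE = str.maketrans(REPLACEMENTS)


def to_ascii(text: str) -> str:
    """Map known quirks, then decompose and drop anything still non-ASCII."""
    text = text.translate(_TABLE)
    # NFKD splits accented letters into base letter + combining mark and
    # folds compatibility characters (e.g. the "fi" ligature) into plain forms.
    decomposed = unicodedata.normalize("NFKD", text)
    # Whatever has no ASCII form (combining marks, emoji, etc.) is discarded.
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

Smart quotes and dashes have no ASCII decomposition, so without an explicit table like this they would simply vanish. In this sketch, emoji and non-Latin scripts are still dropped silently.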
Clone the repository somewhere on your system. You will need to pop open a terminal window to do this.
Then copy and paste the following commands into the terminal:
```bash
git clone https://github.com/unixwzrd/UnicodeFix.git
cd UnicodeFix
bash setup.sh
```
Setup creates a virtual environment to keep your system Python clean. I also have a separate repo, Virtual Environment Utilities; it's likely overkill for most people, but it contains a lot of useful utilities and tools for managing Python virtual environments with Pip and Conda, along with many other handy tools for AI and Unix.
It also adds the items needed to start the script to your `.bashrc`. Look at the `setup.sh` file if you'd like to see exactly what it does; it's very simple.
The `.bashrc` items are necessary because I provide a Shortcut you can use from the macOS context menu to run the script directly.
```
(python-3.10-PA-dev) [unixwzrd@xanax: UnicodeFix]$ python bin/cleanup-text.py --help
usage: cleanup-text.py [-h] [infile ...]

Clean Unicode quirks from text.

positional arguments:
  infile      Input file(s)

options:
  -h, --help  Show this help message and exit
```
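The help text above has the shape Python's standard `argparse` module prints for a single positional argument declared with `nargs="*"`. The sketch below shows one way to declare an equivalent interface. It is an assumption about how the script is built, not a copy of it, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose usage line matches the help output above (sketch)."""
    parser = argparse.ArgumentParser(
        prog="cleanup-text.py",
        description="Clean Unicode quirks from text.",
    )
    # nargs="*" produces the "[infile ...]" usage form (Python 3.9+) and
    # accepts zero, one, or many input files.
    parser.add_argument("infile", nargs="*", help="Input file(s)")
    return parser
```

Calling `build_parser().parse_args(["a.txt", "b.txt"])` returns a namespace whose `infile` attribute is the list of names, which is what makes passing several files at once straightforward.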
Example:

```bash
python bin/cleanup-text.py <input_file>
```
The output file is named after the input file, but with a `.clean.txt` extension.
You can pass multiple files at once.
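Processing several files in one go, each written next to its source with a `.clean.txt` extension, might look like the following sketch. It assumes `notes.txt` becomes `notes.clean.txt` (only the last extension is replaced) and that the files are UTF-8. `clean_output_path` and `clean_files` are hypothetical names, and the `clean` argument stands in for whatever normalization the real script applies.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def clean_output_path(infile: Path) -> Path:
    """Return notes.clean.txt for notes.txt (assumed naming rule)."""
    # Path.stem drops only the last extension; a name without one keeps it all.
    return infile.with_name(infile.stem + ".clean.txt")


def clean_files(paths: Iterable[str], clean: Callable[[str], str]) -> List[Path]:
    """Clean each file, write the result beside it, and return the output paths."""
    written: List[Path] = []
    for name in paths:
        src = Path(name)
        dst = clean_output_path(src)
        text = src.read_text(encoding="utf-8")
        dst.write_text(clean(text), encoding="utf-8")
        written.append(dst)
    return written
```

Passing the cleaning step in as a plain `str -> str` function keeps the file handling separate from the normalization rules, so either part can change without touching the other.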
There is a Shortcut file in the `macOS/` directory that can be imported into the Shortcuts app.
It lets you run the script as a Quick Action from the Finder right-click menu, so you can select multiple files and scrub the Unicode quirks from them in bulk.
- Open the Shortcuts app.
- Go to File -> Import...
- Navigate to the `macOS` directory in this repository and select the `Strip Unicode.shortcut` file.
- Open the shortcut and change the path of the `cleanup-text.py` script to wherever you cloned the repository.
- You may have to restart Finder (press Command+Option+Esc, select Finder, and click "Relaunch").
- Once it's set up, right-click one or more files in Finder, go to Quick Actions, and select Strip Unicode. This invokes the script on the selected files and creates `.clean.txt` versions.
Strip all the Unicode quirks out of your text files right in the Finder using a Quick Action!
If you know a better way for Linux or Windows users, feel free to submit a PR with your improvements.
- `bin/cleanup-text.py`: The script that cleans up the text.
- `bin/cleanup-text`: A symlink without the `.py` extension for prettier usage in scripts.
- `setup.sh`: A script that sets up the virtual environment.
- `LICENSE`: The license for the project.
- `README.md`: This file.
- `requirements.txt`: The dependencies needed to run.
- `data/`: Sample files full of Unicode issues for testing.
- `docs/`: Supporting documentation for the project.
- `macOS/`: The Shortcut file for macOS users.
If you have suggestions, enhancements, or fixes, feel free to open an issue or pull request!
Testing and feedback are also very welcome.
AI and Unix are my passions — but I need to pay the bills too.
If you find this project useful, please tell others, and consider supporting my work:
Thank you!
- Bug fix for filtering STDIO pipes.
- Added a shell script wrapper to source in your `.bashrc`, presumably with the virtual environment activated.
- Initial release.
Copyright 2025 unixwzrd@unixwzrd.ai
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.