A powerful IntelliJ IDEA plugin that detects and cleans problematic Unicode characters that can trigger AI detection systems or cause compatibility issues. Helps normalize text to use standard ASCII characters for better cross-platform compatibility.
AI detection tools often flag content based on subtle Unicode "fingerprints" that indicate machine generation:
- Smart quotes (“”) instead of straight quotes (")
- Em dashes (—) instead of hyphens (-)
- Hidden characters like zero-width spaces, soft hyphens
- Full-width characters from Asian character sets
- Typographic punctuation like ellipsis (…) instead of three periods (...)
This plugin helps you create content that appears naturally human-typed while maintaining perfect readability.
- Highlights problematic Unicode characters as you type
- Shows Unicode code points and character descriptions
- Color-coded warnings by severity level
- Individual fixes: Replace single characters with ASCII equivalents
- Category fixes: Clean all quotes, dashes, or spaces at once
- File-wide fixes: Clean entire documents in one click
- Project-wide fixes: Batch process multiple files
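To make the detection features above concrete, here is a minimal Python sketch of scanning text and reporting each finding with its code point and description. It is not the plugin's code; `SUSPECT` and `find_suspects` are hypothetical names, and the table holds only a few of the characters this README lists.

```python
import unicodedata

# Hypothetical subset of the characters listed in this README,
# mapped to an assumed category label.
SUSPECT = {
    "\u00ad": "hidden",       # soft hyphen
    "\u200b": "hidden",       # zero width space
    "\u201c": "quote",        # left double quote
    "\u201d": "quote",        # right double quote
    "\u2014": "dash",         # em dash
    "\u2026": "punctuation",  # horizontal ellipsis
}


def find_suspects(text):
    """Return (offset, 'U+XXXX', Unicode name, category) for each suspect character."""
    findings = []
    for offset, ch in enumerate(text):
        category = SUSPECT.get(ch)
        if category is not None:
            code_point = f"U+{ord(ch):04X}"
            name = unicodedata.name(ch, "UNKNOWN")
            findings.append((offset, code_point, name, category))
    return findings
```

Each tuple carries what a tooltip would need: where the character is, which code point it is, and its official Unicode name.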
Enable/disable detection for specific character types:
- ✅ Hidden/Control Characters - Zero-width spaces, soft hyphens, directional marks
- ✅ Smart Quotes & Apostrophes - Typographic quotes (“”‘’)
- ✅ Em & En Dashes - Long dashes (—–) → hyphens (-)
- ✅ Ellipsis & Bullets - Special punctuation (…•) → ASCII equivalents
- ✅ Full-Width Characters - Asian character variants (ＡＢＣ) → standard ASCII
- ✅ Non-Standard Spaces - Non-breaking, ideographic spaces → regular spaces
- ✅ Variation Selectors - Unicode formatting modifiers
Configurable file type filtering supports:
- Text files: .txt, .md, .rst
- Source code: .java, .js, .ts, .py, .cpp, .c, .h
- Configuration: .xml, .json, .yaml, .yml, .properties
- Web files: .html, .css
- Custom extensions: Add your own file types
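The extension filter described above could work roughly as in the following Python sketch. The function names `parse_extensions` and `should_process` are assumptions for illustration, and the plugin's actual matching rules (case sensitivity, files without extensions) are not documented here.

```python
from pathlib import PurePath

# Default list as given in this README's configuration section.
DEFAULT_EXTENSIONS = "txt,md,rst,java,js,ts,py,cpp,c,h,xml,json,yaml,yml,properties,html,css"


def parse_extensions(setting):
    """Turn a comma-separated setting into a set of lowercase extensions without dots."""
    return {ext.strip().lstrip(".").lower() for ext in setting.split(",") if ext.strip()}


def should_process(filename, extensions):
    """True if the file's extension is in the configured set (assumed case-insensitive)."""
    suffix = PurePath(filename).suffix  # ".md" for "README.md", "" if there is none
    return suffix[1:].lower() in extensions
```

A custom extension is added by appending it to the setting string, for example `parse_extensions(DEFAULT_EXTENSIONS + ",adoc")`.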
- Menu actions: Edit → Unicode Cleaner → [action]
- Keyboard shortcuts:
  - Ctrl+Shift+U - Clean current file
  - Ctrl+Alt+U - Clean selected text
- Context menus: Right-click for quick access
- Project view: Clean selected files from project tree
From the plugin browser inside the IDE:
- Open IntelliJ IDEA
- Go to File → Settings → Plugins
- Search for "Unicode Text Cleaner"
- Click Install and restart IntelliJ

From a downloaded release (manual install):
- Download the latest release from GitHub Releases
- Open IntelliJ IDEA
- Go to File → Settings → Plugins
- Click ⚙️ gear icon → Install Plugin from Disk...
- Select the downloaded ZIP file
- Restart IntelliJ IDEA
- Open any text file with Unicode characters
- See highlighted warnings on problematic characters
- Click the lightbulb 💡 or press Alt+Enter for quick fixes
- Use menu actions for bulk operations
Before cleaning:
This text has “smart quotes” and em—dashes.
It also contains ellipsis… and bullets •
Some hidden characters like softhyphens.
Full-width numbers: １２３４５
After cleaning:
This text has "smart quotes" and em-dashes.
It also contains ellipsis... and bullets *
Some hidden characters like softhyphens.
Full-width numbers: 12345
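The transformation in this example can be approximated with a plain character translation table. The Python sketch below is only an illustration under that assumption, not the plugin's implementation; `REPLACEMENTS` and `clean` are hypothetical names and the table covers just the characters needed here.

```python
# Assumed replacement table, transcribed from a few entries in this README.
REPLACEMENTS = {
    "\u201c": '"', "\u201d": '"',    # smart double quotes
    "\u2018": "'", "\u2019": "'",    # smart single quotes
    "\u2014": "-", "\u2013": "-",    # em dash, en dash
    "\u2026": "...", "\u2022": "*",  # ellipsis, bullet
    "\u00ad": "", "\u200b": "",      # soft hyphen, zero width space (removed)
}
# Full-width digits: U+FF10..U+FF19 map to "0".."9".
REPLACEMENTS.update({chr(0xFF10 + d): str(d) for d in range(10)})

CLEAN_TABLE = str.maketrans(REPLACEMENTS)


def clean(text):
    """Replace or remove every character listed in REPLACEMENTS."""
    return text.translate(CLEAN_TABLE)
```

Because `str.translate` works per character, one input character can expand to several (the ellipsis becomes three periods) or to none (hidden characters disappear).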
- Go to File → Settings → Unicode Cleaner
- Enable/disable character categories you want to detect
- Configure file extensions to process
- Set performance options for large files
- Current file: Edit → Unicode Cleaner → Clean Unicode Characters in File
- Selected text: Select text, then Edit → Unicode Cleaner → Clean Selected Text
- Entire project: Edit → Unicode Cleaner → Clean Unicode Characters in Project
- Selected files: Right-click files in Project view → "Clean Unicode Characters"
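For readers curious what a project-wide pass involves, the following Python sketch walks a directory, filters by extension, and rewrites only files that actually change. It is a stand-alone illustration, not how the plugin works inside the IDE; `clean_tree`, `TABLE`, and `EXTENSIONS` are hypothetical, and the dry-run default is a design choice of this sketch.

```python
from pathlib import Path

# Assumed minimal replacement table and extension filter for the sketch.
TABLE = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2014": "-", "\u200b": None})
EXTENSIONS = {".txt", ".md"}


def clean_tree(root, dry_run=True):
    """Return the files under root that need cleaning; rewrite them unless dry_run."""
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in EXTENSIONS:
            continue
        original = path.read_text(encoding="utf-8")
        cleaned = original.translate(TABLE)
        if cleaned != original:
            changed.append(path)
            if not dry_run:
                path.write_text(cleaned, encoding="utf-8")
    return changed
```

Running with `dry_run=True` first gives a list of affected files to review before anything is written.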
- Warning highlights: Problematic characters are underlined
- Hover tooltips: Show Unicode code points and descriptions
- Problem severity: Different colors for different issue types
  - 🔴 Red: Hidden/control characters (serious issues)
  - 🟡 Yellow: Typographic characters (moderate issues)
  - 🟠 Orange: Space and punctuation issues (minor issues)
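The severity colors can be thought of as a two-step lookup from character category to severity to color. The Python sketch below shows that idea; which category belongs to which severity is partly an assumption inferred from the list above (for example, dashes are treated as typographic), and all names are hypothetical.

```python
from enum import Enum


class Severity(Enum):
    SERIOUS = "red"
    MODERATE = "yellow"
    MINOR = "orange"


# Assumed assignment of categories to severities, inferred from the list above.
CATEGORY_SEVERITY = {
    "hidden": Severity.SERIOUS,
    "quotes": Severity.MODERATE,
    "dashes": Severity.MODERATE,
    "spaces": Severity.MINOR,
    "punctuation": Severity.MINOR,
}


def highlight_color(category):
    """Return the highlight color name for a category, per the assumed mapping."""
    return CATEGORY_SEVERITY[category].value
```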
The plugin is designed to preserve legitimate international characters:
- ✅ German umlauts: ä, ö, ü, Ä, Ö, Ü, ß
- ✅ French accents: é, è, à, ç, ê, ë, î, ï, ô, ù, û, ü, ÿ
- ✅ Spanish characters: ñ, Ñ, á, é, í, ó, ú
- ✅ Nordic characters: å, Å, æ, Æ, ø, Ø
- ✅ Slavic characters: č, š, ž, ř, ň, etc.
- ✅ Other Latin scripts: All legitimate Latin-1 and Latin Extended characters
The plugin cleans only the characters that commonly indicate AI generation:
- ❌ Smart quotes and fancy punctuation
- ❌ Hidden/invisible characters
- ❌ Full-width Asian variants of ASCII characters
- ❌ Typographic dashes and spaces
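The difference between targeted cleaning and blanket ASCII conversion is easy to see in code. This Python sketch contrasts the two; it is an illustration only, with hypothetical names, and `strip_to_ascii` is included purely as the approach the plugin is described as avoiding.

```python
import unicodedata

# Targeted table: only typographic and hidden characters, no accented letters.
TARGETED = str.maketrans({
    "\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'",
    "\u2014": "-", "\u2013": "-",
    "\u00a0": " ",   # non-breaking space
    "\u200b": None,  # zero width space (removed)
})


def clean_targeted(text):
    """Replace only the listed characters; everything else passes through."""
    return text.translate(TARGETED)


def strip_to_ascii(text):
    """Blanket approach, for contrast: decompose, then drop every non-ASCII character."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

With an input such as `"Gr\u00fc\u00dfe \u2014 se\u00f1or"`, the targeted version changes only the dash, while the blanket version also strips the umlaut, the ß, and the tilde on the ñ.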
📝 Complete List of Detected Characters
Hidden/Control Characters
| Character | Unicode | Description | Action |
|---|---|---|---|
| | U+00AD | Soft hyphen | Remove |
| | U+200B | Zero width space | Remove |
| | U+200C | Zero width non-joiner | Remove |
| | U+200D | Zero width joiner | Remove |
| | U+200E | Left-to-right mark | Remove |
| | U+200F | Right-to-left mark | Remove |
| | U+2060 | Word joiner | Remove |
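
Since the characters in this table are invisible, it helps to make them visible before removing them. The Python sketch below does both; `HIDDEN`, `reveal_hidden`, and `remove_hidden` are hypothetical names, and the set contains exactly the seven code points listed in the table.

```python
# Code points from the hidden/control character table.
HIDDEN = {0x00AD, 0x200B, 0x200C, 0x200D, 0x200E, 0x200F, 0x2060}


def reveal_hidden(text):
    """Replace each hidden character with a visible <U+XXXX> marker."""
    return "".join(
        f"<U+{ord(ch):04X}>" if ord(ch) in HIDDEN else ch for ch in text
    )


def remove_hidden(text):
    """Drop every hidden character and keep everything else."""
    return "".join(ch for ch in text if ord(ch) not in HIDDEN)
```

`reveal_hidden` is useful for inspection or logging, `remove_hidden` for the actual fix.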

Smart Quotes & Apostrophes

| Character | Unicode | Description | Replacement |
|---|---|---|---|
| “ | U+201C | Left double quote | " |
| ” | U+201D | Right double quote | " |
| ‘ | U+2018 | Left single quote | ' |
| ’ | U+2019 | Right single quote | ' |
| „ | U+201E | Double low-9 quote | " |
| ‚ | U+201A | Single low-9 quote | ' |

Em & En Dashes

| Character | Unicode | Description | Replacement |
|---|---|---|---|
| — | U+2014 | Em dash | - |
| – | U+2013 | En dash | - |
| ‒ | U+2012 | Figure dash | - |
| − | U+2212 | Minus sign | - |

Ellipsis & Bullets

| Character | Unicode | Description | Replacement |
|---|---|---|---|
| … | U+2026 | Horizontal ellipsis | ... |
| • | U+2022 | Bullet | * |
| · | U+00B7 | Middle dot | * |

Full-Width Characters

| Character | Unicode | Description | Replacement |
|---|---|---|---|
| ！ | U+FF01 | Full-width exclamation | ! |
| ？ | U+FF1F | Full-width question mark | ? |
| １２３ | U+FF11-FF19 | Full-width digits | 123 |
| ＡＢＣ | U+FF21-FF3A | Full-width letters | ABC |
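
Full-width forms do not need a per-character table: in Unicode, the full-width variants U+FF01 to U+FF5E sit at a fixed offset of 0xFEE0 from their ASCII counterparts U+0021 to U+007E. The Python sketch below uses that arithmetic. Whether the plugin does the same is an assumption, and `fullwidth_to_ascii` is a hypothetical name.

```python
FULLWIDTH_START = 0xFF01  # FULLWIDTH EXCLAMATION MARK
FULLWIDTH_END = 0xFF5E    # FULLWIDTH TILDE
OFFSET = 0xFEE0           # 0xFF01 - 0x0021
IDEOGRAPHIC_SPACE = 0x3000


def fullwidth_to_ascii(text):
    """Map full-width ASCII variants and the ideographic space to plain ASCII."""
    out = []
    for ch in text:
        cp = ord(ch)
        if FULLWIDTH_START <= cp <= FULLWIDTH_END:
            out.append(chr(cp - OFFSET))
        elif cp == IDEOGRAPHIC_SPACE:
            out.append(" ")
        else:
            out.append(ch)  # CJK text and everything else is left alone
    return "".join(out)
```

Only the ASCII look-alikes are converted, so genuine Chinese, Japanese, or Korean text passes through unchanged.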
Configure which types of characters to detect:
☑️ Hidden/Control Characters (Recommended: ON)
☑️ Smart Quotes & Apostrophes (Recommended: ON)
☑️ Em & En Dashes (Recommended: ON)
☑️ Ellipsis & Bullets (Recommended: ON)
☑️ Full-Width Characters (Recommended: ON)
☑️ Non-Standard Spaces (Recommended: ON)
☑️ Variation Selectors (Recommended: ON)
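One way to picture these toggles is as flags that decide which replacement tables are merged into the active one. The Python sketch below models three of the seven categories; the class, field names, and table contents are hypothetical and do not reflect the plugin's settings storage.

```python
from dataclasses import dataclass


@dataclass
class DetectionSettings:
    """Per-category switches; everything is on by default, as recommended above."""
    hidden: bool = True
    quotes: bool = True
    dashes: bool = True


# Assumed per-category replacement tables (small subsets for illustration).
CATEGORY_CHARS = {
    "hidden": {"\u00ad": "", "\u200b": ""},
    "quotes": {"\u201c": '"', "\u201d": '"'},
    "dashes": {"\u2014": "-", "\u2013": "-"},
}


def active_table(settings):
    """Merge the tables of all enabled categories into one translation table."""
    merged = {}
    for category, mapping in CATEGORY_CHARS.items():
        if getattr(settings, category):
            merged.update(mapping)
    return str.maketrans(merged)
```

For example, `text.translate(active_table(DetectionSettings(dashes=False)))` fixes quotes and hidden characters but leaves dashes untouched.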
Specify which file extensions to process:
Default: txt,md,rst,java,js,ts,py,cpp,c,h,xml,json,yaml,yml,properties,html,css
Custom: Add your own comma-separated extensions
- Max file size: Set limit for real-time detection (default: 10MB)
- Real-time detection: Enable/disable live highlighting
- Batch processing: Configure timeouts for large operations
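The interaction between the size limit and the real-time switch amounts to a simple guard, sketched here in Python. The function name is hypothetical, and reading "10MB" as 10 × 1024 × 1024 bytes is an assumption.

```python
# Assumed interpretation of the 10MB default.
MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024


def should_highlight_live(size_bytes, realtime_enabled=True, max_size=MAX_FILE_SIZE_BYTES):
    """Live highlighting runs only if it is enabled and the file is within the limit."""
    return realtime_enabled and size_bytes <= max_size
```

Files over the limit would still be cleanable through the explicit menu actions; only the as-you-type detection is skipped in this model.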
| Action | Default Shortcut | Description |
|---|---|---|
| Clean Current File | Ctrl+Shift+U | Clean all Unicode issues in current file |
| Clean Selected Text | Ctrl+Alt+U | Clean Unicode issues in selected text |
| Open Settings | - | Open Unicode Cleaner settings panel |

Shortcuts can be customized in IntelliJ's Keymap settings.

    git clone https://github.com/unicodecleaner/intellij-plugin.git
    cd intellij-plugin
    ./gradlew buildPlugin
    ./gradlew test
    ./gradlew performanceTest
    ./gradlew runIde

- Small files (< 1MB): < 10ms detection
- Medium files (1-10MB): < 100ms detection
- Large files (10MB+): < 1000ms detection
- Quick fixes: < 50ms per operation
- Batch processing: ~200 files/second
- Plugin overhead: < 50MB
- Large projects: < 200MB additional memory
- Automatic cleanup: Caches cleared after 1 hour
Q: Plugin not appearing in menus
- A: Check Settings → Plugins → Installed → "Unicode Text Cleaner" is enabled
- Restart IntelliJ if needed
Q: Characters not being highlighted
- A: Check Settings → Unicode Cleaner → ensure character categories are enabled
- Verify file extension is in the configured list
Q: Performance issues with large files
- A: Lower the max file size limit in settings so that very large files are skipped by real-time detection
- Disable real-time detection for very large files
Q: Plugin incompatible with IntelliJ version
- A: Check compatibility - requires IntelliJ 2023.1 or later
- Download latest plugin version
- 🐛 Report bugs: GitHub Issues
- 💬 Ask questions: GitHub Discussions
- 📧 Contact: kontakt@stoitschev.de
❓ Will this plugin delete my international characters?
No! The plugin is designed to preserve all legitimate international characters including:
- German umlauts (ä, ö, ü)
- French accents (é, è, à, ç)
- Spanish ñ, Nordic å/æ/ø
- All Latin-1 and Latin Extended characters
Only problematic Unicode characters that indicate AI generation are targeted.
❓ What's the difference between this and a simple find/replace?
This plugin:
- Automatically detects 50+ problematic character types
- Highlights issues in real time as you type
- Categorizes characters so you can clean specific types
- Runs bulk operations across multiple files
- Preserves formatting and legitimate international text
- Offers configurable rules for different use cases
❓ Will this make my text look robotic?
No! The plugin converts fancy typography to standard ASCII, which is how most people actually type. The result looks more natural and human-like, not robotic.
Before: This is “fancy” typography—with special characters…
After: This is "normal" typography-with regular characters...
❓ Can I use this for code files?
Yes! The plugin is safe for code files and only targets problematic characters that shouldn't appear in source code anyway. It's particularly useful for:
- Cleaning copy-pasted code from documentation
- Fixing smart quotes in string literals
- Removing hidden characters that cause syntax errors
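To see why these characters matter in source code, the Python sketch below compiles a pasted snippet before and after cleaning. Python stands in here for any language with strict lexing; the snippet, the `CLEAN` table, and the `compiles` helper are made up for illustration.

```python
# Assumed minimal table: smart double quotes and the zero width space.
CLEAN = str.maketrans({"\u201c": '"', "\u201d": '"', "\u200b": None})


def compiles(source):
    """Return True if the Python source parses, False on a SyntaxError."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


# Looks like valid code on screen, but contains smart quotes and a
# zero width space after the identifier "count".
pasted = "greeting = \u201chello\u201d\ncount\u200b = 1\n"
```

Here `compiles(pasted)` is false, while `compiles(pasted.translate(CLEAN))` is true: the same text parses once the smart quotes are straightened and the hidden character is gone.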
❓ Does this work with all file types?
The plugin can work with any text-based file. By default, it processes common file types:
- Text: .txt, .md, .rst
- Code: .java, .js, .py, .cpp, etc.
- Config: .json, .xml, .yaml, etc.
You can add custom file extensions in the settings.
This project is licensed under the MIT License - see the LICENSE file for details.
- Inspired by the growing need for AI detection avoidance
- Thanks to the IntelliJ Platform team for excellent plugin APIs
- Community feedback and suggestions from beta testers
Made with ❤️ for content creators who want their text to appear naturally human-typed.
For more tools and resources, visit stoitschev.de