MarkLang is a robust, production-ready CLI tool for translating Hugo Markdown blog posts (including frontmatter, tags, and categories) from one language to another. It supports custom per-language dictionaries, Google Translate, and offline transliteration for technical terms and proper nouns.
- CLI-based: Translate files with a single command.
- Frontmatter Support: Translates title, description, tags, categories, and more.
- Custom Dictionary: Per-language CSV files for preferred translations of tags/categories.
- Google Translate Fallback: Uses Google Translate for tags/categories if not found in the dictionary.
- Offline Transliteration: For technical terms, falls back to script transliteration (e.g., Devanagari for Hindi).
- Robust Logging: Detailed logs for every step, including dictionary usage and translation fallbacks.
- Production-Ready: Modular, type-annotated, and well-documented code.
- Input: You provide a Markdown file with YAML frontmatter (e.g.,
en/blog/example.md). - Translation: The script translates the title, description, tags, categories, and content to the target language.
- Custom Dictionary: For tags/categories, it first checks a per-language CSV (e.g.,
translations_hi.csv). - Fallbacks: If not found, it uses Google Translate; if that fails, it uses offline transliteration.
- Output: The translated file is written to the corresponding target language directory (e.g.,
hi/blog/example.md).
- Clone the repository:
git clone https://github.com/rohanbatrain/MarkLang cd MarkLang - Install dependencies:
For Hindi/Thai transliteration, also install:
pip install -r requirements.txt
pip install indic-transliteration aksharamukha
python main.py <input_file> <target_lang> [--source_lang en] [--model llama3.2:3b]<input_file>: Path to the input Markdown file (with frontmatter)<target_lang>: Target language code (e.g.,hi,fr,de)--source_lang: Source language code (default:en)--model: Translation model to use (default:llama3.2:3b)
Example:
python main.py en/blog/example.md hiThis will create hi/blog/example.md with all content translated to Hindi.
- Place per-language CSVs in the
translations/directory, e.g.,translations_hi.csv,translations_fr.csv. - Each CSV should have columns:
word,translation - Example (
translations_hi.csv):word,translation Automation,ऑटोमेशन Linux,लिनक्स SSH,एसएसएच
- The script will always check the dictionary first for tags/categories.
- Initial Version: Focused on translating titles/descriptions using a translation API.
- Enhancements:
- Added support for arrays (tags/categories) and batch translation.
- Implemented robust error handling and logging.
- Added custom dictionary support and offline transliteration.
- Refactored for CLI usage and production-readiness.
- Fallback Logic: Always tries dictionary → Google Translate → offline transliteration.
- Validation: Ensures frontmatter is valid and all keys have values.
- Extensible: Easy to add new languages or extend dictionary files.
- English (
en) - Hindi (
hi) - French (
fr) - German (
de) - Italian (
it) - Portuguese (
pt) - Spanish (
es) - Thai (
th)
- All major steps are logged to the console.
- Errors in translation, dictionary loading, or file writing are clearly reported.
- If a translation fails, the script falls back gracefully and logs the fallback used.
- PRs are welcome! Please add tests for new features.
- For new languages, add a
translations_<lang>.csvfile in thetranslations/directory.
MIT License
