Major Update: Configuration System, Gitignore Support, Auto Language Detection + Code Quality #56

Merged
16 changes: 11 additions & 5 deletions README.md
@@ -73,11 +73,14 @@ export AZURE_OPENAI_API_VERSION='2024-02-01'

### Basic Translation
```bash
# Translate to German
# Translate to German (default: shows warnings/errors only)
gpt-po-translator --folder ./locales --lang de

# Multiple languages
gpt-po-translator --folder ./locales --lang de,fr,es --bulk
# With progress information
gpt-po-translator --folder ./locales --lang de -v

# Multiple languages with verbose output
gpt-po-translator --folder ./locales --lang de,fr,es -v --bulk
```

### Different AI Providers
@@ -131,15 +134,18 @@ This helps you:
| Option | Description |
|--------|-------------|
| `--folder` | Path to .po files |
| `--lang` | Target languages (e.g., `de,fr,es`) |
| `--lang` | Target languages (e.g., `de,fr,es`, `fr_CA`, `pt_BR`) |
| `--provider` | AI provider: `openai`, `azure_openai`, `anthropic`, `deepseek` |
| `--bulk` | Enable batch translation (recommended) |
| `--bulk` | Enable batch translation (recommended for large files) |
| `--bulksize` | Entries per batch (default: 50) |
| `--model` | Specific model to use |
| `--list-models` | Show available models |
| `--fix-fuzzy` | Translate fuzzy entries |
| `--folder-language` | Auto-detect languages from folders |
| `--no-ai-comment` | Disable AI tagging |
| `-v, --verbose` | Show progress information (use `-vv` for debug) |
| `-q, --quiet` | Only show errors |
| `--version` | Show version and exit |

## 🛠️ Development

6 changes: 6 additions & 0 deletions docker-entrypoint.sh
@@ -14,6 +14,12 @@ if [ $# -eq 0 ]; then
echo " Format: -v /host/path:/container/path"
echo " The '/container/path' is what you'll use with the --folder parameter."
echo
echo "Configuration:"
echo " The tool automatically loads configuration from pyproject.toml files found in:"
echo " • Mounted volume directories"
echo " • The target translation folder and its parent directories"
echo " See examples/docker-pyproject.toml for Docker-optimized configuration."
echo
echo "Examples:"
echo " # Translate files in the current directory to German"
echo " docker run -v $(pwd):/data -e OPENAI_API_KEY=<your_key> ghcr.io/pescheckit/python-gpt-po --folder /data --lang de"
197 changes: 195 additions & 2 deletions docs/usage.md
@@ -104,7 +104,7 @@ Below is a detailed explanation of all command-line arguments:
*Behind the scenes:* The tool recursively scans this folder and processes every file ending with `.po`.

- **`--lang <language_codes>`**
*Description:* A comma-separated list of ISO 639-1 language codes (e.g., `de,fr`).
*Description:* A comma-separated list of ISO 639-1 language codes (e.g., `de,fr`) or locale codes (e.g., `fr_CA,pt_BR`).
*Behind the scenes:* The tool filters PO files by comparing these codes with the file metadata and folder names (if `--folder-language` is enabled).

### Optional Options
@@ -173,13 +173,206 @@ Below is a detailed explanation of all command-line arguments:

- **`--folder-language`**
*Description:* Enables inferring the target language from the folder structure.
*Behind the scenes:* The tool inspects the path components (directory names) of each PO file and matches them against the provided language codes.
*Behind the scenes:* The tool inspects the path components (directory names) of each PO file and matches them against the provided language codes. Supports locale codes (e.g., folder `fr_CA` matches `-l fr_CA` for Canadian French, or falls back to `-l fr` for standard French).

- **`--no-ai-comment`**
*Description:* Disables the automatic addition of 'AI-generated' comments to translated entries.
*Behind the scenes:* **By default (without this flag), every translation made by the AI is marked with a `#. AI-generated` comment in the PO file.** This flag prevents that marking, making AI translations indistinguishable from human translations in the file.
*Note:* AI tagging is enabled by default for tracking, compliance, and quality assurance purposes.

- **`-v, --verbose`**
*Description:* Increases output verbosity. Can be used multiple times for more detail.
*Behind the scenes:* Controls the logging level:
- No flag: Shows only warnings and errors (default)
- `-v`: Shows info messages including progress tracking
- `-vv`: Shows debug messages for troubleshooting
*Note:* Progress tracking shows translation progress for both single and bulk modes.

- **`-q, --quiet`**
*Description:* Reduces output to only show errors.
*Behind the scenes:* Sets logging level to ERROR, suppressing all info and warning messages.

- **`--version`**
*Description:* Shows the program version and exits.
*Behind the scenes:* Displays the current version from package metadata.

---

## Locale and Regional Variant Handling

### Overview

The tool now fully supports locale codes (e.g., `fr_CA`, `pt_BR`, `en_US`) in addition to simple language codes. This allows you to translate content for specific regional variants of a language.

### How Locale Matching Works

The tool matches a locale code against the requested languages in three steps, stopping at the first one that succeeds:
1. **First tries exact match**: `fr_CA` matches `fr_CA`
2. **Then tries format conversion**: `fr_CA` matches `fr-CA` (underscore ↔ hyphen)
3. **Finally tries base language fallback**: `fr_CA` matches `fr`
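
The three steps above can be sketched in a few lines of Python. This is only an illustration of the matching order, not the tool's actual code: `match_locale` is a hypothetical helper, and treating the comparison as case-sensitive is an assumption.

```python
from typing import Optional, Sequence


def match_locale(candidate: str, requested: Sequence[str]) -> Optional[str]:
    """Return the requested code that `candidate` matches, or None.

    Hypothetical helper that mirrors the three documented steps.
    """
    # 1. Exact match: fr_CA matches fr_CA
    if candidate in requested:
        return candidate

    # 2. Format conversion: treat hyphen and underscore as equivalent
    normalized = candidate.replace("-", "_")
    for code in requested:
        if code.replace("-", "_") == normalized:
            return code

    # 3. Base language fallback: fr_CA matches fr
    base = normalized.split("_", 1)[0]
    if base in requested:
        return base

    return None


for folder in ("fr_CA", "fr-CA", "pt_BR"):
    print(folder, "->", match_locale(folder, ["fr_CA", "pt"]))
```

Returning the requested code rather than the candidate keeps the caller's spelling (`fr_CA` versus `fr-CA`) intact for later steps.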

### Language Detection Priority

When a PO file is processed, the language is determined in this order:
1. **File metadata**: The `Language` field in the PO file header
2. **Folder structure** (with `--folder-language`): Directory names in the file path
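
As a rough sketch of that priority order, the function below checks the header first and only then looks at directory names. `detect_language` is a hypothetical name, the `metadata` argument stands in for a parsed PO header (a plain mapping here), and the exact-match folder comparison is a simplification.

```python
from pathlib import PurePosixPath
from typing import Mapping, Optional, Sequence


def detect_language(
    po_path: str,
    metadata: Mapping[str, str],
    requested: Sequence[str],
    folder_language: bool = False,
) -> Optional[str]:
    """Hypothetical sketch of the documented detection order."""
    # 1. File metadata: the Language field of the PO header wins if present
    language = metadata.get("Language", "").strip()
    if language:
        return language

    # 2. Folder structure: only consulted when --folder-language is enabled
    if folder_language:
        for part in PurePosixPath(po_path).parent.parts:
            if part in requested:
                return part

    return None


print(detect_language("locales/fr_CA/LC_MESSAGES/app.po", {}, ["fr_CA"], True))
```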

### Examples

**Working with Canadian French:**
```bash
# Translate specifically to Canadian French
gpt-po-translator --folder ./locales --lang fr_CA

# With detailed language name for better AI context
gpt-po-translator --folder ./locales --lang fr_CA --detail-lang "Canadian French"

# Process files in fr_CA folders
gpt-po-translator --folder ./locales --lang fr_CA --folder-language
```

**Working with Brazilian Portuguese:**
```bash
# Translate to Brazilian Portuguese (different vocabulary from European Portuguese)
gpt-po-translator --folder ./locales --lang pt_BR --detail-lang "Brazilian Portuguese"

# Fall back to European Portuguese
gpt-po-translator --folder ./locales --lang pt
```

### What the AI Sees

The language code or detail name is passed directly to the AI in the translation prompt:

| Command | AI Sees in Prompt |
|---------|-------------------|
| `-l fr` | "Translate to fr" |
| `-l fr_CA` | "Translate to fr_CA" |
| `-l fr_CA --detail-lang "Canadian French"` | "Translate to Canadian French" |
| `-l pt_BR --detail-lang "Brazilian Portuguese"` | "Translate to Brazilian Portuguese" |
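
The table can be read as "use the detail name when one is given, otherwise the code". A minimal sketch follows, assuming that `--lang` and `--detail-lang` values are paired by position; both helper names are hypothetical, and the real prompt contains more than this one phrase.

```python
from typing import Dict, Optional


def pair_detail_names(langs: str, details: Optional[str]) -> Dict[str, Optional[str]]:
    """Pair each language code with its detail name by position (assumed)."""
    codes = [code.strip() for code in langs.split(",") if code.strip()]
    names = [name.strip() for name in details.split(",")] if details else []
    return {
        code: (names[i] if i < len(names) and names[i] else None)
        for i, code in enumerate(codes)
    }


def prompt_target(lang_code: str, detail_name: Optional[str] = None) -> str:
    """Build the target phrase shown in the right-hand column of the table."""
    return f"Translate to {detail_name or lang_code}"


pairs = pair_detail_names("fr_CA,pt_BR", "Canadian French,Brazilian Portuguese")
for code, name in pairs.items():
    print(prompt_target(code, name))
```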

### Folder Language Behavior

With `--folder-language`, the tool matches folder names against your `-l` parameter:

| Folder | `-l` Parameter | Result |
|--------|----------------|--------|
| `locales/fr_CA/` | `fr_CA` | Translates to Canadian French |
| `locales/fr_CA/` | `fr` | Translates to standard French (fallback) |
| `locales/pt_BR/` | `pt_BR` | Translates to Brazilian Portuguese |
| `locales/pt_BR/` | `pt` | Translates to European Portuguese (fallback) |

### Best Practices

1. **For regional variants**, always use the full locale code:
```bash
gpt-po-translator --folder ./locales --lang fr_CA,pt_BR,en_US
```

2. **Add detail names** for better AI understanding:
```bash
gpt-po-translator --folder ./locales --lang fr_CA,pt_BR \
--detail-lang "Canadian French,Brazilian Portuguese"
```

3. **Use folder detection** for projects with locale-based directory structure:
```bash
# Processes files in locales/fr_CA/, locales/pt_BR/, etc.
gpt-po-translator --folder ./locales --lang fr_CA,pt_BR --folder-language
```

---

## Performance and Progress Tracking

### Overview

The tool warns when single mode would be slow on a large file and can report translation progress, which helps you manage large translation tasks efficiently.

### Performance Modes

1. **Single Mode (Default)**: Makes one API call per translation
- Better for small files (< 30 entries)
- More accurate for context-sensitive translations
- Shows progress for each entry with `-v` flag

2. **Bulk Mode (`--bulk`)**: Batches multiple translations per API call
- Recommended for large files (> 30 entries)
- Significantly faster (up to 10x for large files)
- Shows progress per batch with `-v` flag

### Automatic Performance Warnings

When processing files with more than 30 entries in single mode, the tool will:
1. Display a performance warning with time estimates
2. Recommend switching to bulk mode
3. For very large files (>100 entries), provide a 10-second countdown to cancel

Example warning:
```
2024-01-15 10:30:45 - WARNING - PERFORMANCE WARNING
2024-01-15 10:30:45 - WARNING - Current mode: SINGLE (1 API call per translation)
2024-01-15 10:30:45 - WARNING - This will make 548 separate API calls
2024-01-15 10:30:45 - WARNING - Estimated time: ~14 minutes
2024-01-15 10:30:45 - WARNING -
2024-01-15 10:30:45 - WARNING - Recommendation: Use BULK mode for faster processing
2024-01-15 10:30:45 - WARNING - Command: add --bulk --bulksize 50
2024-01-15 10:30:45 - WARNING - Estimated time with bulk: ~2 minutes
2024-01-15 10:30:45 - WARNING - Speed improvement: 7x faster
```
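
The decision behind this warning might look like the sketch below. The thresholds (30 and 100 entries) and the 10-second countdown come from the text above; `performance_notes` is a hypothetical helper, and time estimates are left out because the per-call durations the tool uses are not documented here.

```python
import math
from typing import List

SINGLE_MODE_WARN_THRESHOLD = 30   # warn above this many entries in single mode
COUNTDOWN_THRESHOLD = 100         # add a countdown above this many entries
COUNTDOWN_SECONDS = 10


def performance_notes(entry_count: int, bulk: bool, bulksize: int = 50) -> List[str]:
    """Return the warning lines a run would produce (hypothetical sketch)."""
    if bulk or entry_count <= SINGLE_MODE_WARN_THRESHOLD:
        return []

    batches = math.ceil(entry_count / bulksize)
    notes = [
        f"Single mode will make {entry_count} separate API calls",
        f"Recommendation: --bulk --bulksize {bulksize} needs only {batches} calls",
    ]
    if entry_count > COUNTDOWN_THRESHOLD:
        notes.append(f"Starting in {COUNTDOWN_SECONDS} seconds unless cancelled")
    return notes


for line in performance_notes(548, bulk=False):
    print("WARNING -", line)
```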

### Progress Tracking

Enable progress tracking with the `-v` flag:

```bash
# See progress for each file and translation
gpt-po-translator --folder ./locales --lang fr -v

# Output includes:
# - File processing status
# - Translation progress (X/Y entries)
# - Percentage completion
# - Batch progress (in bulk mode)
```

Example progress output:
```
2024-01-15 10:31:00 - INFO - Processing: ./locales/fr/messages.po (45 entries)
2024-01-15 10:31:01 - INFO - [SINGLE 1/45] Translating entry...
2024-01-15 10:31:02 - INFO - [SINGLE 2/45] Translating entry...
2024-01-15 10:31:10 - INFO - Progress: 10/45 entries completed (22.2%)
```

### Verbosity Levels

Control output detail with verbosity flags:

| Flag | Level | Shows |
|------|-------|-------|
| (default) | WARNING | Performance warnings, errors |
| `-v` | INFO | Progress tracking, status updates |
| `-vv` | DEBUG | Detailed API calls, responses |
| `-q` | ERROR | Only critical errors |
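
One common way to get this behavior in Python is an `argparse` counter mapped onto `logging` levels. The sketch below shows that pattern; it is not taken from the tool's source, and letting `-q` win over `-v` when both are given is an assumption.

```python
import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="example")
    # action="count" turns -v into 1 and -vv into 2
    parser.add_argument("-v", "--verbose", action="count", default=0)
    parser.add_argument("-q", "--quiet", action="store_true")
    return parser


def log_level(verbose: int, quiet: bool) -> int:
    """Map the flags onto the levels in the table above."""
    if quiet:
        return logging.ERROR      # assumption: -q takes precedence over -v
    if verbose >= 2:
        return logging.DEBUG
    if verbose == 1:
        return logging.INFO
    return logging.WARNING        # default: warnings and errors only


args = build_parser().parse_args(["-vv"])
logging.basicConfig(level=log_level(args.verbose, args.quiet))
logging.debug("debug output is visible with -vv")
```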

### Best Practices for Large Files

1. **Always use bulk mode for files > 100 entries**:
```bash
gpt-po-translator --folder ./locales --lang fr --bulk --bulksize 50 -v
```

2. **Adjust batch size based on content**:
- Short entries (1-5 words): `--bulksize 100`
- Medium entries (sentences): `--bulksize 50` (default)
- Long entries (paragraphs): `--bulksize 20`

3. **Monitor progress for long-running tasks**:
```bash
# Run with progress tracking
gpt-po-translator --folder ./large-project --lang de,fr,es --bulk -v
```
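
The batch-size guidance in item 2 can be turned into a small heuristic if you want to script the choice. This is a sketch, not a feature of the tool: `suggest_bulksize` is a hypothetical helper, and the 30-word boundary between "sentences" and "paragraphs" is an assumed cut-off.

```python
from typing import Sequence


def suggest_bulksize(msgids: Sequence[str]) -> int:
    """Suggest a --bulksize value from the average entry length in words."""
    if not msgids:
        return 50  # fall back to the documented default

    avg_words = sum(len(msgid.split()) for msgid in msgids) / len(msgids)
    if avg_words <= 5:
        return 100  # short entries (1-5 words)
    if avg_words <= 30:  # assumed upper bound for sentence-length entries
        return 50   # medium entries (sentences)
    return 20       # long entries (paragraphs)


print(suggest_bulksize(["Save", "Cancel", "Open file"]))
```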

---

## AI Translation Tracking
97 changes: 88 additions & 9 deletions pyproject.toml
@@ -22,16 +22,17 @@ requires-python = ">=3.9"
license = {text = "MIT"}
dependencies = [
"polib==1.2.0",
"openai==1.58.1",
"python-dotenv==1.0.0",
"pytest==8.2.2",
"openai==1.99.9",
"python-dotenv==1.0.1",
"pytest==8.3.4",
"tenacity==9.0.0",
"setuptools-scm==8.1.0",
"pycountry==24.6.1",
"anthropic==0.48.0",
"anthropic==0.63.0",
"requests==2.32.3",
"responses==0.25.6",
"isort==6.0.1",
"responses==0.25.8",
"isort==5.13.2",
"tomli==2.2.1",
]
classifiers = [
"Development Status :: 5 - Production/Stable",
@@ -57,8 +58,86 @@ classifiers = [
[project.scripts]
gpt-po-translator = "python_gpt_po.main:main"

[tool.flake8]
max-line-length = 120

[tool.isort]
line_length = 120

[tool.gpt-po-translator]
# Configuration for gpt-po-translator

# ===== FILE SCANNING =====
# Whether to respect .gitignore files (enabled by default)
respect_gitignore = true

# Additional patterns to ignore (beyond .gitignore)
ignore_patterns = [
"*.pyc",
"__pycache__/",
"*.egg-info/",
".pytest_cache/",
".coverage",
".tox/",
".mypy_cache/",
"htmlcov/",
]

# Patterns ignored by default (set to an empty list to disable them)
default_ignore_patterns = [
".git/",
".venv/",
"venv/",
"env/",
".env/",
"node_modules/",
".cache/",
"build/",
"dist/",
"*.egg-info/",
"__pycache__/",
".pytest_cache/",
".tox/",
".mypy_cache/",
]

# ===== TRANSLATION BEHAVIOR =====
# Default verbosity level (0=WARNING, 1=INFO, 2=DEBUG)
default_verbosity = 1

# Default batch size for bulk mode
default_batch_size = 50

# Enable bulk mode by default
default_bulk_mode = false

# Whether to mark AI-generated translations with comments by default
mark_ai_generated = true

# Whether to use folder-based language detection by default
folder_language_detection = false

# Whether to fix fuzzy entries by default
fix_fuzzy_entries = false

# ===== PROVIDER DEFAULTS =====
# Default provider to use if multiple API keys are available
# Options: "openai", "azure_openai", "anthropic", "deepseek"
# default_provider = "openai"

# Default models for each provider (will be used if no model is specified)
default_models = { openai = "gpt-4o-mini", anthropic = "claude-3-5-sonnet-20241022" }

# ===== PERFORMANCE =====
# Maximum retries for failed translations
max_retries = 3

# Timeout for API requests (seconds)
request_timeout = 120

# ===== OUTPUT =====
# Skip files that are already fully translated
skip_translated_files = true

# Show progress indicators during translation
show_progress = true

# Show detailed summary at the end
show_summary = true
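
A minimal sketch of how a `[tool.gpt-po-translator]` table like the one above could be read and its ignore patterns applied. The helper names are hypothetical, the sample TOML is a shortened copy of the keys above, and the pattern matching is a deliberate simplification of gitignore semantics (trailing `/` means "any directory with this name", everything else is matched against the file name only).

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # backport listed in the dependencies above

from fnmatch import fnmatch
from typing import Any, Dict, Sequence

SAMPLE = """
[tool.gpt-po-translator]
respect_gitignore = true
ignore_patterns = ["*.pyc"]
default_ignore_patterns = [".git/", "node_modules/", "*.egg-info/"]
default_batch_size = 50
"""


def load_tool_config(text: str) -> Dict[str, Any]:
    """Return the tool's table from pyproject.toml text, or {} if absent."""
    data = tomllib.loads(text)
    return data.get("tool", {}).get("gpt-po-translator", {})


def is_ignored(rel_path: str, patterns: Sequence[str]) -> bool:
    """Simplified matcher; real gitignore rules are richer than this."""
    *dirs, name = rel_path.split("/")
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: match against every directory component
            if any(fnmatch(d, pattern[:-1]) for d in dirs):
                return True
        elif fnmatch(name, pattern):
            return True
    return False


config = load_tool_config(SAMPLE)
patterns = config["default_ignore_patterns"] + config["ignore_patterns"]
print(is_ignored("node_modules/pkg/locale/de.po", patterns))
```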