feat: overhaul translation system for improved accuracy and stability #82
🦋 Changeset detected
Latest commit: dcdda83
The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
Walkthrough
The translation module was enhanced to support Russian, improved terminology handling with source content filtering, and introduced copy-only directory logic for untranslated files. The system prompt for the translation model was rewritten in English with explicit formatting rules. New utilities for anchor tag preservation and title translation mapping were added, along with updates to language labels and a multilingual title translation map.
Sequence Diagram(s)
sequenceDiagram
    participant CLI_User
    participant TranslateCommand
    participant FileSystem
    participant TranslateFunction
    participant OpenAI_Model
    CLI_User->>TranslateCommand: Invoke translate command
    TranslateCommand->>FileSystem: Identify files (copy-only or translatable)
    alt File is copy-only
        TranslateCommand->>FileSystem: Copy file, update frontmatter
    else File requires translation
        TranslateCommand->>TranslateFunction: Prepare content (anchors, terminology, title)
        TranslateFunction->>OpenAI_Model: Send prompt with filtered terminology and title hint
        OpenAI_Model-->>TranslateFunction: Return translated content
        TranslateFunction->>TranslateCommand: Restore anchors, return translated content
        TranslateCommand->>FileSystem: Write translated file, update frontmatter
    end
This commit improves the documentation translation system by simplifying the translation process, reducing context overhead, and making translation results more stable.

System Prompt Changes:
* Convert system prompt from Chinese to English for better AI comprehension
* Replace 4-step translation process with single-pass approach to improve output stability and reduce context
* Remove target content comparison logic to simplify process and reduce translation context
* Restructure prompt with clear baseline requirements and optional additional requirements section

Smart Terms Filtering:
* Only include terms that actually appear in source content
* Support multi-language terms mapping (English, Chinese, Russian)

Anchor Link Protection:
* Introduce anchor preprocessing mechanism to prevent translation corruption of markdown anchor links
* Replace anchors with numbered placeholders (__ANCHOR_N__) before translation
* Restore original anchors after translation completion
* Handle escaped underscores in anchor IDs to maintain compatibility with MDX processor

Title Translation:
* Replace i18n.title-based translation with built-in title translation mapping table
* Implement automatic title correction through prompt enhancement when titles exist in mapping table
* Extract first-level headings from content and apply predefined translations when available
* Fallback to AI translation for unmapped titles while preserving consistency for common terms

Frontmatter Cleanup:
* Remove i18n fields from translated documents

Content Processing:
* Add support for copy-only files configuration, with all files under apis/ directory set to copy-only by default

Technical Changes:
* Upgrade model from gpt-4o-mini to gpt-4.1-mini
* Update language constants from Chinese to English descriptions
* Add title translation mapping table for common sections
bcdf602 to b3e6a54
Actionable comments posted: 2
🔭 Outside diff range comments (1)
src/cli/translate.ts (1)
1-624: 💡 Verification agent
🧩 Analysis chain
Address type coverage regression.
The pipeline indicates type coverage has dropped below the 100% target. Please run type coverage analysis locally to identify and fix any untyped code paths introduced by these changes.
🏁 Script executed:
#!/bin/bash
# Check for type coverage issues in the modified file
npx type-coverage --detail --file-coverage src/cli/translate.ts --strict --at-least 100 || true
Length of output: 19796
Type coverage dropped to 96.41% (target 100%)
The recent changes have introduced multiple untyped parameters across the codebase. Please restore 100% coverage by adding the missing type annotations:
• Run the global check: npx type-coverage --detail --strict --at-least 100 || true
• Identify all “any” or untyped parameters (e.g., in TSX components, CLI handlers, plugin functions).
• Add explicit types to function parameters, React component props, and callback signatures.
• Re-run the coverage check to confirm 100% is met.
🧰 Tools
🪛 Biome (1.9.4)
[error] 503-503: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
[error] 557-557: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
[error] 578-578: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
🪛 ESLint
[error] 7-7: Unable to resolve path to module '@rspress/shared'.
(import-x/no-unresolved)
[error] 8-8: Unable to resolve path to module '@rspress/shared/logger'.
(import-x/no-unresolved)
[error] 9-9: Unable to resolve path to module 'commander'.
(import-x/no-unresolved)
[error] 10-10: Unable to resolve path to module 'ejs'.
(import-x/no-unresolved)
[error] 11-11: Unable to resolve path to module 'gray-matter'.
(import-x/no-unresolved)
[error] 12-12: Unable to resolve path to module 'openai'.
(import-x/no-unresolved)
[error] 13-13: Unable to resolve path to module 'p-ratelimit'.
(import-x/no-unresolved)
[error] 15-15: Unable to resolve path to module 'yoctocolors'.
(import-x/no-unresolved)
🪛 GitHub Actions: CI
[error] 562-562: Type coverage rate (99.98%) is lower than the target (100%).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/cli/translate.ts (8 hunks)
src/shared/constants.ts (1 hunk)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/cli/translate.ts (3)
src/shared/constants.ts (3)
Language (29-33)
Language (35-35)
TITLE_TRANSLATION_MAP (39-57)
src/cli/helpers.ts (2)
parseTerms (70-70)
escapeMarkdownHeadingIds (44-52)
src/plugins/replace/normalize-img-src.ts (2)
NormalizeImgSrcOptions (13-25)
normalizeImgSrc (27-158)
🔇 Additional comments (7)
src/shared/constants.ts (1)
29-57: LGTM! Clean implementation of language constants and title mappings.
The conversion from Chinese to English language labels aligns well with the PR objectives, and the TITLE_TRANSLATION_MAP provides a good foundation for consistent title translations across supported languages.
src/cli/translate.ts (6)
48-54: Good additions for Russian support and copy-only configuration.
The inclusion of Russian in TERMS_SUPPORTED_LANGUAGES aligns with the expanded language support, and the COPY_ONLY_DIRECTORIES configuration appropriately excludes API documentation from translation.
56-110: Excellent rewrite of the system prompt with comprehensive translation rules.
The English prompt is well-structured with clear baseline requirements covering MDX format preservation, link handling, technical terms, anchor placeholders, and escape character handling. The conditional sections for title translation and terminology are properly integrated.
121-162: Smart optimization of terminology resolution.
The filtering logic effectively reduces the translation context by only including terms that actually appear in the source content. The case-insensitive regex matching with proper escaping ensures accurate term detection.
164-196: Well-implemented anchor preservation mechanism.
The anchor placeholder system effectively protects anchor links during translation. The handling of escaped underscores and proper error checking in the restoration function ensures robustness.
490-519: Clean implementation of copy-only file handling.
The logic properly preserves file content while updating metadata (sourceSHA) and removing translation-specific fields. This effectively supports the requirement to exclude certain directories from translation.
🧰 Tools
🪛 Biome (1.9.4)
[error] 503-503: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
198-220: Effective title translation implementation.
The system properly extracts first-level headings, provides translation hints to the AI model, and applies predefined title mappings from TITLE_TRANSLATION_MAP. The fallback logic ensures consistent title handling.
Also applies to: 251-263, 566-579
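As a rough illustration of the title handling described in this comment, the sketch below extracts the first level-one heading and looks it up in a mapping table. The function names match those mentioned in the review, but the bodies, the `Lang` type, the `TITLE_MAP` shape, and the sample entries are all assumptions made for illustration and are not taken from the PR.

```typescript
// Hypothetical language codes and mapping table; the real
// TITLE_TRANSLATION_MAP in src/shared/constants.ts may be shaped differently.
type Lang = 'en' | 'zh' | 'ru'

const TITLE_MAP: Array<Record<Lang, string>> = [
  { en: 'Overview', zh: '概述', ru: 'Обзор' },
  { en: 'Installation', zh: '安装', ru: 'Установка' },
]

// Return the text of the first `# Heading` line, or undefined if none exists.
// Caveat: a `# comment` line inside a code fence would also match.
function extractFirstLevelHeading(content: string): string | undefined {
  const match = /^#[ \t]+(.+?)[ \t]*$/m.exec(content)
  return match?.[1]
}

// Look up a predefined translation; undefined means "not in the table",
// in which case the caller would leave the title to the model.
function getTitleTranslation(
  title: string,
  source: Lang,
  target: Lang,
): string | undefined {
  const entry = TITLE_MAP.find(item => item[source] === title.trim())
  return entry?.[target]
}
```

A caller could pass the looked-up value to the prompt as a hint and fall back to the model's own translation whenever the lookup returns undefined.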
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/cli/translate.ts (1)
556-558: 🛠️ Refactor suggestion
Remove redundant sourceSHA assignment and fix delete operator usage.
Two issues here: sourceSHA is assigned twice (redundant), and the delete operator impacts performance.
- const newFrontmatter = { ...sourceFrontmatter, sourceSHA }
- delete newFrontmatter.i18n
- newFrontmatter.sourceSHA = sourceSHA
+ const { i18n: _, ...newFrontmatter } = { ...sourceFrontmatter, sourceSHA }
🧰 Tools
🪛 Biome (1.9.4)
[error] 557-557: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
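To make the suggested pattern concrete, here is a minimal sketch of dropping a key with rest destructuring instead of `delete`. The `Frontmatter` interface and the `withoutI18n` helper are hypothetical names introduced only for this example; the PR's actual frontmatter type is not shown in the conversation.

```typescript
// Hypothetical frontmatter shape used only for this illustration.
interface Frontmatter {
  title?: string
  sourceSHA?: string
  i18n?: Record<string, unknown>
}

// Build a new object that carries sourceSHA and omits i18n.
// The input object is not mutated, so no `delete` is needed.
function withoutI18n(frontmatter: Frontmatter, sourceSHA: string): Frontmatter {
  const { i18n: _i18n, ...rest } = { ...frontmatter, sourceSHA }
  return rest
}
```

Because the result is a fresh object, the same source frontmatter can be reused safely for other target languages.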
🧹 Nitpick comments (1)
src/cli/translate.ts (1)
496-519: Consider optimization for copy-only files.
Based on past review feedback, since copy-only files don't require translation, the sourceSHA calculation and comparison might be unnecessary overhead. Consider skipping the hash calculation for copy-only files if the goal is pure copying.
Would you like me to propose an optimization that skips sourceSHA calculation for copy-only files to improve performance?
🧰 Tools
🪛 Biome (1.9.4)
[error] 503-503: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
📒 Files selected for processing (2)
src/cli/translate.ts (8 hunks)
src/shared/constants.ts (1 hunk)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/shared/constants.ts
🧰 Additional context used
🧠 Learnings (1)
src/cli/translate.ts (2)
Learnt from: JounQin
PR: alauda/doom#82
File: src/cli/translate.ts:293-293
Timestamp: 2025-06-09T15:19:13.326Z
Learning: GPT-4.1-mini is a valid OpenAI model name that can be used in Azure OpenAI API calls, as confirmed by OpenAI's official documentation.
Learnt from: JounQin
PR: alauda/doom#82
File: src/cli/translate.ts:293-293
Timestamp: 2025-06-09T15:19:13.326Z
Learning: GPT-4.1-mini is a valid and recently released OpenAI model (April 2025) available on Azure OpenAI Service. It's identified as "gpt-4.1-mini-2025-04-14" in Azure OpenAI Service and significantly outperforms GPT-4o while being more cost-effective and faster. It supports up to 1M tokens of context and has multimodal capabilities.
🪛 Biome (1.9.4)
src/cli/translate.ts
[error] 503-503: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
[error] 557-557: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
[error] 579-579: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Build and Lint with Node.js 22 on ubuntu-latest
- GitHub Check: Build and Lint with Node.js 24 on ubuntu-latest
- GitHub Check: Build and Lint with Node.js 18 on ubuntu-latest
- GitHub Check: Build and Lint with Node.js 20 on ubuntu-latest
- GitHub Check: Build and Lint with Node.js 24 on ubuntu-latest
- GitHub Check: Build and Lint with Node.js 22 on ubuntu-latest
- GitHub Check: Build and Lint with Node.js 18 on ubuntu-latest
- GitHub Check: Build and Lint with Node.js 20 on ubuntu-latest
🔇 Additional comments (8)
src/cli/translate.ts (8)
23-27: LGTM! Clean import and constant additions.
The new imports and constants properly support the enhanced translation features including Russian language support and copy-only directory handling.
Also applies to: 48-54
56-110: Excellent system prompt rewrite with comprehensive instructions.
The English rewrite significantly improves clarity and includes detailed baseline requirements that should enhance translation accuracy and consistency. The conditional sections for title translation and terms allow for dynamic prompt enhancement.
121-162: Smart terms filtering implementation enhances translation accuracy.
The dynamic filtering of terminology based on source content presence is an excellent improvement over static lists. The regex-based matching with proper escaping and case-insensitive search ensures reliable term detection.
164-220: Well-designed utility functions for anchor preservation and title translation.
The anchor handling functions effectively prevent translation corruption by using placeholder replacement, and the title translation lookup provides consistency for common documentation sections. Good error handling and clear separation of concerns.
248-276: Enhanced translate function integrates new features effectively.
The function now incorporates smart terminology filtering, title translation hints, and anchor preservation while maintaining a logical flow and backward compatibility.
290-293: Model upgrade and content handling improvements.
The upgrade to gpt-4.1-mini leverages a more capable model, and using contentWithPlaceholders correctly implements the anchor preservation strategy.
395-402: Efficient copy-only files detection using glob patterns.
The implementation correctly identifies copy-only files and uses a Set for efficient lookup during processing.
563-576: Robust title translation handling with fallback strategy.
The implementation correctly handles both AI-translated titles from content and predefined title mappings from frontmatter, with appropriate fallback logic. This addresses the concern about frontmatter vs content title handling.
Actionable comments posted: 1
♻️ Duplicate comments (3)
src/cli/translate.ts (3)
502-502: Consider skipping sourceSHA for copy-only files to improve performance.
Since copy-only files are not translated, calculating and storing the sourceSHA may be unnecessary overhead. Consider whether this hash is actually needed for copy-only operations.
502-503: 🛠️ Refactor suggestion
Replace delete operator with destructuring for better performance.
The static analysis tool correctly identifies that the delete operator can impact performance. Use destructuring to exclude the i18n property instead.
- const newFrontmatter = { ...sourceFrontmatter, sourceSHA }
- delete newFrontmatter.i18n
+ const { i18n: _, ...newFrontmatter } = { ...sourceFrontmatter, sourceSHA }
🧰 Tools
🪛 Biome (1.9.4)
[error] 503-503: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
580-582: 🛠️ Refactor suggestion
Replace delete operator with destructuring for consistency.
For consistency with other frontmatter handling and to address the static analysis performance concern, use destructuring instead of the delete operator.
- if (typeof newFrontmatter.title !== 'string') {
-   delete newFrontmatter.title
- }
+ const finalFrontmatter = typeof newFrontmatter.title === 'string'
+   ? newFrontmatter
+   : (() => {
+       const { title: _, ...rest } = newFrontmatter
+       return rest
+     })()
Then use finalFrontmatter in the matter.stringify call.
🧰 Tools
🪛 Biome (1.9.4)
[error] 581-581: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
📒 Files selected for processing (1)
src/cli/translate.ts (8 hunks)
🧰 Additional context used
🪛 Biome (1.9.4)
src/cli/translate.ts
[error] 503-503: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
[error] 581-581: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
🔇 Additional comments (10)
src/cli/translate.ts (10)
23-28: LGTM: Import changes support new features.
The addition of the TITLE_TRANSLATION_MAP import and other constants properly supports the new title translation and terminology features.
48-54: LGTM: Russian language support and copy-only directories added.
The expansion to include Russian in TERMS_SUPPORTED_LANGUAGES and the COPY_ONLY_DIRECTORIES constant align well with the PR objectives for multi-language support and selective file handling.
56-110: Excellent system prompt rewrite enhances translation quality.
The rewritten English system prompt is comprehensive and well-structured. It includes detailed baseline requirements for preserving MDX format, links, technical terms, and proper handling of frontmatter. The dynamic placeholders for title translation and terminology injection provide context-specific guidance to the AI model.
121-162: Smart terms filtering improves translation context efficiency.
The resolveTerms function intelligently filters terms to only include those present in the source content with available translations for both languages. The case-insensitive regex matching and proper escaping are well-implemented.
164-196: Robust anchor preservation mechanism protects link integrity.
The anchor handling functions provide a solid solution for preserving markdown anchors during translation. The escaped underscore handling for MDX compatibility and the numbered placeholder system with validation are well-designed.
198-220: Title extraction and translation functions are well-implemented.
The extractFirstLevelHeading and getTitleTranslation functions provide clean, focused functionality for title handling. The implementation correctly addresses the need for consistent title translations across languages.
243-304: Translate function properly integrates new features.
The updates to the translate function successfully integrate terms filtering, title translation hints, and anchor preservation. The model upgrade to gpt-4.1-mini enhances performance as noted in the PR objectives.
395-402: Copy-only directory logic is well-implemented.
The copy-only file detection using glob patterns provides a clean way to handle files that should be copied verbatim instead of translated.
557-561: Frontmatter handling is improved but could use destructuring.
The frontmatter handling correctly excludes the i18n field and preserves the sourceSHA. However, consider using destructuring instead of the delete operator for consistency.
569-578: Title translation from frontmatter is well-implemented.
The code now properly handles title translation from frontmatter using the predefined translation map, addressing the previous review concern about frontmatter title consideration.
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/cli/translate.ts (3)
500-500: Replace delete operator for better performance.
The static analysis tool correctly identifies that the delete operator can impact performance.
- const newFrontmatter = { ...sourceFrontmatter, sourceSHA }
- delete newFrontmatter.i18n
+ const { i18n: _, ...newFrontmatter } = { ...sourceFrontmatter, sourceSHA }
🧰 Tools
🪛 Biome (1.9.4)
[error] 500-500: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
566-575: Consider consolidating title translation logic.
The title translation logic could be simplified by handling both frontmatter title and extracted heading title in a single location to avoid potential conflicts.
- if (sourceFrontmatter.title) {
-   const titleTranslation = getTitleTranslation(
-     sourceFrontmatter.title,
-     source,
-     target,
-   )
-   if (titleTranslation) {
-     newFrontmatter.title = titleTranslation
-   }
- }
+ // Apply title translation from frontmatter with priority over AI-translated title
+ if (sourceFrontmatter.title) {
+   const titleTranslation = getTitleTranslation(
+     sourceFrontmatter.title,
+     source,
+     target,
+   )
+   if (titleTranslation) {
+     newFrontmatter.title = titleTranslation
+   }
+ } else if (typedData.title && typeof typedData.title === 'string') {
+   newFrontmatter.title = typedData.title
+ }
577-584: Simplify frontmatter title handling.
The current approach for handling optional title field could be more concise using object destructuring.
- const finalFrontmatter =
-   typeof newFrontmatter.title === 'string'
-     ? newFrontmatter
-     : (() => {
-         // eslint-disable-next-line @typescript-eslint/no-unused-vars
-         const { title: _, ...rest } = newFrontmatter
-         return rest
-       })()
+ const finalFrontmatter = typeof newFrontmatter.title === 'string'
+   ? newFrontmatter
+   : (() => {
+       const { title: _, ...rest } = newFrontmatter
+       return rest
+     })()
📒 Files selected for processing (1)
src/cli/translate.ts (8 hunks)
🧰 Additional context used
🪛 Biome (1.9.4)
src/cli/translate.ts
[error] 500-500: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
🔇 Additional comments (11)
src/cli/translate.ts (11)
23-27: Good addition of title translation support.
The import of TITLE_TRANSLATION_MAP enhances the translation system with predefined title mappings, which should improve consistency and accuracy for common documentation sections.
48-48: Russian language support added correctly.
The addition of 'ru' to TERMS_SUPPORTED_LANGUAGES properly extends terminology resolution to support Russian translations.
51-54: Well-designed copy-only directory configuration.
The COPY_ONLY_DIRECTORIES constant provides a clean way to specify which directories should be copied verbatim instead of translated. The APIs directory is a logical choice for copy-only behavior.
56-110: Excellent system prompt rewrite for improved AI comprehension.
The conversion from Chinese to English with detailed baseline requirements significantly improves clarity. The prompt includes comprehensive rules for:
- MDX format preservation
- Link integrity protection
- Technical term handling
- Frontmatter processing
- Anchor placeholder preservation
- Escape character handling
This structured approach should enhance translation accuracy and consistency.
121-162: Smart optimization for terminology resolution.
The resolveTerms function efficiently filters terms to include only those:
- Present in the source content (case-insensitive regex matching)
- Having translations for both source and target languages
This reduces context overhead and improves translation relevance. The regex escaping for special characters is properly implemented.
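The filtering described above can be sketched roughly as follows. This is not the PR's resolveTerms implementation; the `TermEntry` shape, the `filterTerms` name, and the return format are assumptions chosen to keep the example self-contained.

```typescript
// Hypothetical shape of one terminology entry: language code -> term text.
type TermEntry = Partial<Record<'en' | 'zh' | 'ru', string>>

// Escape regex metacharacters so a term is matched literally.
const escapeRegExp = (text: string): string =>
  text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')

// Keep only entries that define both languages and whose source-language
// term occurs somewhere in the content (case-insensitive substring match).
function filterTerms(
  terms: TermEntry[],
  content: string,
  source: keyof TermEntry,
  target: keyof TermEntry,
): Array<{ source: string; target: string }> {
  const result: Array<{ source: string; target: string }> = []
  for (const term of terms) {
    const from = term[source]
    const to = term[target]
    if (!from || !to) continue
    if (new RegExp(escapeRegExp(from), 'i').test(content)) {
      result.push({ source: from, target: to })
    }
  }
  return result
}
```

Note that a plain substring match treats "pod" inside "podium" as a hit; whether the PR adds word-boundary handling is not stated in the conversation.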
164-196: Robust anchor preservation mechanism.
The anchor handling functions effectively:
- Replace anchors with numbered placeholders before translation
- Handle escaped underscores in anchor IDs for MDX compatibility
- Restore anchors after translation with proper validation
The error handling for invalid anchor indices adds good defensive programming.
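A minimal sketch of the placeholder round trip is shown below. The `__ANCHOR_N__` placeholder format comes from the PR description; the anchor syntax (`{#id}`, optionally with an escaped brace), the regular expressions, and the function names `replaceAnchors` and `restoreAnchors` are assumptions for illustration.

```typescript
// Assumption: anchors look like `{#some-id}`, optionally written with an
// escaped opening brace (`\{#some-id}`), and may contain `\_` inside the ID.
const ANCHOR_PATTERN = /\\?\{#[^}]+\}/g
const PLACEHOLDER_PATTERN = /__ANCHOR_(\d+)__/g

// Swap each anchor for a numbered placeholder and remember the original text.
function replaceAnchors(content: string): { text: string; anchors: string[] } {
  const anchors: string[] = []
  const text = content.replace(ANCHOR_PATTERN, match => {
    anchors.push(match)
    return `__ANCHOR_${anchors.length - 1}__`
  })
  return { text, anchors }
}

// Put the original anchors back; an unknown index is treated as an error
// rather than silently leaving a placeholder in the output.
function restoreAnchors(translated: string, anchors: string[]): string {
  return translated.replace(PLACEHOLDER_PATTERN, (_match, index: string) => {
    const anchor = anchors[Number(index)]
    if (anchor === undefined) {
      throw new Error(`Unknown anchor placeholder index: ${index}`)
    }
    return anchor
  })
}
```

Since the anchor text is stored verbatim, escaped underscores inside the ID come back exactly as they were written in the source.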
198-220: Effective title translation implementation.
The combination of extractFirstLevelHeading and getTitleTranslation provides a clean way to leverage predefined title mappings for consistent translations of common documentation sections.
248-267: Well-integrated new translation features.
The updates to the translate function properly incorporate:
- Terms filtering based on source content
- Title translation prompt injection
- Anchor placeholder replacement/restoration
The conditional logic ensures features are only applied when relevant.
293-293: Model upgrade is valid.
Based on the retrieved learnings, gpt-4.1-mini is confirmed as a valid and recently released OpenAI model (April 2025) that offers significant performance improvements over GPT-4o while being more cost-effective.
396-402: Efficient copy-only file detection.
The glob-based approach for identifying copy-only files is clean and performant. Using a Set for lookups in the main processing loop is the right optimization.
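The Set-based lookup can be illustrated with the sketch below. The PR is described as using glob patterns; this example swaps in a simple directory-prefix check to stay dependency-free, and it assumes POSIX-style paths relative to the docs root. The `collectCopyOnlyFiles` helper is a hypothetical name.

```typescript
// Default taken from the PR description: everything under apis/ is copy-only.
const COPY_ONLY_DIRECTORIES = ['apis']

// Collect the files that live under a copy-only directory into a Set so the
// main loop can test membership with a constant-time `has` call.
// Assumption: prefix matching stands in for the glob patterns used in the PR.
function collectCopyOnlyFiles(
  files: string[],
  directories: string[] = COPY_ONLY_DIRECTORIES,
): Set<string> {
  const copyOnly = new Set<string>()
  for (const file of files) {
    if (directories.some(dir => file.startsWith(`${dir}/`))) {
      copyOnly.add(file)
    }
  }
  return copyOnly
}
```

In the processing loop, a check such as `copyOnly.has(file)` would then decide between copying the file and sending it for translation.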
493-516: Well-implemented copy-only file handling.
The copy-only logic properly:
- Updates sourceSHA for tracking changes
- Removes i18n field from frontmatter
- Preserves original content without translation
- Provides clear logging
The sourceSHA update ensures copy-only files can still be tracked for changes, which is important for incremental processing.
🧰 Tools
🪛 Biome (1.9.4)
[error] 500-500: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
Signed-off-by: JounQin <admin@1stg.me>
Actionable comments posted: 1
♻️ Duplicate comments (4)
src/cli/translate.ts (4)
198-220: Well-implemented title translation utilities.
The functions are clean and focused. However, note that extractFirstLevelHeading only extracts markdown headings (# syntax) and doesn't consider titles in frontmatter, which may miss some title translation opportunities.
493-516: Consider optimizing sourceSHA for copy-only files.
Since copy-only files don't undergo translation, adding sourceSHA may be unnecessary and could be skipped to improve performance slightly.
🧰 Tools
🪛 Biome (1.9.4)
[error] 500-500: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
574-576: 🛠️ Refactor suggestion
Replace delete operator with conditional assignment for better performance.
Use conditional logic instead of the delete operator for consistent performance optimization.
Apply this diff:
- if (typeof newFrontmatter.title !== 'string') {
-   delete newFrontmatter.title
- }
+ const finalFrontmatter = typeof newFrontmatter.title === 'string'
+   ? newFrontmatter
+   : (() => {
+       const { title: _, ...rest } = newFrontmatter
+       return rest
+     })()
Then use finalFrontmatter in the matter.stringify call on line 578.
🧰 Tools
🪛 Biome (1.9.4)
[error] 575-575: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
499-500: 🛠️ Refactor suggestion
Replace delete operator with destructuring for better performance.
The static analysis correctly identifies performance issues with the delete operator.
Apply this diff to use destructuring instead:
- const newFrontmatter = { ...sourceFrontmatter, sourceSHA }
- delete newFrontmatter.i18n
+ const { i18n: _, ...newFrontmatter } = { ...sourceFrontmatter, sourceSHA }
🧰 Tools
🪛 Biome (1.9.4)
[error] 500-500: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/cli/translate.ts (8 hunks)
🧰 Additional context used
🧠 Learnings (1)
src/cli/translate.ts (2)
Learnt from: JounQin
PR: alauda/doom#82
File: src/cli/translate.ts:293-293
Timestamp: 2025-06-09T15:19:13.326Z
Learning: GPT-4.1-mini is a valid OpenAI model name that can be used in Azure OpenAI API calls, as confirmed by OpenAI's official documentation.
Learnt from: JounQin
PR: alauda/doom#82
File: src/cli/translate.ts:293-293
Timestamp: 2025-06-09T15:19:13.326Z
Learning: GPT-4.1-mini is a valid and recently released OpenAI model (April 2025) available on Azure OpenAI Service. It's identified as "gpt-4.1-mini-2025-04-14" in Azure OpenAI Service and significantly outperforms GPT-4o while being more cost-effective and faster. It supports up to 1M tokens of context and has multimodal capabilities.
🪛 Biome (1.9.4)
src/cli/translate.ts
[error] 500-500: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
[error] 554-554: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
[error] 575-575: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Build and Lint with Node.js 20 on ubuntu-latest
- GitHub Check: Build and Lint with Node.js 18 on ubuntu-latest
- GitHub Check: Build and Lint with Node.js 24 on ubuntu-latest
- GitHub Check: Build and Lint with Node.js 22 on ubuntu-latest
- GitHub Check: Build and Lint with Node.js 20 on ubuntu-latest
- GitHub Check: Build and Lint with Node.js 22 on ubuntu-latest
- GitHub Check: Build and Lint with Node.js 18 on ubuntu-latest
- GitHub Check: Build and Lint with Node.js 24 on ubuntu-latest
🔇 Additional comments (5)
src/cli/translate.ts (5)
23-54: LGTM! Well-organized constants and imports.
The new imports align with the translation system overhaul objectives, and the constants provide clear configuration for copy-only directories and supported terminology languages.
121-162: Excellent smart filtering implementation.
The terminology resolution system is well-designed with proper regex escaping, case-insensitive matching, and filtering based on source content presence. This should significantly improve translation accuracy and reduce context overhead.
164-196: Robust anchor preservation mechanism.
The anchor placeholder system effectively protects anchor links from translation corruption. The handling of escaped underscores for MDX compatibility and proper error handling for invalid indices are excellent design choices.
222-304: Well-integrated translation function updates.
The function effectively integrates all new features - smart terminology filtering, title translation prompts, and anchor preservation. The model upgrade to gpt-4.1-mini is appropriate based on its improved performance characteristics.
563-572: Excellent title translation integration.
The title translation logic properly applies predefined mappings from the centralized translation table, which should improve consistency for common documentation sections.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/cli/translate.ts (1)
501-501: Consider replacing delete operators for better performance.
The static analysis tool correctly identifies that delete operators can impact performance. While this works functionally, consider using destructuring for consistency and performance:
-const newFrontmatter = { ...sourceFrontmatter, sourceSHA }
-delete newFrontmatter.i18n
+const { i18n: _, ...newFrontmatter } = { ...sourceFrontmatter, sourceSHA }
However, since there's a past review comment indicating preference for the delete approach for simplicity, this can remain as-is if performance isn't a concern.
Also applies to: 555-555, 582-582
🧰 Tools
🪛 Biome (1.9.4)
[error] 501-501: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
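For comparison with the delete-based code that the PR keeps, here is a minimal sketch of the destructuring alternative the linter has in mind. `omitKey` and the sample frontmatter values are made up for illustration; only the i18n and sourceSHA field names come from the review.

```typescript
type Frontmatter = Record<string, unknown>

// Returns a shallow copy of `frontmatter` without `key`.
// The input object is not mutated, unlike with the delete operator.
function omitKey(frontmatter: Frontmatter, key: string): Frontmatter {
  const { [key]: _omitted, ...rest } = frontmatter
  return rest
}

// Illustrative values only.
const sourceFrontmatter: Frontmatter = {
  title: 'Intro',
  i18n: { title: { en: 'Intro' } },
}
const newFrontmatter = omitKey({ ...sourceFrontmatter, sourceSHA: 'abc123' }, 'i18n')
```

Both approaches produce the same frontmatter. For a one-off object per file the performance difference is unlikely to be measurable, which is consistent with the author's choice to keep delete for simplicity.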
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
- docs/en/apis/advanced-apis/event/index.mdx (1 hunks)
- docs/en/apis/advanced-apis/event/search.mdx (1 hunks)
- docs/en/apis/advanced-apis/log/aggregation.mdx (1 hunks)
- docs/en/apis/advanced-apis/log/index.mdx (1 hunks)
- docs/en/apis/advanced-apis/log/search.mdx (1 hunks)
- docs/en/start.mdx (7 hunks)
- docs/en/usage/configuration.md (5 hunks)
- docs/zh/apis/advanced-apis/event/index.mdx (0 hunks)
- docs/zh/apis/advanced-apis/event/search.mdx (0 hunks)
- docs/zh/apis/advanced-apis/log/aggregation.mdx (0 hunks)
- docs/zh/apis/advanced-apis/log/index.mdx (0 hunks)
- docs/zh/apis/advanced-apis/log/search.mdx (0 hunks)
- docs/zh/usage/configuration.md (1 hunks)
- package.json (1 hunks)
- src/cli/translate.ts (8 hunks)
💤 Files with no reviewable changes (5)
- docs/zh/apis/advanced-apis/log/aggregation.mdx
- docs/zh/apis/advanced-apis/event/search.mdx
- docs/zh/apis/advanced-apis/event/index.mdx
- docs/zh/apis/advanced-apis/log/index.mdx
- docs/zh/apis/advanced-apis/log/search.mdx
✅ Files skipped from review due to trivial changes (6)
- docs/en/apis/advanced-apis/log/aggregation.mdx
- docs/en/apis/advanced-apis/event/search.mdx
- docs/en/apis/advanced-apis/log/index.mdx
- docs/en/apis/advanced-apis/log/search.mdx
- package.json
- docs/en/apis/advanced-apis/event/index.mdx
🧰 Additional context used
🧠 Learnings (1)
src/cli/translate.ts (2)
Learnt from: JounQin
PR: alauda/doom#82
File: src/cli/translate.ts:293-293
Timestamp: 2025-06-09T15:19:13.326Z
Learning: GPT-4.1-mini is a valid OpenAI model name that can be used in Azure OpenAI API calls, as confirmed by OpenAI's official documentation.
Learnt from: JounQin
PR: alauda/doom#82
File: src/cli/translate.ts:293-293
Timestamp: 2025-06-09T15:19:13.326Z
Learning: GPT-4.1-mini is a valid and recently released OpenAI model (April 2025) available on Azure OpenAI Service. It's identified as "gpt-4.1-mini-2025-04-14" in Azure OpenAI Service and significantly outperforms GPT-4o while being more cost-effective and faster. It supports up to 1M tokens of context and has multimodal capabilities.
🪛 Biome (1.9.4)
src/cli/translate.ts
[error] 501-501: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
[error] 555-555: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
[error] 582-582: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
🪛 LanguageTool
docs/en/start.mdx
[uncategorized] ~152-~152: A punctuation mark might be missing here.
Context: ...eview. ### Using Scaffolding Templates {#new} Run yarn new to generate projects, ...
(AI_EN_LECTOR_MISSING_PUNCTUATION)
[typographical] ~212-~212: Consider adding a comma here.
Context: ...### Exporting PDF {#export} :::warning Please run the yarn build command before exp...
(PLEASE_COMMA)
docs/en/usage/configuration.md
[uncategorized] ~23-~23: Loose punctuation mark.
Context: ...# Basic Configuration {#basic} - lang: Default document language. To accommoda...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~24-~24: Loose punctuation mark.
Context: ... set to null or undefined. - title: Document title, displayed on the browse...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~25-~25: Loose punctuation mark.
Context: ..., displayed on the browser tab. - logo: Logo at the top left of the document, s...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~26-~26: Loose punctuation mark.
Context: ...lt into the doom package. - logoText: Document title, displayed next to the l...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~27-~27: Loose punctuation mark.
Context: ...xt to the logo at the top left. - icon: Document favicon, defaults to the same ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~28-~28: Loose punctuation mark.
Context: ...defaults to the same as logo. - base: Base path of the document, used when de...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~29-~29: Loose punctuation mark.
Context: ...oduct-docs. Defaults to /. - outDir: Build output directory, defaults to di...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~85-~85: Loose punctuation mark.
Context: ...e ``` ### frontmatterMode - `ignore`: Ignore the frontmatter of the reference...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~86-~86: Loose punctuation mark.
Context: ...urrent document's frontmatter. - merge: Merge the frontmatter of the referenced...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~87-~87: Loose punctuation mark.
Context: ...ride the current document's. - replace: Replace the current document's frontmat...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~88-~88: Loose punctuation mark.
Context: ...t of the referenced document. - remove: Remove the current document's frontmatt...
(UNLIKELY_OPENING_PUNCTUATION)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Build and Lint with Node.js 20 on ubuntu-latest
- GitHub Check: Build and Lint with Node.js 24 on ubuntu-latest
- GitHub Check: Build and Lint with Node.js 22 on ubuntu-latest
- GitHub Check: Build and Lint with Node.js 18 on ubuntu-latest
- GitHub Check: Build and Lint with Node.js 20 on ubuntu-latest
- GitHub Check: Build and Lint with Node.js 18 on ubuntu-latest
- GitHub Check: Build and Lint with Node.js 22 on ubuntu-latest
- GitHub Check: Build and Lint with Node.js 24 on ubuntu-latest
🔇 Additional comments (13)
docs/zh/usage/configuration.md (1)
169-229: LGTM! Comprehensive translation system prompt documentation.
The new system prompt documentation is well-structured and covers all the critical baseline requirements for accurate translation. The detailed rules for preserving links, handling MDX components, managing placeholders, and maintaining formatting align perfectly with the translation system overhaul described in the PR objectives.
The dynamic injection of terms and titleTranslationPrompt parameters provides good flexibility for context-aware translations.
docs/en/usage/configuration.md (1)
170-229: LGTM! Well-documented translation configuration.
The English documentation mirrors the Chinese version perfectly and provides clear, comprehensive guidance for the translation system configuration. The baseline requirements are well-explained and the dynamic prompt injection features are properly documented.
docs/en/start.mdx (2)
241-259: Good addition of lint command documentation.
The new lint command documentation section provides clear usage instructions and references to configuration details, improving the completeness of the CLI documentation.
5-77: LGTM! Improved documentation clarity.
The section title improvements ("Getting Started", "Creating a Project", "CLI Tool") and various editorial enhancements make the documentation more readable and user-friendly.
src/cli/translate.ts (9)
49-49: Good addition of Russian language support.
Adding Russian to the supported terminology languages expands the translation capabilities as intended by the PR objectives.
52-56: Well-implemented copy-only directories feature.
The COPY_ONLY_DIRECTORIES constant provides a clean way to specify directories that should be copied verbatim without translation, which aligns with the PR's goal of adding copy-only file configurations.
57-111: Excellent system prompt rewrite.
The new English system prompt is comprehensive and well-structured. The baseline requirements clearly define critical rules for preserving links, handling MDX format, managing placeholders, and maintaining formatting. The dynamic injection of titleTranslationPrompt and terms provides good flexibility for context-aware translations.
122-163: Smart terminology filtering implementation.
The resolveTerms function efficiently filters terminology based on:
- Presence of terms in source content (case-insensitive)
- Availability of translations in both source and target languages
This reduces context overhead and improves translation accuracy as intended by the PR objectives.
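The two filtering criteria listed above can be sketched roughly as follows. This is not the PR's resolveTerms: `filterTerms`, the `Term` shape, and the sample entries in the usage note are assumptions made for illustration.

```typescript
type Lang = 'en' | 'zh' | 'ru'
// Assumed shape: one optional spelling per language.
type Term = Partial<Record<Lang, string>>

// Escape regex metacharacters so a term such as "C++" is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Keep only terms that (1) have both a source and a target spelling and
// (2) actually occur in the source content, ignoring case.
function filterTerms(
  terms: readonly Term[],
  content: string,
  source: Lang,
  target: Lang,
): Array<[string, string]> {
  const result: Array<[string, string]> = []
  for (const term of terms) {
    const from = term[source]
    const to = term[target]
    if (!from || !to) continue
    if (new RegExp(escapeRegExp(from), 'i').test(content)) {
      result.push([from, to])
    }
  }
  return result
}
```

For example, with the terms `{ en: 'Pod', zh: '容器组' }` and `{ en: 'Node', ru: 'Узел' }` and the content "Create a pod and a node.", an en to zh run would keep only the Pod pair, because Node has no Chinese spelling. Note that this simple substring match would also find "pod" inside a longer word; whether the PR guards against that is not visible in this review.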
165-197: Robust anchor preservation mechanism.
The anchor placeholder system effectively protects markdown anchor links from translation corruption by:
- Replacing anchors with numbered placeholders before translation
- Handling escaped underscores for MDX processor compatibility
- Restoring original anchors after translation
This addresses a key stability improvement mentioned in the PR objectives.
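The replace-then-restore round trip described above might look like the sketch below. The `__ANCHOR_N__` placeholder format comes from the commit description; the function names, the assumption that anchors are `{#id}` heading IDs, and the choice to leave placeholders with unknown indices untouched are all illustrative guesses.

```typescript
// Assumption: anchors are custom heading IDs such as "{#new}".
const ANCHOR_PATTERN = /\{#[^}\s]+\}/g
// Matches __ANCHOR_3__ as well as the escaped form \_\_ANCHOR\_3\_\_.
const PLACEHOLDER_PATTERN = /(?:\\?_){2}ANCHOR(?:\\?_)(\d+)(?:\\?_){2}/g

// Swap each anchor for a numbered placeholder before translation.
function protectAnchors(content: string): { text: string; anchors: string[] } {
  const anchors: string[] = []
  const text = content.replace(ANCHOR_PATTERN, anchor => {
    anchors.push(anchor)
    return `__ANCHOR_${anchors.length - 1}__`
  })
  return { text, anchors }
}

// Put the original anchors back after translation.
function restoreAnchors(text: string, anchors: readonly string[]): string {
  return text.replace(PLACEHOLDER_PATTERN, (placeholder: string, index: string) => {
    // Unknown index: keep the placeholder visible rather than guessing.
    return anchors[Number(index)] ?? placeholder
  })
}
```

A heading like "### Using Scaffolding Templates {#new}" would be sent to the model as "### Using Scaffolding Templates __ANCHOR_0__", and the original `{#new}` is restored even if the placeholder comes back with escaped underscores.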
199-221: Good title translation mapping integration.
The functions for extracting first-level headings and mapping title translations provide a clean way to handle predefined title translations, improving consistency as mentioned in the PR objectives.
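As a rough picture of how a predefined title map can feed the prompt, consider the sketch below. The table entries, `lookupTitle`, `buildTitleHint`, and the hint wording are all hypothetical; the PR's real mapping table and titleTranslationPrompt text are not shown in this review.

```typescript
type Lang = 'en' | 'zh' | 'ru'

// Illustrative entries only, not the PR's mapping table.
const TITLE_TRANSLATIONS: ReadonlyArray<Record<Lang, string>> = [
  { en: 'Introduction', zh: '介绍', ru: 'Введение' },
  { en: 'Installation', zh: '安装', ru: 'Установка' },
]

// Look up a predefined translation for a heading.
// undefined means the title is unmapped and is left to the model.
function lookupTitle(title: string, source: Lang, target: Lang): string | undefined {
  const wanted = title.trim().toLowerCase()
  const entry = TITLE_TRANSLATIONS.find(item => item[source].toLowerCase() === wanted)
  return entry?.[target]
}

// Build the optional prompt hint; empty when there is no mapping.
function buildTitleHint(title: string, source: Lang, target: Lang): string {
  const translated = lookupTitle(title, source, target)
  return translated === undefined
    ? ''
    : `Translate the first-level heading "${title}" as "${translated}".`
}
```

Mapped titles get a fixed rendering across documents, while unmapped ones produce an empty hint and fall back to ordinary model translation, which matches the fallback behavior stated in the commit description.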
294-294: Model update to gpt-4.1-mini is correct.
Based on the retrieved learnings, gpt-4.1-mini is a valid and recently released OpenAI model that offers improved performance and cost-effectiveness compared to previous models.
396-517: Well-implemented copy-only file handling.
The copy-only logic correctly:
- Identifies files matching copy-only directory patterns
- Updates frontmatter with sourceSHA but skips translation
- Maintains proper file structure and logging
This provides the selective copy functionality described in the PR objectives.
🧰 Tools
🪛 Biome (1.9.4)
[error] 501-501: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
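The copy-only check summarized in this comment can be pictured as a simple path-prefix test. In the sketch below, the COPY_ONLY_DIRECTORIES name and the apis default come from the PR description; `isCopyOnly` and the exact matching rule (paths relative to the language root, matched by directory prefix) are assumptions.

```typescript
// Default taken from the commit description: everything under apis/ is copy-only.
const COPY_ONLY_DIRECTORIES: readonly string[] = ['apis']

// Hypothetical helper: `relativePath` is assumed to be relative to the
// language root, e.g. "apis/advanced-apis/log/search.mdx".
function isCopyOnly(
  relativePath: string,
  directories: readonly string[] = COPY_ONLY_DIRECTORIES,
): boolean {
  // Normalize Windows separators and a leading "./".
  const normalized = relativePath.replace(/\\/g, '/').replace(/^\.\//, '')
  // Require a full path segment match so "apis-guide/" is not treated as "apis/".
  return directories.some(dir => normalized === dir || normalized.startsWith(`${dir}/`))
}
```

Files for which the check returns true would be copied with updated frontmatter and skipped by the model call; everything else goes through translation.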
518-597: Comprehensive translation logic enhancement.
The enhanced translation workflow effectively:
- Handles anchor placeholders to preserve link integrity
- Applies title translations from the mapping table
- Properly manages frontmatter updates including removing i18n fields
- Maintains consistent logging and error handling
This addresses the core objectives of improving translation accuracy and stability.
🧰 Tools
🪛 Biome (1.9.4)
[error] 555-555: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
[error] 582-582: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
* Overhaul translation system for improved accuracy and stability
  This commit improves the documentation translation system by simplifying the translation process, reducing context overhead, and making translation results more stable.
  System Prompt Changes:
  * Convert system prompt from Chinese to English for better AI comprehension
  * Replace 4-step translation process with single-pass approach to improve output stability and reduce context
  * Remove target content comparison logic to simplify process and reduce translation context
  * Restructure prompt with clear baseline requirements and optional additional requirements section
  Smart Terms Filtering:
  * Only include terms that actually appear in source content
  * Support multi-language terms mapping (English, Chinese, Russian)
  Anchor Link Protection:
  * Introduce anchor preprocessing mechanism to prevent translation corruption of markdown anchor links
  * Replace anchors with numbered placeholders (__ANCHOR_N__) before translation
  * Restore original anchors after translation completion
  * Handle escaped underscores in anchor IDs to maintain compatibility with MDX processor
  Title Translation:
  * Replace i18n.title-based translation with built-in title translation mapping table
  * Implement automatic title correction through prompt enhancement when titles exist in mapping table
  * Extract first-level headings from content and apply predefined translations when available
  * Fallback to AI translation for unmapped titles while preserving consistency for common terms
  Frontmatter Cleanup:
  * Remove i18n fields from translated documents
  Content Processing:
  * Add support for copy-only files configuration, with all files under apis/ directory set to copy-only by default
  Technical Changes:
  * Upgrade model from gpt-4o-mini to gpt-4.1-mini
  * Update language constants from Chinese to English descriptions
  * Add title translation mapping table for common sections
* fix typecov and improve title translation map.
* fix issues found by coderabbit ai.
* fix issues found be coderabbit ai.
* revert to delete frontmatter fields method.
Summary by CodeRabbit