feat(parser): add built-in parser engine with deployment-level fallback and allow list #1474

Open
nixidexiangjiao wants to merge 1 commit into Tencent:main from nixidexiangjiao:feat/built-in-parser

@nixidexiangjiao

Description

Add a declarative deployment-level fallback mechanism for parser engine configuration via config/builtin_parser_engine.yaml. When a tenant's parser engine config is absent, the system automatically falls back to the built-in config.
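
A minimal sketch of the fallback for one block, assuming a simplified MinerU config. The `MinerUConfig` type, its `Model` field, and the `resolveMinerU` name are hypothetical and only illustrate the idea; they are not the PR's actual `ResolveXxxOverrides` code.

```go
package main

import "fmt"

// MinerUConfig is a hypothetical stand-in for the MinerU block.
// The real field set in the PR may differ.
type MinerUConfig struct {
	Endpoint string
	Model    string
}

// resolveMinerU picks exactly one block: the tenant's if it sets an
// endpoint, otherwise the built-in one if that sets an endpoint.
// Fields are never mixed between the two sources.
func resolveMinerU(tenant, builtin *MinerUConfig) MinerUConfig {
	if tenant != nil && tenant.Endpoint != "" {
		return *tenant
	}
	if builtin != nil && builtin.Endpoint != "" {
		return *builtin
	}
	return MinerUConfig{} // neither side is configured
}

func main() {
	builtin := &MinerUConfig{Endpoint: "http://mineru.internal:8000", Model: "pipeline"}

	// Tenant sets an endpoint: the whole tenant block wins, Model stays empty.
	fmt.Printf("%+v\n", resolveMinerU(&MinerUConfig{Endpoint: "http://tenant-mineru:9000"}, builtin))

	// Tenant has no endpoint: the whole built-in block is used.
	fmt.Printf("%+v\n", resolveMinerU(&MinerUConfig{Model: "vlm"}, builtin))
}
```

Returning a whole block keeps related fields (endpoint, credentials, options) from the same source, so a tenant endpoint is never paired with built-in settings by accident.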

Additionally, introduce PARSER_ENGINE_ALLOW_LIST — a deployment-level env var that lets operators restrict which parser engines are visible and selectable, preventing users from configuring disallowed engines.
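
For illustration, the allow-list parsing could look roughly like the sketch below. The `knownEngines` registry, the engine names in it, the accepted separators, and the `parseAllowList` / `isAllowed` helpers are assumptions for this example, not the PR's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// knownEngines is a hypothetical registry; the real engine names may differ.
var knownEngines = map[string]bool{
	"builtin":      true,
	"mineru":       true,
	"mineru_cloud": true,
	"weknoracloud": true,
}

// parseAllowList turns the raw env value into a set of engine names.
// A nil result means "no restriction", which keeps an unset or empty
// variable backwards compatible.
func parseAllowList(raw string) map[string]bool {
	fields := strings.FieldsFunc(raw, func(r rune) bool {
		return r == ',' || r == ';' || r == ' ' || r == '\t' || r == '\n'
	})
	if len(fields) == 0 {
		return nil
	}
	allowed := make(map[string]bool)
	for _, f := range fields {
		name := strings.ToLower(f)
		if knownEngines[name] { // unknown names are dropped
			allowed[name] = true
		}
	}
	return allowed
}

// isAllowed reports whether engine may be shown and configured.
func isAllowed(allowed map[string]bool, engine string) bool {
	if allowed == nil {
		return true
	}
	return allowed[strings.ToLower(engine)]
}

func main() {
	// In a real deployment the value would come from
	// os.Getenv("PARSER_ENGINE_ALLOW_LIST").
	allowed := parseAllowList("MinerU; weknoracloud,bogus")
	for _, e := range []string{"mineru", "mineru_cloud", "weknoracloud"} {
		fmt.Println(e, isAllowed(allowed, e))
	}
}
```

One design choice in this sketch: a non-empty value that contains only unknown names yields an empty set, so nothing is allowed. Whether the PR treats that case the same way is not stated in the description.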

Key changes:

  • YAML loader (internal/types/builtin_parser_engine_config.go): Loads config/builtin_parser_engine.yaml at startup with ${ENV} and ${ENV:-default} placeholder interpolation, stored in an atomic.Pointer singleton. Supports default_engine, docreader_addr, mineru, mineru_cloud, and weknoracloud blocks.
  • Resolver layer (internal/types/parser_engine_resolve.go): ResolveXxxOverrides functions implement tenant → builtin fallback at whole-block granularity: if the tenant has a MinerU endpoint set, the entire tenant MinerU block wins; otherwise the entire builtin MinerU block is used, with no field-level mixing. MergeParserEngineOverrides is the one-stop API for handler/service code.
  • Allow list (internal/handler/parser_engine_allowlist.go): PARSER_ENGINE_ALLOW_LIST env var controls which engines are permitted. Empty env → all engines allowed (backwards compatible). Disallowed engines are marked Allowed=false in API responses and cannot be configured or tested via the UI.
  • Handler refactor (internal/handler/system.go): Replace scattered tenant config extraction with types.MergeParserEngineOverrides(tenant). Apply allow list to all ListParserEngines / CheckParserEngines / ReconnectDocReader responses. Add allowed_providers and default_engine fields to responses.
  • Service refactor (internal/application/service/knowledge.go, internal/handler/session/attachment_processor.go): Use types.MergeParserEngineOverrides instead of inline tenant config extraction.
  • DocReader address resolution (internal/types/parser_engine_resolve.go): ResolveDocReaderAddr adds a builtin.DocReaderAddr fallback when DOCREADER_ADDR env var is unset.
  • Frontend — allow list UI: ParserEngineSettings.vue and KBParserSettings.vue render Allowed=false engines with a distinct "Not allowed" badge and reason hint, disable save/test buttons for disallowed engines, and prevent the "go to settings" link from appearing for allow-list-blocked engines. default_engine from builtin YAML pre-selects the default engine per file-type group when available and allowed.
  • i18n: Added notAllowed / notAllowedHint strings for zh-CN, en-US, ko-KR, ru-RU.
  • API type (internal/types/docparser.go): Added Allowed bool field to ParserEngineInfo.
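
As a rough sketch of the `${ENV}` and `${ENV:-default}` interpolation the YAML loader performs before parsing, assuming shell-like semantics where the default applies when the variable is unset or empty. The `interpolate` helper, the regular expression, and the env var names are hypothetical and not taken from the PR.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// placeholder matches ${NAME} and ${NAME:-default}.
var placeholder = regexp.MustCompile(`\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}`)

// interpolate replaces every placeholder in s. lookup has the same
// shape as os.LookupEnv so tests can inject a fake environment.
func interpolate(s string, lookup func(string) (string, bool)) string {
	return placeholder.ReplaceAllStringFunc(s, func(m string) string {
		sub := placeholder.FindStringSubmatch(m)
		if v, ok := lookup(sub[1]); ok && v != "" {
			return v
		}
		// Unset or empty: use the default, which is "" when none was given.
		return sub[2]
	})
}

func main() {
	// Hypothetical variable names and YAML keys, for illustration only.
	os.Setenv("MINERU_ENDPOINT", "http://mineru:8000")
	raw := "endpoint: ${MINERU_ENDPOINT}\nenabled: ${MINERU_ENABLED:-true}\n"
	fmt.Print(interpolate(raw, os.LookupEnv))
}
```

Running the substitution on the raw file text before YAML decoding means string and bool defaults are handled the same way, since the decoder only ever sees the final literal.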

Type of Change

  • 🐛 Bug fix
  • ✨ New feature
  • 💥 Breaking change
  • 📚 Documentation update
  • 🎨 Refactor
  • ⚡ Performance improvement
  • 🧪 Test
  • 🔧 Configuration / Build / CI

Related Issue

Fixes #

Testing

All new test files pass with go test ./internal/types/ ./internal/handler/ ./internal/application/service/knowledge/ ./internal/handler/session/:

  • internal/types/builtin_parser_engine_config_test.go — File missing, invalid YAML, env interpolation (string + bool defaults), empty config, env override path, atomic replace
  • internal/types/parser_engine_resolve_test.go — Full matrix for ResolveMinerUOverrides / ResolveMinerUCloudOverrides (tenant-wins, builtin-fallback, both-empty, builtin-endpoint-empty-skip), ResolveWeKnoraCloudAppID (creds-win, builtin-fallback, neither-set), ResolveDocReaderAddr (env-wins, builtin-fallback, empty), MergeParserEngineOverrides integration
  • internal/handler/parser_engine_allowlist_test.go — Empty env → all allowed, comma list, mixed separators + case insensitivity, unknown names dropped, isParserEngineAllowed, firstAllowedParserEngine, allowedParserEnginesSorted, resolveDefaultParserEngine (no builtin, set+allowed, blocked by allow list, unknown name)
  • internal/handler/system_parser_engines_test.go — ListParserEngines with builtin MinerU endpoint enabling engine visibility, allow list blocking engine, default all-allowed behavior

Checklist

  • make fmt && make lint && make test pass locally
  • Self-reviewed the code
  • Added/updated tests covering the change
  • Updated related documentation (README, docs/, Swagger annotations, etc.)
  • Breaking changes are clearly called out in the description above

Screenshots / Recordings

UI changes: "Not allowed" badge on disallowed parser engines in both KB parser settings and system parser settings; disabled save/test buttons; "default engine" pre-selection respects builtin YAML + allow list.
