feat(parser): add built-in parser engine with deployment-level fallback and allow list #1474
Open
nixidexiangjiao wants to merge 1 commit into
Description
Add a declarative, deployment-level fallback mechanism for parser engine configuration via `config/builtin_parser_engine.yaml`. When a tenant's parser engine config is absent, the system automatically falls back to the built-in config.
Additionally, introduce `PARSER_ENGINE_ALLOW_LIST`, a deployment-level env var that lets operators restrict which parser engines are visible and selectable, preventing users from configuring disallowed engines.
Key changes:
- `internal/types/builtin_parser_engine_config.go`: Loads `config/builtin_parser_engine.yaml` at startup with `${ENV}` and `${ENV:-default}` placeholder interpolation, stored in an `atomic.Pointer` singleton. Supports `default_engine`, `docreader_addr`, `mineru`, `mineru_cloud`, and `weknoracloud` blocks.
- `internal/types/parser_engine_resolve.go`: `ResolveXxxOverrides` functions implement tenant → builtin fallback with package-level granularity (if the tenant has a MinerU endpoint set, the entire tenant MinerU block wins; otherwise the entire builtin MinerU block is used, with no field-level mixing). `MergeParserEngineOverrides` is the one-stop API for handler/service code.
- `internal/handler/parser_engine_allowlist.go`: The `PARSER_ENGINE_ALLOW_LIST` env var controls which engines are permitted. Empty env → all engines allowed (backwards compatible). Disallowed engines are marked `Allowed=false` in API responses and cannot be configured or tested via the UI.
- `internal/handler/system.go`: Replace scattered tenant config extraction with `types.MergeParserEngineOverrides(tenant)`. Apply the allow list to all `ListParserEngines` / `CheckParserEngines` / `ReconnectDocReader` responses. Add `allowed_providers` and `default_engine` fields to responses.
- `internal/application/service/knowledge.go`, `internal/handler/session/attachment_processor.go`: Use `types.MergeParserEngineOverrides` instead of inline tenant config extraction.
- `internal/types/parser_engine_resolve.go`: `ResolveDocReaderAddr` adds a `builtin.DocReaderAddr` fallback when the `DOCREADER_ADDR` env var is unset.
- `ParserEngineSettings.vue` and `KBParserSettings.vue` render `Allowed=false` engines with a distinct "Not allowed" badge and reason hint, disable save/test buttons for disallowed engines, and prevent the "go to settings" link from appearing for allow-list-blocked engines.
- `default_engine` from builtin YAML pre-selects the default engine per file-type group when available and allowed.
- `notAllowed` / `notAllowedHint` strings for zh-CN, en-US, ko-KR, ru-RU.
- `internal/types/docparser.go`: Added `Allowed bool` field to `ParserEngineInfo`.
Type of Change
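To illustrate the tenant → builtin fallback described under Key changes, here is a minimal Go sketch of whole-block resolution. It is not the PR's code: `MinerUBlock`, its fields, and `resolveMinerU` are hypothetical names chosen for the example, and the assumption that an empty builtin endpoint means "no builtin block" is inferred from the `builtin-endpoint-empty-skip` test case name.

```go
package main

import "fmt"

// MinerUBlock is a hypothetical stand-in for a MinerU settings block.
// The real struct and field names in the PR may differ.
type MinerUBlock struct {
	Endpoint string
	Model    string
}

// resolveMinerU sketches block-level fallback: the tenant block wins as a
// whole when its endpoint is set; otherwise the builtin block is used as a
// whole. Fields from the two blocks are never mixed.
// The bool result reports whether any usable block was found.
func resolveMinerU(tenant, builtin MinerUBlock) (MinerUBlock, bool) {
	if tenant.Endpoint != "" {
		return tenant, true
	}
	if builtin.Endpoint != "" {
		return builtin, true
	}
	return MinerUBlock{}, false
}

func main() {
	builtin := MinerUBlock{Endpoint: "http://mineru.internal:8000", Model: "builtin-model"}

	// The tenant sets only Endpoint. Its Model stays empty instead of
	// inheriting "builtin-model", because blocks are not merged per field.
	tenant := MinerUBlock{Endpoint: "http://tenant-mineru:9000"}
	got, ok := resolveMinerU(tenant, builtin)
	fmt.Printf("%+v %v\n", got, ok)

	// The tenant has no endpoint, so the whole builtin block is used and
	// the tenant's stray Model value is ignored.
	got, ok = resolveMinerU(MinerUBlock{Model: "ignored"}, builtin)
	fmt.Printf("%+v %v\n", got, ok)
}
```

Keying the decision on a single field (the endpoint) keeps the rule easy to reason about: an operator never ends up with a tenant endpoint paired with builtin credentials or options.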
Related Issue
Fixes #
Testing
All new test files pass with `go test ./internal/types/ ./internal/handler/ ./internal/application/service/knowledge/ ./internal/handler/session/`:
- `internal/types/builtin_parser_engine_config_test.go`: file missing, invalid YAML, env interpolation (string + bool defaults), empty config, env override path, atomic replace
- `internal/types/parser_engine_resolve_test.go`: full matrix for `ResolveMinerUOverrides` / `ResolveMinerUCloudOverrides` (tenant-wins, builtin-fallback, both-empty, builtin-endpoint-empty-skip), `ResolveWeKnoraCloudAppID` (creds-win, builtin-fallback, neither-set), `ResolveDocReaderAddr` (env-wins, builtin-fallback, empty), `MergeParserEngineOverrides` integration
- `internal/handler/parser_engine_allowlist_test.go`: empty env → all allowed, comma list, mixed separators + case insensitivity, unknown names dropped, `isParserEngineAllowed`, `firstAllowedParserEngine`, `allowedParserEnginesSorted`, `resolveDefaultParserEngine` (no builtin, set+allowed, blocked by allow list, unknown name)
- `internal/handler/system_parser_engines_test.go`: `ListParserEngines` with builtin MinerU endpoint enabling engine visibility, allow list blocking engine, default all-allowed behavior
Checklist
- `make fmt && make lint && make test` pass locally
- (`docs/`, Swagger annotations, etc.)
Screenshots / Recordings
UI changes: "Not allowed" badge on disallowed parser engines in both KB parser settings and system parser settings; disabled save/test buttons; "default engine" pre-selection respects builtin YAML + allow list.
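The way the default-engine pre-selection respects both the builtin YAML and the allow list can be sketched in Go as follows. This is an illustrative sketch under stated assumptions, not the PR's implementation: the function names `parseAllowList`, `isAllowed`, and `resolveDefault`, the `knownEngines` registry, the separator set (comma, semicolon, whitespace), and the choice to return an empty string when the default is blocked are all hypothetical. In a real deployment the raw value would presumably come from the `PARSER_ENGINE_ALLOW_LIST` env var.

```go
package main

import (
	"fmt"
	"strings"
)

// knownEngines is a hypothetical registry; the real engine names may differ.
var knownEngines = map[string]bool{"builtin": true, "mineru": true, "mineru_cloud": true}

// parseAllowList turns a raw allow-list value into a set of engine names.
// A nil result means "no restriction", so an empty value keeps every engine
// allowed. Names are lowercased and unknown names are dropped.
// Assumption: a non-empty value with only unknown names allows nothing.
func parseAllowList(raw string) map[string]bool {
	fields := strings.FieldsFunc(raw, func(r rune) bool {
		return r == ',' || r == ';' || r == ' ' || r == '\t'
	})
	if len(fields) == 0 {
		return nil
	}
	allowed := make(map[string]bool)
	for _, f := range fields {
		name := strings.ToLower(f)
		if knownEngines[name] {
			allowed[name] = true
		}
	}
	return allowed
}

// isAllowed reports whether an engine passes the allow list.
func isAllowed(allowed map[string]bool, name string) bool {
	if allowed == nil {
		return true
	}
	return allowed[strings.ToLower(name)]
}

// resolveDefault returns the builtin default engine only when it is a known
// engine and passes the allow list; otherwise it returns "" (no pre-selection).
func resolveDefault(allowed map[string]bool, builtinDefault string) string {
	name := strings.ToLower(builtinDefault)
	if name == "" || !knownEngines[name] || !isAllowed(allowed, name) {
		return ""
	}
	return name
}

func main() {
	allowed := parseAllowList("MinerU; builtin,unknown")
	fmt.Println(isAllowed(allowed, "mineru"))           // listed engine
	fmt.Println(isAllowed(allowed, "mineru_cloud"))     // not listed
	fmt.Println(resolveDefault(allowed, "mineru"))      // default set and allowed
	fmt.Println(resolveDefault(allowed, "mineru_cloud")) // default blocked by allow list
	fmt.Println(resolveDefault(nil, "mineru_cloud"))    // no allow list configured
}
```

Representing "no restriction" as a nil set rather than a full set keeps the backwards-compatible case explicit: code that never sets the env var takes the same path as before.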