feat(search): enable V3 entity index as dual search target alongside V2 #17013
Open
loustler wants to merge 1 commit into datahub-project:master
Linear: PFP-3345. Thanks for your contribution! We have created an internal ticket to track this PR. A member of the core DataHub team will be assigned to review it within the next few business days; you will get a follow-up comment once a reviewer is assigned.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Bundle Report: Bundle size has no change ✅
1947dd4 to 421718f
421718f to ffabf60
Add V3 shared entity index patterns to all five search query paths (search, filter, autocomplete, aggregate, scroll) so V3-indexed data is discoverable. Apply _entityType filter to restrict V3 results to requested entity types. Add .keyword subfield to V3 keyword mappings for aggregation compatibility. Support filesystem-first config loading for external analyzer configurations (e.g., Kubernetes ConfigMap mounts).
ffabf60 to 1781124
Summary
DataHub's V3 entity index is currently write-only — data is indexed into V3 during ingestion and metadata changes, but search queries only target V2 per-entity indices. This means V3-only features (unified scoring, tiered search fields, custom analyzers) are not available to end users.
This PR enables the V3 shared entity index as a search target alongside existing V2 indices, making V3 data discoverable while maintaining full backward compatibility with V2. Every search query now targets both `<entity>index_v2` and `*index_v3` patterns simultaneously.
Changes
Existing Behavior — Modified
Search queries previously targeted only V2 per-entity indices (e.g., `datasetindex_v2`, `chartindex_v2`).
→ Now target both V2 and V3 indices via `Stream.concat(v2Patterns, indexConvention.getV3EntityIndexPatterns().stream())`. This applies to all five search paths:
- `buildSearchRequest()`
- `filter()`
- `buildAutocompleteRequest()`
- `buildAggregateByValue()`
- `buildScrollRequest()`
New Behavior — Added
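For reviewers, the index-pattern merge described under Existing Behavior above can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: `DualIndexTargets` and `resolve` are hypothetical names, and both pattern collections are passed in as plain lists rather than read from `IndexConvention`.

```java
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical helper (not DataHub code): builds the index targets for one search request. */
final class DualIndexTargets {
    private DualIndexTargets() {}

    /**
     * Concatenates V2 per-entity patterns with V3 shared-index patterns.
     * An empty v3Patterns list leaves the V2 targets unchanged.
     */
    static String[] resolve(List<String> v2Patterns, List<String> v3Patterns) {
        return Stream.concat(v2Patterns.stream(), v3Patterns.stream())
            .distinct() // drop accidental duplicates while keeping encounter order
            .toArray(String[]::new);
    }
}
```

With `resolve(List.of("datasetindex_v2", "chartindex_v2"), List.of("*index_v3"))` the request would target the two V2 indices followed by the V3 pattern.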
Entity type filter (`applyEntityTypeFilter()`): Since V3 is a shared multi-entity index, queries must be wrapped with an entity type filter to prevent cross-entity contamination. The filter uses a `bool` query with two `should` branches:
- `must_not exists _entityType`: lets V2 documents (which lack this field) pass through
- `terms _entityType [entityNames]`: restricts V3 documents to requested types
Entity names must use camelCase from `EntitySpec.getName()` (e.g., `glossaryNode`, `corpUser`) to match V3's stored `_entityType` values. Lowercase names produce 0 matches on V3.
V3 keyword subfield (`MultiEntityMappingsBuilder`): Added a `.keyword` subfield for keyword-type fields in V3 mappings so aggregation queries like `owners.keyword` work consistently across V2 and V3 indices during the transition period.
Filesystem config loading (`BaseConfigurationLoader`): Added a filesystem path lookup that runs before the classpath resource lookup. This enables loading external analyzer configurations (e.g., nori, kuromoji) from Kubernetes ConfigMap mounts at paths like `/etc/datahub/analyzer-config.yaml` without requiring them to be on the Java classpath.
Key Files Modified
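To make the entity type filter from the New Behavior section easier to review, here is a rough sketch of the query shape it describes, built from plain maps that mirror the JSON query DSL. It is illustrative only: `EntityTypeFilterSketch`, `entityTypeFilter`, and `wrap` are hypothetical names, the explicit `minimum_should_match` is an assumption, and attaching the filter through an outer `bool` with a `filter` clause is one plausible way to wrap a query, not necessarily what `applyEntityTypeFilter()` does.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not DataHub code) of the two-branch entity type filter. */
final class EntityTypeFilterSketch {
    static final String ENTITY_TYPE_FIELD = "_entityType";

    private EntityTypeFilterSketch() {}

    /** Matches V2 documents (no _entityType field) or V3 documents of the requested types. */
    static Map<String, Object> entityTypeFilter(List<String> entityNames) {
        // Branch 1: documents without the field, i.e. V2 per-entity index documents.
        Map<String, Object> v2PassThrough = Map.of(
            "bool", Map.of(
                "must_not", List.of(Map.of("exists", Map.of("field", ENTITY_TYPE_FIELD)))));
        // Branch 2: V3 documents whose type is one of the requested camelCase names.
        Map<String, Object> v3Restrict = Map.of(
            "terms", Map.of(ENTITY_TYPE_FIELD, List.copyOf(entityNames)));
        return Map.of(
            "bool", Map.of(
                "should", List.of(v2PassThrough, v3Restrict),
                "minimum_should_match", 1)); // assumption: require at least one branch
    }

    /** Assumed wrapping: keep the original query and add the filter in filter context. */
    static Map<String, Object> wrap(Map<String, Object> query, List<String> entityNames) {
        return Map.of(
            "bool", Map.of(
                "must", List.of(query),
                "filter", List.of(entityTypeFilter(entityNames))));
    }
}
```

Passing lowercase names such as `glossarynode` to the `terms` branch would not match the camelCase values stored in V3, which is the failure mode called out above.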
- `metadata-io/.../ESSearchDAO.java`: `applyEntityTypeFilter()`
- `metadata-io/.../MultiEntityMappingsBuilder.java`: `.keyword` subfield for V3 keyword fields
- `metadata-io/.../BaseConfigurationLoader.java`
- `metadata-io/.../BaseConfigurationLoaderTest.java`
Configuration
No new configuration required. V3 index patterns are derived from the existing `IndexConvention.getV3EntityIndexPatterns()`, which is controlled by the `ELASTICSEARCH_ENTITY_INDEX_V3_ENABLED` environment variable. If V3 is not enabled, the V3 pattern matches no indices and queries behave identically to before.
Migration Notes
Checklist
- `BaseConfigurationLoaderTest` for filesystem path loading
🤖 Generated with Claude Code
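As a reference for the filesystem path loading covered by the checklist item above, the filesystem-first lookup could look something like the sketch below. It assumes a single location string is tried first as a file path and then as a classpath resource. `FilesystemFirstLoader` and `open` are hypothetical names and do not reflect the actual `BaseConfigurationLoader` API.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch (not DataHub code) of filesystem-first configuration loading. */
final class FilesystemFirstLoader {
    private FilesystemFirstLoader() {}

    /** Opens the location from disk if a readable file exists there, else from the classpath. */
    static InputStream open(String location) throws IOException {
        Path path = Paths.get(location);
        if (Files.isRegularFile(path) && Files.isReadable(path)) {
            // e.g. a Kubernetes ConfigMap mounted at /etc/datahub/analyzer-config.yaml
            return Files.newInputStream(path);
        }
        // ClassLoader resource names carry no leading slash.
        String resource = location.startsWith("/") ? location.substring(1) : location;
        InputStream in = FilesystemFirstLoader.class.getClassLoader().getResourceAsStream(resource);
        if (in == null) {
            throw new FileNotFoundException(
                "Not found on filesystem or classpath: " + location);
        }
        return in;
    }
}
```

Checking the filesystem first means a mounted file can override a bundled default of the same name without rebuilding the image.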