fix(openlineage): accept custom producer URIs instead of returning 500#17015
Open
psaikaushik wants to merge 4 commits intodatahub-project:masterfrom
Open
fix(openlineage): accept custom producer URIs instead of returning 500#17015psaikaushik wants to merge 4 commits intodatahub-project:masterfrom
psaikaushik wants to merge 4 commits intodatahub-project:masterfrom
Conversation
Add tests covering: - Custom producer URIs that don't match OpenLineage/Airflow/Trino - Known producers still work as before (OpenLineage, Airflow, Trino) - Edge cases (trailing slash, simple path) - processingEngine override behavior Ref datahub-project#16961
The getOrchestrator method used a hardcoded regex that only matched OpenLineage, Airflow, and Trino producer URIs. Any other producer value caused a RuntimeException (HTTP 500), violating the OpenLineage spec which allows any URI as the producer field. Added extractOrchestratorFromProducerUri() as a fallback that extracts the last path segment from unknown producer URIs to use as the orchestrator name. This preserves existing behavior for known producers while gracefully handling custom ones. Closes datahub-project#16961
Contributor
|
Linear: CAT-1795 Thanks for your contribution! We have created an internal ticket to track this PR. A member of the core DataHub team will be assigned to review it within the next few business days - you will get a follow-up comment once a reviewer is assigned. |
DatahubOpenlineageConfig uses Lombok @builder and has no public constructor. Changed from new DatahubOpenlineageConfig() to DatahubOpenlineageConfig.builder().build().
Bundle ReportChanges will increase total bundle size by 71.83kB (0.32%) ⬆️. This is within the configured threshold ✅ Detailed changes
Affected Assets, Files, and Routes:view changes for bundle: datahub-react-web-esmAssets Changed:
|
- Replace `var` with explicit DataFlowUrn type (Java 11 compat) - Move test to correct package (io.datahubproject.openlineage) matching existing test files like OpenLineageConfigTest - Use proper imports (URI, DataFlowUrn, FabricType) - Add FabricType.PROD to builder to match existing test patterns
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Author
|
Looks like the failed cypress batch tests are already flaky as per #16839 thanks! this PR should not be triggering Cypress tests. thanks! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The OpenLineage converter rejects events with custom producer URIs that don't match the hardcoded OpenLineage, Airflow, or Trino patterns, returning a 500 error. According to the OpenLineage spec, the producer field is a URI that can be any value.
Closes #16961
What Changed
OpenLineageToDataHub.javaextractOrchestratorFromProducerUri()helper that extracts the last non-empty path segment from a producer URI to use as the orchestrator namegetOrchestrator(), added anelsebranch for unknown producers that calls the new helper instead of falling through to theRuntimeExceptionOpenLineageOrchestratorTest.java(new)Before vs After
Before:
After: