
fix(openlineage): accept custom producer URIs instead of returning 500 #17015

Open
psaikaushik wants to merge 4 commits into datahub-project:master from psaikaushik:fix/16961-custom-openlineage-producer


@psaikaushik

Summary

The OpenLineage converter rejects events whose producer URI doesn't match the hardcoded OpenLineage, Airflow, or Trino patterns and returns a 500 error. The OpenLineage spec allows the producer field to be any URI, so these events should be accepted.

Closes #16961

What Changed

OpenLineageToDataHub.java

  • Added extractOrchestratorFromProducerUri() helper that extracts the last non-empty path segment from a producer URI to use as the orchestrator name
  • In getOrchestrator(), added an else branch for unknown producers that calls the new helper instead of falling through to the RuntimeException
  • Existing behavior for OpenLineage, Airflow, and Trino producers is unchanged
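A minimal Java sketch of the resolution order described above, for illustration only. The class name, the getOrchestrator(processingEngine, producer) signature, and KNOWN_PRODUCER (a simplified stand-in for the hardcoded OpenLineage/Airflow/Trino regex) are hypothetical; the actual code in OpenLineageToDataHub.java may differ, for example in how known producers are named or what happens when a URI has no path. It shows processingEngine taking precedence, known producers keeping their handling, and unknown producers falling back to the last non-empty path segment instead of throwing.

```java
import java.net.URI;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the orchestrator resolution order described in this PR.
 * KNOWN_PRODUCER and the method bodies are assumptions, not the code in OpenLineageToDataHub.java.
 */
public class OrchestratorSketch {

  // Assumption: illustrative stand-in for the hardcoded OpenLineage/Airflow/Trino pattern.
  private static final Pattern KNOWN_PRODUCER =
      Pattern.compile("/(openlineage|airflow|trino)(/|$)", Pattern.CASE_INSENSITIVE);

  static String getOrchestrator(String processingEngine, URI producer) {
    // An explicit processingEngine takes precedence over anything derived from the producer.
    if (processingEngine != null && !processingEngine.isEmpty()) {
      return processingEngine;
    }
    // Known producers keep their existing handling.
    Matcher known = KNOWN_PRODUCER.matcher(producer.toString());
    if (known.find()) {
      return known.group(1).toLowerCase(Locale.ROOT);
    }
    // New else branch: previously unknown producers hit a RuntimeException (HTTP 500).
    String fallback = extractOrchestratorFromProducerUri(producer);
    if (fallback == null) {
      // Assumption: the PR does not say what happens when the URI has no usable path segment.
      throw new IllegalArgumentException("Unable to determine orchestrator from " + producer);
    }
    return fallback;
  }

  /** Returns the last non-empty path segment of the URI, or null if there is none. */
  static String extractOrchestratorFromProducerUri(URI producer) {
    String path = producer.getPath(); // null for opaque URIs such as "urn:x:y"
    if (path == null) {
      return null;
    }
    String[] segments = path.split("/");
    // Walk backwards so a trailing slash ("/foo/bar/") still yields "bar".
    for (int i = segments.length - 1; i >= 0; i--) {
      if (!segments[i].isEmpty()) {
        return segments[i];
      }
    }
    return null;
  }

  public static void main(String[] args) {
    URI custom = URI.create("https://github.com/myorg/myproducer/blob/v1/client");
    System.out.println(getOrchestrator(null, custom)); // client
    System.out.println(getOrchestrator("spark", custom)); // spark
    System.out.println(getOrchestrator(null, URI.create("https://github.com/apache/airflow"))); // airflow
  }
}
```

Walking the segments from the end, rather than taking the text after the last "/", is what keeps the trailing-slash case from producing an empty orchestrator name.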

OpenLineageOrchestratorTest.java (new)

  • 7 tests covering:
    • Custom producer URIs (the bug scenario from the issue)
    • Simple paths, trailing slashes
    • Known producers still work (OpenLineage, Airflow, Trino)
    • processingEngine takes precedence over producer

Before vs After

Before:

POST /openapi/openlineage/api/v1/lineage
{"producer": "https://github.com/myorg/myproducer/blob/v1/client", ...}

=> 500 Internal Server Error (RuntimeException: Unable to determine orchestrator)

After:

POST /openapi/openlineage/api/v1/lineage
{"producer": "https://github.com/myorg/myproducer/blob/v1/client", ...}

=> 200 OK (orchestrator = "client", extracted from last path segment)
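To reproduce the before/after from Java, a hypothetical sketch using the JDK 11+ HttpClient might look like the following. It assumes GMS is reachable at http://localhost:8080 with authentication disabled (add an Authorization header otherwise). The run ID, job namespace/name, and event time are made-up placeholders, and DataHub may expect more facets than this minimal RunEvent carries.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/**
 * Hypothetical reproduction of the before/after request.
 * Assumes GMS at http://localhost:8080 with authentication disabled.
 */
public class CustomProducerRepro {

  static final String ENDPOINT = "http://localhost:8080/openapi/openlineage/api/v1/lineage";

  // Minimal OpenLineage RunEvent using the custom producer from the issue (IDs are placeholders).
  static final String EVENT =
      "{"
          + "\"eventType\":\"COMPLETE\","
          + "\"eventTime\":\"2026-04-14T00:00:00Z\","
          + "\"run\":{\"runId\":\"3f5e83fa-3480-44ff-99c5-ff943904e5e8\"},"
          + "\"job\":{\"namespace\":\"my-namespace\",\"name\":\"my-job\"},"
          + "\"producer\":\"https://github.com/myorg/myproducer/blob/v1/client\","
          + "\"schemaURL\":\"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent\""
          + "}";

  static HttpRequest buildRequest() {
    return HttpRequest.newBuilder(URI.create(ENDPOINT))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(EVENT))
        .build();
  }

  public static void main(String[] args) throws Exception {
    HttpRequest request = buildRequest();
    // Only hit the network when explicitly asked to.
    if (args.length == 0 || !args[0].equals("--send")) {
      System.out.println("Dry run: " + request.method() + " " + request.uri());
      return;
    }
    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    // Expected per this PR: 500 before the fix, 200 after.
    System.out.println(response.statusCode() + " " + response.body());
  }
}
```

Run it with --send to actually POST the event; without arguments it only prints the request it would send.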

Add tests covering:
- Custom producer URIs that don't match OpenLineage/Airflow/Trino
- Known producers still work as before (OpenLineage, Airflow, Trino)
- Edge cases (trailing slash, simple path)
- processingEngine override behavior

Ref datahub-project#16961

The getOrchestrator method used a hardcoded regex that only matched
OpenLineage, Airflow, and Trino producer URIs. Any other producer
value caused a RuntimeException (HTTP 500), violating the OpenLineage
spec, which allows any URI as the producer field.

Added extractOrchestratorFromProducerUri() as a fallback that extracts
the last path segment from unknown producer URIs to use as the
orchestrator name. This preserves existing behavior for known producers
while gracefully handling custom ones.

Closes datahub-project#16961
github-actions bot added the ingestion and community-contribution labels on Apr 14, 2026

@github-actions (Contributor)

Linear: CAT-1795

Thanks for your contribution! We have created an internal ticket to track this PR. A member of the core DataHub team will be assigned to review it within the next few business days - you will get a follow-up comment once a reviewer is assigned.

DatahubOpenlineageConfig uses Lombok's @Builder and has no public
constructor, so the test now uses DatahubOpenlineageConfig.builder().build()
instead of new DatahubOpenlineageConfig().

codecov bot commented Apr 14, 2026

Bundle Report

Changes will increase total bundle size by 71.83kB (0.32%) ⬆️. This is within the configured threshold ✅

Detailed changes

| Bundle name | Size | Change |
| --- | --- | --- |
| datahub-react-web-esm | 22.81MB | 71.83kB (0.32%) ⬆️ |

Affected Assets, Files, and Routes:

Changes for bundle: datahub-react-web-esm

Assets Changed:

| Asset Name | Size Change | Total Size | Change (%) |
| --- | --- | --- | --- |
| assets/index-*.js | 469 bytes | 12.49MB | 0.0% |
| assets/aerospike-*.png (New) | 71.36kB | 71.36kB | 100.0% 🚀 |

- Replace `var` with an explicit DataFlowUrn type (compatibility with the module's Java target; `var` requires Java 10+)
- Move test to correct package (io.datahubproject.openlineage)
  matching existing test files like OpenLineageConfigTest
- Use proper imports (URI, DataFlowUrn, FabricType)
- Add FabricType.PROD to builder to match existing test patterns

codecov bot commented Apr 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


psaikaushik (Author) commented Apr 14, 2026

Looks like the failing Cypress batch tests are the known flaky ones tracked in #16839; this PR should not be triggering Cypress tests. Thanks!


Labels: community-contribution, ingestion, needs-review

Development: successfully merging this pull request may close "OpenLineage with custom producer not working".