Fix Airbyte pipeline filtering to use connection names instead of UUIDs #23673

Copilot · 2025-10-01T17:17:40Z

Problem

Airbyte metadata ingestion pipeline filters matched only against internal UUIDs (connectionId), not against human-readable connection names. This made regex-based filtering impractical because:

UUIDs are environment-specific and change on re-deployment
UUIDs cannot be meaningfully matched with regex patterns
Users see connection names in both the Airbyte and OpenMetadata UIs, but had to extract UUIDs from Airbyte connection URLs to filter on them

Example of the Issue

# Users wanted to filter like this:
pipelineFilterPattern:
  includes:
    - "MSSQL.*"      # ❌ Didn't work - entity was named with UUID
    - ".*Postgres"   # ❌ Didn't work - entity was named with UUID

# Only this worked:
pipelineFilterPattern:
  includes:
    - "a10f6d82-4fc6-4c90-ba04-bb773c8fbb0f"  # ✓ Worked but impractical
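The mismatch can be reproduced outside the ingestion framework. The following is a minimal sketch, not the OpenMetadata filter implementation: `is_included` is a hypothetical helper, the connection name is made up for illustration, and matching from the start of the string (Python's `re.match`) is an assumption about how include patterns are applied.

```python
import re


def is_included(entity_name: str, includes: list[str]) -> bool:
    """Return True if any include pattern matches from the start of entity_name.

    Hypothetical helper; assumes prefix-anchored matching via re.match.
    """
    return any(re.match(pattern, entity_name) for pattern in includes)


includes = ["MSSQL.*", ".*Postgres"]

# Hypothetical connection: the UUID is the one from the example above,
# the name is invented for illustration.
connection = {
    "connectionId": "a10f6d82-4fc6-4c90-ba04-bb773c8fbb0f",
    "name": "MSSQL to Postgres",
}

# Entity named after the UUID: no name-based pattern can match it.
print(is_included(connection["connectionId"], includes))  # False

# Entity named after the connection name: the patterns behave as users expect.
print(is_included(connection["name"], includes))  # True
```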

Solution

Changed the pipeline entity creation to use the connection name (what users see in the UI) instead of the connectionId (internal UUID).

Before:

pipeline_request = CreatePipelineRequest(
    name=EntityName(pipeline_details.connection.get("connectionId")),  # UUID
    displayName=pipeline_details.connection.get("name"),  # Human-readable
    ...
)

After:

pipeline_request = CreatePipelineRequest(
    name=EntityName(pipeline_details.connection.get("name")),  # Human-readable
    displayName=pipeline_details.connection.get("name"),  # Human-readable
    ...
)

Benefits

✅ Users can now filter using intuitive patterns:

MSSQL.* - matches all pipelines starting with "MSSQL"
.*Postgres - matches all pipelines ending with "Postgres"
Production.* - matches all production pipelines

✅ No need to extract UUIDs from Airbyte connection URLs

✅ Entity name matches what users see in OpenMetadata UI

✅ Filtering works consistently across environments (names are stable, UUIDs change)

✅ Aligns with how other pipeline sources handle naming (Fivetran, Databricks, Airflow)
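The patterns listed above can be tried against sample names before they go into a workflow config. This is a sketch under assumptions: the connection names are hypothetical, and `matching` uses Python's `re.match`, which anchors at the start of the string only. Whether the ingestion framework anchors patterns the same way is not confirmed by this PR.

```python
import re

# Hypothetical connection names, for illustration only.
names = [
    "MSSQL to Snowflake",
    "MySQL to Postgres",
    "Postgres to S3",
    "Production Orders",
]


def matching(pattern: str) -> list[str]:
    """Return the names that the pattern matches from the start of the string."""
    return [name for name in names if re.match(pattern, name)]


print(matching("MSSQL.*"))       # ['MSSQL to Snowflake']
print(matching("Production.*"))  # ['Production Orders']

# With start-only anchoring, ".*Postgres" matches any name that contains
# "Postgres", not just names that end with it.
print(matching(".*Postgres"))    # ['MySQL to Postgres', 'Postgres to S3']

# Adding "$" restricts the match to names that end with "Postgres".
print(matching(".*Postgres$"))   # ['MySQL to Postgres']
```

If only names ending in a given suffix should be ingested, an explicit `$` makes the intent unambiguous regardless of how the filter anchors its patterns.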

Backward Compatibility

⚠️ Breaking Change: This changes how Airbyte pipeline entities are named in OpenMetadata.

Existing Airbyte pipelines will retain their UUID-based names. After upgrading:

Re-ingest Airbyte connections to create new name-based entities
Update filter patterns from UUID patterns to name-based patterns
Optionally clean up old UUID-named pipeline entities
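Updating filter patterns can be partly scripted when the Airbyte connections are available as dictionaries with `connectionId` and `name` keys, the two fields used in the code above. The sketch below is one possible approach, not part of this PR: `translate_includes` is a hypothetical helper, and the connection name is invented for illustration.

```python
import re


def translate_includes(uuid_patterns: list[str], connections: list[dict]) -> list[str]:
    """Replace literal UUID filter entries with exact-match name patterns.

    Hypothetical helper: entries that do not correspond to a known
    connectionId are kept unchanged so they can be reviewed by hand.
    """
    id_to_name = {c["connectionId"]: c["name"] for c in connections}
    translated = []
    for pattern in uuid_patterns:
        name = id_to_name.get(pattern)
        if name is None:
            translated.append(pattern)
        else:
            # re.escape keeps characters such as "." or "(" in a connection
            # name from being interpreted as regex syntax.
            translated.append(f"^{re.escape(name)}$")
    return translated


# Hypothetical data: one known connection and one unrecognized filter entry.
connections = [
    {
        "connectionId": "a10f6d82-4fc6-4c90-ba04-bb773c8fbb0f",
        "name": "MSSQL to Postgres",
    },
]
old_includes = ["a10f6d82-4fc6-4c90-ba04-bb773c8fbb0f", "some-unknown-id"]

new_includes = translate_includes(old_includes, connections)
for pattern in new_includes:
    print(pattern)
```

The generated patterns match exactly one connection each. They can then be generalized by hand into broader patterns such as `MSSQL.*`.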

Testing

Updated unit tests to reflect new behavior
All existing test scenarios pass with updated expectations
Manual verification confirms regex filtering now works with human-readable names

Fixes #[issue-number]

Original prompt

This section details the original issue you should resolve

<issue_title>[BUG] Airbyte metadata ingestion pipeline - filtering the internal ID instead of pipeline name</issue_title>
<issue_description>Affected module
Airbyte connector or/and Ingestion Framework

Describe the bug
The Pipeline Filter (Include, Exclude) in Airbyte Metadata Ingestion filters on the Airbyte internal IDs, not the pipeline name (connection name in Airbyte terminology).
This is useless: internal IDs cannot be described by regexps, and they change with every environment or redeploy.

To Reproduce

Filtering by pipeline names does not work.
Filtering works only with ID strings like 'd3720495-8668-432d-b5cf-cc350b94af2b', which you can extract from the Airbyte connection URL.

Expected behavior
Filtering should use the Airbyte pipeline name. Metadata Ingestion already retrieves the name, and OpenMetadata uses it as the pipeline name in the UI.

Version:

OS: Linux

Python version:

OpenMetadata version: 1.5.5

OpenMetadata Ingestion package version: docker.getcollate.io/openmetadata/ingestion:1.5.5
</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #18224

Co-authored-by: harshach <38649+harshach@users.noreply.github.com>

Initial plan

66bf3f9

Copilot AI assigned Copilot and harshach Oct 1, 2025

Copilot started work on behalf of harshach October 1, 2025 17:17

Fix Airbyte pipeline filtering to use connection name instead of UUID

98c3ece

Co-authored-by: harshach <38649+harshach@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] [BUG] Airbyte metadata ingestion pipeline - filtering the internal ID instead of pipeline name~~ Fix Airbyte pipeline filtering to use connection names instead of UUIDs Oct 1, 2025

Copilot AI requested a review from harshach October 1, 2025 17:34

Copilot finished work on behalf of harshach October 1, 2025 17:34
