@Copilot Copilot AI commented Oct 1, 2025

Problem

Airbyte metadata ingestion pipeline filters matched only against internal UUIDs (connectionId), not against human-readable connection names. This made regex-based filtering impractical because:

  • UUIDs are environment-specific and change on re-deployment
  • UUIDs cannot be meaningfully matched with regex patterns
  • Users see connection names in both Airbyte and OpenMetadata UI, but had to extract UUIDs from connection URLs to filter them

Example of the Issue

# Users wanted to filter like this:
pipelineFilterPattern:
  includes:
    - "MSSQL.*"      # ❌ Didn't work - entity was named with UUID
    - ".*Postgres"   # ❌ Didn't work - entity was named with UUID

# Only this worked:
pipelineFilterPattern:
  includes:
    - "a10f6d82-4fc6-4c90-ba04-bb773c8fbb0f"  # ✓ Worked but impractical
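The mismatch can be reproduced outside the ingestion framework. The following is a minimal sketch, not the connector's actual filter code: `matches_any` is a hypothetical helper, the connection record is invented for illustration, and matching with `re.match` is an assumption about how the patterns are applied.

```python
import re


def matches_any(patterns, value):
    """Return True if any regex in `patterns` matches `value` from its start."""
    return any(re.match(pattern, value) for pattern in patterns)


# Include patterns a user would naturally write.
includes = ["MSSQL.*", ".*Postgres"]

# Hypothetical connection record using the two fields named in this PR.
connection = {
    "connectionId": "a10f6d82-4fc6-4c90-ba04-bb773c8fbb0f",
    "name": "MSSQL to Snowflake",
}

# Old behavior: the filter is evaluated against the UUID, so nothing matches.
included_by_id = matches_any(includes, connection["connectionId"])

# New behavior: the filter is evaluated against the connection name.
included_by_name = matches_any(includes, connection["name"])

print(included_by_id, included_by_name)
```

Under these assumptions the UUID is rejected by both patterns, while the connection name is accepted by `MSSQL.*`.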

Solution

Changed the pipeline entity creation to use the connection name (what users see in the UI) instead of the connectionId (internal UUID).

Before:

pipeline_request = CreatePipelineRequest(
    name=EntityName(pipeline_details.connection.get("connectionId")),  # UUID
    displayName=pipeline_details.connection.get("name"),  # Human-readable
    ...
)

After:

pipeline_request = CreatePipelineRequest(
    name=EntityName(pipeline_details.connection.get("name")),  # Human-readable
    displayName=pipeline_details.connection.get("name"),  # Human-readable
    ...
)

Benefits

✅ Users can now filter using intuitive patterns:

  • MSSQL.* - matches all pipelines starting with "MSSQL"
  • .*Postgres - matches all pipelines ending with "Postgres"
  • Production.* - matches all production pipelines

✅ No need to extract UUIDs from Airbyte connection URLs

✅ Entity name matches what users see in OpenMetadata UI

✅ Filtering works consistently across environments (names are stable, UUIDs change)

✅ Aligns with how other pipeline sources handle naming (Fivetran, Databricks, Airflow)

Backward Compatibility

⚠️ Breaking Change: This changes how Airbyte pipeline entities are named in OpenMetadata.

Existing Airbyte pipelines will retain their UUID-based names. After upgrading:

  1. Re-ingest Airbyte connections to create new name-based entities
  2. Update filter patterns from UUID patterns to name-based patterns
  3. Optionally clean up old UUID-named pipeline entities
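For step 3, old entities can be recognized because their names parse as UUIDs. The sketch below assumes you have already obtained a list of pipeline entity names by some means; `is_uuid_name` and the sample names are hypothetical and not part of OpenMetadata.

```python
import uuid


def is_uuid_name(name: str) -> bool:
    """Return True if `name` is a canonical hyphenated UUID string."""
    try:
        # uuid.UUID accepts several spellings, so compare against the
        # canonical form to reject anything that is not a plain UUID.
        return str(uuid.UUID(name)) == name.lower()
    except ValueError:
        return False


# Hypothetical entity names as they might look after re-ingestion.
pipeline_names = [
    "a10f6d82-4fc6-4c90-ba04-bb773c8fbb0f",  # old UUID-based entity
    "MSSQL to Snowflake",                    # new name-based entity
    "d3720495-8668-432d-b5cf-cc350b94af2b",  # old UUID-based entity
]

# Candidates for cleanup: entities still named by connectionId.
stale_names = [name for name in pipeline_names if is_uuid_name(name)]
print(stale_names)
```

Review the resulting list before deleting anything, since a connection could in principle have been given a UUID-shaped name on purpose.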

Testing

  • Updated unit tests to reflect new behavior
  • All existing test scenarios pass with updated expectations
  • Manual verification confirms regex filtering now works with human-readable names
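The updated expectation can be illustrated with a small pytest-style check. This is a sketch only: `pipeline_entity_name` is a hypothetical stand-in for the naming logic changed in this PR, not the repository's actual test or helper.

```python
def pipeline_entity_name(connection: dict) -> str:
    """Hypothetical stand-in: name the entity after the connection name."""
    return connection["name"]


def test_entity_name_uses_connection_name():
    # Invented connection record with the two fields discussed in this PR.
    connection = {
        "connectionId": "a10f6d82-4fc6-4c90-ba04-bb773c8fbb0f",
        "name": "MSSQL to Snowflake",
    }
    name = pipeline_entity_name(connection)
    # The entity name is the human-readable name, not the UUID.
    assert name == "MSSQL to Snowflake"
    assert name != connection["connectionId"]
```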

Fixes #[issue-number]

Original prompt

This section details the original issue you should resolve

<issue_title>[BUG] Airbyte metadata ingestion pipeline - filtering the internal ID instead of pipeline name</issue_title>
<issue_description>Affected module
Airbyte connector or/and Ingestion Framework

Describe the bug
The Pipeline Filter (Include, Exclude) in Airbyte Metadata Ingestion filters on the Airbyte internal IDs, not on the pipeline name (connection name in Airbyte terminology).
This is useless: internal IDs cannot be described by regexps, and they change with every environment or re-deploy.

To Reproduce

Filtering by pipeline names does not work.
Filtering works only with ID strings like 'd3720495-8668-432d-b5cf-cc350b94af2b', which you can extract from the Airbyte connection URL.

Expected behavior
Filtering needs to use the Airbyte pipeline name. The name is already retrieved by Metadata Ingestion, and OpenMetadata uses it as the pipeline name in the UI.

Version:

  • OS: Linux
  • Python version:
  • OpenMetadata version: 1.5.5
  • OpenMetadata Ingestion package version: docker.getcollate.io/openmetadata/ingestion:1.5.5
    </issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #18224


Co-authored-by: harshach <38649+harshach@users.noreply.github.com>
@Copilot Copilot AI changed the title [WIP] [BUG] Airbyte metadata ingestion pipeline - filtering the internal ID instead of pipeline name Fix Airbyte pipeline filtering to use connection names instead of UUIDs Oct 1, 2025
@Copilot Copilot AI requested a review from harshach October 1, 2025 17:34
Copilot finished work on behalf of harshach October 1, 2025 17:34