-
Notifications
You must be signed in to change notification settings - Fork 24
fix(sql): Add fallback to source_defined_primary_key in CatalogProvider #627
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix(sql): Add fallback to source_defined_primary_key in CatalogProvider #627
Conversation
- Fix CatalogProvider.get_primary_keys() to fall back to stream.source_defined_primary_key when primary_key is empty/None - Add comprehensive unit tests covering all fallback scenarios - Resolves bug where destinations ignore source-defined primary keys when configured primary key is not set - Affects all destinations using CDK SQL processor base classes Co-Authored-By: AJ Steers <aj@airbyte.io>
Original prompt from AJ Steers:
|
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
Looks like our new test suite assumes all Python connectors are sources. Since this is a destination and doesn't support 'discover', the Nothing we need to solve here in this PR and not related to our change.
|
/bump-version |
…y fix - Update airbyte-cdk dependency to use dev branch devin/1751064114-fix-primary-key-fallback - This branch contains the fix for primary key fallback logic in CatalogProvider - Allows end-to-end testing of the primary key fix with MotherDuck destination - References CDK PR: airbytehq/airbyte-python-cdk#627 Co-Authored-By: AJ Steers <aj@airbyte.io>
configured_stream = self.get_configured_stream_info(stream_name) | ||
pks = configured_stream.primary_key | ||
|
||
if not pks: | ||
return [] | ||
pks = configured_stream.stream.source_defined_primary_key | ||
if not pks: | ||
return [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be simpler:
pks = configured_stream.primary_key or configured_stream.stream.source_defined_primary_key or []
You'll have to wrap to make not overflow line length, but this version seems more concise and also more readable.
Address @aaronsteers feedback to use more concise approach: pks = configured_stream.primary_key or configured_stream.stream.source_defined_primary_key or [] This maintains exact same functionality while being more readable. Co-Authored-By: AJ Steers <aj@airbyte.io>
Fix primary key fallback in CatalogProvider for SQL destinations
Summary
This PR fixes a critical bug in the Airbyte Python CDK where the
CatalogProvider.get_primary_keys()
method ignores source-defined primary keys when configured primary keys are empty or None. This affects all destinations using the CDK's SQL processor base classes, not just MotherDuck.Root Cause: The CDK's
CatalogProvider.get_primary_keys()
method only usesConfiguredAirbyteStream.primary_key
and returns an empty list when it's falsy, without falling back tostream.source_defined_primary_key
as specified in the Airbyte protocol.Fix: Added fallback logic to use
stream.source_defined_primary_key
whenprimary_key
is empty/None, with comprehensive unit test coverage.Review & Testing Checklist for Human
source_defined_primary_key
whenprimary_key
is empty/Noneprimary_key
still work correctly (configured primary key should take precedence)configured_stream.stream.source_defined_primary_key
lookup doesn't cause performance issuesRecommended Test Plan
primary_key
but non-emptysource_defined_primary_key
Diagram
Notes
primary_key
inConfiguredAirbyteStream
overrides source-defined primary keys when set, andsource_defined_primary_key
provides the default from the source