@devin-ai-integration
Contributor

⚠️ BLOCKED BY: airbytehq/airbyte-python-cdk#859

This PR depends on unreleased CDK changes and cannot be merged until the CDK PR is merged and this connector is updated to use the new CDK version.

Link to Devin run: https://app.devin.ai/sessions/d5533ef9d14f4ec48f39aeb8eb80ed37

Requested by: Daryna (@DanyloGL)

Related issue: airbytehq/oncall#8683

What

Fixes the invoice_line_items incremental stream, which was incorrectly emitting full invoice objects instead of individual line items.

Root cause: the incremental stream reads from the Stripe events endpoint, which returns invoice objects with nested lines.data arrays. The stream emitted each invoice object whole instead of extracting and emitting its line items separately.
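To make the root cause concrete, here is a minimal Python sketch of the payload shape described above. The field values and the `get_path` helper are illustrative assumptions and not connector code; only the `data.object` and `lines.data` paths come from this PR.

```python
from typing import Any, Mapping, Sequence

# Hypothetical, heavily trimmed Stripe event; all values are made up.
event = {
    "id": "evt_1",
    "type": "invoice.updated",
    "created": 1700000100,
    "data": {
        "object": {
            "id": "in_1",
            "object": "invoice",
            "created": 1700000000,
            "lines": {
                "data": [
                    {"id": "il_1", "amount": 500},
                    {"id": "il_2", "amount": 250},
                ]
            },
        }
    },
}


def get_path(obj: Mapping[str, Any], path: Sequence[str]) -> Any:
    """Walk a nested mapping along `path` (a stand-in for a dpath-style lookup)."""
    current: Any = obj
    for key in path:
        current = current[key]
    return current


# Extracting only at data.object yields the whole invoice, which is the bug.
invoice = get_path(event, ["data", "object"])

# The line items sit one level deeper, under lines.data.
line_items = get_path(invoice, ["lines", "data"])
```

Emitting `invoice` produces one record per event, while the stream's consumers expect one record per element of `line_items`.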

How

Uses new CDK functionality (expand_records_from_field and remain_original_record parameters in DpathExtractor) to:

  1. Extract nested line items: The expand_records_from_field: ["lines", "data"] parameter tells the extractor to expand the nested lines.data array and emit each item as a separate record
  2. Preserve parent context: The remain_original_record: true parameter embeds the parent invoice object in each expanded line item record under the original_record field
  3. Filter unexpanded records: Added a record_filter that only emits records containing an original_record field, preventing parent invoice objects from being emitted when lines.data is missing or empty
  4. Extract parent fields: Updated transformations to extract invoice metadata (invoice_id, invoice_created, invoice_updated) from the embedded original_record
  5. Clean up: Added RemoveFields transformation to remove the original_record field from the final output
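Steps 1 to 3 above can be sketched in plain Python as follows. This is not the CDK implementation: `expand_records` and `select_line_items` are hypothetical names, and the sketch assumes that a record whose expansion path is missing or empty passes through unchanged, which is the case the filter in step 3 guards against.

```python
from typing import Any, Dict, Iterable, Iterator, List


def expand_records(
    record: Dict[str, Any],
    expand_path: List[str],
    keep_original: bool = True,
) -> Iterator[Dict[str, Any]]:
    """Yield one record per item found at `expand_path`.

    Assumption: if the path is missing, not a list, or empty, the parent
    record is yielded unchanged.
    """
    nested: Any = record
    for key in expand_path:
        nested = nested.get(key) if isinstance(nested, dict) else None
        if nested is None:
            break

    if not isinstance(nested, list) or not nested:
        yield record
        return

    for item in nested:
        expanded = dict(item)
        if keep_original:
            # Mirrors remain_original_record: embed the parent invoice.
            expanded["original_record"] = record
        yield expanded


def select_line_items(invoices: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Expand lines.data and drop anything that was not actually expanded."""
    for invoice in invoices:
        for rec in expand_records(invoice, ["lines", "data"]):
            # Same idea as the filter "{{ 'original_record' in record }}".
            if "original_record" in rec:
                yield rec


invoices = [
    {"id": "in_1", "lines": {"data": [{"id": "il_1"}, {"id": "il_2"}]}},
    {"id": "in_2", "lines": {"data": []}},  # empty lines.data
    {"id": "in_3"},                         # lines missing entirely
]
selected = list(select_line_items(invoices))
```

With this input, only the two line items of `in_1` survive; `in_2` and `in_3` are dropped by the filter rather than emitted as invoices.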

Review guide

  1. manifest.yaml lines 1438-1451: Review the new record_selector configuration with expansion and filtering logic
    • Verify the filter syntax "{{ 'original_record' in record }}" is correct
    • Confirm this prevents emitting parent invoices when lines.data is missing/empty
  2. manifest.yaml lines 1457-1480: Review the updated transformations
    • Verify all references to original_record are safe (use .get() with defaults)
    • Confirm invoice_updated, invoice_id, and invoice_created are extracted correctly
    • Check that RemoveFields properly cleans up original_record
  3. Edge cases to consider:
    • Empty lines.data arrays (should emit nothing, not the parent invoice)
    • Missing lines.data field (should emit nothing, not the parent invoice)
    • invoice.deleted events (may not have line items)
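For the second review item, the kind of defensive access being asked for might look like the sketch below. `add_invoice_fields` is a hypothetical helper, and the fallback from `updated` to `created` is an assumption made for illustration; the actual manifest expressions may differ.

```python
from typing import Any, Dict


def add_invoice_fields(record: Dict[str, Any]) -> Dict[str, Any]:
    """Copy parent invoice metadata onto a line item, then drop the parent."""
    # `or {}` keeps the lookups below safe if original_record is absent or None.
    original = record.get("original_record") or {}

    # Equivalent of the RemoveFields step: leave original_record out of the output.
    out = {k: v for k, v in record.items() if k != "original_record"}

    out["invoice_id"] = original.get("id")
    out["invoice_created"] = original.get("created")
    # Assumption: fall back to `created` when the parent carries no `updated` value.
    out["invoice_updated"] = original.get("updated", original.get("created"))
    return out


line_item = {
    "id": "il_1",
    "amount": 500,
    "original_record": {"id": "in_1", "created": 1700000000},
}
result = add_invoice_fields(line_item)
```

A record without `original_record` does not raise here; it simply gets `None` for the three invoice fields, which is why the record filter runs before these transformations.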

User Impact

Positive:

  • Users will now receive individual line items in the invoice_line_items stream instead of full invoice objects
  • Incremental syncs will work correctly, emitting only line items from updated invoices
  • Each line item will include parent invoice metadata (invoice_id, invoice_created, invoice_updated)

Negative:

  • This is a breaking change in the data structure emitted by the stream
  • Users may need to update downstream transformations that expect invoice objects
  • Historical data synced before this fix will have a different structure

Can this PR be safely reverted and rolled back?

  • YES 💚
  • NO ❌

Reason: This changes the fundamental data structure of the invoice_line_items stream. Reverting would cause inconsistency between historical and new data. If issues arise, a forward fix would be preferred.


Testing Notes

⚠️ Cannot test locally until CDK PR airbytehq/airbyte-python-cdk#859 is merged and this connector is updated to use the new CDK version.

Once the CDK dependency is available:

  1. Test with a Stripe account that has invoices with multiple line items
  2. Verify incremental sync emits individual line items, not invoice objects
  3. Test edge cases: empty line items, deleted invoices, invoices without line items
  4. Verify schema compatibility with emitted line item structure

…ms incremental stream

- Override record_selector in incremental_stream retriever to add expansion
- Extract invoice objects from data.object path
- Expand lines.data array using expand_records_from_field parameter
- Preserve parent invoice context with remain_original_record
- Update transformations to extract invoice fields from original_record
- Add RemoveFields to clean up original_record after extracting needed fields
- Fixes issue where incremental sync emitted invoice objects instead of line items

Co-Authored-By: unknown <>
@devin-ai-integration
Contributor Author

Original prompt from API User
Comment from @DanyloGL: /ai-triage\n\nIMPORTANT: The user will expect a response posted back to the PR. You should post exactly one comment back to the respective issue PR. If the user requested a code change or PR, your comment should contain a link to the PR. Assume the user has no access to your session or conversation thread unless/until you respond back to them.\n\nIssue #8683 by @jnr0790: Python L3: Stripe - Missing data in `invoice_line_items` stream\n\nIssue URL: https://github.com/airbytehq/oncall/issues/8683\n\nPlease use playbook macro: !issue_triage

PLAYBOOK_md:
# `/ai-triage` Slash Command Playbook

You are AI Triage Devin, an expert at analyzing Airbyte-related issues and providing actionable insights. You are responding to a GitHub slash command request. After reading the provided context, you should post a comment to confirm you understand the request and stating what your next steps will be, along with a link to your session. Once your triage and analysis is complete, update your comment with the full results of your triage. Collapse all of your comments under expandable sections.

IMPORTANT: Expect that your user has no access to the session and cannot talk with you directly. Do not wait for feedback or confirmation on any action.

## Context

You are analyzing the issue provided to you above. You will need to pull comment history on this issue to ensure you have full context.

## Your Task: Static Analysis and Triage

1. **Issue Analysis and Confirmation**: Read the complete issue content including all comments for full context.
   - **Post an initial comment immediately** (within 1-2 minutes) to confirm you understand the assignment and that you are looking into it. Include your session URL.
   - If you are missing any critical information or context (e.g., workspace UUID, connector version, error logs, reproduction steps, customer environment details), include in your initial comment a request for additional context. (Do not block waiting for a... (9078 chars truncated...)

@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.


@github-actions
Contributor

github-actions bot commented Dec 2, 2025

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Helpful Resources

PR Slash Commands

Airbyte Maintainers (that's you!) can execute the following slash commands on your PR:

  • /format-fix - Fixes most formatting issues.
  • /bump-version - Bumps connector versions.
    • You can specify a custom changelog by passing changelog. Example: /bump-version changelog="My cool update"
    • Leaving the changelog arg blank will auto-populate the changelog from the PR title.
  • /run-cat-tests - Runs legacy CAT tests (Connector Acceptance Tests)
  • /run-live-tests - Runs live tests for the modified connector(s).
  • /run-regression-tests - Runs regression tests for the modified connector(s).
  • /build-connector-images - Builds and publishes a pre-release docker image for the modified connector(s).
  • /publish-connectors-prerelease - Publishes pre-release connector builds (tagged as {version}-dev.{git-sha}) for all modified connectors in the PR.
  • JVM connectors:
    • /update-connector-cdk-version connector=<CONNECTOR_NAME> - Updates the specified connector to the latest CDK version.
      Example: /update-connector-cdk-version connector=destination-bigquery
    • /bump-bulk-cdk-version bump=patch changelog='foo' - Bump the Bulk CDK's version. bump can be major/minor/patch.
  • Python connectors:
    • /poe connector source-example lock - Run the Poe lock task on the source-example connector, committing the results back to the branch.
    • /poe source example lock - Alias for /poe connector source-example lock.
    • /poe source example use-cdk-branch my/branch - Pin the source-example CDK reference to the branch name specified.
    • /poe source example use-cdk-latest - Update the source-example CDK dependency to the latest available version.


… syntax

Update invoice_line_items incremental stream to use the new RecordExpander
class structure instead of direct parameters on DpathExtractor.

Changes:
- Move expand_records_from_field and remain_original_record into nested
  record_expander configuration
- Add explicit type: RecordExpander to enable proper component instantiation

This aligns with the CDK refactoring that extracted record expansion logic
into a dedicated RecordExpander component.

Co-Authored-By: unknown <>
@github-actions
Contributor

github-actions bot commented Dec 2, 2025

source-stripe Connector Test Results

  • 209 tests in 2 suites (2 files), run time 19m 12s ⏱️
  • 206 passed ✅
  • 3 skipped 💤
  • 0 failed ❌

Results for commit 7600302.
