Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🚨🚨 Migrate source-zendesk-talk to low code #35780

Merged
merged 47 commits into from
May 13, 2024

Conversation

ambirdsall
Copy link
Contributor

@ambirdsall ambirdsall commented Mar 3, 2024

What

Converts the source-zendesk-talk source to a manifest-based implementation for the low-code CDK. While migrating, I found a logic error with the record parsing of the ivr_routes stream, which actually returned records based on the parent menus objects. Closes https://github.com/airbytehq/airbyte-internal-issues/issues/6395

How

  1. Defined global properties and at least partial implementations of all streams in the Connector Builder UI
  2. Exported the generated YAML to a local file
  3. Fixed up some basic inconsistencies and minor errors existing at the time I decided I needed to stop using the Builder: nullable fields in the schemas, removing a partial implementation of ivr_routes and ivr_menus record parsing via the transformations key (which would have been technically possible, but extremely tedious to write and maintain), etc
  4. Added a new source class, SourceZendeskTalkTwo, alongside the existing python definition along with script definitions, a test catalog for each individual stream, and a testing shell script to run parallel reads of any stream for each. This let me easily compare the results of spec and read calls (by examining record counts and by pasting output into https://jsondiff.com/ to compare the structure of specs and records) to find and fix inconsistencies
  5. Added custom components defining the custom record parsing for ivr_menus and ivr_routes; while doing so I realized that the existing implementation of ivr_routes was incorrect (see below for more details)
  6. Added a custom authentication component. I chose not to use SelectiveAuthenticator to dispatch to the different possible authenticators because I did not see a way to implement both of the existing ways credentials may be fetched from the config for BasicHttpAuthenticator (this might be possible to implement with SelectiveAuthenticator, but reading the existing code and documentation I did not see how), and for development speed (there are zero existing usages available for reference).
  7. Replaced the unit tests for the old python connector implementation with unit tests for the new custom components
  8. Deleted the python definition and its supporting files, renamed the low-code implementation SourceZendeskTalk, and deleted the testing scripts and catalogs.

A bit more on the bug with ivr_routes

ivrs, ivr_menus, and ivr_routes all use the same endpoint and response data, but the latter two parse out child records from arrays within that response: each ivr contains a list of menus, and each menu contains a list of routes. Here is the old record parsing code from ivr_routes:

    def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]:
        """Simply parse json and iterates over root object"""
        ivrs = super().parse_response(response=response, **kwargs)
        for ivr in ivrs:
            for menu in ivr["menus"]:
                for route in ivr["menus"]:
                    yield {"ivr_id": ivr["id"], "ivr_menu_id": menu["id"], **route}

Notice that for menu in ivr["menus"]: and for route in ivr["menus"]: both iterate over the parent menus field; the source never iterated through the list of routes within each menu, meaning its records contained extraneous, logically-duplicated data and that its record counts did not accurately reflect the number of route objects returned by the API. I updated the code to actually return one ivr_routes record per object defined in some menu's routes array.

For the testing account whose credentials are stored in Airbyte's company LastPass, there were 52 ivrs with one menu each, but only 11 routes—most menus contained an empty routes array. After this fix, the stream returns the expected data given the actual response JSON.

Schema discrepancies between the old and new versions

  • a new average_transfers_only_time field was added to the agents_overview stream
  • a new transfers_only_time field was added to agents_activity
  • a new capabilities.emergency_address field was added to phone_numbers
  • a new exceeded_queue_wait_time field added to calls
  • removed ivrs.ivr_id, which does not exist in the API (ivr_id only exists as an Airbyte-side addition to the ivr_menus/ivr_routes "child" streams so users have a reference to the parent object they were extracted from; ivrs themselves just use id, which was already in the schema)

🚨 User Impact 🚨

Breaking changes:

  • state changes: the API accepts cursor field filtering input as unix timestamps, but records output datetime strings. The python connector had handrolled code which would always emit record-format-style datetime strings in its state messages, but the low-code CDK normalizes on the query format instead. The python connector errors if given a unix timestamp as its initial state
  • removed the ivr_id field from the ivrs stream schema: it's not part of the record and should not have been in the definition in the first place

Breaking changes to connectors must be documented by an Airbyte engineer (PR author, or reviewer for community PRs) by using the Breaking Change Release Playbook.*

If there are breaking changes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.

Pre-merge Actions

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Unit & integration tests added

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.

Copy link

vercel bot commented Mar 3, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback May 9, 2024 10:24am

@octavia-squidington-iv octavia-squidington-iv requested a review from a team March 3, 2024 22:57
@ambirdsall ambirdsall force-pushed the amb/migrate-source-zendesk-talk-to-low-code branch 4 times, most recently from b61f3ee to 0f1e57b Compare March 4, 2024 07:07

definitions:
authenticator:
class_name: source_zendesk_talk.components.ZendeskTalkAuthenticator
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not to use the SelectiveAuthenticator component?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did of course look at that component, but I chose not to use it because I did not see a way to define both the legacy and updated config structure used with BasicHttpAuthenticator the way the python connector did.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's okay to change the config structure and make a breaking change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like this idea because if we need to revert, we can't. With such a big change as changing the implementation almost completely, it seems rather risky. I know there is a concern about velocity regarding this as well and I'm not sur how to balance that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we need to revert, we can't. With such a big change as changing the implementation almost completely, it seems rather risky

This was my thinking as well when I opted to dump the existing nonstandard authentication logic into a custom component instead of just keeping the two non-legacy auth options, which would be possible to do in the manifest with SelectiveAuthenticator—though given the sparse documentation and lack of existing usages in the repo to use as reference, it might also be a bit slower to build it that way.

@octavia-squidington-iv octavia-squidington-iv requested a review from a team March 4, 2024 14:40
@ambirdsall ambirdsall force-pushed the amb/migrate-source-zendesk-talk-to-low-code branch from 0f1e57b to 8d0f073 Compare March 4, 2024 17:29
@natikgadzhi
Copy link
Contributor

Thank you for the detailed writeup of the steps you took to do it. Very interesting! /cc @girarda.

  • It's telling that you chose to implement your own script to compare the output of new and old. That's now how I assumed that should go — perhaps that was easy because regression testing was not ready yet, but this is what regression testing is for.
  • It's explicitly OK to break config compatibility — I think we have a thing to implement config migrations. But we can also choose to have custom component if we know there are a lot of connections using this connector today, and the migration will be painful. Let's check the connections count If cloud workspaces are under 10, let's make the breaking change.

Let's run regression tests on this and get this merged quickly! @clnoll another candidate for testing here :D

@lazebnyi
Copy link
Collaborator

lazebnyi commented Mar 4, 2024

  • It's explicitly OK to break config compatibility — I think we have a thing to implement config migrations. But we can also choose to have custom component if we know there are a lot of connections using this connector today, and the migration will be painful. Let's check the connections count If cloud workspaces are under 10, let's make the breaking change.

From my perspective, I definitely vote for configuration migration + custom component.

@natikgadzhi Btw, would it be better to calculate the duration of using Airbyte or may be even sync data instead of workspaces qty? Within those 10 workspaces, there could be lightweight connections with newcomers and small amounts of data, as well as heavyweight synchronization processes and users who have been using Airbyte for a long time. Resetting connections or data could be quite painful for them.

@natikgadzhi
Copy link
Contributor

@maxi297, @ambirdsall, I get that CAT output can be scary — a session to get this over the finish line, and coach Alex on how to drive a connector migration through completion would be great 👀

Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the migration seems fine but the devil is in the details!


definitions:
authenticator:
class_name: source_zendesk_talk.components.ZendeskTalkAuthenticator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like this idea because if we need to revert, we can't. With such a big change as changing the implementation almost completely, it seems rather risky. I know there is a concern about velocity regarding this as well and I'm not sur how to balance that

@maxi297
Copy link
Contributor

maxi297 commented Mar 7, 2024

@maxi297, @ambirdsall, I get that CAT output can be scary — a session to get this over the finish line, and coach Alex on how to drive a connector migration through completion would be great 👀

There is a log mentioning Request to https://d3v-airbyte.zendesk.com/api/v2/channels/voice/addresses failed with status code 401 and error message None using config.json. Locally, I don't have this issue but it seems like CATs were fixed recently here. We can see the error is different but maybe @askarpets fixed this one as well. I would suggest updating the branch and trying again. If it does not work, we can investigate further

@ambirdsall ambirdsall force-pushed the amb/migrate-source-zendesk-talk-to-low-code branch 2 times, most recently from 2b4d5c2 to 39e9b58 Compare March 7, 2024 23:20
@ambirdsall ambirdsall force-pushed the amb/migrate-source-zendesk-talk-to-low-code branch 2 times, most recently from e5938e7 to c0409b2 Compare March 8, 2024 19:10
@ambirdsall
Copy link
Contributor Author

Notes based on prior feedback:

  • pagination and incremental field correctness issues fixed
  • CATs passing
  • schemas compared with a fine-toothed comb; all discrepancies are either fixed or are themselves fixes for errors in the prior definition.

@ambirdsall ambirdsall requested a review from maxi297 March 12, 2024 17:37
@maxi297
Copy link
Contributor

maxi297 commented Mar 12, 2024

adding fields

That's all good

ivrs.greeting_id was moved to ivrs.menus.greeting_id

I'm a bit confused as it seems like greeting_id was already under ivrs.menus (see the deleted schema). Do you mean it was moved to ivrs.greeting_id (that is my best assumption based on the fact that the field seems on the root of ivrs now). This seems to be a breaking change. But assuming the field was never populated, I'm fine with that.

removed ivrs.ivr_id

It seems like it was never populated anyway so I'm fine with that

@ambirdsall
Copy link
Contributor Author

ambirdsall commented Mar 12, 2024

it seems like greeting_id was already under ivrs.menus (see the deleted schema). Do you mean it was moved to ivrs.greeting_id

@maxi297 I am honestly not sure how I came to make that mistake; it should indeed be (and now is) inside ivr.menus.

@katmarkham katmarkham self-requested a review March 13, 2024 17:18
@ambirdsall
Copy link
Contributor Author

this seems to be sitting for more than a week idle

@natikgadzhi I hope I don't come off too defensive here—I agree wholeheartedly with your prioritization of velocity—but I pushed a fix and replies to the corresponding review comments 3 days ago and have since had to take a sick day. I haven't been satisfied with the time to close this out either, but it has not just been idleness.

In any case, those breaking change updates are now in place, which I hope clears the way for getting the PR approved shortly and ready to ship along with the other breaking migrations.

Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with this change, thanks for addressing the comments! @bazarnov I would like you to check again so we can have an approval from API Sources as well

docs/integrations/sources/zendesk-talk.md Outdated Show resolved Hide resolved
@alafanechere alafanechere added the low-code-migration This connector has been migrated to the low-code CDK label Apr 3, 2024
@alafanechere alafanechere self-assigned this Apr 4, 2024
@lazebnyi lazebnyi assigned lazebnyi and unassigned maxi297 and alafanechere Apr 17, 2024
breakingChanges:
1.0.0:
upgradeDeadline: "2024-05-31"
message: "The source Zendesk Talk connector is being migrated from the Python CDK to our declarative low-code CDK. Due to changes to the incremental stream state message format and the removal of a nonexistent field from the ivrs stream schema, this migration constitutes a breaking change. After updating, please reset your source before resuming syncs. For more information, see our migration documentation for source Zendesk Talk."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@katmarkham can you review the breaking change banner when you get a chance? The connector will be good to go once that's done

@lazebnyi lazebnyi merged commit 0405990 into master May 13, 2024
37 checks passed
@lazebnyi lazebnyi deleted the amb/migrate-source-zendesk-talk-to-low-code branch May 13, 2024 08:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/source/zendesk-talk low-code-migration This connector has been migrated to the low-code CDK
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants