Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve URL creation in the CDK #8513

Merged
merged 1 commit into from
Dec 5, 2021

Conversation

sergei-solonitcyn
Copy link
Contributor

Signed-off-by: Sergei Solonitcyn sergei.solonitcyn@zazmic.com

What

Current realization of the CDK uses straight strings concatenation to create a full url from url_base and /path.
It works for the most of cases. But the reason why it is a common-known bad practice, is a lack of flexibility.

One case, which was the initial reson of this update, is a Mailgun source, where we get a full url for next and previous pages. Using current scheme we're forced to create ugly code trying to extract parts of the url and concatenate it with existing base url. Besides all, it's not guaranteed to get right url this way, as the source doesn't define it.

So, we should use best practice to create urls in a flexible and predictable way.

How

Just replaced concatenation with urllib.parse.urljoin function.
Some of existing connectors used to rely on the old way of url creation. They also were updated according to the new one.

Recommended reading order

  1. airbyte-cdk/python/airbyte_cdk/sources/streams/http/http.py
  2. the rest

User Impact

It isn't expected to have any user impact.
Only one dangerous is that some of existing conenctors could be left unattended during this change.
Automatic integration tests have to show if this is the case.

@CLAassistant
Copy link

CLAassistant commented Dec 4, 2021

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added area/connectors Connector related issues CDK Connector Development Kit labels Dec 4, 2021
Signed-off-by: Sergei Solonitcyn <sergei.solonitcyn@zazmic.com>
@sergei-solonitcyn sergei-solonitcyn force-pushed the improve-url-creation-in-the-cdk branch from 070dde0 to 3822e51 Compare December 4, 2021 22:26
@sergei-solonitcyn sergei-solonitcyn temporarily deployed to more-secrets December 4, 2021 22:27 Inactive
Copy link
Contributor

@sherifnada sherifnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like a better approach, but does it solve the point described in the PR description? also what's the need for rstrip especially if we are adding it again afterwards with + "/"?

@@ -30,7 +30,7 @@ def __init__(self, config: Mapping[str, Any], stream_name: str = None, param: Ma
self.config = config
self.start_date = config["start_date"]
self.window_in_days = config["window_in_days"]
self._url_base = config["domain_url"]
self._url_base = config["domain_url"].rstrip("/") + "/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the purpose of the rstrip portion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that we're not sure if the value has trailing slash. This way allows us to be sure.

@sergei-solonitcyn
Copy link
Contributor Author

this seems like a better approach, but does it solve the point described in the PR description?

Yes, it allows to return full url, like https://domain.org/path/path, in our path() method, and it will replace the whole url, because urljoin works this way.

Copy link
Contributor

@sherifnada sherifnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the explanation, lgtm. Can you update the CDK docs as appropriate to indicate this?

@sergei-solonitcyn sergei-solonitcyn merged commit 6ee922c into master Dec 5, 2021
@sergei-solonitcyn sergei-solonitcyn deleted the improve-url-creation-in-the-cdk branch December 5, 2021 20:30
@sergei-solonitcyn
Copy link
Contributor Author

thanks for the explanation, lgtm. Can you update the CDK docs as appropriate to indicate this?

Didn't find a place in the current version of the docs where according updates would be strictly required.
However, there is a big update of the docs ahead, where it could be mentioned.

@sherifnada
Copy link
Contributor

@sergei-solonitcyn you need to publish the CDK and update version, team can help with procedure

schlattk pushed a commit to schlattk/airbyte that referenced this pull request Jan 4, 2022
Signed-off-by: Sergei Solonitcyn <sergei.solonitcyn@zazmic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues CDK Connector Development Kit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants