-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve URL creation in the CDK #8513
Conversation
Signed-off-by: Sergei Solonitcyn <sergei.solonitcyn@zazmic.com>
070dde0
to
3822e51
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like a better approach, but does it solve the point described in the PR description? also what's the need for rstrip especially if we are adding it again afterwards with + "/"
?
@@ -30,7 +30,7 @@ def __init__(self, config: Mapping[str, Any], stream_name: str = None, param: Ma | |||
self.config = config | |||
self.start_date = config["start_date"] | |||
self.window_in_days = config["window_in_days"] | |||
self._url_base = config["domain_url"] | |||
self._url_base = config["domain_url"].rstrip("/") + "/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the purpose of the rstrip portion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is that we're not sure if the value has trailing slash. This way allows us to be sure.
Yes, it allows to return full url, like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the explanation, lgtm. Can you update the CDK docs as appropriate to indicate this?
Didn't find a place in the current version of the docs where according updates would be strictly required. |
@sergei-solonitcyn you need to publish the CDK and update version, team can help with procedure |
Signed-off-by: Sergei Solonitcyn <sergei.solonitcyn@zazmic.com>
Signed-off-by: Sergei Solonitcyn sergei.solonitcyn@zazmic.com
What
Current realization of the CDK uses straight strings concatenation to create a full url from
url_base
and/path
.It works for the most of cases. But the reason why it is a common-known bad practice, is a lack of flexibility.
One case, which was the initial reson of this update, is a Mailgun source, where we get a full url for
next
andprevious
pages. Using current scheme we're forced to create ugly code trying to extract parts of the url and concatenate it with existing base url. Besides all, it's not guaranteed to get right url this way, as the source doesn't define it.So, we should use best practice to create urls in a flexible and predictable way.
How
Just replaced concatenation with
urllib.parse.urljoin
function.Some of existing connectors used to rely on the old way of url creation. They also were updated according to the new one.
Recommended reading order
airbyte-cdk/python/airbyte_cdk/sources/streams/http/http.py
User Impact
It isn't expected to have any user impact.
Only one dangerous is that some of existing conenctors could be left unattended during this change.
Automatic integration tests have to show if this is the case.