Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP connection parse path as schema but use as protocol #10913

Open
dungdm93 opened this issue Sep 13, 2020 · 11 comments
Open

HTTP connection parse path as schema but use as protocol #10913

dungdm93 opened this issue Sep 13, 2020 · 11 comments
Labels
airflow3.0:breaking Candidates for Airflow 3.0 that contain breaking changes airflow3.0:candidate Potential candidates for Airflow 3.0 area:providers good first issue kind:bug This is a clearly a bug provider:http

Comments

@dungdm93
Copy link
Contributor

Apache Airflow version:
1.10.11

Environment:

  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release): Ubuntu 20.04
  • Kernel (e.g. uname -a): 5.4.0-47-generic
  • Install tools:
  • Others:

What happened:
I store my connection into Airflow using this command

$ airflow connections --add --conn_id=demo --conn_uri="http://localhost:8080/path/foo/bar"
Successfully added `conn_id`=demo : http://localhost:8080/path/foo/bar

And when It's used by hooks, I realize that HTTP connection parse path as schema but use as protocol:

from airflow.hooks.http_hook import HttpHook

c = HttpHook.get_connection("demo")
print(f"type={c.conn_type} schema={c.schema}, user={c.login}, pass={c.password}, host={c.host}, port={c.port}, extra={c.extra}")
# type=http schema=path/foo/bar, user=user, pass=pass, host=localhost, port=8080, extra=None

h = HttpHook(http_conn_id="demo")
h.get_conn()
print(h.base_url)
# 'path/foo/bar://localhost:8080' 

For now, I could workaround by store hold URI in host field like

$ airflow connections --add --conn_id=demo2 --conn_type=http --conn_host=http://localhost:8080/path/foo/bar
Successfully added `conn_id`=demo2 : http://:******@http://localhost:8080/path/foo/bar:

But you could see, it's absolutely bad solution.

So my suggestion is the connection should be sanitize follow the standard URL syntax:
image

This mean that:

  • host part should only contain domain name, hostname or IP address
  • rename schema filed to path for better naming. This filed store HTTP path or SQL database name
  • add scheme or protocol filed for store acture connection protocol, e.g:
    • https connection: conn_type=http, protocol=https
    • jdbc connection: conn_type=mysql, protocol=jdbc:mysql
@dungdm93 dungdm93 added the kind:bug This is a clearly a bug label Sep 13, 2020
@boring-cyborg
Copy link

boring-cyborg bot commented Sep 13, 2020

Thanks for opening your first issue here! Be sure to follow the issue template!

@potiuk
Copy link
Member

potiuk commented Sep 13, 2020

Yeah @dungdm93 - this is a known problem, It's not a bug per-se, more of a naming problem but it's hard to solve in a backward-compatible way. Would you like to attempt to solve it this way that the previous usages will be not affected ? It might be temporary (with some deprecation notice) but we have to make sure that at least when people migrate to 2.0, their code still works and at most they receive a deprecation warning.

@potiuk
Copy link
Member

potiuk commented Sep 13, 2020

And it would be great to solve it BTW. it's pretty anoying.

@mik-laj
Copy link
Member

mik-laj commented Sep 14, 2020

Related: #10256

@kubatyszko
Copy link
Contributor

Btw, this also affects some other protocols, so far I've identified HTTP/HTPS and SPARK:

Sample URI's that I have in my Secrets Manager :

http://https%3A%2F%2Fexample.com

spark://spark%3A%2F%2Fspark-master-0.spark-master.spark.svc.cluster.local:7077

I've also just added a piece of documentation that explains the strange URI's required. We should keep it until this current Issue taken care of... (but I recall from my conversation in the past that it was not easy to fix this).

@vashchukmaksim
Copy link

vashchukmaksim commented Mar 20, 2021

The above doesn't work for me, so I came up with the following that works in my case for HTTP:

AIRFLOW_CONN_API_AUTH=http://app_auth:8000/http

So I added http at the end and later use relative path in SimpleHttpOperator's endpoint argument.

@curlup
Copy link
Contributor

curlup commented Sep 23, 2022

I know it's not a bug per-se, but maybe a path forward would be to introduce new http and https hook and name it differently (RestHook or httpApiHook?) and then deprecate the current one, not breaking anyones code/pipelines behavior?

@potiuk
Copy link
Member

potiuk commented Sep 26, 2022

Why not. Maybe you would like to attempt to do it? PRs are welcome :)

@Bisk1
Copy link
Contributor

Bisk1 commented Nov 17, 2023

I would like to revive this thread because recently I've spend a lot of time when I was trying to set up an HTTPS connection because of how confusing connection URI is. Currently if I want to do it then the URI has to be either http://example.com/https OR http://https://example.com but the most intuitive one: https://example.com doesn't work.

I also noticed that there was more work done on supporting the second pattern: #31465 which makes it seem like it is now the recommended solution rather than a workaround.

I feel like there is a design decision needed here. Either Airflow committers agree that the current URI syntax: https://airflow.apache.org/docs/apache-airflow/stable/howto/connection.html#uri-format is the final one and in that case this issue should be closed.

Alternatively (which is in my opinion better) we could try to aim at changing the URI syntax so that only the 3rd pattern is valid and others are deprecated. E.g. schema (actually 'scheme') and protocol could be considered as one and the same thing in the connection model (in future, now it would be a breaking change). And connection type can be derived from the scheme in code (e.g. if the scheme is HTTPS then conn_type is HTTP, in almost all other cases conn_type = scheme). Then the path part is not needed and hostname should not contain protocol because it will be available in scheme.

I'm willing to propose some solution and deprecation path (so that the old patterns are only supported until Airflow 3) if the community sees value in this idea.

@potiuk
Copy link
Member

potiuk commented Nov 17, 2023

I do not think it is final. I think we would be entirely open to get more "intutitive" way of defining and storing the connection as long as it does not introduce backwards compatiblity - then it can even be made "preferred" and the old one "deprecated" - I have completley no problem with that.

@Bisk1
Copy link
Contributor

Bisk1 commented Nov 17, 2023

Ok, after looking into this issue again I see that parts of my comment were mistaken - I realised that 'schema' in connection is not 'scheme' with a typo but a database schema. This is because these terms were mixed in HttpHook implementation. I think my PR (#35712) should fix this issue. But encoding protocol in the hostname is a separate thing so I will leave it for now.

@kaxil kaxil added airflow3.0:breaking Candidates for Airflow 3.0 that contain breaking changes airflow3.0:candidate Potential candidates for Airflow 3.0 labels Aug 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
airflow3.0:breaking Candidates for Airflow 3.0 that contain breaking changes airflow3.0:candidate Potential candidates for Airflow 3.0 area:providers good first issue kind:bug This is a clearly a bug provider:http
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants