[Feature] Support HTTP authentication for DbtGitRemote #112

Open
KarthikRajashekaran opened this issue Mar 30, 2023 · 8 comments
Labels
enhancement New feature or request

Comments


KarthikRajashekaran commented Mar 30, 2023

I am trying to use a dbt project stored in a GitLab repo via DbtGitRemoteHook:

dbt_run = DbtRunOperator(
    task_id="dbt_run",
    project_dir="https://domain/abc/-/tree/main/dbt/db_metrics?private_token=abcdesf",
    dbt_conn_id="dbt_conn_id",
    target="dev",
    do_xcom_push_artifacts=["run_results.json"],
)

    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='scm.platform.us-west-2.io', port=443): Max retries exceeded with url: https://scm.platform.us-west-2.io/users/auth/saml (Caused by ResponseError('too many redirects'))

When I opened the same HTTPS URL with the token in a browser, I was able to access the repo.
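
Editor's note: the first project_dir has the shape of a GitLab web page URL (`/-/tree/<ref>/<path>`) rather than a clone URL like the `abc.git` one used in the second attempt. Whether that is what triggers the redirect loop here is not confirmed in this thread. The sketch below, using only the standard library, shows one way to split such a web URL into a clone URL, a ref, and a subdirectory. The function name `split_gitlab_tree_url` is hypothetical, and it assumes the ref contains no "/" characters.

```python
from urllib.parse import urlsplit, urlunsplit


def split_gitlab_tree_url(web_url: str) -> tuple[str, str, str]:
    """Split a GitLab web "tree" URL into (clone_url, ref, subdirectory).

    Hypothetical helper. Assumes the ref (branch or tag) contains no "/".
    """
    parts = urlsplit(web_url)
    repo_path, marker, rest = parts.path.partition("/-/tree/")
    if not marker:
        raise ValueError("not a GitLab tree URL")
    ref, _, subdir = rest.partition("/")
    # Drop the query string and fragment; they are not part of the repo location.
    clone_url = urlunsplit((parts.scheme, parts.netloc, repo_path + ".git", "", ""))
    return clone_url, ref, subdir


clone_url, ref, subdir = split_gitlab_tree_url(
    "https://domain/abc/-/tree/main/dbt/db_metrics?private_token=abcdesf"
)
print(clone_url, ref, subdir)
```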

I also tried the following project_dir:

project_dir="https://$gitlabUser:$gitlabToken@domain/abc.git",

File "/home/airflow/.local/lib/python3.10/site-packages/dulwich/client.py", line 705, in clone
  result = self.fetch(path, target, progress=progress, depth=depth)
File "/home/airflow/.local/lib/python3.10/site-packages/dulwich/client.py", line 782, in fetch
  result = self.fetch_pack(
File "/home/airflow/.local/lib/python3.10/site-packages/dulwich/client.py", line 2085, in fetch_pack
  refs, server_capabilities, url = self._discover_references(
File "/home/airflow/.local/lib/python3.10/site-packages/dulwich/client.py", line 1941, in _discover_references
  resp, read = self._http_request(url, headers)
File "/home/airflow/.local/lib/python3.10/site-packages/dulwich/client.py", line 2219, in _http_request
  raise HTTPUnauthorized(resp.headers.get("WWW-Authenticate"), url)
dulwich.client.HTTPUnauthorized: No valid credentials provided

KarthikRajashekaran changed the title from "ModuleNotFoundError: No module named 'dulwich'" to "DbtGitRemote: HTTPSConnectionPool" on Mar 30, 2023
tomasfarias added the enhancement (New feature or request) label on Mar 30, 2023
tomasfarias changed the title from "DbtGitRemote: HTTPSConnectionPool" to "[Feature] Support HTTP authentication for DbtGitRemote" on Mar 30, 2023

@tomasfarias
Owner

Thanks for opening an issue.

Support for authentication in git remotes was not implemented, hence the error. I'm working on a patch for this and already have it working. I just need to clean up the code and add some tests.

We'll do this in two steps:

  • First release (v1.0.4) will support HTTPS auth by specifying a user/password or token in the URL. This should be enough to get you going: locally I'm able to pull a private repo from GitLab using a project_dir in the form https://oauth2:<my-personal-access-token>@gitlab.com/tomasfarias/<my-private-repo>, which is equivalent to your second attempt.

  • Second release (likely v1.0.5, but potentially v1.1.0) will include proper Airflow connection support, so that you can store your credentials in Airflow instead of having to pass them in the project's URL. This requires a bit of refactoring in the git remote, but is otherwise very doable.
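
Editor's note: the URL form from the first step can be built programmatically. This is a minimal sketch using only the standard library; the helper name `with_credentials` and the example token are hypothetical. It percent-encodes the user and token so that characters such as "@" or ":" do not break the URL. Whether the git client decodes them again before authenticating is an assumption to verify against your installed versions.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_credentials(clone_url: str, username: str, token: str) -> str:
    """Return clone_url with "username:token@" embedded before the host.

    Hypothetical helper, not part of airflow-dbt-python.
    """
    parts = urlsplit(clone_url)
    # Percent-encode both values so reserved characters stay unambiguous.
    userinfo = f"{quote(username, safe='')}:{quote(token, safe='')}"
    # Keep host (and port, if any) as written, dropping any existing userinfo.
    host = parts.netloc.rpartition("@")[2]
    return urlunsplit(
        (parts.scheme, f"{userinfo}@{host}", parts.path, parts.query, parts.fragment)
    )


url = with_credentials(
    "https://gitlab.com/tomasfarias/my-private-repo.git",
    "oauth2",
    "example-token",  # placeholder; read a real token from a secret store
)
print(url)
```

The resulting string is what would be passed as project_dir. Until connection support lands, reading the token from an environment variable or secrets backend avoids hard-coding it in the DAG file.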

@KarthikRajashekaran
Author

Is there a tentative release date for v1.0.4?

@tomasfarias
Owner

v1.0.4 going out later today assuming CI is green.

@alvaromendoza

I see that v1.0.4 is already available on PyPI. However, doing pip install airflow-dbt-python==1.0.4 seems to install a version of the code that does not have the changes from #113. Also, pyproject.toml still says version = "1.0.3". Is there a problem? Or am I missing something?

@tomasfarias
Owner

tomasfarias commented Apr 3, 2023

Thanks for bringing this up @alvaromendoza.

I think I may have tagged the wrong commit, and thus 1.0.4 was deployed without the changes. Unfortunately, PyPI doesn't allow overwriting existing releases, so I will go ahead and do a 1.0.5 release. This will just be what v1.0.4 was intended to be, no other changes. I may yank 1.0.4 afterwards, just so that folks upgrade from 1.0.3 to 1.0.5 directly.

Sorry for the inconvenience. The deployment pipeline is all automated except for bumping the version and tagging the commit, and I couldn't get that right 😅

@tomasfarias
Owner

Just pushed tag v1.0.5 which does have the latest changes as you can verify looking at the tree: https://github.com/tomasfarias/airflow-dbt-python/tree/v1.0.5/airflow_dbt_python/hooks.

It should be deployed shortly to PyPI.
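
Editor's note: after upgrading, the installed release can be checked from Python. This is a small sketch using `importlib.metadata` from the standard library; the helper names are hypothetical. It only reports the version declared in the package metadata, so it cannot by itself prove that a mis-tagged release contains the expected code.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


def has_release(distribution: str, expected: str) -> bool:
    """True only if the distribution is installed at exactly `expected`."""
    return installed_version(distribution) == expected


# Prints the installed version, or None if the package is not installed.
print(installed_version("airflow-dbt-python"))
print(has_release("airflow-dbt-python", "1.0.5"))
```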

@FouadApp

FouadApp commented Apr 3, 2023

It would also be preferable to add an option such as verify ssl = bool, to avoid this error:

SSL: CERTIFICATE_VERIFY_FAILED
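
Editor's note: no such option is confirmed to exist in airflow-dbt-python in this thread. As an illustration of what a verify flag usually controls, here is a sketch at the level of Python's standard `ssl` module; the function `make_ssl_context` and its parameters are hypothetical. Trusting a private CA bundle via `ca_file` is generally preferable to turning verification off.

```python
import ssl
from typing import Optional


def make_ssl_context(verify: bool = True, ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build an SSL context, optionally trusting a custom CA bundle.

    Hypothetical helper illustrating a "verify SSL" switch.
    """
    # With ca_file=None the system's default trusted CAs are loaded.
    context = ssl.create_default_context(cafile=ca_file)
    if not verify:
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


strict = make_ssl_context()
relaxed = make_ssl_context(verify=False)
print(strict.verify_mode, relaxed.verify_mode)
```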

@KarthikRajashekaran
Author

Does v1.0.5 support both cloning a remote GitLab repo and Airflow connections?

4 participants