[Bug] dulwich throws an error when it gets a branch name #150

Closed
@natebessa

Firstly, thank you to the team here for providing this package -- it's really helpful!

I noticed in the recent v3.0.0 release (big shoutout to @millin) that support for git branches was added. I tried to implement it and am seeing an error thrown from dulwich.

I'm passing to DbtRunOperator a project_dir that looks like https://<git-user>:<git-token>@github.com/<org>/<repo>.git@<branch>.
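
To make the URL shape concrete, here is a minimal sketch of how a `project_dir` like this could be split into a repository URL and a branch name. The helper `split_repo_and_branch` is hypothetical and is not taken from airflow-dbt-python; it only assumes that the branch follows a literal `.git@` suffix.

```python
from typing import Optional, Tuple


def split_repo_and_branch(project_dir: str) -> Tuple[str, Optional[str]]:
    """Split '<repo>.git@<branch>' into (repo URL, branch).

    Hypothetical helper for illustration only. Splitting on '.git@'
    instead of the last '@' keeps the credentials part
    ('user:token@host') and branch names containing '/' intact.
    """
    base, sep, branch = project_dir.partition(".git@")
    if not sep or not branch:
        # No branch suffix present: return the URL unchanged.
        return project_dir, None
    return base + ".git", branch


repo_url, branch = split_repo_and_branch(
    "https://user:token@github.com/org/repo.git@my-branch"
)
print(repo_url)
print(branch)
```

Note that a branch obtained this way is a `str`, which matters for the error below.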

Here's the error raised:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow_dbt_python/hooks/dbt.py", line 287, in dbt_directory
    project_dir, profiles_dir = self.prepare_directory(
                                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow_dbt_python/hooks/dbt.py", line 332, in prepare_directory
    project_dir_path = self.download_dbt_project(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow_dbt_python/hooks/dbt.py", line 144, in download_dbt_project
    return fs_hook.download_dbt_project(project_dir, destination)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow_dbt_python/hooks/fs/__init__.py", line 75, in download_dbt_project
    self._download(source_url, destination_url)
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow_dbt_python/hooks/fs/git.py", line 159, in _download
    client.clone(
  File "/home/airflow/.local/lib/python3.12/site-packages/dulwich/client.py", line 896, in clone
    head_ref = _set_default_branch(
               ^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/dulwich/refs.py", line 1205, in _set_default_branch
    origin_ref = origin_base + branch
                 ~~~~~~~~~~~~^~~~~~~~
TypeError: can't concat str to bytes

I looked into the dulwich code and noticed two key things:

  1. https://github.com/jelmer/dulwich/blob/master/dulwich/refs.py#L1195C5-L1195C24 - _set_default_branch expects both origin and branch to be of type bytes.
  2. https://github.com/jelmer/dulwich/blob/master/dulwich/client.py#L898 - when clone calls _set_default_branch it is converting origin into bytes, but it is not converting branch.
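
The two points above can be reproduced without dulwich. The sketch below uses a stand-in function, `origin_ref_for`, that mimics only the concatenation shown in the traceback; it is not dulwich's actual `_set_default_branch`, and the `refs/remotes/` prefix and the `as_bytes` helper are assumptions for illustration. It shows the failure with a `str` branch and a possible caller-side workaround of encoding the branch before it reaches `clone`.

```python
from typing import Optional, Union


def origin_ref_for(origin: bytes, branch: bytes) -> bytes:
    """Stand-in for the failing concatenation (not dulwich's real code)."""
    # Assumed prefix layout: refs/remotes/<origin>/<branch>
    origin_base = b"refs/remotes/" + origin + b"/"
    return origin_base + branch


def as_bytes(value: Optional[Union[str, bytes]], encoding: str = "utf-8") -> Optional[bytes]:
    """Hypothetical helper: encode str to bytes, pass bytes and None through."""
    if value is None or isinstance(value, bytes):
        return value
    return value.encode(encoding)


# Passing a str branch reproduces the TypeError from the traceback.
error_message = None
try:
    origin_ref_for(b"origin", "main")  # type: ignore[arg-type]
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Encoding the branch first avoids the error.
fixed = origin_ref_for(b"origin", as_bytes("main"))
print(fixed)
```

If the workaround is applied on the caller's side, the encoding would happen just before the branch is handed to `client.clone(...)`; whether that is the right place for the fix is the open question here.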

Maybe this is really a bug on the dulwich side, but I wanted to raise it here first to make sure I'm not missing something: their clone method is over 2 years old, so this isn't a new bug on their end.
