Skip to content
This repository has been archived by the owner on Mar 1, 2024. It is now read-only.

Fix BaseGithubClient and _generate_documents #526

Merged
merged 2 commits into from
Sep 21, 2023

Conversation

mkly
Copy link
Contributor

@mkly mkly commented Sep 20, 2023

  • Add branch parameter to get_branch definition due to GithubClient implementation get_branch definition making use of branch_name as opposed to branch. This issue is most prevelent when type checker runs and they do not match
  • Throw exception if neither branch or branch_name is not defined. Potentionally could also add deprecation messaging.
  • Fix definition of _generate_documents id parameter. Default to empty string "" as os.path.join will omit an empty string from the path list, but will fail on None
  • Fix test__generate_documents url value likely added after test addition
  • Fix test_github_client test to loader_hub as llama_hub did not exist at commit sha selected in test
  • Add pytest-asyncio dev dependency as this is required for async tests in test_github_client that make use of the @pytest.mark.asyncio decorator

Note: Tests in tests_github_repo are currently all skipped unless the skip declaration is removed or commented out from tests_github_repo/__init__.py

@mkly
Copy link
Contributor Author

mkly commented Sep 20, 2023

Noticed a few quick issue with type checking and tests and figured I'd send this over. Let me know if you have any other ideas on how to handle it or if you have any questions about the original issues. Thanks for maintaining these. Very much appreciated. 🍹

Comment on lines +346 to +350
if branch is None:
if branch_name is None:
raise ValueError("Either branch or branch_name must be provided.")
branch = branch_name

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is a nice one

but why can't you use branch_name directly? instead of adding a new parameter

await self.request(
    "getBranch", "GET", owner=owner, repo=repo, branch=branch_name
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question, and I likely should have gone into some detail about why that was done.

In the BaseGithubClient class the get_branch method parameter name is branch_name
https://github.com/emptycrown/llama-hub/blob/cdb6b30ce8fbb765f9b8ee5d47a5d23f51b2b9ba/llama_hub/github_repo/github_client.py#L196

In the GithubClient class the get_branch method parameter name is branch (note it does not have the _name)
https://github.com/emptycrown/llama-hub/blob/cdb6b30ce8fbb765f9b8ee5d47a5d23f51b2b9ba/llama_hub/github_repo/github_client.py#L325

GithubRepositoryReader uses BaseGithubClient as the type for github_client to enforce the interface that a github client would need to conform to.
https://github.com/emptycrown/llama-hub/blob/cdb6b30ce8fbb765f9b8ee5d47a5d23f51b2b9ba/llama_hub/github_repo/base.py#L71

Here is how the GithubClient class is used with GithubRepositoryReader:
https://github.com/emptycrown/llama-hub/blob/cdb6b30ce8fbb765f9b8ee5d47a5d23f51b2b9ba/tests/tests_github_repo/test_github_reader.py#L876

github_client would typically be GithubClient which should conform to BaseGithubClient but currently does not due to the parameter names. If we were just to update either BaseGithubClient or GithubClient to align with the other by changing branch to branch_name, we might cause an issue with code in the wild that is using the parameter name branch or branch_name as opposed to passing the argument in via position. That is the reason for adding both as Optional[str] = None and then throwing a ValueError if neither are set.

I understand it a bit of an odd situation and hope that helps to add some additional context for the change. There also may be a preferred way of dealing with this situation based your preferences as the maintainer and I'd be more than happy to adjust for that.

Thank you again for taking a look.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood now, thanks for the detailed answer!

@EmanuelCampos
Copy link
Collaborator

@mkly sounds good, can you update with main please?

* Add `branch` parameter to `get_branch` definition due to
  `GithubClient` implementation `get_branch` definition making use of
  `branch_name` as opposed to `branch`. This issue is most prevelent
  when type checker runs and they do not match
* Throw exception if neither `branch` or `branch_name` is not defined.
  Potentionally could also add deprecation messaging.
* Fix definition of `_generate_documents` id parameter. Default to empty
  string `""` as `os.path.join` will omit an empty string from the path
  list, but will fail on `None`
* Fix `test__generate_documents` `url` value likely added after test
  addition
* Fix `test_github_client` test to `loader_hub` as `llama_hub` did not
  exist at commit sha selected in test
* Add `pytest-asyncio` dev dependency as this is required for async
  tests in `test_github_client` that make use of the
  `@pytest.mark.asyncio` decorator

_Note_: Tests in `tests_github_repo` are currently all skipped unless
the skip declaration is removed or commented out from
`tests_github_repo/__init__.py`
@mkly mkly force-pushed the fix_base_github_client branch from 9de7c02 to b015af8 Compare September 21, 2023 23:09
@EmanuelCampos
Copy link
Collaborator

EmanuelCampos commented Sep 21, 2023

Nice one, Thank you!

@EmanuelCampos EmanuelCampos merged commit 03ebe5b into run-llama:main Sep 21, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants