Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Resolve several issues with the require dependencies decorator #315

Merged
merged 4 commits into from
Feb 28, 2023

Conversation

tomaarsen
Copy link
Contributor

Hello!

Pull Request overview

  • Fix several issues re. the requires_dependencies decorator:
    1. There was a missing space between the sentences.
    2. Crucial brackets were missing in making the error message.
    3. "pygithub" was used where "github" should have been used.

Details

Issue 1

The output of the requires_dependencies error was e.g.:

ImportError: Following dependencies are missing: pygithub.Please install them using `pip install unstructured[github]`.

I.e., no space between the sentences.

Issue 2

If extras was not provided, then the first part of the error message was not included, e.g.

ImportError: Please install them using `pip install uninstalled_test_module`.

This was due to missing brackets.

Issue 3

The GitHubConnector was decorated with

@requires_dependencies(["pygithub"], extras="github")

Which attempts to import pygithub, which does not exist. Instead, github must be imported.


This all was causing test failures on some CI runs. For example here and here.
It is causing failures for my other PRs, e.g. #313 and #312.
Interestingly, the CI didn't fail every single time - I'm not quite sure why.

cc: @alvarobartt I believe these issues may negatively impact your PRs too.

  • Tom Aarsen

Otherwise the 'Follow dependencies are missing ...' section is excluded if extras is not Truthy
to "github". This was causing CI test failures in unrelated PRs.
Copy link
Contributor

@cragwolfe cragwolfe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@cragwolfe
Copy link
Contributor

Interestingly, the CI didn't fail every single time - I'm not quite sure why.

Best guess is that github virtualenv caching is not behaving as expected :/

@alvarobartt
Copy link
Contributor

Hello!

Pull Request overview

  • Fix several issues re. the requires_dependencies decorator:

    1. There was a missing space between the sentences.

    2. Crucial brackets were missing in making the error message.

    3. "pygithub" was used where "github" should have been used.

Details

Issue 1

The output of the requires_dependencies error was e.g.:


ImportError: Following dependencies are missing: pygithub.Please install them using `pip install unstructured[github]`.

I.e., no space between the sentences.

Issue 2

If extras was not provided, then the first part of the error message was not included, e.g.


ImportError: Please install them using `pip install uninstalled_test_module`.

This was due to missing brackets.

Issue 3

The GitHubConnector was decorated with


@requires_dependencies(["pygithub"], extras="github")

Which attempts to import pygithub, which does not exist. Instead, github must be imported.


This all was causing test failures on some CI runs. For example here and here.

It is causing failures for my other PRs, e.g. #313 and #312.

Interestingly, the CI didn't fail every single time - I'm not quite sure why.

cc: @alvarobartt I believe these issues may negatively impact your PRs too.

  • Tom Aarsen

Thanks for reporting this @tomaarsen! I realized about the missing space but not about everything else... Also regarding pygithub I was not sure how to put it there because in some cases, like that one, the installable is named one way but the import uses another name, but as long as we point the user to install the extras instead I think we're OK with it 😀 Thanks again!

@cragwolfe cragwolfe enabled auto-merge (squash) February 28, 2023 20:12
@tomaarsen
Copy link
Contributor Author

tomaarsen commented Feb 28, 2023

It definitely could be a cache thing. I've seen e.g. the Python 3.8 job pass while the 3.9 job fails, despite that they should both install the same dependencies.

Also regarding pygithub I was not sure how to put it there because in some cases, like that one, the installable is named one way but the import uses another name, but as long as we point the user to install the extras instead I think we're OK with it 😀 Thanks again!

This is always really tricky. I've encountered this same problem very recently with requiring users to install sklearn/scikit-learn before using some functionality. I think we can get away with the behaviour from this PR currently.

@cragwolfe cragwolfe merged commit 1ccbc05 into Unstructured-IO:main Feb 28, 2023
@tomaarsen
Copy link
Contributor Author

Note that issue 3 in particular was a breaking issue: I suspect that the GitHub ingest data connector is not usable for v0.5.0. It's a shame that the CI didn't pick it up earlier.

@tomaarsen tomaarsen deleted the fix/require_deps_decorator branch February 28, 2023 20:23
@cragwolfe
Copy link
Contributor

Note that issue 3 in particular was a breaking issue: I suspect that the GitHub ingest data connector is not usable for v0.5.0. It's a shame that the CI didn't pick it up earlier.

Just verified it is indeed broken for 0.5.0 but works locally installing from main:

pip install -e .[github]

and verified installed with:

pip list | grep unstructured
unstructured            0.5.1.dev1         /Users/cragwolfe/r/unstructured

definitely worth a patch release!

@cragwolfe
Copy link
Contributor

And fixing the fixture outputs:

There are differences from the previously checked-in structured outputs.

If these differences are acceptable, copy the outputs from
github-downloadify-output/ to test_unstructured_ingest/expected-structured-output/github-downloadify/ after running

  PYTHONPATH=. ./unstructured/ingest/main.py --github-url dcneiner/Downloadify --github-file-glob '*.html,*.txt' --structured-output-dir github-downloadify-output --verbose

https://github.com/Unstructured-IO/unstructured/actions/runs/4297066379/jobs/7489614747

I'll can pick this up in ~1-2 hours if nobody else does.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants