Avoid infinite loop in module_name_from_path #89
Hmm, this fails Thanks for the feedback and it's great that GDAL is trying out docstub. Very happy to listen to good and bad experiences. :)
src/docstub/_utils.py
    if is_in_package:
        name_parts.insert(0, directory.name)
        directory = directory.parent
        break
Given the failing test, this seems like it may not be the correct fix.
Could you perhaps provide more context on how the original bug occurred? Maybe we can construct a minimal reproducing example for it?
At least in the context of the test case, it seems to deal fine with the presence of a __init__.py.
In OSGeo/gdal#13198, in what directory did you run docstub? It's meant to be passed the path to a package, including the root directory of that package.
Ah - that explains it. I'm running docstub on Python files generated from SWIG from within the same folder, along with an __init__.py file. I've updated the PR to add a check to avoid hanging if run in the same folder, but it works fine pointing to the package folder too with: docstub run ./osgeo --no-cache --config C:\docs\gdal\swig\python\pyproject.toml.
For me it works if an absolute path is provided to docstub, but if I run it from the module directory itself (with docstub run .), then we get an infinite loop because the parent of . is ., apparently.
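The behavior described here can be reproduced with `pathlib` alone. The sketch below is illustrative and not docstub's code; the helper name `walk_up` is made up for this example. It shows that `.parent` of a relative path bottoms out at `.` and stays there, so an upward walk needs an explicit stop condition.

```python
from pathlib import Path

# Path(".") is its own parent, so a loop that keeps taking .parent never ends.
assert Path(".").parent == Path(".")


def walk_up(path, limit=50):
    """Collect `path` and its ancestors, stopping at a fixed point.

    Hypothetical helper for illustration only. `limit` is an extra
    safety bound in case the fixed-point check is ever bypassed.
    """
    seen = []
    for _ in range(limit):
        seen.append(path)
        if path.parent == path:  # reached "." or a filesystem root
            break
        path = path.parent
    return seen


# A relative path ends at "." instead of reaching the real parent directories.
print(walk_up(Path("a/b")))
```

The same fixed point exists for absolute paths at the filesystem root, which is why the check compares `path.parent` with `path` rather than testing for `.` specifically.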
That fits. I tried to make docstub preserve relative paths because that makes output and (error) messages a bit more readable. But absolute paths are the more robust option.
I definitely think docstub should handle this case more gracefully: docstub run . where . is inside a package.
Is there some particular behavior you would expect in that case? I'm thinking about that myself right now.
You could convert the relative path to an absolute path with path = path.resolve() near the top of the function?
Oh, I'm more asking if it makes sense that docstub supports running inside or only on part of a Python package at all. Which types are matched depends on what types are collected throughout the package.
Right now I'm thinking, support this use case but warn that running on partial packages may lead to incomplete results.
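One way such a warning could look is sketched below. This is an assumption about the approach, not the check that docstub actually ships; the function name `warn_if_partial_package` and the message text are invented for illustration. The idea is that a target whose enclosing directory contains an `__init__.py` is part of a larger package.

```python
import warnings
from pathlib import Path


def warn_if_partial_package(root):
    """Warn if `root` looks like only a part of a Python package.

    Hypothetical helper: for a directory, the enclosing directory is its
    parent; for a file, it is the directory containing the file. If that
    directory has an `__init__.py`, `root` lives inside a package.
    """
    root = Path(root).resolve()
    if (root.parent / "__init__.py").is_file():
        warnings.warn(
            f"{root} appears to be part of a larger package; "
            "types defined elsewhere in that package will not be collected",
            stacklevel=2,
        )
        return True
    return False
```

Resolving the path first means the check also works for `.`, whose unresolved parent would again be `.`.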
For info, I get the same output if I run within the folder (using . which works with the latest commit), or outside with a full or relative path (./osgeo). All .py files are in the same folder though, so maybe this is more relevant to projects with .py files in different subfolders?
Both approaches require some types to be added to the pyproject.toml (I'm not sure why as the same types in some docstrings don't throw errors): https://github.com/OSGeo/gdal/blob/b8d0f72f306e7fc0b5a511d96221277797301be5/swig/python/pyproject.toml#L47
I don't really understand how the osgeo package is structured or populated. I assume something like the osgeo.osr module is created during build time? So I'm not sure if docstub is missing something or this is simply a setup that docstub can't really support.
We didn't realise the GDAL Python docstrings had quite so many issues until running
`Path(".").parent` can't move past the "." in a relative path and will
just return `Path(".")` again. This led to `while True` never breaking.
To fix this, we make sure that absolute paths are used. We need to
resolve the `path` before `lru_cache` sees it. Otherwise, `lru_cache`
might return a wrong cached result in case the current working directory
changes. That should never happen in docstub, but I think it's still
good to be defensive here.
This change also adds a few other defensive guards and asserts.
Long term, it might be the least error-prone to resolve all paths to
absolute ones as soon as possible. However, we'd have to do some
additional work to shorten paths that are within the current working
directory. Otherwise, users might see unnecessarily long paths in
their output.
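The ordering described in this message, resolving before `lru_cache` sees the path, can be sketched as a thin wrapper around a cached inner function. This is a minimal illustration under assumptions, not the merged implementation: the inner name `_module_name_from_resolved` is hypothetical, and the loop is a simplified version of the package walk.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=100)
def _module_name_from_resolved(path):
    """Cached worker; expects an absolute, resolved path (hypothetical name)."""
    name_parts = []
    if path.name != "__init__.py":
        name_parts.insert(0, path.stem)

    directory = path.parent
    # Walk up while we are inside a package. The second condition guards
    # against looping forever at the filesystem root.
    while (directory / "__init__.py").is_file() and directory.parent != directory:
        name_parts.insert(0, directory.name)
        directory = directory.parent

    return ".".join(name_parts)


def module_name_from_path(path):
    """Resolve first, so the cache key never depends on the working directory."""
    return _module_name_from_resolved(Path(path).resolve())
```

Because the cache key is the resolved path, two calls with `Path("mod.py")` from different working directories map to different cache entries instead of sharing one stale result.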
I finally got back to working on this. I took the liberty to implement and push a hopefully more robust fix that also accounts for this function using lru_cache which might interfere with relative paths.
I also added a warning in case docstub is invoked on a subpackage only. That should hopefully clear up some confusion. In my book we can merge this if the CI passes. Let me know if this addresses the bug on your side.
Off topic: If I get time I might look into using only absolute paths in docstub. That should be the most robust option. But it requires additional work to shorten paths again before showing them to users.
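The shortening step mentioned here could be done with `Path.relative_to`, falling back to the absolute path when the target lies outside the working directory. The function below is a sketch of that idea only; `display_path` is a made-up name and nothing like it is confirmed to exist in docstub.

```python
from pathlib import Path


def display_path(path, cwd=None):
    """Return a path suitable for messages (hypothetical helper).

    Paths inside `cwd` are shown relative to it; anything else stays
    absolute. `cwd` defaults to the current working directory.
    """
    path = Path(path).resolve()
    cwd = Path.cwd() if cwd is None else Path(cwd).resolve()
    try:
        return path.relative_to(cwd)
    except ValueError:
        # `path` is not located under `cwd`; keep the absolute form.
        return path
```

Internally everything stays absolute, and only the final formatting step decides how much of the path the user sees.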
@lagru - I tested this locally, and it no longer hangs when run in a subdirectory. Thanks for looking into this. I've created a PR to add docstub to the GDAL CI (OSGeo/gdal#13270) to check all annotations are valid. Next steps will be to look into refining the stub files.
Good to know and thanks for testing! Merging now. I plan to ship it in a release soonish.
Running on single files on their own, or on multiple files, was fine, but as soon as an `__init__.py` file (even an empty one) was present, the `docstub run . --no-cache` command hangs.
I thought it may be crashing out due to the size of the library, but after debugging, it appears there is a `break` missing in the `module_name_from_path` function.
We are looking at generating stub files for the GDAL project - thanks for creating this tool.