Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stubs for google.cloud.ndb the Google Cloud Datastore ndb client library #5821

Merged
merged 34 commits into from
Oct 21, 2021

Conversation

romanofski
Copy link
Contributor

@romanofski romanofski commented Jul 30, 2021

This adds stubs for https://github.com/googleapis/python-ndb, the Google Cloud Datastore ndb client library.

The stubs have been initially generated with mypy-stubgen. I have removed private modules and removed any unused imports. Most of the modules and functions are not typed yet, because these packages are quite big. I'm filing this as a draft in order to determine whether this would be an acceptable contribution. I would like to refine the types in subsequent pull requests.

PS: I've checked with upstream which place would be better for these annotations and currently typeshed it is (see googleapis/python-ndb#9).

@romanofski romanofski marked this pull request as draft July 30, 2021 10:55
@srittau
Copy link
Collaborator

srittau commented Jul 30, 2021

typeshed is the right place for this. A few things I noticed while briefly looking at it (not a full review):

  • The stubs directory should be named google-cloud-ndb, not python-ndb as the former is the name of the package on pypi.
  • Version number in METADATA.toml have only two components, as the last component usually has no significance for the API.
  • types-futures is a Python-2-only package, while this is presumably a Python-3-only package. In my understanding the dependency should just be removed and concurrent.futures be used as an import, but I am not certain about the right way to do this.

@srittau
Copy link
Collaborator

srittau commented Jul 30, 2021

One more thing: Untyped arguments and return values should just be left out, not annotated with Any as a clear sign that they are not annotated yet.

@romanofski
Copy link
Contributor Author

One more thing: Untyped arguments and return values should just be left out, not annotated with Any as a clear sign that they are not annotated yet.

Thank you so much for all the points raised! Will address them step by step.

The `types-future` package is only for Python2, but this package
supports only Python3. Also the versioning is only major minor, no patch.
@romanofski
Copy link
Contributor Author

romanofski commented Jul 31, 2021

One question:

typeshed is the right place for this. A few things I noticed while briefly looking at it (not a full review):

* The stubs directory should be named `google-cloud-ndb`, not `python-ndb` as the former is the name of the package on pypi.

Using pyright and pointing to my local typeshed checkout, with the google-cloud-ndb directory, pyright can not find stub files in the google.cloud.ndb namespace, keeping it python-ndb it can. I'm trying to find how modules are resolved in order to find the cause for this, but --verbose and the documentation is sparse. Any idea where I should look or what is causing this?

@srittau
Copy link
Collaborator

srittau commented Jul 31, 2021

Maybe @erictraut can help with your question. I am not sure how pyright handles third-party stubs at the moment.

@erictraut
Copy link
Contributor

When you say "pointing to my local typeshed checkout", how are you configuring that in pyright? Are you specifying the "typeshedPath" in your pyrightconfig.json file? That's the best way to replace the bundled typeshed stubs with a private copy.

There's another thing that might be happening here. Some versions of google's protobuf library shipped with a py.typed file in the google directory, and the py.typed was not marked partial. According to PEP 561, this overrides typeshed and all other submodules that start with google.. Look in your site-packages directory to see if you have a google subdirectory with a py.typed file in it. If so, make sure that py.typed file contains the line partial.

This is almost done. Most type annotations were extracted from the
ndb.model docstrings.
@romanofski
Copy link
Contributor Author

@erictraut thank you very much for your pointers. I can't see the import errors by pyright any more, yet I can't remember what I have changed. I basically picked the pyrightconfig.json from the typeshed package and adjusted the "typeshedPath", while keeping the rest of the settings. So not seeing import errors, I presume it's working as expected.

Re the google packages. I did check my virtual environment and the namespace packages which had py.typed seemed to be in the right place:

./google/cloud/secretmanager_v1/py.typed
./google/cloud/appengine_logging_v1/py.typed
./google/cloud/secretmanager_v1beta1/py.typed
./google/cloud/secretmanager/py.typed
./google/cloud/logging_v2/py.typed
./google/cloud/logging/py.typed
./google/cloud/appengine_logging/py.typed

The protobuf package did not contain a py.typed file.

Again, thank you very much for the pointer. If I run into trouble again, that's where I'll start investigating.

@romanofski
Copy link
Contributor Author

One more thing: Untyped arguments and return values should just be left out, not annotated with Any as a clear sign that they are not annotated yet.

@srittau could you clarify what that means for a big package like this? Would the better way be to remove any modules not having type hints instead of removing the Any from them piece by piece. Then PRs add the most important modules first and go from there?

@srittau
Copy link
Collaborator

srittau commented Aug 3, 2021

The latest version of stubgen from mypy 0.910 doesn't generate Any annotations if the type can't be determined. Maybe it's worth it to regenerate some stubs?

@romanofski
Copy link
Contributor Author

romanofski commented Aug 5, 2021

I have to ask, because I'm out of ideas here and I'm not really sure where I could start looking.
To test the types, I've written a simple program like so:

from google.cloud.ndb  import Model, StringProperty

class Foo(Model):

    name = StringProperty()

foo = Foo(name='frob')
reveal_type(foo.name)
reveal_type(foo.key)

Running mypy against it with:

MYPYPATH=./typeshed/stubs/google-cloud-ndb/ mypy --namespace-packages --custom-typeshed-dir=./typeshed foo.py

i get both types reported as Any.

Running pyright configured with:

{
    "typeshedPath": "/home/rjoost/typeshed/",
    "include": [
        "stdlib",
        "stubs"
    ],
    [...]

I get:

Found 1 source file
/tmp/foo.py
  /tmp/foo.py:8:13 - info: Type of "foo.name" is "Unknown"
  /tmp/foo.py:9:13 - info: Type of "foo.key" is "Unknown"
0 errors, 0 warnings, 2 infos 

To an extend, I can understand why name is unknown, when I look at model.pyi:

class MetaModel(type):
    def __init__(cls, name: str, bases, classdict) -> None: ...

class Model(_NotEqualMixin, metaclass=MetaModel):
    key: ModelKey = ...
    def __init__(_self, **kwargs) -> None: ...

If I'm not mistaken, the type of the name only becomes clear at runtime after instantiating Foo. However, shouldn't key at least be typed as ModelKey? Is there any chance I could type foo.name with additional type information in my sample program?

@srittau or @erictraut if you do have any pointers where I could read up on figuring out this scenario I'd be very grateful. I browsed the mypy documentation and https://github.com/microsoft/pyright/blob/main/docs/typed-libraries.md, but nothing sticks out.

@srittau
Copy link
Collaborator

srittau commented Aug 5, 2021

The descriptor protocol (__get__, __set__) in Property is throwing mypy (and probably pyright) off. If you annotate the return type of Property.__get__ correctly, the fields should be annotated correctly.

@romanofski
Copy link
Contributor Author

The descriptor protocol (__get__, __set__) in Property is throwing mypy (and probably pyright) off. If you annotate the return type of Property.__get__ correctly, the fields should be annotated correctly.

Wow yes! That is totally it. Hm pyright ist still revealing "Unknown", but perhaps the problem is somewhere among those lines. I'll play around with it. Thank you so much!!!

@erictraut
Copy link
Contributor

@romanofski, thanks for working on this PR. I checked out your branch to diagnose the problem you're encountering with pyright. It's due to a current limitation in pyright, which assumes that import module names will be unique across packages in typeshed. This PR introduces the first duplicate to typeshed, so it breaks that assumption. The google top-level module is already located within the protobuf package, and you're introducing a second copy in the google-cloud-ndb package. I've update the logic in pyright to eliminate this limitation. When I publish version 1.1.160 of pyright (tentatively tomorrow, Fri Aug 6), it will properly deal with this situation. I don't know if other type checkers have the same limitation.

Let me echo what @srittau said above. Please do not check in a set of stubs that contain a bunch of Any annotations. It's better to omit annotations rather than annotating types as Any. Of course, it's even better if you fill in the proper types, but that can be done over time by other contributors.

Did you reach out to the team that maintains the google-cloud-ndb package to see if they would be interested in accommodating inlined type annotations or packaged stubs? That approach is preferable to maintaining separate type stubs in typeshed. For additional guidance, you may find this documentation useful.

@romanofski
Copy link
Contributor Author

@romanofski, thanks for working on this PR. I checked out your branch to diagnose the problem you're encountering with pyright. It's due to a current limitation in pyright, which assumes that import module names will be unique across packages in typeshed.

Just out of curiosity: does the namespacing - I think almost all Google packages I've encountered are namespace packages - make it harder to traverse the imports?

This PR introduces the first duplicate to typeshed, so it breaks that assumption. The google top-level module is already located within the protobuf package, and you're introducing a second copy in the google-cloud-ndb package. I've update the logic in pyright to eliminate this limitation. When I publish version 1.1.160 of pyright (tentatively tomorrow, Fri Aug 6), it will properly deal with this situation. I don't know if other type checkers have the same limitation.

Thank you so much for doing this. Since you had it mentioned earlier, I did remove the protobuf directory from the typesheed tree just to see if it would make a difference. Unfortunately it not.

Let me echo what @srittau said above. Please do not check in a set of stubs that contain a bunch of Any annotations. It's better to omit annotations rather than annotating types as Any. Of course, it's even better if you fill in the proper types, but that can be done over time by other contributors.

Yes I fully agree.

Did you reach out to the team that maintains the google-cloud-ndb package to see if they would be interested in accommodating inlined type annotations or packaged stubs? That approach is preferable to maintaining separate type stubs in typeshed. For additional guidance, you may find this documentation useful.

Thanks again for the pointer. I'll have a more thorough read. Indeed I did check upstream first. The gist is, that the Google engineers would rather provide typing across all Python libraries (see googleapis/python-ndb#9) and prefer this going into typeshed for now.

@erictraut
Copy link
Contributor

Pyright 1.1.160 is now released, and with this PR, typeshed's tests are now using the new version.

Does the namespacing make it harder to traverse the imports?

It adds a little bit of additional complexity to the import resolution mechanism (especially if that mechanism maintains a cache for performance reasons), but it's not a big problem.

This is probably not correct. Will have to test.
This is currently going after what the documentation refers to their
typical return values is.
@romanofski
Copy link
Contributor Author

Alright.. all tests pass. Most of my work went into the model.py but I'd expand on that in future PRs.

Questions:

  • I still have modules which use Any since they were generated by mypy. Since Any is discouraged, will I have to fix those up for each module as well? I read that for some object is appropriate. I wonder whether for others it's ok to simply remove the type hint?
  • I read in the documentation, that partial type stubs can be flagged with .e.g. __getattr__ functions. I presume partial here means incomplete type stubs, where as this one would not be partial since the full package was generated by mypy?
  • Should I squash all changes before I move this draft into review?

Many thanks!!

@JelleZijlstra
Copy link
Member

  • Don't worry about leaving a few Anys. We prefer to avoid them, but we'd rather have imperfect stubs than no stubs.
  • Partial means not all public objects in the library are included in the stub. For a stubgen-generated set of stubs, it probably shouldn't be used.
  • Don't worry about squashing commits; we'll squash when we merge the PR. It's actually better not to squash or force-push because it makes reviewing easier.

@romanofski romanofski marked this pull request as ready for review September 9, 2021 01:42
@@ -0,0 +1,5 @@
from typing_extensions import Literal

EVENTUAL: Literal["EVENTUAL"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I installed it from pip and this value is 2 for me, not "EVENTUAL".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much! Will update as soon as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again thanks for spotting this! Should be addressed.

@@ -0,0 +1,12 @@
google.cloud.ndb.EVENTUAL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add comments explaining why these need to be allowlisted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. I must have generated this allowlist mistakenly in the earlier stages and never realised that it also hid some of the errors (e.g. google.cloud.ndb.EVENTUAL). Now I only kept the three inconsistencies with the __new__ signatures which use self in the library, but I kept it as cls in the stub. Otherwise I run into errors from other type checkers in the tests.

This also addresses more inconsistencies with mistakenly added symbols
to the allowlist
Copy link
Collaborator

@srittau srittau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor change then this is good to go. Sorry for the long delay.

@@ -0,0 +1,2 @@
version = "1.9"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The syntax has changed:

Suggested change
version = "1.9"
version = "1.9.*"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. Thanks for pointing it out.

@romanofski
Copy link
Contributor Author

I hope I did not make it hard than it already is when I merged in origin/master in order to run the check_new_syntax.py myself to fix up using the built-in generics.

@srittau srittau merged commit 3a70f34 into python:master Oct 21, 2021
@romanofski romanofski deleted the feat/google-cloud-ndb branch October 21, 2021 08:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants