|
| 1 | +# `ndb` Migration Notes |
| 2 | + |
| 3 | +This is a collection of assumptions, API / implementation differences |
| 4 | +and comments about the `ndb` rewrite process. |
| 5 | + |
| 6 | +The primary differences come from: |
| 7 | + |
| 8 | +- Absence of "legacy" APIs provided by Google App Engine (e.g. |
| 9 | + `google.appengine.api.datastore_types`) as well as other environment |
| 10 | + specific features (e.g. the `APPLICATION_ID` environment variable) |
| 11 | +- Presence of new features in Python 3 like keyword only arguments and |
| 12 | + async support |
| 13 | + |
| 14 | +## Assumptions |
| 15 | + |
| 16 | +- In production, the `APPLICATION_ID` environment variable will be set to |
| 17 | + a useful value (since there is no `dev_appserver.py` for |
| 18 | + `runtime: python37`). This is used as a fallback for the `ndb.Key()` |
| 19 | + constructor much like `google.cloud.datastore.Client()` determines a default |
| 20 | + project via one of |
| 21 | + |
| 22 | + - `DATASTORE_DATASET` environment variable (for `gcd` / emulator testing) |
| 23 | + - `GOOGLE_CLOUD_PROJECT` environment variable |
| 24 | + - Google App Engine application ID (this is legacy / standard GAE) |
| 25 | + - Google Compute Engine project ID (from metadata server) |
| 26 | + |
| 27 | + The correct fallback is likely different than this and should probably cache |
| 28 | + the output of `google.cloud.datastore.client._determine_default_project()` |
| 29 | + on the `ndb.Key` class or `ndb.key` module (it should cache at import time) |
| 30 | + |
| 31 | +## Differences (between old and new implementations) |
| 32 | + |
| 33 | +- The "standard" exceptions from App Engine are no longer available. Instead, |
| 34 | + we'll create "shims" for them in `google.cloud.ndb._exceptions` to match the |
| 35 | + class names and emulate behavior. |
| 36 | +- There is no replacement for `google.appengine.api.namespace_manager` which is |
| 37 | + used to determine the default namespace when not passed in to `Key()` |
| 38 | +- The `Key()` constructor (and helpers) make a distinction between `unicode` |
| 39 | + and `str` types (in Python 2). These are now `unicode->str` and `str->bytes`. |
| 40 | + However, `google.cloud.datastore.Key()` (the actual type we use under the |
| 41 | + covers), only allows the `str` type in Python 3, so much of the "type-check |
| 42 | + and branch" from the original implementation is gone. This **may** cause |
| 43 | + some slight differences. |
| 44 | +- `Key.from_old_key()` and `Key.to_old_key()` always raise |
| 45 | + `NotImplementedError`. Without the actual types from the legacy runtime, |
| 46 | + these methods are impossible to implement. Also, since this code won't |
| 47 | + run on legacy Google App Engine, these methods aren't needed. |
| 48 | +- `Key.app()` may not preserve the prefix from the constructor (this is noted |
| 49 | + in the docstring) |
| 50 | +- `Key.__eq__` previously claimed to be "performance-conscious" and directly |
| 51 | + used `self.__app == other.__app` and similar comparisons. We don't store the |
| 52 | + same data on our `Key` (we just make a wrapper around |
| 53 | + `google.cloud.datastore.Key`), so these are replaced by functions calls |
| 54 | + `self.app() == self.app()` which incur some overhead. |
| 55 | +- The verification of kind / string ID fails when they exceed 1500 bytes. The |
| 56 | + original implementation didn't allow in excess of 500 bytes, but it seems |
| 57 | + the limit has been raised by the backend. (FWIW, Danny's opinion is that |
| 58 | + the backend should enforce these limits, not the library.) |
| 59 | +- I renamed `Property.__creation_counter_global` to |
| 60 | + `Property._CREATION_COUNTER`. |
| 61 | + |
| 62 | +## Comments |
| 63 | + |
| 64 | +- There is rampant use (and abuse) of `__new__` rather than `__init__` as |
| 65 | + a constructor as the original implementation. By using `__new__`, sometimes |
| 66 | + a **different** type is used from the constructor. It seems that feature, |
| 67 | + along with the fact that `pickle` only calls `__new__` (and never `__init__`) |
| 68 | + is why `__init__` is almost never used. |
| 69 | +- The `Key.__getnewargs__()` method isn't needed. For pickle protocols 0 and 1, |
| 70 | + `__new__` is not invoked on a class during unpickling; the state "unpacking" |
| 71 | + is handled solely via `__setstate__`. However, for pickle protocols 2, 3 |
| 72 | + and 4, during unpickling an instance will first be created via |
| 73 | + `Key.__new__()` and then `__setstate__` would be called on that instance. |
| 74 | + The addition of the `__getnewargs__` allows the (positional) arguments to be |
| 75 | + stored in the pickled bytes. The original `ndb` implementation did **all** of |
| 76 | + the work of the constructor in `__new__`, so the call to `__setstate__` was |
| 77 | + redundant. In our implementation `__setstate__` is succifient and `__new__` |
| 78 | + isn't implemented, hence `__getnewargs__` isn't needed. |
| 79 | +- Since we no longer use `__new__` as the constructor / utilize the |
| 80 | + `__getnewargs__` value, the extra support for |
| 81 | + `Key({"flat": ("a", "b"), ...})` as an alternative to |
| 82 | + `Key(flat=("a", "b"), ...)` can be retired |
| 83 | +- Key parts (i.e. kind, string ID and / or integer ID) are verified when a |
| 84 | + `Reference` is created. However, this won't occur when the corresponding |
| 85 | + protobuf for the underlying `google.cloud.datastore.Key` is created. This |
| 86 | + is because the `Reference` is a legacy protobuf message type from App |
| 87 | + Engine, while the latest (`google/datastore/v1`) RPC definition uses a `Key`. |
| 88 | +- There is a `Property._CREATION_COUNTER` that gets incremented every time |
| 89 | + a new `Property()` instance is created. This increment is not threadsafe. |
| 90 | + However, `ndb` was designed for `Property()` instances to be created at |
| 91 | + import time, so this may not be an issue. |
0 commit comments