Skip to content

Commit 6a68a46

Browse files
authored
Moving ndb migration notes into dedicated document. (googleapis#6246)
1 parent cf806f2 commit 6a68a46

File tree

2 files changed

+91
-77
lines changed

2 files changed

+91
-77
lines changed

ndb/MIGRATION_NOTES.md

Lines changed: 91 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,91 @@
1+
# `ndb` Migration Notes
2+
3+
This is a collection of assumptions, API / implementation differences
4+
and comments about the `ndb` rewrite process.
5+
6+
The primary differences come from:
7+
8+
- Absence of "legacy" APIs provided by Google App Engine (e.g.
9+
`google.appengine.api.datastore_types`) as well as other environment
10+
specific features (e.g. the `APPLICATION_ID` environment variable)
11+
- Presence of new features in Python 3 like keyword only arguments and
12+
async support
13+
14+
## Assumptions
15+
16+
- In production, the `APPLICATION_ID` environment variable will be set to
17+
a useful value (since there is no `dev_appserver.py` for
18+
`runtime: python37`). This is used as a fallback for the `ndb.Key()`
19+
constructor much like `google.cloud.datastore.Client()` determines a default
20+
project via one of
21+
22+
- `DATASTORE_DATASET` environment variable (for `gcd` / emulator testing)
23+
- `GOOGLE_CLOUD_PROJECT` environment variable
24+
- Google App Engine application ID (this is legacy / standard GAE)
25+
- Google Compute Engine project ID (from metadata server)
26+
27+
The correct fallback is likely different than this and should probably cache
28+
the output of `google.cloud.datastore.client._determine_default_project()`
29+
on the `ndb.Key` class or `ndb.key` module (it should cache at import time)
30+
31+
## Differences (between old and new implementations)
32+
33+
- The "standard" exceptions from App Engine are no longer available. Instead,
34+
we'll create "shims" for them in `google.cloud.ndb._exceptions` to match the
35+
class names and emulate behavior.
36+
- There is no replacement for `google.appengine.api.namespace_manager` which is
37+
used to determine the default namespace when not passed in to `Key()`
38+
- The `Key()` constructor (and helpers) make a distinction between `unicode`
39+
and `str` types (in Python 2). These are now `unicode->str` and `str->bytes`.
40+
However, `google.cloud.datastore.Key()` (the actual type we use under the
41+
covers), only allows the `str` type in Python 3, so much of the "type-check
42+
and branch" from the original implementation is gone. This **may** cause
43+
some slight differences.
44+
- `Key.from_old_key()` and `Key.to_old_key()` always raise
45+
`NotImplementedError`. Without the actual types from the legacy runtime,
46+
these methods are impossible to implement. Also, since this code won't
47+
run on legacy Google App Engine, these methods aren't needed.
48+
- `Key.app()` may not preserve the prefix from the constructor (this is noted
49+
in the docstring)
50+
- `Key.__eq__` previously claimed to be "performance-conscious" and directly
51+
used `self.__app == other.__app` and similar comparisons. We don't store the
52+
same data on our `Key` (we just make a wrapper around
53+
`google.cloud.datastore.Key`), so these are replaced by functions calls
54+
`self.app() == self.app()` which incur some overhead.
55+
- The verification of kind / string ID fails when they exceed 1500 bytes. The
56+
original implementation didn't allow in excess of 500 bytes, but it seems
57+
the limit has been raised by the backend. (FWIW, Danny's opinion is that
58+
the backend should enforce these limits, not the library.)
59+
- I renamed `Property.__creation_counter_global` to
60+
`Property._CREATION_COUNTER`.
61+
62+
## Comments
63+
64+
- There is rampant use (and abuse) of `__new__` rather than `__init__` as
65+
a constructor as the original implementation. By using `__new__`, sometimes
66+
a **different** type is used from the constructor. It seems that feature,
67+
along with the fact that `pickle` only calls `__new__` (and never `__init__`)
68+
is why `__init__` is almost never used.
69+
- The `Key.__getnewargs__()` method isn't needed. For pickle protocols 0 and 1,
70+
`__new__` is not invoked on a class during unpickling; the state "unpacking"
71+
is handled solely via `__setstate__`. However, for pickle protocols 2, 3
72+
and 4, during unpickling an instance will first be created via
73+
`Key.__new__()` and then `__setstate__` would be called on that instance.
74+
The addition of the `__getnewargs__` allows the (positional) arguments to be
75+
stored in the pickled bytes. The original `ndb` implementation did **all** of
76+
the work of the constructor in `__new__`, so the call to `__setstate__` was
77+
redundant. In our implementation `__setstate__` is succifient and `__new__`
78+
isn't implemented, hence `__getnewargs__` isn't needed.
79+
- Since we no longer use `__new__` as the constructor / utilize the
80+
`__getnewargs__` value, the extra support for
81+
`Key({"flat": ("a", "b"), ...})` as an alternative to
82+
`Key(flat=("a", "b"), ...)` can be retired
83+
- Key parts (i.e. kind, string ID and / or integer ID) are verified when a
84+
`Reference` is created. However, this won't occur when the corresponding
85+
protobuf for the underlying `google.cloud.datastore.Key` is created. This
86+
is because the `Reference` is a legacy protobuf message type from App
87+
Engine, while the latest (`google/datastore/v1`) RPC definition uses a `Key`.
88+
- There is a `Property._CREATION_COUNTER` that gets incremented every time
89+
a new `Property()` instance is created. This increment is not threadsafe.
90+
However, `ndb` was designed for `Property()` instances to be created at
91+
import time, so this may not be an issue.

ndb/README.md

Lines changed: 0 additions & 77 deletions
Original file line numberDiff line numberDiff line change
@@ -11,83 +11,6 @@ It was designed specifically to be used from within the
1111
Learn how to use the `ndb` library by visiting the Google Cloud Platform
1212
[documentation][2].
1313

14-
## Assumptions
15-
16-
This is a running list of "compatibility" assumptions made for
17-
the rewrite.
18-
19-
- In production, the `APPLICATION_ID` environment variable will be set to
20-
a useful value (since there is no `dev_appserver.py` for
21-
`runtime: python37`). This is used as a fallback for the `ndb.Key()`
22-
constructor much like `google.cloud.datastore.Client()` determines a default
23-
project via one of
24-
25-
- `DATASTORE_DATASET` environment variable (for `gcd` / emulator testing)
26-
- `GOOGLE_CLOUD_PROJECT` environment variable
27-
- Google App Engine application ID (this is legacy / standard GAE)
28-
- Google Compute Engine project ID (from metadata server)
29-
30-
The correct fallback is likely different than this and should probably cache
31-
the output of `google.cloud.datastore.client._determine_default_project()`
32-
on the `ndb.Key` class or `ndb.key` module (at import time)
33-
- The "standard" exceptions from App Engine are no longer available. Instead,
34-
we'll create "shims" for them in `google.cloud.ndb._exceptions` to match the
35-
class names and emulate behavior.
36-
- There is no replacement for `google.appengine.api.namespace_manager` which is
37-
used to determine the default namespace when not passed in to `Key()`
38-
39-
## Differences (between old and new implementations)
40-
41-
- The `Key()` constructor (and helpers) make a distinction between `unicode`
42-
and `str` types (in Python 2). These are now `unicode->str` and `str->bytes`.
43-
However, `google.cloud.datastore.Key()` (the actual type we use under the
44-
covers), only allows the `str` type in Python 3, so much of the "type-check
45-
and branch" from the original implementation is gone. This **may** cause
46-
some slight differences.
47-
- `Key.from_old_key()` and `Key.to_old_key()` always raise
48-
`NotImplementedError`. Without the actual types from the legacy runtime,
49-
these methods are impossible to implement. Also, since this code won't
50-
run on legacy Google App Engine, these methods aren't needed.
51-
- `Key.app()` may not preserve the prefix from the constructor (this is noted
52-
in the docstring)
53-
- `Key.__eq__` previously claimed to be "performance-conscious" and directly
54-
used `self.__app == other.__app` and similar comparisons. We don't store the
55-
same data on our `Key` (we just make a wrapper around
56-
`google.cloud.datastore.Key`), so these are replaced by functions calls
57-
`self.app() == self.app()` which incur some overhead.
58-
- The verification of kind / string ID fails when they exceed 1500 bytes. The
59-
original implementation didn't allow in excess of 500 bytes, but it seems
60-
the limit has been raised by the backend. (FWIW, Danny's opinion is that
61-
the backend should enforce these limits, not the library.)
62-
- I renamed `Property.__creation_counter_global` to
63-
`Property._CREATION_COUNTER`.
64-
65-
## Comments
66-
67-
- The `Key.__getnewargs__()` method isn't needed. For pickle protocols 0 and 1,
68-
`__new__` is not invoked on a class during unpickling; the state "unpacking"
69-
is handled solely via `__setstate__`. However, for pickle protocols 2, 3
70-
and 4, during unpickling an instance will first be created via
71-
`Key.__new__()` and then `__setstate__` would be called on that instance.
72-
The addition of the `__getnewargs__` allows the (positional) arguments to be
73-
stored in the pickled bytes. The original `ndb` implementation did **all** of
74-
the work of the constructor in `__new__`, so the call to `__setstate__` was
75-
redundant. In our implementation `__setstate__` is succifient and `__new__`
76-
isn't implemented, hence `__getnewargs__` isn't needed.
77-
- Since we no longer use `__new__` as the constructor / utilize the
78-
`__getnewargs__` value, the extra support for
79-
`Key({"flat": ("a", "b"), ...})` as an alternative to
80-
`Key(flat=("a", "b"), ...)` can be retired
81-
- Key parts (i.e. kind, string ID and / or integer ID) are verified when a
82-
`Reference` is created. However, this won't occur when the corresponding
83-
protobuf for the underlying `google.cloud.datastore.Key` is created. This
84-
is because the `Reference` is a legacy protobuf message type from App
85-
Engine, while the latest (`google/datastore/v1`) RPC definition uses a `Key`.
86-
- There is a `Property._CREATION_COUNTER` that gets incremented every time
87-
a new `Property()` instance is created. This increment is not threadsafe.
88-
However, `ndb` was designed for `Property()` instances to be created at
89-
import time, so this may not be an issue.
90-
9114
[0]: https://cloud.google.com/datastore
9215
[1]: https://cloud.google.com/appengine
9316
[2]: https://cloud.google.com/appengine/docs/python/ndb/

0 commit comments

Comments
 (0)