Integrating NDB into gcloud.datastore? #557
Comments
Do we know of issues using gcloud.datastore in App Engine? In the "new world order", I'm not sure what ... ISTM that ...
I don't have confirmation either way. We certainly didn't design it with the Python App Engine sandboxing in mind... Might be worth a separate issue to look into that?
Yes -- see item 3 in the list of suggestions ("Rename gcloud.datastore to be a peer with ndb (using "simple" in this diagram, not set on that at all though).")
This was actually a point of contention -- because I don't think ... I also suspect it will be more work to port NDB on top of gcloud.datastore.
gcloud should work as-is in GAE right now, but its performance would be significantly worse than ndb's. Implementing on top of something like ...

JJ was able to convince me that NDB should not be implemented on top of gcloud.datastore. Importantly though, I think that ...

Personally (I don't think we really talked about this), I would like to see them result in a nice API for library developers or for customers who want more control than ndb provides.
RE: ... As Patrick mentions, ...
@jgeewax and @pcostell know, but for those who don't: ...

I've also taken a stab at converting ...

@jgeewax There is a lot going on in ... ISTM (given the observed timing difference on GAE) that we should use ...
Can we look at what those things are? The reason I ask is that we have to find a happy balance between having one entry point for Google Cloud stuff in Python (gcloud) and ...

Would this be a good fit for a git submodule? Where, as far as our users are concerned, they still type ...
I recommend taking a peek at ... The primary modules doing non-ORM type things are datastore_rpc and datastore_query. These are great features of ndb.

Just to get a sense of the size, we have about 2500 lines of non-test / non-generated code: ...

On the other hand, ...

My current ...
RE: Using a git submodule, @tseaver has expressed a distaste for it. We can ship ...
Will pre/post hooks be supported when using the proposed gcloud.datastore.ndb?
Can you elaborate or provide some snippets?
Sure, here's a really simplified and contrived example of where hooks might be used that are currently difficult to replicate when making write-requests through the Datastore API:

```python
class Book(ndb.Model):
    title = ndb.StringProperty()
    author = ndb.StringProperty()

    def _pre_put_hook(self):
        # If this is a new entity, use a sharded counter to track
        # the total number of books.
        if self.key is None:
            sharded_counter.incr('sum_of_books')
```

Another, possibly more important, concern is the use of Computed Properties, since they are stored/indexed and used in queries.

```python
class Region(ndb.Model):
    zip_codes = ndb.StringProperty(repeated=True)
    len_zip_codes = ndb.ComputedProperty(lambda self: len(self.zip_codes))
```
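To make the query concern concrete: a computed property like len_zip_codes is stored and indexed at write time, which is what makes a filter such as the one below work (a sketch using the Region model above; any client writing through the raw Datastore API would have to keep len_zip_codes consistent itself for this to keep working):

```python
# This filter relies on ndb having stored and indexed the derived
# value when the entity was written.
big_regions = Region.query(Region.len_zip_codes > 100).fetch()
```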
@eric-optimizely Sorry for the poor question on my part. I am aware how hooks work in ndb; I was curious how you saw that working with the ... As for ...
Ok, thanks for clarifying. I was hoping for some sort of magic :) but I suspect that support for hooks/computed properties is going to be complicated unless there's some other mechanism that's storing/caching them within the Datastore API itself.
Magic within ...? We could add support for hooks here, but I'm not sure if you want them in ...

RE: storing/caching within the Datastore API, that's not necessary for a hook or a computed property. A computed property just uses local data to create some derived property, while a hook just does pre- and post-processing on data sent/received.
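To illustrate that last point with a sketch (a hypothetical helper, not an API that gcloud exposes): pre/post processing can live entirely client-side around a plain put.

```python
def put_with_hooks(client, entity, pre=None, post=None):
    # All hook work happens locally, before and after the RPC;
    # Datastore itself never stores or executes any of it.
    if pre is not None:
        pre(entity)
    client.put(entity)
    if post is not None:
        post(entity)
```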
My assumption is that you'd need to import the Models in order for those hooks/properties to be generated and executed. If that's a correct assumption, then the problem for us would be that not all codebases which access our Datastore via the API have a copy of the Model definitions. The Models are defined by the web application in one repo, and other ancillary systems that access the data live in other repos. This gets more complicated if the Models are defined in Python, but other callers are written in other languages. The storing/caching mechanism I mentioned could allow all callers of the Datastore API to remain agnostic about the implementation details and maintain consistency on write ops.
Ahhhh I finally get it :) Sorry for being so slow on the uptake! I don't think that's a doable feature, but I like it. It essentially would require another service or just a custom backend. In either case, you'd have HTTP overhead that could really hurt large applications. Best bet would be to use the same models (even if you could duplicate the behavior in another language, keeping the code in sync would be a very dangerous proposition, since it's so easy to slip up).
Yeah, I'm not sure if it would be possible to serialize the functionality of hooks/computed properties into protobufs in such a way that they could be retrieved by the gcloud library (language agnostic) and applied to the caller. I could imagine the (Python) code looking something like this:
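(Purely as a sketch of the shape of that idea; every name below is hypothetical, and nothing like it exists in the Datastore API.)

```python
def evaluate(expression, entity):
    # Hypothetical: run a serialized, sandboxed expression against
    # the entity's local data (e.g. "len(zip_codes)").
    raise NotImplementedError

# Hypothetical: the API would serve per-kind metadata describing
# computed properties, so any caller could re-derive them locally.
meta = client.get_kind_metadata('Region')     # hypothetical call
entity = client.get(key)
entity['zip_codes'].append('94105')
for prop in meta.computed_properties:         # hypothetical field
    # Re-evaluate each serialized expression before writing, keeping
    # language-agnostic callers consistent with the Python models.
    entity[prop.name] = evaluate(prop.expression, entity)
client.put(entity)
```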
For record-keeping. Scary: https://code.google.com/p/googleappengine/issues/detail?id=9610
Async/Tasklets was the first thing I looked for when I cracked open gcloud-python. As a long-time GAE user, I've grown to love tasklets because they allow a developer to write performant async code while still allowing for proper separation of concerns / encapsulation. Meaning: to do RPC work in parallel, the developer doesn't need to jam a whole bunch of otherwise unrelated work into a single function.

That said, the NDB implementation of tasklets is definitely a big undertaking and has challenging issues like the ones that @dhermes points out above (https://code.google.com/p/googleappengine/issues/detail?id=9610). I see that each of ...

I don't yet know enough about the API backend for the Google APIs, but does it support a more global sense of batching? E.g., can gcloud have a higher-level notion of batching that will work over all (most?) of the sub-APIs? "Higher-level" batching isn't as powerful as full-on tasklets, but at least it would allow the performance gains, i.e., across disparate APIs.
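For readers who haven't used them, the pattern being praised looks roughly like this in ndb (a minimal sketch; the Book model and keys are illustrative):

```python
from google.appengine.ext import ndb

@ndb.tasklet
def fetch_pair(key_a, key_b):
    # Both gets are dispatched before either result is awaited, so
    # the two RPCs overlap instead of running back-to-back.
    a, b = yield key_a.get_async(), key_b.get_async()
    raise ndb.Return((a, b))

# The caller only blocks when it actually needs the results.
book1, book2 = fetch_pair(ndb.Key('Book', 1),
                          ndb.Key('Book', 2)).get_result()
```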
@squee1945 JJ opened #796 for discussion of library-wide batching. As for getting async into ...
@dhermes yes, it's just wow. Didn't Guido use that area as a proving ground for the Tulip stuff in Py3? Another approach (as opposed to full-on tasklets) would be async methods and futures. But I suppose that may drag along the event loop pump, etc.
Do we have a timeline (UPDATE: for v1beta3)?
@pcostell I noticed https://www.googleapis.com/discovery/v1/apis/datastore/v1beta3/rest is serving. Does this mean it is ready?
Almost :-). It's not quite usable yet.
Bump, v1beta3 is out. Is this still something we want to do?
I don't think so. I'd like to recommend gcloud-python as the preferred way to use Cloud Datastore, with ndb as an ORM on top of that if you'd prefer an ORM-like experience. I'd like ndb to be available as a separate install for users that want to use it.
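For concreteness, the recommended non-ORM path looked roughly like this at the time (a minimal sketch against the then-current gcloud-python API; the kind and field names are illustrative):

```python
from gcloud import datastore

client = datastore.Client()

# A partial key: Datastore assigns the numeric id on put.
book = datastore.Entity(key=client.key('Book'))
book['title'] = 'Some Title'
book['author'] = 'Some Author'
client.put(book)

fetched = client.get(book.key)
```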
That's fine. But we do need a plan for making that happen.
What's the current best practice for folks who like NDB but want to build on the flexible environment? (There's the compat runtime, but that sounds like a temporary/intermediate solution.)
I think that's a question for @pcostell?
If you're using a flexible environment and would like to use ndb, I'd recommend using the compat runtime, which will get you the entire App Engine SDK with serving fastpaths. In the future, this hopefully won't be necessary and you can just include ndb manually, but we're not at that point yet.
Are there any updates on this? Can I use ndb on the flexible environment? Or should I still go with python-compat for now? cc @pcostell
Just to be clear, there's currently no Python 3-compatible way to access Datastore via ndb.

@pcostell wrote:

> I'd like to recommend gcloud-python as the preferred way to use Cloud Datastore, with ndb as an ORM on top of that if you'd prefer an ORM-like experience. I'd like ndb to be available as a separate install for users that want to use it.

Since this issue has been closed, are there any other issues tracking the implementation(s) of these preferences? Are they on a roadmap anywhere?
@pcostell wrote:

> I'd like ndb to be available as a separate install for users that want to use it.

Is anyone actually working on this? Do you expect it'll be available in the next couple months, or is this a longer-term undertaking? (We'd like to use Python 3 in the flexible environment but want to be able to take advantage of caching, transactions, and other features of NDB.)
@aatreya did you make any progress or exploration here? I hit the same roadblock and challenge.
I'd like to use ndb on a Python 3.5 project in the flexible environment. Would love to see some progress on making this possible.
@jonparrott Do you know of any updates?
We are working on it (ndb from outside of App Engine), but right now there are a lot of incompatibilities that mean it is likely not as usable. You can try out the existing state by following the instructions in the demo: https://github.com/GoogleCloudPlatform/datastore-ndb-python/tree/master/demo

However, there are a lot of gotchas (one of the more substantial ones is that it isn't running on gRPC and RPCs are only run synchronously). If you try it out, please file any bugs on the ndb GitHub tracker. Here is a bug to track this issue, rather than keeping this issue open in google-cloud-python: GoogleCloudPlatform/datastore-ndb-python#272
Any news about this? I haven't moved my projects to App Engine Flexible only because I don't want to migrate my code from ndb to the Cloud Datastore API.
@magoarcano : See @pcostell's note from above, specifically https://github.com/GoogleCloudPlatform/datastore-ndb-python/blob/master/demo/task_list.py, which is a demo app using NDB that runs inside GCE. Pay special attention to the configuration settings that turn things off. Also, as Patrick says, the performance of this will be sub-par due to synchronous requests and no caching.

It's important to note that this is an incredibly complex undertaking because the runtime environments are significantly different, and NDB was built on the premise that it would only ever run in App Engine (which is an "all or nothing" service with Datastore, Memcache, Task Queues, etc. all available via RPC calls). In Flex (or GCE, or AWS, etc.) things are very different, so a majority of those assumptions don't hold anymore. This means that even though a port of NDB might work, there will be a huge number of "Oh, we didn't realize NDB made that assumption!" moments, so we're being extra cautious about what we release to the world. We don't want to hand out code that works only in the right circumstances -- it should work everywhere.
Had a few discussions with @GoogleCloudPlatform/cloud-datastore (particularly @pcostell), so I wanted to summarize the things we covered. This issue is to clarify the goals for Datastore support in gcloud-python, and to discuss and decide on a course of action that everyone likes for how to reconcile what we have today with the other libraries out there.

Current state of the world

Currently, I see two styles that people want to use when interacting with Datastore: a simple, lower-level style of talking to Datastore directly, and an ORM-like style built around models. For the former, gcloud.datastore has had the goal of covering this use case. For the latter, ndb is the latest (supported) way of doing this -- with others potentially existing, but ndb seems to be the clear leader.

We also have a unique situation where our code currently might have trouble running in App Engine -- whereas ndb can't run outside of App Engine. The layout sort of looks like this:

Looking forward
If our goals are ....

1. gcloud.datastore and ndb are our choices for each style respectively
2. gcloud.datastore and ndb should both run in App Engine and non-App Engine runtimes
3. gcloud.datastore is where all the recommended Python stuff to talk to Datastore should live (it is the "official source of truth")
4. ... gcloud.datastore (and set gcloud as a Python dependency)

... then I'd like to suggest that we....

1. Move ndb over as gcloud.datastore.ndb (bringing with it datastore_rpc and datastore_query)
2. Rework gcloud.datastore to run on top of datastore_query
3. Rename gcloud.datastore to be a peer with ndb (using "simple" in this diagram, not set on that at all though).

which makes things look like this:
What do you guys think?
/cc @GoogleCloudPlatform/cloud-datastore @dhermes @tseaver @silvolu @proppy