
Adding generated Bigtable classes. #22

Closed
wants to merge 3 commits into from

Conversation

saicheems

No description provided.

@saicheems
Author

@anthmgoogle

@dhermes
Owner

dhermes commented Nov 30, 2015

@saicheems Can you explain what these are and what generated them and why they are needed given the existence of gcloud_bigtable._generated?

@geigerj

geigerj commented Nov 30, 2015

These are intended to replace the gRPC generated code that currently is in gcloud_bigtable._generated, for some common use cases. They hide some gRPC features, like explicit timeouts and request objects for every call, and are backed by the GAX library -- not included in this PR -- which performs things like retrying, page-streaming, and possibly eventually other features that might be desired of the callables. We're hoping that this will reduce the amount of code that needs to be handwritten on top of the gRPC generated code. They were generated by our code generator, which is not yet on github.
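Since GAX itself is not in this PR, here is a rough sketch of what "backed by a library that performs retrying" could mean in practice: a wrapper that hides explicit timeouts and transient failures behind a plain callable. All names and defaults here are illustrative, not the actual GAX API.

```python
import time

def with_retry(rpc, max_attempts=3, delay=0.01, retryable=(ConnectionError,)):
    """Wrap an RPC-like callable so transient errors are retried.

    `rpc`, `max_attempts`, and `retryable` are hypothetical names; a real
    implementation would also use exponential backoff and a deadline.
    """
    def call(*args, **kwargs):
        for attempt in range(1, max_attempts + 1):
            try:
                return rpc(*args, **kwargs)
            except retryable:
                if attempt == max_attempts:
                    raise  # out of attempts; surface the error
                time.sleep(delay)
    return call
```

The caller then invokes `with_retry(stub_method)(request)` without thinking about per-call timeouts or transient errors.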

@dhermes
Owner

dhermes commented Nov 30, 2015

Before continuing I should note that we are in the process of merging gcloud-python-bigtable into the main gcloud-python; however, this one remains the upstream master until they are at parity.


  • Is there a link to the GAX library?
  • Does importing this work? The generated code assumed the google namespace package would be available, but it was more trouble than it was worth, so I re-wrote it. Right now the gcloud_bigtable.google package looks public to users of this library, but we'd rather hide it as an implementation detail. How hard will it be to put it in a different place?
  • Do you still depend on the protobuf PyPI package?
  • Can you give some sample code explaining how to use this?
  • Do you have any documentation? I'm fairly uncomfortable depending on something undocumented.

/cc @jgeewax @tseaver

@geigerj

geigerj commented Nov 30, 2015

  • GAX link will come later. I think this is intended only to get feedback on the public surface of the generated code, not to actually run or be merged; maybe Sai can comment?
  • I think we'll reorganize the package names before sending a working pull request. Most likely we'll use a package named "generated.google". For non-manually veneered APIs (unlike this one), the generated code will be the public surface with nothing sitting on top -- not sure how this will play into package naming if we need manually veneered codegen to look private.
  • Yes, we depend on the protobuf PyPI package.
  • @saicheems / @anthmgoogle -- should we be including sample code at this point?
  • The documentation story is still in the works, I think.

@saicheems
Author

Right, GAX is still in the works -- we'll update when it's available, but it's not intended to be a part of this review.

@geigerj @anthmgoogle Maybe we can include the end-to-end example for demonstration purposes?

@geigerj

geigerj commented Nov 30, 2015

I think that makes sense -- it's not necessarily intended to be the final sample code, but it should give some sense of how it might look.

@dhermes
Owner

dhermes commented Nov 30, 2015

I don't expect GAX to be part of this review. I'd expect it to be installable as a PyPI package.

@saicheems
Author

@dhermes Right, our intention is to eventually have it available as a package.

I'll include our end-to-end example.

@geigerj

geigerj commented Nov 30, 2015

@dhermes @saicheems Yes, although note that as a temporary solution, we may have to put it in gcloud-python, similar to what happened with Java GAX (googleapis/google-cloud-java#320). We haven't yet worked out a permanent home for GAX.

@dhermes
Owner

dhermes commented Nov 30, 2015

GitHub and PyPI are a great start to be part of the Python community

@dhermes
Owner

dhermes commented Nov 30, 2015

@geigerj So the current most pressing need is just to review the API surface? (And the forthcoming example usage docs)

@geigerj

geigerj commented Nov 30, 2015

Yup, that's right.

@anthmgoogle

For the owners of gcloud-python, this PR is not intended for merging, but we are looking for feedback and sign-off on the API surface, as well as details of the documentation and source code. In parallel we are working on a PR that will be mergeable, with all dependency and layout issues resolved, and it will include these files. If these files need to be changed, we would like to iterate on any changes in this PR now.

For reference, there was a lot of good feedback on the Java equivalent of this that took a while to iterate through, so we would like to get going on the Python equivalent now to avoid a long delay when the mergeable version is ready.

* New tables can be created in the cluster.
* The cluster's allocated resource levels will be readable via the API.
The embedded operation's "metadata" field type is
[CreateClusterMetadata][google.bigtable.admin.cluster.v1.CreateClusterMetadata] The embedded operation's "response" field type is


Could we strip out the link headers so that this is ".. field type is google.bigtable.admin.cluster.v1.CreateClusterMetadata. The ..."

Owner


👍 It may be a pain unless you have something that can easily parse it from the comments in the .proto definition.

If you can parse it, you should reformat in Sphinx / reST format to link within the generated classes.

Author


We're working on improving the docs as we can right now, and have an issue tracking it.

@dhermes
Owner

dhermes commented Dec 1, 2015

I haven't read the code yet, but have gone through e2e.py. It is essentially identical to writing an application with the standard code generated by protoc (using grpc_python_plugin).

Maybe I'm misunderstanding what the goal was?

# EDITING INSTRUCTIONS
# This file was generated from google/bigtable/admin/cluster/v1/bigtable_cluster_service.proto,
# and updates to that file get reflected here through a regular refresh
# process. However, manual and updates to that file get reflected here
Owner


manual and updates to that file doesn't make sense?

Author


Done.

# Properties
@property
def channel(self):
return self.channel
Owner


This is circular.

>>> import sys
>>> sys.setrecursionlimit(4)
>>> class A(object):
...     @property
...     def b(self):
...         return self.b
... 
>>> a = A()
>>> a.b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in b
RuntimeError: maximum recursion depth exceeded
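The standard fix is to have the property read a differently named private backing attribute. A minimal sketch (class and attribute names are illustrative, not the actual generated code):

```python
class ClusterServiceApi(object):
    """Sketch of the non-circular form of the property."""

    def __init__(self, channel):
        self._channel = channel  # leading underscore: distinct from the property

    @property
    def channel(self):
        return self._channel  # reads the backing attribute, so no recursion
```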

Author


Removed.

@anthmgoogle

"It is essentially identical to writing an application with the standard code generated by protoc (using grpc_python_plugin). Maybe I'm misunderstanding what the goal was?"

The code generator can take advantage of patterns and conventions in Google APIs like page streaming. Unfortunately, Bigtable is an atypical API that uses few standard patterns and is mostly custom methods. So the generator does not save as much user code as it does for other APIs.

There should still be some benefits even in the absence of pattern use:

  • There are still savings to initialization code.
  • The callable pattern makes it more concise to use the same advanced gRPC patterns.
  • There is an editable doc site for each method (unlike gRPC/stubby) where we can get proto-doc by default, but we can add usage examples.
  • As we detect client-side features in use by multiple APIs, new features can be added over time that can be auto-generated across languages.

"""Service for reading from and writing to existing Bigtables."""

# The default address of the logging service.
_SERVICE_ADDRESS = "bigtable.googleapis.com"
Owner


I just noticed this uses double quotes while the scopes below use single quotes. Why mix and match? ISTM that using %r or {!r} or repr() would be what you'd want to do, and all of those would produce single quotes for the string above.

@dhermes
Owner

dhermes commented Dec 1, 2015

The code generator can take advantage of patterns and conventions in Google APIs like page streaming

There are two (a, b) methods that use streaming, but they seem to be identical.


There are still savings to initialization code.

Where? It seems to be identical, and calling many methods on stubs requires the same level of knowledge about the generated message classes (seems to occur when a property is itself a message).


The callable pattern makes it more concise to use same advanced gRPC patterns.

I'm not clear what you mean. Side-by-side:

# Veneer generated
cluster_pb = cluster_api.create_cluster(
    name=zone_name, cluster_id=disp_name, cluster=cluster)
# Standard protoc generated (with gRPC plugin)
cluster_pb = cluster_stub.CreateCluster(
    request_pb, timeout_seconds)

The only distinction is that the Veneer call constructs the request_pb for the user.
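To make that distinction concrete, here is roughly what the veneer does for the user: build the request message from flattened keyword arguments, then call the stub. The message and function names below are stand-ins for the real generated classes, not the actual PR code.

```python
class CreateClusterRequest(object):
    """Stand-in for the protoc-generated request message."""

    def __init__(self, name, cluster_id, cluster):
        self.name = name
        self.cluster_id = cluster_id
        self.cluster = cluster

def create_cluster(stub, name, cluster_id, cluster, timeout=10):
    """Veneer-style flattened call: constructs request_pb for the user."""
    request_pb = CreateClusterRequest(name, cluster_id, cluster)
    return stub(request_pb, timeout)
```

With the standard protoc output, the caller would construct `CreateClusterRequest` by hand before invoking the stub; the veneer only saves that one step for this method.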


There is an editable doc site for each method (unlike gRPC/stubby) where we can get proto-doc by default, but we can add usage examples.

I'm not sure what you mean by this.


As we detect client-side features in use by multiple APIs, new features can be added over time that can be auto-generated across languages.

But the exact same can be said of the existing toolchain.

@anthmgoogle

Some clarifications related to my earlier comment.

Page streaming is what we call the pattern of List methods returning a page at a time and turning this into an idiomatic collection. Most APIs have some of these, but Bigtable does not.
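The pattern being described could be sketched as a generator that hides the page-token loop; the dict-shaped request/response and `next_page_token` field here are stand-ins for a generated stub method and its protobuf messages.

```python
def iterate_pages(list_method, request):
    """Flatten a paged List RPC into a single iterator (page streaming)."""
    while True:
        response = list_method(request)
        for item in response['items']:
            yield item  # caller sees one flat stream of items
        token = response.get('next_page_token')
        if not token:
            return  # no more pages
        # re-issue the request with the continuation token
        request = dict(request, page_token=token)
```

The caller just writes `for item in iterate_pages(stub.List, request)` instead of managing tokens.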

The initialization savings are things like the scopes relative to raw gRPC, not the per-method stuff.

By advanced calling patterns I meant things like async, chaining, batching, etc.
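As one illustration of the async case: a single callable object could expose both a blocking call and a future-returning variant. The class and method names are hypothetical, not the GAX surface.

```python
from concurrent.futures import ThreadPoolExecutor

class ApiCallable(object):
    """Sketch: one object supporting both sync and async invocation."""

    def __init__(self, func, max_workers=2):
        self._func = func
        self._executor = ThreadPoolExecutor(max_workers=max_workers)

    def __call__(self, *args, **kwargs):
        # Blocking call, same as invoking the wrapped function directly.
        return self._func(*args, **kwargs)

    def future(self, *args, **kwargs):
        # Async variant: returns a concurrent.futures.Future immediately.
        return self._executor.submit(self._func, *args, **kwargs)
```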

Regarding docs: gRPC and Stubby methods cannot be extended in source form, so you can't go and add a language-specific usage example to their doc comments. That is possible with these wrappers.

Regarding future features, I'm not sure what you mean by existing toolchain. But if you mean gRPC, there are some features that cannot be added, or are unlikely to be added, because they would depend on One Platform patterns or would need to depend on configuration outside the proto.

But overall it is acknowledged that the before and after on this particular API is not very different because of its atypical nature. It was much more evident in APIs like PubSub and Logging. We should discuss if that changes our strategy here.

@dhermes
Owner

dhermes commented Dec 1, 2015

Page streaming is what we call the pattern of List methods

Gotcha. I get it now.


The initialization savings are things like the scopes relative to raw gRPC, not the per-method stuff.

There is no reason that the gRPC generator couldn't do this. Though (I asked elsewhere in this PR) I don't actually know where the scopes and the API endpoints come from.


By advanced calling patterns I meant things like async, chaining, batching, etc.

Where does that come into play? Why wouldn't the gRPC / protobuf code support it?


Regarding docs: gRPC and Stubby methods cannot be extended in source form, so you can't go and add a language-specific usage example to their doc comments. That is possible with these wrappers.

That is not very relevant. Most library devs are going to wrap the generated code into something else anyway.

Even if this code was easier to use, the fact that there are 3 distinct services is a reason enough to wrap the Bigtable APIs into something higher-level to make it easier on users. (In our implementation we just concentrate on the nouns and verbs relevant to the API. It may help to take a peek at the docs.)


Regarding future features, I'm not sure what you mean by existing toolchain. But if you mean gRPC, there are some features that cannot be added or unlikely to be added because they would depend on One Platform patterns or would need to depend on configuration outside the proto.

I mean the protoc code generator with the grpc_python_plugin. I don't know what "One Platform" is, and generally am only concerned with publicly documented things users can see / already know about.

It seems this may be a hint at an answer to "where do the API scopes and URIs come from". If that's the case, why not provide those configuration values in a public way just like the .proto files so external developers can build their own toolchains or extend this one as they see fit?


It was much more evident in APIs like PubSub and Logging

We can run the generator on Pub/Sub and then get an idea of the differences there?


Overall, it seems like we are just adding another abstraction layer. For Bigtable at least, the current layers are

  • Happybase library (to emulate a popular HBase/thrift library)
  • Our low-level API mapping
  • The gRPC generated code
  • The Python protobuf library
  • The layer to just run Python (much slower than native)
  • The gRPC core (C++) code

It just doesn't seem worth it.

@anthmgoogle

The feedback provided on the code-gen is very valuable. It would be much appreciated if we could keep iterating until these point issues are either addressed or captured as work items. For the purposes of this review it would be good to separate out these higher-level questions.

For the higher level questions we should cover these in a different forum, but this boils down to 2 categories:

  1. This specific API will see much less code-gen savings than others because it has atypical patterns. As such we should reevaluate whether it should go first. We might, for example, apply all the feedback on the details but select a different API to focus on.
  2. Broader questions, such as whether code-gen features go in gRPC or another layer, are complicated and best discussed elsewhere.

@dhermes
Owner

dhermes commented Dec 1, 2015

OK. Let me know what else I can do?

@anthmgoogle

We will process the feedback you gave us. If you can review any updates and comment on whether they resolve the issues, that would be much appreciated.

@dhermes
Owner

dhermes commented Dec 1, 2015

Sure thing. Ping me when the review is needed.

@dhermes
Owner

dhermes commented Dec 4, 2015

@anthmgoogle I'm realizing in porting some of this over that a nice utility your library could provide would be to automatically convert Any messages into the correct type (based on the type_url).

Feature request?
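For illustration, the requested utility amounts to dispatching on the `type_url` (which looks like `type.googleapis.com/package.MessageName`). The real protobuf library would use `symbol_database.Default()` and `Any.Unpack()`; the stdlib-only sketch below fakes the registry and serialization, and all names are hypothetical.

```python
# Registry mapping full proto names to message classes (hypothetical;
# protobuf's symbol_database plays this role in practice).
_TYPE_REGISTRY = {}

def register_message(cls):
    """Class decorator: register a message under its full proto name."""
    _TYPE_REGISTRY[cls.FULL_NAME] = cls
    return cls

def convert_any(type_url, serialized):
    """Resolve an Any payload to a concrete message via its type_url."""
    full_name = type_url.split('/')[-1]  # strip the "type.googleapis.com/" prefix
    cls = _TYPE_REGISTRY[full_name]
    return cls.deserialize(serialized)
```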


PS You should really start a GitHub project, even if there is no code and just an issue tracker for feature requests.

@saicheems
Author

Thanks for the feedback, we made changes to address the immediately resolvable ones. We have issues tracking some of the more difficult problems - mainly:

  • Clearer, idiomatic (as much as possible) documentation
  • Better formatting
  • Parameter flattening

# Manual additions are allowed because the refresh process performs
# a 3-way merge in order to preserve those manual additions. In order to not
# break the refresh process, only certain types of modifications are
# allowed.
Owner


This just seems like a waste of time trying to implement this feature.

@dhermes
Owner

dhermes commented Dec 14, 2015

I don't see the value in the *_callable methods. Do you expect a user to actually think it's a good thing to have two methods with similar names, one of which they'll never use?

@saicheems saicheems closed this Feb 3, 2016