A simple plugin system #380
Thank you. I totally agree that using entry_points is a great alternative or addition to the registration system.
I think this is a great idea. I was hoping that we could figure out how to do this in the future for sources and sinks, where you could connect Streamz to external services using packages installable via pip or conda, because these connections often have a lot of dependencies that you don't necessarily want in your core package. I hadn't considered it for functionality that operates in the middle of the pipeline. I think this is the step that gets us in that direction.
I think this is great! How should we document all these plugins so people can discover them?
If we go with entry_points as a way of declaring plugins, then we do not document them, at least not exhaustively. Intake faces this: we have a page of known plugins, which people can submit PRs for changing, but there's nothing to stop new plugins appearing which we don't know about. It would be possible to have a configuration that only loads a select number of plugins, or instead to not load the plugins except upon explicit user invocation.
I think this is really nice! I think for validating the plugins, we would definitely need Also, for testing an example plugin, having a separate repository/page called As for tests for future plugins, I think if we consider adding a plugin to [EDIT] I see Martin beat me to it. :)
I think this is the way to go. Then I'll add tests, so that they are not bound to the repo in my account.
So, plugins in streamz-plugins repo that are tested and "verified" can be provided via
This seems reasonable to me, although I would defer to Martin and CJ to make the final call whether or not this is the best way to proceed.
I see Travis builds timing out. Is this due to it being overloaded? Should we try something like
We need to transfer our tests away from travis...
Yes, we should do whatever it takes to get travis to pass for now... I am only now looking at the actual code. I wonder, is there any appetite for making this system lazy? As it is, streamz will import all packages that claim to have relevant plugins - but the node classes still need to be annotated with decorators as before. I could imagine entry points spelled like:
which adds "transform" to the dir() of Stream, but only imports and registers when the Furthermore, we may want to be more specific with our entrypoint group naming:
What do you think? btw: does this require the package
I like having them lazy, since that allows us to not need the plugins to be installed until we actually need them.
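A minimal sketch of the lazy scheme discussed above. It uses a stand-in Stream class and a hypothetical FakeEntryPoint (neither is the real streamz or setuptools object), and assumes an entry point exposes a name and a load() method. A stub is attached under the entry point's name right away, and the plugin is only loaded on the first call:

```python
class Stream:
    """Stand-in for streamz.Stream (an assumption, not the real class)."""

    @classmethod
    def register_api(cls, attribute_name=None):
        """Attach a node class to Stream as a method, loosely like streamz does."""
        def decorator(node_cls):
            name = attribute_name or node_cls.__name__

            def method(self, *args, **kwargs):
                return node_cls(self, *args, **kwargs)

            method.__name__ = name
            setattr(cls, name, method)
            return node_cls
        return decorator

    @classmethod
    def register_plugin_entry_point(cls, entry_point):
        """Attach a cheap stub now; load the plugin only on first use."""
        def stub(self, *args, **kwargs):
            node_cls = entry_point.load()  # the actual import would happen here
            cls.register_api(attribute_name=entry_point.name)(node_cls)
            return getattr(self, entry_point.name)(*args, **kwargs)

        setattr(cls, entry_point.name, stub)


class FakeEntryPoint:
    """Hypothetical object mimicking the .name / .load() of an entry point."""

    def __init__(self, name, loader):
        self.name = name
        self._loader = loader
        self.load_count = 0

    def load(self):
        self.load_count += 1
        return self._loader()


class Transform:
    """Hypothetical plugin node."""

    def __init__(self, upstream):
        self.upstream = upstream


ep = FakeEntryPoint("transform", lambda: Transform)
Stream.register_plugin_entry_point(ep)

listed_before_load = "transform" in dir(Stream)  # the name is visible already
loads_before_call = ep.load_count                # nothing has been loaded yet
node = Stream().transform()                      # first call triggers the load
loads_after_call = ep.load_count
```

After the first call the stub has been replaced by the registered method, so later calls do not go through load() again.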
@martindurant would it be worthwhile to set up the CI using conda-forge's CI provisioning system? That way, if we need to move to something else, we just rerender?
Sorry @CJ-Wright, you're the expert there, I really don't know what's involved.
I'll put up a PR if we're interested, but the rough outline is here: https://conda-forge.org/docs/user/ci-skeleton.html
Does it work for github actions CI?
Not yet, since CF doesn't use GH actions as a CI (yet). The hope of this approach would be that we don't need to care about what particular CI we are using, since we could rerender and the CIs could move. We could add GH actions to CF and then it would support that CI.
In that case I am hesitant, since the dask repos are all going to github. I don't have a circle or azure account myself - is it all free like Travis used to be?
Sorry, can we move the CI conversation to an issue? I don't want to derail things here. #382
Do you mean that with lazy loading they won't have to be annotated?
This can be done with a
Do these types of nodes require different treatment when being loaded?
I don't think so, this is built into setuptools.
Indeed, this would be an alternative to the annotation; although doing both should be allowed for backward compatibility.
Something like that, yes. There are many ways to achieve it. You would call register_api on the class once it's imported, so it gets added to the namespace and you don't need to import it again.
Maybe. This gives us more flexibility for the future. We could argue that, for the time being, all possible plugins are "nodes" (i.e., they appear as attributes of Stream). My motivation here is that in Intake we started off having things called "plugins", but then made other parts of the library pluggable and had to, painfully, rename the original plugins to "drivers".
Better check that setuptools has this functionality for all python versions of interest. Also, setuptools had better be included in the deps (yes, I know everyone already has it).
Then register_api would have to allow us to specify which attribute the class will be registered at. Right now it's
OK, will do.
Correct. I imagine we'll have redundancy for now "name=package.module.name", but there might be a case for when we want the class name and method name to be different, which is not yet possible. Actually, there is a difference between normal nodes and sources: the latter are registered as
@roveo, would you be interested to try something like
Setuptools supported entry_points since
Codecov Report
@@ Coverage Diff @@
## master #380 +/- ##
==========================================
+ Coverage 95.77% 95.84% +0.07%
==========================================
Files 16 17 +1
Lines 2508 2529 +21
==========================================
+ Hits 2402 2424 +22
+ Misses 106 105 -1
Added Plugins page to docs (I'm not a native speaker, so edits and additions are very welcome). Note that there's a table commented out in Known plugins section. I think at this point we should think about which existing functionality should go into extras (the leading candidate being kafka).
I really like this PR
streamz/tests/test_plugins.py
Stream.register_plugin_entry_point(entry_point)
assert Stream.test_double.__name__ == "stub"
Actually, I'm not sure whether the method should have been overwritten - presumably Stream.test_double already existed before the call to register_plugin_entry_point.
In normal usage, the stubs always come first, since they happen at import time, and streamz should always be imported before derived classes.
Yes, I guess this test actually doesn't test anything. I wanted to test that running Stream.register_api twice on the same class doesn't break anything, in case the implementation of register_api changes. We can just check for this in register_plugin_entry_point.
The test is OK, but not comprehensive. The cases are:
- register_api, then register entrypoint
- register entrypoint then register_api
- either of the two methods called twice
In every case, calling the method should result in the same test class.
We could check for whether the name is being overwritten, but I don't think that's essential.
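The cases listed above could be exercised roughly as follows. This is only a sketch against a stand-in Stream built by a hypothetical make_stream() helper, not the real streamz test suite; DoubleNode and FakeEntryPoint are made-up names, and the stand-in records the class on the method's __wrapped__ attribute by hand:

```python
def make_stream():
    """Return a fresh stand-in for streamz.Stream so every case starts clean."""

    class Stream:
        @classmethod
        def register_api(cls, attribute_name=None):
            def decorator(node_cls):
                name = attribute_name or node_cls.__name__

                def method(self, *args, **kwargs):
                    return node_cls(self, *args, **kwargs)

                method.__name__ = name
                method.__wrapped__ = node_cls  # remember which class is behind it
                setattr(cls, name, method)
                return node_cls
            return decorator

        @classmethod
        def register_plugin_entry_point(cls, entry_point):
            def stub(self, *args, **kwargs):
                cls.register_api(attribute_name=entry_point.name)(entry_point.load())
                return getattr(self, entry_point.name)(*args, **kwargs)

            setattr(cls, entry_point.name, stub)

    return Stream


class DoubleNode:
    """Hypothetical node class under test."""

    def __init__(self, upstream):
        self.upstream = upstream


class FakeEntryPoint:
    """Hypothetical entry point that always resolves to DoubleNode."""

    name = "test_double"

    def load(self):
        return DoubleNode


def resolved_class(stream_cls):
    """Call the method once (loading any stub), then report the class behind it."""
    stream_cls().test_double()
    return stream_cls.test_double.__wrapped__


def run_case(steps):
    """Apply the registration steps in order on a fresh Stream stand-in."""
    stream_cls = make_stream()
    for step in steps:
        if step == "api":
            stream_cls.register_api(attribute_name="test_double")(DoubleNode)
        else:
            stream_cls.register_plugin_entry_point(FakeEntryPoint())
    return resolved_class(stream_cls)


cases = {
    "register_api, then entry point": ["api", "entry_point"],
    "entry point, then register_api": ["entry_point", "api"],
    "register_api twice": ["api", "api"],
    "entry point twice": ["entry_point", "entry_point"],
}
results = {label: run_case(steps) for label, steps in cases.items()}
```

In this stand-in every ordering resolves to the same class; whether the real implementation should overwrite an existing name is a separate question.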
> In every case, calling the method should result in the same test class.
The class is stored in Stream.method.__wrapped__, but I don't see how anything else but the test class can end up there in any of these cases.
Btw, what should we do if a plugin wants to override a built-in method? On the one hand, this would allow people to create useful extensions of built-in functionality (my version of partition could be provided as a plugin long before it's merged into core); on the other, it could break things and lead to confusion.
Personally I would have the system balk at a built-in override. My hope would be that names are cheap, so adding more text onto a would-be partition describing how it is different would add less burden than issues around which partition was used. This could get particularly thorny as envs get updated: code that previously worked and relied on the built-in is now producing errors in subtle ways that could be difficult to find.
You mean we should have a clobber=False argument for the register methods?
I wonder in that case, if there is a sane way to tell when an entrypoint and register_api refer to the same thing, which would not be a problem even without clobber.
> I wonder in that case, if there is a sane way to tell when an entrypoint and register_api refer to the same thing, which would not be a problem even without clobber.
We can just check hasattr(Stream, entry_point.name). But it would have to happen during plugin load (so that we're not overwriting built-in methods with stubs) and so importing streamz with a bad plugin would result in an error. Or it could be a warning that we skipped an entrypoint because of name collision.
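A sketch of the warning variant described above, assuming a stand-in Stream with a single built-in method. The free function register_plugin_entry_point, FakeEntryPoint and MyPartition are hypothetical and only illustrate the hasattr check; the choice of RuntimeWarning is also an assumption:

```python
import warnings


class Stream:
    """Stand-in for streamz.Stream with one built-in method."""

    def partition(self, n):
        return ("built-in partition", n)


class FakeEntryPoint:
    """Hypothetical entry point; load() is never reached for a skipped plugin."""

    def __init__(self, name, target):
        self.name = name
        self._target = target

    def load(self):
        return self._target


def register_plugin_entry_point(cls, entry_point):
    """Attach a lazy stub unless the name is already taken on cls.

    Returns True if the stub was attached, False if the plugin was skipped.
    """
    if hasattr(cls, entry_point.name):
        warnings.warn(
            f"skipping plugin {entry_point.name!r}: "
            f"{cls.__name__} already has an attribute with that name",
            RuntimeWarning,
        )
        return False

    def stub(self, *args, **kwargs):
        node_cls = entry_point.load()
        return node_cls(self, *args, **kwargs)

    setattr(cls, entry_point.name, stub)
    return True


class MyPartition:
    """Hypothetical third-party node that tries to reuse a built-in name."""

    def __init__(self, upstream, n):
        self.upstream = upstream
        self.n = n


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    clash = register_plugin_entry_point(Stream, FakeEntryPoint("partition", MyPartition))
    fresh = register_plugin_entry_point(Stream, FakeEntryPoint("my_partition", MyPartition))
```

Raising an exception instead of warning would be a one-line change; the trade-off is that a single badly named plugin would then break importing the library.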
but this will fail when someone adds the entrypoint to a class that already has register_api, no?
Note @CJ-Wright, that the current implementation does allow you to use register_api to overwrite methods at will. Of course this is Python, so we can't ever stop someone who wants to do that anyway.
That's fair. To some extent this boils down to locality for me. If the author of the code is the user of the env then if they override things then they are in a better place to clean up. When installing 3rd party things that state may not hold as well.
> If the author of the code is the user of the env then if they override things then they are in a better place to clean up. When installing 3rd party things that state may not hold as well.
I agree. With register_api things are explicit and visible. A plugin is a black box.
@martindurant please look at the last commit to see what I mean
I'm happy with this. @CJ-Wright ?
cls.register_api(
    modifier=modifier, attribute_name=entry_point.name
)(node)
if getattr(cls, entry_point.name).__name__ == "stub":
I don't think this can be False, but it doesn't hurt to check.
It is false when the plugin class returned from entry_point.load() is decorated with @Stream.register_api. Then it is registered right away when loaded, and so at the moment of this check the stub has already been overwritten with the actual class.
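To illustrate the two outcomes of that check, here is a small sketch with a stand-in Stream (not the real class). The loaders load_plain and load_decorated are hypothetical and simulate importing a plugin module without and with the decorator; the log argument exists only to make the branch taken visible:

```python
class Stream:
    """Stand-in for streamz.Stream (an assumption, not the real class)."""

    @classmethod
    def register_api(cls, attribute_name=None):
        def decorator(node_cls):
            name = attribute_name or node_cls.__name__

            def method(self, *args, **kwargs):
                return node_cls(self, *args, **kwargs)

            method.__name__ = name
            setattr(cls, name, method)
            return node_cls
        return decorator

    @classmethod
    def register_plugin_entry_point(cls, entry_point, log):
        def stub(self, *args, **kwargs):
            node_cls = entry_point.load()
            # If the plugin module decorated the class itself, loading it has
            # already replaced the stub, so there is nothing left to register.
            if getattr(cls, entry_point.name).__name__ == "stub":
                cls.register_api(attribute_name=entry_point.name)(node_cls)
                log.append((entry_point.name, "registered by stub"))
            else:
                log.append((entry_point.name, "already registered on load"))
            return node_cls(self, *args, **kwargs)

        setattr(cls, entry_point.name, stub)


class FakeEntryPoint:
    """Hypothetical entry point whose load() calls the given loader."""

    def __init__(self, name, loader):
        self.name = name
        self.load = loader


def load_plain():
    """Simulates a plugin module whose class is not decorated."""
    class Double:
        def __init__(self, upstream):
            self.upstream = upstream
    return Double


def load_decorated():
    """Simulates a plugin module whose class uses the decorator itself."""
    @Stream.register_api(attribute_name="triple")
    class Triple:
        def __init__(self, upstream):
            self.upstream = upstream
    return Triple


log = []
Stream.register_plugin_entry_point(FakeEntryPoint("double", load_plain), log)
Stream.register_plugin_entry_point(FakeEntryPoint("triple", load_decorated), log)
Stream().double()
Stream().triple()
```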
LGTM! One thing to consider is scraping the conda-forge packages/data for packages that declare the streamz plugin, if we are interested in building a plugin registry.
Right, but there aren't any yet :)
@chinmaychandak: that kafka test is failing too often! (and giving an incorrect message, due to using positional rather than keyword arguments to wait)
It was more of a general idea, since other plugin-mediated systems could use that approach as well.
Right now, if I need to add my own custom stream nodes, I have to do this:
It would be nice to have a way to distribute additional functionality as separate packages that can just be installed via pip. This can be done with entry_points, similar to the way it works in airflow. This is a super bare-bones implementation of this mechanism. Check out https://github.com/roveo/streamz_example_plugin for an example of a plugin.
Problems:
isinstance(plugin, Stream), but there is probably something else I haven't thought of
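For context on the entry_points mechanism mentioned in the description, here is a hedged sketch of how a plugin package might declare an entry point and how the host could discover it. The group name "streamz.nodes" and the RepeatNode class are assumptions made for illustration, and discovery here uses the standard-library importlib.metadata rather than whatever this PR actually uses:

```python
from importlib.metadata import entry_points

# A plugin package would declare something like this in its setup.py
# (group name and class name are hypothetical):
#
#   setup(
#       name="streamz_example_plugin",
#       entry_points={
#           "streamz.nodes": ["repeat = streamz_example_plugin:RepeatNode"],
#       },
#   )


def find_plugins(group):
    """Return the installed entry points declared under the given group."""
    eps = entry_points()
    if hasattr(eps, "select"):       # Python 3.10 and later
        return list(eps.select(group=group))
    return list(eps.get(group, ()))  # Python 3.8 / 3.9 return a plain mapping


plugins = find_plugins("streamz.nodes")
names = [ep.name for ep in plugins]
```

Listing the entry points this way only reads package metadata; nothing from the plugin packages is imported until load() is called on one of them.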