Design auto-registration of pipelines #1284
Comments
Signed-off-by: Laurens Vijnck <laurens_vijnck@mckinsey.com>
On second thoughts I think it's clear that the One other thing I'm wondering though is a more extreme Part 2 of this where we remove the
Support for this idea in #1436 to enable a plugin that does YAML pipeline definitions.
Suggestion number 2 in the previous comment seems most useful, although I would make the default function be the current one in
Number 3 seems very powerful and very simple to implement.
Thanks for the comments @idanov. I like your idea 3 a lot. Building on it, the only thing I wonder is what the default value of
Following our current model, in which a user doesn't need to touch settings.py unless they're trying to do something relatively advanced/customised, I would say that ultimately the default value should be the one that is most commonly useful for beginner users. This would be option 2 or 3, since then a beginner user doesn't need to touch settings.py or pipeline_registry.py in order to run a simple kedro project (e.g. I could do the whole spaceflights tutorial without needing to touch those files at all). However, although option 3 is non-breaking, it would be a bit of a departure from current behaviour. So my feeling is that option 1 is probably right for now, and we give option 2 and/or 3 as commented-out suggestions in settings.py (like we do with
On second thoughts, I'm not sure how much I like idea 3... I'm guessing that a common pattern would be:
On point 2, as in my original example, what I would like to do is something like this:
The sequential nature of idea 3 means that this wouldn't be possible unless we let
Notes from technical design on 29 June:
Conclusion:
Questions: in the future would we still add the settings.py option and/or remove pipeline_registry.py?
To be implemented in #1664. Following discussion with Ivan, we decided there's no need to add an option for settings.py any more.
Following a discussion in backlog grooming, the idea of auto-registering pipelines met with general approval so this is a ticket to design how to do it. See #1078 for original context and motivation.
The end goal
When I do `kedro pipeline create` it creates the following structure:
Assuming they're following the above structure, a user should be able to run `kedro run --pipeline=a` without needing to edit `pipeline_registry.py` at all.
`kedro run` should run all pipelines, i.e. we have `__default__ = a + b + c`.
It should be possible for a user to overwrite these automatic registrations if they want to by editing `pipeline_registry.py` as they can now.
Ultimately the above structure should result in a `pipeline_registry.py` that acts like the following (but does not actually have this code):
Proposed implementation
Something that is very roughly like this:
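As a minimal sketch of what such a helper could look like (an illustration under stated assumptions, not the code originally proposed): it assumes that every subpackage of `<package>.pipelines` exposes a `create_pipeline()` function, it takes the project package name as a hypothetical `package_name` argument, and it relies only on the returned pipeline objects supporting `+`.

```python
import importlib
import pkgutil
from typing import Any, Dict


def get_default_registered_pipelines(package_name: str) -> Dict[str, Any]:
    """Collect one pipeline per subpackage of ``<package_name>.pipelines``.

    Assumptions (not confirmed by the proposal): each pipeline lives in its
    own subpackage and exposes a ``create_pipeline()`` callable, and the
    objects it returns can be combined with ``+``.
    """
    pipelines_pkg = importlib.import_module(f"{package_name}.pipelines")
    pipelines: Dict[str, Any] = {}

    for module_info in pkgutil.iter_modules(pipelines_pkg.__path__):
        if not module_info.ispkg:
            continue  # plain modules are ignored; only subpackages count
        module = importlib.import_module(
            f"{pipelines_pkg.__name__}.{module_info.name}"
        )
        create = getattr(module, "create_pipeline", None)
        if callable(create):
            pipelines[module_info.name] = create()

    if pipelines:
        # __default__ is the combination of every discovered pipeline,
        # taken in alphabetical order of the subpackage names.
        ordered = [pipelines[name] for name in sorted(pipelines)]
        default = ordered[0]
        for pipeline in ordered[1:]:
            default = default + pipeline
        pipelines["__default__"] = default

    return pipelines
```

Discovery by convention keeps `pipeline_registry.py` free of per-pipeline imports; the trade-off is that a subpackage without `create_pipeline()` is silently skipped in this sketch.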
Then, if wanted, a user could change the default behaviour like this:
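A hedged illustration of that kind of override, assuming the automatic registrations arrive as a plain dictionary. Here `get_default_registered_pipelines` is a hypothetical stand-in that returns lists of node names in place of real pipeline objects, and the `b_then_c` entry is an invented name used only for the example.

```python
from typing import Any, Dict


def get_default_registered_pipelines() -> Dict[str, Any]:
    """Hypothetical stand-in for the automatic discovery step.

    Lists of node names stand in for real pipeline objects so that the
    example runs without Kedro installed.
    """
    return {
        "a": ["a_node"],
        "b": ["b_node"],
        "c": ["c_node"],
        "__default__": ["a_node", "b_node", "c_node"],
    }


def register_pipelines() -> Dict[str, Any]:
    """Start from the automatic registrations, then override selectively."""
    pipelines = get_default_registered_pipelines()
    # Change what a bare `kedro run` executes: only a and b, not c.
    pipelines["__default__"] = pipelines["a"] + pipelines["b"]
    # Register an extra combination under a new (illustrative) name.
    pipelines["b_then_c"] = pipelines["b"] + pipelines["c"]
    return pipelines
```

Because the defaults are an ordinary dictionary, a user who changes nothing gets the automatic behaviour, and any customisation is a visible, explicit assignment.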
Questions:
Where should `get_default_registered_pipelines` go? The Zen of Kedro says "A sprinkle of magic is better than a spoonful of it", which suggests maybe it goes in pipeline_registry.py itself. But maybe it's confusing for a user to have this weird-looking code in such a core user-facing file (like `hooks.py` seemed to me when I first saw it)? So maybe better to have it defined on framework-side and then done as `import kedro.pipeline...` instead?
Alternative implementations
`kedro.project._ProjectPipelines` that automatically registers pipelines. Sounds a bit too magical to me though - I prefer the explicitness of the above.
`kedro pipeline create`. Sounds totally horrible though.