Modular Pipelines #220
Comments
@yetudada Not sure if this is where you'd like the feedback, but this is essentially how we've been building all our pipelines. One of the sticking points I've found is how to write tests that ensure the pipelines work within a kedro context. What I've resorted to doing is writing tests that create temporary kedro projects, then test the pipelines within them.
Hi @EigenJT, thank you so much for sharing your feedback! Could you elaborate a bit on what you mean? Is that end-to-end, from loading fake data to the context identifying the right pipeline and running the nodes with the right inputs/outputs? Your tests sound like integration tests rather than unit tests to me; correct me if I'm wrong. Are you testing that different variations of a pipeline (with […]) work as expected?
Hi @lorenabalan, yup, integration tests is the best description. We write unit tests for the nodes, but to ensure the pipeline actually works as intended, we create fake data, fake parameters and a fake catalog, try the pipeline, evaluate the results, then tear the whole thing down. Essentially testing the results of a […]. We haven't tried […] yet.
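To make the pattern concrete, here is a minimal, hypothetical sketch of that integration-test shape: build fake inputs, run the pipeline end to end, and compare the outputs against known-good values. It deliberately uses plain Python instead of kedro's own classes, and every name (`reformat_node`, `run_pipeline`, the catalog keys) is illustrative, not part of kedro's API.

```python
def reformat_node(raw_rows):
    """Toy node: normalise a list of (name, value) tuples into dicts."""
    return [{"name": n.strip().lower(), "value": float(v)} for n, v in raw_rows]


def run_pipeline(catalog):
    """Toy 'runner': wire nodes together through a dict standing in for a catalog."""
    catalog["formatted"] = reformat_node(catalog["raw"])
    return catalog


def test_pipeline_end_to_end():
    # Fake data standing in for a temporary project / fake catalog.
    catalog = {"raw": [(" Alice ", "1"), ("BOB", "2.5")]}
    result = run_pipeline(catalog)
    # Validate the values *and* the schema, then let the fakes be garbage-collected
    # (the real version would tear down a temporary kedro project here).
    assert result["formatted"] == [
        {"name": "alice", "value": 1.0},
        {"name": "bob", "value": 2.5},
    ]


test_pipeline_end_to_end()
```

In a real project the fake catalog would be a `DataCatalog` of in-memory datasets and the runner would be kedro's, but the assert-against-known-output structure is the same.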
My bad, when you said "this is essentially how we've been building all our pipelines" I thought you meant with […]. Edit: Also maybe worth taking a peek at the docs in develop; I suspect something like this could make your tests easier to reason about: https://kedro.readthedocs.io/en/latest/04_user_guide/06_pipelines.html#using-a-custom-runner
@lorenabalan Ah yeah, that would make things much easier. Regarding the additional end-to-end testing: ensuring that the pipeline actually performs what it was supposed to do. So given fake inputs, validate that the resulting outputs are exactly what they're supposed to be; a data test, in short. As an example, a pipeline dedicated to reformatting a certain filetype would have its final output tested against a known output (down to the individual values, as well as the schema).

As an aside, one issue I've run into is that when pipelines are written in isolation but then added together, the resulting pipeline can behave in an unexpected manner (the order in which nodes are run can change, for example). Not so much a test, but it would be interesting to have some way of requiring each modular pipeline to complete before another is kicked off. Maybe something like total_pipeline = combine_pipelines([pipeline_1, pipeline_2, pipeline_3, ...], enforce_order=True)
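The `combine_pipelines(..., enforce_order=True)` idea suggested above is not a kedro API, but one way it could work is sketched below: represent each pipeline as a mapping from node name to its dependencies, and make every node of pipeline N depend on all nodes of pipeline N−1. All names here are hypothetical.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+


def combine_pipelines(pipelines, enforce_order=False):
    """Hypothetical combine. Each pipeline is a {node: {dependencies}} dict.

    With enforce_order=True, every node of pipeline N gains a dependency on
    all nodes of pipeline N-1, so each pipeline completes before the next starts.
    """
    combined = {}
    previous_nodes = set()
    for pipeline in pipelines:
        for node, deps in pipeline.items():
            deps = set(deps)
            if enforce_order:
                deps |= previous_nodes  # wait on the whole previous pipeline
            combined[node] = deps
        previous_nodes = set(pipeline)
    return combined


p1 = {"clean": set(), "validate": {"clean"}}
p2 = {"train": set(), "evaluate": {"train"}}

total = combine_pipelines([p1, p2], enforce_order=True)
order = list(TopologicalSorter(total).static_order())

# All of p1 now runs before any of p2.
assert order.index("validate") < order.index("train")
```

The trade-off is lost parallelism: nodes of pipeline N that don't actually need pipeline N−1's outputs are still forced to wait.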
You're right in that order is not necessarily guaranteed, though that should only be at tie-level (nodes with the same number of dependencies), as we leave it to […].
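The tie-level point can be shown with a tiny topological-sort example (using the stdlib `graphlib` for illustration; kedro's internal resolution differs): nodes with no dependency between them may legitimately come out in either order, and the ordering only becomes deterministic once a dataset dependency links them explicitly.

```python
from graphlib import TopologicalSorter

# Two nodes with no dependency between them are a "tie": any topological
# order is valid, which is why a combined pipeline can run in an
# unexpected sequence.
tied = {"node_a": set(), "node_b": set()}
order = list(TopologicalSorter(tied).static_order())
assert sorted(order) == ["node_a", "node_b"]  # either order is correct

# Making the link explicit (node_b consumes an output of node_a)
# pins the order down.
linked = {"node_a": set(), "node_b": {"node_a"}}
order = list(TopologicalSorter(linked).static_order())
assert order == ["node_a", "node_b"]
```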
Also, as an update (for whoever is interested): we're looking to include this feature in the next breaking release (0.16.0). We've merged […].
All makes sense, and we've been making those explicit links between pipelines on our end to ensure that things run as expected. The […]
Hi @EigenJT! We hope you've been able to make use of the new modular pipeline workflow. We're going to close this issue as part of our GitHub issue clean-up, but please do comment to re-open this issue or create a new one based on your requirements.
Description
We've seen something incredible evolve through continued use of Kedro. Teams around the world are starting to use Kedro to create stores of reusable pipelines.
Last year, we introduced basic support for Modular Pipelines and this year we're doubling down on this area.
In our world, a modular pipeline is a series of generalised and connected Python functions that have inputs and outputs. A modular pipeline: […]
Context
The final evolution of Modular Pipelines will see an ecosystem of reusable pipelines. However, for now we want to focus on allowing users to easily add pre-assembled pipelines to an existing or new Kedro project and export their own pre-assembled pipelines.
Next steps
Give us feedback if you've tried Modular Pipelines and the basic support we have for using them, like pipeline.transform(). Modular Pipelines also have implications for kedro-viz, and we can't wait to show you what we have in mind for this.