Description
I have a project with a huge number of pipelines generated programmatically (in a loop). Generating those pipelines takes a long time, and the time appears to grow quadratically with the number of pipelines (see the chart below).
Chart legend: n is the number of pipelines to sum; time is the time in seconds.
The problem has two variants:
A large number of small pipelines.
A small number of pipelines with a large node count (200+).
Context
While Kedro encourages keeping nodes small and pipelines modular, extensive use of both approaches leads to slow project startup times.
The impact is most severe in mono-repo setups, where multiple teams work in the same project but on separate pipelines. In such setups the number of pipelines grows quickly as development proceeds.
Possible causes
The main problem is that, internally, the pipelines are summed, which goes through __add__ and then __init__ in the Pipeline class. The slowness of the operations inside __add__ itself is partially addressed by #3146, but the problem with __init__ remains. The calls to _topologically_sorted in the constructor may be the root cause; confirming this would require more detailed profiling.
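To illustrate why repeated summing can become quadratic, here is a minimal toy model. It is a sketch under assumptions, not Kedro's implementation: ToyPipeline, its work counter, and total_work are hypothetical names, and the constructor cost is simply assumed to be linear in the node count. Each addition rebuilds a pipeline from all nodes accumulated so far, so the constructor work adds up across the whole sum.

```python
# Toy model only: ToyPipeline is a hypothetical stand-in, not Kedro's Pipeline class.
from functools import reduce


class ToyPipeline:
    """Holds node names and counts how many nodes each constructor call touches."""

    work = 0  # total nodes processed by __init__ across all instances

    def __init__(self, nodes):
        self.nodes = list(nodes)
        # Assumption: construction cost is at least linear in the node count
        # (standing in for validation and topological sorting).
        ToyPipeline.work += len(self.nodes)

    def __add__(self, other):
        # Each addition builds a brand-new pipeline from all nodes seen so far.
        return ToyPipeline(self.nodes + other.nodes)


def total_work(n_pipelines, nodes_each=1):
    """Sum n_pipelines toy pipelines one by one and return the constructor work."""
    pipelines = [
        ToyPipeline(f"p{i}_n{j}" for j in range(nodes_each))
        for i in range(n_pipelines)
    ]
    ToyPipeline.work = 0  # count only the work done while summing
    reduce(lambda a, b: a + b, pipelines)
    return ToyPipeline.work


for n in (10, 20, 40):
    print(n, total_work(n))
```

In this model, doubling the number of pipelines roughly quadruples the counted work, even though every individual constructor call is only linear. If the real constructor is more expensive than linear, the total would grow faster still.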
Your Environment
Kedro version used: 0.18.13
Python version used: 3.10.13
Operating system and version: macOS 13.0.1
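Steps to Reproduce
1. Use the spaceflights starter.
2. Modify the data_processing pipeline as shown in the collapsed "Show the code" block of the original issue (the code is not included here).
3. Run kedro registry list.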
Expected Result
Pipelines are listed quickly.
Actual Result
The pipelines are listed after a few minutes (depending on the number of pipelines/nodes), with the time increasing quadratically (see the chart above).
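One way to check the "quadratic" claim against measured timings is to estimate the growth exponent from a log-log fit. The sketch below is an assumption-laden helper, not part of Kedro: growth_exponent is a hypothetical name, and the sample values are synthetic, not measurements from this issue. A slope near 2 would be consistent with quadratic growth, while a slope near 1 would indicate linear growth.

```python
import math


def growth_exponent(ns, times):
    """Least-squares slope of log(time) against log(n)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic data, NOT measurements from this issue: time = 0.01 * n**2.
ns = [50, 100, 200, 400]
times = [0.01 * n**2 for n in ns]
print(round(growth_exponent(ns, times), 2))
```

With real data, ns would be the numbers of pipelines summed and times the wall-clock seconds taken by kedro registry list for each.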