Optimise pipeline addition and creation #3730

Merged (12 commits) on Mar 26, 2024


@idanov idanov commented Mar 21, 2024

Description

Creating large pipelines in Kedro is very slow and can take tens of seconds, as reported in #3167.

After some investigation, it turned out that a number of factors contribute to this:

  • Usage of self.nodes within the Pipeline class triggers many unneeded copies (original PR: Don't toposort nodes in non-user-facing operations #3146)
  • Eager toposorting upon creation of a Pipeline object
  • Re-tagging nodes on Pipeline object creation, caused by each node being copied and revalidated

This PR fully addresses the first two. It addresses the third only partially: nodes are no longer copied and revalidated when no new tags are added.
The first one was originally addressed by #3146, which has not been merged yet.
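The two ideas above (sorting lazily, and copying nodes only when tags are supplied) can be sketched in isolation. This is a minimal illustration, not Kedro's implementation: `MiniNode`, `MiniPipeline` and `sort_count` are hypothetical names invented for the example, and it assumes Python 3.9+ so that `graphlib` comes from the standard library.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from functools import cached_property
from graphlib import TopologicalSorter  # standard library from Python 3.9


@dataclass(frozen=True)
class MiniNode:
    """Hypothetical stand-in for a node: a name plus its inputs and outputs."""

    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]
    tags: frozenset[str] = frozenset()

    def tag(self, tags: set[str]) -> MiniNode:
        # Tagging returns a new object instead of mutating this one.
        return replace(self, tags=self.tags | frozenset(tags))


class MiniPipeline:
    """Hypothetical stand-in for a pipeline; not Kedro's Pipeline class."""

    def __init__(self, nodes, tags=None):
        # Copy (re-tag) the nodes only when new tags are actually provided.
        if tags:
            nodes = [n.tag(set(tags)) for n in nodes]
        self._nodes = list(nodes)
        self.sort_count = 0  # instrumentation for this example only

    @cached_property
    def nodes(self) -> list[MiniNode]:
        # Topological sorting happens on first access, not in __init__.
        self.sort_count += 1
        producers = {out: n for n in self._nodes for out in n.outputs}
        graph = {
            n: {producers[i] for i in n.inputs if i in producers}
            for n in self._nodes
        }
        return list(TopologicalSorter(graph).static_order())

    def __add__(self, other: MiniPipeline) -> MiniPipeline:
        # Combining uses the raw node lists: no ordering or copy is needed.
        return MiniPipeline(self._nodes + other._nodes)


a = MiniNode("a", (), ("x",))
b = MiniNode("b", ("x",), ("y",))
total = MiniPipeline([b]) + MiniPipeline([a])
print(total.sort_count)                # 0: nothing has been sorted yet
print([n.name for n in total.nodes])   # ['a', 'b']
```

With this shape, repeated additions never pay for sorting; the cost is incurred once, when the ordered nodes are first requested.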

Development notes

For testing, @marrrcin's test from #3167 was used, and pyinstrument (https://pyinstrument.readthedocs.io/en/latest/) profiling was run before and after.

After the changes from this PR and #3728, the time it takes to sum 51 pipelines dropped from ~15s to ~6s, a reduction of about 60%. All of this was tested on Python 3.8 with the graphlib backport; the built-in graphlib may be much faster than the backport and might yield better results.
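For reference, the relative reduction can be recomputed from the "Sum of N pipelines took" lines in the Before and After runs in this description. The snippet only does arithmetic on those reported timings; `reduction_percent` is a helper name made up for the example.

```python
# Timings in seconds, copied from the "Before" and "After" runs in this description.
before = {11: 0.685, 21: 2.514, 31: 5.570, 41: 10.029, 51: 15.612}
after = {11: 0.276, 21: 1.099, 31: 2.391, 41: 4.181, 51: 6.448}


def reduction_percent(old: float, new: float) -> float:
    """Relative time saved, as a percentage of the old time."""
    return (old - new) / old * 100


reductions = {n: reduction_percent(before[n], after[n]) for n in before}
for n, pct in reductions.items():
    print(f"{n} pipelines: {before[n]:.3f}s -> {after[n]:.3f}s ({pct:.1f}% less)")
```

The reduction stays between roughly 56% and 60% across all five sizes, which matches the "about 60%" figure quoted for 51 pipelines.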

Further improvements could be made by removing unnecessary set() and list() operations, by doing a lightweight check for cycles without instantiating graphlib.TopologicalSorter on init, and potentially by making Node and Pipeline use attrs. The last of these would help ensure that they remain immutable, as a previous contribution apparently sneaked mutability into the Node class, which goes against the idea of stateless nodes:

@func.setter
def func(self, func: Callable) -> None:
    """Sets the underlying function of the node.
    Useful if user wants to decorate the function in a node's Hook implementation.

    Args:
        func: The new function for node's execution.
    """
    self._func = func
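One of the further improvements mentioned above is a lightweight cycle check that does not build a graphlib.TopologicalSorter. A possible shape for such a check is sketched below using Kahn's algorithm. It assumes the dependencies are available as a dict mapping each node to the set of nodes it depends on; `has_cycle` is a hypothetical helper, not part of Kedro, and whether it would actually be faster than graphlib would need profiling.

```python
from collections import deque
from typing import Dict, Hashable, Set


def has_cycle(dependencies: Dict[Hashable, Set[Hashable]]) -> bool:
    """Return True if the dependency graph contains a cycle (Kahn's algorithm).

    `dependencies` maps each node to the set of nodes it depends on.
    """
    # Collect every node, including ones that only appear as a dependency.
    nodes = set(dependencies)
    for parents in dependencies.values():
        nodes |= parents

    # Number of unresolved dependencies per node, and the reverse edges.
    remaining = {n: len(dependencies.get(n, ())) for n in nodes}
    children = {n: [] for n in nodes}
    for node, parents in dependencies.items():
        for parent in parents:
            children[parent].append(node)

    ready = deque(n for n, count in remaining.items() if count == 0)
    visited = 0
    while ready:
        current = ready.popleft()
        visited += 1
        for child in children[current]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)

    # Nodes on a cycle never reach a count of zero, so they are never visited.
    return visited != len(nodes)


print(has_cycle({"b": {"a"}, "c": {"b"}}))   # False
print(has_cycle({"a": {"b"}, "b": {"a"}}))   # True
```

The check only counts how many nodes can be resolved, so it never materialises an ordering; the full sort can then be deferred until the ordered nodes are needed.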

Before:

╰─❯ pyinstrument --show '*/kedro/pipeline/*' -m kedro registry list
Sum of 1 pipelines took: 0.000s
Sum of 11 pipelines took: 0.685s
Sum of 21 pipelines took: 2.514s
Sum of 31 pipelines took: 5.570s
Sum of 41 pipelines took: 10.029s
Sum of 51 pipelines took: 15.612s
- __default__
- data_processing
- data_science


  _     ._   __/__   _ _  _  _ _/_   Recorded: 18:16:21  Samples:  58702
 /_//_/// /_\ / //_// / //_'/ //     Duration: 72.331    CPU time: 67.571
/   _/                      v4.6.2

Program: pyinstrument --show */kedro/pipeline/* -m kedro registry list

72.323 <module>  kedro/__main__.py:1
├─ 69.608 main  kedro/framework/cli/cli.py:225
│     [38 frames hidden]  kedro, click, importlib_metadata, imp...
│        63.056 _ProjectPipelines._load_data  kedro/framework/project/__init__.py:176
│        └─ 63.054 register_pipelines  kedro_spaceflights/pipeline_registry.py:8
│           └─ 62.366 find_pipelines  kedro/framework/project/__init__.py:322
│                 [3 frames hidden]  kedro, importlib
│                    58.453 _create_pipeline  kedro/framework/project/__init__.py:299
│                    └─ 58.447 create_pipeline  kedro_spaceflights/pipelines/data_processing/pipeline.py:7
│                       ├─ 56.910 Pipeline.__add__  kedro/pipeline/pipeline.py:181
│                       │  ├─ 55.767 Pipeline.__init__  kedro/pipeline/pipeline.py:80
│                       │  │  ├─ 28.988 _topologically_sorted  kedro/pipeline/pipeline.py:887
│                       │  │  │  └─ 28.988 <listcomp>  kedro/pipeline/pipeline.py:912
│                       │  │  │     ├─ 22.526 Node.__lt__  kedro/pipeline/node.py:184
│                       │  │  │     │  ├─ 19.511 Node._unique_key  kedro/pipeline/node.py:165
│                       │  │  │     │  │  ├─ 7.283 hashable  kedro/pipeline/node.py:167
│                       │  │  │     │  │  │  ├─ 4.472 [self]  kedro/pipeline/node.py
│                       │  │  │     │  │  │  └─ 2.811 isinstance  <built-in>
│                       │  │  │     │  │  ├─ 6.322 Node.name  kedro/pipeline/node.py:264
│                       │  │  │     │  │  │  ├─ 4.475 [self]  kedro/pipeline/node.py
│                       │  │  │     │  │  │  └─ 1.847 Node.namespace  kedro/pipeline/node.py:289
│                       │  │  │     │  │  └─ 5.906 [self]  kedro/pipeline/node.py
│                       │  │  │     │  └─ 2.698 [self]  kedro/pipeline/node.py
│                       │  │  │     └─ 5.780 toposort  toposort.py:47
│                       │  │  │           [4 frames hidden]  toposort
│                       │  │  │              2.199 Node.__hash__  kedro/pipeline/node.py:189
│                       │  │  │              └─ 1.839 Node._unique_key  kedro/pipeline/node.py:165
│                       │  │  │              2.094 <dictcomp>  toposort.py:61
│                       │  │  │              ├─ 1.109 Node.__hash__  kedro/pipeline/node.py:189
│                       │  │  │              │  └─ 0.954 Node._unique_key  kedro/pipeline/node.py:165
│                       │  │  │              1.186 <dictcomp>  toposort.py:79
│                       │  │  │              └─ 1.093 Node.__hash__  kedro/pipeline/node.py:189
│                       │  │  │                 └─ 0.900 Node._unique_key  kedro/pipeline/node.py:165
│                       │  │  ├─ 15.735 <listcomp>  kedro/pipeline/pipeline.py:148
│                       │  │  │  └─ 15.581 Node.tag  kedro/pipeline/node.py:251
│                       │  │  │     └─ 14.616 Node._copy  kedro/pipeline/node.py:145
│                       │  │  │        └─ 14.009 Node.__init__  kedro/pipeline/node.py:22
│                       │  │  │           ├─ 9.112 Node._validate_inputs  kedro/pipeline/node.py:501
│                       │  │  │           │  ├─ 4.069 signature  inspect.py:3103
│                       │  │  │           │  │     [7 frames hidden]  inspect
│                       │  │  │           │  └─ 3.861 Signature.bind  inspect.py:3032
│                       │  │  │           │        [3 frames hidden]  inspect
│                       │  │  │           ├─ 1.532 Node._validate_unique_outputs  kedro/pipeline/node.py:521
│                       │  │  │           │  └─ 0.812 Counter.__init__  collections/__init__.py:540
│                       │  │  │           ├─ 1.294 [self]  kedro/pipeline/node.py
│                       │  │  │           └─ 0.901 Node._validate_inputs_dif_than_outputs  kedro/pipeline/node.py:530
│                       │  │  ├─ 3.754 Pipeline.node_dependencies  kedro/pipeline/pipeline.py:325
│                       │  │  │  ├─ 2.148 <dictcomp>  kedro/pipeline/pipeline.py:334
│                       │  │  │  │  └─ 2.015 Node.__hash__  kedro/pipeline/node.py:189
│                       │  │  │  │     └─ 1.836 Node._unique_key  kedro/pipeline/node.py:165
│                       │  │  │  │        └─ 1.217 [self]  kedro/pipeline/node.py
│                       │  │  │  └─ 0.936 [self]  kedro/pipeline/pipeline.py
│                       │  │  ├─ 1.259 _validate_transcoded_inputs_outputs  kedro/pipeline/pipeline.py:861
│                       │  │  ├─ 1.076 Node.__hash__  kedro/pipeline/node.py:189
│                       │  │  │  └─ 0.911 Node._unique_key  kedro/pipeline/node.py:165
│                       │  │  ├─ 0.883 _strip_transcoding  kedro/pipeline/pipeline.py:46
│                       │  │  ├─ 0.879 _validate_unique_outputs  kedro/pipeline/pipeline.py:839
│                       │  │  │  └─ 0.854 Counter.__init__  collections/__init__.py:540
│                       │  │  │        [2 frames hidden]  collections
│                       │  │  └─ 0.803 <listcomp>  kedro/pipeline/pipeline.py:142
│                       │  │     └─ 0.775 [self]  kedro/pipeline/pipeline.py
│                       │  └─ 1.025 Node.__hash__  kedro/pipeline/node.py:189
│                       │     └─ 0.852 Node._unique_key  kedro/pipeline/node.py:165
│                       └─ 1.285 pipeline  kedro/pipeline/modular_pipeline.py:167
│                          └─ 0.991 Pipeline.__init__  kedro/pipeline/pipeline.py:80
│                    3.895 import_module  importlib/__init__.py:109
│                    └─ 3.864 <module>  kedro_spaceflights/pipelines/data_science/__init__.py:1
│                       └─ 3.859 <module>  kedro_spaceflights/pipelines/data_science/pipeline.py:1
│                          └─ 3.857 <module>  kedro_spaceflights/pipelines/data_science/nodes.py:1
│                             └─ 3.229 <module>  sklearn/__init__.py:1
│                                   [13 frames hidden]  sklearn, scipy, importlib
└─ 2.709 <module>  kedro/framework/cli/__init__.py:1
      [4 frames hidden]  kedro

After:

╰─❯ pyinstrument --show '*/kedro/pipeline/*' -m kedro registry list
Sum of 1 pipelines took: 0.000s
Sum of 11 pipelines took: 0.276s
Sum of 21 pipelines took: 1.099s
Sum of 31 pipelines took: 2.391s
Sum of 41 pipelines took: 4.181s
Sum of 51 pipelines took: 6.448s
- __default__
- data_processing
- data_science


  _     ._   __/__   _ _  _  _ _/_   Recorded: 18:02:59  Samples:  25956
 /_//_/// /_\ / //_// / //_'/ //     Duration: 36.311    CPU time: 33.158
/   _/                      v4.6.2

Program: pyinstrument --show */kedro/pipeline/* -m kedro registry list

36.305 <module>  kedro/__main__.py:1
├─ 34.037 main  kedro/framework/cli/cli.py:225
│     [53 frames hidden]  kedro, click, importlib_metadata, imp...
│        27.958 _ProjectPipelines._load_data  kedro/framework/project/__init__.py:176
│        └─ 27.953 register_pipelines  kedro_spaceflights/pipeline_registry.py:8
│           └─ 27.607 find_pipelines  kedro/framework/project/__init__.py:322
│                 [3 frames hidden]  kedro, importlib
│                    24.602 _create_pipeline  kedro/framework/project/__init__.py:299
│                    └─ 24.598 create_pipeline  kedro_spaceflights/pipelines/data_processing/pipeline.py:7
│                       ├─ 23.641 Pipeline.__add__  kedro/pipeline/pipeline.py:192
│                       │  ├─ 22.479 Pipeline.__init__  kedro/pipeline/pipeline.py:78
│                       │  │  ├─ 7.657 TopologicalSorter.prepare  graphlib/graphlib.py:84
│                       │  │  │     [3 frames hidden]  graphlib
│                       │  │  │        7.588 TopologicalSorter._find_cycle  graphlib/graphlib.py:196
│                       │  │  │        ├─ 6.419 Node.__hash__  kedro/pipeline/node.py:189
│                       │  │  │        │  ├─ 5.352 Node._unique_key  kedro/pipeline/node.py:165
│                       │  │  │        │  │  ├─ 1.937 hashable  kedro/pipeline/node.py:167
│                       │  │  │        │  │  │  ├─ 1.187 [self]  kedro/pipeline/node.py
│                       │  │  │        │  │  │  └─ 0.750 isinstance  <built-in>
│                       │  │  │        │  │  ├─ 1.810 Node.name  kedro/pipeline/node.py:264
│                       │  │  │        │  │  │  ├─ 1.262 [self]  kedro/pipeline/node.py
│                       │  │  │        │  │  │  └─ 0.548 Node.namespace  kedro/pipeline/node.py:289
│                       │  │  │        │  │  └─ 1.605 [self]  kedro/pipeline/node.py
│                       │  │  │        │  ├─ 0.685 [self]  kedro/pipeline/node.py
│                       │  │  │        │  └─ 0.382 hash  <built-in>
│                       │  │  ├─ 4.555 TopologicalSorter.__init__  graphlib/graphlib.py:41
│                       │  │  │     [4 frames hidden]  graphlib
│                       │  │  │        3.326 TopologicalSorter._get_nodeinfo  graphlib/graphlib.py:51
│                       │  │  │        └─ 2.899 Node.__hash__  kedro/pipeline/node.py:189
│                       │  │  │           └─ 2.515 Node._unique_key  kedro/pipeline/node.py:165
│                       │  │  │              ├─ 1.310 [self]  kedro/pipeline/node.py
│                       │  │  │              ├─ 0.609 Node.name  kedro/pipeline/node.py:264
│                       │  │  │              │  └─ 0.442 [self]  kedro/pipeline/node.py
│                       │  │  │              └─ 0.596 hashable  kedro/pipeline/node.py:167
│                       │  │  ├─ 3.309 Pipeline.node_dependencies  kedro/pipeline/pipeline.py:336
│                       │  │  │  ├─ 1.755 <dictcomp>  kedro/pipeline/pipeline.py:345
│                       │  │  │  │  └─ 1.633 Node.__hash__  kedro/pipeline/node.py:189
│                       │  │  │  │     └─ 1.453 Node._unique_key  kedro/pipeline/node.py:165
│                       │  │  │  │        └─ 0.827 [self]  kedro/pipeline/node.py
│                       │  │  │  ├─ 0.895 [self]  kedro/pipeline/pipeline.py
│                       │  │  │  └─ 0.411 _strip_transcoding  kedro/pipeline/pipeline.py:44
│                       │  │  ├─ 1.282 _validate_transcoded_inputs_outputs  kedro/pipeline/pipeline.py:882
│                       │  │  │  └─ 0.427 _strip_transcoding  kedro/pipeline/pipeline.py:44
│                       │  │  ├─ 1.090 Node.__hash__  kedro/pipeline/node.py:189
│                       │  │  │  └─ 0.885 Node._unique_key  kedro/pipeline/node.py:165
│                       │  │  ├─ 0.900 _strip_transcoding  kedro/pipeline/pipeline.py:44
│                       │  │  │  └─ 0.650 _transcode_split  kedro/pipeline/pipeline.py:21
│                       │  │  │     └─ 0.405 [self]  kedro/pipeline/pipeline.py
│                       │  │  ├─ 0.872 _validate_unique_outputs  kedro/pipeline/pipeline.py:860
│                       │  │  │  └─ 0.844 Counter.__init__  collections/__init__.py:540
│                       │  │  │        [2 frames hidden]  collections
│                       │  │  │           0.844 Counter.update  collections/__init__.py:608
│                       │  │  │           └─ 0.404 _strip_transcoding  kedro/pipeline/pipeline.py:44
│                       │  │  ├─ 0.628 _validate_duplicate_nodes  kedro/pipeline/pipeline.py:825
│                       │  │  │  └─ 0.509 _check_node  kedro/pipeline/pipeline.py:829
│                       │  │  ├─ 0.534 [self]  kedro/pipeline/pipeline.py
│                       │  │  ├─ 0.429 <dictcomp>  kedro/pipeline/pipeline.py:151
│                       │  │  └─ 0.423 <listcomp>  kedro/pipeline/pipeline.py:140
│                       │  │     └─ 0.393 [self]  kedro/pipeline/pipeline.py
│                       │  └─ 1.083 Node.__hash__  kedro/pipeline/node.py:189
│                       │     └─ 0.910 Node._unique_key  kedro/pipeline/node.py:165
│                       └─ 0.857 pipeline  kedro/pipeline/modular_pipeline.py:167
│                          └─ 0.553 Pipeline.__init__  kedro/pipeline/pipeline.py:78
│                    2.978 import_module  importlib/__init__.py:109
│                    └─ 2.969 <module>  kedro_spaceflights/pipelines/data_science/__init__.py:1
│                       └─ 2.968 <module>  kedro_spaceflights/pipelines/data_science/pipeline.py:1
│                          └─ 2.966 <module>  kedro_spaceflights/pipelines/data_science/nodes.py:1
│                             ├─ 2.391 <module>  sklearn/__init__.py:1
│                             │     [16 frames hidden]  sklearn, scipy
│                             └─ 0.571 <module>  sklearn/linear_model/__init__.py:1
└─ 2.262 <module>  kedro/framework/cli/__init__.py:1
      [11 frames hidden]  kedro, dynaconf

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

Signed-off-by: Ivan Danov <idanov@users.noreply.github.com>
@idanov idanov self-assigned this Mar 21, 2024
@idanov idanov requested a review from merelcht as a code owner March 21, 2024 18:29
@marrrcin

Awesome job, huge improvement 👏🏻

As for:

potentially making Node and Pipeline use attrs. The latter will help ensuring that they remain immutable, as apparently a previous contribution snuck-in mutability to the Node class which is against the idea of stateless nodes

That change would be really unfortunate, because the flow of having a hook that changes the node.func at runtime is a common pattern I've seen (and also used / recommended) multiple times.

Examples:

idanov commented Mar 22, 2024

This is a side conversation, not related to the PR, but responding to:

That change would be really unfortunate, because the flow of having a hook that changes the node.func at runtime is a common pattern I've seen (and also used / recommended) multiple times.

The immutability change would unfortunately be a breaking change, so it is unlikely to happen soon. Nevertheless, we can make them attrs objects without making them fully immutable and with no breaking changes.

The introduction of mutability was a mistake we should have avoided in the first place. Immutable objects are one of the best ways to ensure that you can pass a node around without copying and keep the code safe and bug-free. There are different patterns we can apply to address your use cases without needing mutability.

The current pattern is quite unsafe: a plugin can attach a completely different function, as no validation is applied. Moreover, it is the only mutating method there; for example, if you apply new tags, you get a new copy of the node and the current node is not modified. It is completely out of place with how nodes currently work and with the idea of what makes a node a node, and not just a function.
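To illustrate the copy-on-change behaviour described above, here is one possible pattern for swapping a node's function without a setter. It is only a sketch: it uses a frozen dataclass from the standard library rather than attrs, and `FrozenNode`, `with_func` and `logged` are hypothetical names, not Kedro API or an agreed design.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class FrozenNode:
    """Hypothetical immutable node; not Kedro's Node class."""

    func: Callable
    name: str
    tags: FrozenSet[str] = frozenset()

    def tag(self, *tags: str) -> "FrozenNode":
        # Returns a new node; the original keeps its tags.
        return replace(self, tags=self.tags | frozenset(tags))

    def with_func(self, func: Callable) -> "FrozenNode":
        # Swaps the function by creating a new node rather than mutating.
        return replace(self, func=func)


def double(x: int) -> int:
    return 2 * x


def logged(func: Callable) -> Callable:
    """Example decorator that a hook might want to apply to a node's function."""

    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)

    return wrapper


original = FrozenNode(func=double, name="double")
decorated = original.with_func(logged(double))
tagged = original.tag("preprocessing")

try:
    original.func = logged(double)  # direct assignment is rejected
except FrozenInstanceError:
    print("FrozenNode is immutable")
```

Here `original` is unchanged after both calls, so it can be shared between pipelines without defensive copies, while a hook that needs a decorated function works with the `decorated` copy instead.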


@noklam noklam left a comment


Looks good to me.

During an investigation I did for #3575, I found that Pipeline is created multiple times, so avoiding heavy operations at Pipeline creation time should definitely help.

For example, kedro run first creates a Pipeline, Pipeline.filter then creates yet another Pipeline, and the pipeline factory method also creates multiple Pipeline objects in between. Those are the things I would look into next (unless we make the cost of Pipeline creation negligible).

The discussion of attrs and immutability should go to a separate issue.

Base automatically changed from feat/use-graphlib-toposort to main March 26, 2024 09:37

@merelcht merelcht left a comment


Thank you so much for this improvement @idanov ⭐ Great to see how relatively few changes already improve the performance so much 👍

@merelcht merelcht merged commit 177b93a into main Mar 26, 2024
41 checks passed
@merelcht merelcht deleted the feat/improve-pipeline-sum-performance branch March 26, 2024 11:17
AhdraMeraliQB pushed a commit that referenced this pull request Apr 17, 2024
* Create toposort groups only when needed
* Ensure that the suggest resume test has no node ordering requirement
* Ensure stable toposorting by grouping and ungrouping the result
* Delay toposorting until pipeline.nodes is used
* Avoid using .nodes when topological order or new copy is unneeded
* Copy the nodes only if tags are provided
* Remove unnecessary condition in self.nodes

Signed-off-by: Ivan Danov <idanov@users.noreply.github.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
AhdraMeraliQB added a commit that referenced this pull request Apr 17, 2024
)

* Update kedro-catalog-0.19.json (#3724)

* Update kedro-catalog-0.19.json

Signed-off-by: Anthony De Bortoli <anthony.debortoli@protonmail.com>

* Update set_up_vscode.md

Signed-off-by: Anthony De Bortoli <anthony.debortoli@protonmail.com>

---------

Signed-off-by: Anthony De Bortoli <anthony.debortoli@protonmail.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Update project tests directory structure in docs

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add docs on writing tests

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Drop dependency on toposort in favour of built-in graphlib (#3728)

* Replace toposort with graphlib (built-in from Python 3.9)

Signed-off-by: Ivan Danov <idanov@users.noreply.github.com>

* Create toposort groups only when needed

Signed-off-by: Ivan Danov <idanov@users.noreply.github.com>

* Update RELEASE.md and graphlib version constraints

Signed-off-by: Ivan Danov <idanov@users.noreply.github.com>

* Remove mypy-toposort

Signed-off-by: Ivan Danov <idanov@users.noreply.github.com>

* Ensure that the suggest resume test has no node ordering requirement

Signed-off-by: Ivan Danov <idanov@users.noreply.github.com>

* Ensure stable toposorting by grouping and ungrouping the result

Signed-off-by: Ivan Danov <idanov@users.noreply.github.com>

---------

Signed-off-by: Ivan Danov <idanov@users.noreply.github.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Optimise pipeline addition and creation (#3730)

* Create toposort groups only when needed
* Ensure that the suggest resume test has no node ordering requirement
* Ensure stable toposorting by grouping and ungrouping the result
* Delay toposorting until pipeline.nodes is used
* Avoid using .nodes when topological order or new copy is unneeded
* Copy the nodes only if tags are provided
* Remove unnecessary condition in self.nodes

Signed-off-by: Ivan Danov <idanov@users.noreply.github.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Expand robots.txt for Kedro-Viz and Kedro-Datasets docs (#3729)

* Add project to robots.txt

Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com>

* Add EOF

Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com>

---------

Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com>
Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Kedro need more uv (#3740)

* Kedro need more uv

Signed-off-by: Nok <nok.lam.chan@quantumblack.com>

* remove docker

Signed-off-by: Nok <nok.lam.chan@quantumblack.com>

---------

Signed-off-by: Nok <nok.lam.chan@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Resolve all path in Kedro (#3742)

* Kedro need more uv

Signed-off-by: Nok <nok.lam.chan@quantumblack.com>

* remove docker

Signed-off-by: Nok <nok.lam.chan@quantumblack.com>

* fix broken type hint and resolve project path

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* fix type hint

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* remove duplicate logic

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* adding nok.py is definitely an accident

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* fix test

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* remove print

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* add test

Signed-off-by: Nok <nok.lam.chan@quantumblack.com>

---------

Signed-off-by: Nok <nok.lam.chan@quantumblack.com>
Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Remove settings of rate limits and retries (#3769)

* double linkcheck limits

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* fix ratelimit

Signed-off-by: Nok <nok.lam.chan@quantumblack.com>

---------

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: Nok <nok.lam.chan@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Improve resume suggestions (#3719)

* Improve suggestions to resume a failed pipeline

- if dataset (or param) is persistent & shared, don't keep looking for ancestors
- only look for ancestors producing impersistent inputs
- minimize number of suggested nodes (= shorter message for the same pipeline)
- testable logic, tests cases outside of scenarios for sequential runner

- Use _EPHEMERAL attribute
- Move tests to separate file
- Docstring updates

---------

Signed-off-by: Ondrej Zacha <ondrej.zacha@okra.ai>
Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Build docs fix (#3773)

* Ignored forbidden url

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Returned linkscheck retries

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed odd comment

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

---------

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Clarify docs around custom resolvers (#3759)

* Updated custom resolver docs section

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated advanced configuration section for consistency

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated RELEASE.md

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated RELEASE.md

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Test linkcheck_workers decrease

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Increased the By default, the linkcheck_rate_limit_timeout to default

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Returned old docs build settings

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed typo

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Ignore forbidden url

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Returned linkcheck retries

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

---------

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add mlruns to gitignore to avoid pushing  mlflow local runs to github (#3765)

* Add mlruns to gitignore to avoid pushing  mlflow local runs to github

Signed-off-by: Yolan Honoré-Rougé <yolan.honore.rouge@gmail.com>

* update release.md

Signed-off-by: Yolan Honoré-Rougé <yolan.honore.rouge@gmail.com>

---------

Signed-off-by: Yolan Honoré-Rougé <yolan.honore.rouge@gmail.com>
Signed-off-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Update the dependencies page in the docs (#3772)

* Update the dependencies page

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Update docs/source/kedro_project_setup/dependencies.md

Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com>
Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>

* Fix lint

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Move the last line to notes

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

---------

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>
Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Change pipeline test location to project root/tests (#3731)

* Change pipeline test location to project root/tests

Signed-off-by: lrcouto <laurarccouto@gmail.com>

* Fix some test_pipeline tests

Signed-off-by: lrcouto <laurarccouto@gmail.com>

* Change delete pipeline to account for new structure

Signed-off-by: lrcouto <laurarccouto@gmail.com>

* Fix some tests

Signed-off-by: lrcouto <laurarccouto@gmail.com>

* Change tests path on micropkg

Signed-off-by: lrcouto <laurarccouto@gmail.com>

* Fix remaining tests

Signed-off-by: lrcouto <laurarccouto@gmail.com>

* Add changes to release notes

Signed-off-by: lrcouto <laurarccouto@gmail.com>

* Update file structure on micropackaging doc page

Signed-off-by: lrcouto <laurarccouto@gmail.com>

---------

Signed-off-by: lrcouto <laurarccouto@gmail.com>
Signed-off-by: L. R. Couto <57910428+lrcouto@users.noreply.github.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add an option for kedro new to skip telemetry (#3701)

* First draft for telemetry consent flag on kedro new

Signed-off-by: lrcouto <laurarccouto@gmail.com>

* Add functioning --telemetry option to kedro new

Signed-off-by: lrcouto <laurarccouto@gmail.com>

* Update tests to acknowledge new flag

Signed-off-by: lrcouto <laurarccouto@gmail.com>

* Add tests for kedro new --telemetry flag

Signed-off-by: lrcouto <laurarccouto@gmail.com>

* Add changes to documentation and release notes

Signed-off-by: lrcouto <laurarccouto@gmail.com>

* Minor change to docs

Signed-off-by: lrcouto <laurarccouto@gmail.com>

* Lint

Signed-off-by: lrcouto <laurarccouto@gmail.com>

* Remove outdated comment and correct type hint

Signed-off-by: lrcouto <laurarccouto@gmail.com>

* Update docs/source/get_started/new_project.md

Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: L. R. Couto <57910428+lrcouto@users.noreply.github.com>

* Lint

Signed-off-by: lrcouto <laurarccouto@gmail.com>

* Minor change on release note

Signed-off-by: lrcouto <laurarccouto@gmail.com>

---------

Signed-off-by: lrcouto <laurarccouto@gmail.com>
Signed-off-by: L. R. Couto <57910428+lrcouto@users.noreply.github.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Update documentation for OmegaConfigLoader (#3778)

* Update documentation for OmegaConfigLoader

Signed-off-by: Puneet Saini <99470400+puneeter@users.noreply.github.com>

* Update RELEASE.md

Signed-off-by: Puneet Saini <99470400+puneeter@users.noreply.github.com>

* Update RELEASE.md

Signed-off-by: Puneet Saini <99470400+puneeter@users.noreply.github.com>

* Update ignore-names.txt

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

---------

Signed-off-by: Puneet Saini <99470400+puneeter@users.noreply.github.com>
Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Fix path

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Lint

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add changes to RELEASE.md

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Address comments from code review

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Empty

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Remove unneeded imports

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Change recommendation from pytest config to editable install

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add negative testing example

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Replace Dict with dict

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Remove test classes

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Change the assert step for the integration test

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Fix error handling for OmegaConfigLoader (#3784)

* Update omegaconf_config.py

Signed-off-by: Puneet Saini <99470400+puneeter@users.noreply.github.com>

* Update RELEASE.md

Signed-off-by: Puneet Saini <99470400+puneeter@users.noreply.github.com>

* add a more complicated test case

Signed-off-by: Nok <nok.lam.chan@quantumblack.com>

---------

Signed-off-by: Puneet Saini <99470400+puneeter@users.noreply.github.com>
Signed-off-by: Nok <nok.lam.chan@quantumblack.com>
Co-authored-by: Nok <nok.lam.chan@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add Simon Brugman to TSC (#3780)

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Update technical_steering_committee.md (#3796)

Signed-off-by: Marcin Zabłocki <m.zablo@gmail.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Remove jmespath dependency (#3797)

Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Update spaceflights tutorial and starter requirements for kedro-datasets optional dependencies (#3664)

* Update spaceflights tutorial and starter requirements

Signed-off-by: lrcouto <laurarccouto@gmail.com>

* fix e2e tests

Signed-off-by: lrcouto <laurarccouto@gmail.com>

* Fix e2e tests by distinguishing `kedro-datasets` dependency for different python versions (#3802)

Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>

* Update docs/source/tutorial/tutorial_template.md

Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: L. R. Couto <57910428+lrcouto@users.noreply.github.com>

---------

Signed-off-by: lrcouto <laurarccouto@gmail.com>
Signed-off-by: L. R. Couto <57910428+lrcouto@users.noreply.github.com>
Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Consider Vale's suggestions

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Hide test in details

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Quick fix

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Typo (and wording changes)

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Update robots.txt (#3803)

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add a test for transcoding loops of 1 or more nodes (#3810)

Signed-off-by: Ivan Danov <idanov@users.noreply.github.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Ensure no nodes can depend on themselves even when transcoding is used (#3812)

* Factor out transcoding helpers into a private module

Signed-off-by: Ivan Danov <idanov@users.noreply.github.com>

* Ensure node input/output validation doesn't allow transcoded self-loops

Signed-off-by: Ivan Danov <idanov@users.noreply.github.com>

* Updated release note to avoid github warning

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

---------

Signed-off-by: Ivan Danov <idanov@users.noreply.github.com>
Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>
Co-authored-by: Elena Khaustova <ymax70rus@gmail.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Update UUID telemetry docs (#3805)

Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Change path to starters test (#3816)

Signed-off-by: lrcouto <laurarccouto@gmail.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Move changes in RELEASE.md to docs section

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Change formatting

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Revert "Change formatting"

This reverts commit 9582a22.

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Apply changes from code review

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add explanation on why cleanup isn't needed

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Change assert on successful pipeline to check logs

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Update description of integration test under pipeline slicing

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Missing formatting

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Update tests directory structure

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

---------

Signed-off-by: Anthony De Bortoli <anthony.debortoli@protonmail.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Signed-off-by: Ivan Danov <idanov@users.noreply.github.com>
Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com>
Signed-off-by: Nok <nok.lam.chan@quantumblack.com>
Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: Ondrej Zacha <ondrej.zacha@okra.ai>
Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>
Signed-off-by: Yolan Honoré-Rougé <yolan.honore.rouge@gmail.com>
Signed-off-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>
Signed-off-by: lrcouto <laurarccouto@gmail.com>
Signed-off-by: L. R. Couto <57910428+lrcouto@users.noreply.github.com>
Signed-off-by: Puneet Saini <99470400+puneeter@users.noreply.github.com>
Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Marcin Zabłocki <m.zablo@gmail.com>
Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: Ahdra Merali <90615669+AhdraMeraliQB@users.noreply.github.com>
Co-authored-by: Anthony De Bortoli <anthony.debortoli@protonmail.com>
Co-authored-by: Ivan Danov <idanov@users.noreply.github.com>
Co-authored-by: Dmitry Sorokin <40151847+DimedS@users.noreply.github.com>
Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>
Co-authored-by: Ondrej Zacha <ondrej.zacha@okra.ai>
Co-authored-by: ElenaKhaustova <157851531+ElenaKhaustova@users.noreply.github.com>
Co-authored-by: Yolan Honoré-Rougé <29451317+Galileo-Galilei@users.noreply.github.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Co-authored-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>
Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com>
Co-authored-by: L. R. Couto <57910428+lrcouto@users.noreply.github.com>
Co-authored-by: Puneet Saini <99470400+puneeter@users.noreply.github.com>
Co-authored-by: Marcin Zabłocki <m.zablo@gmail.com>
Co-authored-by: Elena Khaustova <ymax70rus@gmail.com>