Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KED-1169] Library code should be framework decision agnostic #149

Closed
tsanikgr opened this issue Oct 29, 2019 · 17 comments
Closed

[KED-1169] Library code should be framework decision agnostic #149

tsanikgr opened this issue Oct 29, 2019 · 17 comments
Labels
Issue: Bug Report 🐞 Bug that needs to be fixed

Comments

@tsanikgr
Copy link
Contributor

Description

This line in library code makes assumptions about the user code (i.e. that the directory src exists). This used to be in kedro_cli.py where all "frameworky" decisions are made, but now it is hidden (and undocumented), and the user is no longer able to change it.

Context

Working on a project where src does not exist - I would like to modify the python path myself (or at least have control over what gets added or not)

I understand that this is probably there to make this work both in jupyter/ipython and when running cli commands - however assumptions about user code should not be in library code.

@tsanikgr tsanikgr added the Issue: Bug Report 🐞 Bug that needs to be fixed label Oct 29, 2019
@921kiyo
Copy link
Contributor

921kiyo commented Oct 29, 2019

Am I correct that you have .kedro.yml file outside of src, or you don't want load_context to modify the Python path? It is true that load_context is based on the assumption that .kedro.yml file is under project_path/src/ (we used to rely on the location of kedro_cli.py before introducing .kedro.yml).

If you do not want load_context to modify the Python path but still want to instantiate KedroContext, you could directly instantiate the ProjectContext class.

@tolomea
Copy link
Contributor

tolomea commented Oct 29, 2019

You are technically correct, however that line is only used for updating the python path, the user can set the python path themselves and put the source wherever they want.

@tsanikgr
Copy link
Contributor Author

Indeed - any particular reason we do not want to surface this in kedro_cli.py? It's kind of hidden in load_context (and there is no mention of modifying the path in the docstring)

I am currently working around this like you mentioned but it is a bit ugly. In my case:
I am insert(0, path) in kedro_cli, but because load_context is called afterwards, my addition is pushed 2nd in the list, and I am getting the wrong object imported

project
  |
  ----- src/file.py/MyObject
  |----file.py/MyObject

@yetudada yetudada changed the title Library code should be framework decision agnostic [KED-1169] Library code should be framework decision agnostic Oct 31, 2019
@yetudada
Copy link
Contributor

@tsanikgr Thanks for raising this! We're going to put it on our next sprint to discuss this and get back to you with a proposed action.

@WaylonWalker
Copy link
Contributor

For what it's worth, our project template does not include a src directory.

@Pet3ris
Copy link

Pet3ris commented Nov 13, 2019

More very valid frustration from users on unintended side effects: #161.

@idanov
Copy link
Member

idanov commented Nov 27, 2019

however assumptions about user code should not be in library code

It is true that there should be no assumptions about the user code in the library, but load_context is framework code and not library one. The library classes and code is meant to be as flexible as possible, so people can use it the way they see fit. The framework code on the other hand is meant to impose structure (that's what frameworks are for) and make the behaviour consistent between different entrypoints.

We always need to make assumptions when we want to give the user a pre-built structure to work with, and maybe provide some level of extensibility and configuration. At the moment load_context is doing all the assumptions (not just the src one, but also the .kedro.yml one), where the extensibility is provided by extending the KedroContext class in your project.

We're a planning to redesign the framework part Kedro soon, since we have identified a few ways to improve it, but this won't happen at least until the first quarter of 2020. Meanwhile, if people need to create their own framework on top of the library code of Kedro, the easiest way to do that is to create their own context creating function and ditch the one we have provided.

@Pet3ris
Copy link

Pet3ris commented Nov 27, 2019

however assumptions about user code should not be in library code

It is true that there should be no assumptions about the user code in the library, but load_context is framework code and not library one. The library classes and code is meant to be as flexible as possible, so people can use it the way they see fit. The framework code on the other hand is meant to impose structure (that's what frameworks are for) and make the behaviour consistent between different entrypoints.

We always need to make assumptions when we want to give the user a pre-built structure to work with, and maybe provide some level of extensibility and configuration. At the moment load_context is doing all the assumptions (not just the src one, but also the .kedro.yml one), where the extensibility is provided by extending the KedroContext class in your project.

We're a planning to redesign the framework part Kedro soon, since we have identified a few ways to improve it, but this won't happen at least until the first quarter of 2020. Meanwhile, if people need to create their own framework on top of the library code of Kedro, the easiest way to do that is to create their own context creating function and ditch the one we have provided.

  • This response would also suggest that library code is intermingled with framework code. Context, as a necessary component, feels much more analogous to catalog and other library components than the framework contained in template and CLI.
  • I don’t think the argument of “whatever ends up being polluted becomes framework code and hence the pollution is justified” flies here.
  • Before the introduction of the context, users would be able to pip install kedro and use it without the template and CLI with sensible defaults. Right now this puts them strictly in framework territory with unexpected side-effects.
  • Last but not least, with active user complaints about this, this is not an abstract discussion, clearly the existing approach just doesn’t cut it for users and this could be more urgent than 2 quarters away
  • People don’t want to create their own framework, they just want to use some working subset of Kedro without unexpected side-effects, without additional work on their part.

@idanov
Copy link
Member

idanov commented Nov 27, 2019

whatever ends up being polluted becomes framework code and hence the pollution is justified

This argument is a very unfair and incorrect interpretation of my comment. Framework code by definition is the code creating the structure of your application (hence the word framework). There is no intermingling of library code with framework code, no library components depend on the existence of a framework. You can find more information about the current architecture of Kedro here and how different components interact with each other.

People can still just pip install kedro and decide to not use the template and the CLI, I fail to see how the existance of KedroContext prevents that. We have provided the framework code by user demand too, most of our users don't want to deal with deciding how to structure their applications and are mostly busy with dealing with the business logic of their pipelines.

For more advanced users there's always the option of dropping the framework and creating their own structure (or framework), be it through custom project templates, custom context-like structures or whatever they see fit.

Redesigning the framework part of Kedro is needed due to our users identifying many additional usecases where they can use the framework to simplify their work. However we are still exploring what that restructuring should look like in order to be more extensible and able to fit the most common usecases without any unnecessary configuration. We will provide more details once we have the issue well documented.

@Pet3ris
Copy link

Pet3ris commented Nov 27, 2019

  • I think the architecture diagram is super helpful, simple and nicely consistent with the code.
  • Let me try and articulate my point around leakage of framework code. The interesting thing here is that an user assumption was broken and even @tsanikgr intuitively felt that the Context is part of the library (analogous to the old tensor flow for example, the new tensorflow recognizes that sessions/contexts are unnecessary). That should ring alarm bells - suggesting perhaps that you may be able to figure out how to have it as part of the library.
  • The goal should be that the users can draw the diagram for you :).
  • I do think users don’t want to create their own framework, they want to be able to use a simple subset of Kedro as a library or use the full framework with clarity on what is at each level.
  • I think this issue can be discussed in isolation from broader changes happening to the framework about which I have no clue but I’m sure are well thought out

@philippegr
Copy link

We're a planning to redesign the framework part Kedro soon, since we have identified a few ways to improve it, but this won't happen at least until the first quarter of 2020. Meanwhile, if people need to create their own framework on top of the library code of Kedro, the easiest way to do that is to create their own context creating function and ditch the one we have provided.

First, thank you your work on Kedro!
We (with @bengpotter) have ran into the issue described here when trying to leverage Kedro on an existing codebase which did not follow the src convention but rather the packagename for source directory. We are appending to sys.path as a workaround to still leverage provided code for Context, but we were wondering if you have any update on the coming changes in framework?

@921kiyo
Copy link
Contributor

921kiyo commented Jan 23, 2020

@philippegr Thank you for raising this! As @idanov mentioned above, our architecture change in framework code is in our plan in the first quarter of this year, mainly improving the usability of KedroContext (and load_context etc). Hopefully we will be able to release it at the next major releases in near future :)

@WaylonWalker
Copy link
Contributor

@philippegr My teams projects currently all run from a modified form of the kedro new template and do not use src or touch sys.path. We have a custom object that stores all of our kedro objects in one place along with run and several other methods (having everything wrapped in one object is quite handy).

In essence this is how we are loading context without touching the path.

import os
from pathlib import Path
from kedro.context import load_context

project_path = Path(__file__).parents[1].resolve() # needs to be the directory with your .kedro.yml file and conf
_working_dir = os.getcwd() # save current working directory before kedro moves it
context = load_context(project_path)
os.chdir(_working_dir) # move back to the current working directory

@WaylonWalker
Copy link
Contributor

@921kiyo I am Excited to see the upcoming architecture changes this quarter! Thanks for all of the hard work.

@921kiyo
Copy link
Contributor

921kiyo commented Apr 21, 2020

We have made the source directory configurable with source_dir key in .kedor.yml in 8de9248 so you can customise it like

source_dir: .

Or

source_dir: src/nested/

if src doesn't suit your use case.

It will be released in the next release, so I'm closing this issue, but do let us know if you have any questions/comments. We are still continuing on the framework redesign, so more refactoring is coming soon :)

@921kiyo 921kiyo closed this as completed Apr 21, 2020
@philippegr
Copy link

Thank you very much. Looking forward to this next release!

@noklam
Copy link
Contributor

noklam commented Sep 8, 2022

See the more up-to-date solution here: #1836 (reply in thread)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue: Bug Report 🐞 Bug that needs to be fixed
Projects
None yet
Development

No branches or pull requests

9 participants