Improve kedro run as a package #1423
Signed-off-by: Antony Milne <antony.milne@quantumblack.com>
P.S. technically this is a breaking change since the way you call
This is a nice summary, also reminds me I have some challenges to make #795. I am not sure if the discussion still holds or things can be done more easily now. I think this doc is also useful to lay out how people can run kedro pipeline.
Why is this happening?
It did look a bit weird, but I think it is because we want to execute
I am not super familiar with click and how
I do like the idea of a higher-level entry point, Misc: |
Just an idea that pops up, do we only expect |
@AntonyMilneQB In #1807 this would work for a packaged kedro project. What's the benefit of calling the CLI instead of just using One thing that I can think of is when you have a custom
from kedro.framework.session import KedroSession
from kedro.framework.project import configure_project

package_name = "<your_package_name>"  # placeholder: replace with the real package name
configure_project(package_name)
with KedroSession.create(package_name) as session:
    session.run()
@noklam sorry for the incredibly slow response - finally got around to this issue on my big list of things to respond to... 😅 There's a few problems with doing
Thanks @AntonyMilneQB, please see my comments. kedro/kedro/framework/cli/project.py Lines 344 to 376 in 3c1d6a7
I understand this need, but I also doubt if there are any people doing this for a real project, executing a module entirely with
I agree that I am not sure if Why do we need a context manager? Is that a good reason to keep them at all? Note that both
I think this is the only compelling reason to me for keeping backward compatibility. But as you said, it's not something we want to keep and this is also a very uncommon usage. Ideally, there should be only 1 CLI and 1 way of doing it in Python API, either python scripts or IPython. And I think this isn't hard to achieve as we tidy up some of the |
This might not be done much currently but it is a useful feature and something we should promote more (especially when it's fixed 😀).
Very fair points all round here I think. I'm not sure why we have this |
run = _find_run_command(package_name)
run(*args, **kwargs)
if kwargs:
    run(**kwargs)
I guess this is unfinished? As I can see in the other __main__.py, the result of session.run is returned, but here it doesn't return the output.
We discussed this in Technical Design and talked about the following:
Some steps to take to improve the situation and explain it better to users: |
Clarify this one a bit. I think these are 2 problems
This task isn't done, and is definitely something we want to work on. @AntonyMilneQB will never forget about this, and will make sure it gets discussed properly 😄 This PR will be closed because it has diverged a lot from the |
The actual change to code is very small, but a big improvement I think: you can now do
and it will Just Work, even in IPython/Jupyter. All the command line ways of invoking main work exactly as before, i.e. python -m spaceflights --pipeline=ds etc. We now have consistency between main and CLI ways of launching kedro.
Description
A user can run their kedro project in several ways. Note that the run command executed can be defined on the framework side or overridden in turn by a plugin or a project cli.py (done by _find_run_command).
1. kedro run. This is the only route that doesn't go through the project __main__.py; instead it goes through kedro.framework.main, which builds the CLI tree and does something like _find_run_command
2. python -m spaceflights. This hits the project __main__.py and will call main
3. python src/spaceflights. This is just a more unusual way of doing 2
4. from spaceflights.__main__ import main; main(), then run the script using python
5. session.run (which is what we have advertised as the way to do a kedro run in the past)
All the above must be run from within the project root or kedro won't be able to find your conf. Options 2 onwards need you to have pip installed your project or to have src in your PYTHONPATH. Note that having the project pip installed could mean first doing kedro package and then pip install the resulting .whl file or it could mean just pip install ./src from your project root; it doesn't make a difference.
Current problems
--pipeline ds. With main() you do main(["--pipeline", "ds"]), i.e. the CLI syntax shoehorned into a Python function call. When using session.run you do a pure Python function call as session.run(pipeline_name="ds") - but note the argument name is different!!
main() will actually execute...
main() doesn't return anything like session.run does, so you couldn't use any kedro outputs downstream anyway
main doesn't work unless you call session.close first because you'll get some error about there already being an active session, same as "%run_viz line magic requires session.close first" (kedro-viz#811)
This PR fixes 1, 2, 3, 4. We should have separate ticket(s) to fix 5 and also the following:
__main__.py to somewhere framework side, because no one wants to see that in their project... Possible locations might be kedro.framework.cli.utils or kedro.framework.project
from spaceflights.__main__ import main, which doesn't really look like it should be done
kedro but instead Project specific commands from cli. I think this is a bug anyway and it didn't use to be like this...
main over session.run in IPython or even stop exposing session altogether. Mooted briefly in "Technical design decision record for KedroSession" (#1335) but needs more discussion. main is a higher-level function than session.run and respects the hierarchy of framework < plugins < project cli.py. The arguments to main also match the usual kedro run arguments, which those in session.run do not
Development notes
There are basically two things that have changed:
return to the kedro run command. session.run, and hence main, returns the free outputs of the pipeline after the kedro run. This fixes problem 4
main calls the run command. This fixes problems 1, 2, 3
Adventures with click
I opened an issue on the click repo (which didn't go down well... 😅) describing the original source of problems 1, 2, 3. In short, when you run a click command it always exits afterwards by raising an exception (SystemExit), even if execution was successful. This means it's impossible to run any code after that main in a Python script and also massively confuses ipython, which blows up.
For the record, some of the things that don't quite fix this are:
run(standalone_mode=False) as suggested on the click repo. This solves problems 2 and 3 but not 1
run.callback. This nearly gets you there but doesn't fill in the default arguments for you so you need to explicitly give all the arguments in the function call, which is horrible
run.main is equivalent to what we were doing before (run.__call__) so doesn't help
run.invoke is really close to doing it, but unlike run.forward doesn't fill in default arguments and won't notice CLI arguments when called with python -m spaceflights --pipeline=ds
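To make the run.callback point concrete, here is a minimal sketch of why calling the raw callback loses the defaults. It deliberately does not use click: ToyCommand and toy_command are hypothetical stand-ins written for this illustration, and they only mimic the general idea that option defaults live on the command object rather than on the function it wraps.

```python
class ToyCommand:
    """Hypothetical, heavily simplified stand-in for a CLI command object."""

    def __init__(self, callback, defaults):
        self.callback = callback  # the raw, undecorated function
        self.defaults = defaults  # option defaults declared on the command

    def __call__(self, **overrides):
        # Going through the command merges declared defaults with overrides.
        params = {**self.defaults, **overrides}
        return self.callback(**params)


def toy_command(**defaults):
    """Hypothetical decorator that wraps a function in a ToyCommand."""

    def decorator(func):
        return ToyCommand(func, defaults)

    return decorator


@toy_command(pipeline_name="__default__", env="local")
def run(pipeline_name, env):
    return f"ran {pipeline_name} in {env}"


# Calling the command fills in the missing `env` from the declared defaults.
via_command = run(pipeline_name="ds")

# Calling the raw callback bypasses the defaults, so every argument is required.
try:
    run.callback(pipeline_name="ds")
    callback_error = None
except TypeError as exc:
    callback_error = str(exc)
```

In this toy model the only way to use the callback directly is to pass every argument explicitly, which is the inconvenience described in the list above.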
Overall, I think the code here is the only thing that does exactly what we want both on the CLI and in scripts/ipython 🎉
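As a rough illustration of problems 2 and 4, the sketch below shows how an entry point that always exits prevents later code from running and discards the return value. It uses only the standard library; pipeline_run, call_standalone and call_embedded are hypothetical names invented for this example, not kedro or click functions.

```python
import sys


def pipeline_run(pipeline_name="__default__"):
    """Hypothetical stand-in for a kedro run; returns fake 'free outputs'."""
    return {"pipeline": pipeline_name, "status": "done"}


def call_standalone(func, **kwargs):
    """Mimic a CLI-style entry point: run the callback, then always exit."""
    func(**kwargs)
    sys.exit(0)  # raises SystemExit even though the run succeeded


def call_embedded(func, **kwargs):
    """Mimic a library-style call: run the callback and hand back its result."""
    return func(**kwargs)


def script_with_standalone():
    steps = ["before"]
    call_standalone(pipeline_run, pipeline_name="ds")
    steps.append("after")  # never reached because of SystemExit
    return steps


try:
    script_with_standalone()
    reached_after = True
except SystemExit as exc:
    reached_after = False
    exit_code = exc.code

# The embedded style returns normally, so outputs can be used downstream.
outputs = call_embedded(pipeline_run, pipeline_name="ds")
```

The exit code is 0, so the run itself succeeded; the caller simply never regains control unless it catches SystemExit, which is roughly what trips up scripts and interactive shells.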