-
Notifications
You must be signed in to change notification settings - Fork 903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kedro tracebacks are overwhelming #2401
Comments
Let me first explain what's happening above because it's not obvious unless you really look carefully:
Is this confusing? Yes, definitely. We should try to improve it so that it's more obvious to the user that the way to fix the problem is to There's a couple of possible things we might want to do here and it's worth breaking down more... 1.
|
As mentioned during backlog grooming (and without having fully read the above):
Couple quick thoughts:
|
We discussed this at Technical Design today. Some notes: Ideally, by default the user should see a traceback only in two situations:
Otherwise, a friendlier error message should be shown. Some users might want to actually see the full traceback or will be quick to spot the actual error, but we want to err on the side of favoring beginners here. Ideally, these cases would look as follows:
There are essentially two possible workflows:
We focused on some We settled on concealing the traceback by default behind a flag, which could be As an alternative, we considered having full logs are always written somewhere to disk, and only shown with a flag. But we quickly found that this could lead to problems in read-only filesystems, and does not necessarily make the situation better (for example, the user might not be able to easily extract artifacts from CI). Above all, "the last thing you want to do is write logs and crash the process/make the situation worse". We already have some machinery for this: the Possible implementation:
with KedroSession.create(
env=env, conf_source=conf_source, extra_params=params
) as session:
try:
session.run(
tags=tag,
runner=runner(is_async=is_async),
node_names=node_names,
from_nodes=from_nodes,
to_nodes=to_nodes,
from_inputs=from_inputs,
to_outputs=to_outputs,
load_versions=load_version,
pipeline_name=pipeline,
namespace=namespace,
)
except ManagedKedroError as e:
# Handle error appropriately, conceal traceback
...
# else:
# raise as before
Notice that in some places we are already making suggestions, but it's debatable whether the full traceback should be shown (this is not CLI usage, but Python usage):
|
Note the kedro/kedro/framework/cli/utils.py Lines 234 to 242 in 6531457
@astrojuanlu do you have any examples of packages that have their own |
I don't have any examples in mind, we could do some research. Unless I'm missing something I don't think they need to be too sophisticated, just a subclass of |
A user complained this week to me in person that when a node fails it's very difficult to see the actual error, because the traceback includes too much information from the Kedro internals. |
Hi everyone, If I may jump in with my own clumsy formulation: The traceback's "signal-to-noise" ratio is indeed something that could be improved. The relevant information is almost never immediately visible "at very bottom" and almost always concealed somewhere "in the middle". As @astrojuanlu previously said: Hoping this helps :-) |
Happy 1 year anniversary to this issue 🍰 I was prescient:
Probably we should tackle #3446 and #3591 before this one, so that people can do |
Related #3651 |
Agree with the sentiments mostly. I have an attempt to reduce the noise coming from Kedro during a kedro run.
Like @deepyaman said, I would be careful about hiding traceback particular for long runs. I like the idea having less crowd terminal and more verbose in logging, but I suspect no one is going to look at the log files during development. I think #3446 is ready to go and there is not much drawback. It won't work for 100% use case, but I think it will improve the developer experience a lot. #3591 can be broken down to two separate tickets (edited: I just did it)
In my mind, once |
Been participating and following the conversation there and I'm not sure that I saw a clear action plan, could you clarify?
I would very much welcome some historical context on why Kedro decided to depart from the 12 Factor App guidance on logging but at this point and especially after seeing @noklam's demo I am of the opinion that production logging should be handled by production tools and that we should plan a smooth transition towards phasing out our |
Probably not a super valuable comment but just saw this issue pop up in slack. The way Kedro will swallow actual errors and return its own errors has always been a big pet peeve of mine. Having to scroll up about 2/3 of the way up the trace back to find the true cause of the error, when the bottom 2/3 are totally irrelevant isn't great UX so big support for any development which might improve this 😃 |
Also, related: #3794 |
Do you mean the custom Kedro error class or things like auto pipeline
discovery that generate a warning message instead of an error?
The former one is created to provide better explicit message, if we are
actually making it worse it's bad. Can you give an example of it?
…On Tue, 9 Apr 2024, 23:48 Juan Luis Cano Rodríguez, < ***@***.***> wrote:
Also, related: #3794 <#3794>
—
Reply to this email directly, view it on GitHub
<#2401 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AELAWL4FTSKRRGFBDPEHLFDY4RV4XAVCNFSM6AAAAAAVTTVWGSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANBWGE3DCOBYGQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Since rich/textual support some css / Web like feature, would it be
possible to support something that collapse kedro error by default?
There are concern that we tried too hard to hide important message, and a
rerun is too expensive just to see the error again.
…On Thu, 11 Apr 2024, 19:34 Nok Chan, ***@***.***> wrote:
Do you mean the custom Kedro error class or things like auto pipeline
discovery that generate a warning message instead of an error?
The former one is created to provide better explicit message, if we are
actually making it worse it's bad. Can you give an example of it?
On Tue, 9 Apr 2024, 23:48 Juan Luis Cano Rodríguez, <
***@***.***> wrote:
> Also, related: #3794 <#3794>
>
> —
> Reply to this email directly, view it on GitHub
> <#2401 (comment)>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AELAWL4FTSKRRGFBDPEHLFDY4RV4XAVCNFSM6AAAAAAVTTVWGSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANBWGE3DCOBYGQ>
> .
> You are receiving this because you were mentioned.Message ID:
> ***@***.***>
>
|
From just this morning. This might be personal preference: the traceback is 244 lines long, but only the first 70 lines which show the error and the last two lines which identify the dataset are actually relevant to me, and the other 170 (70%) are kedro stuff which in my experience has never been helpful to me. Traceback
|
In the specific case of #3794, if there aren't any pipelines, I'd expect the error message to be exactly 1 line long and say "there are no pipelines defined, you can create one with |
Btw @noklam can you explain the rich css stuff? Without knowing specifics, I'm not sure how useful that will be outside of interactive/local development, as when running in our production environment (argo workflows) we only get access to whatever is logged to stdout, I don't know if this sort of interactivity would work outside of a terminal environment. |
@inigohidalgo We had some discussion about logging internally recently. There are different opinion but I think it's clear that rich is not suitable for production logging in many cases. So anything related to rich here I am referring to development flow in my head. For production logging, you would at least using something like a json that are easy to parse and can integrate with observability tool. In most cast you won't be reading the log inside the box but rather through some other UI/alert We need to be precise when we discuss the verbosity, because there are few things mentioned here. I don't think we have a good idea how to solve all of them together, so maybe it's easier to tackle the obvious case first. Do you have some examples in mind? |
For me the only major example I have is the one I put here #2401 (comment) which I think is totally separate from some other things being discussed above
|
I suggest splitting this issue into some actionable tasks. There are mixed discussion about logs and tracebacks, some are very specific. cc @astrojuanlu
|
Description
Kedro shows nice, colored tracebacks (which is great), but sometimes we don't need to show the full traceback to the user, and in fact it can be quite overwhelming. I propose we try to control the amount of tracebacks that bubble up to the user, and reserve those for user code or internal breakage.
Context
Today I did something silly: tried a
kedro run
right after akedro new --starter=pandas-iris
, without first installing the dependencies. This is what I got:For the user to understand what happened here they are forced to scroll to the top of the error message, which most likely will not be shown (unless the user has a really tall screen). This is an actual screenshot of how my screen was looking right after running the command:
The actual problem here was that a dependency was missing, and therefore no pipelines were left. But the user is presented with a huge traceback (colored and nice, but still huge) and a lengthy warning, when the actual solution is to do
pip install -r src/requirements.txt
.I argue that tracebacks should be shown to the user when something inside Kedro breaks, or when user code fails for whatever reason. But "pipeline contains no nodes" is hardly a broken condition: it can come up when certain tags or filters are applied. This is just the example that prompted me to open this issue, but there are more.
Possible Implementation
For managed exceptions, maybe a one line red warning would be more than enough:
(And the "maybe you meant" suggestion would be the icing in the cake)
Possible Alternatives
Alternatively, the full traceback should only be shown after a
--verbose/-v
or--debug
flag is passed. At the momentkedro run
does not have any such flags.The text was updated successfully, but these errors were encountered: