Flows with subflows incorrectly reporting state as Failed #9376
@paulinjo we need an MRE in order to investigate this.
@madkinsz This is happening in our production environment with previously well-behaved flows, beginning on roughly 04/26. Coming up with an MRE is going to be a big challenge considering the number of moving pieces, but we were told by support to open a GitHub issue for further investigation.
@paulinjo Did you change Prefect versions? Can you share the actual full output of
Agent:
@paulinjo did you change Prefect versions when this started occurring?
When this started occurring we were using
I'm also experiencing this error after upgrading from prefect. Here's a general approximation of what my flows/subflows look like, but I unfortunately can't get it to reproduce on my local machine.

```python
import time

from prefect import flow, task, get_run_logger


@task
def task0(inpt):
    for inp in inpt:
        get_run_logger().info(inp)
        time.sleep(1)
    return inpt


@task
def task1(inp):
    get_run_logger().info(inp)
    time.sleep(1)
    return True


@flow
def subflow1(inpt):
    future = task0.submit(inpt)
    future.wait()
    future = task2.submit(future.result())
    return future.result()


@task
def task2(val):
    get_run_logger().info(val)
    return 0


@flow
def subflow2(val):
    future = task2.submit(val)
    future.wait()
    return future.result()


@flow
def main():
    get_run_logger().info("Hello World")
    future = task0.submit(list(range(0, 10)))
    future.wait()
    subflow1(future)
    subflow2(0)
    return future.result()


if __name__ == "__main__":
    main()
```
It's possible this is related to a Cloud bug where the flow run was placed in a late state after it started running. We've released a fix for that now. If you can share some affected flow run and workspace ids, we can check if that was the cause for you. |
Workspace:
@madkinsz My jobs are running on self-hosted infrastructure. Will the fix also be applied to non-cloud setups?
@majikman111 we have not seen cases of the bug I described in the OSS, but yes, we are replicating the fix there anyway. @paulinjo I've confirmed that you are not affected by the described bug; this seems like something else. It looks like the parent is failing because it tries to retrieve the result of a subflow run that has previously completed, but the result was not persisted.
@madkinsz Is the implication that there's something we need to update to fix this? The flows and underlying infrastructure have not changed in months, excluding Prefect and other dependency upgrades, and several other flows are running without issue.
After some digging, I found that this was related to a bad configuration on our side. Both our Docker container entrypoint and our Kubernetes job block were making a call to
This issue is stale because it has been open 30 days with no activity. To keep this issue open remove stale label or comment. |
Hi @paulinjo thank you for the update - is it safe to close this issue now? |
We are experiencing the same issue with this (old) version in production.
We are still running 2.9.x, and we regularly see this kind of stack trace after clicking "Retry" in the UI. As a result, we cannot continue our flow and are completely stuck. Full trace from the logs in the UI:
We have tasks that are supposed to persist their Result using
First check
Bug summary
A subset of our Flows which make use of subflows are incorrectly reporting their terminal state as Failed, even when all subflows and tasks are completed.
These flows are triggered via a separate flow using the orchestrator pattern, and this orchestrator flow behaves as though the terminal state is Completed.
Logs from the agent running on EKS show the state initially reported as Success before switching to Failed.
Reproduction
Error
Versions
Additional context
No response