This repository has been archived by the owner on Dec 13, 2023. It is now read-only.

My workflow is running and the task is SCHEDULED but it is not being processed #3136

Closed
lianjunwei opened this issue Jul 28, 2022 · 10 comments

@lianjunwei

lianjunwei commented Jul 28, 2022

I. Description

  • Conductor version: v3.10.1
  • Persistence implementation: Postgres
    conductor.db.type=postgres
    spring.datasource.url=jdbc:postgresql://127.0.0.1:8002/conductor
    spring.datasource.username=test
    spring.datasource.password=test123
    spring.datasource.hikari.maximum-pool-size=10
    spring.datasource.hikari.minimum-idle=2
  • Lock: Redis
    The Conductor server's application.properties is as follows:
    conductor.app.workflowExecutionLockEnabled=true
    conductor.workflow-execution-lock.type=noop_lock
    conductor.redis-lock.serverType=single
    conductor.redis-lock.serverAddress=redis://127.0.0.1:6379
    conductor.redis-lock.namespace=conductor
  1. I have deployed two conductor servers. They are 10.190.32.157:8080 and 10.190.32.158:8080 respectively.
  2. My springboot project is configured as follows:
    image

3. If I trigger the workflow on "10.190.32.157:8080", everything works normally. However, when the workflow is triggered on "10.190.32.158:8080", it is not executed.

4. The tasks of the workflow are as follows:

image
image

5. The task is not being processed:

image

6. I have confirmed that the "last polling time" of this task is the current time.

image

II. Expected behavior
If I trigger the workflow on "10.190.32.158:8080", the task should be executed.

III. Supplementary notes:
Screenshot from: https://conductor.netflix.com/faq.html
image

I don't understand the passage in the red circle. Is the bug related to this? I hope to get some help.

@lianjunwei lianjunwei added the type: bug bugs/ bug fixes label Jul 28, 2022
@jxu-nflx
Contributor

jxu-nflx commented Aug 1, 2022

Hello @lianjunwei, how is your worker configured to poll tasks: https://github.com/Netflix/conductor/blob/main/client/src/main/java/com/netflix/conductor/client/automator/TaskRunnerConfigurer.java#L174? It seems to me that it is polling only from that specific instance. You might want to check your server setup to make sure all instances are discoverable.

BTW, you are using noop lock
conductor.workflow-execution-lock.type=noop_lock.
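
The mismatch noted here (the lock is enabled and `conductor.redis-lock.*` is set, yet the lock type is still `noop_lock`) can be caught with a small check. This is a minimal sketch using only the JDK. The class and method names are hypothetical, and the value `redis` for `conductor.workflow-execution-lock.type` is an assumption based on Conductor's Redis lock module, so it should be verified against the version in use.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of Conductor.
final class LockConfigCheck {

    /** Returns a warning message, or an empty string if nothing looks inconsistent. */
    static String check(Properties props) {
        boolean lockEnabled = Boolean.parseBoolean(
                props.getProperty("conductor.app.workflowExecutionLockEnabled", "false"));
        String lockType = props.getProperty("conductor.workflow-execution-lock.type", "unset");
        boolean redisConfigured = props.stringPropertyNames().stream()
                .anyMatch(name -> name.startsWith("conductor.redis-lock."));

        // Assumption: "redis" is the type value that activates the Redis lock.
        if (lockEnabled && redisConfigured && !"redis".equals(lockType)) {
            return "Lock is enabled and conductor.redis-lock.* is set, but lock type is '"
                    + lockType + "'";
        }
        return "";
    }

    /** Parses application.properties-style text. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // The lock-related lines from the issue description.
        String fromIssue = String.join("\n",
                "conductor.app.workflowExecutionLockEnabled=true",
                "conductor.workflow-execution-lock.type=noop_lock",
                "conductor.redis-lock.serverType=single",
                "conductor.redis-lock.serverAddress=redis://127.0.0.1:6379",
                "conductor.redis-lock.namespace=conductor");
        System.out.println(check(parse(fromIssue)));
    }
}
```

Run against the properties from the issue description, the check reports the `noop_lock` type; it stays silent once the type is switched to the assumed `redis` value.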

@lianjunwei
Author

lianjunwei commented Aug 4, 2022

> Hello @lianjunwei, how is your worker configured to poll tasks: https://github.com/Netflix/conductor/blob/main/client/src/main/java/com/netflix/conductor/client/automator/TaskRunnerConfigurer.java#L174? It seems to me that it is polling only from that specific instance. You might want to check your server setup to make sure all instances are discoverable.
>
> BTW, you are using noop lock conductor.workflow-execution-lock.type=noop_lock.

1. I have changed the distributed lock to Redis.
2. Worker configuration: see the first screenshot above.

The worker code is as follows:
image

3. According to your guess, if my Conductor server is deployed as 10 instances, the worker would need 10 rounds of polling to get the task, which is obviously unreasonable.
I think the task queues should be shared among instances. How can they be shared? I don't know how to configure it. Does Conductor support it?

4. Client SDK:
    image

@v1r3n
Contributor

v1r3n commented Aug 5, 2022

@lianjunwei happy to help. Maybe we can jump on a call and debug this further.

@hebrd

hebrd commented Aug 7, 2022

This seems to be the same issue as #3089.

@jxu-nflx
Contributor

jxu-nflx commented Aug 8, 2022

@lianjunwei I am looking for your setting for https://github.com/Netflix/conductor/blob/main/client/src/main/java/com/netflix/conductor/client/automator/TaskRunnerConfigurer.java#L174.

> According to your guess, if my conductor server deploys 10 instances, it will take 10 rounds of polling to get the task, which is obviously unreasonable.
> I think the task queues among instances should be shared. How to share? I don't know how to configure it. Does conductor support it?

What queue solution are you using? If it is the in-memory one, it won't be shared among instances. Conductor simply polls messages from your queue solution.
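
The difference between a per-instance in-memory queue and a shared queue can be illustrated with a toy model. This is only a sketch under stated assumptions: `ServerInstance` and the other names are hypothetical stand-ins, not Conductor classes, and the "queue" is a plain Java collection rather than a real queue backend.

```java
import java.util.ArrayDeque;
import java.util.Optional;
import java.util.Queue;

// Toy model only; none of these types exist in Conductor.
final class QueueSharingDemo {

    /** Hypothetical stand-in for one server instance backed by some task queue. */
    static final class ServerInstance {
        private final Queue<String> taskQueue;

        ServerInstance(Queue<String> taskQueue) {
            this.taskQueue = taskQueue;
        }

        /** Called when a workflow started on this instance schedules a task. */
        void schedule(String taskId) {
            taskQueue.add(taskId);
        }

        /** Called when a worker polls this instance. */
        Optional<String> poll() {
            return Optional.ofNullable(taskQueue.poll());
        }
    }

    /** Schedules a task on instance B, polls instance A, and returns what the worker got. */
    static Optional<String> scheduleOnBPollFromA(boolean sharedQueue) {
        Queue<String> queueA = new ArrayDeque<>();
        // With a shared queue both instances read and write the same store.
        Queue<String> queueB = sharedQueue ? queueA : new ArrayDeque<>();
        ServerInstance a = new ServerInstance(queueA);
        ServerInstance b = new ServerInstance(queueB);

        b.schedule("task-1");
        return a.poll();
    }

    public static void main(String[] args) {
        System.out.println("separate in-memory queues: " + scheduleOnBPollFromA(false));
        System.out.println("shared queue: " + scheduleOnBPollFromA(true));
    }
}
```

With separate queues, the worker polling instance A never sees the task scheduled on instance B, which matches the symptom reported in this issue. With a shared queue, it does not matter which instance the worker polls.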

@jxu-nflx jxu-nflx removed the type: bug bugs/ bug fixes label Aug 15, 2022
@lianjunwei
Author

@jxu-nflx postgres

@lianjunwei
Author

> @lianjunwei I am looking for your setting for https://github.com/Netflix/conductor/blob/main/client/src/main/java/com/netflix/conductor/client/automator/TaskRunnerConfigurer.java#L174?
>
> According to your guess, if my conductor server deploys 10 instances, it will take 10 rounds of polling to get the task, which is obviously unreasonable.
> I think the task queues among instances should be shared. How to share? I don't know how to configure it. Does conductor support it?
>
> What queue solution are you using? If it's in-memory one, then it won't be shared among instances. Conductor simply just poll messages from your queue solutions.

It looks like it's Postgres, but I'm not sure how to tell which queue implementation my setup is using, or how to configure it. I am using Postgres as persistent storage.

@aravindanr
Collaborator

@lianjunwei based on the screenshot in your initial question, conductor.client.rootUri=10.190.32.157:8080. You have configured the client to poll from only one instance of your cluster. Please ensure that both instances point to the same datastore.
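
One way to check whether both instances really see the same datastore is to ask each one for the pending queue size of the task type and compare the answers. The sketch below only builds the request URIs and sends nothing. The `/api/tasks/queue/sizes` path is recalled from Conductor's v3 REST API and is an assumption to confirm in the Swagger UI of your deployment. `my_task` is a placeholder task type, and the class name is hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the Conductor client.
final class QueueSizeUris {

    /** Builds one queue-size URI per server instance for the given task type. */
    static List<URI> forInstances(List<String> hostPorts, String taskType) {
        String encoded = URLEncoder.encode(taskType, StandardCharsets.UTF_8);
        return hostPorts.stream()
                // Assumed endpoint path; verify it against your server's API docs.
                .map(hp -> URI.create("http://" + hp + "/api/tasks/queue/sizes?taskType=" + encoded))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The two instances from the issue; "my_task" is a placeholder.
        forInstances(List.of("10.190.32.157:8080", "10.190.32.158:8080"), "my_task")
                .forEach(System.out::println);
    }
}
```

If the two instances report different sizes for the same task type right after a workflow is started, they are probably not backed by the same queue store.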

@github-actions
Contributor

This issue is stale, because it has been open for 45 days with no activity. Remove the stale label or comment, or this will be closed in 7 days.

@github-actions github-actions bot added the Stale label Oct 14, 2022
@github-actions
Contributor

This issue was closed, because it has been stalled for 7 days with no activity.

@github-actions github-actions bot closed this as not planned Oct 22, 2022