Prevent duplicate edge workers unless existing worker is offline or unknown #58586
…nknown Add validation to the edge worker registration endpoint to prevent launching multiple workers with the same hostname. If a worker with the same name already exists in an active state (running, idle, starting, terminating, or maintenance), the registration will fail with HTTP 409 CONFLICT. Workers can only reuse a name if the existing worker is in OFFLINE, UNKNOWN, or OFFLINE_MAINTENANCE state.
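The rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `WorkerState`, `REUSABLE_STATES`, `DuplicateWorkerError`, and `check_registration` are hypothetical names, and the state list is limited to the states named in the description.

```python
from enum import Enum
from http import HTTPStatus
from typing import Optional


class WorkerState(str, Enum):
    """Hypothetical, simplified set of worker states taken from the description."""

    STARTING = "starting"
    RUNNING = "running"
    IDLE = "idle"
    TERMINATING = "terminating"
    MAINTENANCE = "maintenance"
    OFFLINE = "offline"
    UNKNOWN = "unknown"
    OFFLINE_MAINTENANCE = "offline maintenance"


# States in which an existing worker name may be taken over by a new worker.
REUSABLE_STATES = frozenset(
    {WorkerState.OFFLINE, WorkerState.UNKNOWN, WorkerState.OFFLINE_MAINTENANCE}
)


class DuplicateWorkerError(Exception):
    """Hypothetical error that an endpoint would translate into HTTP 409."""

    status_code = HTTPStatus.CONFLICT


def check_registration(existing_state: Optional[WorkerState], worker_name: str) -> None:
    """Raise DuplicateWorkerError if a worker with this name is still active."""
    if existing_state is None:
        return  # no worker with this name is known yet
    if existing_state in REUSABLE_STATES:
        return  # the old worker is gone, so the name can be reused
    raise DuplicateWorkerError(
        f"Worker '{worker_name}' already exists in state '{existing_state.value}'"
    )
```

Checking against the small set of reusable states, rather than listing every active state, means any state not explicitly marked reusable is treated as a conflict in this sketch.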
jscheffl
left a comment
Thanks! This is a very good point, and it is worth fixing to prevent deployment errors that mix up hostnames.
One nit only:
When I start a second worker, the pure HTTP error is in the logs like:
root@bbd99465f97f:/opt/airflow# airflow edge worker --pid another
2025-11-23T09:31:39.223750Z [info ] Starting worker with API endpoint http://localhost:8080/edge_worker/v1/rpcapi [airflow.providers.edge3.cli.edge_command] loc=edge_command.py:80
[Airflow and Edge Worker ASCII-art startup banners]
409 Client Error: Conflict for url: http://localhost:8080/edge_worker/v1/worker/bbd99465f97f
Can you explicitly handle the exception on the client and generate a better log message in the console? e.g. like we have for version conflicts in providers/edge3/src/airflow/providers/edge3/cli/api_client.py:133
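A rough sketch of what such client-side handling could look like is shown here. It is only an illustration under assumptions: `HTTPStatusError` is a hypothetical stand-in for whatever exception the HTTP client raises, `register_or_exit` and the `register` callable are hypothetical names, and the message wording is invented for this example.

```python
import logging
from http import HTTPStatus
from typing import Callable

logger = logging.getLogger("edge_worker_sketch")


class HTTPStatusError(Exception):
    """Hypothetical stand-in for the HTTP client's error type."""

    def __init__(self, status_code: int, url: str) -> None:
        super().__init__(f"{status_code} error for url: {url}")
        self.status_code = status_code
        self.url = url


def register_or_exit(register: Callable[[str], None], worker_name: str) -> None:
    """Call the registration function and turn a 409 into a readable exit message."""
    try:
        register(worker_name)
    except HTTPStatusError as err:
        if err.status_code == HTTPStatus.CONFLICT:
            # Assumed wording; explains the conflict instead of showing the raw HTTP error.
            message = (
                f"Edge worker '{worker_name}' is already registered and active. "
                "Stop the existing worker or wait until it is reported as offline "
                "or unknown before starting another one with the same name."
            )
            logger.error(message)
            raise SystemExit(message) from err
        raise  # other HTTP errors are left untouched
```

Only the conflict case is translated; any other status is re-raised unchanged so unrelated failures stay visible.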
Just realized: this PR will now reduce comfort a bit, though, when using breeze and then
Does this look good to you? I made this update
That's true, it looks like the state does not get cleared. You would need to wait a few minutes for the API server to determine that the worker heartbeat is missing and change the worker to the unknown state; then you will be able to launch. OR just launch with I don't think this is a big deal though.
jscheffl
left a comment
That works! Thanks!
…nkown (apache#58586) * Prevent duplicate edge workers unless existing worker is offline or unknown Add validation to the edge worker registration endpoint to prevent launching multiple workers with the same hostname. If a worker with the same name already exists in an active state (running, idle, starting, terminating, or maintenance), the registration will fail with HTTP 409 CONFLICT. Workers can only reuse a name if the existing worker is in OFFLINE, UNKNOWN, or OFFLINE_MAINTENANCE state. * Jens Suggestions