@dheerajturaga
Member

Add validation to the edge worker registration endpoint to prevent
launching multiple workers with the same hostname. If a worker with
the same name already exists in an active state (running, idle,
starting, terminating, or maintenance), the registration will fail
with HTTP 409 CONFLICT. Workers can only reuse a name if the existing
worker is in OFFLINE, UNKNOWN, or OFFLINE_MAINTENANCE state.
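
A minimal sketch of the rule described above, assuming a simplified state set and an in-memory registry. The names `WorkerState`, `REUSABLE_STATES`, `RegistrationConflict`, and `register_worker` are hypothetical and are not the provider's actual code:

```python
from enum import Enum
from http import HTTPStatus


class WorkerState(str, Enum):
    # Simplified, hypothetical state set taken from the PR description.
    STARTING = "starting"
    RUNNING = "running"
    IDLE = "idle"
    TERMINATING = "terminating"
    MAINTENANCE = "maintenance"
    OFFLINE = "offline"
    UNKNOWN = "unknown"
    OFFLINE_MAINTENANCE = "offline maintenance"


# States in which an existing worker name may be taken over by a new worker.
REUSABLE_STATES = frozenset(
    {WorkerState.OFFLINE, WorkerState.UNKNOWN, WorkerState.OFFLINE_MAINTENANCE}
)


class RegistrationConflict(Exception):
    """Stands in for an HTTP 409 response from the registration endpoint."""

    status_code = HTTPStatus.CONFLICT


def register_worker(registry: dict[str, WorkerState], name: str) -> None:
    """Register `name`, refusing if a worker with that name is still active."""
    existing = registry.get(name)
    if existing is not None and existing not in REUSABLE_STATES:
        raise RegistrationConflict(
            f"Worker {name!r} is already active in state {existing.value!r}"
        )
    # New name, or the previous worker is offline/unknown: accept it.
    registry[name] = WorkerState.STARTING
```

With this sketch, registering `"breeze"` while an `IDLE` entry exists raises `RegistrationConflict`, while a name whose previous entry is `UNKNOWN` is accepted.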

@boring-cyborg boring-cyborg bot added area:providers provider:edge Edge Executor / Worker (AIP-69) / edge3 labels Nov 22, 2025
@dheerajturaga dheerajturaga force-pushed the bugfix/edge3-prevent-duplicates branch 2 times, most recently from 25ac066 to 4e72433 Compare November 23, 2025 02:30
@dheerajturaga dheerajturaga force-pushed the bugfix/edge3-prevent-duplicates branch from 4e72433 to 8cf122f Compare November 23, 2025 03:15
Contributor

@jscheffl jscheffl left a comment

Thanks! This is a good catch and worth fixing, since it prevents deployment errors caused by mixed-up hostnames.

Just one nit:
When I start a second worker, only the raw HTTP error shows up in the logs:

root@bbd99465f97f:/opt/airflow# airflow edge worker --pid another
2025-11-23T09:31:39.223750Z [info     ] Starting worker with API endpoint http://localhost:8080/edge_worker/v1/rpcapi [airflow.providers.edge3.cli.edge_command] loc=edge_command.py:80
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
   ____   __           _      __         __
  / __/__/ /__ ____   | | /| / /__  ____/ /_____ ____
 / _// _  / _ `/ -_)  | |/ |/ / _ \/ __/  '_/ -_) __/
/___/\_,_/\_, /\__/   |__/|__/\___/_/ /_/\_\\__/_/
         /___/

409 Client Error: Conflict for url: http://localhost:8080/edge_worker/v1/worker/bbd99465f97f

Could you handle the exception explicitly on the client and log a clearer message to the console, e.g. as we do for version conflicts in providers/edge3/src/airflow/providers/edge3/cli/api_client.py:133?

@jscheffl
Contributor

Just realized: this PR makes things slightly less convenient with breeze. When you run stop_airflow, the Docker container is killed and the previous worker state is not cleaned up, so restarting the worker right away in breeze raises the HTTP 409 :-) Still OK in my view.

@dheerajturaga
Member Author

Could you handle the exception explicitly on the client and log a clearer message to the console, e.g. as we do for version conflicts in providers/edge3/src/airflow/providers/edge3/cli/api_client.py:133?

I made this update. Does it look good to you?

2025-11-23T18:30:05.778730Z [info     ] Starting worker with API endpoint http://localhost:8080/edge_worker/v1/rpcapi [airflow.providers.edge3.cli.edge_command] loc=edge_command.py:80
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
   ____   __           _      __         __
  / __/__/ /__ ____   | | /| / /__  ____/ /_____ ____
 / _// _  / _ `/ -_)  | |/ |/ / _ \/ __/  '_/ -_) __/
/___/\_,_/\_, /\__/   |__/|__/\___/_/ /_/\_\\__/_/
         /___/

2025-11-23T18:30:05.924089Z [error    ] A worker with the name 'breeze' is already active. Please ensure worker names are unique, or stop the existing worker before starting a new one. [airflow.providers.edge3.cli.worker] loc=worker.py:270
A worker with the name 'breeze' is already active. Please ensure worker names are unique, or stop the existing worker before starting a new one.
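
A rough sketch of how a client could turn the 409 into the message shown above. It uses the standard library's `urllib.error.HTTPError` as a stand-in for whatever HTTP client the worker really uses, and `register_or_exit` plus the `register` callable are hypothetical names. Exiting via `SystemExit` with the message is also an assumption; it would be consistent with the line appearing twice in the output, but the log does not confirm it:

```python
import logging
from http import HTTPStatus
from typing import Callable
from urllib.error import HTTPError

logger = logging.getLogger(__name__)


def register_or_exit(register: Callable[[str], None], worker_name: str) -> None:
    """Call `register` and translate an HTTP 409 into a readable error."""
    try:
        register(worker_name)
    except HTTPError as err:
        if err.code == HTTPStatus.CONFLICT:
            msg = (
                f"A worker with the name '{worker_name}' is already active. "
                "Please ensure worker names are unique, or stop the existing "
                "worker before starting a new one."
            )
            logger.error(msg)
            # Exit with the message instead of surfacing the raw HTTP error.
            raise SystemExit(msg) from None
        # Any other HTTP error is not ours to interpret; let it propagate.
        raise
```

Only the conflict case is translated, so unrelated failures (for example a 500 from the API server) still surface unchanged.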

@dheerajturaga
Member Author

dheerajturaga commented Nov 23, 2025

Just realized: this PR makes things slightly less convenient with breeze. When you run stop_airflow, the Docker container is killed and the previous worker state is not cleaned up, so restarting the worker right away in breeze raises the HTTP 409 :-) Still OK in my view.

That's true, it looks like the state does not get cleared. You would need to wait a few minutes for the API server to notice that the worker heartbeat is missing and change the state to unknown; then you will be able to launch. Or just launch with -d (which I generally always use when testing).

I don't think this is a big deal though.

2025-11-23T18:42:55.005572Z [error    ] A worker with the name 'breeze' is already active. Please ensure worker names are unique, or stop the existing worker before starting a new one. [airflow.providers.edge3.cli.worker] loc=worker.py:270
A worker with the name 'breeze' is already active. Please ensure worker names are unique, or stop the existing worker before starting a new one.
root@99fa52cb933e:/opt/airflow# airflow edge worker --edge-hostname breeze --queues default
2025-11-23T18:45:21.402431Z [info     ] Starting worker with API endpoint http://localhost:8080/edge_worker/v1/rpcapi [airflow.providers.edge3.cli.edge_command] loc=edge_command.py:80
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
   ____   __           _      __         __
  / __/__/ /__ ____   | | /| / /__  ____/ /_____ ____
 / _// _  / _ `/ -_)  | |/ |/ / _ \/ __/  '_/ -_) __/
/___/\_,_/\_, /\__/   |__/|__/\___/_/ /_/\_\\__/_/
         /___/

2025-11-23T18:45:21.593074Z [info     ] An existing PID file has been found: /root/airflow/airflow-edge-worker.pid. [airflow.providers.edge3.cli.signalling] loc=signalling.py:53
2025-11-23T18:45:21.593307Z [warning  ] PID file is orphaned. Cleaning up. [airflow.providers.edge3.cli.signalling] loc=signalling.py:65
2025-11-23T18:45:21.618799Z [info     ] No new job to process          [airflow.providers.edge3.cli.worker] loc=worker.py:336
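
The waiting period mentioned above can be pictured as a staleness check on the last heartbeat. This is only an illustrative sketch: `effective_state` and the five-minute `HEARTBEAT_TIMEOUT` are assumptions, and the thread only says "a few minutes":

```python
from datetime import datetime, timedelta, timezone

# Hypothetical timeout; the actual value is not stated in this thread.
HEARTBEAT_TIMEOUT = timedelta(minutes=5)

# States that already mean "not running", so a missing heartbeat is expected.
OFFLINE_STATES = ("offline", "offline maintenance")


def effective_state(
    state: str,
    last_heartbeat: datetime,
    now: datetime,
    timeout: timedelta = HEARTBEAT_TIMEOUT,
) -> str:
    """Return "unknown" once an active worker's heartbeat is older than `timeout`."""
    if state not in OFFLINE_STATES and now - last_heartbeat > timeout:
        return "unknown"
    return state


now = datetime(2025, 11, 23, 18, 45, tzinfo=timezone.utc)
killed_at = datetime(2025, 11, 23, 18, 30, tzinfo=timezone.utc)
# A worker killed 15 minutes ago no longer blocks re-registration.
print(effective_state("idle", killed_at, now))
```

Once the state flips to unknown, the name falls into the reusable set from the PR description and a new worker with the same name can register.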

Contributor

@jscheffl jscheffl left a comment

That works! Thanks!

@jscheffl jscheffl merged commit 1c2e738 into apache:main Nov 23, 2025
79 checks passed
Copilot AI pushed a commit to jason810496/airflow that referenced this pull request Dec 5, 2025
…nkown (apache#58586)

* Prevent duplicate edge workers unless existing worker is offline or unknown

* Jens Suggestions
itayweb pushed a commit to itayweb/airflow that referenced this pull request Dec 6, 2025