
Conversation

@o-nikolas
Contributor

The Lambda executor uses a single shared SQS queue as a results backend (of sorts) to receive results from the Lambda invocations. Previously, if Lambda Executor A read an event which contained a task started by Lambda Executor B, it would delete the message as unrecognized. Now properly formatted messages are returned to the queue for the correct executor to pull (ill-formatted messages are still deleted). UAT testing has shown that this simple solution stabilizes quite quickly, especially since executors with no running tasks do not query the queue. If we see any further scaling issues in the future we can revisit this with a more complex solution.
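A rough, illustrative sketch of the behaviour described above, assuming a boto3-based SQS consumer. The class name, method names, and message schema (e.g. the `task_key` field) are assumptions for illustration, not the provider's actual API:

```python
# Illustrative sketch only -- not the actual Airflow AWS provider code.
# Each executor keeps only messages for tasks it owns, returns well-formed
# messages it does not recognize, and deletes malformed ones.
import json

import boto3


class LambdaResultsPoller:
    """Hypothetical poller; names and message layout are assumptions."""

    def __init__(self, queue_url: str, my_task_keys: set[str]):
        self.sqs = boto3.client("sqs")
        self.queue_url = queue_url
        self.my_task_keys = my_task_keys  # tasks started by *this* executor

    def poll_once(self) -> None:
        # Executors with no running tasks skip polling entirely.
        if not self.my_task_keys:
            return

        response = self.sqs.receive_message(
            QueueUrl=self.queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=2
        )
        for message in response.get("Messages", []):
            receipt = message["ReceiptHandle"]
            try:
                body = json.loads(message["Body"])
                task_key = body["task_key"]  # assumed field name
            except (json.JSONDecodeError, KeyError):
                # Ill-formatted message: no executor can use it, so delete it.
                self.sqs.delete_message(
                    QueueUrl=self.queue_url, ReceiptHandle=receipt
                )
                continue

            if task_key in self.my_task_keys:
                self.handle_result(body)
                self.sqs.delete_message(
                    QueueUrl=self.queue_url, ReceiptHandle=receipt
                )
            else:
                # Well-formed but owned by another executor: make it visible
                # again immediately so the right executor can pick it up.
                self.sqs.change_message_visibility(
                    QueueUrl=self.queue_url,
                    ReceiptHandle=receipt,
                    VisibilityTimeout=0,
                )

    def handle_result(self, body: dict) -> None:
        ...  # update the task's state in this executor
```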


@o-nikolas o-nikolas requested a review from eladkal as a code owner July 15, 2025 23:36
@boring-cyborg boring-cyborg bot added the area:providers and provider:amazon (AWS/Amazon - related issues) labels Jul 15, 2025
Contributor

@eladkal left a comment


If we see any further scaling issues in the future we can revisit this with a more complex solution.

Since this is an experimental feature, I suggest sharing this known/possible limitation at the end of the Lambda executor doc - we should encourage users to be aware of the limitations and report their findings back to us, so it is easier to decide when/what needs to be changed to announce the executor as stable.
Ideally, this is something we should have for any experimental feature.

@o-nikolas
Contributor Author

If we see any further scaling issues in the future we can revisit this with a more complex solution.

Since this is an experimental feature, I suggest sharing this known/possible limitation at the end of the Lambda executor doc - we should encourage users to be aware of the limitations and report their findings back to us, so it is easier to decide when/what needs to be changed to announce the executor as stable. Ideally, this is something we should have for any experimental feature

I think I'd like to push back on this a little bit. After this fix, I'm not sure there really is still a scaling issue present. So I'm 1) not sure exactly what I would document, and 2) whatever I put in the documentation will absolutely be taken as truth and gospel and quoted until the end of time. I really would prefer not to start a "rumor" of sorts. If we start getting some users on board and we get tickets complaining of a persistent issue, I will then either fix it or document it for sure, once we know something concrete.

@eladkal eladkal merged commit 6473fac into apache:main Jul 16, 2025
77 checks passed