Description
Apache Airflow Provider(s)
apache-spark
Versions of Apache Airflow Providers
4.0.0
Apache Airflow version
2.5.1
Operating System
Debian GNU/Linux 10 (buster)
Deployment
Other
Deployment details
No response
What happened
In apache-airflow-providers-apache-spark 4.0.0, the value of spark_binary
was restricted by a hardcoded check to either 'spark-submit' or 'spark2-submit'.
What was the reason for this? At the Wikimedia Foundation, we install the
Spark 3 binary as 'spark3-submit'. This change in version 4.0.0 of the Spark provider has broken
some of our DAGs, making us resort to things like this.
What you think should happen instead
We'd submit a patch to expand the restriction list to include 'spark3-submit', but we aren't sure why this was done in the first place. I understand the reasoning for removing spark_home, but it seems strange to have a spark_binary parameter and restrict it to these two values.
Can we undo this? If not, should we submit a patch to add spark3-submit to the list?
How to reproduce
Set spark_binary to 'spark3-submit'
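To illustrate the behaviour without needing an Airflow installation, here is a minimal, self-contained sketch of the kind of allowlist check described above. It is not the provider's code: the names `ALLOWED_SPARK_BINARIES` and `validate_spark_binary`, the exception type, and the default value are assumptions made for illustration only.

```python
from typing import Optional

# Illustrative allowlist: the two values this report says 4.0.0 accepts.
ALLOWED_SPARK_BINARIES = ("spark-submit", "spark2-submit")


def validate_spark_binary(spark_binary: Optional[str]) -> str:
    """Return the binary name to run, or raise if it is not on the allowlist.

    Hypothetical helper; the default and the exception type are assumptions.
    """
    if spark_binary is None:
        # Assumed fallback when no binary is configured.
        return "spark-submit"
    if spark_binary not in ALLOWED_SPARK_BINARIES:
        raise RuntimeError(
            f"spark_binary must be one of {ALLOWED_SPARK_BINARIES}, "
            f"got {spark_binary!r}"
        )
    return spark_binary


if __name__ == "__main__":
    # Accepted values pass through unchanged.
    print(validate_spark_binary("spark2-submit"))
    # A Spark 3 install exposed as 'spark3-submit' is rejected.
    try:
        validate_spark_binary("spark3-submit")
    except RuntimeError as err:
        print(f"rejected: {err}")
```

Under a check of this shape, the only ways to accept 'spark3-submit' are to add it to the allowlist or to remove the restriction, which are the two options raised above.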
Anything else
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct