
Conversation

@ottomata (Contributor)

The list of allowed values for spark-binary was restricted in #27646. Add spark3-submit to this list to allow for distributions of Spark 3 that install the binary this way.

related: #27646
related: #30065

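To make the scope of the change concrete, here is an illustrative sketch of the kind of allow-list check described above with spark3-submit included; the constant and function names are placeholders, not the provider's actual API:

```python
# Illustrative sketch only: the hook validates the binary name selected through
# the connection's "spark-binary" extra against an allow-list, and the fix adds
# "spark3-submit" so Spark 3 distributions that install the binary under that
# name keep working.
ALLOWED_SPARK_BINARIES = ["spark-submit", "spark3-submit"]


def validate_spark_binary_extra(spark_binary: str) -> None:
    # Reject anything outside the allow-list, mirroring the restriction from #27646.
    if spark_binary not in ALLOWED_SPARK_BINARIES:
        raise RuntimeError(
            f"The spark-binary extra must be one of {ALLOWED_SPARK_BINARIES}, got {spark_binary!r}."
        )
```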

boring-cyborg bot commented Mar 13, 2023

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst).
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using the Breeze environment for testing locally; it is a heavy Docker setup, but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.

Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack

@potiuk (Member) commented Mar 13, 2023

You need to fix the static checks. However, when I looked at it, maybe you might want to make a better and more future-proof fix?

It should be possible to apply the check only to the extra and still allow the operator's spark_binary to be anything. Currently we check it in __init__ of the hook, but if we check it where extra.get("spark-binary", "spark-submit") is called (extracting that into a method), then we could allow any spark_binary to be passed as the operator's parameter.

The security issue was only about the "extra" (because it could be changed via the UI when defining a connection), but there is no problem with passing any binary via the operator in the DAG code.
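
A minimal sketch of that suggestion, assuming a hook that can receive an explicit spark_binary argument as well as the connection extra (the class and method names here are hypothetical, not the provider's real code):

```python
from __future__ import annotations

from typing import Any

# Allow-list applied only to values coming from the connection extra
# (constant name is illustrative).
ALLOWED_SPARK_BINARIES = ["spark-submit", "spark3-submit"]


class SparkSubmitHookSketch:
    """Hypothetical skeleton showing where the validation could move."""

    def __init__(self, spark_binary: str | None = None, conn_extra: dict[str, Any] | None = None) -> None:
        # No validation in __init__: DAG code may pass any binary explicitly.
        self._spark_binary = spark_binary
        self._conn_extra = conn_extra or {}

    def _resolve_spark_binary(self) -> str:
        # An explicitly passed value wins and is not checked.
        if self._spark_binary:
            return self._spark_binary
        # Only the extra-sourced value is validated, because the extra can be
        # edited through the UI when defining a connection.
        binary = self._conn_extra.get("spark-binary", "spark-submit")
        if binary not in ALLOWED_SPARK_BINARIES:
            raise RuntimeError(
                f"The spark-binary connection extra must be one of {ALLOWED_SPARK_BINARIES}, got {binary!r}."
            )
        return binary
```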

@ottomata (Contributor, Author)

> make the check only for extra

I like this idea, but I might not have time to follow up with a more extensive change for this at the moment. I'll ping one of my teammates, but I'm not sure if they'll have the time either.

Thanks!

@ottomata (Contributor, Author)

> I'll ping one of my teammates, but I'm not sure if they'll have the time either.

They are interested in making a more generic change, but won't have time in the immediate future to do so. If you are okay with it, it would be useful to merge this as is now, and do the generic change in a future PR.

@potiuk merged commit b325987 into apache:main on Mar 15, 2023

boring-cyborg bot commented Mar 15, 2023

Awesome work, congrats on your first merged pull request!

@potiuk (Member) commented Mar 21, 2023

Added #30213 to allow passing spark binary via hook.
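
For reference, a hedged usage sketch of selecting the binary explicitly from DAG code instead of through the connection extra; the spark_binary keyword is assumed here, so check the provider documentation for the exact signature of the version you run:

```python
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

# Hypothetical usage: pick the Spark 3 launcher explicitly in DAG code rather
# than via the connection extra (keyword name assumed, not verified against a
# specific provider release).
submit_job = SparkSubmitOperator(
    task_id="submit_job",
    application="/opt/jobs/my_job.py",
    conn_id="spark_default",
    spark_binary="spark3-submit",
)
```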
