
Contributor

@yupbank yupbank commented May 31, 2023

Currently, only the Python file (py_file) supports downloading from GCS; however, the requirements file (requirements.txt) could also benefit from that.




boring-cyborg bot commented May 31, 2023

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using the Breeze environment for testing locally; it's a heavy Docker setup, but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.

Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubt, contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack

Member

@hussein-awala hussein-awala left a comment


Don't forget to add a test for your change.
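A test for this change could take roughly the following shape. This is a sketch only: resolve_inputs is a hypothetical stand-in for the operator logic, and the download step is injected as a MagicMock so the test can assert it runs only for gs:// paths. A real provider test would typically patch GCSHook in the operator's module instead. Run it with pytest or python -m unittest.

```python
import unittest
from unittest import mock


def resolve_inputs(py_file, pipeline_options, download):
    # Hypothetical stand-in for the operator logic under test: each gs:// path
    # is handed to `download`, which returns a local file path.
    options = dict(pipeline_options)
    if py_file.lower().startswith("gs://"):
        py_file = download(py_file)
    requirements_file = options.get("requirements_file")
    if requirements_file and requirements_file.lower().startswith("gs://"):
        options["requirements_file"] = download(requirements_file)
    return py_file, options


class ResolveInputsTest(unittest.TestCase):
    def test_only_requirements_file_downloaded(self):
        # Local py_file + GCS requirements file: exactly one download.
        download = mock.MagicMock(return_value="/tmp/requirements.txt")
        py_file, options = resolve_inputs(
            "/dags/pipeline.py", {"requirements_file": "gs://bucket/requirements.txt"}, download
        )
        download.assert_called_once_with("gs://bucket/requirements.txt")
        self.assertEqual(py_file, "/dags/pipeline.py")
        self.assertEqual(options["requirements_file"], "/tmp/requirements.txt")

    def test_local_paths_not_downloaded(self):
        download = mock.MagicMock()
        resolve_inputs("/dags/pipeline.py", {"requirements_file": "/dags/requirements.txt"}, download)
        download.assert_not_called()

    def test_both_gcs_paths_downloaded(self):
        # Map each URL to a fake local path named after the object.
        download = mock.MagicMock(side_effect=lambda url: "/tmp/" + url.rsplit("/", 1)[-1])
        py_file, options = resolve_inputs(
            "gs://bucket/pipeline.py", {"requirements_file": "gs://bucket/requirements.txt"}, download
        )
        self.assertEqual(download.call_count, 2)
        self.assertEqual(py_file, "/tmp/pipeline.py")
        self.assertEqual(options["requirements_file"], "/tmp/requirements.txt")
```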

Comment on lines 297 to 305
Member


Can we have a py_file that doesn't start with gs:// together with a requirements file stored in GCS in the same job? If yes (and I think it is), your code will try to download both files from GCS.

Instead, you can create a method that downloads an object from GCS and use it for the py_file and/or the requirements file:

from airflow.providers.google.cloud.hooks.gcs import GCSHook

def _get_file_from_gcs(self, object_url, exit_stack):
    # exit_stack keeps the downloaded temp file alive until the stack is closed
    gcs_hook = GCSHook(gcp_conn_id=self.gcp_conn_id)
    return exit_stack.enter_context(gcs_hook.provide_file(object_url=object_url))

and in this method, passing in its exit_stack:

if self.py_file.lower().startswith("gs://"):
    self.py_file = self._get_file_from_gcs(self.py_file, exit_stack).name
requirements_file = snake_case_pipeline_options.get("requirements_file")
if requirements_file and requirements_file.lower().startswith("gs://"):
    snake_case_pipeline_options["requirements_file"] = self._get_file_from_gcs(
        requirements_file, exit_stack
    ).name

WDYT?
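The suggested pattern can be tried outside Airflow with a stdlib-only sketch. It assumes a hypothetical fake_provide_file context manager in place of GCSHook.provide_file (which, in the Google provider, downloads the object to a temporary local file); get_file_from_gcs and localize_gcs_inputs are likewise illustrative names. Each path is checked on its own, and every temp file is registered on one ExitStack, so the local copies stay on disk until the stack closes.

```python
import os
import tempfile
from contextlib import ExitStack, contextmanager


@contextmanager
def fake_provide_file(object_url):
    # Hypothetical stand-in for GCSHook.provide_file: writes a placeholder
    # local temp file instead of downloading object_url from GCS.
    suffix = "_" + os.path.basename(object_url)
    with tempfile.NamedTemporaryFile("w", suffix=suffix) as tmp:
        tmp.write(f"# downloaded from {object_url}\n")
        tmp.flush()
        yield tmp  # the file is deleted when this context exits


def get_file_from_gcs(object_url, exit_stack):
    # The temp file is closed (and removed) only when exit_stack exits.
    return exit_stack.enter_context(fake_provide_file(object_url))


def localize_gcs_inputs(py_file, pipeline_options, exit_stack):
    # Check each input independently: a local py_file combined with a
    # gs:// requirements file triggers only one download.
    options = dict(pipeline_options)
    if py_file.lower().startswith("gs://"):
        py_file = get_file_from_gcs(py_file, exit_stack).name
    requirements_file = options.get("requirements_file")
    if requirements_file and requirements_file.lower().startswith("gs://"):
        options["requirements_file"] = get_file_from_gcs(requirements_file, exit_stack).name
    return py_file, options
```

Copying the options dict is a choice made for this sketch so the caller's mapping is left unchanged; the review suggestion above rewrites the operator's options in place instead.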

Contributor Author


Sounds good, I'll do that instead.

@yupbank yupbank requested a review from hussein-awala June 1, 2023 14:58
@hussein-awala hussein-awala added the use public runners label (makes sure that public runners are used even if committers create the PR; useful for testing) Jun 18, 2023
@hussein-awala hussein-awala changed the title from "Update beam python run operator to download requirements if needed" to "Allow downloading requirements file from GCS in BeamRunPythonPipelineOperator" Jun 18, 2023
@hussein-awala hussein-awala added the type:new-feature (Changelog: New Features) label Jun 18, 2023
@eladkal eladkal removed the type:new-feature (Changelog: New Features) label Jul 29, 2023
@hussein-awala
Member

@yupbank, could you check the failing tests?


Labels

area:providers, provider:apache-beam, use public runners (makes sure that public runners are used even if committers create the PR; useful for testing)

Projects

None yet


4 participants