Description
Apache Airflow Provider(s)
amazon
Versions of Apache Airflow Providers
No response
Apache Airflow version
latest
Operating System
any
Deployment
Other
Deployment details
No response
What happened
First of all, apologies if this is not the right place to post this GH issue. I looked for a section for provider-specific feature requests but couldn't find one.
At my company we use the AWS provider to interact with AWS services from Airflow, and we use Poetry to build the environment in which we test our DAGs.
However, the build times are quite long. The cause is building pandas, which is a dependency of the amazon provider.
Looking at the provider's code, pandas appears to be used in only a small minority of functions:
./aws/transfers/hive_to_dynamodb.py:93: data = hive.get_pandas_df(self.sql, schema=self.schema)
and
./aws/transfers/sql_to_s3.py:159: data_df = sql_hook.get_pandas_df(sql=self.query, parameters=self.parameters)
Forcing every AWS Airflow user to install pandas, even those who do not use Hive or export SQL query results to S3, is cumbersome.
What you think should happen instead
Given how heavy the package is and how little of it is used in the amazon provider, pandas should be an optional dependency.
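To illustrate the idea, here is a minimal sketch of how a heavy dependency can be deferred until the code path that needs it actually runs. This is not the provider's code: `import_optional`, `OptionalDependencyMissing`, `rows_to_csv`, and the `pandas` extra name are all hypothetical names chosen for this example.

```python
import importlib
from types import ModuleType


class OptionalDependencyMissing(ImportError):
    """Hypothetical error raised when an optional dependency is absent."""


def import_optional(module_name: str, extra: str) -> ModuleType:
    """Import a module on demand, or explain which extra would provide it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise OptionalDependencyMissing(
            f"{module_name!r} is required for this feature; "
            f"install the package with the {extra!r} extra."
        ) from exc


def rows_to_csv(rows: list[dict]) -> str:
    """Hypothetical helper standing in for a pandas-based transfer step."""
    # pandas is imported only when this function is called, so modules
    # that never call it do not need pandas installed at all.
    pd = import_optional("pandas", extra="pandas")
    return pd.DataFrame(rows).to_csv(index=False)
```

With this pattern, users who never call the pandas-backed code never trigger the import, and those who do without pandas installed get an error that names the missing extra.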
How to reproduce
No response
Anything else
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct