Opened on Dec 2, 2022
Problem
As @krysal pointed out in this comment, the only way to restrict the number of records ingested by provider scripts is currently the ingestion_limit Airflow variable. This variable limits the records pulled for all provider DAGs. A developer may want to limit records for a single DagRun without affecting other DAGs.
Description
Add an option to the DagRun conf that lets users set ingestion_limit at the DagRun level. This should take priority over the global ingestion_limit if one is set (i.e., if both are set, the conf option should be treated as the effective limit).
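A minimal sketch of the proposed precedence, assuming the conf key is also named ingestion_limit and that the caller passes in the DagRun conf (e.g. `context["dag_run"].conf`) and the raw variable value (e.g. from `Variable.get("ingestion_limit", default_var=None)`). The helper `resolve_ingestion_limit` is hypothetical and not part of the existing codebase; it is kept free of Airflow imports so it can be tested on its own.

```python
from typing import Any, Mapping, Optional


def resolve_ingestion_limit(
    dag_run_conf: Optional[Mapping[str, Any]],
    global_limit: Optional[Any],
) -> Optional[int]:
    """Return the effective ingestion limit for a single DagRun.

    Hypothetical helper: a limit in the DagRun conf takes priority over
    the global ingestion_limit Airflow variable. None means "no limit".
    """

    def _to_int(value: Any, source: str) -> int:
        # Airflow Variables are stored as strings, so coerce to int.
        try:
            limit = int(value)
        except (TypeError, ValueError):
            raise ValueError(
                f"{source} ingestion_limit must be an integer, got {value!r}"
            ) from None
        if limit < 0:
            raise ValueError(
                f"{source} ingestion_limit must be non-negative, got {limit}"
            )
        return limit

    # Assumption: the per-run override lives under the same key name.
    conf_limit = (dag_run_conf or {}).get("ingestion_limit")
    if conf_limit is not None:
        # The DagRun-level value wins when both are set.
        return _to_int(conf_limit, "DagRun conf")
    if global_limit is not None:
        # Otherwise fall back to the global Airflow variable.
        return _to_int(global_limit, "Airflow variable")
    # Neither is set: ingest without a limit.
    return None
```

With this sketch, `resolve_ingestion_limit({"ingestion_limit": 100}, "5000")` returns 100, `resolve_ingestion_limit({}, "5000")` returns 5000, and `resolve_ingestion_limit(None, None)` returns None. A single run could then be limited with something like `airflow dags trigger --conf '{"ingestion_limit": 100}' <dag_id>`, leaving other DAGs governed by the variable.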
Alternatives
We could also make it possible to set ingestion_limit per DAG at the Airflow variable level, but I think this level of granularity is unlikely to be needed, and it would make the variable very unwieldy.
Additional context
We should also keep the existing ingestion_limit Airflow variable, since limiting ingestion globally for local testing is still a common use case.
Implementation
- 🙋 I would be interested in implementing this feature.
Status: 📋 Backlog