Add DagRun conf option to limit number of records ingested for a single run #1329

Open

Description

Problem

As @krysal pointed out in this comment, the only way to restrict the number of records ingested by provider scripts at present is the ingestion_limit Airflow variable. This variable limits the records pulled for all provider DAGs. A developer may want to limit records for a particular DagRun only, without affecting other DAGs.

Description

Add an option to the DagRun conf that lets users set ingestion_limit at the DagRun level. It should take priority over the global ingestion_limit when both are set (i.e., the conf option is treated as the effective limit).
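
A minimal sketch of the precedence rule described above, in case it helps frame the implementation. The helper names (`resolve_ingestion_limit`, `_parse_limit`) are hypothetical and not part of the existing codebase. The sketch assumes the DagRun conf arrives as a plain dict (as exposed by `dag_run.conf`) and that the global value is whatever was read from the ingestion_limit Airflow variable. Treating missing, non-numeric, zero, or negative values as "no limit" is also an assumption.

```python
from typing import Any, Mapping, Optional


def _parse_limit(raw: Any) -> Optional[int]:
    """Return a positive int, or None when no usable limit is given.

    Assumption: non-numeric, zero, and negative values mean "no limit".
    """
    if raw is None or isinstance(raw, bool):
        return None
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return None
    return value if value > 0 else None


def resolve_ingestion_limit(
    dag_run_conf: Optional[Mapping[str, Any]],
    global_limit: Any = None,
) -> Optional[int]:
    """Pick the effective limit for a single run.

    dag_run_conf: the conf dict passed when the DagRun was triggered.
    global_limit: the value of the ingestion_limit Airflow variable, if set.
    The conf value wins whenever it is a usable limit; otherwise the
    global value is used; otherwise there is no limit (None).
    """
    conf_limit = _parse_limit((dag_run_conf or {}).get("ingestion_limit"))
    if conf_limit is not None:
        return conf_limit
    return _parse_limit(global_limit)
```

With something like this in place, a run triggered with the conf `{"ingestion_limit": 100}` would be capped at 100 records even if the global variable is set to a different value, while runs triggered without the key would keep the current behavior.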

Alternatives

We could also make it possible to set ingestion_limit per DAG at the Airflow variable level, but I think needing this level of granularity is unlikely, and it would make the variable very unwieldy.

Additional context

We should keep the current ingestion_limit Airflow variable as well, as limiting ingestion globally for local testing purposes is still a common use case.

Implementation

  • 🙋 I would be interested in implementing this feature.

Metadata

Assignees

No one assigned

    Labels

    • ✨ goal: improvement (Improvement to an existing user-facing feature)
    • 💻 aspect: code (Concerns the software code in the repository)
    • 🔧 tech: airflow (Involves Apache Airflow)
    • 🟩 priority: low (Low priority and doesn't need to be rushed)
    • 🧱 stack: catalog (Related to the catalog and Airflow DAGs)

    Type

    No type

    Projects

    • Status

      📋 Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests