This repository has been archived by the owner on Aug 4, 2023. It is now read-only.

Allow overriding the date from the DagRun conf #880

Merged
merged 1 commit into from
Dec 7, 2022

Conversation

stacimc
Contributor

@stacimc stacimc commented Nov 29, 2022

Fixes

Description

Adds an option to the DagRun conf to let you override the date of a run of a dated provider DAG.

When you manually run a dated DAG, it is always passed today's date. This option lets you pass in any arbitrary date instead. While I don't expect this to be used frequently in production, it has come in handy many times for local testing.

Testing Instructions

To trigger a DAG using a date override, select the Trigger DAG w/ config option seen here:

[Screenshot: Trigger DAG w/ config option]

Then pass your chosen date in using a YYYY-MM-DD format like this:
{"date": "2022-04-10"}

Pick a dated DAG, like Europeana:

  • Try running it normally by enabling the DAG locally and letting a scheduled run start. Verify that it correctly uses the scheduled date.
  • Try running it manually without a DagRun conf. Verify that it uses today's date.
  • Try triggering the DAG w/ a config and pass in a different date. Verify the custom date is used. You should see a message in the logs like Using date <date> from dagrun conf.
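
The three cases above come down to a simple fallback, roughly sketched here. This is not the code from this PR: `resolve_ingestion_date`, `conf`, and `logical_date` are hypothetical names, the DagRun conf is modeled as a plain dict, and validation of the override is deliberately left out.

```python
import logging
from datetime import date
from typing import Optional

logger = logging.getLogger(__name__)


def resolve_ingestion_date(conf: Optional[dict], logical_date: date) -> str:
    """Pick the date a run should ingest for, as a YYYY-MM-DD string.

    Hypothetical helper: `conf` stands in for the DagRun conf and
    `logical_date` for the date a scheduled or manual run would get anyway.
    """
    override = (conf or {}).get("date")
    if override is not None:
        # Mirrors the log message mentioned in the testing instructions
        logger.info("Using date %s from dagrun conf", override)
        return override
    # No override: fall back to the date supplied with the run
    return logical_date.isoformat()
```

With no conf (or a conf without a `date` key) the run's own date is used. Only an explicit `{"date": "2022-04-10"}` changes the result.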

Notes

  • The DAG will error if nonsense data or an incorrectly formatted date is passed in.
  • You can set a date override on any provider DAG, even a non-dated one. In this case it simply won't do anything.
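
To make both notes concrete, here is a minimal sketch of how such behavior could look. It is an assumption, not the PR's implementation: `parse_date_override` and `build_ingestion_kwargs` are hypothetical helpers, and the `dated` flag stands in for however a provider DAG is marked as dated.

```python
from datetime import datetime


def parse_date_override(value: object) -> str:
    """Validate a hypothetical `date` conf value and return it as YYYY-MM-DD."""
    if not isinstance(value, str):
        raise ValueError(
            f"date override must be a string, got {type(value).__name__}"
        )
    # strptime raises ValueError for input such as "tomorrow" or "04/10/2022"
    return datetime.strptime(value, "%Y-%m-%d").date().isoformat()


def build_ingestion_kwargs(conf: dict, dated: bool) -> dict:
    """Only a dated DAG consumes the override; any other DAG ignores it."""
    if dated and "date" in conf:
        return {"date": parse_date_override(conf["date"])}
    return {}
```

In this sketch a malformed value fails loudly on a dated DAG, while a non-dated DAG never looks at the key at all.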

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the default branch of the repository (main) or a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.

Developer Certificate of Origin

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@stacimc stacimc added 🟩 priority: low Low priority and doesn't need to be rushed 🌟 goal: addition Addition of new feature 💻 aspect: code Concerns the software code in the repository labels Nov 29, 2022
@stacimc stacimc requested a review from a team as a code owner November 29, 2022 00:20
@stacimc stacimc self-assigned this Nov 29, 2022
@stacimc stacimc mentioned this pull request Dec 1, 2022
7 tasks
Member

@krysal krysal left a comment

The code looks good to me, though, what is this date used for? It's not totally clear to me.

Is there a place where we can find these configuration parameters with their descriptions or use cases documented? Yesterday I was looking to limit the number of records pulled, and I wasn't sure if there is a param for it (there is an Airflow Variable but something for an individual DAG run? would be nice to add it if it's not already done).

@stacimc
Contributor Author

stacimc commented Dec 2, 2022

> The code looks good to me, though, what is this date used for? It's not totally clear to me.

This is the ingestion date that's used in dated DAGs. Typically, the logical date of the DagRun is passed in automatically. The goal of this PR is to give devs a way to override that with whatever date they want, because there currently isn't an easy way to run ingestion for an arbitrary day.

(A recent use case for me was testing with Finnish Museums. I wanted to reproduce a bug that only happens on certain days, and found it useful to be able to easily kick off ingestion for specific dates.)

> Is there a place where we can find these configuration parameters with their descriptions or use cases documented?

There isn't, but there is an issue to add documentation: WordPress/openverse#1427. Couldn't agree more, it's easy to forget what the available options are.

> Yesterday I was looking to limit the number of records pulled, and I wasn't sure if there is a param for it (there is an Airflow Variable but something for an individual DAG run? would be nice to add it if it's not already done).

The ingestion_limit Airflow variable is what you're thinking of, and you're right that it applies to all DAGs. Having that available as an option for individual DAG runs sounds like it could be very handy! I made WordPress/openverse#1329 to track it, thanks 😄
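
For illustration only, a per-run option could follow the same fallback pattern as the date override. Nothing like this exists yet: `resolve_ingestion_limit`, the `ingestion_limit` conf key, and the `global_limit` argument (standing in for the value of the Airflow variable) are all assumptions.

```python
from typing import Optional


def resolve_ingestion_limit(
    conf: Optional[dict], global_limit: Optional[int]
) -> Optional[int]:
    """Prefer a per-run limit from the conf, else fall back to the global one.

    Hypothetical helper: `global_limit` stands in for the value of the
    `ingestion_limit` Airflow variable, which applies to all DAGs.
    """
    per_run = (conf or {}).get("ingestion_limit")
    if per_run is None:
        return global_limit
    limit = int(per_run)
    if limit <= 0:
        raise ValueError("ingestion_limit must be a positive integer")
    return limit
```

A run triggered with a limit in its conf would use that limit. Every other run would keep the global setting, or no limit at all if none is set.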

Contributor

@AetherUnbound AetherUnbound left a comment

Great idea! I can't test this locally at this time, but the implementation & test look 💯

@AetherUnbound
Contributor

Finally able to test, works great!

@stacimc stacimc merged commit 6f92f40 into main Dec 7, 2022
@stacimc stacimc deleted the add/conf-option-to-override-date branch December 7, 2022 21:03
3 participants