Investigate Jamendo inconsistent result count #3500

Open
stacimc opened this issue Dec 8, 2023 · 0 comments
Labels
💻 aspect: code Concerns the software code in the repository 🛠 goal: fix Bug fix 🟩 priority: low Low priority and doesn't need to be rushed 🧱 stack: catalog Related to the catalog and Airflow DAGs

Comments

@stacimc
Contributor

stacimc commented Dec 8, 2023

Description

The Jamendo DAG is non-dated, meaning each time it runs it should be ingesting all available data for the provider. Consequently, one would expect the number of records ingested to increase slightly on each run. This does not appear to be the case, or at least not consistently. Here are the record counts returned by the most recent production runs, from March to December 2023:

  • 40,472
  • 1,537
  • 57
  • 47,218
  • 307,418
  • 476,537
  • 517,977
  • 2,261 -- this one can be ignored as the DAG halted early due to an error
  • 603,425
  • 604,291
  • 77,206

Three entries (1,537, 57, and 77,206, the last being the most recent DagRun, for November 2023) show an unexplained dramatic decrease in the number of records ingested.

We should investigate these runs more closely to get a better understanding of what is happening.
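
As a starting point for the investigation, here is a rough sketch of how the suspicious runs could be flagged programmatically. It assumes the counts listed above are in chronological order (oldest first), and it flags any run whose count falls below half of the largest count seen so far. The function name `flag_suspicious_runs` and the `min_ratio` threshold are hypothetical and not part of the catalog codebase.

```python
# Record counts copied from the list above; assumed to be oldest run first.
RUN_COUNTS = [
    40_472, 1_537, 57, 47_218, 307_418, 476_537,
    517_977, 2_261, 603_425, 604_291, 77_206,
]


def flag_suspicious_runs(counts, min_ratio=0.5):
    """Return (index, count, high_water_mark) for each run whose count is
    below `min_ratio` times the largest count seen in any earlier run.

    `min_ratio` is an arbitrary illustrative threshold, not a tuned value.
    """
    flagged = []
    high_water_mark = 0
    for index, count in enumerate(counts):
        # Skip the comparison until at least one earlier run has set a baseline.
        if high_water_mark and count < min_ratio * high_water_mark:
            flagged.append((index, count, high_water_mark))
        high_water_mark = max(high_water_mark, count)
    return flagged


if __name__ == "__main__":
    for index, count, baseline in flag_suspicious_runs(RUN_COUNTS):
        print(f"run #{index}: {count:,} records (previous high: {baseline:,})")
```

With the counts above this flags runs 1, 2, 7, and 10 (zero-indexed). Run 7 is the 2,261 run that halted early due to an error, so it would need to be excluded by hand or by checking the DagRun state.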

Reproduction

Observe the record counts from recent production runs.

@stacimc stacimc added 🟩 priority: low Low priority and doesn't need to be rushed 🛠 goal: fix Bug fix 💻 aspect: code Concerns the software code in the repository 🧱 stack: catalog Related to the catalog and Airflow DAGs labels Dec 8, 2023
Projects
Status: 📋 Backlog