Opened on Nov 29, 2023
Problem
We have generated some CSVs with `identifier` and another column that we need to use to update the Catalog media table, but we don't have a way to run the media table updates efficiently.
Description
The batched update DAG is a reusable DAG that can perform an arbitrary batched update on a Catalog media table while handling deadlocking and timeout concerns.
During the cleanup process in data refresh, we generate the CSVs that contain the item `identifier` and the cleaned up version of another column (`title`, `url`, `foreign_landing_url`, `creator_url` and `tags`). We need a DAG that is similar to the batched update DAG, but can use a CSV table for selecting the items that need to be updated.
It is important that this work does not delete any tags. The `tags` column, while present in the CSVs, should not be used.
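The idea can be sketched as follows. This is only an illustration, not the DAG itself: it uses the standard-library `sqlite3` module so that it runs anywhere, whereas the real work would run against the Catalog database from an Airflow DAG. The function name `batched_update_from_csv`, the staging table `temp_cleaned`, the media table names, and the two-column headerless CSV layout are all assumptions made for the example. Retry handling for deadlocks and timeouts is left out.

```python
import csv
import sqlite3

# Columns that may be updated from the cleanup CSVs.
# "tags" is deliberately absent so that this code can never touch tags.
ALLOWED_COLUMNS = {"title", "url", "foreign_landing_url", "creator_url"}
# Hypothetical media table names, used here only to validate input.
ALLOWED_TABLES = {"image", "audio"}


def batched_update_from_csv(conn, csv_file, column, table="image", batch_size=2):
    """Apply cleaned values from (identifier, value) CSV rows in batches.

    Returns the number of rows updated in the media table.
    """
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"Refusing to update column: {column!r}")
    if table not in ALLOWED_TABLES:
        raise ValueError(f"Unknown media table: {table!r}")

    # Stage the CSV rows in a temporary table with a sequential row id,
    # so that each batch can be selected by a row id range.
    conn.execute("DROP TABLE IF EXISTS temp_cleaned")
    conn.execute(
        "CREATE TEMP TABLE temp_cleaned "
        "(row_id INTEGER PRIMARY KEY, identifier TEXT, value TEXT)"
    )
    rows = [(r[0], r[1]) for r in csv.reader(csv_file) if len(r) >= 2]
    conn.executemany(
        "INSERT INTO temp_cleaned (identifier, value) VALUES (?, ?)", rows
    )
    conn.commit()

    updated = 0
    for start in range(0, len(rows), batch_size):
        batch = conn.execute(
            "SELECT value, identifier FROM temp_cleaned "
            "WHERE row_id > ? AND row_id <= ?",
            (start, start + batch_size),
        ).fetchall()
        # Table and column names were validated against the allow-lists above.
        cur = conn.executemany(
            f"UPDATE {table} SET {column} = ? WHERE identifier = ?", batch
        )
        # Commit per batch to keep each transaction short.
        conn.commit()
        updated += cur.rowcount
    return updated
```

Calling `batched_update_from_csv(conn, open_csv_file, "url")` would update only the `url` column for the identifiers listed in the CSV, while passing `"tags"` raises a `ValueError`. Committing after every batch keeps transactions short, which is the same motivation behind batching in the existing DAG.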
Additional context
The CSV files are saved in the Docker container of the ingestion server when we run the data refresh.
Status: ✅ Done