Description
openedon Dec 1, 2022
Current Situation
The inaturalist provider script uses some non-trivial SQL formulae to build json for tags, and the next SQL-only ingestion will need to do a similar thing unless we come up with something DRYer like we have in python. https://github.com/WordPress/openverse-catalog/blob/337ea7aede228609cbd5031e3a501f22b6ccc482/openverse_catalog/dags/common/storage/media.py#L265
Suggested Improvement
This is fine when there is just one SQL-oriented provider, but if/when we add more, the code will get unwieldy. Some imperfect ideas: make a separate table for tags (nice for structure and searching by tag, not nice for retrieving media rows central to existing operations), make a database function (nice for DRYness, not nice for switching costs when applied row by row for many many rows).
Another option might be declaring types, for example
create type openverse_tag as (name varchar(255), provider varchar(255));
And then just converting an array of openverse_tags to json directly with row_to_json or something?
Benefit
DRY more maintainable code, maybe more efficient?
Implementation
- 🙋 I would be interested in implementing this feature.
Metadata
Assignees
Labels
Type
Projects
Status
📋 Backlog