improve media tag processing for sql loads

## Current Situation
The inaturalist provider script uses some non-trivial SQL formulae to build json for tags, and the next SQL-only ingestion will need to do a similar thing unless we come up with something DRYer like we have in python. https://github.com/WordPress/openverse-catalog/blob/337ea7aede228609cbd5031e3a501f22b6ccc482/openverse_catalog/dags/common/storage/media.py#L265

## Suggested Improvement
This is fine when there is just one SQL-oriented provider, but if/when we add more, the code will get unwieldy. Some imperfect ideas: make a separate table for tags (nice for structure and searching by tag, not nice for retrieving media rows central to existing operations), make a database function (nice for DRYness, not nice for switching costs when applied row by row for many many rows).  

Another option might be declaring types, for example

```create type openverse_tag as (name varchar(255), provider varchar(255));```

And then just converting an array of openverse_tags to json directly with row_to_json or something?

## Benefit
DRY more maintainable code, maybe more efficient?

## Implementation

- [ ] 🙋 I would be interested in implementing this feature.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

improve media tag processing for sql loads #1330

rwidom
openedon Dec 1, 2022

Current Situation

Suggested Improvement

Benefit

Implementation

Assignees

Labels

Type

Projects

Milestone

Relationships

Development