
Fix tag encoding #1927

Open

Description

Problem

Tag encoding/decoding does not appear to work correctly. Tags with diacritics are not rendered as expected.

See https://api.openverse.engineering/v1/images/017ee061-3413-4c44-a610-077adf0635dd/ for an example, where the tag "remédios" is rendered as "remu00e9dios".
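To make the symptom concrete, here is a small sketch of one way such a string could arise. This is only an assumption for illustration, not a confirmed root cause: the text is escaped to ASCII as JSON and the backslash is then lost somewhere along the way.

```python
import json

original = "remédios"

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
# producing the quoted JSON string "rem\u00e9dios".
escaped = json.dumps(original)

# Hypothetical failure mode: the surrounding quotes are removed and the
# backslash is dropped, leaving the literal text seen in the API response.
garbled = escaped.strip('"').replace("\\", "")

print(garbled)
```

The printed value matches the tag in the example response, which is consistent with (but does not prove) a lost escape character at some stage of ingestion or serialization.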

Description

I don't know whether this is an issue during ingestion or with decoding the output during serialization. Either way, it seems easiest to fix on the API side, as fixing an ingestion bug would require backporting the fix. Decoding on the API side does not strike me as an expensive operation. Perhaps there is an Elasticsearch index option that would fix this as well, but that probably depends on what the root cause is.

If these tags are stored incorrectly encoded in Elasticsearch, then queries against them will not work. Openverse does not currently support querying in languages other than English, which has very few diacritics ("naïve" and other double vowels with umlauts are the only regular examples I can think of, aside from loan phrases from French like "vis-à-vis" or "à la"). That means we already expect queries against the vast majority of these terms not to work. But if these terms are indeed present in ES incorrectly encoded, we'll need to make a note of that for future work that enables querying in other languages.

The outcome of this issue should be (a) a temporary fix so that the data returned from the API is correctly formatted and (b) a new issue (if needed) to fix the root cause, if that root cause would cause problems when querying tags with diacritics.
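As a rough illustration of what the temporary API-side fix in (a) might look like, the sketch below rewrites `uXXXX` sequences (with or without a leading backslash) back into the characters they stand for. The function name `decode_tag_name` and the regular expression are hypothetical and not taken from the Openverse codebase; where such a helper would be called during serialization is also left open.

```python
import re

# Matches "u" followed by exactly four hex digits, optionally preceded by a
# backslash, e.g. "u00e9" or "\u00e9". Hypothetical pattern for illustration.
ESCAPED_CODE_POINT = re.compile(r"\\?u([0-9a-fA-F]{4})")


def decode_tag_name(name: str) -> str:
    """Replace each escaped code point in a tag name with its character."""
    return ESCAPED_CODE_POINT.sub(lambda m: chr(int(m.group(1), 16)), name)


print(decode_tag_name("remu00e9dios"))
```

A pattern this loose can misfire on a legitimate tag that happens to contain "u" followed by four hex-like characters, and it does not recombine surrogate pairs (as used for emoji), so a real fix would likely need stricter rules once the root cause is known.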

Additional context

I marked this as medium priority because while it's a grave issue, Openverse explicitly does not fully support queries in other languages.
