Results considered dead if SSL fails during dead link check, even though they might not actually be dead #4371
Labels
💻 aspect: code
Concerns the software code in the repository
🛠 goal: fix
Bug fix
🟨 priority: medium
Not blocking but should be addressed soon
🧱 stack: api
Related to the Django API
🧱 stack: catalog
Related to the catalog and Airflow DAGs
💬 talk: discussion
Open for discussions and feedback
Description
Some results are considered "dead" even though they are actually available over cleartext HTTP (which, in this case, then redirects).
collection.mobiliernational.culture.gouv.fr does have an expired certificate, but if you visit the page over cleartext HTTP, it redirects to a URL with a valid certificate: https://collection.mobilier-national.fr/recherche
There are other examples of this. The ones I saw all looked to be related to French government agencies, but that could just be a coincidence from a cluster of results in particular queries.
@WordPress/openverse-catalog I'm not sure whether this needs a fix in the API or whether it's something we should address during data refresh. It would be nice if the API could follow these redirects. Maybe it's safe to retry requests over HTTP when HTTPS fails with an SSL error? What do y'all think, @WordPress/openverse-api and catalog folks?
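To make the retry idea concrete, here is a rough sketch of what "retry over HTTP when HTTPS fails with an SSL error" could look like. This is not the existing dead link check code: `to_cleartext`, `head_status_with_fallback`, and the `head` callable are hypothetical names, and I'm assuming `head` follows redirects, returns the final status code, and raises `ssl.SSLError` on a certificate failure.

```python
import ssl
from typing import Callable, Optional
from urllib.parse import urlsplit, urlunsplit


def to_cleartext(url: str) -> Optional[str]:
    """Return the http:// variant of an https:// URL, or None if it is not HTTPS.

    Caveat: an explicit port (e.g. :443) is kept as-is, which is probably
    wrong for a cleartext retry; a real implementation would need to handle it.
    """
    parts = urlsplit(url)
    if parts.scheme != "https":
        return None
    return urlunsplit(parts._replace(scheme="http"))


def head_status_with_fallback(url: str, head: Callable[[str], int]) -> int:
    """Return the status for `url`, retrying once over HTTP on an SSL error.

    `head` is a hypothetical callable (an assumption for this sketch) that
    performs a HEAD request, follows redirects, and returns the final status.
    """
    try:
        return head(url)
    except ssl.SSLError:
        # ssl.SSLCertVerificationError (expired cert, etc.) subclasses SSLError.
        fallback = to_cleartext(url)
        if fallback is None:
            raise  # The URL was not HTTPS, so there is nothing to downgrade to.
        return head(fallback)
```

Only SSL errors trigger the retry, so timeouts and connection failures would still count as dead. Whether downgrading to HTTP is acceptable at all is the open question here.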
Reproduction
The logs unfortunately do not show a specific URL. I'm adding the URL to this particular log line in #4333, but for now I don't know exactly which works are failing this way. We can pull them from Elasticsearch by querying the image URLs for this pattern.
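A possible shape for that Elasticsearch lookup, as a sketch only: the field names `url` and `identifier` are assumptions about the index mapping, and `url_pattern_query` is a hypothetical helper. It builds a standard `wildcard` query body that could be sent to the image index's `_search` endpoint.

```python
import json


def url_pattern_query(domain: str, size: int = 20) -> dict:
    """Build a search body matching documents whose URL contains `domain`.

    Assumes a keyword-like `url` field; leading wildcards can be slow on
    large indices, so this is meant for a one-off investigation.
    """
    return {
        "size": size,
        "_source": ["identifier", "url"],  # assumed field names
        "query": {"wildcard": {"url": {"value": f"*{domain}*"}}},
    }


if __name__ == "__main__":
    body = url_pattern_query("collection.mobiliernational.culture.gouv.fr")
    print(json.dumps(body, indent=2))
```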