Floraon URLs need fixing before enabling the provider #4817

Open
obulat opened this issue Aug 27, 2024 · 0 comments
Labels
🗄️ aspect: data Concerns the data in our catalog and/or databases 🛠 goal: fix Bug fix 🟩 priority: low Low priority and doesn't need to be rushed 🧱 stack: api Related to the Django API 🧱 stack: catalog Related to the catalog and Airflow DAGs

Comments

Contributor

obulat commented Aug 27, 2024

Description

The Floraon provider is disabled because many of its URLs are invalid, and the foreign_identifier is sometimes invalid as well.

We should run a batched update query to fix the following:

  • remove the extra "%" at the start of the foreign_identifier field, e.g. %flora-on.pt/Carduus-lusitanicus_ori_4OKo.jpg
  • replace "http://%" with "https://" in the url field, e.g. http://%flora-on.pt/Carduus-lusitanicus_ori_4OKo.jpg
  • remove the extra "/index" in the foreign_landing_url, e.g. https://flora-on.pt/index.php?q=Carduus+lusitanicus
  • extract the filetype from the url
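As a sanity check on what the fixes should produce, here is a minimal Python sketch that mirrors the SQL expressions in the query on the example values from this issue. The function names are hypothetical and are not part of the catalog code. The foreign_landing_url fix is left out because the issue does not spell out the target form.

```python
from typing import Optional


def fix_foreign_identifier(foreign_identifier: str) -> str:
    # Mirrors TRIM(LEADING '%' FROM foreign_identifier): drops every leading '%'.
    return foreign_identifier.lstrip("%")


def fix_url(url: str) -> str:
    # Mirrors REPLACE(url, 'http://%', 'https://').
    return url.replace("http://%", "https://")


def extract_filetype(url: str) -> Optional[str]:
    # Mirrors CASE WHEN RIGHT(url, 4) = '.jpg' THEN 'jpg' ELSE null END.
    # The comparison is case-sensitive, as in the SQL.
    return "jpg" if url[-4:] == ".jpg" else None


if __name__ == "__main__":
    # Example values taken from the issue description.
    row = {
        "foreign_identifier": "%flora-on.pt/Carduus-lusitanicus_ori_4OKo.jpg",
        "url": "http://%flora-on.pt/Carduus-lusitanicus_ori_4OKo.jpg",
    }
    fixed = {
        "foreign_identifier": fix_foreign_identifier(row["foreign_identifier"]),
        "url": fix_url(row["url"]),
        "filetype": extract_filetype(row["url"]),
    }
    print(fixed)
```

Running a similar check against a sample of real rows before the update would show whether any URLs end in something other than ".jpg", since those rows would get a null filetype.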

Query

SET updated_on = NOW(),
    foreign_identifier = TRIM(LEADING '%' FROM foreign_identifier),
    url = REPLACE(url, 'http://%', 'https://'),
    filetype = CASE WHEN RIGHT(url, 4) = '.jpg' THEN 'jpg' ELSE NULL END

Additional context

This query should be fast since there are only 55,010 items from floraon.
After the catalog fix is deployed and the data refresh runs, we should enable this provider in the API admin.
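To give a rough sense of scale for the batched update, the sketch below splits the 55,010 rows into half-open row ranges. The batch size of 10,000 and the helper name are assumptions for illustration only; the issue does not state the batch size that would actually be used.

```python
from typing import List, Tuple


def batch_ranges(total_rows: int, batch_size: int) -> List[Tuple[int, int]]:
    # Returns half-open (start, end) row ranges that together cover total_rows.
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        (start, min(start + batch_size, total_rows))
        for start in range(0, total_rows, batch_size)
    ]


if __name__ == "__main__":
    # 55,010 is the row count from the issue; 10,000 is an assumed batch size.
    for start, end in batch_ranges(55_010, 10_000):
        print(f"rows {start}..{end - 1}")
```

With these assumed numbers the update needs six batches, the last one covering only 5,010 rows.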

@obulat obulat added 🟩 priority: low Low priority and doesn't need to be rushed 🛠 goal: fix Bug fix 🧱 stack: api Related to the Django API 🧱 stack: catalog Related to the catalog and Airflow DAGs 🗄️ aspect: data Concerns the data in our catalog and/or databases labels Aug 27, 2024
Projects
Status: 📋 Backlog
Development

No branches or pull requests

1 participant