API displays fewer (if any at all) results on the source collection page for Europeana than it does when filtering by source=europeana #3712

Closed
AetherUnbound opened this issue Jan 26, 2024 · 6 comments
Labels
💻 aspect: code Concerns the software code in the repository 🗄️ aspect: data Concerns the data in our catalog and/or databases 🛠 goal: fix Bug fix 🟧 priority: high Stalls work on the project or its dependents 🧱 stack: api Related to the Django API

Comments

@AetherUnbound
Contributor

Description

Originally reported by @obulat

The API returns significantly fewer results for the Europeana source collection view than it does for the search view filtered by source=europeana. At the time of writing, the following links showcase this discrepancy:

On staging this contrast is even more stark:

I suspect this has to do with dead link filtering. My guess is that the filtering done on search (which keeps searching over wider and wider windows to fill up a page of results) is not happening for the collections view. Thus, if all the queried results for a collection fail the dead link step, then an empty (or smaller) response is returned. This is just supposition; I haven't looked into the code for it yet.
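To make the guess above concrete, here is a minimal sketch of the two behaviours being contrasted. It is not the Openverse implementation: `filter_once`, `filter_with_widening`, and the `is_alive` callback are hypothetical names, and the doubling strategy is only an assumption about what "wider and wider windows" might look like.

```python
from typing import Callable, Optional, Sequence


def filter_once(
    results: Sequence[str], is_alive: Callable[[str], bool], page_size: int
) -> list[str]:
    """Check only the first `page_size` results; dead links shrink the page."""
    return [url for url in results[:page_size] if is_alive(url)]


def filter_with_widening(
    results: Sequence[str],
    is_alive: Callable[[str], bool],
    page_size: int,
    max_window: Optional[int] = None,
) -> list[str]:
    """Double the query window until a full page of live links is found.

    Stops early once the window covers every available result (or
    `max_window`, if given), returning whatever live links were found.
    """
    limit = len(results) if max_window is None else min(max_window, len(results))
    window = page_size
    while True:
        window = min(window, limit)
        live = [url for url in results[:window] if is_alive(url)]
        if len(live) >= page_size or window >= limit:
            return live[:page_size]
        window *= 2
```

With 30 dead links ahead of 30 live ones and a page size of 20, `filter_once` returns an empty page while `filter_with_widening` still fills it, which is the kind of gap described in this issue.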

Additional context

May be related to #3480 and #2293

@AetherUnbound AetherUnbound added 💻 aspect: code Concerns the software code in the repository 🛠 goal: fix Bug fix 🟧 priority: high Stalls work on the project or its dependents 🧱 stack: api Related to the Django API labels Jan 26, 2024
@AetherUnbound
Contributor Author

Update: Okay so I was able to track down a single request I made to https://api.openverse.engineering/v1/images/source/europeana/ and sure enough there are pages and pages and pages of deleted results for Europeana due to dead link filtering 😅 I don't know that this is a problem we can solve right now, but certainly the dead link pipeline project (#3585) will help!

Screenshot as an example: [image]

What's not shown there are hundreds, if not thousands, of dead link requests as it tries to fill the results 🙃

@AetherUnbound AetherUnbound added the 🗄️ aspect: data Concerns the data in our catalog and/or databases label Jan 26, 2024
@obulat
Contributor

obulat commented Jan 27, 2024

Great find!

The difference between the collection views and search views is mainly in the way the query is created. Search execution is the same, so collection views do filter out dead links.

I suspect the reason for the different responses is sorting: search views sort results by relevance, while collection views sort them by the time of ingestion into Openverse. That might mean we've ingested a lot of dead links recently, which is worth investigating 🤔
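A small sketch of why sort order alone could change what the first page looks like, assuming recently ingested records are disproportionately dead. The `Record` fields (`score`, `ingested_at`, `dead`) and both helper functions are made up for illustration and are not the actual index fields.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Record:
    identifier: str
    score: float      # hypothetical relevance score (search views)
    ingested_at: int  # hypothetical ingestion timestamp (collection views)
    dead: bool        # whether the link would fail dead link validation


def first_page(
    records: Sequence[Record], sort_key: Callable[[Record], float], page_size: int
) -> list[Record]:
    """Return the top `page_size` records, highest sort key first."""
    return sorted(records, key=sort_key, reverse=True)[:page_size]


def dead_ratio(page: Sequence[Record]) -> float:
    """Fraction of records on the page whose links are dead."""
    if not page:
        return 0.0
    return sum(record.dead for record in page) / len(page)
```

If the newest records are mostly dead, `first_page(records, lambda r: r.ingested_at, 20)` is dominated by dead links, while `first_page(records, lambda r: r.score, 20)` over the same records may contain none.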

@Hobbesball

Hi there! I'm Jolan, I work as an API outreach staff member at Europeana. I see you're working on the connection between Openverse and Europeana. I just wanted to pop in and say that if there's anything we can do to help on the Europeana side, please let us know! You can always get in contact with us using our api@europeana.eu email address.

@obulat
Contributor

obulat commented Feb 6, 2024

Hi @Hobbesball, thank you for reaching out to us! Nice to meet you, and I'm sure we'll be in contact with more questions :)

This particular issue seems to have been partly resolved: you can see the Europeana images on the Openverse staging site at https://staging.openverse.org/image/source/europeana (this feature is not launched yet, so you can only see the staging version for now).

@obulat
Contributor

obulat commented Feb 6, 2024

@AetherUnbound, do you think we can close this issue now?

One thing I realized is that the deep pagination limits we set for search might need to be relaxed for the additional search views. Otherwise we should adjust the results label to say something like "More than 10000 results found, showing the newest 100" (for the 5 pages we show)
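A rough sketch of what the adjusted label could look like. The function name `results_label` and its defaults are assumptions drawn from the numbers in this comment (5 pages, 100 reachable results, a 10000 count cap, so a page size of 20); the real frontend logic may differ.

```python
def results_label(
    result_count: int,
    page_size: int = 20,       # assumed: 100 reachable results / 5 pages
    max_pages: int = 5,        # pagination depth mentioned in the comment
    count_cap: int = 10_000,   # threshold used in the suggested wording
) -> str:
    """Build a results label that admits when only part of the set is reachable."""
    reachable = page_size * max_pages
    if result_count <= reachable:
        # Everything can be paged through, so the plain count is accurate.
        return f"{result_count} results found"
    shown = f"showing the newest {reachable}"
    if result_count > count_cap:
        return f"More than {count_cap} results found, {shown}"
    return f"{result_count} results found, {shown}"
```

Relaxing the pagination limit for collection views instead would amount to passing a larger `max_pages` here.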

@AetherUnbound
Contributor Author

Thank you @Hobbesball! I think we can, we can re-open if the issue comes back up again 🙂

@obulat would you mind making a separate issue for the behavior you're describing?

@AetherUnbound AetherUnbound closed this as not planned Feb 6, 2024