Use size and compression as metrics (original #575) #752

Open
obulat opened this issue Apr 21, 2021 · 1 comment
Labels
🕹 aspect: interface Concerns end-users' experience with the software 🌟 goal: addition Addition of new feature 🟩 priority: low Low priority and doesn't need to be rushed 🧱 stack: api Related to the Django API

Comments

@obulat
Contributor

obulat commented Apr 21, 2021

This issue has been migrated from the CC Search API repository

Author: aldenstpage
Date: Wed Jul 29 2020
Labels: ✨ goal: improvement, 🏷 status: label work required, 🙅 status: discontinued

Problem

Images with low resolution or high compression sometimes show up in the first page of results, even with popularity boosting.

This issue blocks on consuming outbound data from the web crawler.

Description

We should heavily down-weight results with low resolution or high compression. Both metrics can be distilled into a single "quality_penalty" value: either high compression or low resolution on its own should produce a high penalty. The reasoning is that low resolution and high compression are strong indicators that an image is not worth showing, whereas high resolution and low compression do not necessarily correlate with relevance.
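A minimal sketch of how such a "quality_penalty" could be computed and applied, under stated assumptions. The issue does not define how compression is measured or what the cut-offs are, so everything here is hypothetical: `ImageStats`, `quality_penalty`, `apply_penalty`, the thresholds `MIN_PIXELS` and `MIN_BITS_PER_PIXEL`, the `weight` parameter, and the use of bits per pixel (file size divided by pixel count) as a compression proxy. Bits per pixel also varies by format (PNG vs. JPEG), so a real implementation would likely need per-format thresholds.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the issue does not specify any values.
MIN_PIXELS = 640 * 480       # below this pixel count, resolution is penalized
MIN_BITS_PER_PIXEL = 0.5     # below this density, compression is penalized


@dataclass
class ImageStats:
    """Hypothetical container for the metadata the penalty needs."""
    width: int
    height: int
    filesize: int  # bytes


def _shortfall(value: float, threshold: float) -> float:
    """0.0 at or above the threshold, rising linearly to 1.0 at zero."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if value >= threshold:
        return 0.0
    return 1.0 - max(value, 0.0) / threshold


def quality_penalty(img: ImageStats) -> float:
    """Return a penalty in [0, 1]; higher means lower apparent quality."""
    pixels = img.width * img.height
    if pixels <= 0:
        return 1.0  # missing or invalid dimensions: treat as worst case
    # Assumed compression proxy: fewer bits per pixel means heavier compression.
    bits_per_pixel = img.filesize * 8 / pixels
    resolution_penalty = _shortfall(pixels, MIN_PIXELS)
    compression_penalty = _shortfall(bits_per_pixel, MIN_BITS_PER_PIXEL)
    # "OR" semantics: either problem alone is enough for a high penalty.
    return max(resolution_penalty, compression_penalty)


def apply_penalty(relevance: float, penalty: float, weight: float = 0.9) -> float:
    """Scale a relevance score down; a zero penalty leaves it unchanged."""
    return relevance * (1.0 - weight * penalty)
```

For example, a 320×240 image gets a penalty of 0.75 from resolution alone, while a 1920×1080 image above both thresholds gets 0.0. Because penalties above the thresholds are clamped to zero, good resolution or low compression never boosts a result, which matches the asymmetry described above.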

@sarayourfriend sarayourfriend added 🟩 priority: low Low priority and doesn't need to be rushed 🌟 goal: addition Addition of new feature 🕹 aspect: interface Concerns end-users' experience with the software labels Dec 16, 2022
@sarayourfriend
Contributor

sarayourfriend commented Dec 16, 2022

This issue might also be a catalogue issue, depending on the process we settle on for general document score calculation. Not pinging the catalogue folks yet because I've pinged them on a few different issues in the last few minutes already for their advice, and we're already discussing something along the lines of what is suggested here as part of the 2023 planning discussion (#343). The correct home and implementation approach for this issue should get settled naturally via that discussion.

@obulat obulat transferred this issue from WordPress/openverse-api Feb 22, 2023
@obulat obulat added 🧱 stack: api Related to the Django API and removed 🧱 stack: backend labels Mar 20, 2023
dhruvkb pushed a commit that referenced this issue Apr 14, 2023
Bumps [tldextract](https://github.com/john-kurkowski/tldextract) from 3.1.0 to 3.3.1.
- [Release notes](https://github.com/john-kurkowski/tldextract/releases)
- [Changelog](https://github.com/john-kurkowski/tldextract/blob/master/CHANGELOG.md)
- [Commits](john-kurkowski/tldextract@3.1.0...3.3.1)

---
updated-dependencies:
- dependency-name: tldextract
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Projects
Status: 📋 Backlog
Development

No branches or pull requests

2 participants