[Bug] Wikimedia script does not pull data for Public Domain images #1730

Closed
obulat opened this issue Jun 18, 2021 · 0 comments · Fixed by WordPress/openverse-catalog#119
Labels
💻 aspect: code (Concerns the software code in the repository)
🛠 goal: fix (Bug fix)
good first issue (New-contributor friendly)
help wanted (Open to participation from the community)
🟨 priority: medium (Not blocking but should be addressed soon)

Comments

obulat commented Jun 18, 2021

Bug Description

Currently, we do not correctly parse the license information for Public Domain and PDM-marked images from Wikimedia, and therefore discard those images.
For CC-licensed images, the imageinfo object contains the following data:

"imageinfo": {
    "extmetadata": {
        "LicenseUrl": {
            "value": "https://creativecommons.org/licenses/by-sa/4.0",
            "source": "commons-desc-page",
            "hidden": ""
        },
        "LicenseShortName": {
            "value": "CC BY-SA 4.0",
            "source": "commons-desc-page",
            "hidden": ""
        },
        ...
    }
}

So, we use the following to get a license URL:

license_url = (
    image_info
        .get('extmetadata', {})
        .get('LicenseUrl', {})
        .get('value', '')
        .strip()
)

However, for Public Domain images, the LicenseUrl property has an empty value, so we discard the information for the image as invalid.
To fix this, we can fall back on the LicenseShortName field when LicenseUrl is empty:

    if license_url == '':
        license_name = (
            image_info
                .get('extmetadata', {})
                .get('LicenseShortName', {})
                .get('value', '')
        )
        if license_name.lower() == 'public domain':
            license_info = licenses.get_license_info(license_='publicdomain', license_version='N/A')
        elif license_name.lower() == 'pdm-owner':
            license_info = licenses.get_license_info(license_='pdm', license_version='1.0')
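
The fallback proposed above can be tried in isolation with a minimal, self-contained sketch. It does not call the project's `licenses.get_license_info`; instead it returns the (license, version) pair that the snippet above would pass to it. The names `PUBLIC_DOMAIN_NAMES`, `get_extmetadata_value`, and `public_domain_fallback` are hypothetical and exist only for illustration.

```python
from typing import Optional, Tuple

# Hypothetical lookup table: lower-cased LicenseShortName -> (license, version).
# The pairs mirror the arguments used in the proposed fix above.
PUBLIC_DOMAIN_NAMES = {
    "public domain": ("publicdomain", "N/A"),
    "pdm-owner": ("pdm", "1.0"),
}


def get_extmetadata_value(image_info: dict, key: str) -> str:
    """Return the stripped 'value' of an extmetadata field, or '' if absent."""
    return image_info.get("extmetadata", {}).get(key, {}).get("value", "").strip()


def public_domain_fallback(image_info: dict) -> Optional[Tuple[str, str]]:
    """Return (license, version) when LicenseUrl is empty and LicenseShortName
    marks the image as public domain; otherwise return None."""
    if get_extmetadata_value(image_info, "LicenseUrl"):
        # A URL is present, so the existing URL-based path applies.
        return None
    name = get_extmetadata_value(image_info, "LicenseShortName").lower()
    # Unrecognised short names also yield None, so the caller can still discard them.
    return PUBLIC_DOMAIN_NAMES.get(name)
```

Returning None for unrecognised names keeps the current behavior (discarding the image) for anything that is neither CC-licensed nor one of the two public domain markers.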

Expected behavior

We expect that the information for Wikimedia Public Domain images is saved to the database.

obulat added the good first issue, help wanted, 💻 aspect: code, 🛠 goal: fix, and 🟨 priority: medium labels on Jun 18, 2021
obulat transferred this issue from WordPress/openverse-catalog on Apr 17, 2023