Unglue.it #1746

Open

Description

This issue has been migrated from the CC Search Catalog repository

Author: annatuma
Date: Fri Nov 15 2019
Labels: providers, ✨ goal: improvement, 🙅 status: discontinued

This is a provider of texts, so it is blocked until the Catalog is ready to ingest that content type.

Provider API Endpoint / Documentation

https://unglue.it/api/help

Internal users only: CC has an API key for this service, please check CC's password manager.

Provider description

A provider of openly licensed ebooks, some of which are available from Project Gutenberg.

Licenses Provided

They indicate that the works on their site are CC licensed or have another open license. We'd need to restrict ingestion to CC licenses.
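A minimal sketch of how that restriction might look, assuming each record exposes a license URL. The function name `parse_cc_license_url` is hypothetical and not part of the Catalog codebase; it only recognises URLs under `creativecommons.org/licenses/`, so whether public-domain dedications should also be accepted is left open here.

```python
from urllib.parse import urlparse

# Hosts treated as Creative Commons (an assumption for this sketch).
CC_HOSTS = ("creativecommons.org", "www.creativecommons.org")


def parse_cc_license_url(url):
    """Return (license, version) for a CC license URL, or None otherwise."""
    parsed = urlparse(url.strip())
    if parsed.netloc.lower() not in CC_HOSTS:
        return None
    # Expected path shape: /licenses/<license>/<version>/
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) < 3 or parts[0] != "licenses":
        return None
    return parts[1].lower(), parts[2]


def keep_cc_only(license_urls):
    """Filter an iterable of license URLs down to the CC-licensed ones."""
    return [url for url in license_urls if parse_cc_license_url(url)]
```

Records whose license URL does not parse would simply be skipped at ingestion time.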

Provider API Technical info

There isn't a clear way for a frontend user to filter books on the site by license type.

The basic API documentation doesn't include license info at the high level:
https://unglue.it/api/v1/?format=json

However, they reference an ONIX structure, where rights information is returned in the Epub License field:

CC BY-NC-ND

01
https://creativecommons.org/licenses/by-nc-nd/3.0/

For example:
https://unglue.it/api/onix/by-nc-nd/epub/?max=20

More work is needed to determine whether we can get all the information we need for ingestion.
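As a starting point for that investigation, here is a rough sketch of pulling the license fields out of an ONIX document. The element names (`EpubLicense`, `EpubLicenseName`, `EpubLicenseExpressionType`, `EpubLicenseExpressionLink`) follow the ONIX 3.0 reference tags as I understand them, and `SAMPLE` is a hand-written fragment built from the values quoted above, not a captured Unglue.it response; both should be checked against the real feed.

```python
import xml.etree.ElementTree as ET

# Hand-written sample (assumption), not an actual API response.
SAMPLE = """<ONIXMessage xmlns="http://ns.editeur.org/onix/3.0/reference">
  <Product>
    <DescriptiveDetail>
      <EpubLicense>
        <EpubLicenseName>CC BY-NC-ND</EpubLicenseName>
        <EpubLicenseExpression>
          <EpubLicenseExpressionType>01</EpubLicenseExpressionType>
          <EpubLicenseExpressionLink>https://creativecommons.org/licenses/by-nc-nd/3.0/</EpubLicenseExpressionLink>
        </EpubLicenseExpression>
      </EpubLicense>
    </DescriptiveDetail>
  </Product>
</ONIXMessage>"""

# Maps assumed ONIX element names to the keys used in our records.
FIELDS = {
    "EpubLicenseName": "name",
    "EpubLicenseExpressionType": "type",
    "EpubLicenseExpressionLink": "url",
}


def local_name(tag):
    """Drop any '{namespace}' prefix that ElementTree adds to a tag."""
    return tag.rsplit("}", 1)[-1]


def extract_licenses(onix_xml):
    """Return one dict per EpubLicense element found in the document."""
    root = ET.fromstring(onix_xml)
    licenses = []
    for element in root.iter():
        if local_name(element.tag) != "EpubLicense":
            continue
        record = {"name": None, "type": None, "url": None}
        for child in element.iter():
            key = FIELDS.get(local_name(child.tag))
            if key:
                record[key] = (child.text or "").strip()
        licenses.append(record)
    return licenses
```

Matching on the local tag name keeps the sketch working whether or not the feed declares an XML namespace.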

General Recommendations for implementation

  • The script should be in the src/cc_catalog_airflow/dags/provider_api_scripts/ directory.
  • The script should have a test suite in the same directory.
  • The script must use the ImageStore class (Import this from
    src/cc_catalog_airflow/dags/provider_api_scripts/common/storage/image.py).
  • The script should use the DelayedRequester class (Import this from
    src/cc_catalog_airflow/dags/provider_api_scripts/common/requester.py).
  • The script must not use anything from
    src/cc_catalog_airflow/dags/provider_api_scripts/modules/etlMods.py, since
    that module is deprecated.
  • If the provider API can be queried by 'upload date' or something similar,
    the script should take a --date parameter when run as a script, giving the
    date for which we should collect images. The form should be YYYY-MM-DD (so
    the script can be run via python my_favorite_provider.py --date 2018-01-01).
  • The script must provide a main function that takes the same parameters as
    the CLI. In our example from above, we'd then have a main function
    my_favorite_provider.main(date). Calling main should do the same thing as
    running the script from the CLI.
  • The script must conform to PEP8. Please use pycodestyle (available via
    pip install pycodestyle) to check for compliance.
  • The script should use small, testable functions.
  • The test suite for the script may break PEP8 rules regarding long lines where
    appropriate (e.g., long strings for testing).

Examples of other Provider API Scripts

For examples of Provider API Scripts and their accompanying test suites, please see

  • src/cc_catalog_airflow/dags/provider_api_scripts/flickr.py and
  • src/cc_catalog_airflow/dags/provider_api_scripts/test_flickr.py, or
  • src/cc_catalog_airflow/dags/provider_api_scripts/wikimedia_commons.py and
  • src/cc_catalog_airflow/dags/provider_api_scripts/test_wikimedia_commons.py.