Description
Opened on Apr 21, 2021
This issue has been migrated from the CC Search Catalog repository
Author: annatuma
Date: Fri Nov 15 2019
Labels: providers, ✨ goal: improvement, status: discontinued
This is a provider of texts and is therefore blocked by the Catalog not being ready to ingest that content type at this time
Provider API Endpoint / Documentation
Internal users only: CC has an API key for this service, please check CC's password manager.
Provider description
A provider of openly licensed ebooks, some of which are available from Project Gutenberg.
Licenses Provided
They indicate that the works on their site are CC licensed or have another open license. We'd need to restrict ingestion to CC licenses.
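A minimal sketch of how ingestion could be restricted to CC licenses, assuming each record exposes a license URL. The function name is_cc_license_url is hypothetical, and treating the /publicdomain/ tools (such as CC0) as acceptable is an assumption that would need to be confirmed.

```python
from urllib.parse import urlparse


def is_cc_license_url(license_url):
    """Return True if the URL points at creativecommons.org licenses or PD tools.

    Hypothetical helper: accepting /publicdomain/ paths is an assumption.
    """
    parsed = urlparse(license_url.strip().lower())
    host = parsed.netloc
    # Treat www.creativecommons.org and creativecommons.org the same way.
    if host.startswith("www."):
        host = host[4:]
    if host != "creativecommons.org":
        return False
    return parsed.path.startswith(("/licenses/", "/publicdomain/"))


if __name__ == "__main__":
    print(is_cc_license_url("https://creativecommons.org/licenses/by-nc-nd/3.0/"))
    print(is_cc_license_url("https://www.gnu.org/licenses/fdl-1.3.html"))
```

Records whose license URL fails this check would simply be skipped rather than ingested.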
Provider API Technical info
There isn't a clear way for a frontend user to filter books on the site by license type.
The basic API documentation doesn't include license info at the high level:
https://unglue.it/api/v1/?format=json
However, they reference an ONIX structure, where rights information is returned in the Epub License field:
CC BY-NC-ND
01
https://creativecommons.org/licenses/by-nc-nd/3.0/
For example:
https://unglue.it/api/onix/by-nc-nd/epub/?max=20
More work is needed to determine if we can get all the information we need for ingestion
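As a starting point for that investigation, here is a rough sketch of pulling the license name and link out of an ONIX record with the standard library. The element names (EpubLicense, EpubLicenseName, EpubLicenseExpressionType, EpubLicenseExpressionLink) are assumed from the ONIX 3.0 reference tag names, and the sample document is hand-written around the three values shown above. The actual unglue.it feed may use different tags or a namespace, so this should be checked against a real response.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; the tag names are an assumption, not copied from the feed.
SAMPLE_ONIX = """\
<Product>
  <EpubLicense>
    <EpubLicenseName>CC BY-NC-ND</EpubLicenseName>
    <EpubLicenseExpression>
      <EpubLicenseExpressionType>01</EpubLicenseExpressionType>
      <EpubLicenseExpressionLink>https://creativecommons.org/licenses/by-nc-nd/3.0/</EpubLicenseExpressionLink>
    </EpubLicenseExpression>
  </EpubLicense>
</Product>
"""


def _local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}' from a tag."""
    return tag.rsplit("}", 1)[-1]


def extract_licenses(onix_xml):
    """Return one dict per EpubLicense element found in the document."""
    root = ET.fromstring(onix_xml)
    licenses = []
    for element in root.iter():
        if _local_name(element.tag) != "EpubLicense":
            continue
        # Flatten all descendants into {tag name: text}.
        fields = {
            _local_name(child.tag): (child.text or "").strip()
            for child in element.iter()
        }
        licenses.append({
            "name": fields.get("EpubLicenseName"),
            "expression_type": fields.get("EpubLicenseExpressionType"),
            "url": fields.get("EpubLicenseExpressionLink"),
        })
    return licenses


if __name__ == "__main__":
    print(extract_licenses(SAMPLE_ONIX))
```

Matching on local tag names keeps the sketch working whether or not the feed declares an XML namespace.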
General Recommendations for implementation
- The script should be in the src/cc_catalog_airflow/dags/provider_api_scripts/ directory.
- The script should have a test suite in the same directory.
- The script must use the ImageStore class (import this from src/cc_catalog_airflow/dags/provider_api_scripts/common/storage/image.py).
- The script should use the DelayedRequester class (import this from src/cc_catalog_airflow/dags/provider_api_scripts/common/requester.py).
- The script must not use anything from src/cc_catalog_airflow/dags/provider_api_scripts/modules/etlMods.py, since that module is deprecated.
- If the provider API can be queried by 'upload date' or something similar, the script should take a --date parameter when run as a script, giving the date for which we should collect images. The form should be YYYY-MM-DD (so the script can be run via python my_favorite_provider.py --date 2018-01-01).
- The script must provide a main function that takes the same parameters as the CLI. In our example from above, we'd then have a main function my_favorite_provider.main(date). The main function should do the same thing that calling the script from the CLI would do.
- The script must conform to PEP8. Please use pycodestyle (available via pip install pycodestyle) to check for compliance.
- The script should use small, testable functions.
- The test suite for the script may break PEP8 rules regarding long lines where appropriate (e.g., long strings for testing).
Examples of other Provider API Scripts
For example Provider API Scripts and accompanying test suites, please see src/cc_catalog_airflow/dags/provider_api_scripts/flickr.py and src/cc_catalog_airflow/dags/provider_api_scripts/test_flickr.py, or src/cc_catalog_airflow/dags/provider_api_scripts/wikimedia_commons.py and src/cc_catalog_airflow/dags/provider_api_scripts/test_wikimedia_commons.py.