README update
MichalGawor committed May 16, 2022
1 parent 8915237 commit 01ffa63
Showing 2 changed files with 26 additions and 1 deletion.
23 changes: 22 additions & 1 deletion README.md
@@ -1,5 +1,8 @@
# Digital Object Gate library
A Python library for retrieving direct download links to resources referenced by a Persistent Identifier (PID). For a list of registered (supported) repositories, see: TBD. DOG currently supports the following PIDs: HDL and DOI, as well as URLs with CMDI content negotiation ([more](https://www.clarin.eu/content/component-metadata)).
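The README names three kinds of input: HDL, DOI and plain URLs. As a rough illustration of how a caller might tell them apart before passing a string to DOG, here is a minimal sketch. `classify_pid` is a hypothetical helper, not part of doglib, and the resolver host names and the bare-DOI pattern are simplifying assumptions.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Hypothetical helper, NOT part of doglib: a rough guess at which of the
# README's three input kinds (HDL, DOI, URL) a string belongs to.
HANDLE_RESOLVER_HOSTS = {"hdl.handle.net"}  # assumed handle proxy host
DOI_RESOLVER_HOSTS = {"doi.org", "dx.doi.org"}  # assumed DOI proxy hosts
# Bare DOIs start with the "10." directory indicator and a registrant code.
BARE_DOI = re.compile(r"^10\.\d{4,9}/\S+$")


def classify_pid(pid: str) -> Optional[str]:
    """Return 'hdl', 'doi', 'url', or None if the kind cannot be guessed."""
    pid = pid.strip()
    lowered = pid.lower()
    # Scheme-style forms such as "hdl:11234/1-3698" or "doi:10.5281/...".
    if lowered.startswith("hdl:"):
        return "hdl"
    if lowered.startswith("doi:"):
        return "doi"
    if lowered.startswith(("http://", "https://")):
        host = urlparse(pid).netloc.lower()
        if host in HANDLE_RESOLVER_HOSTS:
            return "hdl"
        if host in DOI_RESOLVER_HOSTS:
            return "doi"
        # Any other web address; per the README, DOG handles these via
        # CMDI content negotiation.
        return "url"
    if BARE_DOI.match(pid):
        return "doi"
    return None
```

Bare handles such as 11234/1-3698 return None in this sketch, because without a scheme or a resolver host they cannot be distinguished from other path-like strings.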

## Supported repositories
The status of currently supported repositories can be found in this [spreadsheet](https://docs.google.com/spreadsheets/d/1k4QiuCf2N9rsVNeqewXrhhJlZIF_3M3PVdMwyZRRCRk/edit?usp=sharing). Automatic status updates for registered repositories are planned for the future.

## Usage
To use Digital Object Gate functionality, create an instance of doglib.DOG, which loads the .json configurations of the registered repositories. DOG offers the following methods:
@@ -28,7 +31,7 @@ returns:

#### fetch(pid: str, format='dict') -> Union\[dict, str\]

- Tries to match PID with registered repositories and returns dict with collection's license and description, and links to referenced resources within the collection, otherwise returns empty string.
+ Tries to match PID with registered repositories and returns dict/string with collection's title, license and description, and links to referenced resources within the collection, otherwise returns empty dict/string.
By default, it returns a dictionary; if format=='jsons', it returns a JSON string.
```Python
from doglib import DOG
```

@@ -52,6 +55,24 @@ returns:

#### identify(pid: str, format='dict') -> Union\[dict, str\]

Tries to match the PID with registered repositories and returns a dict/string with the collection's title, license, description and reverse PID; otherwise, it returns an empty dict/string.
By default, it returns a dictionary; if format=='jsons', it returns a JSON string.
```Python
from doglib import DOG
dog = DOG()
dog.identify("https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-3698")
```

returns:
```Python
{
    'item_title': 'LINDAT / CLARIAH-CZ Data & Tools',
    'description': 'Etalon is a manually annotated corpus of contemporary Czech. The corpus contains 1,885,589 words (2,265,722 tokens) and is annotated in the same way as SYN2020 of the Czech National Corpus. The corpus includes fiction (ca 24%), professional and scientific literature (ca 40%) and newspapers (ca 36%). \r\n\r\nThe corpus is provided in a vertical format, where sentence boundaries are marked with a blank line. Every word form is written on a separate line, followed by five tab-separated attributes: syntactic word, lemma, sublemma, tag and verbtag. The texts are shuffled in random chunks of 100 words at maximum (respecting sentence boundaries).',
    'reverse_pid': 'https://hdl.handle.net/11234/1-3698@format=cmdi'
}
```
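Both fetch() and identify() return either a dict or a JSON string depending on format, and an empty value when the PID matches no registered repository. Callers that may receive either form might normalize the result first. A minimal sketch, assuming exactly the return behaviour described above; `as_dict` is a hypothetical helper, not part of doglib.

```python
import json
from typing import Union


def as_dict(result: Union[dict, str]) -> dict:
    """Normalize a fetch()/identify() result into a dict (hypothetical helper)."""
    if isinstance(result, dict):
        return result  # default format='dict': already a dictionary
    if not result:
        return {}  # empty string: the PID matched no registered repository
    return json.loads(result)  # format=='jsons': parse the JSON string
```

For instance, as_dict(dog.identify(pid)).get('reverse_pid') would give the reverse PID when the PID is recognized and None otherwise, whichever format was requested.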

#### is_host_registered(pid: str) -> bool

Checks whether the PID is hosted by a registered repository. Note that this check may be slower than expected, because some repositories use the same institutional ID in their PIDs (HDL/DOI). In such cases, DOG tries to resolve the PID and match the resolved host against the registered repositories.
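The two-step check described above (a cheap prefix lookup, then resolution only when the institutional ID is shared) can be sketched as follows. This is an illustration under stated assumptions, not doglib's implementation: the function name, the prefix and host sets, and the injected resolve callable are all hypothetical, and PIDs are assumed to be in bare prefix/suffix form.

```python
from typing import Callable, Set
from urllib.parse import urlparse


def is_host_registered_sketch(
    pid: str,
    registered_prefixes: Set[str],
    shared_prefixes: Set[str],
    registered_hosts: Set[str],
    resolve: Callable[[str], str],
) -> bool:
    """Two-step host check; an illustration, NOT doglib's implementation.

    Assumptions (all hypothetical): pid has the bare '<prefix>/<suffix>' form,
    shared_prefixes lists institutional IDs used by more than one repository,
    and resolve() returns the URL the PID redirects to (a network call in practice).
    """
    prefix = pid.split("/", 1)[0]
    if prefix not in registered_prefixes:
        return False  # cheap: no registered repository uses this prefix
    if prefix not in shared_prefixes:
        return True  # cheap: the prefix alone identifies a registered repository
    # Slow path: the prefix is shared, so resolve the PID and compare hosts.
    landing_host = urlparse(resolve(pid)).netloc.lower()
    return landing_host in registered_hosts
```

Keeping the network round-trip behind resolve means only PIDs with a shared prefix pay the extra latency mentioned above, and a test can pass a plain dict lookup instead of an HTTP client.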
4 changes: 4 additions & 0 deletions examples.py
@@ -15,3 +15,7 @@
print("This is fetch() output")
print(ret)
print("\n")
ret = dog.identify(url)
print("This is identify() output")
print(ret)
print("\n")
