Merged
24 commits
8626e69
added a file about how to search and delete resources in the ts
francescalb May 22, 2025
20c519d
Merge branch 'criteria_update_and_doc' into flb/update-documentation-…
jesper-friis May 22, 2025
8543952
Added possibility of adding more than one type or several values for …
francescalb May 23, 2025
b879d7b
Merge branch 'flb/update-documentation-search-delete' of github.com:E…
francescalb May 23, 2025
7d06099
Merge branch 'master' into flb/update-documentation-search-delete
francescalb May 23, 2025
914ada8
Merge branch 'flb/update-documentation-search-delete' of github.com:E…
francescalb May 23, 2025
a4693c6
Updated documentation so it can be tested
francescalb May 23, 2025
e23d7b6
Corrected typos
francescalb May 23, 2025
dd422ee
import delete in documentation
francescalb May 23, 2025
7c6a833
typo
francescalb May 23, 2025
194e629
Skip doctest on output
francescalb May 23, 2025
1505b7e
remove doctest on documentation...
francescalb May 23, 2025
7a8a92b
remove more tests
francescalb May 23, 2025
8da5ef9
Test running doctest, remove output
francescalb May 23, 2025
672d445
Run the tests
francescalb May 23, 2025
9255301
more testing
francescalb May 23, 2025
8062ae1
Merge branch 'master' into flb/update-documentation-search-delete
francescalb May 23, 2025
76f82b8
Added acquire and load in doc
francescalb May 23, 2025
0d3e11b
Merge branch 'flb/update-documentation-search-delete' of github.com:E…
francescalb May 23, 2025
9b8b21c
Apply suggestions from code review
jesper-friis May 23, 2025
7302feb
Merge branch 'master' into flb/update-documentation-search-delete
jesper-friis May 24, 2025
89a3f82
Update docs/datadoc/fetching-resources-from-a-triplestore.md
francescalb May 26, 2025
8a609b8
Merge branch 'master' into flb/update-documentation-search-delete
francescalb May 26, 2025
7c61e59
Typos
francescalb May 26, 2025
119 changes: 119 additions & 0 deletions docs/datadoc/fetching-resources-from-a-triplestore.md
@@ -0,0 +1,119 @@
Working with already documented resources
=========================================

The [tripper.datadoc] module also includes functionality for easy searching of the documented resources.

For these examples, a triplestore instance populated with some data must be available.
```python
>>> from tripper import Triplestore
>>> from tripper.datadoc import save_datadoc
>>> ts = Triplestore(backend="rdflib")
>>> save_datadoc(ts,"https://raw.githubusercontent.com/EMMC-ASBL/tripper/refs/heads/master/tests/input/semdata.yaml") # doctest: +ELLIPSIS
{'@graph': [...], ...}

```

Searching the knowledge base
----------------------------

### Get all IRIs of all datasets in the kb

```python
>>> from tripper.datadoc import search
>>> search(ts) # doctest: +ELLIPSIS
[...]

```

This returns a list with the IRIs of all datasets in the knowledge base.


### Search with filtering criteria

Before adding specific filtering criteria, bind any non-standard prefixes to their namespaces. Standard prefixes defined in the keywords file (such as `dcterms` and `dcat`) do not need to be bound again:

```python
>>> DATA = ts.bind("data", "http://example.com/data#")
>>> MAT = ts.bind("mat", "http://example.com/materials#")

```

It is possible to search for instances of type `dcat:Dataset` in two ways:

```python
>>> search(ts, type="Dataset") # doctest: +ELLIPSIS
[...]

>>> search(ts, type="dcat:Dataset") # doctest: +ELLIPSIS
[...]

```
The first, shortened form only works for [predefined keywords] that are defined in tripper.

Note that full IRIs (e.g. `http://www.w3.org/ns/dcat#Dataset`) are currently not supported.


You can also search for documented resources of other types or include more than one type in the search.
```python
>>> SEM = ts.bind("sem", "https://w3id.com/emmo/domain/sem/0.1#")
>>> search(ts, type="sem:SEMImage") # doctest: +ELLIPSIS
[...]

>>> search(ts, type=["sem:SEMImage", "dcat:Dataset"]) # doctest: +ELLIPSIS
[...]

```
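
When `type` is a list, each entry adds its own `rdf:type` pattern to the query, so a resource has to be an instance of every listed type to match. The block below is a simplified, standalone sketch of that idea, not tripper's implementation; `PREFIXES`, `expand` and `type_patterns` are hypothetical names introduced only for illustration.

```python
# Hypothetical prefix table for this sketch; tripper keeps its own
# namespace bindings on the Triplestore object.
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "sem": "https://w3id.com/emmo/domain/sem/0.1#",
}


def expand(name):
    """Expand a prefixed name like 'dcat:Dataset' to a full IRI."""
    prefix, local = name.split(":", 1)
    return PREFIXES[prefix] + local


def type_patterns(type):
    """Return one SPARQL triple pattern per requested type."""
    # Accept either a single type or a list of types.
    types = type if isinstance(type, list) else [type]
    return [f"?iri rdf:type <{expand(t)}> ." for t in types]


patterns = type_patterns(["sem:SEMImage", "dcat:Dataset"])
print("\n".join(patterns))
```

Since all patterns end up in the same query, adding more types narrows the result rather than widening it.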


It is also possible to filter on other criteria:
```python
>>> search(ts, criteria={"creator.name": "Sigurd Wenner"}) # doctest: +ELLIPSIS
[...]

>>> search(ts, criteria={"creator.name": ["Sigurd Wenner", "Named Lab Assistant"]}) # doctest: +ELLIPSIS
[...]

>>> KB = ts.bind('kb', 'http://example.com/kb/' )
>>> search(ts, criteria={"@id": KB.image1}) # doctest: +ELLIPSIS
[...]

```

Note that the object returned when binding the `kb` prefix is a `tripper.Namespace`, which can be used directly, as in the last example above.
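
A list of values for one criterion is handled as several separate criteria on the same key, so all of the values must match. The following minimal sketch illustrates that behaviour with plain dictionaries instead of a triplestore; `flatten_criteria` and `matches` are hypothetical helpers, not tripper functions, and the two resources are toy data.

```python
def flatten_criteria(criteria):
    """Yield one (key, value) pair per value; list values are unpacked."""
    for key, value in criteria.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            yield key, v


def matches(resource, criteria):
    """Return True if the resource satisfies every (key, value) pair.

    `resource` maps a criterion key to the set of values it has.
    """
    return all(
        v in resource.get(k, set()) for k, v in flatten_criteria(criteria)
    )


# Toy stand-ins for documented resources (not real triplestore content).
image = {"creator.name": {"Sigurd Wenner", "Named Lab Assistant"}}
batch = {"creator.name": {"Sigurd Wenner"}}
crit = {"creator.name": ["Sigurd Wenner", "Named Lab Assistant"]}

print(matches(image, crit), matches(batch, crit))
```

Only the resource that has both creators satisfies the list-valued criterion.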

Fetching metadata and data
--------------------------

The `acquire` function can be used to fetch metadata from the triplestore.
```python
>>> from tripper.datadoc import acquire
>>> acquire(ts, 'https://he-matchmaker.eu/data/sem/SEM_cement_batch2/77600-23-001/77600-23-001_5kV_400x_m001') # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
AttrDict({'@id': 'https://he-matchmaker.eu/data/sem/SEM_cement_batch2/77600-23-001/77600-23-001_5kV_400x_m001', ...})


```

Similarly, the `load` function can be used to fetch the data using the download URL given in the metadata.
The syntax is the same as above. Note, though, that this specific example requires access to a server that
is not available to the general public.

Review comment (Contributor): Maybe an example of how to use load() can be added here...


Removing instances in the knowledge base
----------------------------------------

Be very careful when using this: if you have delete access to a shared knowledge base, there is a high risk of deleting other people's data.

The same criteria as shown above can be used, for example:

```python
>>> from tripper.datadoc import delete
>>> delete(ts, criteria={"@id": KB.image1})
>>> delete(ts, criteria={"creator.name": "Sigurd Wenner"})

```
It is also possible to remove everything in the triplestore with `delete(ts)`, but this is strongly discouraged.
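
One way to lower the risk is to run a search with the same arguments first and inspect what would be removed. The sketch below assumes that `search` and `delete` accept the same keyword arguments, as described above. `delete_with_preview` is a hypothetical helper, not part of tripper; it takes the two functions as parameters so that it can be tried with in-memory fakes instead of a real triplestore.

```python
def delete_with_preview(ts, search_fn, delete_fn, confirm, **kwargs):
    """Delete matching resources only if `confirm(matches)` returns True.

    `search_fn` and `delete_fn` are expected to behave like
    tripper.datadoc.search and tripper.datadoc.delete (an assumption
    of this sketch); `kwargs` is forwarded unchanged to both.
    """
    matches = search_fn(ts, **kwargs)
    if matches and confirm(matches):
        delete_fn(ts, **kwargs)
        return matches
    return []


# Demonstration with in-memory fakes instead of a real triplestore.
store = {"kb:image1": "Sigurd Wenner", "kb:image2": "Someone Else"}


def fake_search(ts, criteria):
    """Return the keys whose value equals the requested creator name."""
    return [iri for iri, name in ts.items() if name == criteria["creator.name"]]


def fake_delete(ts, criteria):
    """Remove every entry that fake_search would return."""
    for iri in fake_search(ts, criteria):
        del ts[iri]


removed = delete_with_preview(
    store,
    fake_search,
    fake_delete,
    confirm=lambda found: len(found) == 1,  # refuse unexpectedly large deletes
    criteria={"creator.name": "Sigurd Wenner"},
)
print(removed, sorted(store))
```

With the real functions, the call would presumably look like `delete_with_preview(ts, search, delete, confirm=..., criteria={...})`.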



[predefined keywords]: keywords.md
[tripper.datadoc]: https://emmc-asbl.github.io/tripper/latest/datadoc/introduction
24 changes: 24 additions & 0 deletions tests/datadoc/test_dataset.py
@@ -514,6 +514,30 @@ def test_datadoc():
         SAMPLE["SEM_cement_batch2/77600-23-001"],
     }

+    # Filter on more than one type in the search
+    assert search(ts, type=["dcat:Dataset", SEM.SEMImage]) == [
+        SEMDATA["SEM_cement_batch2/77600-23-001/77600-23-001_5kV_400x_m001"]
+    ]
+
+    # Filter on one criterion, but add it as a list
+    assert set(search(ts, criteria={"creator.name": ["Sigurd Wenner"]})) == {
+        SEMDATA["SEM_cement_batch2/77600-23-001/77600-23-001_5kV_400x_m001"],
+        SEMDATA["SEM_cement_batch2/77600-23-001"],
+        SEMDATA["SEM_cement_batch2"],
+    }
+
+    # Filter on more than one value for a criterion
+    assert set(
+        search(
+            ts,
+            criteria={
+                "creator.name": ["Sigurd Wenner", "Named Lab Assistant"]
+            },
+        )
+    ) == {
+        SEMDATA["SEM_cement_batch2/77600-23-001/77600-23-001_5kV_400x_m001"],
+    }
+
     with pytest.raises(NoSuchTypeError):
         search(ts, type="invalid-type")

3 changes: 2 additions & 1 deletion tests/input/semdata.yaml
@@ -18,7 +18,8 @@ Dataset:
     title: SEM image of cement
     description: Back-scattered SEM image of cement sample 77600 from Heidelberg, polished with 1 µm diamond compound.
     creator:
-      name: Sigurd Wenner
+      - name: Sigurd Wenner
+      - name: Named Lab Assistant
     contactPoint:
       hasName: Sigurd Wenner
       hasEmail: <Sigurd.Wenner@sintef.no>
26 changes: 16 additions & 10 deletions tripper/datadoc/dataset.py
@@ -1285,21 +1285,27 @@ def make_query(
         )

     if type:
-        if ":" in type:
-            expanded_iri = ts.expand_iri(type)
-            crit.append(f"?iri rdf:type <{expanded_iri}> .")
-        else:
-            if keywords is None:
-                keywords = Keywords()
-            typ = keywords.superclasses(type)
-            if not isinstance(typ, str):
-                typ = typ[0]
-            crit.append(f"?iri rdf:type <{ts.expand_iri(typ)}> .")  # type: ignore
+        types = [type] if not isinstance(type, list) else type
+        for t in types:
+            if ":" in t:
+                expanded_iri = ts.expand_iri(t)
+                crit.append(f"?iri rdf:type <{expanded_iri}> .")
+            else:
+                if keywords is None:
+                    keywords = Keywords()
+                typ = keywords.superclasses(t)
+                if not isinstance(typ, str):
+                    typ = typ[0]
+                crit.append(f"?iri rdf:type <{ts.expand_iri(typ)}> .")  # type: ignore

     def add_crit(k, v, regex=False, s="iri"):
         """Add criteria to SPARQL query."""
         nonlocal n
         key = f"@{k[1:]}" if k.startswith("_") else k
+        if isinstance(v, list):
+            for ele in v:
+                add_crit(key, ele, regex=regex, s=s)
+            return
         if "." in key:
             newkey, restkey = key.split(".", 1)
             if newkey in expanded: