40 commits
8f2e7cf
rewrite: Combine with PyEntityshape to avoid unnecessary http requests.
dpriskorn Jun 23, 2023
41d3c66
fix: Fix CI
dpriskorn Jun 23, 2023
f5a9174
fix: Fix CI
dpriskorn Jun 23, 2023
ca334ad
fix: Fix all tests
dpriskorn Jun 23, 2023
d0ffbbd
fix: Pre-commit fix
dpriskorn Jun 23, 2023
4c194ef
Merge pull request #1 from dpriskorn/rewrite_to_python_module
Jun 23, 2023
a09a35f
docs: Improve the README.md
dpriskorn Jun 23, 2023
1ba1e20
docs: Improve the README.md
dpriskorn Jun 23, 2023
024a6db
docs: Improve the README.md and remove unused files
dpriskorn Jun 23, 2023
f22a7df
Merge pull request #3
Jun 23, 2023
b54e546
docs: Improve the README.me with limitations
dpriskorn Jun 24, 2023
ef65589
fix: Fix bug with "missing" in StatementResponse. Test it. Thanks to …
dpriskorn Jun 24, 2023
e34f8ea
Merge pull request #6 from dpriskorn/fix_statement_response
Jun 24, 2023
708191c
chore: Update version
dpriskorn Jun 24, 2023
f8ea316
chore: Rename tests
dpriskorn Jun 24, 2023
92b9ffc
docs: Add notebooks to README.md.
dpriskorn Jun 24, 2023
f59bfb2
docs: Improve README.md.
dpriskorn Jun 24, 2023
4b265cf
docs: Improve README.md with known working schemas
dpriskorn Jun 25, 2023
58f0ceb
feat: Add new __str__ and __repr__ methods to make it easier to outpu…
dpriskorn Jun 25, 2023
b092b3e
docs: Improve README.md CLI example
dpriskorn Jun 25, 2023
06f9f2b
chore: Bump version
dpriskorn Jun 25, 2023
9d7c9e3
fix: Fix all tests
dpriskorn Jun 25, 2023
db8780b
rewrite: Split methods
dpriskorn Jun 25, 2023
4e032ac
Merge pull request #11 from dpriskorn/print_results
Jun 25, 2023
b052e9f
chore: Bump version
dpriskorn Jun 25, 2023
c6f4a26
docs: Update README.md.
dpriskorn Jun 25, 2023
30bcd22
fix: Fix missing return in __str__ and test it.
dpriskorn Jun 25, 2023
0d9e5d8
Merge pull request #12 from dpriskorn/fix_missing_return
Jun 26, 2023
517fd2a
feat: Add lookup of labels using WikibaseIntegrator. Only support Wik…
dpriskorn Jul 20, 2023
cf15909
Merge pull request #14 from dpriskorn/add_support_for_fetching_labels
Jul 20, 2023
4da7c2f
chore: Bump version
dpriskorn Jul 20, 2023
9fccf83
docs: Add limitation about Wikidata being the only supported Wikibase…
dpriskorn Jul 20, 2023
88efb7a
Merge pull request #16
Jul 20, 2023
9931586
chore: Bump version
dpriskorn Jul 20, 2023
1d2367b
chore: Add test for danish language label.
dpriskorn Jul 20, 2023
7c8054f
docs: Add information about lexeme support is experimental
dpriskorn Jul 20, 2023
abd30f7
rewrite: Add support for any Wikibase and test it.
dpriskorn Jul 29, 2023
fe22a60
fix: Fix ruff errors
dpriskorn Jul 29, 2023
9851bcc
Merge pull request #19
Oct 11, 2023
2a2adc8
Allow usage with Python 3.12.1+
LeMyst Jan 7, 2024
37 changes: 37 additions & 0 deletions .github/workflows/lint_python.yml
@@ -0,0 +1,37 @@
name: lint_python
on:
  push:
    branches: ["master"]
  pull_request:
    branches: ["master"]
jobs:
  ruff:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: pip install --user ruff
      - run: ruff --format=github --target-version=py37 .

  lint_python:
    needs: ruff
    runs-on: ubuntu-latest
    # env:
    #   ia_sandbox: ${{ secrets.ia_sandbox }}
    steps:
      - uses: actions/checkout@v3
      - name: Install poetry
        run: pipx install poetry
      - uses: actions/setup-python@v4
        with:
          python-version: 3.x
          cache: 'poetry'
      # - name: Redis Server in GitHub Actions
      #   uses: supercharge/redis-github-action@1.4.0
      - run: pip install --upgrade pip wheel
      - run: poetry install --with=dev
      - run: poetry run black --check .
      - run: poetry run codespell src/ tests/ *.md *.py  # --ignore-words-list="" --skip="*.css,*.js,*.lock"
      - run: mkdir --parents --verbose .mypy_cache
      - run: poetry run mypy --ignore-missing-imports --install-types --non-interactive --exclude compareshape.py --exclude shape.py .
      - run: poetry run safety check
      - run: poetry run pytest .
50 changes: 0 additions & 50 deletions .github/workflows/python-app.yml

This file was deleted.

70 changes: 70 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,70 @@
repos:
  - repo: local
    hooks:
      # - id: dead
      #   name: dead
      #   entry: dead
      #   args:
      #     - "--exclude"
      #     - "test_data/test_content.py|^deprecated/"  # regex separate by "|"
      #   language: system
      #   pass_filenames: false
      #   # types: [python]


      - id: black
        name: black
        language: system
        entry: black .
        types: [python]

      - id: codespell
        name: codespell
        language: system
        entry: codespell
        # args:
        #   - "src/"
        #   - "tests/"
        #   - "*.md"
        #   - "*.py"
        # pass_filenames: false
        types_or: [python, markdown]
        exclude: ^test_data/

      - id: ruff
        name: ruff
        language: system
        entry: ruff
        args:
          # Tell ruff to fix sorting of imports
          - "--fix"
          - "--format=github"
          - "--target-version=py37"
          - "."
        # types: [python]
        pass_filenames: false

      # https://jaredkhan.com/blog/mypy-pre-commit
      - id: mypy
        name: mypy
        entry: mypy
        language: python
        # use your preferred Python version
        # language_version: python3.7
        # additional_dependencies: ["mypy==0.790"]
        types: [python]
        # use require_serial so that script
        # is only called once per commit
        require_serial: true
        exclude: shape.py|compareshape.py
        # Print the number of files as a sanity-check
        # verbose: true

      # - id: pytest
      #   name: pytest
      #   language: system
      #   entry: pytest
      #   args:
      #     # - "--durations=10"
      #     - "-x"
      #   pass_filenames: false
2 changes: 0 additions & 2 deletions .pylintrc

This file was deleted.

138 changes: 121 additions & 17 deletions README.md
@@ -1,17 +1,121 @@
# entityshape
An API to compare a Wikidata item with an entity schema

This API is available at http://entityshape.toolforge.org/api. The API requires three parameters to return a result:
1. __language__: e.g. _en_ the language to return property names in
2. __entity__: e.g. _Q42_ the wikidata entity to check
3. __entityschema__: e.g. _E14_ the entityschema to check against
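A minimal sketch of building a request URL for this API from Python. The base URL and the three parameter names come from the list above; passing them as query-string arguments is an assumption, and `build_api_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

API_URL = "http://entityshape.toolforge.org/api"  # base URL from this README


def build_api_url(language: str, entity: str, entityschema: str) -> str:
    """Return the request URL; assumes the parameters go in the query string."""
    params = {"language": language, "entity": entity, "entityschema": entityschema}
    return f"{API_URL}?{urlencode(params)}"


print(build_api_url("en", "Q42", "E14"))
# http://entityshape.toolforge.org/api?language=en&entity=Q42&entityschema=E14
```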

The API returns a JSON object containing the following:
1. __error__: details of any error which may have occurred
2. __schema__: the entityschema checked against
3. __name__: the display name of the entityschema
4. __validity__: the validity of the schema (currently unused)
5. __properties__: a JSON object describing the validity of each property in the entity
6. __statements__: a JSON object describing the validity of each statement in the entity
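The sketch below reads the top-level fields listed above from a decoded JSON response. It assumes that an empty or missing `error` means success and that `properties` and `statements` are JSON objects; the helper `summarize_response` and the sample payload are hypothetical, not real API output.

```python
from typing import Any, Dict


def summarize_response(payload: Dict[str, Any]) -> str:
    """Summarize a decoded API response (hypothetical helper)."""
    # Assumption: a falsy "error" value means no error occurred.
    if payload.get("error"):
        raise RuntimeError(f"entityshape API error: {payload['error']}")
    properties = payload.get("properties", {})
    statements = payload.get("statements", {})
    return (
        f"{payload.get('name')} ({payload.get('schema')}): "
        f"properties={len(properties)}, statements={len(statements)}"
    )


# Hypothetical payload for illustration only.
sample = {"error": "", "schema": "E14", "name": "example schema",
          "properties": {"P31": {}}, "statements": {}}
print(summarize_response(sample))
# example schema (E14): properties=1, statements=0
```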

This repository also contains the source code for the user script at https://www.wikidata.org/w/User:Teester/EntityShape.js, which allows use of this API on Wikidata entity pages
# [Entityshape](https://www.wikidata.org/wiki/Q119899931)
A Python library to compare a Wikidata entity
(item or lexeme) with a
[Wikibase Entity Schema](https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas).

Based on https://github.com/Teester/entityshape by Mark Tully
and https://github.com/dpriskorn/PyEntityshape by Dennis Priskorn

# Features
* compare a given Wikidata item with an Entity Schema and dig into missing properties, properties with too many statements, etc.
* determine whether an item is valid according to a certain schema or not
* support for any Wikibase

# Limitations
The Shape and CompareShape classes currently only support:
* cardinality (too many or not enough values)
* whether the property is allowed or not
* whether the value of a statement on a given property is correct/incorrect

It is still a bit unclear if and how the qualifier validation works.

Validation of lexemes is still considered experimental.
Feel free to open an issue with a working or non-working example.

# Installation
Get it from PyPI:

`$ pip install pyentityshape`

# Usage

## Jupyter Notebooks
Example notebooks with code for validation of multiple items:
* [hiking paths](https://public-paws.wmcloud.org/User:So9q/Validating%20a%20group%20of%20items-all-hiking-paths-in-sweden.ipynb)
* [campsites](https://public-paws.wmcloud.org/User:So9q/Validating%20a%20group%20of%20items-all-campsites-in-sweden.ipynb)
* [shelters](https://public-paws.wmcloud.org/User:So9q/Validating%20a%20group%20of%20items-all-shelters-in-sweden.ipynb)

## Python
Example:
```python
# Note that we default to English, so the lang parameter here is optional.
# Note that we default to Wikidata, so the mediawiki_api_url and wikibase_url parameters here are optional.
from entityshape import EntityShape  # import path assumed from the package layout

e = EntityShape(
    eid="E1",
    entity_id="Q1",
    lang="en",
    # mediawiki_api_url="http://localhost/api.php",
    # wikibase_url="http://wikibase.svc",
)
result = e.validate_and_get_result()
# Get a human-readable result
print(result)
# Valid: False
# Properties_without_enough_correct_statements: instance of (P31)
# Access the data
print(result.properties_without_enough_correct_statements)
# {'P31'}
```

## Validation
The is_valid method on the Result object mimics all the red warnings displayed by the user script https://www.wikidata.org/wiki/User:Teester/EntityShape.js

It currently checks these five conditions that all have to be false for the item to be valid:
1. properties with too many statements found
2. incorrect statements found
3. some required properties are missing
4. properties without enough correct statements found
5. statements with properties that are not allowed found
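To make the rule concrete, here is a minimal sketch of the check using a stand-in dataclass. Only `properties_without_enough_correct_statements` appears in the usage example above; the other four field names are hypothetical, and the library's actual Result object may store these conditions differently.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ValidationSummary:
    """Hypothetical stand-in for the library's Result object."""

    properties_with_too_many_statements: Set[str] = field(default_factory=set)
    incorrect_statements: Set[str] = field(default_factory=set)
    missing_required_properties: Set[str] = field(default_factory=set)
    properties_without_enough_correct_statements: Set[str] = field(default_factory=set)
    statements_with_property_not_allowed: Set[str] = field(default_factory=set)

    @property
    def is_valid(self) -> bool:
        # Valid only when all five conditions are false, i.e. every set is empty.
        return not any(
            (
                self.properties_with_too_many_statements,
                self.incorrect_statements,
                self.missing_required_properties,
                self.properties_without_enough_correct_statements,
                self.statements_with_property_not_allowed,
            )
        )


print(ValidationSummary().is_valid)  # True
print(ValidationSummary(properties_without_enough_correct_statements={"P31"}).is_valid)  # False
```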

## Known working schemas
This library currently only supports a subset of all features in the ShEx specification.

The following Entity Schemas are known to work:
* [hiking path](https://www.wikidata.org/w/index.php?title=EntitySchema:E375&oldid=1833851062)
* [shelter](https://www.wikidata.org/w/index.php?title=EntitySchema:E398&oldid=1923235264)

# Background
This library is the glue between libraries like [Wikibase
Integrator](https://github.com/LeMyst/WikibaseIntegrator/) and entityschemas.

It makes it easy to batch check a whole subset of Wikidata
items against a schema. Nice!
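A minimal sketch of such a batch check. The validator is passed in as a function so the loop can run offline; `entityshape_validator` wires it to the `EntityShape` constructor and `validate_and_get_result()` from the usage example above. Both helper names are hypothetical, the import path is assumed, and because the README calls `is_valid` a method, the sketch accepts either a method or a plain attribute.

```python
from typing import Callable, Dict, Iterable


def batch_validate(
    entity_ids: Iterable[str], validate: Callable[[str], bool]
) -> Dict[str, bool]:
    """Map each entity ID to the validator's verdict (hypothetical helper)."""
    return {entity_id: validate(entity_id) for entity_id in entity_ids}


def entityshape_validator(eid: str, lang: str = "en") -> Callable[[str], bool]:
    """Build a validator backed by entityshape; needs the package and network access."""
    from entityshape import EntityShape  # assumed import path

    def validate(entity_id: str) -> bool:
        result = EntityShape(eid=eid, entity_id=entity_id, lang=lang).validate_and_get_result()
        # is_valid is described as a method; also accept a plain attribute.
        valid = result.is_valid
        return bool(valid() if callable(valid) else valid)

    return validate


# Offline usage with a stand-in validator:
print(batch_validate(["Q1", "Q2"], lambda qid: qid == "Q1"))
# {'Q1': True, 'Q2': False}
```

With the package installed, `batch_validate(ids, entityshape_validator("E375"))` would check each item in `ids` against the hiking path schema.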

# TODO
The CompareShape and Shape classes should be rewritten using OOP
and enums to avoid passing strings around because that is not
nice to debug or maintain.

What do we want to know from the CompareShape class?

On the property level:
* whether the property is mandatory and present/missing

On the statement level:
* whether the cardinality of values is allowed (min/max)
* whether the value(s) are correct/incorrect

Cases:
* mandatory property is missing
* optional property is missing (this is not invalidating)
* a property has an incorrect value
* a property has a correct value
* a property has too many values
* a property has not enough values
* ?
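One possible shape for that rewrite, sketched with an enum for the cases listed above. The enum, the `classify` and `is_invalidating` functions, and the order in which cases are checked (missing, then too many, then incorrect, then not enough) are assumptions for illustration, not the library's current design.

```python
from enum import Enum
from typing import Optional


class PropertyCase(Enum):
    """Hypothetical enum for the cases listed in the TODO."""

    MANDATORY_MISSING = "mandatory property is missing"
    OPTIONAL_MISSING = "optional property is missing"
    INCORRECT_VALUE = "a property has an incorrect value"
    CORRECT_VALUE = "a property has a correct value"
    TOO_MANY_VALUES = "a property has too many values"
    NOT_ENOUGH_VALUES = "a property has not enough values"


def classify(
    mandatory: bool,
    n_values: int,
    n_correct: int,
    min_count: int = 0,
    max_count: Optional[int] = None,
) -> PropertyCase:
    """Pick one case for a property; the check order is an assumption."""
    if n_values == 0:
        return PropertyCase.MANDATORY_MISSING if mandatory else PropertyCase.OPTIONAL_MISSING
    if max_count is not None and n_values > max_count:
        return PropertyCase.TOO_MANY_VALUES
    if n_correct < n_values:
        return PropertyCase.INCORRECT_VALUE
    if n_values < min_count:
        return PropertyCase.NOT_ENOUGH_VALUES
    return PropertyCase.CORRECT_VALUE


def is_invalidating(case: PropertyCase) -> bool:
    # A missing optional property does not invalidate the item.
    return case not in (PropertyCase.OPTIONAL_MISSING, PropertyCase.CORRECT_VALUE)
```

Passing enum members instead of strings lets mypy catch typos in case names.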

# ShEx Tip
When working on your Entity Schemas, the triple constraints described here are worth knowing and remembering:
https://shex.io/shex-primer/#tripleConstraints
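The cardinality on a triple constraint is what drives the "too many" and "not enough" results. As a sketch, the ShEx cardinality suffixes (`*`, `+`, `?`, `{m}`, `{m,}`, `{m,n}`, `{m,*}`, and exactly one when none is given) can be mapped to a (min, max) pair; `parse_cardinality` and `check_count` are hypothetical helpers, not part of this library.

```python
import re
from typing import Dict, Optional, Tuple

Cardinality = Tuple[int, Optional[int]]  # (min, max); None means unbounded

# Matches {m}, {m,}, {m,n} and {m,*}
_RANGE = re.compile(r"^\{\s*(\d+)\s*(,\s*(\d+|\*)?\s*)?\}$")
_SHORTHAND: Dict[str, Cardinality] = {"*": (0, None), "+": (1, None), "?": (0, 1)}


def parse_cardinality(token: str) -> Cardinality:
    token = token.strip()
    if token == "":
        return (1, 1)  # no cardinality given: exactly one
    if token in _SHORTHAND:
        return _SHORTHAND[token]
    match = _RANGE.match(token)
    if match is None:
        raise ValueError(f"not a ShEx cardinality: {token!r}")
    low = int(match.group(1))
    if match.group(2) is None:
        return (low, low)  # {m}
    high = match.group(3)
    if high is None or high == "*":
        return (low, None)  # {m,} or {m,*}
    if int(high) < low:
        raise ValueError(f"max below min in {token!r}")
    return (low, int(high))


def check_count(count: int, cardinality: Cardinality) -> str:
    """Compare a statement count with a (min, max) cardinality."""
    low, high = cardinality
    if count < low:
        return "not enough"
    if high is not None and count > high:
        return "too many"
    return "ok"
```

For example, `check_count(2, parse_cardinality("?"))` returns `"too many"`.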

# Thanks
Big thanks to [Myst](https://github.com/LeMyst) and
[Christian Clauss](https://github.com/cclauss) for
advice and help with Ruff to make this better.

# License
GPLv3+

# What I learned
* Forking other people's undocumented spaghetti code is not much fun.
* I want to find a more reliable validator that supports somevalue and novalue
* Pydantic is wonderful yet again; it makes working with OOP easy peasy :)
* Ruff is crazy fast and very nice!
77 changes: 0 additions & 77 deletions app.py

This file was deleted.
