Export data to wikidata #158
base: main
@admin.action(description="Create new Wikidata items for selected publications")
def export_to_wikidata(modeladmin, request, queryset):
    created_count, updated_count, error_records = export_publications_to_wikidata(queryset)
In addition to the message, please add log statements at different levels of detail here (using at least `info` and `debug`).
)
try:
    from wikibaseintegrator.datatypes import Url
except ImportError:
Why not go with the second one right away?
SPARQL_ENDPOINT = settings.WIKIBASE_API_URL.replace("/w/api.php", "/query/sparql")

# constant for all dates
CALENDAR_MODEL = "http://www.wikidata.org/entity/Q1985727"
Is there a reason not to use `https://...` here?
parts = date_str.split("-")
if len(parts) == 1 and parts[0].isdigit():
    # "YYYY"
    return f"{parts[0]}-01-01", 9
This will always return the first day of the month, but when you normalise `2024-09` as the end of a period, you should return the last day of the month.
https://stackoverflow.com/a/43663 seems to be a good solution for that. You will have to pass "begin or end" to the function, but I like the idea of encapsulating the normalisation in one function.
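A minimal sketch of what this could look like, not the PR's actual implementation. It assumes a hypothetical `end` keyword on `normalize_date_and_precision`, uses the standard library's `calendar.monthrange` to find the last day of a month, and assumes the Wikibase precision codes 9 (year), 10 (month), and 11 (day). Mapping a bare year to December 31 when `end=True` is also an assumption.

```python
import calendar

# Wikibase time precision codes (assumed: 9 = year, 10 = month, 11 = day)
PRECISION_YEAR, PRECISION_MONTH, PRECISION_DAY = 9, 10, 11


def normalize_date_and_precision(date_str, end=False):
    """Return (ISO date, precision) for "YYYY", "YYYY-MM" or "YYYY-MM-DD".

    `end` is a hypothetical flag: when True, partial dates are expanded to
    the last day of the period instead of the first.
    """
    parts = date_str.split("-")
    if not 1 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"Unsupported date format: {date_str!r}")

    year = int(parts[0])
    if len(parts) == 1:
        month, day = (12, 31) if end else (1, 1)
        return f"{year:04d}-{month:02d}-{day:02d}", PRECISION_YEAR

    month = int(parts[1])
    if len(parts) == 2:
        # monthrange returns (weekday of first day, number of days in month)
        day = calendar.monthrange(year, month)[1] if end else 1
        return f"{year:04d}-{month:02d}-{day:02d}", PRECISION_MONTH

    return f"{year:04d}-{month:02d}-{int(parts[2]):02d}", PRECISION_DAY
```

With this shape, callers handling the end of a period would pass `end=True`, and all other callers keep the current behaviour.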
def add_time_claims(dates, prop_nr, statements):
    for ds in dates:
        iso, prec = normalize_date_and_precision(ds)
        timestamp = f"+{iso}T00:00:00Z"
Despite the above, I'm fine with the hours/minutes/seconds here being the same for beginning and end. We operate at day-level precision.
entity = wikibase_integrator.item.get(entity_id=existing_qid)
entity.claims.add(statements)
try:
    entity.write(summary="Update publication via OptimapBot")
Is there a way to know here which fields you are going to update?
Will your claims overwrite all existing claims?
I would prefer to only update the ones that do not exist yet. It's not our task to fix existing entities.
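To illustrate the "only add what is missing" behaviour, here is a library-free sketch. The `Statement` class and the `split_new_and_existing` helper are hypothetical stand-ins, not part of wikibaseintegrator; in the real code the existing property IDs would come from the fetched entity's claims. As far as I know, wikibaseintegrator also offers an `ActionIfExists.KEEP` option for `claims.add`, which may cover this directly, but that should be checked against the installed version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """Hypothetical stand-in for a Wikibase claim: property ID plus value."""
    prop_nr: str
    value: str


def split_new_and_existing(existing_prop_nrs, statements):
    """Split statements into those to add and those to skip.

    A statement is skipped when the entity already has any claim for the
    same property, so existing data is never overwritten.
    """
    existing = set(existing_prop_nrs)
    to_add = [s for s in statements if s.prop_nr not in existing]
    skipped = [s for s in statements if s.prop_nr in existing]
    return to_add, skipped
```

The `to_add` list also answers the first question: it can be logged or put into the edit summary before writing, so it is clear which properties are going to change.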
if publication.geometry:
    geometries = getattr(publication.geometry, "geoms", [publication.geometry])
    for geom in geometries:
        if getattr(geom, "geom_type", None) != "Point":
Does this also work if the geometry (`GEOMETRYCOLLECTION`) consists of multiple points?
if getattr(geom, "geom_type", None) != "Point":
    geom = geom.centroid
statements.append(GlobeCoordinate(prop_nr=P_GEOMETRY, latitude=geom.y, longitude=geom.x, precision=0.0001))
Please also add statements based on the bounding box/envelope: https://docs.djangoproject.com/en/5.2/ref/contrib/gis/gdal/#django.contrib.gis.gdal.OGRGeometry.extent
The suitable properties seem to be these:
https://www.wikidata.org/wiki/Property:P1332
https://www.wikidata.org/wiki/Property:P1333
https://www.wikidata.org/wiki/Property:P1334
https://www.wikidata.org/wiki/Property:P1335
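A rough sketch of how the extent could be mapped to these four properties (P1332 northernmost, P1333 southernmost, P1334 easternmost, P1335 westernmost point). It takes the `(xmin, ymin, xmax, ymax)` tuple that `extent` returns. The helper name `extent_to_extreme_points` is hypothetical, and using the midpoint of the box for the "other" coordinate is an assumption: the extent alone does not say where on the edge the actual extreme point lies. Geometries crossing the antimeridian are not handled.

```python
def extent_to_extreme_points(extent):
    """Map an (xmin, ymin, xmax, ymax) extent to Wikidata extreme-point properties.

    Returns a dict of property ID -> (latitude, longitude).
    Assumption: the midpoint of the box is used for the coordinate that the
    extent does not determine.
    """
    xmin, ymin, xmax, ymax = extent
    mid_lon = (xmin + xmax) / 2
    mid_lat = (ymin + ymax) / 2
    return {
        "P1332": (ymax, mid_lon),  # northernmost point
        "P1333": (ymin, mid_lon),  # southernmost point
        "P1334": (mid_lat, xmax),  # easternmost point
        "P1335": (mid_lat, xmin),  # westernmost point
    }
```

Each `(latitude, longitude)` pair could then be turned into a `GlobeCoordinate` statement in the same way as the centroid above.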
updated_count = 0
error_records = []

for publication in publications:
Please add `.info` and `.debug` log statements in this loop for detailed progress information.
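Roughly what I have in mind, as a self-contained sketch rather than a drop-in change. The function name `export_publications` and the `export_one` callback are hypothetical placeholders for the real per-publication export; only the logging pattern matters here.

```python
import logging

logger = logging.getLogger(__name__)


def export_publications(publications, export_one):
    """Export each publication, logging progress at info and debug level.

    `export_one` is a hypothetical callable doing the actual Wikidata write.
    """
    publications = list(publications)
    created_count, error_records = 0, []
    logger.info("Starting Wikidata export of %d publications", len(publications))

    for index, publication in enumerate(publications, start=1):
        # Per-item detail belongs at debug level to keep info logs short
        logger.debug("Exporting %d/%d: %r", index, len(publications), publication)
        try:
            export_one(publication)
        except Exception as exc:
            logger.warning("Export failed for %r: %s", publication, exc)
            error_records.append((publication, str(exc)))
            continue
        created_count += 1

    logger.info(
        "Wikidata export finished: %d created, %d errors",
        created_count, len(error_records),
    )
    return created_count, error_records
```

The summary lines at info level give an overview in production logs, while the per-publication debug lines can be switched on when tracing a specific failure.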
error_records = []

for publication in publications:
    if not publication.publicationDate:
Is this really the only required field, or is it the only one that may be missing according to our data model?
Closes #113