Adjust WikidataDisplay to correctly display proceedings with multiple events #42

Open · tholzheim opened this issue on Dec 15, 2022 · 3 comments
Labels: bug (Something isn't working), enhancement (New feature or request)

@tholzheim (Collaborator)

The proceedings query currently does not aggregate the event items per proceeding, so a proceeding with multiple events yields multiple records (one per event).

[screenshot]
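For illustration, the duplicated rows could also be collapsed on the client side instead of inside the query. The Python sketch below assumes the flat query result is available as a list of dicts (one dict per result row, keyed by variable name); the helper `group_events_by_item` and the sample rows are hypothetical and use placeholder identifiers, not real Wikidata items.

```python
from typing import Dict, List


def group_events_by_item(rows: List[Dict[str, str]]) -> Dict[str, dict]:
    """Collapse rows that repeat the same proceedings ?item into one record.

    Each input row is one solution of the flat (non-aggregated) query, so a
    proceeding linked to n events appears n times.
    """
    grouped: Dict[str, dict] = {}
    for row in rows:
        item = row["item"]
        record = grouped.setdefault(
            item, {"item": item, "itemLabel": row.get("itemLabel"), "events": []}
        )
        event = row.get("event")
        # ignore rows without an event and do not list the same event twice
        if event is not None and event not in record["events"]:
            record["events"].append(event)
    return grouped


# sample rows with placeholder identifiers
rows = [
    {"item": "vol1", "itemLabel": "Volume 1", "event": "ev1"},
    {"item": "vol1", "itemLabel": "Volume 1", "event": "ev2"},
    {"item": "vol2", "itemLabel": "Volume 2", "event": "ev3"},
]
grouped = group_events_by_item(rows)
print(grouped["vol1"]["events"])  # ['ev1', 'ev2']
```

This keeps the query cheap at the cost of transferring one row per event; the thread below instead aggregates inside the query.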

tholzheim added the enhancement label and self-assigned this issue on Dec 15, 2022.
@tholzheim (Collaborator, Author)

The events can be aggregated by using the following query as a subquery:

```sparql
SELECT
  ?item
  (GROUP_CONCAT(?_event; SEPARATOR = "|") AS ?event)
  (GROUP_CONCAT(?_eventLabel; SEPARATOR = "|") AS ?eventLabel)
  (GROUP_CONCAT(?_eventSeries; SEPARATOR = "|") AS ?eventSeries)
  (GROUP_CONCAT(?_eventSeriesLabel; SEPARATOR = "|") AS ?eventSeriesLabel)
  (GROUP_CONCAT(?_eventSeriesOrdinal; SEPARATOR = "|") AS ?eventSeriesOrdinal)
  (GROUP_CONCAT(?_dblpEventId; SEPARATOR = "|") AS ?dblpEventId)
WHERE {
  ?item wdt:P31 wd:Q1143604;
    wdt:P179 wd:Q27230297;
    wdt:P4745 ?_event.
  ?_event rdfs:label ?_eventLabel.
  FILTER((LANG(?_eventLabel)) = "en")
  OPTIONAL { ?_event wdt:P10692 ?_dblpEventId. }
  OPTIONAL {
    ?_event p:P179 ?_partOfTheEventSeriesStmt.
    ?_partOfTheEventSeriesStmt ps:P179 ?_eventSeries;
      pq:P1545 ?_eventSeriesOrdinal.
    ?_eventSeries rdfs:label ?_eventSeriesLabel.
    FILTER((LANG(?_eventSeriesLabel)) = "en")
  }
}
GROUP BY ?item
```
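Reading these aggregated columns back needs some care. Values that are unbound for some events (for example an event without `?_dblpEventId`) are typically just left out of the `GROUP_CONCAT` result, and the SPARQL specification does not define the order of the concatenated values, so the i-th entries of two columns need not describe the same event. A defensive Python sketch, assuming each aggregated record arrives as a dict keyed by variable name; `split_aggregated`, `zip_columns` and the sample record are hypothetical.

```python
from typing import Dict, List, Optional

SEPARATOR = "|"  # must match the SEPARATOR used in the GROUP_CONCAT calls


def split_aggregated(value: Optional[str]) -> List[str]:
    """Split one GROUP_CONCAT value; a missing or empty value yields []."""
    if not value:
        return []
    return value.split(SEPARATOR)


def zip_columns(record: Dict[str, str], columns: List[str]) -> List[Dict[str, str]]:
    """Turn aggregated columns of one record back into one dict per event.

    Raises ValueError when the column lengths differ. Equal lengths are a
    necessary check, not a guarantee, since SPARQL does not define the
    order of GROUP_CONCAT values.
    """
    split = {column: split_aggregated(record.get(column)) for column in columns}
    lengths = {len(values) for values in split.values()}
    if len(lengths) > 1:
        raise ValueError(f"columns {columns} are not aligned for {record.get('item')}")
    count = lengths.pop() if lengths else 0
    return [{column: split[column][i] for column in columns} for i in range(count)]


record = {
    "item": "vol1",
    "event": "ev1|ev2",
    "eventLabel": "Event A|Event B",
    "dblpEventId": "conf/x/2022",  # only one of the two events has a DBLP ID
}
print(zip_columns(record, ["event", "eventLabel"]))
# zip_columns(record, ["event", "dblpEventId"]) raises ValueError
```

A label that itself contains `|` would also break the split, so a rarer separator may be safer.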

Unfortunately, the aggregation makes the query more expensive, and it now runs into a timeout.
One of the expensive parts of the query appears to be the string transformation clause that builds the URN:NBN URL from the formatter URL:

```sparql
OPTIONAL {
  ?item wdt:P4109 ?URN_NBN.
  wd:P4109 wdt:P1630 ?URN_NBNFormatterUrl.
  BIND(IRI(REPLACE(?URN_NBN, '^(.+)$', ?URN_NBNFormatterUrl)) AS ?URN_NBNUrl).
}
```

With this clause excluded, the query returns a result.

The solution for now is to format the external IDs in code instead of within the query.

This contradicts the following TODO in the code:

```
#@TODO - use formatterUris from Wikidata
```
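One way to reconcile the two, sketched here under assumptions: keep using the formatter URLs (P1630) from Wikidata, but fetch them in a separate, cheap query and apply them in code, so the expensive query no longer evaluates `REPLACE`. The query string, the helper names `formatter_map` and `format_external_id`, and the sample data (including the `resolver.example` URL) are hypothetical; the input is assumed to be in the SPARQL 1.1 JSON results format.

```python
from typing import Dict

# Hypothetical lightweight query, run separately from the proceedings query,
# that fetches the formatter URL (P1630) of each external-ID property used.
FORMATTER_QUERY = """
SELECT ?property ?formatterUrl WHERE {
  VALUES ?property { wd:P4109 wd:P8978 wd:P10692 }
  ?property wdt:P1630 ?formatterUrl.
}
"""


def formatter_map(sparql_json: dict) -> Dict[str, str]:
    """Map property IDs such as 'P4109' to formatter URLs.

    Expects a result in the SPARQL 1.1 JSON results format
    (head/results/bindings).
    """
    formatters: Dict[str, str] = {}
    for binding in sparql_json["results"]["bindings"]:
        pid = binding["property"]["value"].rsplit("/", 1)[-1]
        # a property can have several formatter URLs; keep the first one seen
        formatters.setdefault(pid, binding["formatterUrl"]["value"])
    return formatters


def format_external_id(value: str, formatter_url: str) -> str:
    """Same effect as REPLACE(?value, '^(.+)$', ?formatterUrl) for formatter
    URLs that only use the $1 placeholder: $1 is replaced by the whole ID."""
    return formatter_url.replace("$1", value)


# sample result with a placeholder formatter URL
sample = {
    "head": {"vars": ["property", "formatterUrl"]},
    "results": {"bindings": [{
        "property": {"type": "uri", "value": "http://www.wikidata.org/entity/P4109"},
        "formatterUrl": {"type": "literal", "value": "https://resolver.example/$1"},
    }]},
}
formatters = formatter_map(sample)
print(format_external_id("urn:nbn:de:0000-example", formatters["P4109"]))
```

Since formatter URLs change rarely, the map could also be fetched once and cached rather than queried on every page load.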

@tholzheim (Collaborator, Author)

By splitting the value on the defined separator when generating the links, we get:

[screenshot]
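A minimal sketch of such link generation, assuming the aggregated `?event` and `?eventLabel` values arrive as `|`-separated strings; the function `event_links`, the HTML output format and the `example.org` IRIs are illustrative assumptions, not the project's actual rendering code.

```python
import html
from typing import List

SEPARATOR = "|"  # same separator as in the GROUP_CONCAT calls


def event_links(event_iris: str, event_labels: str) -> List[str]:
    """Render one HTML link per event from aggregated IRI and label strings."""
    iris = event_iris.split(SEPARATOR) if event_iris else []
    labels = event_labels.split(SEPARATOR) if event_labels else []
    if len(iris) != len(labels):
        # labels cannot be paired safely, so use the IRIs as link text
        labels = iris
    return [
        f'<a href="{html.escape(iri, quote=True)}">{html.escape(label)}</a>'
        for iri, label in zip(iris, labels)
    ]


for link in event_links("https://example.org/ev1|https://example.org/ev2", "A & B|C"):
    print(link)
```

Escaping both the IRI and the label keeps characters such as `&` or `"` in Wikidata labels from breaking the generated markup.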

@tholzheim (Collaborator, Author)

Here is the new version of the query:

```sparql
#
# get CEUR-WS Proceedings records by Volume with linked Event and EventSeries
#
# WF 2022-08-13
#
# the Volume number P478 is sometimes available with the proceedings item and sometimes as a qualifier
# of the "part of the series" (P179) statement
#
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX schema: <http://schema.org/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT
  ?item
  ?itemLabel
  ?itemDescription
  ?ceurwspart
  ?sVolume
  ?Volume
  ?short_name
  ?dblpProceedingsId
  ?ppnId
  ?event
  ?eventLabel
  ?dblpEventId
  ?eventSeries
  ?eventSeriesLabel
  ?eventSeriesOrdinal
  ?title
  ?language_of_work_or_name
  ?language_of_work_or_nameLabel
  ?URN_NBN
  ?publication_date
  ?fullWorkUrl
  ?described_at_URL
  ?homePage
WHERE {
  ?item wdt:P31 wd:Q1143604;
    wdt:P179 wd:Q27230297;
    rdfs:label ?itemLabel.
  FILTER((LANG(?itemLabel)) = "en")
  OPTIONAL {
    ?item schema:description ?itemDescription.
    FILTER((LANG(?itemDescription)) = "en")
  }
  OPTIONAL { ?item wdt:P478 ?Volume. }
  OPTIONAL { ?item (p:P179/pq:P478) ?_sVolume. BIND(xsd:integer(?_sVolume) AS ?sVolume) }
  OPTIONAL { ?item wdt:P1813 ?short_name. }
  OPTIONAL { ?item wdt:P8978 ?dblpProceedingsId. }
  OPTIONAL { ?item wdt:P6721 ?ppnId. }
  OPTIONAL { ?item wdt:P4109 ?URN_NBN. }
  OPTIONAL { ?item wdt:P1476 ?title. }
  OPTIONAL { ?item wdt:P577 ?publication_date. }
  OPTIONAL { ?item wdt:P953 ?fullWorkUrl. }
  OPTIONAL { ?item wdt:P973 ?described_at_URL. }
  OPTIONAL { ?item wdt:P856 ?homePage. }
  OPTIONAL {
    ?item wdt:P407 ?language_of_work_or_name.
    ?language_of_work_or_name rdfs:label ?language_of_work_or_nameLabel.
    FILTER((LANG(?language_of_work_or_nameLabel)) = "en")
  }
  {
    SELECT
      ?item
      (GROUP_CONCAT(?_event; SEPARATOR = "|") AS ?event)
      (GROUP_CONCAT(?_eventLabel; SEPARATOR = "|") AS ?eventLabel)
      (GROUP_CONCAT(?_eventSeries; SEPARATOR = "|") AS ?eventSeries)
      (GROUP_CONCAT(?_eventSeriesLabel; SEPARATOR = "|") AS ?eventSeriesLabel)
      (GROUP_CONCAT(?_eventSeriesOrdinal; SEPARATOR = "|") AS ?eventSeriesOrdinal)
      (GROUP_CONCAT(?_dblpEventId; SEPARATOR = "|") AS ?dblpEventId)
    WHERE {
      ?item wdt:P31 wd:Q1143604;
        wdt:P179 wd:Q27230297;
        wdt:P4745 ?_event.
      ?_event rdfs:label ?_eventLabel.
      FILTER((LANG(?_eventLabel)) = "en")
      OPTIONAL { ?_event wdt:P10692 ?_dblpEventId. }
      OPTIONAL {
        ?_event p:P179 ?_partOfTheEventSeriesStmt.
        ?_partOfTheEventSeriesStmt ps:P179 ?_eventSeries;
          pq:P1545 ?_eventSeriesOrdinal.
        ?_eventSeries rdfs:label ?_eventSeriesLabel.
        FILTER((LANG(?_eventSeriesLabel)) = "en")
      }
    }
    GROUP BY ?item
  }
}
ORDER BY ?sVolume
```

This version is still expensive: execution times vary from 36 s to a timeout, and roughly one in five runs times out.
We may need to split the query. Since the event data is already queried through a subquery and both query parts are keyed by the proceedings item, it would be easy to merge the results afterwards.
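A sketch of the proposed split, assuming the proceedings part (the outer query without the subquery) and the event part (the `GROUP BY ?item` subquery) run as two separate queries, each returning at most one row per `?item` as a list of dicts; `merge_by_item` and the sample rows are hypothetical. In the combined query the subquery is a required pattern, so proceedings without any linked event are dropped (an inner join); the `keep_without_events` flag lets the merge either mimic that or keep such proceedings (a left join).

```python
from typing import Dict, List


def merge_by_item(
    proceedings: List[Dict[str, str]],
    events: List[Dict[str, str]],
    keep_without_events: bool = True,
) -> List[Dict[str, str]]:
    """Merge two result sets that each have at most one row per ?item.

    keep_without_events=False mimics the combined query (inner join on
    ?item); True keeps proceedings without events (left join). The order
    of the proceedings rows, e.g. ORDER BY ?sVolume, is preserved.
    """
    events_by_item = {row["item"]: row for row in events}
    merged: List[Dict[str, str]] = []
    for row in proceedings:
        event_row = events_by_item.get(row["item"])
        if event_row is None and not keep_without_events:
            continue
        combined = dict(row)
        if event_row is not None:
            combined.update(event_row)
        merged.append(combined)
    return merged


proceedings = [{"item": "vol1", "sVolume": "1"}, {"item": "vol2", "sVolume": "2"}]
events = [{"item": "vol1", "event": "ev1|ev2"}]
print(merge_by_item(proceedings, events))
```

Since each part is a smaller query, each is also less likely to hit the timeout, though that would need to be measured.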
