Test Scholia queries on other SPARQL endpoints #2063

Daniel-Mietchen · 2022-07-21T22:22:23Z

Is your feature request related to a problem? Please describe.

Scholia uses the Wikidata Query Service to run SPARQL queries over the Wikidata corpus.
The Wikidata Query Service uses Blazegraph as the backend for providing the SPARQL endpoint.
Blazegraph is not designed for graphs much larger than about 100 million items, which is about the current size of Wikidata.
An evaluation of Blazegraph alternatives for Wikidata is ongoing, with no clear timeline towards a solution.

Describe the solution you'd like

I'd like us to explore running Scholia on other SPARQL endpoints, Blazegraph or otherwise. We have done some of this in the past, but not in a way that would scale across all Scholia queries.

Describe alternatives you've considered

A relatively straightforward approach might be to build a workflow based on running Scholia via the SPARQL endpoint (default: Blazegraph again) of a dedicated Wikibase instance that holds a copy of a recent Wikidata dump. There could even be several such Wikibases, each serving a specific subset (e.g. per Scholia aspect).

Additional context

Other options would be to start exploring non-Blazegraph endpoints, e.g. https://wikidata.demo.openlinksw.com/sparql (running on Virtuoso) or https://qlever.cs.uni-freiburg.de/wikidata/ (running on QLever).
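To make "running the same query on several endpoints" concrete, here is a minimal sketch using only the Python standard library and the standard SPARQL 1.1 Protocol (GET with a `query` parameter). It is not how Scholia currently issues queries. The `ENDPOINTS` keys, the `User-Agent` string and the function names are hypothetical, and the WDQS URL is added here for comparison; only the Virtuoso URL comes from this issue.

```python
import urllib.parse
import urllib.request

# The Virtuoso URL is quoted in this issue; the WDQS URL is the public
# Wikidata Query Service endpoint. The dictionary keys are illustrative only.
ENDPOINTS = {
    "wdqs-blazegraph": "https://query.wikidata.org/sparql",
    "virtuoso-demo": "https://wikidata.demo.openlinksw.com/sparql",
}


def build_request(endpoint_url, query):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint_url + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Hypothetical client name; public endpoints often expect one.
            "User-Agent": "scholia-endpoint-test/0.1",
        },
    )


def run_query(endpoint_url, query, timeout=60):
    """Send the query and return the raw response body (network I/O)."""
    request = build_request(endpoint_url, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Looping over `ENDPOINTS.items()` and calling `run_query(url, query)` for each would then give one response per backend for the same query text. Keeping `build_request` separate means the request can be inspected without touching the network.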

Assess implications of Blazegraph failure playbook for Scholia #1721

Daniel-Mietchen · 2022-07-21T22:34:38Z

I just created a simplified version of one of our queries, country_authors.sparql:

SELECT
  ?author
  (COUNT(DISTINCT ?citing_work) AS ?number_of_citing_works)
  (SAMPLE(?organization_) AS ?organization)
  (SAMPLE(?work) AS ?example_work)
WHERE {
  ?author wdt:P27 | wdt:P1416/wdt:P17 | wdt:P108/wdt:P17 wd:Q35 .
  ?work wdt:P50 ?author .
  OPTIONAL { ?citing_work wdt:P2860 ?work . }
  OPTIONAL {
    ?author wdt:P1416 | wdt:P108 ?organization_ .
    ?organization_ wdt:P17 wd:Q35 .
  }
}
GROUP BY ?author

It times out on Wikidata, fails on QLever and executes on that Virtuoso instance.
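One portability detail worth ruling out when comparing endpoints: the query above uses the `wd:` and `wdt:` prefixes without declaring them, which works on the Wikidata Query Service because it predefines them. Other endpoints may not, so a test harness could add the declarations itself. The sketch below is a heuristic under that assumption, not a diagnosis of the QLever failure; the function name `with_prefixes` is hypothetical.

```python
import re

# Standard Wikidata namespace IRIs for the two prefixes used in the query.
WIKIDATA_PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
}


def with_prefixes(query, prefixes=WIKIDATA_PREFIXES):
    """Prepend PREFIX declarations that the query uses but does not declare.

    Heuristic text matching only; this does not parse SPARQL.
    """
    declared = set(re.findall(r"(?i)\bPREFIX\s+(\w+)\s*:", query))
    header = [
        f"PREFIX {name}: <{iri}>"
        for name, iri in prefixes.items()
        if name not in declared and re.search(rf"(?<![\w:/]){name}:", query)
    ]
    return "\n".join(header + [query])
```

Applying the function twice leaves the query unchanged, so it is safe to run over a mixed set of queries where some already carry their own declarations.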

WolfgangFahl · 2022-07-22T11:49:48Z

The query runs successfully on some of our endpoints:

date;sparqlquery -qn authorsCitingWork -en blazegraph -f github;date

blazegraph 2018 instance (13 secs for ~786 results) Fr 22. Jul 13:41:13 CEST 2022 - Fr 22. Jul 13:41:26 CEST 2022
jena 2020 instance (timing pending for ~10117 results) Fr 22. Jul 13:39:32 CEST 2022 - still running via command line, will report later
stardog 2022 instance (108 secs for ~14266 results) Fr 22. Jul 13:37:10 CEST 2022 - Fr 22. Jul 13:39:02 CEST 2022
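Measurements like the ones above (elapsed seconds and result counts per backend) could be collected uniformly with a small wrapper. This is a sketch, not the `sparqlquery` tool used in the command: `run` stands for any hypothetical callable that sends the query to an endpoint and returns the response as SPARQL 1.1 JSON results text.

```python
import json
import time


def count_bindings(sparql_json_text):
    """Count result rows in a SPARQL 1.1 JSON results document."""
    return len(json.loads(sparql_json_text)["results"]["bindings"])


def time_query(run, query):
    """Call run(query) and return (elapsed_seconds, number_of_rows).

    `run` is a placeholder for whatever client actually talks to the
    endpoint; it must return the JSON results document as a string.
    """
    start = time.perf_counter()
    body = run(query)
    elapsed = time.perf_counter() - start
    return elapsed, count_bindings(body)
```

Because the instances listed hold Wikidata copies of different ages (2018, 2020, 2022), differing row counts are expected, so the counts are best reported next to the timings rather than compared for equality.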

WolfgangFahl · 2023-01-10T04:16:16Z

see ad-freiburg/qlever#859

egonw · 2023-03-10T05:15:49Z

Virtuoso-on-AWS: https://wikidata.demo.openlinksw.com/sparql

(Does not support the Wikidata Blazegraph functions)
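Since endpoints like this one lack the Wikidata/Blazegraph extensions, it may help to first sort queries into those that use such extensions and those that are plain SPARQL. The sketch below scans query text for a few well-known constructs (the `wikibase:label` and `wikibase:mwapi` services, geo search services, and the `bd:`, `hint:` and `gas:` namespaces). The pattern list is an assumption and certainly incomplete, and the function name `find_extensions` is hypothetical.

```python
import re

# Heuristic patterns for WDQS/Blazegraph-specific constructs.
# This list is illustrative, not exhaustive.
EXTENSION_PATTERNS = {
    "label service": r"\bSERVICE\s+wikibase:label\b",
    "MediaWiki API service": r"\bSERVICE\s+wikibase:mwapi\b",
    "geo search services": r"\bSERVICE\s+wikibase:(around|box)\b",
    "Blazegraph bd: namespace": r"\bbd:\w+",
    "Blazegraph query hints": r"\bhint:\w+",
    "Blazegraph GAS service": r"\bgas:\w+",
}


def find_extensions(query):
    """Return the sorted names of extension patterns found in the query text."""
    return sorted(
        name
        for name, pattern in EXTENSION_PATTERNS.items()
        if re.search(pattern, query, re.IGNORECASE)
    )
```

A query for which `find_extensions` returns an empty list is a candidate for running unchanged elsewhere; anything else would need a rewrite or an endpoint-specific variant. This is plain text matching, so comments or string literals containing these tokens would produce false positives.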

WolfgangFahl · 2023-03-10T07:54:40Z

https://ceur-ws.org/Vol-3262/paper9.pdf and https://wiki.bitplan.com/index.php/Get_your_own_copy_of_WikiData have a list of candidates. I also intend to talk to the Wikidata team at the next meeting and would love to have a proper Blazegraph mirror running on our RWTH Aachen i5 machine, http://wikidata.dbis.rwth-aachen.de/, which should be suitable for the task with 256 GB RAM and 10 TB SSD. In the past 6 years of attempting to get my own copy of Wikidata running, I never got a proper Blazegraph mirror endpoint with all the necessary special services.

egonw · 2023-03-10T09:49:45Z

Oh, you're in Aachen?

Daniel-Mietchen added the "enhancement" label (some suggestions to improve Scholia) on Jul 21, 2022

WolfgangFahl mentioned this issue Jan 29, 2024

convert templated queries to named queries and separate concerns by introducing named query middleware #2412

Open

egonw mentioned this issue Mar 2, 2024

Top-level configurable #2429

Closed
