Test Scholia queries on other SPARQL endpoints #2063
Comments
I just created a simplified version of one of our queries, country_authors.sparql:

SELECT
  ?author
  (COUNT(DISTINCT ?citing_work) AS ?number_of_citing_works)
  (SAMPLE(?organization_) AS ?organization)
  (SAMPLE(?work) AS ?example_work)
WHERE {
  ?author wdt:P27 | wdt:P1416/wdt:P17 | wdt:P108/wdt:P17 wd:Q35 .
  ?work wdt:P50 ?author .
  OPTIONAL { ?citing_work wdt:P2860 ?work . }
  OPTIONAL {
    ?author wdt:P1416 | wdt:P108 ?organization_ .
    ?organization_ wdt:P17 wd:Q35 .
  }
}
GROUP BY ?author

It times out on Wikidata, fails on QLever and executes on that Virtuoso instance.
The query runs successfully on some of our endpoints
Virtuoso-on-AWS: https://wikidata.demo.openlinksw.com/sparql (does not support the Wikidata Blazegraph functions)
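Since some endpoints lack the Blazegraph-specific functions, it may help to flag queries that rely on them before trying them elsewhere. The following is a minimal sketch, not part of Scholia: the function name `blazegraph_features` and the marker list are hypothetical, and the list is illustrative rather than exhaustive.

```python
import re

# Textual markers of Blazegraph/WDQS extensions (assumed, illustrative list).
BLAZEGRAPH_MARKERS = {
    "label service": r"\bwikibase:label\b",
    "MediaWiki API service": r"\bwikibase:mwapi\b",
    "geo search": r"\bwikibase:(around|box)\b",
    "service parameters": r"\bbd:\w+",
    "graph analytics": r"\bgas:\w+",
    "query hints": r"\bhint:\w+",
}


def blazegraph_features(query):
    """Return the sorted names of Blazegraph-specific features found in a query.

    This is a plain text scan, so it can also match inside comments or
    string literals; treat the result as a hint, not a proof.
    """
    return sorted(
        name
        for name, pattern in BLAZEGRAPH_MARKERS.items()
        if re.search(pattern, query)
    )


if __name__ == "__main__":
    labelled = """
    SELECT ?author ?authorLabel WHERE {
      ?author wdt:P27 wd:Q35 .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
    """
    print(blazegraph_features(labelled))
```

A query for which this returns an empty list, such as the simplified country_authors query in this thread, is a plausible first candidate for testing on Virtuoso or QLever.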
https://ceur-ws.org/Vol-3262/paper9.pdf and https://wiki.bitplan.com/index.php/Get_your_own_copy_of_WikiData have a list of candidates. I also intend to talk to the Wikidata team at the next meeting and would love to have a proper Blazegraph mirror running on our RWTH Aachen i5 machine, http://wikidata.dbis.rwth-aachen.de/, which should be suitable for the task with 256 GB RAM and 10 TB SSD. In the 6 years that I have been attempting to get my own copy of Wikidata running, I never got a proper Blazegraph mirror endpoint with all the necessary special services running.
Oh, you're in Aachen?
Is your feature request related to a problem? Please describe.
Describe the solution you'd like
I'd like us to explore running Scholia on other SPARQL endpoints, Blazegraph or otherwise. We have done some of this in the past, but not in a way that would be scalable across all Scholia queries.
Describe alternatives you've considered
A relatively straightforward approach might be to build a workflow based on running Scholia via the SPARQL endpoint (default: Blazegraph again) of a dedicated Wikibase instance that holds a copy of a recent Wikidata dump. There could even be several such Wikibases, each serving a specific subset (e.g. per Scholia aspect).
Additional context
Other options would be to start exploring non-Blazegraph endpoints, e.g. https://wikidata.demo.openlinksw.com/sparql (running on Virtuoso) or https://qlever.cs.uni-freiburg.de/wikidata/ (running on QLever).
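To make such comparisons repeatable across queries, one could run each query against a set of endpoints and record which ones succeed. Below is a minimal sketch using only the Python standard library. The names `ENDPOINTS`, `fetch_json` and `run_everywhere` are hypothetical, the QLever API URL is an assumption (the issue only links the QLever web UI), and the explicit `PREFIX` header is added on the assumption that not every endpoint predeclares `wd:` and `wdt:`.

```python
import json
import urllib.parse
import urllib.request

# Candidate endpoints. The Virtuoso URL is the one given in this issue;
# the QLever API URL is an assumption and may need adjusting.
ENDPOINTS = {
    "wdqs": "https://query.wikidata.org/sparql",
    "virtuoso": "https://wikidata.demo.openlinksw.com/sparql",
    "qlever": "https://qlever.cs.uni-freiburg.de/api/wikidata",
}

# Declared explicitly in case an endpoint does not predefine them.
PREFIXES = (
    "PREFIX wd: <http://www.wikidata.org/entity/>\n"
    "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
)


def fetch_json(endpoint, query, timeout=60):
    """POST a SPARQL query and return the parsed JSON result document."""
    data = urllib.parse.urlencode({"query": query}).encode()
    request = urllib.request.Request(
        endpoint,
        data=data,
        headers={
            "Accept": "application/sparql-results+json",
            # Placeholder user agent; replace with a real contact string.
            "User-Agent": "scholia-endpoint-check/0.1 (hypothetical)",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def run_everywhere(query, endpoints=ENDPOINTS, fetch=fetch_json):
    """Run one query on every endpoint.

    Returns {name: ("ok", row_count)} or {name: ("failed", error_text)}.
    """
    outcome = {}
    for name, url in endpoints.items():
        try:
            payload = fetch(url, PREFIXES + query)
            outcome[name] = ("ok", len(payload["results"]["bindings"]))
        except Exception as error:  # timeouts, HTTP errors, malformed results
            outcome[name] = ("failed", f"{type(error).__name__}: {error}")
    return outcome


if __name__ == "__main__":
    query = "SELECT ?author WHERE { ?author wdt:P27 wd:Q35 . } LIMIT 5"
    for name, result in run_everywhere(query).items():
        print(name, result)
```

The `fetch` parameter is injectable so that the loop can be exercised without network access, and so that a different client library could be swapped in later. Looping this over all Scholia `.sparql` files would give a per-endpoint compatibility table.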