This project analyzes SPARQL queries. As input, it takes a positive set and a negative set of queries and extracts the SPARQL features (e.g. UNION) of every query. It then builds a decision tree (J48/C4.5) that is intended to match all features of the positives and none of the features of the negatives. The tree is evaluated by its F-measure (F-score).
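To make the evaluation metric concrete, the following minimal sketch computes precision, recall, and the F-measure from classification counts. It assumes the balanced F1 variant; the class name `FMeasureExample` and the counts in `main` are hypothetical and not taken from this project.

```java
/** Illustrative helper (hypothetical, not part of this project). */
final class FMeasureExample {

    /** Precision = TP / (TP + FP); defined as 0 if nothing was classified as positive. */
    static double precision(int tp, int fp) {
        return (tp + fp) == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    /** Recall = TP / (TP + FN); defined as 0 if there are no actual positives. */
    static double recall(int tp, int fn) {
        return (tp + fn) == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    /** Balanced F-measure (F1): harmonic mean of precision and recall. */
    static double fMeasure(int tp, int fp, int fn) {
        double p = precision(tp, fp);
        double r = recall(tp, fn);
        return (p + r) == 0.0 ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Assumed counts: 8 positive queries matched, 2 negative queries
        // wrongly matched, 2 positive queries missed.
        System.out.println(fMeasure(8, 2, 2));
    }
}
```

A tree that matches every positive query and no negative query has zero false positives and zero false negatives, so its F-measure is 1.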
- LSQ (The Linked SPARQL Queries Dataset) is used to extract the SPARQL features from the individual queries. The resulting files are in Turtle format and use the SPIN representation as well as the LSQ vocabulary.
- The code of this project extracts the SPARQL features and creates an ARFF file.
- Weka is utilized for data analysis using a decision tree.
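The shape of an ARFF file with one binary attribute per SPARQL feature can be sketched as follows. This is only an illustration under stated assumptions: the feature list, the attribute names, the relation name, and the class labels are hypothetical, and the naive keyword check stands in for the LSQ-based feature extraction that the project actually uses.

```java
import java.util.List;
import java.util.Locale;

/** Illustrative sketch (hypothetical, not the project's implementation). */
final class ArffSketch {

    // Assumed feature list; the project derives its features via LSQ instead.
    static final List<String> FEATURES = List.of("UNION", "OPTIONAL", "FILTER");

    /** Builds the ARFF header: relation name, one nominal attribute per feature, and the class. */
    static String header() {
        StringBuilder sb = new StringBuilder("@relation queries\n\n");
        for (String feature : FEATURES) {
            sb.append("@attribute ")
              .append(feature.toLowerCase(Locale.ROOT))
              .append(" {0,1}\n");
        }
        sb.append("@attribute class {positive,negative}\n\n@data\n");
        return sb.toString();
    }

    /** Builds one data row: 1 if the keyword occurs in the query text, else 0, then the class label. */
    static String dataRow(String query, boolean positive) {
        // Naive substring check, used here only as a stand-in for real feature extraction.
        String upper = query.toUpperCase(Locale.ROOT);
        StringBuilder sb = new StringBuilder();
        for (String feature : FEATURES) {
            sb.append(upper.contains(feature) ? '1' : '0').append(',');
        }
        return sb.append(positive ? "positive" : "negative").toString();
    }

    /** Combines the header with one row per positive and negative query. */
    static String toArff(List<String> positives, List<String> negatives) {
        StringBuilder sb = new StringBuilder(header());
        for (String query : positives) {
            sb.append(dataRow(query, true)).append('\n');
        }
        for (String query : negatives) {
            sb.append(dataRow(query, false)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical example queries.
        List<String> positives = List.of("SELECT * WHERE { { ?s ?p ?o } UNION { ?o ?p ?s } }");
        List<String> negatives = List.of("SELECT * WHERE { ?s ?p ?o }");
        System.out.print(toArff(positives, negatives));
    }
}
```

A file in this form, with the class as the last attribute, is the kind of input a decision tree learner such as J48 in Weka can be trained on.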
The positive and negative sets of SPARQL queries can be generated by SPAB, which uses triple store benchmark results.
- LSQ
  - Download the LSQ sources from GitHub
  - Run `mvn clean install`
- LsqSpinToArff
  - Clone the Git repository
  - Import the code as a Maven project
  - Run the Main class with the following input parameters:
    - File with positive queries
    - File with negative queries
    - Output directory
    - (Optional) LSQ Jar file
Some documentation is available in the wiki at https://github.com/dice-group/LsqSpinToArff/wiki
Data Science Group
University of Paderborn
Adrian Wilke