From d3051471d01715056590d6e5f6523764c97d5795 Mon Sep 17 00:00:00 2001 From: Gabriele Date: Wed, 6 Dec 2023 18:12:34 +0000 Subject: [PATCH] updated homepage with summary table of metrics --- docs/_config.yml | 2 +- docs/_includes/header.html | 2 +- docs/index.md | 1233 ++++++++++++++++- docs/quality_dimensions/accuracy.md | 12 +- docs/quality_dimensions/amount_of_data.md | 8 +- docs/quality_dimensions/believability.md | 8 +- docs/quality_dimensions/consistency.md | 8 +- docs/quality_dimensions/currency.md | 8 +- docs/quality_dimensions/interpretability.md | 47 +- .../representational_conciseness.md | 45 +- .../representational_consistency.md | 8 +- docs/quality_dimensions/security.md | 4 +- .../{volatility.md => timeliness.md} | 4 +- docs/quality_dimensions/understandability.md | 37 +- docs/quality_dimensions/verifiability.md | 19 +- docs/quality_dimensions/versatility.md | 14 +- 16 files changed, 1348 insertions(+), 111 deletions(-) rename docs/quality_dimensions/{volatility.md => timeliness.md} (74%) diff --git a/docs/_config.yml b/docs/_config.yml index e092d61..ddd5f04 100644 --- a/docs/_config.yml +++ b/docs/_config.yml @@ -1,4 +1,4 @@ -title: KGHeartbeat +title: KGHeartBeat author: name: Gabriele Tuozzo, Maria Angela Pellegrino email: gabrieletuozzo@gmail.com diff --git a/docs/_includes/header.html b/docs/_includes/header.html index 244d0d4..166d2aa 100644 --- a/docs/_includes/header.html +++ b/docs/_includes/header.html @@ -82,7 +82,7 @@ 3.3 Verifiability

4. Dataset dynamicity

4.1 Currency - 4.2 Volatility + 4.2 Timeliness

5. Contextual

5.1 Completeness 5.2 Amount of data diff --git a/docs/index.md b/docs/index.md index dbe01b3..d9c69b3 100644 --- a/docs/index.md +++ b/docs/index.md @@ -20,7 +20,7 @@ layout: home Accessibility of the SPARQL endpoint - https://bit.ly/4a4xRA6 + bit.ly/4a4xRA6 Checking whether the server responds to a SPARQL query @@ -32,16 +32,1241 @@ layout: home Availability/SPARQL-endpoint + + + Output + 0 + The SPARQL endpoint is offline. + + + 1 + The SPARQL endpoint is online. + + + -1 + The SPARQL endpoint is missing. + + + + Accessibility of the RDF dump + bit.ly/4a4xRA6 + Checking whether an RDF dump is provided and can be downloaded + + + Input + Metadata, (working) SPARQL endpoint, VoID file + + + Algorithm + + Availability/RDF-Dump + + + Output 0 - The SPARQL endpoint is offline. + The RDF dump is offline. 1 - The SPARQL endpoint is online. + The RDF dump is online. -1 - The SPARQL endpoint is missing. + The RDF dump is missing. + + + + Dereferenceability of the URI + bit.ly/3R8tYBA + HTTP URIs should be dereferenceable, i.e. HTTP clients should be able to retrieve the resources identified by the URI + + + Input + (working) SPARQL endpoint + + + Algorithm + + Availability/URIs-dereferenciability + + + + Output + [0,1] + Best value: 1.
+ + + + Licensing + + + Machine-readable license + bit.ly/4a4xRA6 + Detection of the indication of a license in the VoID description or in the dataset itself + + + Input + Metadata, VoID file, (working) SPARQL endpoint + + + Algorithm + + Licensing/MR-License + + + + Output + 0 + The license can't be recovered + + + 1 + The license can be recovered + + + + Human-readable license + bit.ly/4a4xRA6 + Detection of a license in the documentation of the dataset + + + Input + (working) SPARQL endpoint + + + Algorithm + + Licensing/Human-readable + + + + Output + 0 + The license can't be recovered + + + 1 + The license can be recovered + + + + Interlinking + + + Degree of connection + bit.ly/4a4xRA6 + (i) detection of (a) interlinking degree, (b) clustering coefficient, (c) centrality, (d) open sameAs chains and (e) description +richness through sameAs by using network measures + + + Input + Metadata + + + Algorithm + + Interlinking/Degr-Connection + + + + Output + N + Number of external links + + + + Clustering coefficient + bit.ly/4a4xRA6 + (i) detection of (a) interlinking degree, (b) clustering coefficient, (c) centrality, (d) open sameAs chains and (e) description +richness through sameAs by using network measures + + + Input + Metadata + + + Algorithm + + Interlinking/Clustering-coefficient + + + + Output + [0,1] + Best value: 1 + + + + Centrality + bit.ly/4a4xRA6 + (i) detection of (a) interlinking degree, (b) clustering coefficient, (c) centrality, (d) open sameAs chains and (e) description +richness through sameAs by using network measures + + + Input + Metadata + + + Algorithm + + Interlinking/Centrality + + + + Output + [0,1] + Best value: 1 + + + + + sameAs chains + bit.ly/4a4xRA6 + (i) detection of (a) interlinking degree, (b) clustering coefficient, (c) centrality, (d) open sameAs chains and (e) description +richness through sameAs by using network measures + + + Input + Metadata + + + Algorithm + + Interlinking/sameAs + + + + Output + [0,1] + Best
value: 1 + + + + Security + + + Access to data is secure + bit.ly/4a4xRA6 + Use of login credentials (or use of SSL or SSH) + + + Input + (working) SPARQL endpoint + + + Algorithm + + Security/Sec-acc + + + + Output + 0 + Authentication is required + + + 1 + No authentication required + + + + Access to data is secure + bit.ly/4a4xRA6 + HTTPS support + + + Input + (working) SPARQL endpoint + + + Algorithm + + Security/https + + + + Output + 0 + Does not use HTTPS + + + 1 + Uses HTTPS + + + + Performance + + + Low latency + bit.ly/4a4xRA6 + If an HTTP request is not answered within an average time of one second, the latency of the data source is considered too high + + + Input + (working) SPARQL endpoint + + + Algorithm + + Performance/Latency + + + + Output + [0,1] + 1 if latency is less than 1 second, otherwise 1000/avg latency + + + + High Throughput + bit.ly/4a4xRA6 + No. of answered HTTP requests per second + + + Input + (working) SPARQL endpoint + + + Algorithm + + Performance/Throughput + + + + Output + [0,1] + 1 if more than 5 requests otherwise num of fulfilled requests/200 + + + + Intrinsic + + + Accuracy + + + Empty annotation labels + bit.ly/41dNO2P + Labels, comments and notes which identify triples whose property’s object value is an empty string + + + Input + (working) SPARQL endpoint + + + Algorithm + + Accuracy/Empty-label + + + + Output + [0,1] + Best value: 1 + + + + White space in annotation + bit.ly/41dNO2P + Presence of white space in labels + + + Input + (working) SPARQL endpoint + + + Algorithm + + Accuracy/Whitespace + + + + Output + [0,1] + Best value: 1 + + + + Datatype consistency + bit.ly/41dNO2P + Incompatible with data type range + + + Input + (working) SPARQL endpoint + + + Algorithm + + Accuracy/Datatype + + + + Output + [0,1] + Best value: 1 + + + + Functional property violation + bit.ly/3Rts9QV + FP = 1 - num of triples with inconsistent values for functional properties / num of triples + + + Input + (working) SPARQL endpoint + + + Algorithm +
+ Accuracy/FP + + + + Output + [0,1] + Best value: 1 + + + + Inverse functional property violation + bit.ly/3Rts9QV + IFP = 1 - num of triples with inconsistent values for inverse functional properties / num of triples + + + Input + (working) SPARQL endpoint + + + Algorithm + + Accuracy/IFP + + + + Output + [0,1] + Best value: 1 + + + + Consistency + + + Entities as members of disjoint classes + bit.ly/4a4xRA6 + No. of entities described as members of disjoint classes / total no. of entities described in the dataset + + + Input + (working) SPARQL endpoint + + + Algorithm + + Consistency/Disjoint + + + + Output + [0,1] + Best value: 1 + + + Misplaced classes or properties + bit.ly/4a4xRA6 + Detection of a URI defined as a class being used as a property, or a URI defined as a property being used as a class + + + Input + (working) SPARQL endpoint + + + Algorithm + + Consistency/Misplaced + + + + Output + [0,1] + Best value: 1 + + + + Use of members of deprecated classes or properties + bit.ly/4a4xRA6 + Detection of the use of the OWL classes owl:DeprecatedClass and owl:DeprecatedProperty + + + Input + (working) SPARQL endpoint + + + Algorithm + + Consistency/Deprecated + + + + Output + [0,1] + Best value: 1 + + + + Invalid usage of undefined classes and properties + bit.ly/4a4xRA6 + Detection of classes and properties used without any formal definition + + + Input + (working) SPARQL endpoint + + + Algorithm + + Consistency/Undefined + + + + Output + [0,1] + Best value: 1 + + + + Ontology hijacking + bit.ly/4a4xRA6 + Detection of the redefinition by third parties of external classes/properties such that reasoning over data using those external terms is affected + + + Input + (working) SPARQL endpoint + + + Algorithm + + Consistency/Hijacking + + + + Output + [0,1] + Best value: 1 + + + + Conciseness + + + Intensional conciseness + bit.ly/4a4xRA6 + Number of unique attributes of a dataset in relation to the overall number of attributes in a target schema + + + Input + (working) SPARQL endpoint +
+ Algorithm + + Conciseness/Int-Conc + + + + Output + [0,1] + Best value: 1 + + + + Extensional conciseness + bit.ly/4a4xRA6 + Number of unique objects in relation to the overall number of object representations in the dataset + + + Input + (working) SPARQL endpoint + + + Algorithm + + Conciseness/Ext-Conc + + + + Output + [0,1] + Best value: 1 + + + + Trust + + + Reputation + + + Reputation of the dataset + bit.ly/41dNO2P + Analyzing the page rank of the dataset + + + Input + Metadata + + + Algorithm + + Reputation/PageRank + + + + Output + [0,1] + Best value: 1 + + + + Believability + + + Meta-information about the identity of information provider + bit.ly/41dNO2P + Checking whether the provider/contributor is contained in a list of trusted providers + + + Input + Metadata + + + Algorithm + + Believability/Meta-info + + + + Output + [0,1] + Best value: 1 + + + + Verifiability + + + Verifying publisher information + bit.ly/41dNO2P + Stating the author and contributors, the publisher of the data and its sources + + + Input + (working) SPARQL endpoint, VoID file + + + Algorithm + + Verifiability/Publisher-info + + + + Output + 0 + If no information is provided + + + 1 + If all information is provided + + + + Verifying authenticity of the dataset + bit.ly/41dNO2P + Whether the dataset uses a provenance vocabulary + + + Input + (working) SPARQL endpoint, VoID file + + + Algorithm + + Verifiability/Vocabs + + + + Output + [0,1] + Best value: 1 + + + + Verifying usage of digital signatures + bit.ly/41dNO2P + Signing a document containing an RDF serialisation or signing an RDF graph + + + Input + (working) SPARQL endpoint + + + Algorithm + + Verifiability/Publisher-info + + + + Output + 0 + If the signature is not present + + + 1 + If the signature is present + + + + Dataset dynamicity + + + Currency + + + Time since the last modification + bit.ly/3RtIeWV + Currency only measures the time since the last modification + + + Input + (working) SPARQL endpoint, VoID
file + + + Algorithm + + Currency/LastModification + + + + Output + 0 + if the modification date can't be retrieved + 1 + if the modification date is correctly retrieved + + + + Specification of the modification date of statements + bit.ly/3RtIeWV + Use of dates as the point in time of the last verification of a statement represented by dcterms:modified + + + Input + (working) SPARQL endpoint, VoID file + + + Algorithm + + Currency/Modification + + + + Output + 0 + if the modification date can't be retrieved + + + 1 + if the modification date is correctly retrieved + + + + Age of data + bit.ly/3RtIeWV + Current time - creation time + + + Input + (working) SPARQL endpoint, VoID file + + + Algorithm + + Currency/AgeOfData + + + + Output + 0 + if the creation date can't be recovered + + + 1 + if the creation date is correctly recovered + + + + Timeliness + + + Stating the recency and frequency of data validation + bit.ly/3RtIeWV + It corresponds to the "stating the [...] frequency of data validation" + + + Input + (working) SPARQL endpoint + + + Algorithm + + Timeliness/Frequency + + + + Output + 0 + if the frequency can't be retrieved + + + 1 + if the frequency is correctly retrieved + + + + Contextual + + + Completeness + + + Interlinking completeness + bit.ly/3RtIeWV + Degree to which interlinks are not missing + + + Input + Metadata + + + Algorithm + + Completeness/Interl + + + + Output + [0,1] + Best value: 1 + + + + Amount of data + + + Number of triples + bit.ly/3RtIeWV + Number of triples + + + Input + Metadata, (working) SPARQL endpoint + + + Algorithm + + AmountOfData/Triples + + + + Output + 0 + if the number of triples can't be retrieved + + + 1 + if the number of triples can be retrieved + + + + Level of detail + bit.ly/3RtIeWV + Number of properties + + + Input + VoID file, (working) SPARQL endpoint + + + Algorithm + + AmountOfData/Property + + + + Output + 0 + if the number of properties can't be retrieved + + + 1 + if the number of properties can be retrieved + + + + Scope +
bit.ly/3RtIeWV + Number of entities + + + Input + VoID file, (working) SPARQL endpoint + + + Algorithm + + AmountOfData/Entities + + + + Output + 0 + if the number of entities can't be retrieved + + + 1 + if the number of entities can be retrieved + + + + Representational + + + Representational-conciseness + + + Keeping URI short + bit.ly/3RtIeWV + Detection of long URIs or those that contain query parameters + + + Input + (working) SPARQL endpoint + + + Algorithm + + Rep-Conc/ShortUri + + + + Output + [0,1] + Best value: 1 + + + + Representational-consistency + + + Re-use of existing vocabularies + bit.ly/3RtIeWV + Usage of established vocabularies + + + Input + (working) SPARQL endpoint + + + Algorithm + + Rep-Cons/NewVocabs + + + + Output + 0 + if there are new vocabularies + + + 1 + if all used vocabularies are already defined + + + + Re-use of existing terms + bit.ly/3RtIeWV + Detection of whether existing terms from all relevant vocabularies for that particular domain have been reused + + + Input + (working) SPARQL endpoint + + + Algorithm + + Rep-Cons/NewTerms + + + + Output + 0 + if there are new terms + + + 1 + if all used terms are already defined + + + + Understandability + + + Human-readable labelling of classes, properties and entities by providing rdfs:label + bit.ly/3RtIeWV + No. of entities described by stating an rdfs:label or rdfs:comment in the dataset / total no.
of entities described in the data + + + Input + (working) SPARQL endpoint + + + Algorithm + + Under/Label + + + + Output + [0,1] + Best value: 1 + + + + Indication of metadata about a dataset + bit.ly/3RtIeWV + Checking for the presence of the title, content and URI of the dataset + + + Input + Metadata + + + Algorithm + + Under/info + + + + Output + 0 + if not all required information is present + + + 1 + if all required information is present + + + + Indication of an exemplary SPARQL query + bit.ly/3RtIeWV + Detecting whether examples of SPARQL queries are provided + + + Input + Metadata + + + Algorithm + + Under/example + + + + Output + 0 + if there is no example + + + 1 + if at least one example is present + + + + Indication of a regular expression that matches the URIs of a dataset + bit.ly/3RtIeWV + Detecting whether a regular expression that matches the +URIs is present + + + Input + VoID file, (working) SPARQL endpoint + + + Algorithm + + Under/regex + + + + Output + 0 + if the regex is not indicated + + + 1 + if the regex is indicated + + + + Indication of the vocabularies used in the dataset + bit.ly/3RtIeWV + Checking whether a list of vocabularies used in the dataset is provided + + + Input + (working) SPARQL endpoint + + + Algorithm + + Under/Vocabs + + + + Output + [0,1] + Best value: 1 + + + + Interpretability + + + No misinterpretation of missing values + bit.ly/3RtIeWV + Detecting the use of blank nodes + + + Input + (working) SPARQL endpoint + + + Algorithm + + Under/Bns + + + + Output + [0,1] + Best value: 1 + + + + Atypical use of collections, containers and reification + bit.ly/3RtIeWV + Detection of the non-standard usage of collections, containers and reification + + + Input + (working) SPARQL endpoint + + + Algorithm + + Under/Rdf + + + + Output + 0 + if RDF structures are used. + + + 1 + if RDF structures aren't used.
+ + + + Versatility + + + Usage of multiple languages + bit.ly/3RtIeWV + Checking whether data is available in different languages + + + Input + (working) SPARQL endpoint + + + Algorithm + + Versatility/Languages + + + + Output + 0 + if no language tag is retrieved + + + 1 + if at least one language tag is retrieved + + + + Different serialization formats + bit.ly/3RtIeWV + Checking whether data is available in different serialization formats + + + Input + (working) SPARQL endpoint + + + Algorithm + + Versatility/Serialization + + + + Output + 0 + if no serialization format is provided + + + 1 + if at least one serialization format is provided + + + + Accessing data in different ways + bit.ly/3RtIeWV + Checking whether the data is available as a SPARQL endpoint and is available for download as an RDF dump + + + Input + (working) SPARQL endpoint + + + Algorithm + + Versatility/Access + + + + Output + 0 + if the SPARQL endpoint or RDF dump is not online + + + 1 + if SPARQL endpoint and RDF dump are online. + \ No newline at end of file diff --git a/docs/quality_dimensions/accuracy.md b/docs/quality_dimensions/accuracy.md index 8261de8..5a01c00 100644 --- a/docs/quality_dimensions/accuracy.md +++ b/docs/quality_dimensions/accuracy.md @@ -3,13 +3,13 @@ title: Intrinsic category --- ## Accuracy -1. [Empty label](#empty-label) -2. [Whitespace at the beginnig or end of the label](#whitespace-at-the-beginnig-or-end-of-the-label) -3. [Wrong datatype](#wrong-datatype) +1. [Empty annotation labels](#empty-annotation-labels) +2. [White space in annotation](#white-space-in-annotation) +3. [Datatype consistency](#datatype-consistency) 4. [Functional property violation](#functional-property-violation) 5.
[Inverse functional property violation](#inverse-functional-property-violation) -#### **Empty label** +#### **Empty annotation labels** For the calculation of this metric, we first recover the labels in the KG with the following query: ```sql PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#> @@ -66,7 +66,7 @@ $$ --- -#### **Whitespace at the beginnig or end of the label** +#### **White space in annotation** Always using the query to retrieve all the labels on the triples (which we saw [here](#empty-annotation-labels)), but this time, scrolling through the labels, we apply the strip() function to each of them. Then the string obtained is compared with the one before applying the function: if they are the same, the label did not present the problem of spaces; otherwise a $wSP$ counter is incremented. At the end of the process, the following formula is applied to obtain the value of the metric, where $L_{KG}$ is the number of KG labels. $$ @@ -75,7 +75,7 @@ $$ --- -#### **Wrong datatype** +#### **Datatype consistency** In this case we used the W3C documentation available [here](https://www.w3.org/TR/xmlschema11-2/). From this document, in addition to the data types, for each of them a regex is also indicated which defines the range of values it can take on. In our application a hash table was therefore created, where each entry is made up of a key, which is one of the data types, and a value, which is the corresponding regex that determines the domain. At this point we just have to retrieve all triples from the KG and filter out those that contain a literal to perform the type checking (the check can also be done directly with a query on the SPARQL endpoint, but this often leads to overloading and the query might fail). The value calculation mechanism is given by the following pseudo code.
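The pseudo code itself is not included in this hunk; as a rough, self-contained Python sketch of the mechanism just described (the `XSD_REGEX` table below is a hypothetical three-entry excerpt of the full datatype-to-regex hash table, and `datatype_consistency` is an illustrative name, not necessarily the tool's actual function):

```python
import re

# Hypothetical excerpt of the datatype -> regex hash table described above;
# patterns follow the lexical spaces in W3C XML Schema 1.1 Part 2.
XSD_REGEX = {
    "http://www.w3.org/2001/XMLSchema#integer": re.compile(r"[+-]?[0-9]+"),
    "http://www.w3.org/2001/XMLSchema#boolean": re.compile(r"true|false|0|1"),
    "http://www.w3.org/2001/XMLSchema#date":
        re.compile(r"-?[0-9]{4}-[0-9]{2}-[0-9]{2}(Z|[+-][0-9]{2}:[0-9]{2})?"),
}

def datatype_consistency(literals):
    """Share of typed literals whose lexical form matches the regex of
    their declared datatype (best value: 1).

    `literals` is a list of (lexical_form, datatype_iri) pairs, e.g.
    extracted from the triples retrieved from the SPARQL endpoint.
    """
    checked = wrong = 0
    for lexical, dtype in literals:
        pattern = XSD_REGEX.get(dtype)
        if pattern is None:   # datatype not in the table: cannot judge it
            continue
        checked += 1
        if pattern.fullmatch(lexical) is None:
            wrong += 1
    return 1.0 if checked == 0 else 1 - wrong / checked

sample = [
    ("42", "http://www.w3.org/2001/XMLSchema#integer"),
    ("forty-two", "http://www.w3.org/2001/XMLSchema#integer"),  # ill-typed
    ("true", "http://www.w3.org/2001/XMLSchema#boolean"),
    ("2023-12-06", "http://www.w3.org/2001/XMLSchema#date"),
]
print(datatype_consistency(sample))  # 0.75
```

The quotient mirrors the "Best value: 1" convention of the other accuracy metrics; literals with an unknown datatype are skipped rather than counted as wrong.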
diff --git a/docs/quality_dimensions/amount_of_data.md b/docs/quality_dimensions/amount_of_data.md index 730dd8f..49a4481 100644 --- a/docs/quality_dimensions/amount_of_data.md +++ b/docs/quality_dimensions/amount_of_data.md @@ -4,8 +4,8 @@ title: Contextual category ## Amount of data 1. [Number of triples](#number-of-triples) -2. [Number of properties](#number-of-properties) -3. [Number of entities](#number-of-entities) +2. [Level of detail](#level-of-detail) +3. [Scope](#scope) #### **Number of triples** To calculate the number of triples in the KG we can proceed in two ways. The first consists in recovering the data through the metadata, in particular the *triples* key. This method is only applied when actual triples cannot be counted by accessing the SPARQL endpoint. Because the metadata is not updated along with the content of the KG. The following query is used for count the number of triples: @@ -16,7 +16,7 @@ WHERE { ?s ?p ?o } To quantize the metric, if we can count the number of triples in the KG, we assign 1 to the metric, 0 otherwise. -#### **Number of properties** +#### **Level of detail** We can only obtain this type of value by executing a SPARQL query. In particular, the number of properties is given to us by this query: ```sql @@ -44,7 +44,7 @@ UNION ``` To quantize the metric, if we can count the number of properties in the KG, we assign 1 to the metric, 0 otherwise. -#### **Number of entities** +#### **Scope** In this case we simply recover it by searching for the triple with $void:entities$ predicate inside the VoID file. As an alternative if there isn't a VoID file available, we execute the following query on the SPARQL endpoint. ```sql diff --git a/docs/quality_dimensions/believability.md b/docs/quality_dimensions/believability.md index 4bf79d6..d08cc65 100644 --- a/docs/quality_dimensions/believability.md +++ b/docs/quality_dimensions/believability.md @@ -4,11 +4,9 @@ title: Trust category ## Believability -1. [Title](#title) -2. 
[Description](#description) -3. [Sources](#sources) -4. [Reliable provider](#reliable-provider) -5. [Trust value](#trust-value) +1. [Meta-information about the identity of information provider](#meta-information-about-the-identity-of-information-provider) + +### Meta-information about the identity of information provider #### **Title** To recover the title, we simply analyze the KG metadata, in diff --git a/docs/quality_dimensions/consistency.md b/docs/quality_dimensions/consistency.md index 8f77b71..f2c8b65 100644 --- a/docs/quality_dimensions/consistency.md +++ b/docs/quality_dimensions/consistency.md @@ -3,17 +3,17 @@ title: Intrinsic category --- ## Consistency -1. [Number of entities defined as member of disjoint class](#number-of-entities-defined-as-member-of-disjoint-class) +1. [Entities as members of disjoint classes](#entities-as-members-of-disjoint-classes) 2. [Misplaced classes](#misplaced-class) 3. [Misplaced properties](#misplaced-properties) -4. [Deprecated classe and properties](#deprecated-classes-and-deprecated-properties) +4. [Use of members of deprecated classes or properties](#use-of-members-of-deprecated-classes-or-properties) 5. [Undefined classes](#undefined-classes) 6. [Undefined properties](#undefined-properties) 7. [Ontology Hijacking](#ontology-hijacking) ### **Consistency** -#### **Number of entities defined as member of disjoint class** +#### **Entities as members of disjoint classes** For the calculation of this metric we first execute the following query on the SPARQL endpoint to recover all the triples with the $owl:disjointWith$ predicate: ```sql PREFIX owl: <http://www.w3.org/2002/07/owl#> @@ -91,7 +91,7 @@ $$ --- -#### **Deprecated classes and deprecated properties** +#### **Use of members of deprecated classes or properties** For the calculation of this metric, we execute the following query that counts the number of triples in the KG with the predicates ```owl:DeprecatedClass``` and ```owl:DeprecatedProperty```.
Note that we rely on the fact that whoever created the dataset knows that that class or property is deprecated, but uses it by reporting it. ```sql diff --git a/docs/quality_dimensions/currency.md b/docs/quality_dimensions/currency.md index e1f254b..90a2009 100644 --- a/docs/quality_dimensions/currency.md +++ b/docs/quality_dimensions/currency.md @@ -3,12 +3,12 @@ title: Dataset dynamicity category --- ## Currency -1. [Creation date](#creation-date) -2. [Modification date](#modification-date) +1. [Age of data](#age-of-data) +2. [Specification of the modification date of statements](#specification-of-the-modification-date-of-statements) 3. [Time since last modification](#time-since-last-modification) 4. [History of changes made](#history-of-changes-made) -#### **Creation date** +#### **Age of data** The value regarding the KG creation date can be obtained from the VoID file or by executing a query on the SPARQL endpoint. In the VoID file we look for a triple having $dcterms:created$ as predicate. Instead the query for the endpoint should be of the type: ```sql @@ -25,7 +25,7 @@ with that predicate. To quantize the metric, if the creation date is indicated, --- -#### **Modification date** +#### **Specification of the modification date of statements** This value can also be obtained either from the file VoID or by executing a query. In the VoID file we look for the triple with predicate $dcterms:modified$, while on the SPARQL endpoint we execute the following query: diff --git a/docs/quality_dimensions/interpretability.md b/docs/quality_dimensions/interpretability.md index 8cb7ed5..0e67485 100644 --- a/docs/quality_dimensions/interpretability.md +++ b/docs/quality_dimensions/interpretability.md @@ -3,10 +3,10 @@ title: Accessibility category --- ## Interpretability -1. [Number of blank nodes](#number-of-blank-nodes) -2. [Use of RDF structures](#rdf-structures) +1. [No misinterpretation of missing values](#no-misinterpretation-of-missing-values) +2. 
[Atypical use of collections, containers and reification](#atypical-use-of-collections-containers-and-reification) -#### **Number of blank nodes** +#### **No misinterpretation of missing values** To count the number of blank nodes we use the following query: ```sql SELECT (COUNT(?bnode) AS ?triples) @@ -24,9 +24,44 @@ FILTER(?p NOT IN (rdf:type)) Then we use the following formula where $numBN$ is the output of the first query and $numDlc$ is the output of the second query: $$ -m_{BN} = \frac{numBN}{numDlc} +m_{BN} = 1 - \frac{numBN}{numDlc} $$ --- -#### **RDF structures** -For the calculation of this metric we use the same method described [here](./representational_conciseness#use-of-rdf-structures). +#### **Atypical use of collections, containers and reification** +In this case we check that there are no RDF data structures, since their use is discouraged by the W3C. To check their use we use the following query: +```sql +PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> +PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> +SELECT (COUNT(?s) AS ?triples) +WHERE{ +{?s rdf:type rdf:List } +UNION +{?s rdf:type rdf:Statement} +UNION +{?s rdf:type rdf:Alt} +UNION +{?s rdf:type rdf:Bag} +UNION +{?s rdf:type rdf:Seq} +UNION +{?s rdf:type rdf:Container} +UNION +{?s rdf:subject ?o} +UNION +{?s rdf:predicate ?o} +UNION +{?s rdf:object ?o} +UNION +{?s rdfs:member ?o} +UNION +{?s rdf:first ?o} +UNION +{?s rdf:rest ?o} +UNION +{?s rdf:_'[0-9]+'} +} +``` +If the query returns even just one result, it means we have +a triple that declares the use of RDF structures and therefore +we attribute a value of 0 to the metric (i.e. they are used), 1 otherwise.
[Use of RDF structures](#rdf-structures) +1. [Keeping URI short](#keeping-uri-short) -#### **URIs Length** +#### **Keeping URI short** For the calculation of this metric we need all the KG triples. For recover it, we use the following query: ```sql @@ -27,42 +26,4 @@ $$ m_repCons = \frac{urlApproved}{dlc} $$ ---- - -#### **Use of RDF structures** -In this case we check that there are no RDF data structures, in fact their use comes discouraged by W3C. To check their use we use the following query: -```sql -PREFIX rdf: -PREFIX rdfs: -SELECT (COUNT(?s) AS ?triples) -WHERE{ -{?s rdf:type rdf:List } -UNION -{?s rdf:type rdf:Statement} -UNION -{?s rdf:type rdf:Alt} -UNION -{?s rdf:type rdf:Bag} -UNION -{?s rdf:type rdf:Seq} -UNION -{?s rdf:type rdf:Container} -UNION -{?s rdf:subject ?o} -UNION -{?s rdf:predicate ?o} -UNION -{?s rdf:object ?o} -UNION -{?s rdfs:member ?o} -UNION -{?s rdf:first ?o} -UNION -{?s rdf:rest ?o} -UNION -{?s rdf:_’[0-9]+’} -} -``` -If the query returns even just one result, it means we have -a triple that declares the use of RDF structures and therefore -we attribute a value of 1 to the data (i.e. they are used), 0 otherwise. \ No newline at end of file +--- \ No newline at end of file diff --git a/docs/quality_dimensions/representational_consistency.md b/docs/quality_dimensions/representational_consistency.md index d3417c5..3ebb171 100644 --- a/docs/quality_dimensions/representational_consistency.md +++ b/docs/quality_dimensions/representational_consistency.md @@ -3,14 +3,14 @@ title: Representational category --- ## Representational-consistency -1. [Reuse of vocabularies](#reuse-of-vocabularies) -2. [Reuse of terms](#reuse-of-terms) +1. [Re-use of existing vocabularies](#re-use-of-existing-vocabularies) +2. [Re-use of existing terms](#re-use-of-existing-terms) -#### **Reuse of vocabularies** +#### **Re-use of existing vocabularies** For the calculation of this metric we need the vocabularies used in the KG. 
To recover this information we have used the same method that we have seen [here](./verifiability#vocabularies). Then, thanks to the REST API of the Linked Open Vocabularies (LOV), we check if the vocabularies are standard (i.e. are in the LOV). We assign 1 to the metric if no new vocabularies are defined, otherwise 0. Furthermore, we also keep track of the new vocabularies used. -#### **Reuse of terms** +#### **Re-use of existing terms** In this case we need all the terms used in the KG. We can get this information by using the following query: ```sql diff --git a/docs/quality_dimensions/security.md b/docs/quality_dimensions/security.md index d3e87e7..e4b35bb 100644 --- a/docs/quality_dimensions/security.md +++ b/docs/quality_dimensions/security.md @@ -4,13 +4,13 @@ title: Accessibility category For the calculation of the following two metrics we will need the SPARQL endpoint to be present and active, as described [here](./availability#sparql-endpoint). ## Security -1. [Authentication](#authentication) +1. [Access to data is secure](#access-to-data-is-secure) 2. [Use HTTPS](#use-https) ### **Security** -#### **Authentication** +#### **Access to data is secure** To check this metric we use the same query used to test the availability of the SPARQL endpoint (see [here](./availability#sparql-endpoint)), but in this case we check if the status code 401 is returned to us. To quantize the metric, if 401 is returned, we assign 0 to this metric, 1 otherwise. --- diff --git a/docs/quality_dimensions/volatility.md b/docs/quality_dimensions/timeliness.md similarity index 74% rename from docs/quality_dimensions/volatility.md rename to docs/quality_dimensions/timeliness.md index 25dd39a..7b78a9d 100644 --- a/docs/quality_dimensions/volatility.md +++ b/docs/quality_dimensions/timeliness.md @@ -2,9 +2,9 @@ title: Dataset dynamicity category --- ## Volatility -1. [Update frequency](#update-frequency) +1.
[Stating the recency and frequency of data validation](#stating-the-recency-and-frequency-of-data-validation) -#### **Update frequency** +#### **Stating the recency and frequency of data validation** For the calculation of this metric we use this query: ```sql PREFIX dcterms: <http://purl.org/dc/terms/> diff --git a/docs/quality_dimensions/understandability.md b/docs/quality_dimensions/understandability.md index c5315bc..2f44deb 100644 --- a/docs/quality_dimensions/understandability.md +++ b/docs/quality_dimensions/understandability.md @@ -3,12 +3,13 @@ title: Representational category --- ## Understandability -1. [Number of label](#number-of-labels) -2. [Presence of example](#presence-of-examples) -3. [URIs regex](#uris-regex) -4. [Vocabularies](#vocabularies-1) +1. [Human-readable labelling of classes, properties and entities by providing rdfs:label](#human-readable-labelling-of-classes-properties-and-entities-by-providing-rdfslabel) +2. [Indication of metadata about a dataset](#indication-of-metadata-about-a-dataset) +3. [Indication of an exemplary SPARQL query](#indication-of-an-exemplary-sparql-query) +4. [Indication of a regular expression that matches the URIs of a dataset](#indication-of-a-regular-expression-that-matches-the-uris-of-a-dataset) +5. [Indication of the vocabularies used in the dataset](#indication-of-the-vocabularies-used-in-the-dataset) -#### **Number of labels** +#### **Human-readable labelling of classes, properties and entities by providing rdfs:label** For the calculation of this metric we execute the following query on the SPARQL endpoint: ```sql PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#> @@ -64,11 +65,29 @@ $$ m_{label} = \frac{numLabel}{|T_{KG}|} * 100 $$ -#### **Presence of examples** +#### **Indication of an exemplary SPARQL query** We check whether the KG resources provided include some examples of SPARQL queries or other examples of how to use the KG.
 To obtain this type of data we simply need to analyze the "resources" field within the metadata and search for resources that have the *example* tag. The metric is quantized by assigning 1 if there are examples, 0 otherwise.
 
-#### **URIs regex**
+#### **Indication of a regular expression that matches the URIs of a dataset**
 To obtain the URIs regex we follow the same steps that we have illustrated [here](./amount_of_data#number-of-entities). To quantize the metric, if a regex is indicated, we assign 1 to this metric, 0 otherwise.
 
-#### **Vocabularies**
-For the calculation of this metric and quantization we use the same method illustred [here](./verifiability#vocabularies).
\ No newline at end of file
+#### **Indication of the vocabularies used in the dataset**
+For the calculation of this metric and quantization we use the same method illustrated [here](./verifiability#vocabularies).
+
+
+### **Indication of metadata about a dataset**
+
+#### **Title**
+To recover the title, we simply analyze the KG metadata, in particular the "title" field.
+
+---
+
+#### **Description**
+The description, as with the title, can be recovered from the metadata and is present in the "Description" field.
+
+---
+
+#### **Sources**
+By KG source we mean all relevant information about the provider. It is a field present within the metadata and is structured as a list of values containing the web address, name, and email of the provider. The field has the key "sources".
diff --git a/docs/quality_dimensions/verifiability.md b/docs/quality_dimensions/verifiability.md
index e827c2f..6707d74 100644
--- a/docs/quality_dimensions/verifiability.md
+++ b/docs/quality_dimensions/verifiability.md
@@ -1,16 +1,13 @@
 ---
 title: Trust category
 ---
 
 ## Verifiability
-1. [Vocabularies](#vocabularies)
-2. [Authors](#authors)
-3. [Contributors](#contributors)
-4. [Publishers](#publishers)
-5. [Sources](#sources)
-6. [Signature](#signature)
-
-#### **Vocabularies**
+1. [Verifying authenticity of the dataset](#verifying-authenticity-of-the-dataset)
+2. [Verifying publisher information](#verifying-publisher-information)
+3. [Verifying usage of digital signatures](#verifying-usage-of-digital-signatures)
+
+#### **Verifying authenticity of the dataset**
 To recover the vocabularies used in the KG we can use two different approaches. The first is to parse the VoID file, if available, searching for triples with the $void:vocabulary$ predicate. The second is to run the following query on the SPARQL endpoint.
 ```sql
@@ -26,6 +23,8 @@ $$
 
 ---
 
+### Verifying publisher information
+
 #### **Authors**
 The authors can also be recovered via the VoID file or the SPARQL endpoint. In the VoID file we search for triples whose predicate equals $dcterms:creator$. As an alternative, we execute the following query on the SPARQL endpoint.
@@ -76,7 +75,7 @@ To quantize this metric, we assign 1 if sources is indicated, 0 otherwise.
 
 ---
 
-### **Signature**
+### **Verifying usage of digital signatures**
 To check and retrieve the signature of the KG, if present, the following query is executed:
 ```sql
diff --git a/docs/quality_dimensions/versatility.md b/docs/quality_dimensions/versatility.md
index 960f818..5cd2cab 100644
--- a/docs/quality_dimensions/versatility.md
+++ b/docs/quality_dimensions/versatility.md
@@ -3,12 +3,12 @@ title: Representational category
 ---
 
 ## Versatility
-1. [Languages](#languages)
-2. [Serialization formats](#serialization-formats)
-3. [Access to the KG](#access-to-the-kg)
+1. [Usage of multiple languages](#usage-of-multiple-languages)
+2. [Different serialization formats](#different-serialization-formats)
+3. [Accessing of data in different ways](#accessing-of-data-in-different-ways)
 
-#### **Languages**
+#### **Usage of multiple languages**
 To check whether different languages are supported (and this is indicated), we use the following query:
 ```sql
@@ -22,7 +22,7 @@ To quantize this metric, we assign 1 if we have indication about the languages u
 
 ---
 
-#### **Serialization formats**
+#### **Different serialization formats**
 We calculate this metric by using two different methods: the first is to look for triples with the *void:feature* predicate within the VoID file; the second involves executing the following query on the SPARQL endpoint:
 ```sql
@@ -38,5 +38,5 @@ UNION
 To quantize this metric, we assign 1 if we have an indication about the serialization formats available, 0 otherwise.
 
 ---
-#### **Access to the KG**
-In this metric we insert the available links to access to the KG, only if this links are online. The metric is then quantized by giving value 1 in the case there is at least one working access method, 0 otherwise.
\ No newline at end of file
+#### **Accessing of data in different ways**
+In this metric we collect the available links to access the KG, only if these links are online. The metric is then quantized by assigning 1 when we can access the KG both through the SPARQL endpoint and by downloading the RDF dump, 0 otherwise.
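The quantization rules described in the security and versatility hunks above can be expressed as pure functions. A minimal sketch with hypothetical names, assuming the HTTP status code and the online/offline flags are obtained elsewhere (when probing the SPARQL endpoint and the RDF dump link); KGHeartBeat's actual implementation may differ.

```python
def secure_access_score(status_code: int) -> int:
    """Security metric: 0 if the endpoint answers 401 (authentication
    required), 1 otherwise (hypothetical helper mirroring the rule above)."""
    return 0 if status_code == 401 else 1

def access_methods_score(sparql_online: bool, dump_online: bool) -> int:
    """Versatility metric: 1 only when the KG is reachable both through the
    SPARQL endpoint and by downloading the RDF dump, 0 otherwise."""
    return 1 if sparql_online and dump_online else 0

print(secure_access_score(401))           # 0
print(access_methods_score(True, False))  # 0
```

Keeping the scoring separate from the network probing makes each rule easy to check in isolation.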