fix: typos and some links to the algorithm descriptions

isislab-unisa · Dec 6, 2023 · 733c05a
1 parent d305147

Showing 8 changed files with 34 additions and 56 deletions.
diff --git a/docs/index.md b/docs/index.md
@@ -30,7 +30,7 @@ layout: home
         <tr>
             <td rowspan="1">Algorithm</td>
             <td colspan="3">
-            <a href="https://isislab-unisa.github.io/KGHeartbeat/quality_dimensions/availability#sparql-endpoint">Availability/SPARQL-endpoint</a>
+            <a href="https://isislab-unisa.github.io/KGHeartbeat/quality_dimensions/availability#accessibility-of-the-sparql-endpoint">Availability/SPARQL-endpoint</a>
             </td>
         </tr>
         <tr>
@@ -59,7 +59,7 @@ layout: home
         <tr>
             <td rowspan="1">Algorithm</td>
             <td colspan="3">
-            <a href="https://isislab-unisa.github.io/KGHeartbeat/quality_dimensions/availability#rdf-dump">Availability/RDF-Dump</a>
+            <a href="https://isislab-unisa.github.io/KGHeartbeat/quality_dimensions/availability#accessibility-of-the-rdf-dump">Availability/RDF-Dump</a>
             </td>
         </tr>
         <tr>
@@ -88,7 +88,7 @@ layout: home
         <tr>
             <td rowspan="1">Algorithm</td>
             <td colspan="3">
-            <a href="https://isislab-unisa.github.io/KGHeartbeat/quality_dimensions/availability#uris-dereferenciability">Availability/URIs-dereferenciability</a>
+            <a href="https://isislab-unisa.github.io/KGHeartbeat/quality_dimensions/availability#derefereaceability-of-the-uri">Availability/URIs-dereferenciability</a>
             </td>
         </tr>
         <tr>
@@ -233,7 +233,7 @@ richness through sameAs by using network measures</i></td>
         <tr>
             <td rowspan="1">Algorithm</td>
             <td colspan="3">
-            <a href="https://isislab-unisa.github.io/KGHeartbeat/quality_dimensions/interlinking#number-of-same-as-chains">Interlinking/sameAs</a>
+            <a href="https://isislab-unisa.github.io/KGHeartbeat/quality_dimensions/interlinking#sameas-chains">Interlinking/sameAs</a>
             </td>
         </tr>
         <tr>
@@ -310,7 +310,7 @@ richness through sameAs by using network measures</i></td>
         <tr>
             <td rowspan="1">Algorithm</td>
             <td colspan="3">
-            <a href="https://isislab-unisa.github.io/KGHeartbeat/quality_dimensions/performance">Performance/Latency</a>
+            <a href="https://isislab-unisa.github.io/KGHeartbeat/quality_dimensions/performance#low-latency">Performance/Latency</a>
             </td>
         </tr>
         <tr>
@@ -331,7 +331,7 @@ richness through sameAs by using network measures</i></td>
         <tr>
             <td rowspan="1">Algorithm</td>
             <td colspan="3">
-            <a href="https://isislab-unisa.github.io/KGHeartbeat/quality_dimensions/performance#throughput">Performance/Throughput</a>
+            <a href="https://isislab-unisa.github.io/KGHeartbeat/quality_dimensions/performance#high-throughput">Performance/Throughput</a>
             </td>
         </tr>
         <tr>
@@ -622,7 +622,7 @@ richness through sameAs by using network measures</i></td>
         <tr>
             <td rowspan="1">Algorithm</td>
             <td colspan="3">
-            <a href="https://isislab-unisa.github.io/KGHeartbeat/quality_dimensions/reputation#pagerank">Reputation/PageRank</a>
+            <a href="https://isislab-unisa.github.io/KGHeartbeat/quality_dimensions/reputation#reputation-of-the-dataset">Reputation/PageRank</a>
             </td>
         </tr>
         <tr>
@@ -650,9 +650,13 @@ richness through sameAs by using network measures</i></td>
             </td>
         </tr>
         <tr>
-            <td>Output</td>
-            <td>[0,1]</td>
-            <td>Best value: 1</td>
+            <td rowspan="2">Output</td>
+            <td>0</td>
+            <td>if the provider isn't in the list of trusted providers</td>
+        </tr>
+        <tr>
+            <td>1</td>
+            <td>if the provider is in the list of trusted providers</td>
         </tr>
         <tr><tr><tr><tr></tr></tr></tr></tr>
                  <tr>
@@ -1027,7 +1031,7 @@ richness through sameAs by using network measures</i></td>
             <th colspan="5" style="text-align: center;">Understandability</th>
         </tr>
         <tr>
-            <td rowspan="8">human-readable labelling of classes, properties and entities by providing rdfs:label</td>
+            <td rowspan="8">Human-readable labelling of classes, properties and entities by providing rdfs:label</td>
             <td rowspan="8"><a href="https://bit.ly/3RtIeWV">bit.ly/3RtIeWV</a></td>
             <td colspan="4"><i>no. of entities described by stating an rdfs:label or rdfs:comment in the dataset / total no. of entities described in the data</i></td>
         </tr>
@@ -1098,7 +1102,7 @@ richness through sameAs by using network measures</i></td>
         </tr>
  <tr><tr><tr><tr></tr></tr></tr></tr>
         <tr>
-            <td rowspan="8">indication of a regular expression that matches the URIs of a dataset</td>
+            <td rowspan="8">Indication of a regular expression that matches the URIs of a dataset</td>
             <td rowspan="8"><a href="https://bit.ly/3RtIeWV">bit.ly/3RtIeWV</a></td>
             <td colspan="4"><i>detecting whether a regular expression that matches the
 URIs is present </i></td>
@@ -1124,7 +1128,7 @@ URIs is present </i></td>
         </tr>
  <tr><tr><tr><tr></tr></tr></tr></tr>
         <tr>
-            <td rowspan="8">indication of the vocabularies used in the dataset</td>
+            <td rowspan="8">Indication of the vocabularies used in the dataset</td>
             <td rowspan="8"><a href="https://bit.ly/3RtIeWV">bit.ly/3RtIeWV</a></td>
             <td colspan="4"><i>checking whether a list of vocabularies used in the dataset is provided</i></td>
         </tr>
@@ -1148,7 +1152,7 @@ URIs is present </i></td>
             <th colspan="5" style="text-align: center;">Interpretability</th>
         </tr>
         <tr>
-            <td rowspan="8">no misinterpretation of missing values</td>
+            <td rowspan="8">No misinterpretation of missing values</td>
             <td rowspan="8"><a href="https://bit.ly/3RtIeWV">bit.ly/3RtIeWV</a></td>
             <td colspan="4"><i>detecting the use of blank nodes</i></td>
         </tr>
@@ -1169,7 +1173,7 @@ URIs is present </i></td>
         </tr>
  <tr><tr><tr><tr></tr></tr></tr></tr>
         <tr>
-            <td rowspan="8">atypical use of collections containers and reification</td>
+            <td rowspan="8">Atypical use of collections containers and reification</td>
             <td rowspan="8"><a href="https://bit.ly/3RtIeWV">bit.ly/3RtIeWV</a></td>
             <td colspan="4"><i>detection of the non-standard usage of collections, containers and reification</i></td>
         </tr>

diff --git a/docs/quality_dimensions/availability.md b/docs/quality_dimensions/availability.md
@@ -3,12 +3,11 @@ title: Accessibility category
 ---
 
 ## Availability
-1. [SPARQL endpoint](#sparql-endpoint)
-2. [RDF Dump](#rdf-dump)
-3. [URIs dereferenciability](#uris-dereferenciability)
-4. [Inactive links](#inactive-links)
+1. [Accessibility of the SPARQL endpoint](#accessibility-of-the-sparql-endpoint)
+2. [Accessibility of the RDF dump](#accessibility-of-the-rdf-dump)
+3. [Derefereaceability of the URI](#derefereaceability-of-the-uri)
 
-#### **SPARQL endpoint**
+#### **Accessibility of the SPARQL endpoint**
 First of we need to check that it is present
 for the KG we are considering. The SPARQL endpoint link can be recovered in three different ways:
 1. The first (easiest) is to analyze the metadata and search for the resource with the tag in the resources field api/sparql or whose key is sparql.
@@ -32,7 +31,7 @@ offline and given value 0.
 
 ---
 
-#### **RDF dump**
+#### **Accessibility of the RDF dump**
 To check for the presence of the RDF dump we have three possible approaches:
 1. We can analyze the metadata and check if in the resources field there are one or more resources with one of the following tags: ```application/rdf+xml```, ```text/turtle```, ```application/x-ntriples```, ```application/x-nquads```, ```text/n3```, ```rdf```,```text/rdf+n3```, ```rdf/turtle```.
 2. Another method is to check inside the VoID file (if available). In this case we search for the triple having ```void:dataDump``` as its predicate.
@@ -49,7 +48,7 @@ Once the dump link has been retrieved, a simple HEAD request is made on the URL,
 
 ---
 
-#### **URIs dereferenciability**
+#### **Derefereaceability of the URI**
 5000 triples (which contain URIs) are randomly retrieved with this query:
 
 ```sql
@@ -74,6 +73,3 @@ m_{def} = \frac{|Dereferencable(U_g)|}{|U_g|}
 $$
 
 ---
-
-#### **Inactive links**
-All links present in the "resources" field in the metadata are recovered for the KG selected and a HEAD request is performed on each of this links. If there are links that are not active, the data is given a value of 0, otherwise 1.
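The availability checks described in this file can be sketched in Python. This is a minimal sketch and not KGHeartbeat's code. It makes three assumptions. First, each metadata resource is taken to be a dict with hypothetical `format` and `url` keys. Second, the dump format tags are the ones listed in the "Accessibility of the RDF dump" section. Third, $m_{def}$ is computed from any URI checker passed in. `head_ok` is a hypothetical helper that treats a HEAD response with status below 400 as success; whether the tool applies exactly this rule is also an assumption.

```python
import urllib.request
from typing import Callable, Iterable, Mapping, Sequence

# Dump format tags listed in the "Accessibility of the RDF dump" section.
DUMP_FORMATS = {
    "application/rdf+xml", "text/turtle", "application/x-ntriples",
    "application/x-nquads", "text/n3", "rdf", "text/rdf+n3", "rdf/turtle",
}


def find_dump_links(resources: Iterable[Mapping[str, str]]) -> list[str]:
    """URLs of metadata resources tagged with an RDF dump format.

    Hypothetical resource shape: {"format": ..., "url": ...}.
    """
    return [
        r["url"]
        for r in resources
        if r.get("url") and r.get("format", "").strip().lower() in DUMP_FORMATS
    ]


def head_ok(url: str, timeout: float = 10.0) -> bool:
    """Hypothetical helper: True if a HEAD request answers with a status below 400."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # HTTPError (4xx/5xx), network failures and timeouts are OSError
        # subclasses; a malformed URL raises ValueError.
        return False


def dereferenceability(uris: Sequence[str],
                       is_dereferenceable: Callable[[str], bool]) -> float:
    """m_def = |Dereferencable(U_g)| / |U_g| over the sampled URIs (0.0 if none)."""
    if not uris:
        return 0.0
    return sum(1 for u in uris if is_dereferenceable(u)) / len(uris)
```

Keeping the checker as a parameter separates the ratio from the network rule. For example, `dereferenceability(sampled_uris, head_ok)` would score a sample. A stricter checker, such as one that also inspects the returned `Content-Type`, could replace `head_ok` without changing the formula.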
diff --git a/docs/quality_dimensions/believability.md b/docs/quality_dimensions/believability.md
@@ -8,23 +8,6 @@ title: Trust category
 
 ### Meta-information about the identity of information provider
 
-#### **Title**
-To recover the title, we simply analyze the KG metadata, in
-in particular the “title” field.
-
----
-
-#### **Description**
-The description however, as with the title, can be recovered
-from the metadata and is present in the “Description” field.
-
----
-
-#### **Sources**
-By KG source we mean all relevant information from the provider. It is a field present within the metadata and is structured as a list of values containing: the web address, name and provider email. The field has the key “sources”.
-
----
-
 #### Reliable provider
 The presence in a list of reliable providers is verified by recovering the keywords associated with the KG. These are present in the metadata in the "keyword" field. Among the many values it contains,
 there is also the one relating to the provider. Then, the list is traversed and each value is compared with a list of providers deemed reliable. Since there is still no a list of reliable provider in the LOD panorama, we build this list by including the most well-known providers in the panorama of LOD. The list can be seen in the following table and is not to be considered exhaustive and definitive.
@@ -48,9 +31,6 @@ there is also the one relating to the provider. Then, the list is traversed and
         <td>Academic</td>
     </tr>
 </table>
-
+The value assigned to this metric will be 1 if the provider is in the list of trusted providers, 0 otherwise.
 ---
 
-#### **Trust value**
-It is a score which is between 0 and 1 which helps the KG user to understand how much information about the believability is available.
-In fact, this value is calculated as a weighted average based on how many of the following values are present: name, description, URL and presence in the reliable provider list. For each of these values, if the KG has it, 1 is assigned, otherwise 0. The sum is made and then divided by 4. The value obtained will be the trust value of the dataset. We also use this value to quantize the entire believability dimension, because this is a value which summarizes all the metrics.
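The reliable-provider rule added in this file (1 if the provider is in the trusted list, 0 otherwise) reduces to a membership test over the KG keywords. This is a minimal sketch with two assumptions. The provider names are hypothetical placeholders, not the entries of the table, which this diff shows only in part. Matching is also assumed to be case-insensitive.

```python
from typing import Iterable

# Hypothetical placeholder names: the real entries are the providers in the
# "Reliable provider" table, which this diff shows only in part.
TRUSTED_PROVIDERS = frozenset({"provider-a", "provider-b"})


def reliable_provider_score(keywords: Iterable[str],
                            trusted: Iterable[str] = TRUSTED_PROVIDERS) -> int:
    """1 if any KG keyword names a trusted provider, 0 otherwise.

    Matching ignores case and surrounding whitespace (an assumption).
    """
    normalized = {name.strip().lower() for name in trusted}
    return int(any(k.strip().lower() in normalized for k in keywords))
```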
diff --git a/docs/quality_dimensions/currency.md b/docs/quality_dimensions/currency.md
@@ -6,7 +6,6 @@ title: Dataset dynamicity category
 1. [Age of data](#age-of-data)
 2. [Specification of the modification date of statements](#specification-of-the-modification-date-of-statements)
 3. [Time since last modification](#time-since-last-modification)
-4. [History of changes made](#history-of-changes-made)
 
 #### **Age of data**
 The value regarding the KG creation date can be obtained from the VoID file or by executing a query on the SPARQL endpoint. In the VoID file we look for a triple having $dcterms:created$ as predicate. Instead the query for the endpoint should be of the type:

diff --git a/docs/quality_dimensions/interlinking.md b/docs/quality_dimensions/interlinking.md
@@ -6,7 +6,7 @@ title: Accessibility category
 1. [Degree of connection](#degree-of-connection)
 2. [Clustering coefficient](#clustering-coefficient)
 3. [Centrality](#centrality)
-4. [Number of *same as* chains](#number-of-same-as-chains)
+4. [sameAs chains](#sameas-chains)
 
 For the caluculation of the Degree of connection, clustering coefficient and centrality, we utilize a tool for network measurement. We use a Python library named ```networkx``` for our purpose. In KGHeartbeat, the module called [```Graph.py```](https://github.com/isislab-unisa/KGHeartbeat/blob/main/Graph.py) is responsable to the caluculation of these three value. In particular, it is responsible for creating the graph that contains all the KGs that can be retrieved automatically from Internet. The external connections for every KG are analyzed (field
 present in the metadata under the "external links" key) and for each connection we find, we insert the node inside the graph, labeled with the id of the KG and insert the edge with a weight equal to the number of triples with which it is connected to the other KGs. The process is then iterated for every KGs recovered. At the end of these process, on this Graph we calculate: *Degree of connection*, *Clustering coefficient* and *Centrality*. 
@@ -24,7 +24,7 @@ The clustering coefficient (specifically here we calculate the local clustering
 Centrality allows us to understand how important the KG is inside the graph and it is also a value between [0-1]. A higher centrality means a higher importance of the node, that is, it is involved in many connections. Instead, the lower it is, the more it means that those node is in the peripheral areas of the graph.
 
 ---
-#### **Number of *same as* chains**
+#### **sameAs chains**
 In this case we use the following query which counts the number of triples that have the ```owl:sameAs``` predicate.
 
 ```sql

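The graph construction described in the interlinking context above builds a weighted graph with one node per KG id and, for each connection, an edge weighted by the number of linking triples. It can be sketched without networkx. This is an illustrative sketch, not `Graph.py`, and it rests on several assumptions. The input shape `{kg_id: {target_id: n_triples}}` is a hypothetical stand-in for the "external links" metadata field. The graph is treated as undirected. When both KGs declare the same link, the two triple counts are summed. The clustering coefficient shown is the unweighted local one.

```python
from itertools import combinations


def build_graph(external_links: dict[str, dict[str, int]]) -> dict[str, dict[str, int]]:
    """Undirected weighted adjacency: node = KG id, edge weight = linking triples."""
    graph: dict[str, dict[str, int]] = {}
    for source, targets in external_links.items():
        graph.setdefault(source, {})
        for target, n_triples in targets.items():
            if target == source:
                continue  # ignore self-links
            graph.setdefault(target, {})
            # Sum the counts if both KGs declare the same link.
            graph[source][target] = graph[source].get(target, 0) + n_triples
            graph[target][source] = graph[target].get(source, 0) + n_triples
    return graph


def degree(graph: dict[str, dict[str, int]], node: str) -> int:
    """Number of KGs directly connected to `node`."""
    return len(graph[node])


def local_clustering(graph: dict[str, dict[str, int]], node: str) -> float:
    """Fraction of neighbour pairs of `node` that are themselves linked."""
    neighbours = list(graph[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return 2 * links / (k * (k - 1))
```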
diff --git a/docs/quality_dimensions/licensing.md b/docs/quality_dimensions/licensing.md
@@ -5,7 +5,6 @@ title: Accessibility category
 ## Licensing
 1. [Machine-readable license](#machine-readable-license)
 2. [Human-readable license](#human-readable-license)
-3. [License in the metadata](#license-in-the-metadata)
 
 
 #### **Machine-readable license**

diff --git a/docs/quality_dimensions/performance.md b/docs/quality_dimensions/performance.md
@@ -3,12 +3,12 @@ title: Accessibility category
 ---
 
 ## Performance
-1. [Latency](#latency)
-2. [Throughput](#throughput)
+1. [Low latency](#low-latency)
+2. [High Throughput](#high-throughput)
 
 The values calculated in this case are latency and throughput. Since they are highly variable tests, they are repeated several times and the mean, standard deviation, maximum and minimum are calculated. In fact, the values could vary due to the difference in performance of our network over time or the load of the server where the SPARQL endpoint is located (as well as the performance of the server network itself).
 
-#### **Latency**
+#### **Low latency**
 The test is repeated 5 times and involves the execution of one
 simple query that retrieves a generic triple of the dataset and comes
 measured the time between the request for the triple and when
@@ -22,5 +22,5 @@ LIMIT 1
 To quantize the latency, if the latency is less than 1 second, then 1 is assigned to this metric. Otherwise we average the five latency measurements and divide by a 1000.
 
 ---
-#### **Throughput**
+#### **High Throughput**
 Also in this case the test is repeated 5 times and we use the same previous query. But in this case we see in a second how many requests we can complete. The query executes in a while loop that stops after one second, and a count counter is incremented each time the query returns the result. At the end of each test, this variable will contain the number of requests and responses completed. To quantize the metric, if the throughput is greater than 5, we assign 1 to this metric. Otherwise we divide the throughput obtained by 200 and the value obtained is the value for the metric.
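The two measurement loops described in this file can be sketched as follows. The first times five executions of the one-triple query. The second counts how many queries complete within one second, also repeated five times. Both report the mean, standard deviation, maximum and minimum. `run_query` is a hypothetical zero-argument callable that sends the query to the SPARQL endpoint. Reporting latency in milliseconds is an assumption inferred from the division by 1000 mentioned above. The quantization step is not included.

```python
import statistics
import time
from typing import Callable


def summarize(samples: list[float]) -> dict[str, float]:
    """Mean, standard deviation, maximum and minimum of the repeated tests."""
    return {
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        "max": max(samples),
        "min": min(samples),
    }


def measure_latency(run_query: Callable[[], object], repeats: int = 5) -> dict[str, float]:
    """Time `repeats` executions of the query, in milliseconds (assumed unit)."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return summarize(samples)


def measure_throughput(run_query: Callable[[], object], repeats: int = 5,
                       window_s: float = 1.0) -> dict[str, float]:
    """Count the queries completed within `window_s` seconds, `repeats` times."""
    samples = []
    for _ in range(repeats):
        count = 0
        deadline = time.perf_counter() + window_s
        while time.perf_counter() < deadline:
            run_query()
            count += 1  # one request/response completed
        samples.append(float(count))
    return summarize(samples)
```

Passing the query as a callable keeps the timing logic independent of the HTTP or SPARQL client in use. A query that starts just before the deadline is still counted once it returns, which matches the loop described above.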
diff --git a/docs/quality_dimensions/reputation.md b/docs/quality_dimensions/reputation.md
@@ -3,7 +3,7 @@ title: Trust category
 ---
 
 ## Reputation
-1. [PageRank](#pagerank)
+1. [Reputation of the dataset](#reputation-of-the-dataset)
 
-#### **PageRank**
+#### **Reputation of the dataset**
 Since for the calculation of interlinking [here](./interlinking) we have built the graph containing all the KGs with the related external links, the function that calculates the PageRank on the node of interest (which corresponds to the KG we are analyzing) will be applied to this graph. The function used is the one made available by networkx and we pass it as input the ID of the KG whose PageRank we want to calculate. The function will return a value between 1 and 10. The closer the data is to 10, the more the KG has a high reputation and therefore of good quality. For quantize the metric, and get a value between 0 and 1, we divide the pagerank by 10.0
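The tool computes PageRank with networkx on the interlinking graph. networkx's `pagerank` returns a dict of scores that sum to 1. The dependency-free power iteration below computes the same kind of weighted PageRank, using damping 0.85 and spreading the rank of nodes without out-links uniformly, as networkx does by default. It is a swapped-in sketch, not the tool's code. The adjacency shape `graph[u][v] = weight` is an assumption, and the rescaling to the 1 to 10 range described above is not shown.

```python
def pagerank(graph: dict[str, dict[str, float]], alpha: float = 0.85,
             max_iter: int = 100, tol: float = 1.0e-6) -> dict[str, float]:
    """Weighted PageRank by power iteration on a directed adjacency dict.

    graph[u][v] is the weight of edge u -> v. Dangling nodes (no outgoing
    weight) spread their rank uniformly over all nodes.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    out_weight = {u: sum(graph.get(u, {}).values()) for u in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[u] for u in nodes if out_weight[u] == 0)
        # Teleport share plus the uniformly redistributed dangling rank.
        new = {v: (1.0 - alpha) / n + alpha * dangling / n for v in nodes}
        for u, targets in graph.items():
            if out_weight[u] == 0:
                continue
            for v, w in targets.items():
                new[v] += alpha * rank[u] * w / out_weight[u]
        err = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if err < n * tol:  # same stopping rule as networkx
            break
    return rank
```

With the symmetric graph from the interlinking step, every link counts in both directions. The score of the KG under analysis is `pagerank(graph)[kg_id]`, where `kg_id` is the hypothetical id of that node.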