Semantization
My approach to semantization is to get to an RDF syntax as quickly as possible, and then to process the data with RDF tools. For XML, I use Gloze, which makes a generic translation of XML to RDF, then a rule base in N3 with Euler / Eye.
There are several methods:
- XML transform methods: XSLT or XQuery
- RDF transform methods: canonical (direct mapping) conversion to RDF, then SPARQL or another RDF technique
- tabular methods: use RDF property URIs or Turtle terms as column names in CSV, then convert to RDF with semantic_forms on the command line, see Semantize raw stuff
- JSON methods:
  - use a specific JSON-LD context to achieve the final result; caveat: JSON-LD 1.1 is young, and not everything is implemented (like https://json-ld.org/spec/latest/json-ld/#nested-properties)
  - use JSON transform techniques, for example Semantic Bus
An example of method 1: https://framagit.org/Scrutari/RDFexport/blob/master/scrutari-to-rdf.xslt
Details on method 2 are given below.
For 3 (CSV), there is a useful site, but it's limited to SKOS: http://labs.sparna.fr/skos-play/convert
The CSV file can be conveniently obtained via LibreOffice.
A typical first part of the data mill (direct mapping):
java -cp $JARS deductions.runtime.connectors.CSVImporterApp \
"/home/jmv/data/Voisins sur site.csv" \
urn:gv/contacts \
~/data/adding-details-each_row.ttl \
,
The arguments are:
- input CSV file or URL (with header row)
- base URL for the rows
- URL or file (in Turtle) for adding details to each row
- separator
It relies on the column-to-URI mappings obtained from the first row. The first row must consist of property URIs (or abbreviated Turtle URIs with well-known prefixes). Here is a typical example of an RDF CSV file (nature field trips):
dbo:startDate, rdfs:label, dct:subject, schema:performer, schema:departureStation, nature:returnStation,,,,,
D 31 mars 2019, Forêt de Nanteau-Poloigny, Lichénologie, G. Carlier, départ Paris-Gare de Lyon à 8h16 pour Bagneaux-sur-Loing 9h23, retour de Bagneaux à 17h30 + ANVL + CNCE,,,,,
D 14 avril 2019, Forêt de Fontainebleau, mycologie et botanique, JP CHABRIER et A LAURON, Paris Gare de Lyon 8h16 pour Montigny-sur-Loing 9h04, retour de Tomery 18h58 + CNCE + ANVL + SMF,,,,,
When a column header is not a URI, the tool creates one from the given prefix and the header.
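The header handling described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code of semantic_forms: the prefix table `KNOWN_PREFIXES`, the function name `header_to_uri`, the percent-encoding of forged URIs and the skipping of empty headers are all assumptions made for the example.

```python
from urllib.parse import quote

# Hypothetical prefix table: the tool's real list of "well known prefixes"
# is not shown on this page.
KNOWN_PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dct": "http://purl.org/dc/terms/",
    "schema": "http://schema.org/",
    "dbo": "http://dbpedia.org/ontology/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def header_to_uri(header, fallback_prefix):
    """Map one CSV column header to a property URI (or None if empty)."""
    header = header.strip()
    if not header:
        return None  # assumption: empty trailing columns are skipped
    if header.startswith(("http://", "https://", "urn:")):
        return header  # already a full URI
    prefix, sep, local = header.partition(":")
    if sep and prefix in KNOWN_PREFIXES:
        return KNOWN_PREFIXES[prefix] + local  # abbreviated Turtle term
    # Neither a URI nor a known prefixed name: forge a URI from the
    # fallback prefix and the percent-encoded header (assumed scheme).
    return fallback_prefix + quote(header)
```

For instance, `header_to_uri("rdfs:label", "urn:gv/contacts/")` expands the prefixed name, while a plain header such as `Phone number` is turned into a URI under the fallback prefix.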
The file adding-details-each_row.ttl contains in this example:
PREFIX schema: <http://schema.org/>
<any:ROW> a schema:Event .
<any:ROW> <http://purl.org/NET/c4dm/event.owl#agent> <http://naturalistes.chez.com/> .
Or, in another example:
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<any:ROW> a foaf:Organization .
When a cell in the tabular data looks like a URI, or like an abbreviated Turtle URI with a well-known prefix, it is expanded.
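To make the role of `<any:ROW>` and of URI-looking cells more concrete, here is a minimal Python sketch of what happens for one row. It is an assumption-laden illustration, not the importer's actual code: the function names `term` and `row_to_turtle` are invented, only full `http(s)` URIs are recognized in cells (prefixed names are left out), and the row URI pattern `base/row/N` is simply copied from the sample output on this page.

```python
def term(cell):
    """Render a cell as a Turtle term: <URI> if it looks like one, else a literal."""
    cell = cell.strip()
    if cell.startswith(("http://", "https://")):
        return f"<{cell}>"
    escaped = cell.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def row_to_turtle(base, position, properties, cells, template_lines):
    """Build the Turtle statements for one CSV row.

    properties: property URIs already derived from the header row
    template_lines: statements from the "details" file, using <any:ROW>
    """
    row = f"<{base}/row/{position}>"
    # The placeholder subject is replaced by the URI of the current row.
    lines = [line.replace("<any:ROW>", row) for line in template_lines]
    for prop, cell in zip(properties, cells):
        if prop and cell.strip():  # skip empty columns and empty cells
            lines.append(f"{row} <{prop}> {term(cell)} .")
    return lines
```

Called with base `urn:gv/contacts`, position `117` and the template line `<any:ROW> a <http://xmlns.com/foaf/0.1/Organization> .`, this yields statements whose subject is `<urn:gv/contacts/row/117>`.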
A sample produced by this step is:
<urn:gv/contacts/row/117>
a <http://vocab.sindice.net/csv/Row> , <http://xmlns.com/foaf/0.1/Organization> ;
<http://purl.org/dc/elements/1.1/subject>
"Equipe de psychologue à domicile" ;
<http://vocab.sindice.net/csv/rowPosition>
"117" ;
<http://www.virtual-assembly.org/ontologies/1.0/pair#administrativeName>
"Aurore équipe mobile" ;
<http://www.virtual-assembly.org/ontologies/1.0/pair#arrivalDate>
"01/10/2016" ;
<http://www.virtual-assembly.org/ontologies/1.0/pair#building>
"Oratoire" ;
<http://www.virtual-assembly.org/ontologies/1.0/pair#room>
"1er étage" ;
<http://www.virtual-assembly.org/ontologies/1.0/pair#status>
"Contribue" ;
<http://xmlns.com/foaf/0.1/familyName>
"Auffret" ;
<http://xmlns.com/foaf/0.1/givenName>
"Marianne" ;
<http://xmlns.com/foaf/0.1/img>
<http://www.larchipel.paris/wp-content/uploads/2014/10/logo-sans-rup-2012.jpg> ;
<http://xmlns.com/foaf/0.1/mbox>
"m.auffret@aurore.asso.fr" ;
<http://xmlns.com/foaf/0.1/name>
"Equipe mobile" .
Then a second step can be a SPARQL query that will, for instance, split Person and Organization fields that are mixed in a single CSV row. Here is a typical SPARQL query for extracting the Person (the foaf:Organization has already been created in the first step). We copy the Person-specific properties into a new foaf:Person instance, and the rest of the properties are copied to a newly forged URI. Note that we must create a triple to connect the person and the organization, and that we need SPARQL Update for manipulating graphs.
prefix dc: <http://purl.org/dc/elements/1.1/>
prefix dct: <http://purl.org/dc/terms/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix cco: <http://purl.org/ontology/cco/core#>
prefix pair: <http://virtual-assembly.org/pair_v2#>
DELETE { GRAPH <urn:gv/contacts> {?ORGA ?P ?O .}}
INSERT { GRAPH <urn:/Voisins/new> {
?PERSON a foaf:Person ;
foaf:familyName ?FN ;
foaf:givenName ?GN ;
foaf:mbox ?MB ;
foaf:phone ?PH .
?newURI a foaf:Organization .
?newURI ?P ?O .
?newURI pair:hasResponsible ?ORGA .
}}
WHERE { GRAPH <urn:gv/contacts> {
?PERSON a <http://vocab.sindice.net/csv/Row> ;
foaf:familyName ?FN ;
foaf:givenName ?GN ;
foaf:mbox ?MB ;
foaf:phone ?PH .
?ORGA ?P ?O .
FILTER( ?P != foaf:familyName &&
?P != foaf:givenName &&
?P != foaf:mbox &&
?P != foaf:phone )
BIND (URI(CONCAT( STR(?ORGA), "-org") ) AS ?newURI)
}}
This can be conveniently run from the command line:
java -cp $JARS tdb.tdbupdate --loc=TDB \
--update=$HOME/src/assemblee-virtuelle.github.io/grands-voisins-v2/split-person-orga.rq
Using the Euler / Eye rule engine for Turtle and the N3 rule language:
eye --nope --query rules-documentorQ.n3 rules-documentor.n3 \
~/data/Non_public/Voisins-sur-site.csv.ttl ~/ontologies/foaf.n3 \
> ~/src/assemblee-virtuelle.github.io/grands-voisins-v2/onto.ttl
Then step 4 can be dispatching triples to user-specific named graphs, so that each user has complete edit and delete rights on her data. For this, there is the application UserNamedGraphsDispatcherApp, run from the command line:
java -cp $JARS deductions.runtime.sparql_cache.algos.UserNamedGraphsDispatcherApp urn:/Voisins/new
Prerequisite: have the Gloze source installed in ~/src/Gloze.
BASE_URI="$HOME/mystuff"
XML_INPUT=/usr/share/accountsservice/interfaces/com.ubuntu.AccountsService.Input.xml
source ~/src/Gloze/setClasspath.sh
java -cp $LIBS \
-Dgloze.lang=N3 \
-Dgloze.base=$BASE_URI \
-Dgloze.verbose=true \
com.hp.gloze.GlozeURL \
$XML_INPUT \
> `basename $XML_INPUT.ttl`
Meaningful RDF here means RDF that obeys a well-known vocabulary and, when possible, references well-known URIs like those of DBpedia (Linked Open Data). Let's take the example of a KML input, the sample in the Wikipedia KML page.
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<Placemark>
<name>New York City</name>
<description>New York City</description>
<Point>
<coordinates>-74.006393,40.714172,0</coordinates>
</Point>
</Placemark>
</Document>
</kml>
Here is what is obtained by Gloze:
@base <file:///home/jmv/mystuff> .
</home/jmv/mystuff> <http://www.opengis.net/kml/2.2#kml>
[ <http://www.opengis.net/kml/2.2#Document>
[ <http://www.opengis.net/kml/2.2#Placemark>
[ <http://www.opengis.net/kml/2.2#Point>
[ <http://www.opengis.net/kml/2.2#coordinates>
"-74.006393,40.714172,0" ] ;
<http://www.opengis.net/kml/2.2#description>
"New York City" ;
<http://www.opengis.net/kml/2.2#name>
"New York City"
] ] ] .
As you can see, every instance of an XML tag becomes an instance of an RDF predicate. Between the predicates, blank nodes are inserted, introduced by the [] notation.
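The shape of this generic translation can be imitated with a short Python sketch using only the standard library. This is not how Gloze is implemented, and it ignores XML attributes, mixed content and XML Schema typing; the function name `xml_to_triples` and the `_:bN` blank node labels are assumptions for the example. Each element becomes a predicate (namespace, `#`, local name), and each element with children gets a fresh blank node as object.

```python
import xml.etree.ElementTree as ET
from itertools import count

KML_SAMPLE = """<kml xmlns="http://www.opengis.net/kml/2.2">
<Document><Placemark>
<name>New York City</name>
<description>New York City</description>
<Point><coordinates>-74.006393,40.714172,0</coordinates></Point>
</Placemark></Document>
</kml>"""


def xml_to_triples(xml_text, base):
    """Translate an XML document into (subject, predicate, object) tuples."""
    counter = count(1)
    triples = []

    def predicate(tag):
        # ElementTree spells qualified names as "{namespace}local".
        if tag.startswith("{"):
            namespace, local = tag[1:].split("}", 1)
            return f"{namespace}#{local}"
        return tag

    def visit(subject, element):
        children = list(element)
        if children:
            # An element with children yields a blank node that carries them.
            node = f"_:b{next(counter)}"
            triples.append((subject, predicate(element.tag), node))
            for child in children:
                visit(node, child)
        else:
            # A leaf element yields a literal holding its text.
            text = (element.text or "").strip()
            triples.append((subject, predicate(element.tag), text))

    visit(base, ET.fromstring(xml_text))
    return triples
```

Running `xml_to_triples(KML_SAMPLE, "file:///home/jmv/mystuff")` gives one triple per XML element, with the same nesting of blank nodes as in the Gloze output above.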
We could have given Gloze the XML Schema of KML, but I did not bother to.
Our target vocabulary will be WGS84 Geo RDF (W3C's Geo ontology). We could have chosen the GeoNames ontology: http://www.geonames.org/ontology/documentation.html
Here are some N3 rules. N3 is a language similar to SPARQL. In fact, N3 was the main inspiration for SPARQL.
A KML Placemark becomes a Geo SpatialThing :
PREFIX kml: <http://www.opengis.net/kml/2.2#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
{?CONTAINER kml:Placemark ?PLACE} => {?PLACE a geo:SpatialThing} .
A KML Placemark has a KML Point; W3C's Geo ontology does allow a SpatialThing to have several points through the location property.
{?PLACE kml:Point ?POINT} => {?POINT a geo:SpatialThing . ?PLACE geo:location ?POINT .} .
A kml:name becomes an rdfs:label.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
{?S kml:name ?NAME } => { ?S rdfs:label ?NAME } .
description
TO BE CONTINUED
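To see what the three rules written so far produce, here is a small Python emulation that applies them to plain (subject, predicate, object) tuples. It is only a sketch: a real run goes through Eye, and the function name `apply_rules` and the tuple representation are assumptions. Since each rule has a single-triple premise and derives no KML predicate, one pass over the input is enough.

```python
KML = "http://www.opengis.net/kml/2.2#"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def apply_rules(triples):
    """Return the set of triples derived by the three KML-to-Geo rules."""
    derived = set()
    for s, p, o in triples:
        if p == KML + "Placemark":
            # {?CONTAINER kml:Placemark ?PLACE} => {?PLACE a geo:SpatialThing}
            derived.add((o, RDF_TYPE, GEO + "SpatialThing"))
        elif p == KML + "Point":
            # {?PLACE kml:Point ?POINT} =>
            #   {?POINT a geo:SpatialThing . ?PLACE geo:location ?POINT}
            derived.add((o, RDF_TYPE, GEO + "SpatialThing"))
            derived.add((s, GEO + "location", o))
        elif p == KML + "name":
            # {?S kml:name ?NAME} => {?S rdfs:label ?NAME}
            derived.add((s, RDFS + "label", o))
    return derived
```

Fed with the Placemark, Point and name triples of the New York City sample, it returns the two `geo:SpatialThing` typings, the `geo:location` link and the `rdfs:label`.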
BASE_URI="$HOME/mystuff"
TURTLE_INPUT=http://jmvanel.free.fr/jmv.rdf
source ~/src/Gloze/setClasspath.sh
java -cp $LIBS \
-Dgloze.lang=N3 \
-Dgloze.base=$BASE_URI \
-Dgloze.verbose=true \
-Dgloze.order=seq \
com.hp.gloze.GlozeURL \
$TURTLE_INPUT \
> `basename $TURTLE_INPUT.xml`