Commit c701298

Add session for triples #8
1 parent 7fb0b04 commit c701298

4 files changed

+214
-0
lines changed

Lines changed: 214 additions & 0 deletions
@@ -0,0 +1,214 @@
1+
---
2+
layout: default
3+
title: "Bonus Lesson 12 : Triples, Counting, Analysing and Combining"
4+
nav_order: 12
5+
parent: Tutorial
6+
---
7+
8+
9+
## Lesson 12 : Using Triples for Counting, Analysing and Combining
10+
11+
In Metafacture every record is assigned a record id. Using this id, Metafacture is able to disassemble metadata records into so-called triples.
12+
With triples Metafacture is able to count, analyse and even combine or reassemble metadata. In this session we learn about triples and their use cases.
13+
14+
![Triple](https://metafacture.org/img/triple.png)
15+
16+
### What is a triple in Metafacture?
17+
18+
A triple describes a combination of subject, predicate and object. It is inspired by RDF triples, but in the context of Metafacture you do not need URIs for subject and predicate. Instead, the triples contain values that are derived from the metadata stream events Metafacture processes.
19+
20+
The subject is usually the id of a record in Metafacture. Most decoders and handlers assign each record either a specific id or an incrementing one. But you can also use the Flux command `change-id` to specify the record id that you want to set.
21+
22+
### Small excursion: Metadata stream events in Metafacture
23+
24+
If you decode e.g. MARC data and use the value of field 001 as the record id (when using MARC-XML you need to add `| change-id(idliteral="id", keepidliteral="true")`),
25+
Metafacture internally translates the record as follows:
26+
27+
MARC record:
28+
```
29+
001
30+
948469390
31+
245 Ind1=1 Ind2=0
32+
$$a Title
33+
$$b Remainder of title
34+
$$c Responsibility Statement
35+
```
36+
37+
`decode-marc21` translates this to a sequence of metadata events:
38+
-> Start record 948469390
39+
-> Literal 001: 948469390
40+
-> Start entity 24510
41+
-> Literal a: Title
42+
-> Literal b: Remainder of title
43+
-> Literal c: Responsibility Statement
44+
-> End entity
45+
-> End record
46+
47+
### Generating triples
48+
49+
When using the Flux command `stream-to-triples`, this is translated into triples.
50+
Each entity and each top level literal is turned into a triple:
51+
52+
```TSV
53+
948469390 001 948469390
54+
948469390 24510 {a:Title,b:Remainder of title,c:Responsibility Statement}
55+
56+
```
57+
58+
If you have metadata stream events like these:
59+
60+
![Generating Triples](https://metafacture.org/img/generatingTriples.png)
61+
62+
-> Start record record-id
63+
-> Literal name: Klaus
64+
-> Start entity died
65+
-> Literal when: 1401
66+
-> Literal where: HH
67+
-> End entity
68+
-> End record
69+
70+
With `stream-to-triples` this is translated into:
71+
72+
```TSV
73+
record-id name Klaus
74+
record-id died {when:1401,where:HH}
75+
```
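
The translation above can be pictured with a small Python sketch. It only illustrates the idea and is not Metafacture's implementation (which is written in Java); the event tuples and the function name `stream_to_triples` are invented for this example, and nested entities are not handled.

```python
def stream_to_triples(events):
    """Turn a flat list of stream events into (subject, predicate, object) triples.

    Hypothetical event format, invented for this sketch:
    ("start_record", id), ("literal", name, value),
    ("start_entity", name), ("end_entity",), ("end_record",)
    """
    triples = []
    record_id = None
    entity_name = None
    entity_fields = []
    for event in events:
        kind = event[0]
        if kind == "start_record":
            record_id = event[1]
        elif kind == "start_entity":
            entity_name = event[1]
            entity_fields = []
        elif kind == "literal":
            if entity_name is None:
                # Top-level literal: one triple per literal
                triples.append((record_id, event[1], event[2]))
            else:
                # Literal inside an entity: collect until the entity ends
                entity_fields.append(f"{event[1]}:{event[2]}")
        elif kind == "end_entity":
            # The whole entity becomes the object of a single triple
            obj = "{" + ",".join(entity_fields) + "}"
            triples.append((record_id, entity_name, obj))
            entity_name = None
        elif kind == "end_record":
            record_id = None
    return triples


events = [
    ("start_record", "record-id"),
    ("literal", "name", "Klaus"),
    ("start_entity", "died"),
    ("literal", "when", "1401"),
    ("literal", "where", "HH"),
    ("end_entity",),
    ("end_record",),
]

for s, p, o in stream_to_triples(events):
    print(f"{s}\t{p}\t{o}")
```

The record id ends up as the subject of every triple, which is what later makes it possible to group, count or recombine values per record.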
76+
77+
In the context of a full Metafacture workflow, this example would look something like this:
78+
79+
Input, a Formeta record:
80+
```
81+
record-id{name: Klaus, died {when: 1401, where: HH} }
82+
```
83+
84+
FLUX:
85+
```
86+
inputFile
87+
|open-file
88+
|as-lines
89+
|decode-formeta
90+
|stream-to-triples
91+
|print
92+
;
93+
```
94+
95+
If you print the triples directly, the output looks something like this:
96+
```
97+
record-id:name=Klaus (STRING)
98+
record-id:died={when:1401,where:HH} (ENTITY)
99+
```
100+
101+
You can see that each object is typed by the annotation in parentheses. [See this example in the playground.](https://metafacture.org/playground/?flux=inputFile%0A%7Copen-file%0A%7Cas-lines%0A%7Cdecode-formeta%0A%7Cstream-to-triples%0A%7Cprint%0A%3B&data=record-id%7Bname%3A+Klaus%2C+died+%7Bwhen%3A+1401%2C+where%3A+HH%7D+%7D)
102+
103+
104+
To improve the readability or change the output to your liking, you can configure it with the `template` function.
105+
106+
For example, if you add `|template("Subject:${s} Predicate:${p} Object:${o}")` before `print`, the output changes to:
107+
108+
```TSV
109+
Subject:record-id Predicate:name Object:Klaus
110+
Subject:record-id Predicate:died Object:{when:1401,where:HH}
111+
```
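
The placeholder substitution can be mimicked in a few lines of Python, because `string.Template` from the standard library happens to use the same `${name}` syntax. This is only an analogy to show what the pattern does with each triple, not how Metafacture implements `template`.

```python
from string import Template

# Same placeholder pattern as in the Flux example above
pattern = Template("Subject:${s} Predicate:${p} Object:${o}")

triples = [
    ("record-id", "name", "Klaus"),
    ("record-id", "died", "{when:1401,where:HH}"),
]

# Each triple fills the s, p and o placeholders once
lines = [pattern.substitute(s=s, p=p, o=o) for s, p, o in triples]
for line in lines:
    print(line)
```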
112+
113+
### Listing values to records
114+
115+
After you stream records into triples you can do a lot of things. A first use case is listing all contributors across all records together with the record id of the publication they belong to.
116+
117+
FLUX:
118+
```
119+
"https://raw.githubusercontent.com/metafacture/metafacture-core/master/metafacture-runner/src/main/dist/examples/read/marc21/10.marc21"
120+
| open-http
121+
| as-lines
122+
| decode-marc21 // sets the value of 001 as record-id
123+
| fix(transformationFile)
124+
| stream-to-triples
125+
| template("${s}\t${o}")
126+
| print
127+
;
128+
```
129+
130+
FIX:
131+
```
132+
do list(path:"100??|700??","var":"$i")
133+
copy_field("$i.a","name.$append")
134+
end
135+
retain("name")
136+
```
137+
138+
It outputs the record id and the name of each contributor:
139+
140+
```TSV
141+
946638705 Kim, Soonsik
142+
94685887X Grunsky, Konrad
143+
947459928 Henze, Hartwig
144+
948469390 Burning, Michael
145+
950561274 Rüppell, Georg
146+
950592463 Prütting, Hanns
147+
950974439 Bigalke, Rainer
148+
953176436 Hanau, Peter
149+
954369300 Hommelhoff, Peter
150+
954377915 Horn, Norbert
151+
```
152+
153+
### Counting triples
154+
155+
After you stream records into triples you can do quite a lot of things, e.g. counting values.
156+
157+
158+
![Counting Triples](https://metafacture.org/img/countingTriples.png)
159+
160+
You can count the values of each element of a triple. Usually the object is the most interesting, but the predicate can be interesting too.
161+
162+
In the next example we copy the publication places into a new top-level field and count how often each place occurs:
163+
164+
FLUX:
165+
```
166+
"https://raw.githubusercontent.com/metafacture/metafacture-core/master/metafacture-runner/src/main/dist/examples/read/marc21/10.marc21"
167+
| open-http
168+
| as-lines
169+
| decode-marc21 // sets the value of 001 as record-id
170+
| fix(transformationFile)
171+
| stream-to-triples
172+
| count-triples(countBy="object")
173+
| template("${o}\ttimes\t${s}")
174+
| print
175+
;
176+
```
177+
178+
FIX:
179+
```
180+
do list(path:"260??","var":"$i")
181+
copy_field("$i.a","publicationPlace.$append")
182+
end
183+
184+
retain("publicationPlace")
185+
```
186+
187+
This results in:
188+
```
189+
1 times Berlin
190+
1 times Bern
191+
1 times Brussels
192+
1 times Frankfurt am Main
193+
1 times Hohenwarsleben
194+
1 times Husum
195+
5 times Köln
196+
1 times München
197+
1 times Münster/Westf.
198+
1 times New York
199+
1 times Newcastle
200+
1 times Oxford
201+
1 times Vienna
202+
1 times Washington, D.C./Baltimore
203+
```
204+
205+
You can add the following Flux command after `count-triples` to sort the results by number of occurrences:
206+
`| sort-triples(by="object",order="decreasing",numeric="true")`
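
To make the two steps more tangible, here is a rough Python sketch of counting by object and then sorting numerically in decreasing order. It is an illustration under assumptions, not Metafacture's code: the record ids are made up, and the predicate name `count` for the resulting triples is an assumption. What the source does show is that after counting, the counted value sits in the subject and the number in the object, which is why the template above prints `${o}` first.

```python
from collections import Counter

# Triples in the style of the example above; the record ids are made up
triples = [
    ("rec1", "publicationPlace", "Köln"),
    ("rec2", "publicationPlace", "Berlin"),
    ("rec3", "publicationPlace", "Köln"),
    ("rec4", "publicationPlace", "Oxford"),
    ("rec5", "publicationPlace", "Köln"),
]


def count_by_object(triples, predicate_name="count"):
    """Emit one triple per distinct object: (value, predicate_name, count).

    The default predicate name "count" is an assumption for this sketch.
    """
    counts = Counter(obj for _, _, obj in triples)
    return [(value, predicate_name, str(n)) for value, n in sorted(counts.items())]


def sort_by_object_numeric_decreasing(triples):
    """Sort triples by their object, read as a number, highest first."""
    return sorted(triples, key=lambda t: int(t[2]), reverse=True)


counted = count_by_object(triples)
ranked = sort_by_object_numeric_decreasing(counted)
for s, p, o in ranked:
    print(f"{o}\ttimes\t{s}")
```

Sorting numerically matters once counts reach two digits: as plain strings, "10" would sort before "5".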
207+
208+
### Counting patterns
209+
210+
TODO
211+
212+
### Merging records
213+
214+
TODO: Add this section or add it as an additional lesson.

docs/images/countingTriples.png

20.5 KB

docs/images/generatingTriples.png

68.7 KB

docs/images/triple.png

26.9 KB