Use Confluence with taxonomies for enterprise architecture, data governance, compliance, and more, and save on specialized tools.
The Azure subscription used to host the add-on is no longer available. None of the queue, key, blob, or other identifiers are valid, so anyone attempting to self-host the add-on will need to set up the infrastructure themselves.
Build sequence to automate:
mvn clean install validate
npm install
npm run release
mvn package -DskipTests
Install the ngrok tunnel:
sudo alternatives --install /usr/bin/ngrok ngrok <path to ngrok> <priority>
Run the tunnel on a specific port:
ngrok http -subdomain=dalstonsemantics -region=au 8110
Run the application with Maven (the default URL https://dalstonsemantics.au.ngrok.io is baked into the default config):
mvn spring-boot:run
Run the application with Java using the fat jar (the default URL https://dalstonsemantics.au.ngrok.io is baked into the default config):
java -jar target/taxonomies-for-confluence-MAJOR.MINOR.POINT.jar
taxonomiesForConfluenceSubjectNotation, taxonomiesForConfluenceSubjectUri and taxonomiesForConfluenceSubjectPreferredLabel can be used with CQL; text values must be included in quotes. altLabel is not included in the index because (a) Atlassian doesn't support it and (b) the use case for alt labels is human readability and search, not integration.
https://dalstonsemantics.atlassian.net/wiki/rest/api/search?cql=taxonomiesForConfluenceSubjectNotation=7291&limit=25
https://dalstonsemantics.atlassian.net/wiki/rest/api/search?cql=taxonomiesForConfluenceSubjectUri=%22https://dalstonsemantics.com/ns/au/gov/abs/anzsic/7291%22&limit=25
https://dalstonsemantics.atlassian.net/wiki/rest/api/search?cql=taxonomiesForConfluenceSubjectPreferredLabel="Office Administrative Services"&limit=25
https://dalstonsemantics.atlassian.net/wiki/rest/api/search?cql=taxonomiesForConfluenceTypeUri=%22https://dalstonsemantics.com/taxonomy/sdlc/readme%22&limit=25
https://dalstonsemantics.atlassian.net/wiki/rest/api/search?cql=taxonomiesForConfluenceTypePreferredLabel="Readme"&limit=25
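A minimal sketch of running one of these CQL queries from Java with Spring's RestTemplate; the site URL and credentials are placeholders, and the quoting of the text value follows the note above.

```java
import java.net.URI;

import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpMethod;
import org.springframework.http.ResponseEntity;
import org.springframework.web.client.RestTemplate;
import org.springframework.web.util.UriComponentsBuilder;

public class CqlSearchExample {

    public static void main(String[] args) {

        // Placeholder site and credentials; replace with your own instance and API token.
        String baseUrl = "https://dalstonsemantics.atlassian.net/wiki";

        // CQL against the indexed property; the text value is quoted as noted above.
        String cql = "taxonomiesForConfluenceSubjectPreferredLabel=\"Office Administrative Services\"";

        URI uri = UriComponentsBuilder.fromHttpUrl(baseUrl + "/rest/api/search")
                .queryParam("cql", cql)
                .queryParam("limit", 25)
                .build()
                .encode()
                .toUri();

        HttpHeaders headers = new HttpHeaders();
        headers.setBasicAuth("user@example.com", "api-token");

        ResponseEntity<String> response = new RestTemplate()
                .exchange(uri, HttpMethod.GET, new HttpEntity<>(headers), String.class);

        // The body is the standard Confluence search result JSON.
        System.out.println(response.getBody());
    }
}
```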
Atlassian's recommendation is to prefix the property key with the name of your add-on to ensure keys are globally unique. There are several alternatives:
- Use taxonomiesForConfluence as the prefix (so resulting names are taxonomiesForConfluenceSubject, taxonomiesForConfluenceSubjectNotation, etc.). It follows the recommendation, but the name of our plugin may change going forward.
- Use dalstonSemantics as the prefix (so resulting names are dalstonSemanticsSubject, dalstonSemanticsSubjectNotation, etc.). It lets us vary the plugin name, but does not follow the guide to the letter.
- Use dcterms as the prefix (so resulting names are dctermsSubject, dctermsSubjectNotation, etc.). It lets us vary the plugin name and explicitly matches the property used in the knowledge graph, but does not follow the guidelines at all, and the vast majority of people will not know what dcterms stands for.

Decision - use option #1 (the taxonomiesForConfluence prefix).
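A hedged sketch of writing a content property keyed with the chosen prefix, assuming the standard Confluence Cloud content property endpoint; the content id, credentials and value are placeholders, not the add-on's actual implementation.

```java
import java.net.URI;
import java.util.Map;

import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.web.client.RestTemplate;

public class ContentPropertyExample {

    // Property keys follow decision #1: the taxonomiesForConfluence prefix.
    static final String SUBJECT_URI_KEY = "taxonomiesForConfluenceSubjectUri";

    public static void main(String[] args) {
        String baseUrl = "https://dalstonsemantics.atlassian.net/wiki"; // placeholder site
        long contentId = 123456L;                                       // placeholder content id

        HttpHeaders headers = new HttpHeaders();
        headers.setBasicAuth("user@example.com", "api-token");          // placeholder credentials
        headers.setContentType(MediaType.APPLICATION_JSON);

        // Body shape for creating a content property: a key and an arbitrary JSON value.
        Map<String, Object> body = Map.of(
                "key", SUBJECT_URI_KEY,
                "value", "https://dalstonsemantics.com/ns/au/gov/abs/anzsic/7291");

        new RestTemplate().postForEntity(
                URI.create(baseUrl + "/rest/api/content/" + contentId + "/property"),
                new HttpEntity<>(body, headers),
                String.class);
    }
}
```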
Given that we now have two bylines that may actually be swapped on load, we need to decide what to do with the icons.
Decision - Keep concept scheme and concept icons for the tables. In pages use "S tab" for Subject icon, "T tab" for Type icon and "R tab" for Relation macro.
- Adopt https as the scheme for namespaces created or used in connection with this application.
- The Host repository has a single context. The Taxonomies repository has 3 contexts per client: https://tfc.dalstonsemantics.com/taxonomy/${iss}-${version}, https://tfc.dalstonsemantics.com/content/${iss} and https://tfc.dalstonsemantics.com/taxonomy-version/${iss}, where iss is the client identifier supplied by Confluence and ${version} is generated by the add-on.
  - Taxonomy stores all imported taxonomies and their provenance.
  - Content stores all content and subject data and their provenance.
  - Taxonomy version stores the versions as a linked list.
- Content is https://tfc.dalstonsemantics.com/content/${iss}-${content}, where iss is the client identifier supplied by Confluence and content is the unique identifier of content within the client instance, supplied by Confluence. This includes pages and blog posts (with team:page or team:blogpost types) and properties (recorded as rdf:Statement with rdf:predicate set to dc:subject). Macros are created with https://tfc.dalstonsemantics.com/macro/${macro}, where ${macro} is the unique macro identifier assigned by Confluence (recorded as rdf:Statement with rdf:predicate set to dc:subject).
- Content classes are https://dalstonsemantics.com/ns/com/atlassian/${type}, where type is either page or blogpost, supplied by Confluence.
- Agent is https://tfc.dalstonsemantics.com/agent/${sub}, with sub being the unique user identifier supplied by Confluence (unique across instances).
- Activity is https://tfc.dalstonsemantics.com/activity/${uuid}, where uuid is a GUID generated by the application.
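An illustrative set of helpers for minting the IRIs in the list above with RDF4J's SimpleValueFactory; the class and method names are hypothetical, but the URI patterns follow the scheme described.

```java
import java.util.UUID;

import org.eclipse.rdf4j.model.IRI;
import org.eclipse.rdf4j.model.ValueFactory;
import org.eclipse.rdf4j.model.impl.SimpleValueFactory;

public class TfcIris {

    private static final ValueFactory VF = SimpleValueFactory.getInstance();

    // Context for a specific taxonomy version of a client.
    static IRI taxonomyContext(String iss, String version) {
        return VF.createIRI("https://tfc.dalstonsemantics.com/taxonomy/" + iss + "-" + version);
    }

    // Content resource for a page, blog post or property within a client instance.
    static IRI content(String iss, String contentId) {
        return VF.createIRI("https://tfc.dalstonsemantics.com/content/" + iss + "-" + contentId);
    }

    // Agent identified by the Confluence account id (sub), unique across instances.
    static IRI agent(String sub) {
        return VF.createIRI("https://tfc.dalstonsemantics.com/agent/" + sub);
    }

    // Activity identified by a GUID generated by the application.
    static IRI activity() {
        return VF.createIRI("https://tfc.dalstonsemantics.com/activity/" + UUID.randomUUID());
    }
}
```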
- User cache is using the full URI for the request.
- AddOn cache is using the full URI for the request.
- Sticking with Ehcache for more control and to avoid calculating tokens in the call (the alternative is to configure RestTemplate with a cache). See https://dzone.com/articles/caching-with-apache-http-client-and-spring-resttem for caching with Spring RestTemplate.
- AtlassianHost cache is using clientKey.
- No caching for the Content call because we always need the latest version in the webhooks.
- Caching static content using Spring configuration spring.resources.chain and spring.resources.cache.
- JS files are assigned unique names per version for effective browser caching.
- Images are cached with a max age of 365 days.
- Responses to the Related macro are cached by adding the Cache-Control header explicitly.
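A minimal sketch of the last point, returning an explicit Cache-Control header from a Spring controller; the endpoint path and max-age value are assumptions for illustration, not the actual Related macro implementation.

```java
import java.util.concurrent.TimeUnit;

import org.springframework.http.CacheControl;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class RelatedMacroController {

    // Hypothetical endpoint; the point is the explicit Cache-Control header on the response.
    @GetMapping("/related")
    public ResponseEntity<String> related(@RequestParam("uri") String uri) {
        String body = "...";  // render related content for the given concept URI
        return ResponseEntity.ok()
                .cacheControl(CacheControl.maxAge(1, TimeUnit.HOURS))  // emits Cache-Control: max-age=3600
                .body(body);
    }
}
```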
- Use remote Rdf4J repositories.
- Separate host (single repository) and taxonomy (per Atlassian Host) repositories.
- Shard taxonomy repositories based on the client identifier supplied by Confluence.
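A sketch of what repository sharding by client identifier could look like with RDF4J's HTTPRepository; the repository naming convention and the hash-based shard selection are assumptions for illustration.

```java
import java.util.List;

import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.http.HTTPRepository;

public class TaxonomyRepositoryPool {

    // Well-known rdf4j-server URLs, supplied via Spring configuration on startup.
    private final List<String> serverUrls;

    public TaxonomyRepositoryPool(List<String> serverUrls) {
        this.serverUrls = serverUrls;
    }

    // Pick a server deterministically from the Confluence client identifier (iss)
    // and connect to the per-client repository on that server.
    public Repository repositoryFor(String iss) {
        int shard = Math.floorMod(iss.hashCode(), serverUrls.size());
        return new HTTPRepository(serverUrls.get(shard), "taxonomy-" + iss);
    }
}
```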
For TLS configuration see https://docs.microsoft.com/en-us/azure/aks/ingress-tls
az acr create --resource-group dalstonsemantics --name dalstonsemantics --sku Basic
az acr login --name dalstonsemantics
docker push dalstonsemantics.azurecr.io/rdf4j-server:4.2.4-tomcat-9.0.73-jdk17-temurin-jammy
docker push dalstonsemantics.azurecr.io/rdf4j-workbench:4.2.4-tomcat-9.0.73-jdk17-temurin-jammy
docker push dalstonsemantics.azurecr.io/tfc:4.16.0
VM: https://docs.microsoft.com/en-us/azure/virtual-machines/dv4-dsv4-series
az aks create --resource-group dalstonsemantics --name tfc-production --node-count 1 --node-vm-size Standard_D4s_v4 --generate-ssh-keys --network-plugin azure --attach-acr dalstonsemantics --enable-managed-identity
az aks get-credentials --resource-group dalstonsemantics --name tfc-production
Once managed identity is enabled with --enable-managed-identity, new roles need to be created and assigned: https://azure.github.io/aad-pod-identity/docs/getting-started/role-assignment/
az aks show --resource-group dalstonsemantics --name tfc-production --query identityProfile.kubeletidentity.clientId -o tsv
az role assignment create --role "Managed Identity Operator" --assignee 080fa512-47e6-4851-a494-ea68d6d1c1dd --scope /subscriptions/1677ad7d-8b89-4171-b3a1-c9e8d2f4cca0/resourceGroups/MC_dalstonsemantics_tfc-production_uksouth
az role assignment create --role "Virtual Machine Contributor" --assignee 080fa512-47e6-4851-a494-ea68d6d1c1dd --scope /subscriptions/1677ad7d-8b89-4171-b3a1-c9e8d2f4cca0/resourceGroups/MC_dalstonsemantics_tfc-production_uksouth
Then AAD Pod Identity needs to be installed via Helm or manifest: https://azure.github.io/aad-pod-identity/docs/getting-started/installation/
helm repo add aad-pod-identity https://raw.githubusercontent.com/Azure/aad-pod-identity/master/charts
helm repo update
helm install aad-pod-identity aad-pod-identity/aad-pod-identity
Apply exception for MIC to make sure that its calls are not proxied via NMI as per https://azure.github.io/aad-pod-identity/docs/configure/application_exception/
kubectl apply -f https://raw.githubusercontent.com/Azure/aad-pod-identity/master/deploy/infra/mic-exception.yaml
Create new managed identity using node resource group for the cluster:
az aks show --resource-group dalstonsemantics --name tfc-production --query nodeResourceGroup -otsv
az identity create --resource-group MC_dalstonsemantics_tfc-production_uksouth --name tfc
az identity show --resource-group MC_dalstonsemantics_tfc-production_uksouth --name tfc --query clientId -otsv
az identity show --resource-group MC_dalstonsemantics_tfc-production_uksouth --name tfc --query id -otsv
Create the identity and binding in the tfc namespace:
kubectl apply -f ./tfc-identity-manifest.yaml --namespace tfc
kubectl apply -f ./tfc-identity-binding-manifest.yaml --namespace tfc
kubectl create namespace tfc
Using version 1.0.0 of ingress controller to be compatible with the latest version of the Helm chart:
az acr import --name dalstonsemantics --source k8s.gcr.io/ingress-nginx/controller:v0.48.1 --image ingress-nginx/controller:v0.48.1
az acr import --name dalstonsemantics --source k8s.gcr.io/ingress-nginx/controller:v1.0.0 --image ingress-nginx/controller:v1.0.0
az acr import --name dalstonsemantics --source docker.io/jettech/kube-webhook-certgen:v1.5.1 --image jettech/kube-webhook-certgen:v1.5.1
az acr import --name dalstonsemantics --source k8s.gcr.io/defaultbackend-amd64:1.5 --image defaultbackend-amd64:1.5
az acr import --name dalstonsemantics --source quay.io/jetstack/cert-manager-controller:v1.8.0 --image jetstack/cert-manager-controller:v1.8.0
az acr import --name dalstonsemantics --source quay.io/jetstack/cert-manager-webhook:v1.8.0 --image jetstack/cert-manager-webhook:v1.8.0
az acr import --name dalstonsemantics --source quay.io/jetstack/cert-manager-cainjector:v1.8.0 --image jetstack/cert-manager-cainjector:v1.8.0
For the original version of the chart, use --version 3.34.0.
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install nginx-ingress ingress-nginx/ingress-nginx --namespace tfc --set controller.replicaCount=1 --set controller.nodeSelector."beta\.kubernetes\.io/os"=linux --set controller.image.registry=dalstonsemantics.azurecr.io --set controller.image.image=ingress-nginx/controller --set controller.image.tag=v1.0.0 --set controller.image.digest="" --set controller.admissionWebhooks.patch.nodeSelector."beta\.kubernetes\.io/os"=linux --set controller.admissionWebhooks.patch.image.registry=dalstonsemantics.azurecr.io --set controller.admissionWebhooks.patch.image.image=jettech/kube-webhook-certgen --set controller.admissionWebhooks.patch.image.tag=v1.5.1 --set controller.admissionWebhooks.patch.image.digest="" --set defaultBackend.nodeSelector."beta\.kubernetes\.io/os"=linux --set defaultBackend.image.registry=dalstonsemantics.azurecr.io --set defaultBackend.image.image=defaultbackend-amd64 --set defaultBackend.image.tag=1.5 --set controller.service.externalTrafficPolicy=Local
Monitor IP assignment:
kubectl --namespace tfc get services -o wide -w nginx-ingress-ingress-nginx-controller
Add allocated IP to the subdomain record:
az network dns record-set a add-record --resource-group dalstonsemantics --zone-name dalstonsemantics.com --record-set-name tfc --ipv4-address <ADDRESS>
kubectl label namespace tfc cert-manager.io/disable-validation=true
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager --namespace tfc --version v1.8.0 --set installCRDs=true --set nodeSelector."beta\.kubernetes\.io/os"=linux --set image.repository=dalstonsemantics.azurecr.io/jetstack/cert-manager-controller --set image.tag=v1.8.0 --set webhook.image.repository=dalstonsemantics.azurecr.io/jetstack/cert-manager-webhook --set webhook.image.tag=v1.8.0 --set cainjector.image.repository=dalstonsemantics.azurecr.io/jetstack/cert-manager-cainjector --set cainjector.image.tag=v1.8.0
No need to specify namespace, since we are using a cluster-wide issuer.
kubectl apply -f .\ci-manifest.yaml
helm --namespace tfc delete cert-manager
kubectl delete namespace cert-manager
kubectl delete -f https://github.com/cert-manager/cert-manager/releases/download/v1.3.1/cert-manager.crds.yaml
Certificate requests were stuck if old ClusterIssuer was not removed.
kubectl delete -f .\ci-manifest.yaml
StatefulSet: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
Rdf4j Server is deployed as a StatefulSet; Rdf4j Workbench is deployed as a Deployment with a single replica. Taxonomies for Confluence is responsible for sharding: we use an rdf4j repository per client and supply a list of servers with well-known names to tfc via Spring configuration on startup; during startup, repository pools are created and connections are added as clients connect.
To scale we need to:
- Add rdf4j-server replicas
- Shut down tfc replicas
- Update configuration to add new rdf4j-server to config
- Restart tfc with the required number of replicas
kubectl apply -f .\tfc-manifest.yaml --namespace tfc
kubectl scale statefulset rdf4j-server --namespace tfc --replicas=0
kubectl scale statefulset rdf4j-server --namespace tfc --replicas=1
kubectl scale deployment rdf4j-workbench --namespace tfc --replicas=0
kubectl scale deployment rdf4j-workbench --namespace tfc --replicas=1
kubectl scale deployment tfc --namespace tfc --replicas=0
kubectl scale deployment tfc --namespace tfc --replicas=1
We are using an Azure file share, and so need to use Azure Backup and Recovery Services: https://docs.microsoft.com/en-us/azure/backup/backup-afs. Backups run daily at 2am UTC (end of day US West Coast). Repositories are configured with ns:forceSync set to true to force a flush on updates.
Restore sequence:
- Shut down tfc
- Shut down rdf4j-server
- Restore
- Bring back rdf4j-server
- Bring back tfc
We only need two Tomcat users to start with, so we can simply store the entire tomcat-users.xml files as opaque Secrets.
These files are mounted into a new location, /usr/local/tomcat/users, in the rdf4j containers, and server.xml is modified accordingly in the Dockerfile to point to users/tomcat-users.xml.
kubectl create secret generic rdf4j-server-tomcat-users --from-file=tomcat-users=./rdf4j_data_server_users/tomcat-users.xml --namespace tfc
kubectl create secret generic rdf4j-workbench-tomcat-users --from-file=tomcat-users=./rdf4j_data_workbench_users/tomcat-users.xml --namespace tfc
When tfc needs to connect, we use a ConfigMap for host names and a Secret for usernames and passwords. Because rdf4j-server is a StatefulSet, we need host names in the form http://rdf4j-server-0.rdf4j-server.tfc.svc.cluster.local:8080/rdf4j-server.
kubectl create secret generic tfc --from-literal=tfc_host_repository_username=XXX --from-literal=tfc_host_repository_password=XXX --from-literal=tfc_taxonomy_repository_username=XXX --from-literal=tfc_taxonomy_repository_password=XXX --namespace=tfc
- Restrict access to the /installed and /uninstalled endpoints to the Atlassian IP ranges published at https://ip-ranges.atlassian.com/, using nginx.ingress.kubernetes.io/whitelist-source-range on the Ingress. Three different Ingress entities are created. For this to work we need to use --set controller.service.externalTrafficPolicy=Local when creating the ingress controller.
- Check that the baseUrl host ends with atlassian.net (perform the check synchronously as we are persisting the AtlassianHost; see the sketch after this list).
- Restrict the volume of requests to 20 requests per second per single IP (via Ingress configuration).
- Restrict taxonomy repository volume to max-size (check prior to uploading a taxonomy or indexing).
- Avoid storing personal information such as name and image. Instead, dynamically retrieve them from the Atlassian API.
- Initially provide simple federation on the server; if this causes too much delay, switch to the client retrieving data from the Atlassian API.
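A minimal sketch of the baseUrl check mentioned in the list above; the class and method names are hypothetical.

```java
import java.net.URI;

public class BaseUrlValidator {

    // Reject hosts that are not Atlassian Cloud sites before persisting the AtlassianHost.
    static boolean isAtlassianCloudBaseUrl(String baseUrl) {
        try {
            String host = URI.create(baseUrl).getHost();
            return host != null && host.endsWith(".atlassian.net");
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```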
- blog_created
- blog_trashed
- blog_restored
- blog_updated
- blog_removed
- content_created
- content_updated
- content_removed
- page_created
- page_trashed
- page_restored
- page_updated
- page_removed
The event for moving a page out of the Archive, page_moved, is not handled because an archived page is different from a trashed one.
For event payload see WEBHOOKS.md.
- Webhook documentation is under https://developer.atlassian.com/cloud/confluence/modules/webhook/
- excludeBody is false by default, so it is omitted from the descriptor.
- propertyKeys is only supported by Jira, so it is omitted from the descriptor.
- Moving to webhooks wholesale even if delivery is not reliable and history becomes unavailable. Pages are created for human point-in-time consumption. The audit use case can be addressed separately via audit logging from the UI.
- Page and blog updates are in-place.
- Trashing or archiving a page or a blog keeps it in place, just sets the status to "trashed"
- Removing the page or blog keeps the subject Statement triples with existing status, usually "current".
- Creation and update of page property triggers a fetch of content and setting of the subject Statement.
- Generally using accountId from the event to avoid extra REST calls to Atlassian
GET https://dalstonsemantics.atlassian.net/wiki/rest/api/content/{content-id}/version/{version}?expand=content.body.storage
Then parse
<ac:structured-macro ac:name=\"taxonomies-for-confluence-relation\" ac:schema-version=\"1\" ac:local-id=\"6ce014fa-97b1-49a8-8b19-6028a8254930\" ac:macro-id=\"c61d3942-378f-4011-b2c0-2a48e96378d6\"><ac:parameter ac:name=\"prefLabel\">Data base management systems</ac:parameter><ac:parameter ac:name=\"uri\">https://dalstonsemantics.com/reg/au/apra/cpg235/0165</ac:parameter></ac:structured-macro>
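A rough sketch of pulling the relation macro parameters out of the storage body with a regular expression (a proper XML parser would be more robust); it assumes the body has already been JSON-unescaped and that the prefLabel parameter precedes uri, as in the example above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RelationMacroParser {

    // Matches the relation macro in the storage format and captures its two parameters.
    private static final Pattern MACRO = Pattern.compile(
            "<ac:structured-macro[^>]*ac:name=\"taxonomies-for-confluence-relation\".*?"
            + "<ac:parameter ac:name=\"prefLabel\">(.*?)</ac:parameter>"
            + "<ac:parameter ac:name=\"uri\">(.*?)</ac:parameter>.*?</ac:structured-macro>",
            Pattern.DOTALL);

    public static void main(String[] args) {
        // Storage body as returned by the version endpoint, after JSON unescaping.
        String storage = "<ac:structured-macro ac:name=\"taxonomies-for-confluence-relation\" "
                + "ac:schema-version=\"1\" ac:local-id=\"...\" ac:macro-id=\"...\">"
                + "<ac:parameter ac:name=\"prefLabel\">Data base management systems</ac:parameter>"
                + "<ac:parameter ac:name=\"uri\">https://dalstonsemantics.com/reg/au/apra/cpg235/0165</ac:parameter>"
                + "</ac:structured-macro>";

        Matcher m = MACRO.matcher(storage);
        while (m.find()) {
            System.out.println(m.group(1) + " -> " + m.group(2));
        }
    }
}
```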
Ideally environments (Dev, Prod) are organized by subscription or resource group. We don't really have that luxury at this point. We got off to a wrong start with the existing naming, with the items below being in the same resource group:
- Kubernetes cluster is called tfc-production, with related Log Analytics workspace tfc-production and Recovery Services Vault tfc-production - this is production-specific compute.
- Key Vault is called tfc and it is common to all environments.
Environments can be thought of as mapping to Atlassian Host Product and Application Key. This way we can:
- Keep a single resource group with the "production environment" in Azure (forgetting about the naming issues for a second).
- Keep production-specific compute naming (tfc-production).
- Share infrastructure such as the container registry (dalstonsemantics), key vault (tfc) and storage service (tfcproduction, because tfc was not available).
- Where a distinction needs to be made, e.g. for messages and state changes, use filters and filter by application key.
- Single version of the Statement derived from page or blog post property (don’t try to create multiple prov statements, it is not sound as provenance and versioning are not the same thing). When updating, delete Statements written previously but keep the Agent.
- Single version of the Statement derived from macro, same as with page or blog post property
- No PROV statements generated for import, as we can't know the provenance beyond (unless we start importing from )
The approach we are taking to versioning is very simple:
- Assuming versions move forward
- We can calculate diff between the versions with basic CUD (Create, Update, Delete) - possibly using delta ontology as per our other work here: https://github.com/cadmiumkitty/streaming-linked-data-to-the-ui
- Move involves swapping versions and migrating pages (both macros and property)
- Assuming there is only one draft moving through the states of "draft"->"importing"->"calculating difference"->"draft"->"calculating content impact"->"awaiting switch to current"->"switching to current"->"current"->"historical"
- Versions are just a list with head. We don't keep version numbers, but rather a linked list, with head going through the states
- There are enough existing terms to complete migration without creating an extra bit of UI - using owl:deprecated as a substitute for complete deletion, and dcterms:replaces and dcterms:replacedBy for page migration (see the sketch after this list).
- When calculating impact we only consider "current" Statements about "current" pages. Pages in Trash will not be updated, and conflicts will need to be dealt with if and when the page is restored. This is to keep the counters consistent with the Taxonomy Admin and Taxonomy pages.
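A sketch of the migration triples described above, built with RDF4J; the helper name is hypothetical, and note that the standard DCMI terms are dcterms:replaces and dcterms:isReplacedBy.

```java
import org.eclipse.rdf4j.model.IRI;
import org.eclipse.rdf4j.model.Model;
import org.eclipse.rdf4j.model.ValueFactory;
import org.eclipse.rdf4j.model.impl.LinkedHashModel;
import org.eclipse.rdf4j.model.impl.SimpleValueFactory;

public class VersionMigrationTriples {

    private static final ValueFactory VF = SimpleValueFactory.getInstance();
    private static final IRI OWL_DEPRECATED = VF.createIRI("http://www.w3.org/2002/07/owl#deprecated");
    private static final IRI DCTERMS_IS_REPLACED_BY = VF.createIRI("http://purl.org/dc/terms/isReplacedBy");
    private static final IRI DCTERMS_REPLACES = VF.createIRI("http://purl.org/dc/terms/replaces");

    // Mark a concept from the old version as deprecated and point to its replacement,
    // so pages and macros can be migrated without extra UI.
    static Model migrationStatements(IRI oldConcept, IRI newConcept) {
        Model model = new LinkedHashModel();
        model.add(oldConcept, OWL_DEPRECATED, VF.createLiteral(true));
        model.add(oldConcept, DCTERMS_IS_REPLACED_BY, newConcept);
        model.add(newConcept, DCTERMS_REPLACES, oldConcept);
        return model;
    }
}
```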
State machine transitions are managed via command messages sent to a specific queue in Storage Queue. Filtering on the queues is not possible, so we need to settle on using the application key as part of the queue name to separate environments. Write to the version store synchronously, updating the status, then fire an event to finish processing.
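A minimal sketch of sending such a command message with the azure-storage-queue client, with the application key baked into the queue name; the queue name and message shape are assumptions for illustration.

```java
import com.azure.storage.queue.QueueClient;
import com.azure.storage.queue.QueueClientBuilder;

public class TaxonomyVersionCommands {

    private final QueueClient queueClient;

    // The queue name carries the application key so environments sharing the storage
    // account stay separated (queue-level filtering is not available).
    public TaxonomyVersionCommands(String connectionString, String applicationKey) {
        this.queueClient = new QueueClientBuilder()
                .connectionString(connectionString)
                .queueName("taxonomy-version-" + applicationKey)
                .buildClient();
        // The queue is assumed to already exist.
    }

    // Fire a command after the version store has been updated synchronously.
    public void sendTransition(String iss, String targetState) {
        queueClient.sendMessage("{\"iss\":\"" + iss + "\",\"targetState\":\"" + targetState + "\"}");
    }
}
```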
Existing work: https://www.w3.org/wiki/SkosCoreGuideToc/SectionVersioning and http://zbw.eu/labs/en/blog/skos-history-new-method-for-change-tracking-applied-to-stw-thesaurus-for-economics
docker scout cves dalstonsemantics.azurecr.io/rdf4j-server:4.2.4-tomcat-9.0.73-jdk17-temurin-jammy > rdf4j-server-4.2.4-tomcat-9.0.73-jdk17-temurin-jammy.txt
docker scout cves dalstonsemantics.azurecr.io/rdf4j-workbench:4.2.4-tomcat-9.0.73-jdk17-temurin-jammy > rdf4j-workbench-4.2.4-tomcat-9.0.73-jdk17-temurin-jammy.txt
docker scout cves dalstonsemantics.azurecr.io/tfc:4.16.0 > tfc-4.16.0.txt
org.owasp:dependency-check-maven is bound to the Maven validate phase.
The scanner is configured to disable the .NET, Node and Archive analysers, given we are not using any of them.
com.github.spotbugs:spotbugs-maven-plugin is bound to the Maven validate phase. We are excluding the CRLF_INJECTION_LOGS defect (Log CRLF injection) because it was addressed, yet the scanner can't pick the implementation up.
To view the results, use:
mvn spotbugs:gui
Release branch convention is release/<Major>.<Minor>.<Point>, such as release/1.13.0.
git tag -a v1.13.0 -m "Release 1.13.0"
git push origin v1.13.0
git push origin development:master
The initial release only includes these simple taxonomies:
- Australian and New Zealand Standard Industrial Classification: https://github.com/cadmiumkitty/anzsic-taxonomy/releases/download/v1.0.0/anzsic.ttl
- Policy Taxonomy: https://github.com/cadmiumkitty/policy-taxonomy/releases/download/v1.0.0/policy-taxonomy.ttl
- SDLC Document Types: https://github.com/cadmiumkitty/sdlc-document-types-taxonomy/releases/download/v1.0.0/sdlc-document-types-taxonomy.ttl
- TOGAF Architecture Repository Document Types: https://github.com/cadmiumkitty/togaf-architecture-repository-document-types-taxonomy/releases/download/v1.0.0/togaf-architecture-repository-document-types-taxonomy.ttl
- TOGAF Content Metamodel: https://github.com/cadmiumkitty/togaf-content-metamodel-ontology/releases/download/v2.0.1/VocabularyTOGAFContentMetamodelV2.ttl
Additional schemas and taxonomies (RDF and RDFS are redistributed on a URI I can control):
- RDF: https://github.com/cadmiumkitty/data-governance/releases/download/v2.0.0/22-rdf-syntax-ns.ttl
- RDFS: https://github.com/cadmiumkitty/data-governance/releases/download/v2.0.0/rdf-schema.ttl
- Schema.org: https://schema.org/version/latest/schemaorg-current-https.ttl
- Data Governance Schema and Taxonomy: https://github.com/cadmiumkitty/data-governance/releases/download/v2.0.0/data-governance.ttl
Use scheme URIs for the catalog records for now: pass these identifiers to the controller as URL-encoded scheme form parameters, hardcode the actual catalog, map to resources, and stick the resources into a common message that can be processed by the common resource importer.
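A hedged sketch of a controller accepting the scheme identifier as a URL-encoded parameter; the endpoint path and class name are hypothetical.

```java
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class CatalogImportController {

    // Hypothetical endpoint: the scheme URI identifying the catalog record arrives as a
    // URL-encoded form parameter and is mapped to a resource for the common importer.
    @PostMapping("/catalog/import")
    public void importScheme(@RequestParam("scheme") String schemeUri) {
        // Spring URL-decodes the parameter, so schemeUri is a plain IRI string here.
        // Map it against the hardcoded catalog and hand off to the resource importer.
    }
}
```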
Best Practice: When to Use rdfs:label Versus dcterms:title
Labels used across ConceptScheme, Concept and Content:
- ConceptScheme: skos:prefLabel, rdfs:label
- Concept: skos:prefLabel
- Content: dcterms:title
- Add RUN wget https://download-gcdn.ej-technologies.com/jprofiler/jprofiler_linux_13_0_1.tar.gz -P /tmp/ && tar -xzf /tmp/jprofiler_linux_13_0_1.tar.gz -C /usr/local && rm /tmp/jprofiler_linux_13_0_1.tar.gz and EXPOSE 8849 to the Dockerfile for rdf4j-server.
- Add 8849:8849 to compose.yml for rdf4j-server.
- Add -agentpath:/usr/local/jprofiler13.0.1/bin/linux-x64/libjprofilerti.so=port=8849 to variables-rdf4j-server-docker.env.
Parsing rules in order of implementation priority:
- Multiple properties of page as the resource - Table with no properties in the header of {Property in the first column} {Value in the second column}.
- Multiple resources with composite identifiers of page-table-rownumber - Table of {Property in the header} {Value in the cell}
- Multiple resources with custom identifiers in first column without a property
- Single property of page as the resource, outside of the tables where properties are in the first column or in the header - {Property} {Value - single word, multiple words in quotes with double quote escaping or single quotes, paragraph if at the start of the paragraph, macro ac:plain-text-body content if macro} (not currently implemented)
Use the content of any of the macro's ac:plain-text-body as the value, unless it is a resource macro, in which case pick up the URI.
- Initial implementation for multiple resources with composite identifiers is to remove all rdf:Statements that have either team:macroId or team:tableId properties set.
- Simple tables only (one level deep)
- Change the prefix for all statements created from a Confluence Page Property from https://tfc.dalstonsemantics.com/content/ to https://tfc.dalstonsemantics.com/property/
- Add team:macroId to all statements created from Confluence Page Macros
- Use the value of the relation property for dcterms:relation
- Use content identifier and version for the table of properties of the page (otherwise too much noise for not enough gain)
- Use content identifier and version for the table of resources (otherwise too much noise for not enough gain)
- Page names are unique within a space, so we can create a service that grabs an identifier that is globally unique
- Page links with the space key in them are:
<ac:link ac:card-appearance="inline"><ri:page ri:space-key="IUM" ri:content-title="Sample SPARQL Page" ri:version-at-save="1" /><ac:link-body>Sample SPARQL Page</ac:link-body></ac:link>
- Page links without the space key in them are:
<ac:link ac:card-appearance="inline"><ri:page ri:content-title="Sample SPARQL Page" ri:version-at-save="1" /><ac:link-body>Sample SPARQL Page</ac:link-body></ac:link>
- Need to also deal with links like <a href="#001"> and <a href="https://emorozov.atlassian.net/wiki/spaces/TFC/blog/2022/12/27/13991937/Sample+Blog#001"> and <a href="https://emorozov.atlassian.net/wiki/spaces/TFC/pages/23101443/Data+Steward#001">
- Statements are always created and re-created based on a specific content id. We control statement URIs.
- IRIs can be used as subject (in the identifier column) or as object (for a given predicate)
- Stable identifiers have the namespace of content (https://tfc.dalstonsemantics.com/content/)
- Only the reference/anchor suffix is supported for stable identifiers
- <a href="#001"> is the easiest case - both as subject and object, as we simply use the resource IRI without creating any additional content records
- For <a href="https://emorozov.atlassian.net/wiki/spaces/TFC/blog/2022/12/27/13991937/Sample+Blog#001"> and <a href="https://emorozov.atlassian.net/wiki/spaces/TFC/pages/23101443/Data+Steward#001"> we can simply state that it is the responsibility of the user to create these. We'll translate to what the resource identifiers would have been and use them as either subject (may look weird and can screw the results in materialization) or object (see the sketch after this list)
- <a href="https://google.com"> is also trivial. We don't need to create identifiers from scratch and can simply create Statements where these IRIs are either subject or object.
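A sketch of how these hrefs could be resolved to IRIs along the lines above; the exact identifier shape (content id plus anchor suffix) and the method names are assumptions, not the actual implementation.

```java
import java.net.URI;

public class StableIdentifierResolver {

    private static final String CONTENT_NS = "https://tfc.dalstonsemantics.com/content/";

    // Resolve an href from the storage format to the IRI used in Statements. A bare "#001"
    // anchor resolves against the page being parsed; absolute Confluence page/blog links are
    // translated to what the target's content identifier would have been; anything else is
    // used directly as an external IRI.
    static String resolve(String href, String iss, String contentId) {
        if (href.startsWith("#")) {
            return CONTENT_NS + iss + "-" + contentId + "-" + href.substring(1);
        }
        URI uri = URI.create(href);
        if (uri.getHost() != null && uri.getHost().endsWith(".atlassian.net") && uri.getFragment() != null) {
            // Both page and blog URLs carry the numeric content id as the second-to-last path segment.
            String[] segments = uri.getPath().split("/");
            String targetContentId = segments[segments.length - 2];
            return CONTENT_NS + iss + "-" + targetContentId + "-" + uri.getFragment();
        }
        return href; // e.g. https://google.com, used as subject or object as-is
    }
}
```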
- Delete content (but don't delete linked content)
- Delete and add content that is linked (this is the last we heard about it)
- Add agents that are linked (their content will never change)
- Add content and its statements
- We are not adding provenance to content and agents, provenance is only added to statements.
- Separate tabs for Classes and Properties, so that we can always show top classes and top properties and create different stats
- Use isDefinedBy to identify defining resources, and calculate stats similar to Concept Schemes
- Calculate stats for all classes and properties where no isDefinedBy is used
- Create extra item in the tree to show classes and properties where no isDefinedBy is used
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX team: <https://dalstonsemantics.com/ns/com/atlassian/>
PREFIX anzsic: <https://dalstonsemantics.com/ns/au/gov/abs/anzsic/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?content_id ?account_id
WHERE {
?page a team:page ;
team:contentId ?content_id.
?agent a prov:Agent ;
team:accountId ?account_id.
?page ?property ?agent .
}
https://emorozov.atlassian.net/wiki/rest/api/content/25460737/version/2?expand=content.body.storage