Graph Versioning Implementation Plan
Version: 1.2
Date: 2026-01-07
Author: Implementation Plan (AI-assisted)
This document outlines the implementation plan for adding GitHub-based RDF graph versioning to LinkedDataHub with Memento protocol support, as specified in versioning.md.
Design Decisions:
- ✅ Per-dataspace RDF configuration (in `config/system.trig`)
- ✅ Asynchronous background commits (non-blocking)
- ✅ Best-effort versioning (degrades gracefully if GitHub is unavailable)
- ✅ Basic versioning first, full Memento protocol in Phase 3
- ✅ Custom Jersey-based GitHub client (no external dependencies)
Decision: Custom Jersey Client Implementation
Why roll our own instead of using a library (e.g., kohsuke:github-api):
✅ Zero new dependencies
- LinkedDataHub already has JAX-RS Jersey Client fully configured
- No additional library bloat (~1MB+ for external GitHub libraries)
- One less dependency to maintain/update/audit
✅ Perfect architectural fit
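- Reuse existing HTTP client infrastructure (`noCertClient`). As its name suggests, that client skips TLS certificate checks, so requests that carry the GitHub token should use a certificate-validating client
- Leverage existing retry logic patterns (see `GraphStoreClient.java`)
- Integration with LinkedDataHub's logging/monitoring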
- Consistent error handling across the codebase
✅ Minimal GitHub API usage
- We only need 4 REST endpoints (`PUT`/`GET`/`DELETE` file, `GET` commits)
- External libraries provide 100+ methods we don't need
- GitHub REST API is straightforward JSON over HTTP
✅ Already have JSON processing
- LinkedDataHub uses `jersey-media-json-processing`
- JSON-P (`JsonObject`/`JsonArray`) is enough to read GitHub API responses, so no POJO mapping layer is needed
- No new serialization framework needed
✅ Educational value & maintainability
- Clear understanding of exactly what's happening
- No "magic" from external library
- Easier debugging and troubleshooting
- Future-proof against library deprecation
Trade-off:
- More code to write (~350-400 lines vs ~200 with library)
- Need to handle GitHub API specifics (Base64 encoding, SHA requirements, pagination)
Conclusion: For our limited use case (4 endpoints), a custom client is lighter, cleaner, and more maintainable.
Maven Dependencies: no new runtime dependencies are required. All functionality uses existing dependencies (test-only tools such as WireMock would be test-scoped additions).
```text
┌──────────────────────────────────────────────────────────┐
│ LinkedDataHub                                            │
│                                                          │
│  ┌──────────────┐       ┌──────────────────┐             │
│  │ Graph        │──┬───▶│ VersioningFilter │             │
│  │ (JAX-RS)     │  │    └────────┬─────────┘             │
│  └──────┬───────┘  │             │ async                 │
│         │          │             ▼                       │
│         ▼          │    ┌──────────────────┐             │
│  ┌──────────────┐  └───▶│ GraphVersioning  │             │
│  │ GraphStore   │       │ Service          │             │
│  │ Client       │       └────────┬─────────┘             │
│  └──────┬───────┘                │                       │
│         │                        ▼                       │
│         │               ┌──────────────────┐             │
│         │               │ GitHubClient     │             │
│         │               │ (Jersey-based)   │             │
│         ▼               └────────┬─────────┘             │
│  ┌──────────────┐                │                       │
│  │ Fuseki       │                │                       │
│  │ Triplestore  │                │                       │
│  └──────────────┘                │                       │
└──────────────────────────────────┼───────────────────────┘
                                   │
                                   ▼
                          ┌─────────────────┐
                          │ GitHub Repo     │
                          │ (graphs/*.nt)   │
                          └─────────────────┘
```
Graph Modification (PUT/POST/PATCH/DELETE):
- Request → `Graph.java` → `GraphStoreClient` → Fuseki (write to triplestore)
- Response → `VersioningFilter` (intercepts)
- Filter checks whether the dataspace has versioning enabled
- If enabled: submit an async task to `GraphVersioningService`
- Service: serialize Model → N-Triples → GitHub commit
- Return `HTTP 200/201` (doesn't wait for GitHub)
Historical Retrieval (GET with ?version=sha):
- Request → `Graph.java` detects the version parameter
- Call `GraphVersioningService.getGraphAtCommit()`
- Fetch from GitHub → Parse N-Triples → Return Model
File: pom.xml
No changes needed. All required dependencies already present:
- ✅ Jersey Client API (`org.glassfish.jersey.core:jersey-client`)
- ✅ JSON Processing (`org.glassfish.jersey.media:jersey-media-json-processing`)
- ✅ Apache Jena for RDF (`org.apache.jena:jena-arq`)
Effort: 0 hours
File: src/main/java/com/atomgraph/linkeddatahub/vocabulary/LAPP.java
Add versioning-related properties:
```java
public static final Property versioningRepository = property("versioningRepository");
public static final Property branch = property("branch");         // For doap:GitRepository
public static final Property pathPrefix = property("pathPrefix"); // For doap:GitRepository
```
Design Decision:
- Use `doap:GitRepository` for repository metadata (DOAP vocabulary at `http://usefulinc.com/ns/doap#`)
- The application links to the repository via the `lapp:versioningRepository` property
- The repository has the standard `doap:location` plus custom `lapp:branch` and `lapp:pathPrefix`
- Benefits: standard vocabulary, reusable repository resources, cleaner RDF structure
Effort: 1 hour
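The GitHub API addresses repositories by owner and name, while `doap:location` holds the repository's web URL. The sketch below shows that parsing step. It assumes locations of the form `https://github.com/{owner}/{repo}`, optionally ending in `.git` or a trailing slash. `GitHubRepoRef` is a hypothetical helper, not an existing LinkedDataHub class:

```java
import java.net.URI;

/** Hypothetical helper: the owner/repo pair the GitHub API needs, parsed from doap:location. */
final class GitHubRepoRef {
    final String owner;
    final String repo;

    private GitHubRepoRef(String owner, String repo) {
        this.owner = owner;
        this.repo = repo;
    }

    static GitHubRepoRef parse(String location) {
        URI uri = URI.create(location);
        if (!"github.com".equalsIgnoreCase(uri.getHost())) {
            throw new IllegalArgumentException("Not a github.com URL: " + location);
        }
        // "/AtomGraph/ldh-graphs.git/" -> "AtomGraph/ldh-graphs"
        String path = uri.getPath().replaceAll("^/+|/+$", "").replaceAll("\\.git$", "");
        String[] parts = path.split("/");
        if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalArgumentException("Expected https://github.com/{owner}/{repo}: " + location);
        }
        return new GitHubRepoRef(parts[0], parts[1]);
    }
}
```

For `https://github.com/AtomGraph/ldh-graphs`, this yields owner `AtomGraph` and repo `ldh-graphs`. Rejecting other hosts early surfaces a misconfigured `doap:location` at startup rather than on the first commit.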
File: config/system.trig
Example Configuration:
```turtle
@prefix doap: <http://usefulinc.com/ns/doap#> .

<urn:linkeddatahub:apps/end-user> a lapp:Application, lapp:EndUserApplication ;
    dct:title "LinkedDataHub" ;
    lapp:origin <https://localhost:4443> ;
    ldt:ontology <https://localhost:4443/ns#> ;
    ldt:service <urn:linkeddatahub:services/end-user> ;
    ac:stylesheet <static/xsl/layout.xsl> ;
    lapp:adminApplication <urn:linkeddatahub:apps/admin> ;
    lapp:frontendProxy <http://varnish-frontend:6060/> ;
    lapp:public true ;
    # Versioning configuration - links to repository resource
    lapp:versioningRepository <urn:linkeddatahub:versioning/graphs-repo> .

# Separate repository resource using DOAP vocabulary
<urn:linkeddatahub:versioning/graphs-repo> a doap:GitRepository ;
    doap:location <https://github.com/AtomGraph/ldh-graphs> ;
    lapp:branch "main" ;
    lapp:pathPrefix "graphs" .
```
GitHub Token Configuration:
Since tokens shouldn't be in RDF, use one of:
Option A: Environment Variable (recommended)
```shell
export GITHUB_TOKEN="ghp_xxxxxxxxxxxxx"
```
Option B: Java System Property
```text
-Dgithub.token=ghp_xxxxxxxxxxxxx
```
Option C: Encrypted in RDF (advanced; requires defining `lapp:versioningToken` and keeping the decryption key outside RDF)
```turtle
lapp:versioningToken "encrypted:base64encodedvalue"
```
Effort: 1 hour (documentation)
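Options A and B can be combined into one lookup that prefers the environment variable and falls back to the system property. The sketch below assumes a blank value counts as "not configured". `GitHubTokens.resolve()` is a hypothetical helper. It takes the environment and the properties as parameters, so it can be tested without touching the real process environment:

```java
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

/** Hypothetical token lookup: GITHUB_TOKEN first, then -Dgithub.token. */
final class GitHubTokens {
    static final String ENV_VAR = "GITHUB_TOKEN";
    static final String SYSTEM_PROPERTY = "github.token";

    private GitHubTokens() {}

    static Optional<String> resolve(Map<String, String> env, Properties props) {
        return Optional.ofNullable(env.get(ENV_VAR))
            .filter(token -> !token.isBlank())
            .or(() -> Optional.ofNullable(props.getProperty(SYSTEM_PROPERTY))
                .filter(token -> !token.isBlank()));
    }
}
```

At runtime, the call would be `GitHubTokens.resolve(System.getenv(), System.getProperties())`. An empty result means versioning stays disabled.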
File: src/main/java/com/atomgraph/linkeddatahub/client/GitHubClient.java
```java
package com.atomgraph.linkeddatahub.client;

import jakarta.json.Json;
import jakarta.json.JsonArray;
import jakarta.json.JsonObject;
import jakarta.json.JsonObjectBuilder;
import jakarta.ws.rs.client.Client;
import jakarta.ws.rs.client.Entity;
import jakarta.ws.rs.client.Invocation;
import jakarta.ws.rs.client.WebTarget;
import jakarta.ws.rs.core.Response;
import org.glassfish.jersey.client.ClientProperties;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.time.format.DateTimeFormatter;
import java.util.Base64;
import java.util.Date;
import java.util.Optional;

/**
 * Custom Jersey-based GitHub API client for graph versioning.
 * Implements only the 4 REST endpoints we need.
 */
public class GitHubClient {

    private static final Logger log = LoggerFactory.getLogger(GitHubClient.class);
    private static final String DEFAULT_API_BASE = "https://api.github.com";
    private static final String ACCEPT_HEADER = "application/vnd.github+json";

    private final Client httpClient;
    private final String apiBase;
    private final String token;
    private final String owner;
    private final String repo;
    private final String branch;

    public GitHubClient(Client httpClient, String token, String owner,
                        String repo, String branch) {
        this(httpClient, DEFAULT_API_BASE, token, owner, repo, branch);
    }

    /** Overridable API base URL, e.g. to point unit tests at a mock server. */
    public GitHubClient(Client httpClient, String apiBase, String token, String owner,
                        String repo, String branch) {
        this.httpClient = httpClient;
        this.apiBase = apiBase;
        this.token = token;
        this.owner = owner;
        this.repo = repo;
        this.branch = branch;
    }

    /** Authenticated request to /repos/{owner}/{repo}/contents/{path}, optionally at a given ref. */
    private Invocation.Builder contents(String path, String ref) {
        WebTarget target = httpClient.target(apiBase)
            .path("repos/{owner}/{repo}/contents/{path}")
            .resolveTemplate("owner", owner)
            .resolveTemplate("repo", repo)
            // false: keep the "/" separators of the file path instead of encoding them as %2F
            .resolveTemplate("path", path, false);
        if (ref != null) {
            target = target.queryParam("ref", ref);
        }
        return authorized(target);
    }

    private Invocation.Builder authorized(WebTarget target) {
        return target.request(ACCEPT_HEADER)
            .header("Authorization", "Bearer " + token);
    }

    /**
     * PUT /repos/{owner}/{repo}/contents/{path}
     * Create or update a file.
     *
     * @param path File path (e.g., "graphs/data/products.nt")
     * @param content Raw file content
     * @param message Commit message
     * @return Commit SHA (the file's new blob SHA is in the response's content.sha)
     */
    public String putFile(String path, byte[] content, String message) {
        JsonObjectBuilder body = Json.createObjectBuilder()
            .add("message", message)
            .add("content", Base64.getEncoder().encodeToString(content)) // GitHub requires Base64
            .add("branch", branch);
        // Updating an existing file requires its current blob SHA
        getFileSha(path).ifPresent(sha -> body.add("sha", sha));

        try (Response response = contents(path, null).put(Entity.json(body.build()))) {
            int status = response.getStatus();
            if (status == 200 || status == 201) { // 200 = updated, 201 = created
                String commitSha = response.readEntity(JsonObject.class)
                    .getJsonObject("commit").getString("sha");
                log.info("GitHub file {} committed: {}", path, commitSha);
                return commitSha;
            }
            log.error("GitHub API error {} for {}: {}", status, path, response.readEntity(String.class));
            throw new RuntimeException("GitHub API error: " + status);
        }
    }

    /**
     * GET /repos/{owner}/{repo}/contents/{path}?ref={branch}
     * Returns the file's current blob SHA (needed for updates and deletes), or empty if it doesn't exist.
     */
    public Optional<String> getFileSha(String path) {
        try (Response response = contents(path, branch).get()) {
            if (response.getStatus() == 200) {
                return Optional.of(response.readEntity(JsonObject.class).getString("sha"));
            }
            if (response.getStatus() == 404) {
                return Optional.empty(); // File doesn't exist yet
            }
            // Don't report rate limiting or server errors as "file not found"
            throw new RuntimeException("GitHub API error " + response.getStatus() + " reading " + path);
        }
    }

    /**
     * GET /repos/{owner}/{repo}/contents/{path}?ref={commitSha}
     * Get file content at a specific commit.
     */
    public byte[] getFileAtCommit(String path, String commitSha) {
        try (Response response = contents(path, commitSha).get()) {
            if (response.getStatus() != 200) {
                throw new RuntimeException("Failed to get " + path + " at commit " + commitSha
                    + ": HTTP " + response.getStatus());
            }
            // "content" is Base64 with embedded line breaks; the MIME decoder skips them.
            // Per GitHub's documentation, files over 1 MB are only served via the raw or object media types.
            String encodedContent = response.readEntity(JsonObject.class).getString("content");
            return Base64.getMimeDecoder().decode(encodedContent);
        }
    }

    /**
     * GET /repos/{owner}/{repo}/commits?sha={branch}&path={path}&until={date}
     * Find the latest commit of a file at or before the given datetime.
     */
    public Optional<String> findCommitAtDatetime(String path, Date datetime) {
        String until = DateTimeFormatter.ISO_INSTANT.format(datetime.toInstant()); // ISO 8601
        WebTarget target = httpClient.target(apiBase)
            .path("repos/{owner}/{repo}/commits")
            .resolveTemplate("owner", owner)
            .resolveTemplate("repo", repo)
            .queryParam("sha", branch)   // search the configured branch, not only the default one
            .queryParam("path", path)
            .queryParam("until", until)
            .queryParam("per_page", 1);  // commits are listed newest first, so one is enough
        try (Response response = authorized(target).get()) {
            if (response.getStatus() != 200) {
                log.error("Failed to find commit of {} at {}: HTTP {}", path, until, response.getStatus());
                return Optional.empty();
            }
            JsonArray commits = response.readEntity(JsonArray.class);
            return commits.isEmpty()
                ? Optional.empty()
                : Optional.of(commits.getJsonObject(0).getString("sha"));
        }
    }

    /**
     * DELETE /repos/{owner}/{repo}/contents/{path}
     * Delete a file.
     */
    public void deleteFile(String path, String message) {
        Optional<String> sha = getFileSha(path); // DELETE requires the current blob SHA
        if (sha.isEmpty()) {
            log.warn("Cannot delete file {} - not found", path);
            return;
        }
        JsonObject body = Json.createObjectBuilder()
            .add("message", message)
            .add("sha", sha.get())
            .add("branch", branch)
            .build();
        try (Response response = contents(path, null)
                // Jersey rejects a request body on DELETE unless HTTP compliance validation is suppressed
                .property(ClientProperties.SUPPRESS_HTTP_COMPLIANCE_VALIDATION, true)
                .method("DELETE", Entity.json(body))) {
            if (response.getStatus() != 200) {
                log.error("Failed to delete file {}: HTTP {}", path, response.getStatus());
            }
        }
    }
}
```
Effort: 14 hours (including error handling, Base64 encoding/decoding, testing)
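GitHub's contents API carries file bodies as Base64 in both directions. The `content` field of a GET response is also wrapped across several lines. The JDK-only sketch below shows the two conversions in isolation; `ContentsCodec` is a hypothetical name:

```java
import java.util.Base64;

/** Hypothetical codec for the "content" field of GitHub's contents API. */
final class ContentsCodec {
    private ContentsCodec() {}

    /** Encodes file bytes for the request body of PUT /repos/{owner}/{repo}/contents/{path}. */
    static String encode(byte[] content) {
        return Base64.getEncoder().encodeToString(content);
    }

    /** Decodes a GET response's "content"; the MIME decoder ignores the embedded line breaks. */
    static byte[] decode(String content) {
        return Base64.getMimeDecoder().decode(content);
    }
}
```

`Base64.getMimeDecoder()` ignores any character outside the Base64 alphabet. Using it is therefore equivalent to stripping `\n` and `\r` before calling the basic decoder.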
File: src/main/java/com/atomgraph/linkeddatahub/server/service/GraphVersioningService.java
```java
package com.atomgraph.linkeddatahub.server.service;

import com.atomgraph.linkeddatahub.apps.model.Application;
import com.atomgraph.linkeddatahub.client.GitHubClient;
import com.atomgraph.linkeddatahub.vocabulary.DOAP;
import com.atomgraph.linkeddatahub.vocabulary.LAPP;
import jakarta.inject.Inject;
import jakarta.inject.Named;
import jakarta.ws.rs.client.Client;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.rdf.model.Statement;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFDataMgr;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.util.Date;
import java.util.Optional;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Service for versioning RDF graphs to GitHub.
 */
public class GraphVersioningService {

    private static final Logger log = LoggerFactory.getLogger(GraphVersioningService.class);

    private final GitHubClient githubClient; // null when versioning is disabled
    private final URI baseUri;
    private final String pathPrefix;
    private final ExecutorService executor;

    @Inject
    public GraphVersioningService(Application application, Client httpClient,
                                  @Named("githubToken") String githubToken) {
        // Read repository configuration from the linked doap:GitRepository resource
        Resource repoResource = application.getPropertyResourceValue(LAPP.versioningRepository);
        if (repoResource == null || githubToken == null || githubToken.isBlank()) {
            log.info("Versioning disabled: no repository or no GitHub token configured");
            this.githubClient = null;
            this.baseUri = null;
            this.pathPrefix = null;
            this.executor = null;
            return;
        }

        // doap:location is an IRI, so read it as a resource (Statement.getString() would throw)
        String location = repoResource.getPropertyResourceValue(DOAP.location).getURI();
        // Extract owner/repo from https://github.com/{owner}/{repo}, tolerating ".git" and a trailing slash
        String[] parts = location.replaceFirst("^https://github\\.com/", "")
            .replaceFirst("/$", "")
            .replaceFirst("\\.git$", "")
            .split("/");
        String owner = parts[0];
        String repoName = parts[1];
        String branch = stringOrDefault(repoResource, LAPP.branch, "main");
        this.pathPrefix = stringOrDefault(repoResource, LAPP.pathPrefix, "graphs");

        // Graph URIs are relative to the application's lapp:origin, not its urn:linkeddatahub:apps/... URI
        // (assumes LAPP.origin is defined for the lapp:origin property used in system.trig)
        this.baseUri = URI.create(application.getPropertyResourceValue(LAPP.origin).getURI());

        // Initialize custom Jersey-based GitHub client
        this.githubClient = new GitHubClient(httpClient, githubToken, owner, repoName, branch);
        // Background executor for async commits. Note: two commits of the same graph can run
        // concurrently here and race for the file SHA, in which case GitHub rejects one of them.
        this.executor = Executors.newFixedThreadPool(4);
    }

    private static String stringOrDefault(Resource resource, Property property, String defaultValue) {
        Statement stmt = resource.getProperty(property);
        return stmt != null ? stmt.getString() : defaultValue;
    }

    /** Check if versioning is enabled for this application. */
    public boolean isVersioningEnabled() {
        return githubClient != null; // Enabled if repository and token are configured
    }

    /**
     * Map graph URI to file path.
     * Example: https://localhost:4443/data/products/ → graphs/data/products.nt
     */
    private String graphUriToPath(URI graphUri) {
        URI relative = baseUri.relativize(graphUri);
        if (relative.isAbsolute()) { // relativize() returns graphUri unchanged if it is not under baseUri
            throw new IllegalArgumentException("Graph " + graphUri + " is not under " + baseUri);
        }
        // Remove trailing slash
        String relativePath = relative.getPath().replaceAll("/$", "");
        return pathPrefix + "/" + relativePath + ".nt";
    }

    /** Commit a graph version asynchronously. */
    public void commitGraphVersionAsync(URI graphUri, Model model, String message) {
        if (!isVersioningEnabled()) {
            log.debug("Versioning not enabled for application");
            return;
        }
        // Snapshot the model: the request's model may change or be closed after the response is sent
        Model snapshot = ModelFactory.createDefaultModel().add(model);
        executor.submit(() -> {
            try {
                commitGraphVersion(graphUri, snapshot, message);
            } catch (Exception e) {
                log.error("Failed to commit graph version for {}", graphUri, e);
                // TODO: Add to retry queue
            }
        });
    }

    /** Commit a graph version synchronously. */
    private String commitGraphVersion(URI graphUri, Model model, String message) {
        // Serialize to N-Triples
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        RDFDataMgr.write(baos, model, Lang.NTRIPLES);
        String path = graphUriToPath(graphUri);
        String commitSha = githubClient.putFile(path, baos.toByteArray(), message);
        log.info("Committed graph {} to GitHub at {}", graphUri, commitSha);
        return commitSha;
    }

    /** Retrieve graph at a specific commit. */
    public Optional<Model> getGraphAtCommit(URI graphUri, String commitSha) {
        if (!isVersioningEnabled()) return Optional.empty();
        try {
            byte[] content = githubClient.getFileAtCommit(graphUriToPath(graphUri), commitSha);
            // Parse N-Triples back to Model
            Model model = ModelFactory.createDefaultModel();
            RDFDataMgr.read(model, new ByteArrayInputStream(content), Lang.NTRIPLES);
            return Optional.of(model);
        } catch (Exception e) {
            log.error("Failed to retrieve graph {} at commit {}", graphUri, commitSha, e);
            return Optional.empty();
        }
    }

    /** Retrieve graph at a specific datetime (Memento). */
    public Optional<Model> getGraphAtDatetime(URI graphUri, Date datetime) {
        if (!isVersioningEnabled()) return Optional.empty();
        try {
            Optional<String> commitSha = githubClient.findCommitAtDatetime(graphUriToPath(graphUri), datetime);
            return commitSha.flatMap(sha -> getGraphAtCommit(graphUri, sha));
        } catch (Exception e) {
            log.error("Failed to find commit of {} at {}", graphUri, datetime, e);
            return Optional.empty();
        }
    }

    /** Shutdown executor on application shutdown. */
    public void shutdown() {
        if (executor != null) {
            executor.shutdown();
        }
    }
}
```
Effort: 16 hours (including integration, testing, error handling)
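The graph-URI-to-file-path rule behind `graphUriToPath()` can be tested in isolation. The sketch below assumes graph URIs sit under the application's origin. It also rejects URIs outside that origin, which `URI.relativize()` would otherwise return unchanged and absolute. `GraphPaths` is a hypothetical name:

```java
import java.net.URI;

/** Hypothetical, dependency-free version of the graph URI → repository path mapping. */
final class GraphPaths {
    private GraphPaths() {}

    /** Maps e.g. https://localhost:4443/data/products/ to graphs/data/products.nt */
    static String toPath(URI base, URI graph, String prefix) {
        URI relative = base.relativize(graph);
        if (relative.isAbsolute()) { // graph is not under base
            throw new IllegalArgumentException(graph + " is not under " + base);
        }
        String path = relative.getPath().replaceAll("/$", ""); // drop a container's trailing slash
        return prefix + "/" + path + ".nt";
    }
}
```

A container and a child document map to sibling entries: `graphs/data/products.nt` and `graphs/data/products/item1.nt`. Git stores both side by side without clashing.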
File: src/main/java/com/atomgraph/linkeddatahub/server/filter/response/VersioningFilter.java
```java
package com.atomgraph.linkeddatahub.server.filter.response;

import com.atomgraph.linkeddatahub.resource.Graph;
import com.atomgraph.linkeddatahub.server.service.GraphVersioningService;
import jakarta.annotation.Priority;
import jakarta.inject.Inject;
import jakarta.ws.rs.HttpMethod;
import jakarta.ws.rs.Priorities;
import jakarta.ws.rs.container.ContainerRequestContext;
import jakarta.ws.rs.container.ContainerResponseContext;
import jakarta.ws.rs.container.ContainerResponseFilter;
import jakarta.ws.rs.core.Response;
import jakarta.ws.rs.ext.Provider;
import org.apache.jena.rdf.model.Model;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.net.URI;
import java.security.Principal;
import java.time.Instant;
import java.util.Set;

/**
 * Filter that intercepts graph modifications and commits versions to GitHub.
 */
@Provider
@Priority(Priorities.USER + 400)
public class VersioningFilter implements ContainerResponseFilter {

    private static final Logger log = LoggerFactory.getLogger(VersioningFilter.class);
    private static final Set<String> MODIFYING_METHODS =
        Set.of(HttpMethod.PUT, HttpMethod.POST, HttpMethod.PATCH, HttpMethod.DELETE);

    @Inject
    private GraphVersioningService versioningService;

    @Override
    public void filter(ContainerRequestContext request, ContainerResponseContext response) {
        // Only process successful modifications of Graph resources
        if (!isGraphResource(request) || !isSuccessfulModification(request, response)) {
            return;
        }

        try {
            URI graphUri = request.getUriInfo().getAbsolutePath();
            String method = request.getMethod();

            if (HttpMethod.DELETE.equals(method)) {
                // A DELETE carries no model; versioning of deletions (GitHubClient.deleteFile()) is not wired up yet
                log.debug("Versioning of DELETE {} not implemented yet", graphUri);
                return;
            }

            // Model stored as a request property by Graph.java during PUT/POST/PATCH
            Model model = (Model) request.getProperty("ldh.graph.model");
            if (model == null) {
                log.warn("No model found in request properties for {}", graphUri);
                return;
            }

            // Instant.toString() is ISO 8601 in UTC, unlike Date.toString()
            String message = String.format("%s %s by %s at %s",
                method, graphUri, getAuthenticatedAgent(request), Instant.now());

            // Commit asynchronously; the response does not wait for GitHub
            versioningService.commitGraphVersionAsync(graphUri, model, message);

            // Phase 3: Memento Link headers. The commit SHA is not known at this point
            // because the commit runs asynchronously.
        } catch (Exception e) {
            // Don't fail the response - versioning is best-effort
            log.error("Error in versioning filter", e);
        }
    }

    private boolean isGraphResource(ContainerRequestContext request) {
        // Check if the matched resource is the Graph class
        return request.getUriInfo().getMatchedResources().stream()
            .anyMatch(r -> r instanceof Graph);
    }

    private boolean isSuccessfulModification(ContainerRequestContext request,
                                             ContainerResponseContext response) {
        return MODIFYING_METHODS.contains(request.getMethod())
            && response.getStatusInfo().getFamily() == Response.Status.Family.SUCCESSFUL;
    }

    private String getAuthenticatedAgent(ContainerRequestContext request) {
        // TODO: use LinkedDataHub's AgentContext once the auth integration is decided
        Principal principal = request.getSecurityContext() != null
            ? request.getSecurityContext().getUserPrincipal() : null;
        return principal != null ? principal.getName() : "system";
    }
}
```
Effort: 12 hours (including integration with Graph.java to pass Model)
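Commit messages are easier to compare across deployments when the timestamp is an ISO 8601 UTC instant. `Date.toString()` uses the server's default time zone instead. The sketch below produces the same message layout. `CommitMessages.format()` is a hypothetical helper, and its `Clock` parameter exists only so the output can be tested:

```java
import java.net.URI;
import java.time.Clock;
import java.time.format.DateTimeFormatter;

/** Hypothetical formatter for "{METHOD} {graph} by {agent} at {ISO 8601 instant}" commit messages. */
final class CommitMessages {
    private CommitMessages() {}

    static String format(String method, URI graphUri, String agent, Clock clock) {
        return String.format("%s %s by %s at %s", method, graphUri, agent,
            DateTimeFormatter.ISO_INSTANT.format(clock.instant()));
    }
}
```

In production, the filter would pass `Clock.systemUTC()`.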
File: src/main/java/com/atomgraph/linkeddatahub/Application.java
Add to the HK2 binder configuration:
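```java
// In Application.java constructor, add to the AbstractBinder
// (imports: org.glassfish.hk2.api.Factory, jakarta.inject.Singleton)
bind(GraphVersioningService.class)
    .to(GraphVersioningService.class)
    .in(Singleton.class);
// GraphVersioningService also needs Application and jakarta.ws.rs.client.Client to be injectable.
// If Application is resolved per request (one per dataspace), a Singleton only ever sees the
// dataspace it was first created for; per-dataspace versioning then needs a per-application lookup.

// GitHub token from environment variable, falling back to the github.token system property
bindFactory(new Factory<String>() {
    @Override
    public String provide() {
        String token = System.getenv("GITHUB_TOKEN");
        if (token == null) {
            token = System.getProperty("github.token");
        }
        if (token == null) {
            log.warn("No GitHub token configured - versioning will be disabled");
            return ""; // GraphVersioningService must treat a blank token as "versioning disabled"
        }
        return token;
    }

    @Override
    public void dispose(String instance) {}
})
.to(String.class)
.named("githubToken"); // inject with @Named("githubToken")
```
Effort: 4 hours (testing DI, handling edge cases)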
File: src/main/java/com/atomgraph/linkeddatahub/resource/Graph.java
Add model to request attributes after successful operations:
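```java
// UriInfo does not expose the request context; this assumes Graph gets it injected, e.g.
// @Context ContainerRequestContext requestContext;
// In put() method, after model is persisted:
requestContext.setProperty("ldh.graph.model", model);
// Similarly in post() and patch() methods
```
Effort: 2 hours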
Objective: Detect when GitHub files have been manually edited outside of LinkedDataHub, causing drift from Fuseki state.
Design Approach:
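Manual edits to GitHub repository files are not expected in normal operation. If they happen, the version history becomes inconsistent with the Fuseki state. This feature detects such drift and aborts the GitHub commit with a clear error message. Commits run asynchronously, so the HTTP response has already been sent at that point; the conflict surfaces in the logs rather than to the client.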
Implementation Strategy:
- Store Last Known SHA:
  - In-memory cache (`ConcurrentHashMap`) mapping file path → last committed blob SHA (the `sha` that the contents API reports for a file)
  - Populated after each successful `putFile()` operation
  - Persisted to disk on shutdown (optional for Phase 2.2)
- Pre-Commit Verification:
  - Before each `putFile()`, fetch the current file SHA from the GitHub API
  - Compare it with the cached last known SHA
  - If different: external edit detected → throw `VersionConflictException`
  - If same or no cache entry: proceed with commit
- Error Handling:
  - `VersionConflictException` includes:
    - Graph URI
    - Expected SHA (from cache)
    - Actual SHA (from GitHub)
    - Error message for operators
  - Log warning with details for debugging
  - Don't attempt auto-resolution/merge - require manual intervention
  - Optional: configuration flag `lapp:overwriteOnConflict` to force overwrite
- Code Changes:
File: src/main/java/com/atomgraph/linkeddatahub/server/service/GraphVersioningService.java
Add to class:
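```java
// Imports: java.math.BigInteger, java.nio.charset.StandardCharsets, java.security.MessageDigest,
// java.security.NoSuchAlgorithmException, java.util.Optional, java.util.concurrent.ConcurrentHashMap
private final ConcurrentHashMap<String, String> lastKnownShas = new ConcurrentHashMap<>();

private void detectExternalEdits(String path) throws VersionConflictException {
    String cachedSha = lastKnownShas.get(path);
    if (cachedSha == null) {
        return; // First commit since startup, nothing to compare against
    }
    // Requires GitHubClient.getFileSha() to be public; it returns the file's blob SHA
    Optional<String> currentSha = githubClient.getFileSha(path);
    if (currentSha.isEmpty()) {
        return; // File deleted externally, will be recreated
    }
    if (!cachedSha.equals(currentSha.get())) {
        String msg = String.format(
            "External edit detected for %s: expected SHA %s but found %s",
            path, cachedSha, currentSha.get()
        );
        log.error(msg);
        throw new VersionConflictException(msg);
    }
}

// Git blob SHA-1 = sha1("blob " + length + "\0" + content): the value GitHub reports as a file's "sha"
private static String gitBlobSha(byte[] content) {
    try {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        sha1.update(("blob " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
        return String.format("%040x", new BigInteger(1, sha1.digest(content)));
    } catch (NoSuchAlgorithmException e) {
        throw new IllegalStateException("SHA-1 not available", e);
    }
}

// In commitGraphVersion(), before githubClient.putFile():
detectExternalEdits(path);
String commitSha = githubClient.putFile(path, content, message);
// Cache the file's blob SHA, not the commit SHA returned by putFile(): getFileSha() returns
// blob SHAs, so caching commitSha would flag every following commit as an external edit
lastKnownShas.put(path, gitBlobSha(content));
```
New Exception: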
File: src/main/java/com/atomgraph/linkeddatahub/exception/VersionConflictException.java
```java
package com.atomgraph.linkeddatahub.exception;

public class VersionConflictException extends RuntimeException {
    public VersionConflictException(String message) {
        super(message);
    }
}
```
Configuration Option (Optional):
```turtle
<urn:linkeddatahub:versioning/graphs-repo> a doap:GitRepository ;
    doap:location <https://github.com/AtomGraph/ldh-graphs> ;
    lapp:branch "main" ;
    lapp:pathPrefix "graphs" ;
    lapp:overwriteOnConflict false .  # Default: abort on conflict
```
Key Considerations:
- Performance: Adds one extra GitHub API `GET` call per commit (acceptable overhead)
- Race Condition: A small window still exists between check and commit (inherent to distributed systems)
- Cold Start: Cache is empty on application restart - first commits after restart won't detect conflicts
- Persistence: Phase 2.2 can add cache persistence to disk for cold start coverage
- Alternative Approach: Use GitHub's built-in optimistic locking (SHA requirement). `putFile()` currently looks up the file's SHA immediately before each `PUT`, so that SHA always matches and external edits are overwritten. Sending the cached SHA instead would make GitHub reject a stale update (409 Conflict) and would also close the race window above
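Within a single LinkedDataHub instance, LinkedDataHub's own commits can also race each other for a file's SHA. Never running two commits for the same file path at once avoids this, while different graphs still commit in parallel. The sketch below does this by chaining `CompletableFuture`s per key. `PerKeySerialExecutor` is a hypothetical class. It only orders LinkedDataHub's own commits and cannot prevent edits made directly on GitHub:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;

/** Runs tasks with the same key one after another in submission order; different keys run in parallel. */
final class PerKeySerialExecutor {
    private final Executor delegate;
    private final ConcurrentHashMap<String, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();

    PerKeySerialExecutor(Executor delegate) {
        this.delegate = delegate;
    }

    CompletableFuture<Void> submit(String key, Runnable task) {
        // compute() is atomic per key, so each new task is chained after the previous one for that key
        CompletableFuture<Void> next = tails.compute(key, (k, tail) ->
            (tail == null ? CompletableFuture.<Void>completedFuture(null) : tail)
                .exceptionally(e -> null)          // a failed commit must not block later ones
                .thenRunAsync(task, delegate));
        // Drop the entry once this task is the last in its chain, so the map does not grow unbounded
        next.whenComplete((v, e) -> tails.remove(key, next));
        return next;
    }
}
```

`GraphVersioningService` could keep its thread pool as the delegate. It would then submit each commit keyed by file path, for example `serial.submit(path, () -> commitGraphVersion(...))`.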
Testing:
- Manually edit a file in the GitHub web UI
- Attempt to update the same graph in LinkedDataHub
- Verify `VersionConflictException` is thrown
- Verify the error message includes both SHAs
- Test the overwrite configuration flag
Effort: 8 hours
File: src/main/java/com/atomgraph/linkeddatahub/server/service/VersioningRetryQueue.java
- Persistent queue (file-based or database)
- Exponential backoff retry logic
- Health check endpoint
Effort: 10 hours
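A common form of exponential backoff for the retry queue doubles the delay after each failed attempt, up to a cap. It then adds random jitter so that queued commits don't all retry at the same moment. The base delay, the cap and the "full jitter" variant in this sketch are assumptions, not values from this plan. `Backoff` is a hypothetical name:

```java
import java.time.Duration;
import java.util.Random;

/** Hypothetical retry delay policy: capped exponential backoff with full jitter. */
final class Backoff {
    private final Duration base;
    private final Duration max;
    private final Random random;

    Backoff(Duration base, Duration max, Random random) {
        this.base = base;
        this.max = max;
        this.random = random;
    }

    /** Upper bound for retry number {@code attempt} (0-based): min(max, base * 2^attempt). */
    Duration ceiling(int attempt) {
        long millis = base.toMillis();
        // Double step by step and stop at the cap, so large attempt numbers cannot overflow
        for (int i = 0; i < attempt && millis < max.toMillis(); i++) {
            millis *= 2;
        }
        return Duration.ofMillis(Math.min(millis, max.toMillis()));
    }

    /** Actual delay: random between 0 and the ceiling, inclusive. */
    Duration delay(int attempt) {
        long ceiling = ceiling(attempt).toMillis();
        return Duration.ofMillis(Math.min(ceiling, (long) (random.nextDouble() * (ceiling + 1))));
    }
}
```

With a 1 s base and a 5 min cap, the ceilings run 1 s, 2 s, 4 s, 8 s and so on. From the ninth retry onwards they stay at 5 minutes.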
File: src/main/java/com/atomgraph/linkeddatahub/resource/Graph.java
Modify GET method:
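```java
// Assumes an injected field: @Inject GraphVersioningService versioningService;
@GET
public Response get(@QueryParam("version") String commitSha) {
    if (commitSha != null) {
        // Retrieve from GitHub
        Optional<Model> historical = versioningService.getGraphAtCommit(getURI(), commitSha);
        if (historical.isEmpty()) {
            return Response.status(Response.Status.NOT_FOUND).build();
        }
        // getCommitDate() is still to be written, e.g. from GET /repos/{owner}/{repo}/commits/{sha};
        // Memento-Datetime must be an HTTP-date such as "Mon, 01 Jan 2024 12:00:00 GMT"
        return Response.ok(historical.get())
            .header("Memento-Datetime", getCommitDate(commitSha))
            .build();
    }
    // Normal flow - retrieve from triplestore
    return super.get();
}
```
Effort: 8 hours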
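The `version` parameter is passed to GitHub as a `ref`, and GitHub also accepts branch and tag names there. If only commit SHAs should be addressable, the value can be validated first. The sketch below accepts lowercase hex SHA-1 IDs from Git's minimum abbreviation of 4 characters up to the full 40. `CommitIds` is a hypothetical helper:

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical validator for the ?version= parameter. */
final class CommitIds {
    private static final Pattern SHA1 = Pattern.compile("[0-9a-f]{4,40}");

    private CommitIds() {}

    /** Returns the normalized (lowercase) SHA if {@code version} looks like a Git commit ID. */
    static Optional<String> parse(String version) {
        if (version == null) {
            return Optional.empty();
        }
        String normalized = version.trim().toLowerCase(Locale.ROOT);
        return SHA1.matcher(normalized).matches() ? Optional.of(normalized) : Optional.empty();
    }
}
```

An empty result could be answered with `400 Bad Request` before any GitHub request is made.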
- JMX beans for commit success/failure rates
- Logging with structured format
- Integration with existing LinkedDataHub monitoring
Effort: 6 hours
File: http-tests/versioning/ (new directory)
Test scenarios:
- Graph creation triggers commit
- Graph update triggers commit
- Version retrieval via `?version=sha`
- Versioning disabled per dataspace
- GitHub unavailable (graceful degradation)
Effort: 8 hours
Extend VersioningFilter to handle GET requests:
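```java
// Inside VersioningFilter.filter(), before the isSuccessfulModification() check;
// hasAcceptDatetime(), parseAcceptDatetime(), formatHttpDate() and addMementoLinks() are helpers still to be written
if (request.getMethod().equals("GET") && hasAcceptDatetime(request)) {
    URI graphUri = request.getUriInfo().getAbsolutePath();
    Date datetime = parseAcceptDatetime(request);
    Optional<Model> memento = versioningService.getGraphAtDatetime(graphUri, datetime);
    // The representation now depends on Accept-Datetime, so caches must vary on it
    response.getHeaders().add("Vary", "accept-datetime");
    if (memento.isPresent()) {
        // Replace response with historical version
        response.setEntity(memento.get());
        // TODO: Memento-Datetime must be the memento's own (commit) datetime, not the requested one;
        // getGraphAtDatetime() would have to return the commit date alongside the model
        response.getHeaders().add("Memento-Datetime", formatHttpDate(datetime));
        addMementoLinks(response, graphUri);
    }
}
```
Effort: 10 hours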
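`Accept-Datetime` and `Memento-Datetime` carry HTTP dates in GMT; RFC 7089 uses the RFC 1123 date format. The sketch below shows the conversions behind `parseAcceptDatetime()` and `formatHttpDate()` using `java.time`, returning `Instant` values, which `Date.from()` can convert if needed. The class name is hypothetical. Formatting uses an explicit pattern because `DateTimeFormatter.RFC_1123_DATE_TIME` prints single-digit days without a leading zero:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helpers for Memento datetime headers. */
final class HttpDates {
    // Fixed-width HTTP date, e.g. "Mon, 01 Jan 2024 12:00:00 GMT"
    private static final DateTimeFormatter HTTP_DATE =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US).withZone(ZoneOffset.UTC);

    private HttpDates() {}

    static String format(Instant instant) {
        return HTTP_DATE.format(instant);
    }

    /** Parses an Accept-Datetime value; empty if missing or malformed (a TimeGate would answer 400). */
    static Optional<Instant> parse(String value) {
        if (value == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(ZonedDateTime.parse(value.trim(), DateTimeFormatter.RFC_1123_DATE_TIME).toInstant());
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }
}
```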
File: src/main/java/com/atomgraph/linkeddatahub/resource/TimeGate.java
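```java
import jakarta.ws.rs.GET;
import jakarta.ws.rs.HeaderParam;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.PathParam;
import jakarta.ws.rs.core.Response;

@Path("timegate")
public class TimeGate {

    @GET
    @Path("{path: .*}")
    public Response timeGate(@PathParam("path") String path,
                             @HeaderParam("Accept-Datetime") String acceptDatetime) {
        // TODO: find the memento closest to the requested datetime and respond with 302 Found,
        // a Location header pointing at the memento URI (e.g. ?version={sha}),
        // Vary: accept-datetime, and Link headers for the original resource and the TimeMap
        return Response.status(Response.Status.NOT_IMPLEMENTED).build(); // placeholder so the skeleton compiles
    }
}
```
Effort: 8 hours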
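RFC 7089 leaves the choice of the "closest" memento to the TimeGate. A common rule picks the most recent version at or before the requested datetime, and falls back to the earliest version when the request predates all of them. The sketch below applies that rule to a commit list held in a sorted map. The rule and the `MementoSelector` name are assumptions; a real implementation would fill the map from the GitHub commits API:

```java
import java.time.Instant;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Optional;

/** Hypothetical memento selection over commits keyed by commit datetime. */
final class MementoSelector {
    private MementoSelector() {}

    /** Picks the commit SHA at or before {@code requested}, else the earliest one. */
    static Optional<String> select(NavigableMap<Instant, String> commitsByDate, Instant requested) {
        Map.Entry<Instant, String> entry = commitsByDate.floorEntry(requested);
        if (entry == null) {
            entry = commitsByDate.firstEntry(); // request predates every version
        }
        return Optional.ofNullable(entry).map(Map.Entry::getValue);
    }
}
```

The selected SHA then becomes the `Location` of the 302 response, e.g. `<graph URI>?version={sha}`.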
File: src/main/java/com/atomgraph/linkeddatahub/resource/TimeMap.java
List all versions in application/link-format:
```text
<https://example.org/data/products?version=abc123>;
  rel="memento"; datetime="Mon, 01 Jan 2024 12:00:00 GMT"
```
Effort: 10 hours
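A TimeMap in `application/link-format` is a comma-separated list of links: the original resource, the TimeMap itself (`self`), the TimeGate, and one `memento` link per version. The sketch below renders such a document from a sorted commit list. The `?version=` memento URIs follow the example above; the `TimeMapWriter` name and its parameters are hypothetical:

```java
import java.net.URI;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedMap;

/** Hypothetical TimeMap serializer for commits keyed by commit datetime (oldest first). */
final class TimeMapWriter {
    private static final DateTimeFormatter HTTP_DATE =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US).withZone(ZoneOffset.UTC);

    private TimeMapWriter() {}

    static String write(URI original, URI timeMap, URI timeGate, SortedMap<Instant, String> commits) {
        List<String> links = new ArrayList<>();
        links.add("<" + original + ">; rel=\"original\"");
        links.add("<" + timeMap + ">; rel=\"self\"; type=\"application/link-format\"");
        links.add("<" + timeGate + ">; rel=\"timegate\"");
        for (Map.Entry<Instant, String> commit : commits.entrySet()) {
            links.add("<" + original + "?version=" + commit.getValue() + ">; rel=\"memento\"; datetime=\""
                + HTTP_DATE.format(commit.getKey()) + "\"");
        }
        return String.join(",\n", links);
    }
}
```

For graphs with long histories, the commit list would have to be collected across several pages of the GitHub commits API.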
Add to responses for versioned graphs (`Memento-Datetime` only on memento responses):
```text
Link: <original-uri>; rel="original"
Link: <timegate-uri>; rel="timegate"
Link: <timemap-uri>; rel="timemap"
Memento-Datetime: Mon, 01 Jan 2024 12:00:00 GMT
ETag: "git-blob-<sha>"
```
Effort: 7 hours
```turtle
@prefix lapp: <https://w3id.org/atomgraph/linkeddatahub/apps#> .
@prefix ldh: <https://w3id.org/atomgraph/linkeddatahub#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix ldt: <https://www.w3.org/ns/ldt#> .
@prefix doap: <http://usefulinc.com/ns/doap#> .

<urn:linkeddatahub:apps/end-user> a lapp:Application, lapp:EndUserApplication ;
    dct:title "LinkedDataHub" ;
    lapp:origin <https://localhost:4443> ;
    ldt:ontology <https://localhost:4443/ns#> ;
    ldt:service <urn:linkeddatahub:services/end-user> ;
    # Versioning configuration - links to repository resource
    lapp:versioningRepository <urn:linkeddatahub:versioning/graphs-repo> .

# Separate repository resource using DOAP vocabulary
<urn:linkeddatahub:versioning/graphs-repo> a doap:GitRepository ;
    doap:location <https://github.com/AtomGraph/ldh-production-graphs> ;
    lapp:branch "main" ;
    lapp:pathPrefix "graphs" .

# Admin app typically has versioning disabled
<urn:linkeddatahub:apps/admin> a lapp:Application, lapp:AdminApplication ;
    dct:title "LinkedDataHub admin" ;
    lapp:origin <https://admin.localhost:4443> ;
    # No versioning configuration = disabled
    .
```
```shell
# GitHub personal access token
export GITHUB_TOKEN="ghp_xxxxxxxxxxxxxxxxxxxx"

# Or via Java system property
java -Dgithub.token=ghp_xxxxxxxxxxxxxxxxxxxx ...
```
```text
src/main/java/com/atomgraph/linkeddatahub/
├── client/
│   └── GitHubClient.java                 [NEW - 350 lines]
├── exception/
│   └── VersionConflictException.java     [NEW Phase 2.1 - 10 lines]
├── resource/
│   ├── Graph.java                        [MODIFY - add ~20 lines]
│   ├── TimeGate.java                     [NEW Phase 3 - 150 lines]
│   └── TimeMap.java                      [NEW Phase 3 - 200 lines]
├── server/
│   ├── filter/response/
│   │   └── VersioningFilter.java         [NEW - 250 lines]
│   └── service/
│       ├── GraphVersioningService.java   [NEW - 400 lines (includes conflict detection)]
│       └── VersioningRetryQueue.java     [NEW Phase 2.2 - 200 lines]
├── vocabulary/
│   ├── LAPP.java                         [MODIFY - add 3 properties]
│   └── DOAP.java                         [NEW - wrapper for DOAP vocabulary]
└── Application.java                      [MODIFY - add DI config ~30 lines]

http-tests/
└── versioning/                           [NEW Phase 2.5]
    ├── create-versioned-graph.sh
    ├── update-versioned-graph.sh
    └── retrieve-version.sh

config/
└── system.trig                           [MODIFY - add versioning config with DOAP]

pom.xml                                   [NO CHANGES - uses existing dependencies]

# External Vocabularies Referenced:
# - DOAP (Description of a Project): http://usefulinc.com/ns/doap#
#   Used for GitRepository modeling (doap:location, doap:GitRepository)
```
Total New Code: ~1,560 lines (sum of the per-file estimates above)
Modified Code: ~50 lines
| Phase | Description | Hours | Weeks |
|---|---|---|---|
| Phase 1 | Core infrastructure (MVP) | 50 | 2 |
| Phase 2 | Production features (includes conflict detection) | 40 | 2 |
| Phase 3 | Full Memento protocol | 35 | 2 |
| Total | Complete implementation | 125 | 6-7 |
Minimum Viable Product: Phase 1 only = 2 weeks
| Risk | Impact | Mitigation |
|---|---|---|
| GitHub API rate limits (5000/hour) | Service degradation | Async commits keep rate-limit delays off the response path; each commit costs several requests (SHA lookup plus `PUT`); implement request caching |
| Large graphs (>100MB) | GitHub rejects | Document size limits; consider chunking or Git LFS; reading files over 1 MB via the contents API needs the raw media type |
| Network failures to GitHub | Commits fail | Retry queue; best-effort design; don't block responses |
| Token exposure | Security breach | Environment variables only; never in RDF; rotate regularly |
| Concurrent modifications | Race conditions | GitHub's SHA requirement rejects the losing write (409 Conflict) instead of overwriting; the failed commit needs the retry queue |
| Jersey client issues | HTTP errors | Robust error handling; retry logic; detailed logging |
| Manual GitHub edits | Graph version drift | Conflict detection (Phase 2.1); document GitHub as append-only |
Approach: Test GitHubClient and GraphVersioningService in isolation using mocks.
GitHubClient Tests:
- Use WireMock to mock GitHub REST API endpoints
- Test all 4 REST methods: `putFile()`, `getFileAtCommit()`, `findCommitAtDatetime()`, `deleteFile()`
- Verify Base64 encoding/decoding correctness
- Test SHA requirement for updates (must `GET` first, then `PUT` with SHA)
- Test error scenarios: `404`, `403`/`429` (rate limit), `401` (auth), `500` (server error)
- Verify proper HTTP headers (`Authorization: Bearer`, `Accept: application/vnd.github+json`)
GraphVersioningService Tests:
- Mock GitHubClient and Application dependencies
- Test URI-to-path mapping (e.g., `https://localhost:4443/data/products/` → `graphs/data/products.nt`)
- Verify N-Triples serialization correctness
- Test versioning enabled/disabled logic
- Test async execution (verify ExecutorService is used)
- Test error handling (GitHub failures don't throw to caller)
Key Considerations:
- WireMock allows testing without real GitHub API calls (fast CI, no rate limits)
- Need to make `API_BASE` configurable in `GitHubClient` for testing
- Test both create (no existing SHA) and update (existing SHA) paths
- Verify commit messages include timestamp and agent info
Files:
- `src/test/java/com/atomgraph/linkeddatahub/client/GitHubClientTest.java`
- `src/test/java/com/atomgraph/linkeddatahub/server/service/GraphVersioningServiceTest.java`
Estimated Effort: 16 hours
Approach: Test against real GitHub API using dedicated test repository.
Setup Requirements:
- Create test repository: `AtomGraph/ldh-test-versioning`
- Use environment variable `GITHUB_TOKEN` for authentication
- Maven profile `integration-tests` with the Failsafe plugin
- Tests run with `mvn verify -P integration-tests`
- Use JUnit `@Tag("integration")` and `assumeTrue()` to skip if no token is set
Test Scenarios:
- Full workflow: Create → Update → Retrieve v1 → Retrieve v2 → Delete
- Concurrent updates: 5 threads simultaneously updating same file (test GitHub's optimistic locking)
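- Large files: Test with graphs >1MB to verify Base64 encoding doesn't break; note that the Contents API returns inline Base64 `content` only for files up to 1 MB, so larger files have to be fetched with the raw media type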
- Pagination: If we later support listing all commits (TimeMap), test repos with >100 commits
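The concurrent-update scenario relies on optimistic locking: a write carrying a stale `sha` is rejected, and the client must re-read the SHA and retry. The in-memory model below (hypothetical `ShaGuardedFile`, assuming Java 9+ for `Thread.onSpinWait`) is not GitHub's implementation; it only pins down the property the real five-thread test should assert: every writer eventually succeeds and no update is lost, so the final version number equals the number of writers.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** In-memory model of SHA-guarded writes: a PUT succeeds only if it names the current SHA. */
final class ShaGuardedFile {
    private int version = 0;
    private int conflicts = 0;

    synchronized String currentSha() { return "v" + version; }
    synchronized int version() { return version; }
    synchronized int conflicts() { return conflicts; }

    /** Compare-and-set: a stale SHA is rejected, as a conflicting Contents API PUT would be. */
    synchronized boolean put(String expectedSha) {
        if (!currentSha().equals(expectedSha)) {
            conflicts++;
            return false;
        }
        version++;
        return true;
    }

    /** Client loop: read the SHA, write with it, and start over on conflict. */
    static void updateWithRetry(ShaGuardedFile file) {
        while (!file.put(file.currentSha())) {
            Thread.onSpinWait(); // a real client would back off before re-reading
        }
    }

    /** Releases all writers at once to provoke conflicts, then waits for them to finish. */
    static ShaGuardedFile runConcurrent(int writers) throws InterruptedException {
        ShaGuardedFile file = new ShaGuardedFile();
        ExecutorService pool = Executors.newFixedThreadPool(writers);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < writers; i++) {
            pool.execute(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                updateWithRetry(file);
            });
        }
        start.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return file;
    }
}
```

The number of conflicts varies from run to run, so the test should assert on the final version, not on the conflict count.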
Key Considerations:
- Tests hit real GitHub API (slower, uses rate limit)
- Each test run should use unique file paths (timestamp-based)
- Cleanup after tests to avoid filling test repository
- Verify GitHub's SHA requirement prevents lost updates
- Test datetime resolution (finding commits by timestamp)
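Datetime resolution maps a requested instant to the newest commit at or before it. One plausible way to ask GitHub is the list-commits endpoint with `path`, `until` and `per_page=1`, since results come back newest first; the exact inclusivity of `until`, and which commit date it filters on, should be confirmed against the API docs. The sketch pairs that query with a local `resolve` over known `(sha, date)` pairs, which gives the integration test an independent expectation. `CommitResolver` is hypothetical and assumes Java 16+ for the record.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical helpers for finding the commit that was current at a given instant. */
final class CommitResolver {
    record Commit(String sha, Instant date) {}

    private CommitResolver() {}

    /** List-commits query for the newest commit touching {@code path} up to {@code at}. */
    static String commitsQuery(String apiBase, String repo, String path, Instant at) {
        return apiBase + "/repos/" + repo + "/commits"
                + "?path=" + URLEncoder.encode(path, StandardCharsets.UTF_8)
                + "&until=" + URLEncoder.encode(at.toString(), StandardCharsets.UTF_8) // ISO 8601, UTC
                + "&per_page=1";
    }

    /** Local expectation: the latest commit whose date is not after {@code at}; empty if none. */
    static Optional<Commit> resolve(List<Commit> history, Instant at) {
        return history.stream()
                .filter(c -> !c.date().isAfter(at))
                .max(Comparator.comparing(Commit::date));
    }
}
```

An empty result means the requested datetime predates the first version; how the TimeGate should answer in that case is a Phase 2 decision.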
Files:
src/test/java/com/atomgraph/linkeddatahub/integration/VersioningIntegrationTest.java
Estimated Effort: 12 hours
Approach: End-to-end tests following LinkedDataHub's existing http-tests/ patterns.
Test Scenarios:
- Create versioned graph - `PUT` to end-user app, verify GitHub commit appears
- Update versioned graph - `PUT` again, verify 2nd commit exists
- Retrieve historical version - `GET` with `?version=<sha>`, verify old content
- Versioning disabled - `PUT` to admin app, verify NO GitHub commit
- GitHub unavailable - Invalid token, verify `HTTP 200` still returned (graceful degradation)
- Performance test - Create 50 graphs in parallel, measure throughput
Key Considerations:
- Use `gh` CLI to verify commits in GitHub (easier than curl + JSON parsing)
- Wait 5-10 seconds after operations for async commits to complete
- Store expected commit count before tests, compare after
- Test both end-user app (versioning ON) and admin app (versioning OFF)
- Verify Varnish does not serve a cached current version for `?version=<sha>` requests, or a historical version for plain requests
- Test commit messages contain graph URI, HTTP method, agent, timestamp
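Rather than a fixed 5-10 second sleep, the tests can poll until the expected commit count appears, which is faster when GitHub is quick and more tolerant when it is slow. A minimal Java sketch of that polling loop follows (hypothetical `Await.until`); in the shell tests the same loop would wrap a `gh` call that counts commits, compared against the count stored before the test.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper for waiting on asynchronous commits. */
final class Await {
    private Await() {}

    /** Polls {@code condition} every {@code interval} until it holds (true) or {@code timeout} elapses (false). */
    static boolean until(BooleanSupplier condition, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) return true;
            if (System.nanoTime() - deadline >= 0) return false; // overflow-safe comparison
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```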
Setup:
- Configure versioning in `system.trig` (requires app restart)
- Set `GITHUB_TOKEN` environment variable
- Use `http-tests/versioning/setup.sh` for common test environment
Files:
- `http-tests/versioning/setup.sh` - Environment setup
- `http-tests/versioning/01-create-versioned-graph.sh`
- `http-tests/versioning/02-update-versioned-graph.sh`
- `http-tests/versioning/03-retrieve-historical-version.sh`
- `http-tests/versioning/04-versioning-disabled.sh`
- `http-tests/versioning/05-github-unavailable.sh` (manual test)
- `http-tests/versioning/performance-test.sh`
- `http-tests/versioning/cleanup.sh`
Estimated Effort: 10 hours
Approach: Measure async commit throughput and identify bottlenecks.
Test Scenarios:
- Create 50-100 graphs in rapid succession (parallel curl)
- Measure graphs/second throughput
- Verify all async commits eventually complete (wait 30-60 seconds, check GitHub)
- Monitor ExecutorService queue size
- Test with large graphs (1000+ triples) to measure serialization time
Key Metrics:
- HTTP response time (should be <100ms, not blocked by GitHub)
- GitHub commit time (measured separately, expected 1-2 seconds per commit)
- Success rate (% of commits that succeed vs fail)
- Memory usage with large queue of pending commits
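The key metrics can be computed from per-request samples collected during the run. This sketch (hypothetical `LoadStats`, Java 16+ for the record) uses the nearest-rank percentile, which turns a target like "<100ms" into a concrete threshold on, say, p95; choosing p95 is an assumption, not part of the plan.

```java
import java.util.List;

/** Hypothetical summary statistics for the versioning load test. */
final class LoadStats {
    /** One request: its HTTP response time and whether its background commit reached GitHub. */
    record Sample(long responseMillis, boolean committed) {}

    private LoadStats() {}

    /** Nearest-rank percentile of response times, for p in (0, 100]. */
    static long percentile(List<Sample> samples, double p) {
        long[] sorted = samples.stream().mapToLong(Sample::responseMillis).sorted().toArray();
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Fraction of requests whose commit eventually appeared on GitHub. */
    static double successRate(List<Sample> samples) {
        return samples.stream().filter(Sample::committed).count() / (double) samples.size();
    }

    /** Graphs per second over the wall-clock duration of the run. */
    static double throughput(int requests, long elapsedMillis) {
        return requests * 1000.0 / elapsedMillis;
    }
}
```

Reporting a high percentile alongside the mean keeps a few slow requests from hiding behind a good average.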
Key Considerations:
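- GitHub API rate limit: 5000 requests/hour = ~83/minute; each update costs two requests (`GET` for the SHA, then `PUT`), so at most ~41 commits/minute, and GitHub's secondary limits on content-creating requests may be lower still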
- Our async design means HTTP responses shouldn't be affected by GitHub speed
- Need retry queue for failed commits (Phase 2)
- Large graphs may need chunking or Git LFS (future enhancement)
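The async, best-effort design and the queue-size monitoring can be sketched with a bounded `ThreadPoolExecutor`: `submit` returns immediately, commit failures are logged and counted but never rethrown to the caller, and `pending()` exposes the queue length. `BackgroundCommitter` is hypothetical; the bounded queue and drop-when-full policy are assumptions chosen to cap memory during a GitHub outage, and the Phase 2 retry queue would re-enqueue failed commits instead of only counting them.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical best-effort committer: HTTP handlers never wait for, or see errors from, GitHub. */
final class BackgroundCommitter {
    private static final Logger LOG = Logger.getLogger(BackgroundCommitter.class.getName());

    final AtomicInteger succeeded = new AtomicInteger();
    final AtomicInteger failed = new AtomicInteger();
    final AtomicInteger dropped = new AtomicInteger();
    private final ThreadPoolExecutor pool;

    BackgroundCommitter(int threads, int queueCapacity) {
        // Bounded queue so an outage cannot grow memory without limit;
        // when it is full, new commits are counted as dropped instead of blocking the caller.
        pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(queueCapacity),
                (task, executor) -> dropped.incrementAndGet());
    }

    /** Returns immediately; the commit runs later and a failure is only logged and counted. */
    void submit(String graphUri, Runnable commit) {
        pool.execute(() -> {
            try {
                commit.run();
                succeeded.incrementAndGet();
            } catch (RuntimeException e) {
                failed.incrementAndGet();
                LOG.log(Level.WARNING, "GitHub commit failed for " + graphUri, e);
            }
        });
    }

    /** Commits waiting to run: the queue-size metric to monitor during load tests. */
    int pending() { return pool.getQueue().size(); }

    boolean shutdownAndWait(long seconds) throws InterruptedException {
        pool.shutdown();
        return pool.awaitTermination(seconds, TimeUnit.SECONDS);
    }
}
```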
Files:
http-tests/versioning/performance-test.sh
Estimated Effort: 6 hours
| Test Type | Purpose | Tools | Effort |
|---|---|---|---|
| Unit Tests | Isolated component testing | JUnit, Mockito, WireMock | 16h |
| Integration Tests | Real GitHub API validation | JUnit, real GitHub repo | 12h |
| HTTP Tests | End-to-end LinkedDataHub testing | Bash, curl, gh CLI | 10h |
| Performance Tests | Throughput and scalability | Bash, parallel curl | 6h |
| Total | | | 44h |
GitHub Actions Workflow:
- Run unit tests on every push (fast, no GitHub needed)
- Run integration tests on push to main branch only (uses `secrets.VERSIONING_TEST_TOKEN`, exposed to the tests as `GITHUB_TOKEN`)
- HTTP tests require full LinkedDataHub deployment (manual or separate CI job)
Maven Profiles:
- Default: Unit tests only (`mvn test`)
- Integration: `mvn verify -P integration-tests` (requires `GITHUB_TOKEN` env var)
New Maven test dependencies needed:
- `com.github.tomakehurst:wiremock-jre8:2.35.0` (test scope)
- JUnit 5 and Mockito (likely already present)
Phase 2 (Memento) testing:
- Memento protocol compliance testing (`Accept-Datetime` header)
- TimeGate redirect behavior
- TimeMap link format validation
- `ETag` matching (git blob SHA)
- Cross-browser XSLT rendering with historical versions
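Using the git blob SHA as `ETag` means LinkedDataHub can compute the same identifier GitHub reports in the Contents API `sha` field without a round trip. A git blob id is the SHA-1 of the header `blob <byte length>` plus a NUL byte, followed by the raw file bytes. The sketch (hypothetical `GitBlob`, Java 17+ for `HexFormat`) assumes the N-Triples bytes served are byte-for-byte what was committed; any difference in serialization or line endings changes the hash.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

/** Hypothetical helper computing git blob ids for use as ETags. */
final class GitBlob {
    private GitBlob() {}

    /** SHA-1 over "blob <byte length>" + NUL + content, as git computes blob object ids. */
    static String sha(byte[] content) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            // The length is the byte count, not the character count, which matters for UTF-8 literals
            sha1.update(("blob " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
            sha1.update(content);
            return HexFormat.of().formatHex(sha1.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required on every Java platform", e);
        }
    }

    /** Strong ETag header value (quoted) for an N-Triples document. */
    static String etag(String ntriples) {
        return "\"" + sha(ntriples.getBytes(StandardCharsets.UTF_8)) + "\"";
    }
}
```

For any file, `GitBlob.sha` should agree with `git hash-object <file>`, which makes a convenient cross-check in the `ETag` tests.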
- ✅ Graph `PUT`/`POST`/`PATCH`/`DELETE` triggers GitHub commit
- ✅ N-Triples files appear in GitHub repository
- ✅ Commit messages include timestamp and agent info
- ✅ Failed GitHub commits don't block HTTP responses
- ✅ Configuration via RDF in system.trig works
- ✅ Per-dataspace enable/disable works
- ✅ Historical versions retrievable via `?version=<sha>`
- ✅ Failed commits retry automatically
- ✅ HTTP tests validate complete workflow
- ✅ Monitoring shows commit success rates
- ✅ `Accept-Datetime` returns historical versions
- ✅ TimeGate redirects to appropriate Memento
- ✅ TimeMap lists all versions
- ✅ All Memento headers present and correct
- ✅ `ETag` = git blob SHA
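The Memento criteria involve three formats: parsing `Accept-Datetime`, emitting `Memento-Datetime`, and serializing a TimeMap as `application/link-format` per RFC 7089. The sketch below (hypothetical `Memento` class, Java 16+ for the record) formats dates with an explicit two-digit-day pattern, because the JDK's `RFC_1123_DATE_TIME` formatter prints single-digit days without padding while HTTP-date expects two digits; for parsing, the more lenient built-in formatter is fine. Memento URIs are assumed to reuse the `?version=<sha>` parameter, and the TimeMap's own `self` URI is left to the caller.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Locale;
import java.util.StringJoiner;

/** Hypothetical helpers for the Memento (RFC 7089) headers and TimeMap. */
final class Memento {
    /** HTTP-date with a zero-padded day, e.g. "Wed, 07 Jan 2026 12:00:00 GMT". */
    static final DateTimeFormatter HTTP_DATE = DateTimeFormatter
            .ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
            .withZone(ZoneOffset.UTC);

    /** One historical version: its Memento URI and commit datetime. */
    record Version(String uri, Instant datetime) {}

    private Memento() {}

    /** Parses an Accept-Datetime header; the built-in RFC 1123 parser also accepts 1-digit days. */
    static Instant parseAcceptDatetime(String header) {
        return DateTimeFormatter.RFC_1123_DATE_TIME.parse(header.trim(), Instant::from);
    }

    /** Value of the Memento-Datetime response header. */
    static String mementoDatetime(Instant t) {
        return HTTP_DATE.format(t);
    }

    /** application/link-format TimeMap; {@code versions} must be sorted oldest first. */
    static String timeMap(String original, String self, List<Version> versions) {
        StringJoiner links = new StringJoiner(",\n");
        links.add("<" + original + ">; rel=\"original\"");
        links.add("<" + self + ">; rel=\"self\"; type=\"application/link-format\"");
        for (int i = 0; i < versions.size(); i++) {
            // The oldest and newest mementos also carry the "first" and "last" relation types
            String rel = (i == 0 ? "first " : "") + (i == versions.size() - 1 ? "last " : "") + "memento";
            Version v = versions.get(i);
            links.add("<" + v.uri() + ">; rel=\"" + rel + "\"; datetime=\"" + mementoDatetime(v.datetime()) + "\"");
        }
        return links.toString();
    }
}
```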
Beyond the initial implementation:
- GitHub Apps Authentication - More secure than personal tokens
- Differential Commits - Commit only triples that changed (requires SPARQL diff)
- Git LFS Support - For graphs >100MB
- Batch Commits - Group multiple small changes into single commits
- Branch-per-Dataspace - Isolate different applications
- Webhook Integration - GitHub → LinkedDataHub sync for external changes
- RDF-based TimeMap - Return TimeMap as RDF (not just link-format)
- Provenance Integration - Link to existing ProvenanceFilter metadata
- Signed Commits - GPG signatures for audit trail
- Multi-Repository Support - Different repos per graph category
- GitHub REST API: https://docs.github.com/en/rest
- GitHub Contents API: https://docs.github.com/en/rest/repos/contents
- Memento Protocol: RFC 7089 - https://tools.ietf.org/html/rfc7089
- SPARQL Graph Store Protocol: https://www.w3.org/TR/sparql11-http-rdf-update/
- JAX-RS Client API: https://jakarta.ee/specifications/restful-ws/3.1/jakarta-restful-ws-spec-3.1.html#client-api
- DOAP Vocabulary: Description of a Project - http://usefulinc.com/ns/doap#
- LinkedDataHub Architecture: `/Users/martynas/WebRoot/LinkedDataHub/architecture.svg`
- Original Proposal: `/Users/martynas/WebRoot/LinkedDataHub/versioning.md`
Document Version: 1.2 Last Updated: 2026-01-07 Status: Ready for Implementation
Changelog:
- v1.2: Added DOAP vocabulary for repository configuration; added conflict detection (Phase 2.1); renumbered Phase 2 items
- v1.1: Replaced kohsuke GitHub library with custom Jersey client; updated to use existing dependencies
- v1.0: Initial implementation plan