Graph Versioning Implementation Plan

Martynas Jusevičius edited this page Jan 7, 2026 · 1 revision

GitHub-Based Versioning Implementation Plan for LinkedDataHub

Version: 1.2 Date: 2026-01-07 Author: Implementation Plan (AI-assisted)


Executive Summary

This document outlines the implementation plan for adding GitHub-based RDF graph versioning to LinkedDataHub with Memento protocol support, as specified in versioning.md.

Design Decisions:

  • ✅ Per-dataspace RDF configuration (in config/system.trig)
  • ✅ Asynchronous background commits (non-blocking)
  • ✅ Best-effort versioning (degrades gracefully if GitHub unavailable)
  • ✅ Basic versioning first, full Memento protocol in Phase 3
  • ✅ Custom Jersey-based GitHub client (no external dependencies)

Technology Stack

GitHub Client Approach

Decision: Custom Jersey Client Implementation

Why roll our own instead of using a library (e.g., kohsuke:github-api):

Zero new dependencies

  • LinkedDataHub already has JAX-RS Jersey Client fully configured
  • No additional library bloat (~1MB+ for external GitHub libraries)
  • One less dependency to maintain/update/audit

Perfect architectural fit

  • Reuse existing HTTP client infrastructure (noCertClient)
  • Leverage existing retry logic patterns (see GraphStoreClient.java)
  • Integration with LinkedDataHub's logging/monitoring
  • Consistent error handling across the codebase

Minimal GitHub API usage

  • We only need 4 REST endpoints (PUT/GET/DELETE file, GET commits)
  • External libraries provide 100+ methods we don't need
  • GitHub REST API is straightforward JSON over HTTP

Already have JSON processing

  • LinkedDataHub uses jersey-media-json-processing
  • Simple POJO mapping for GitHub API responses
  • No new serialization framework needed

Educational value & maintainability

  • Clear understanding of exactly what's happening
  • No "magic" from external library
  • Easier debugging and troubleshooting
  • Future-proof against library deprecation

Trade-off:

  • More code to write (~350-400 lines vs ~200 with library)
  • Need to handle GitHub API specifics (Base64 encoding, SHA requirements, pagination)

Conclusion: For our limited use case (4 endpoints), a custom client is lighter, cleaner, and more maintainable.

Maven Dependencies: NONE required - all functionality uses existing dependencies


Architecture Overview

Component Diagram

┌─────────────────────────────────────────────────────────────┐
│                    LinkedDataHub                             │
│                                                              │
│  ┌────────────┐      ┌──────────────────┐                  │
│  │   Graph    │──┬──▶│ VersioningFilter │                  │
│  │ (JAX-RS)   │  │   └─────────┬────────┘                  │
│  └────────────┘  │             │ async                     │
│                  │             ▼                            │
│  ┌────────────┐  │   ┌──────────────────┐                  │
│  │ Graph      │  └──▶│ GraphVersioning  │                  │
│  │ Store      │      │    Service       │                  │
│  │ Client     │      └─────────┬────────┘                  │
│  └────────────┘                │                            │
│       │                        │                            │
│       │                        ▼                            │
│       │              ┌──────────────────┐                  │
│       │              │  GitHubClient    │                  │
│       │              │ (Jersey-based)   │                  │
│       │              └─────────┬────────┘                  │
│       ▼                        │                            │
│  ┌────────────┐                │                            │
│  │  Fuseki    │                │                            │
│  │ Triplestore│                │                            │
│  └────────────┘                │                            │
└────────────────────────────────┼────────────────────────────┘
                                 │
                                 ▼
                        ┌─────────────────┐
                        │  GitHub Repo    │
                        │  (graphs/*.nt)  │
                        └─────────────────┘

Data Flow

Graph Modification (PUT/POST/PATCH/DELETE):

  1. Request → Graph.java → GraphStoreClient → Fuseki (write to triplestore)
  2. Response → VersioningFilter (intercepts)
  3. Filter checks if dataspace has versioning enabled
  4. If enabled: Submit async task to GraphVersioningService
  5. Service: Serialize Model → N-Triples → GitHub commit
  6. Return HTTP 200/201 (doesn't wait for GitHub)

Historical Retrieval (GET with ?version=sha):

  1. Request → Graph.java detects version parameter
  2. Call GraphVersioningService.getGraphAtCommit()
  3. Fetch from GitHub → Parse N-Triples → Return Model
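
The write path above can be sketched with the JDK alone: the request thread hands the commit to an executor and returns immediately, and a failing commit is counted rather than propagated. `BestEffortCommitter` and its `Runnable` commit task are hypothetical stand-ins for illustration, not LinkedDataHub classes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: fire-and-forget commits that never fail the caller. */
class BestEffortCommitter {
    private final ExecutorService executor = Executors.newFixedThreadPool(2);
    private final AtomicInteger succeeded = new AtomicInteger();
    private final AtomicInteger failed = new AtomicInteger();

    /** Queues the commit and returns immediately; exceptions are counted, not rethrown. */
    void submit(Runnable commitTask) {
        executor.submit(() -> {
            try {
                commitTask.run();
                succeeded.incrementAndGet();
            } catch (RuntimeException e) {
                failed.incrementAndGet(); // a real service would log and enqueue a retry
            }
        });
    }

    /** Stops accepting work and waits for queued commits; returns false on timeout. */
    boolean drain(long timeoutSeconds) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    int succeeded() { return succeeded.get(); }
    int failed() { return failed.get(); }
}
```

The HTTP response corresponds to the point where `submit()` returns; `drain()` is only needed at shutdown or in tests.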

Phase 1: Core Infrastructure (~50 hours)

1.1 No New Maven Dependencies Required

File: pom.xml

No changes needed. All required dependencies already present:

  • ✅ Jersey Client API (org.glassfish.jersey.core:jersey-client)
  • ✅ JSON Processing (org.glassfish.jersey.media:jersey-media-json-processing)
  • ✅ Apache Jena for RDF (org.apache.jena:jena-arq)

Effort: 0 hours


1.2 RDF Vocabulary Extension

File: src/main/java/com/atomgraph/linkeddatahub/vocabulary/LAPP.java

Add versioning-related properties:

public static final Property versioningRepository = property("versioningRepository");
public static final Property branch = property("branch");         // For doap:GitRepository
public static final Property pathPrefix = property("pathPrefix"); // For doap:GitRepository

Design Decision:

  • Use doap:GitRepository for repository metadata (DOAP vocabulary at http://usefulinc.com/ns/doap#)
  • Application links to repository via lapp:versioningRepository property
  • Repository has standard doap:location plus custom lapp:branch and lapp:pathPrefix
  • Benefits: Standard vocabulary, reusable repository resources, cleaner RDF structure

Effort: 1 hour


1.3 Configuration in system.trig

File: config/system.trig

Example Configuration:

@prefix doap: <http://usefulinc.com/ns/doap#> .

<urn:linkeddatahub:apps/end-user> a lapp:Application, lapp:EndUserApplication ;
    dct:title "LinkedDataHub" ;
    lapp:origin <https://localhost:4443> ;
    ldt:ontology <https://localhost:4443/ns#> ;
    ldt:service <urn:linkeddatahub:services/end-user> ;
    ac:stylesheet <static/xsl/layout.xsl> ;
    lapp:adminApplication <urn:linkeddatahub:apps/admin> ;
    lapp:frontendProxy <http://varnish-frontend:6060/> ;
    lapp:public true ;

    # Versioning configuration - links to repository resource
    lapp:versioningRepository <urn:linkeddatahub:versioning/graphs-repo> .

# Separate repository resource using DOAP vocabulary
<urn:linkeddatahub:versioning/graphs-repo> a doap:GitRepository ;
    doap:location <https://github.com/AtomGraph/ldh-graphs> ;
    lapp:branch "main" ;
    lapp:pathPrefix "graphs" .
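
The service has to turn the `doap:location` IRI into the owner and repository name that the GitHub REST API expects. A minimal sketch of that step, assuming only `https://github.com/{owner}/{repo}` style locations are supported; `GitHubRepoLocation` is a hypothetical helper name:

```java
import java.net.URI;

/** Hypothetical helper: splits a GitHub repository URL into owner and repository name. */
final class GitHubRepoLocation {
    final String owner;
    final String repo;

    private GitHubRepoLocation(String owner, String repo) {
        this.owner = owner;
        this.repo = repo;
    }

    static GitHubRepoLocation parse(String location) {
        URI uri = URI.create(location);
        if (!"github.com".equalsIgnoreCase(uri.getHost())) {
            throw new IllegalArgumentException("Not a github.com URL: " + location);
        }
        // Path looks like "/owner/repo", optionally with a trailing slash or ".git"
        String[] parts = uri.getPath().replaceAll("^/+|/+$", "").split("/");
        if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalArgumentException(
                "Expected https://github.com/{owner}/{repo}: " + location);
        }
        String repo = parts[1].endsWith(".git")
            ? parts[1].substring(0, parts[1].length() - 4)
            : parts[1];
        return new GitHubRepoLocation(parts[0], repo);
    }
}
```

Failing fast on an unexpected location keeps a misconfigured repository from surfacing later as an opaque GitHub API error.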

GitHub Token Configuration:

Since tokens shouldn't be in RDF, use one of:

Option A: Environment Variable (recommended)

export GITHUB_TOKEN="ghp_xxxxxxxxxxxxx"

Option B: Java System Property

-Dgithub.token=ghp_xxxxxxxxxxxxx

Option C: Encrypted in RDF (advanced)

lapp:versioningToken "encrypted:base64encodedvalue"

Effort: 1 hour (documentation)
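
Options A and B can be combined into one lookup with a fixed precedence. The sketch below assumes the environment variable wins over the system property and that a blank value counts as "not configured"; `GitHubTokenResolver` is a hypothetical name, and Option C is not covered.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

/** Hypothetical helper: resolves the GitHub token from Option A, then Option B. */
final class GitHubTokenResolver {
    private GitHubTokenResolver() {}

    static Optional<String> resolve(Map<String, String> env, Properties systemProperties) {
        String token = env.get("GITHUB_TOKEN");                   // Option A
        if (token == null || token.trim().isEmpty()) {
            token = systemProperties.getProperty("github.token"); // Option B
        }
        if (token == null || token.trim().isEmpty()) {
            return Optional.empty();                              // versioning stays disabled
        }
        return Optional.of(token.trim());
    }

    /** Convenience overload reading the real process environment. */
    static Optional<String> resolve() {
        return resolve(System.getenv(), System.getProperties());
    }
}
```

Passing the environment and properties in as arguments keeps the precedence rule testable without touching the real process environment.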


1.4 Custom GitHub Client (Jersey-based)

File: src/main/java/com/atomgraph/linkeddatahub/client/GitHubClient.java

package com.atomgraph.linkeddatahub.client;

import jakarta.json.Json;
import jakarta.json.JsonArray;
import jakarta.json.JsonObject;
import jakarta.ws.rs.client.Client;
import jakarta.ws.rs.client.Entity;
import jakarta.ws.rs.core.MediaType;
import jakarta.ws.rs.core.Response;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Base64;
import java.util.Date;
import java.util.Optional;

/**
 * Custom Jersey-based GitHub API client for graph versioning.
 * Implements only the 4 REST endpoints we need.
 */
public class GitHubClient {

    private static final Logger log = LoggerFactory.getLogger(GitHubClient.class);
    private static final String API_BASE = "https://api.github.com";
    private static final String ACCEPT_HEADER = "application/vnd.github+json";

    private final Client httpClient;
    private final String token;
    private final String owner;
    private final String repo;
    private final String branch;

    public GitHubClient(Client httpClient, String token, String owner,
                        String repo, String branch) {
        this.httpClient = httpClient;
        this.token = token;
        this.owner = owner;
        this.repo = repo;
        this.branch = branch;
    }

    /**
     * PUT /repos/{owner}/{repo}/contents/{path}
     * Create or update a file.
     *
     * @param path File path (e.g., "graphs/data/products.nt")
     * @param content Raw file content
     * @param message Commit message
     * @return Commit SHA
     */
    public String putFile(String path, byte[] content, String message) {
        // Encode content to Base64 (GitHub requirement)
        String encodedContent = Base64.getEncoder().encodeToString(content);

        // Try to get existing file SHA (needed for updates)
        Optional<String> existingSha = getFileSha(path);

        // Build request body
        var bodyBuilder = Json.createObjectBuilder()
            .add("message", message)
            .add("content", encodedContent)
            .add("branch", branch);

        if (existingSha.isPresent()) {
            bodyBuilder.add("sha", existingSha.get());
        }

        JsonObject body = bodyBuilder.build();

        // PUT request
        Response response = httpClient.target(API_BASE)
            .path("repos/{owner}/{repo}/contents/{path}")
            .resolveTemplate("owner", owner)
            .resolveTemplate("repo", repo)
            .resolveTemplate("path", path, false)  // keep "/" separators in the path unencoded
            .request(MediaType.APPLICATION_JSON)
            .header("Authorization", "Bearer " + token)
            .header("Accept", ACCEPT_HEADER)
            .put(Entity.json(body));

        if (response.getStatus() == 201 || response.getStatus() == 200) {
            JsonObject result = response.readEntity(JsonObject.class);
            String commitSha = result.getJsonObject("commit").getString("sha");
            log.info("GitHub file {} committed: {}", path, commitSha);
            return commitSha;
        } else {
            String error = response.readEntity(String.class);
            log.error("GitHub API error {}: {}", response.getStatus(), error);
            throw new RuntimeException("GitHub API error: " + response.getStatus());
        }
    }

    /**
     * GET /repos/{owner}/{repo}/contents/{path}
     * Get file SHA (for detecting existence and getting update token).
     * Public so that GraphVersioningService can reuse it for external edit detection.
     */
    public Optional<String> getFileSha(String path) {
        try {
            Response response = httpClient.target(API_BASE)
                .path("repos/{owner}/{repo}/contents/{path}")
                .queryParam("ref", branch)
                .resolveTemplate("owner", owner)
                .resolveTemplate("repo", repo)
                .resolveTemplate("path", path, false)  // keep "/" separators in the path unencoded
                .request(MediaType.APPLICATION_JSON)
                .header("Authorization", "Bearer " + token)
                .header("Accept", ACCEPT_HEADER)
                .get();

            if (response.getStatus() == 200) {
                JsonObject result = response.readEntity(JsonObject.class);
                return Optional.of(result.getString("sha"));
            } else if (response.getStatus() == 404) {
                return Optional.empty();  // File doesn't exist
            } else {
                // Auth failures, rate limits, etc. are not "file missing"; make them visible
                log.warn("GitHub API returned {} while looking up {}", response.getStatus(), path);
                return Optional.empty();
            }
        } catch (Exception e) {
            log.debug("Could not look up file {}: {}", path, e.getMessage());
            return Optional.empty();
        }
    }

    /**
     * GET /repos/{owner}/{repo}/contents/{path}?ref={commitSha}
     * Get file content at specific commit.
     */
    public byte[] getFileAtCommit(String path, String commitSha) {
        Response response = httpClient.target(API_BASE)
            .path("repos/{owner}/{repo}/contents/{path}")
            .queryParam("ref", commitSha)
            .resolveTemplate("owner", owner)
            .resolveTemplate("repo", repo)
            .resolveTemplate("path", path, false)  // keep "/" separators in the path unencoded
            .request(MediaType.APPLICATION_JSON)
            .header("Authorization", "Bearer " + token)
            .header("Accept", ACCEPT_HEADER)
            .get();

        if (response.getStatus() == 200) {
            JsonObject result = response.readEntity(JsonObject.class);
            String encodedContent = result.getString("content");

            // Decode Base64 content (remove newlines first)
            String cleaned = encodedContent.replace("\n", "").replace("\r", "");
            return Base64.getDecoder().decode(cleaned);
        } else {
            throw new RuntimeException("Failed to get file at commit " + commitSha);
        }
    }

    /**
     * GET /repos/{owner}/{repo}/commits?path={path}&until={date}
     * Find commit closest to datetime for specific file.
     */
    public Optional<String> findCommitAtDatetime(String path, Date datetime) {
        // Format date for GitHub API (ISO 8601)
        String until = DateTimeFormatter.ISO_INSTANT.format(
            datetime.toInstant()
        );

        Response response = httpClient.target(API_BASE)
            .path("repos/{owner}/{repo}/commits")
            .queryParam("path", path)
            .queryParam("until", until)
            .queryParam("per_page", 1)  // We only need the first one
            .resolveTemplate("owner", owner)
            .resolveTemplate("repo", repo)
            .request(MediaType.APPLICATION_JSON)
            .header("Authorization", "Bearer " + token)
            .header("Accept", ACCEPT_HEADER)
            .get();

        if (response.getStatus() == 200) {
            JsonArray commits = response.readEntity(JsonArray.class);

            if (commits.isEmpty()) {
                return Optional.empty();
            }

            JsonObject firstCommit = commits.getJsonObject(0);
            return Optional.of(firstCommit.getString("sha"));
        } else {
            log.error("Failed to find commit at datetime {}", datetime);
            return Optional.empty();
        }
    }

    /**
     * DELETE /repos/{owner}/{repo}/contents/{path}
     * Delete a file.
     */
    public void deleteFile(String path, String message) {
        // Need SHA to delete
        Optional<String> sha = getFileSha(path);
        if (sha.isEmpty()) {
            log.warn("Cannot delete file {} - not found", path);
            return;
        }

        JsonObject body = Json.createObjectBuilder()
            .add("message", message)
            .add("sha", sha.get())
            .add("branch", branch)
            .build();

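        Response response = httpClient.target(API_BASE)
            .path("repos/{owner}/{repo}/contents/{path}")
            .resolveTemplate("owner", owner)
            .resolveTemplate("repo", repo)
            .resolveTemplate("path", path, false)  // keep "/" separators in the path unencoded
            .request(MediaType.APPLICATION_JSON)
            .header("Authorization", "Bearer " + token)
            .header("Accept", ACCEPT_HEADER)
            // Jersey rejects a DELETE with a body unless compliance validation is suppressed
            .property(org.glassfish.jersey.client.ClientProperties.SUPPRESS_HTTP_COMPLIANCE_VALIDATION, true)
            .method("DELETE", Entity.json(body));

        if (response.getStatus() != 200) {
            log.error("Failed to delete file {}: HTTP {}", path, response.getStatus());
        }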
    }
}

Effort: 14 hours (including error handling, Base64 encoding/decoding, testing)
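
The Base64 handling mentioned above is the part of the client that is easiest to get subtly wrong, because the content returned by GET contains line breaks while the content sent by PUT is a single unbroken string. A small JDK-only sketch; `GitHubContentCodec` is a hypothetical name, and using the MIME decoder is a suggested alternative to stripping newlines by hand:

```java
import java.util.Base64;

/** Sketch of the Base64 handling the Contents API needs; class name is hypothetical. */
final class GitHubContentCodec {
    private GitHubContentCodec() {}

    /** Encodes raw file bytes for the "content" field of a PUT request (no line breaks). */
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    /** Decodes the "content" field of a GET response, ignoring embedded line breaks. */
    static byte[] decode(String content) {
        // The MIME decoder skips characters outside the Base64 alphabet, such as \n and \r
        return Base64.getMimeDecoder().decode(content);
    }
}
```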


1.5 Graph Versioning Service

File: src/main/java/com/atomgraph/linkeddatahub/server/service/GraphVersioningService.java

package com.atomgraph.linkeddatahub.server.service;

import com.atomgraph.linkeddatahub.client.GitHubClient;
import com.atomgraph.linkeddatahub.apps.model.Application;
import com.atomgraph.linkeddatahub.vocabulary.LAPP;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFDataMgr;
// Assumed source of DOAP.location (jena-arq ships a DOAP vocabulary class);
// otherwise define the property in a project vocabulary class
import org.apache.jena.sparql.vocabulary.DOAP;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import jakarta.inject.Inject;
import jakarta.inject.Named;
import jakarta.ws.rs.client.Client;
import java.io.ByteArrayOutputStream;
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.util.Date;
import java.util.Optional;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Service for versioning RDF graphs to GitHub.
 */
public class GraphVersioningService {

    private static final Logger log = LoggerFactory.getLogger(GraphVersioningService.class);

    private final GitHubClient githubClient;
    private final Application application;
    private final ExecutorService executor;
    private final String pathPrefix;

    @Inject
    public GraphVersioningService(Application application, Client httpClient,
                                  @Named("githubToken") String githubToken) {
        this.application = application;

        // Read repository configuration from linked doap:GitRepository resource
        Resource repoResource = application.getPropertyResourceValue(LAPP.versioningRepository);

        // Versioning stays disabled without a repository or a token
        if (repoResource == null || githubToken == null || githubToken.isEmpty()) {
            log.info("No versioning repository or GitHub token configured");
            this.githubClient = null;
            this.pathPrefix = null;
            this.executor = null;
            return;
        }

        // doap:location is an IRI in system.trig, so read it as a resource, not a literal
        String location = repoResource.getPropertyResourceValue(DOAP.location).getURI();
        // Extract owner/repo from https://github.com/owner/repo
        String[] parts = location.replace("https://github.com/", "").split("/");
        String owner = parts[0];
        String repoName = parts[1];

        String branch = repoResource.getProperty(LAPP.branch).getString();
        this.pathPrefix = repoResource.getProperty(LAPP.pathPrefix).getString();

        // Initialize custom Jersey-based GitHub client
        this.githubClient = new GitHubClient(httpClient, githubToken, owner, repoName, branch);

        // Background executor for async commits
        this.executor = Executors.newFixedThreadPool(4);
    }

    /**
     * Check if versioning is enabled for this application.
     */
    public boolean isVersioningEnabled() {
        return githubClient != null;  // Enabled if repository configured
    }

    /**
     * Map graph URI to file path.
     * Example: https://localhost:4443/data/products/ → graphs/data/products.nt
     */
    private String graphUriToPath(URI graphUri) {
        URI baseUri = application.getURI();
        String relativePath = baseUri.relativize(graphUri).getPath();

        // Remove trailing slash
        relativePath = relativePath.replaceAll("/$", "");

        return pathPrefix + "/" + relativePath + ".nt";
    }

    /**
     * Commit a graph version asynchronously.
     */
    public void commitGraphVersionAsync(URI graphUri, Model model, String message) {
        if (!isVersioningEnabled()) {
            log.debug("Versioning not enabled for application");
            return;
        }

        // Serialize on the calling thread: Jena models are not thread-safe, and the
        // request-scoped model may change after the response has been sent
        byte[] content = serialize(model);

        executor.submit(() -> {
            try {
                commitGraphVersion(graphUri, content, message);
            } catch (Exception e) {
                log.error("Failed to commit graph version for {}", graphUri, e);
                // TODO: Add to retry queue
            }
        });
    }

    /**
     * Serialize a model to N-Triples bytes.
     */
    private byte[] serialize(Model model) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        RDFDataMgr.write(baos, model, Lang.NTRIPLES);
        return baos.toByteArray();
    }

    /**
     * Commit serialized graph content synchronously.
     */
    private String commitGraphVersion(URI graphUri, byte[] content, String message) {
        // Map URI to path
        String path = graphUriToPath(graphUri);

        // Commit to GitHub
        String commitSha = githubClient.putFile(path, content, message);

        log.info("Committed graph {} to GitHub at {}", graphUri, commitSha);

        return commitSha;
    }

    /**
     * Retrieve graph at specific commit.
     */
    public Optional<Model> getGraphAtCommit(URI graphUri, String commitSha) {
        try {
            String path = graphUriToPath(graphUri);
            byte[] content = githubClient.getFileAtCommit(path, commitSha);

            // Parse N-Triples back to Model
            ByteArrayInputStream bais = new ByteArrayInputStream(content);
            Model model = ModelFactory.createDefaultModel();
            RDFDataMgr.read(model, bais, Lang.NTRIPLES);

            return Optional.of(model);

        } catch (Exception e) {
            log.error("Failed to retrieve graph at commit " + commitSha, e);
            return Optional.empty();
        }
    }

    /**
     * Retrieve graph at specific datetime (Memento).
     */
    public Optional<Model> getGraphAtDatetime(URI graphUri, Date datetime) {
        try {
            String path = graphUriToPath(graphUri);
            Optional<String> commitSha = githubClient.findCommitAtDatetime(path, datetime);

            if (commitSha.isEmpty()) {
                return Optional.empty();
            }

            return getGraphAtCommit(graphUri, commitSha.get());

        } catch (Exception e) {
            log.error("Failed to find commit at datetime " + datetime, e);
            return Optional.empty();
        }
    }

    /**
     * Shutdown executor on application shutdown.
     */
    public void shutdown() {
        // The executor is null when versioning is not configured
        if (executor != null) {
            executor.shutdown();
        }
    }
}

Effort: 16 hours (including integration, testing, error handling)
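
The graph-URI-to-path mapping has two edge cases that `graphUriToPath()` does not address: a graph URI outside the application base, and the root graph, which would otherwise map to `graphs/.nt`. A standalone sketch of one way to handle them; `GraphPathMapper` is a hypothetical name and the `index` file name for the root graph is an assumption, not something the plan specifies:

```java
import java.net.URI;

/** Hypothetical standalone version of the graph-URI-to-path mapping. */
final class GraphPathMapper {
    private final URI baseUri;
    private final String pathPrefix;

    GraphPathMapper(URI baseUri, String pathPrefix) {
        this.baseUri = baseUri;
        this.pathPrefix = pathPrefix;
    }

    String toPath(URI graphUri) {
        URI relative = baseUri.relativize(graphUri);
        if (relative.isAbsolute()) {
            // relativize() returns its argument unchanged when graphUri is outside baseUri
            throw new IllegalArgumentException(graphUri + " is not under " + baseUri);
        }
        String relativePath = relative.getPath().replaceAll("/+$", "");
        if (relativePath.isEmpty()) {
            relativePath = "index"; // assumption: file name used for the root graph
        }
        return pathPrefix + "/" + relativePath + ".nt";
    }
}
```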


1.6 Versioning Response Filter

File: src/main/java/com/atomgraph/linkeddatahub/server/filter/response/VersioningFilter.java

package com.atomgraph.linkeddatahub.server.filter.response;

import com.atomgraph.linkeddatahub.server.service.GraphVersioningService;
import com.atomgraph.linkeddatahub.resource.Graph;
import org.apache.jena.rdf.model.Model;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import jakarta.annotation.Priority;
import jakarta.inject.Inject;
import jakarta.ws.rs.Priorities;
import jakarta.ws.rs.container.ContainerRequestContext;
import jakarta.ws.rs.container.ContainerResponseContext;
import jakarta.ws.rs.container.ContainerResponseFilter;
import jakarta.ws.rs.ext.Provider;
import java.net.URI;
import java.util.Date;

/**
 * Filter that intercepts graph modifications and commits versions to GitHub.
 */
@Provider
@Priority(Priorities.USER + 400)
public class VersioningFilter implements ContainerResponseFilter {

    private static final Logger log = LoggerFactory.getLogger(VersioningFilter.class);

    @Inject
    private GraphVersioningService versioningService;

    @Override
    public void filter(ContainerRequestContext request,
                       ContainerResponseContext response) {

        // Only process Graph resource responses
        if (!isGraphResource(request)) {
            return;
        }

        // Only process successful modifications
        if (!isSuccessfulModification(request, response)) {
            return;
        }

        try {
            // Get graph URI
            URI graphUri = request.getUriInfo().getAbsolutePath();

            // Get modified model from request property
            // (set by Graph.java during PUT/POST/PATCH; DELETE carries no model, so
            // deletions are skipped here until a path that calls
            // GitHubClient.deleteFile() is wired in)
            Model model = (Model) request.getProperty("ldh.graph.model");

            if (model == null) {
                log.warn("No model found in request properties for {}", graphUri);
                return;
            }

            // Generate commit message
            String method = request.getMethod();
            String agent = getAuthenticatedAgent(request);
            String message = String.format("%s %s by %s at %s",
                method, graphUri, agent, new Date());

            // Commit asynchronously
            versioningService.commitGraphVersionAsync(graphUri, model, message);

            // Add basic Link header for future Memento use
            // Link: <commit-sha>; rel="version"
            // (Will be enhanced in Phase 3)

        } catch (Exception e) {
            log.error("Error in versioning filter", e);
            // Don't fail the response - versioning is best-effort
        }
    }

    private boolean isGraphResource(ContainerRequestContext request) {
        // Check if resource info matches Graph class
        return request.getUriInfo().getMatchedResources().stream()
            .anyMatch(r -> r instanceof Graph);
    }

    private boolean isSuccessfulModification(ContainerRequestContext request,
                                              ContainerResponseContext response) {
        String method = request.getMethod();
        int status = response.getStatus();

        return (method.equals("PUT") || method.equals("POST") ||
                method.equals("PATCH") || method.equals("DELETE")) &&
               (status >= 200 && status < 300);
    }

    private String getAuthenticatedAgent(ContainerRequestContext request) {
        // Extract from AgentContext or SecurityContext
        // TODO: Implement based on LinkedDataHub's auth mechanism
        return "system";
    }
}

Effort: 12 hours (including integration with Graph.java to pass Model)


1.7 Dependency Injection Registration

File: src/main/java/com/atomgraph/linkeddatahub/Application.java

Add to the HK2 binder configuration:

// In Application.java constructor, add to AbstractBinder:

bind(GraphVersioningService.class)
    .to(GraphVersioningService.class)
    .in(Singleton.class);

// GitHub token from environment variable
bindFactory(new Factory<String>() {
    @Override
    public String provide() {
        String token = System.getenv("GITHUB_TOKEN");
        if (token == null) {
            token = System.getProperty("github.token");
        }
        if (token == null) {
            log.warn("No GitHub token configured - versioning will be disabled");
            return "";
        }
        return token;
    }

    @Override
    public void dispose(String instance) {}
})
.named("githubToken")
.to(String.class);

Effort: 4 hours (testing DI, handling edge cases)


1.8 Modify Graph.java to Pass Model

File: src/main/java/com/atomgraph/linkeddatahub/resource/Graph.java

Add model to request attributes after successful operations:

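// In put() method, after model is persisted.
// Assumes a ContainerRequestContext is available in Graph, e.g. a field injected
// with @Context (jakarta.ws.rs.core.Context); UriInfo itself exposes no request context.
requestContext.setProperty("ldh.graph.model", model);

// Similarly in post() and patch() methods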

Effort: 2 hours


Phase 1 Total: ~50 hours (2 weeks)


Phase 2: Production Features (~40 hours)

2.1 External Edit Detection (Conflict Detection)

Objective: Detect when GitHub files have been manually edited outside of LinkedDataHub, causing drift from Fuseki state.

Design Approach:

Manual edits to GitHub repository files (though not expected in normal operation) would create version history inconsistency. This feature detects such drift and fails the operation with a clear error message.

Implementation Strategy:

  1. Store Last Known SHA:

    • In-memory cache (ConcurrentHashMap) mapping file path → blob SHA of the last committed file content (the file's sha as reported by GitHub, not the commit SHA)
    • Populated after each successful putFile() operation
    • Persisted to disk on shutdown (optional for Phase 2.2)
  2. Pre-Commit Verification:

    • Before each putFile(), fetch current file SHA from GitHub API
    • Compare with cached last known SHA
    • If different: external edit detected → throw VersionConflictException
    • If same or no cache entry: proceed with commit
  3. Error Handling:

    • VersionConflictException includes:
      • Graph URI
      • Expected SHA (from cache)
      • Actual SHA (from GitHub)
      • Error message for operators
    • Log warning with details for debugging
    • Don't attempt auto-resolution/merge - require manual intervention
    • Optional: Configuration flag lapp:overwriteOnConflict to force overwrite

Code Changes:

File: src/main/java/com/atomgraph/linkeddatahub/server/service/GraphVersioningService.java

Add to class:

private final ConcurrentHashMap<String, String> lastKnownShas = new ConcurrentHashMap<>();

private void detectExternalEdits(String path) throws VersionConflictException {
    String cachedSha = lastKnownShas.get(path);
    if (cachedSha == null) {
        return;  // First commit, no conflict possible
    }

    Optional<String> currentSha = githubClient.getFileSha(path);
    if (currentSha.isEmpty()) {
        return;  // File deleted externally, will be recreated
    }

    if (!cachedSha.equals(currentSha.get())) {
        String msg = String.format(
            "External edit detected for %s: expected SHA %s but found %s",
            path, cachedSha, currentSha.get()
        );
        log.error(msg);
        throw new VersionConflictException(msg);
    }
}

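// In commitGraphVersion(), before githubClient.putFile():
detectExternalEdits(path);
String commitSha = githubClient.putFile(path, content, message);
// getFileSha() returns the file's blob SHA, whereas putFile() returns the commit SHA,
// so cache the blob SHA for the next comparison. (The blob SHA is also present as
// content.sha in the PUT response, which would avoid this extra GET.)
// Requires GitHubClient.getFileSha() to be accessible from this class.
githubClient.getFileSha(path).ifPresent(sha -> lastKnownShas.put(path, sha));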

New Exception:

File: src/main/java/com/atomgraph/linkeddatahub/exception/VersionConflictException.java

package com.atomgraph.linkeddatahub.exception;

public class VersionConflictException extends RuntimeException {
    public VersionConflictException(String message) {
        super(message);
    }
}

Configuration Option (Optional):

<urn:linkeddatahub:versioning/graphs-repo> a doap:GitRepository ;
    doap:location <https://github.com/AtomGraph/ldh-graphs> ;
    lapp:branch "main" ;
    lapp:pathPrefix "graphs" ;
    lapp:overwriteOnConflict false .  # Default: abort on conflict

Key Considerations:

  • Performance: Adds one extra GitHub API GET call per commit (acceptable overhead)
  • Race Condition: Small window still exists between check and commit (inherent to distributed systems)
  • Cold Start: Cache is empty on application restart - first commits after restart won't detect conflicts
  • Persistence: Phase 2.2 can add cache persistence to disk for cold start coverage
  • Alternative Approach: Use GitHub's built-in optimistic locking (SHA requirement) - already provides some protection

Testing:

  • Manually edit file in GitHub web UI
  • Attempt to update same graph in LinkedDataHub
  • Verify VersionConflictException is thrown
  • Verify error message includes both SHAs
  • Test overwrite configuration flag

Effort: 8 hours
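
The detection logic can be exercised without GitHub by abstracting the remote lookup behind a function. The sketch below mirrors the strategy described in this section; `ExternalEditDetector` is a hypothetical name, the lookup function stands in for `GitHubClient.getFileSha()`, and `IllegalStateException` stands in for `VersionConflictException` to keep the example self-contained.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch of drift detection against a remote blob-SHA lookup. */
final class ExternalEditDetector {
    private final ConcurrentHashMap<String, String> lastKnownBlobShas = new ConcurrentHashMap<>();
    private final Function<String, Optional<String>> remoteShaLookup;

    ExternalEditDetector(Function<String, Optional<String>> remoteShaLookup) {
        this.remoteShaLookup = remoteShaLookup;
    }

    /** Records the blob SHA of the content that was just committed for this path. */
    void recordCommit(String path, String blobSha) {
        lastKnownBlobShas.put(path, blobSha);
    }

    /** Throws if the remote file no longer matches what was last committed. */
    void verify(String path) {
        String cached = lastKnownBlobShas.get(path);
        if (cached == null) {
            return; // nothing committed since startup, so nothing to compare against
        }
        Optional<String> current = remoteShaLookup.apply(path);
        if (current.isPresent() && !cached.equals(current.get())) {
            throw new IllegalStateException(String.format(
                "External edit detected for %s: expected SHA %s but found %s",
                path, cached, current.get()));
        }
    }
}
```

Both sides of the comparison must be the same kind of SHA; comparing a cached commit SHA against a blob SHA would report a conflict on every update.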


2.2 Retry Queue for Failed Commits

File: src/main/java/com/atomgraph/linkeddatahub/server/service/VersioningRetryQueue.java

  • Persistent queue (file-based or database)
  • Exponential backoff retry logic
  • Health check endpoint

Effort: 10 hours
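
The exponential backoff mentioned above can be reduced to a small pure function, which keeps the retry schedule easy to test independently of the queue. A minimal sketch, assuming a doubling delay with a fixed cap; `ExponentialBackoff` and its parameters are hypothetical:

```java
import java.time.Duration;

/** Hypothetical retry policy: the delay doubles per attempt up to a fixed cap. */
final class ExponentialBackoff {
    private final Duration initialDelay;
    private final Duration maxDelay;

    ExponentialBackoff(Duration initialDelay, Duration maxDelay) {
        this.initialDelay = initialDelay;
        this.maxDelay = maxDelay;
    }

    /** Delay before retry number {@code attempt} (1-based). */
    Duration delayFor(int attempt) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1");
        }
        // Clamp the exponent so the shift itself cannot overflow
        int exponent = Math.min(attempt - 1, 30);
        long millis;
        try {
            millis = Math.multiplyExact(initialDelay.toMillis(), 1L << exponent);
        } catch (ArithmeticException overflow) {
            return maxDelay;
        }
        return millis > maxDelay.toMillis() ? maxDelay : Duration.ofMillis(millis);
    }
}
```

With a 1 second initial delay and a 60 second cap, the first retries wait 1, 2 and 4 seconds, and every attempt from the seventh onward waits 60 seconds.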


2.3 Historical Version Retrieval via Query Parameter

File: src/main/java/com/atomgraph/linkeddatahub/resource/Graph.java

Modify GET method:

@GET
public Response get(@QueryParam("version") String commitSha) {
    if (commitSha != null) {
        // Retrieve from GitHub
        Optional<Model> historical = versioningService.getGraphAtCommit(getURI(), commitSha);
        if (historical.isPresent()) {
            return Response.ok(historical.get())
                .header("Memento-Datetime", getCommitDate(commitSha))
                .build();
        } else {
            return Response.status(404).build();
        }
    }

    // Normal flow - retrieve from triplestore
    return super.get();
}

Effort: 8 hours


2.4 Monitoring & Metrics

  • JMX beans for commit success/failure rates
  • Logging with structured format
  • Integration with existing LinkedDataHub monitoring

Effort: 6 hours


2.5 HTTP Tests

File: http-tests/versioning/ (new directory)

Test scenarios:

  • Graph creation triggers commit
  • Graph update triggers commit
  • Version retrieval via ?version=sha
  • Versioning disabled per dataspace
  • GitHub unavailable (graceful degradation)

Effort: 8 hours


Phase 2 Total: ~40 hours (2 weeks)


Phase 3: Full Memento Protocol (~35 hours)

3.1 Accept-Datetime Handler

Extend VersioningFilter to handle GET requests:

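if ("GET".equals(request.getMethod()) && hasAcceptDatetime(request)) {
    Date datetime = parseAcceptDatetime(request);
    Optional<Model> memento = versioningService.getGraphAtDatetime(graphUri, datetime);

    if (memento.isPresent()) {
        // Replace response with historical version
        response.setEntity(memento.get());
        // NOTE: per RFC 7089, Memento-Datetime must carry the datetime of the selected
        // memento (its commit timestamp), which can differ from the requested datetime;
        // getGraphAtDatetime() would need to expose it before this header is correct
        response.getHeaders().add("Memento-Datetime", formatHttpDate(datetime));
        addMementoLinks(response, graphUri);
    }
}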

Effort: 10 hours
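
The parseAcceptDatetime() and formatHttpDate() helpers used above are not defined in this plan. A minimal sketch of the date handling they would need, using only java.time, might look like this; HttpDates and its method names are hypothetical, and it works with Instant rather than Date. Note that the JDK's built-in RFC_1123_DATE_TIME formatter accepts a two-digit day when parsing but does not zero-pad it when formatting, so the sketch formats with an explicit pattern.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper for Accept-Datetime / Memento-Datetime header values. */
final class HttpDates {
    // Fixed-width HTTP date: two-digit day of month, always expressed in GMT
    private static final DateTimeFormatter HTTP_DATE =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
            .withZone(ZoneOffset.UTC);

    private HttpDates() {}

    /** Parses a header value; empty if the header is missing or malformed. */
    static Optional<Instant> parse(String headerValue) {
        if (headerValue == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(ZonedDateTime
                .parse(headerValue.trim(), DateTimeFormatter.RFC_1123_DATE_TIME)
                .toInstant());
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    /** Formats an instant as an HTTP date in GMT. */
    static String format(Instant instant) {
        return HTTP_DATE.format(instant);
    }
}
```

Returning an empty Optional leaves the caller free to decide how to treat a malformed header, for example by rejecting the request instead of silently serving the current version.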


3.2 TimeGate Endpoint

File: src/main/java/com/atomgraph/linkeddatahub/resource/TimeGate.java

@Path("timegate")
public class TimeGate {

    @GET
    @Path("{path: .*}")
    public Response timeGate(@PathParam("path") String path,
                             @HeaderParam("Accept-Datetime") String acceptDatetime) {
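        // Find memento closest to datetime
        // Return 302 redirect to memento URI
        // Placeholder so the skeleton compiles until the lookup is implemented
        return Response.status(Response.Status.NOT_IMPLEMENTED).build();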
    }
}

Effort: 8 hours
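
The "find memento closest to datetime" step could be sketched as below, assuming the commit history for a graph has already been loaded into a sorted map from commit datetime to commit SHA. MementoSelector and closest() are hypothetical names. "Closest" is taken literally here as the smallest absolute time difference, with ties going to the earlier commit; picking the latest commit at or before the requested datetime would be an equally valid policy.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Optional;

/** Hypothetical helper: picks the commit whose datetime is nearest to the requested one. */
final class MementoSelector {
    private MementoSelector() {}

    static Optional<String> closest(NavigableMap<Instant, String> commits, Instant requested) {
        Map.Entry<Instant, String> before = commits.floorEntry(requested);   // latest commit <= requested
        Map.Entry<Instant, String> after = commits.ceilingEntry(requested);  // earliest commit >= requested

        if (before == null && after == null) {
            return Optional.empty(); // no versions at all
        }
        if (before == null) {
            return Optional.of(after.getValue());
        }
        if (after == null) {
            return Optional.of(before.getValue());
        }
        Duration toBefore = Duration.between(before.getKey(), requested);
        Duration toAfter = Duration.between(requested, after.getKey());
        // Ties go to the earlier commit
        return Optional.of(toBefore.compareTo(toAfter) <= 0 ? before.getValue() : after.getValue());
    }
}
```

The returned SHA would then be turned into the ?version=<sha> URI that the 302 redirect points to.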


3.3 TimeMap Endpoint

File: src/main/java/com/atomgraph/linkeddatahub/resource/TimeMap.java

List all versions in application/link-format:

<https://example.org/data/products?version=abc123>;
  rel="memento"; datetime="Mon, 01 Jan 2024 12:00:00 GMT"

Effort: 10 hours
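
Serializing a TimeMap in the format shown above could look like this sketch. TimeMapWriter and write() are hypothetical names; the sketch assumes the original URI carries no query string of its own and that the commit history is already available as a map from commit datetime to SHA.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedMap;

/** Hypothetical helper: renders a graph's commit history as application/link-format. */
final class TimeMapWriter {
    private static final DateTimeFormatter HTTP_DATE =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
            .withZone(ZoneOffset.UTC);

    private TimeMapWriter() {}

    /** commits maps commit datetime to commit SHA, oldest first. */
    static String write(String originalUri, SortedMap<Instant, String> commits) {
        List<String> links = new ArrayList<>();
        links.add("<" + originalUri + ">; rel=\"original\"");
        for (Map.Entry<Instant, String> commit : commits.entrySet()) {
            // One link per version, addressed through the ?version=<sha> parameter
            links.add("<" + originalUri + "?version=" + commit.getValue() + ">; rel=\"memento\"; datetime=\""
                + HTTP_DATE.format(commit.getKey()) + "\"");
        }
        return String.join(",\n", links);
    }
}
```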


3.4 Complete Memento Headers

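Add the Link headers to all responses for versioned graphs; Memento-Datetime applies only to responses that return a historical version (a memento):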

Link: <original-uri>; rel="original"
Link: <timegate-uri>; rel="timegate"
Link: <timemap-uri>; rel="timemap"
Memento-Datetime: Mon, 01 Jan 2024 12:00:00 GMT
ETag: "git-blob-<sha>"

Effort: 7 hours
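
For the ETag, a git blob SHA is the SHA-1 of the header "blob <size>" followed by a NUL byte and then the file content. The sketch below computes it locally, which could be useful for checking a serialized graph against the SHA that GitHub reports; GitBlobSha and its methods are hypothetical names, and the ETag layout follows the header example above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: computes the SHA-1 that git assigns to a blob with the given content. */
final class GitBlobSha {
    private GitBlobSha() {}

    static String of(byte[] content) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            // git hashes "blob <size>\0" followed by the raw content bytes
            sha1.update(("blob " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
            sha1.update(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available on this JVM", e);
        }
    }

    /** Quoted ETag value in the "git-blob-<sha>" layout. */
    static String etag(byte[] content) {
        return "\"git-blob-" + of(content) + "\"";
    }
}
```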


Phase 3 Total: ~35 hours (2 weeks)


Configuration Example

system.trig

@prefix lapp: <https://w3id.org/atomgraph/linkeddatahub/apps#> .
@prefix ldh: <https://w3id.org/atomgraph/linkeddatahub#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix ldt: <https://www.w3.org/ns/ldt#> .
@prefix doap: <http://usefulinc.com/ns/doap#> .

<urn:linkeddatahub:apps/end-user> a lapp:Application, lapp:EndUserApplication ;
    dct:title "LinkedDataHub" ;
    lapp:origin <https://localhost:4443> ;
    ldt:ontology <https://localhost:4443/ns#> ;
    ldt:service <urn:linkeddatahub:services/end-user> ;

    # Versioning configuration - links to repository resource
    lapp:versioningRepository <urn:linkeddatahub:versioning/graphs-repo> .

# Separate repository resource using DOAP vocabulary
<urn:linkeddatahub:versioning/graphs-repo> a doap:GitRepository ;
    doap:location <https://github.com/AtomGraph/ldh-production-graphs> ;
    lapp:branch "main" ;
    lapp:pathPrefix "graphs" .

# Admin app typically has versioning disabled
<urn:linkeddatahub:apps/admin> a lapp:Application, lapp:AdminApplication ;
    dct:title "LinkedDataHub admin" ;
    lapp:origin <https://admin.localhost:4443> ;
    # No versioning configuration = disabled
    .

Environment Variables

# GitHub personal access token
export GITHUB_TOKEN="ghp_xxxxxxxxxxxxxxxxxxxx"

# Or via Java system property
java -Dgithub.token=ghp_xxxxxxxxxxxxxxxxxxxx ...
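
On the Java side, the token lookup could be sketched as follows. GitHubTokenResolver is a hypothetical name, and giving the environment variable precedence over the system property is an assumption, since the plan does not state which one wins when both are set.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

/** Hypothetical helper: resolves the GitHub token from the environment or system properties. */
final class GitHubTokenResolver {
    private GitHubTokenResolver() {}

    /** Resolves against the real process environment and JVM system properties. */
    static Optional<String> resolve() {
        return resolve(System.getenv(), System.getProperties());
    }

    /** Testable variant that takes the lookup sources as parameters. */
    static Optional<String> resolve(Map<String, String> env, Properties systemProperties) {
        // Assumption: GITHUB_TOKEN takes precedence over -Dgithub.token
        String fromEnv = env.get("GITHUB_TOKEN");
        if (fromEnv != null && !fromEnv.trim().isEmpty()) {
            return Optional.of(fromEnv.trim());
        }
        String fromProperty = systemProperties.getProperty("github.token");
        if (fromProperty != null && !fromProperty.trim().isEmpty()) {
            return Optional.of(fromProperty.trim());
        }
        return Optional.empty(); // no token configured
    }
}
```

An empty result fits the best-effort design: versioning can be skipped with a warning rather than failing application startup.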

File Structure

src/main/java/com/atomgraph/linkeddatahub/
├── client/
│   └── GitHubClient.java                      [NEW - 350 lines]
├── exception/
│   └── VersionConflictException.java         [NEW Phase 2.1 - 10 lines]
├── resource/
│   ├── Graph.java                             [MODIFY - add ~20 lines]
│   ├── TimeGate.java                          [NEW Phase 3 - 150 lines]
│   └── TimeMap.java                           [NEW Phase 3 - 200 lines]
├── server/
│   ├── filter/response/
│   │   └── VersioningFilter.java             [NEW - 250 lines]
│   └── service/
│       ├── GraphVersioningService.java       [NEW - 400 lines (includes conflict detection)]
│       └── VersioningRetryQueue.java         [NEW Phase 2.2 - 200 lines]
├── vocabulary/
│   ├── LAPP.java                              [MODIFY - add 3 properties]
│   └── DOAP.java                              [NEW - wrapper for DOAP vocabulary]
└── Application.java                           [MODIFY - add DI config ~30 lines]

http-tests/
└── versioning/                                [NEW Phase 2.5]
    ├── create-versioned-graph.sh
    ├── update-versioned-graph.sh
    └── retrieve-version.sh

config/
└── system.trig                                [MODIFY - add versioning config with DOAP]

pom.xml                                        [NO CHANGES - uses existing dependencies]

# External Vocabularies Referenced:
# - DOAP (Description of a Project): http://usefulinc.com/ns/doap#
#   Used for GitRepository modeling (doap:location, doap:GitRepository)

Total New Code: ~1,560 lines (sum of the per-file estimates above; DOAP.java is not estimated)
Modified Code: ~50 lines


Effort Summary

| Phase | Description | Hours | Weeks |
|---|---|---|---|
| Phase 1 | Core infrastructure (MVP) | 48 | 2 |
| Phase 2 | Production features (includes conflict detection) | 40 | 2 |
| Phase 3 | Full Memento protocol | 35 | 2 |
| Total | Complete implementation | 123 | 6-7 |

Minimum Viable Product: Phase 1 only = 2 weeks


Risks & Mitigations

| Risk | Impact | Mitigation |
|---|---|---|
| GitHub API rate limits (5000/hour) | Service degradation | Async commits reduce frequency; implement request caching |
| Large graphs (>100MB) | GitHub rejects | Document size limits; consider chunking or Git LFS |
| Network failures to GitHub | Commits fail | Retry queue; best-effort design; don't block responses |
| Token exposure | Security breach | Environment variables only; never in RDF; rotate regularly |
| Concurrent modifications | Race conditions | GitHub API handles via SHA requirement; optimistic locking |
| Jersey client issues | HTTP errors | Robust error handling; retry logic; detailed logging |
| Manual GitHub edits | Graph version drift | Conflict detection (Phase 2.1); document GitHub as append-only |

Testing Strategy

1. Unit Tests (JUnit + Mockito + WireMock)

Approach: Test GitHubClient and GraphVersioningService in isolation using mocks.

GitHubClient Tests:

  • Use WireMock to mock GitHub REST API endpoints
  • Test all 4 REST methods: putFile(), getFileAtCommit(), findCommitAtDatetime(), deleteFile()
  • Verify Base64 encoding/decoding correctness
  • Test SHA requirement for updates (must GET first, then PUT with SHA)
  • Test error scenarios: 404, 403 (rate limit), 401 (auth), 500 (server error)
  • Verify proper HTTP headers (Authorization: Bearer, Accept: application/vnd.github+json)
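
The Base64 round trip that these tests need to verify could be sketched as below. GitHubContentCodec is a hypothetical name. The sketch uses the JDK's MIME decoder because the GitHub contents API returns Base64 content broken into lines, which the basic decoder rejects.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper: Base64 handling for file content sent to and received from GitHub. */
final class GitHubContentCodec {
    private GitHubContentCodec() {}

    /** Encodes text for the "content" field of a PUT request (single line, no line breaks). */
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    /** Decodes the "content" field of a GET response, tolerating embedded line breaks. */
    static String decode(String base64) {
        return new String(Base64.getMimeDecoder().decode(base64), StandardCharsets.UTF_8);
    }
}
```

A unit test can feed decode() a WireMock response whose content field contains newlines and check that non-ASCII literals in the N-Triples survive the round trip.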

GraphVersioningService Tests:

  • Mock GitHubClient and Application dependencies
  • Test URI-to-path mapping (e.g., https://localhost:4443/data/products → graphs/data/products.nt)
  • Verify N-Triples serialization correctness
  • Test versioning enabled/disabled logic
  • Test async execution (verify ExecutorService is used)
  • Test error handling (GitHub failures don't throw to caller)
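
The URI-to-path mapping under test could be sketched as follows, based only on the single example given above. GraphPathMapper and toRepoPath() are hypothetical names. Treating a trailing slash as insignificant and rejecting URIs without a path are assumptions, because the plan does not say how such URIs should be mapped.

```java
import java.net.URI;

/** Hypothetical helper: maps a graph URI to a file path inside the versioning repository. */
final class GraphPathMapper {
    private GraphPathMapper() {}

    static String toRepoPath(URI graphUri, String pathPrefix) {
        String path = graphUri.getPath() == null ? "" : graphUri.getPath();
        // Assumption: leading and trailing slashes are not significant for the file name
        String trimmed = path.replaceAll("^/+|/+$", "");
        if (trimmed.isEmpty()) {
            // The plan does not define a file name for the root document
            throw new IllegalArgumentException("No path to map for " + graphUri);
        }
        return pathPrefix + "/" + trimmed + ".nt";
    }
}
```

With the pathPrefix value "graphs" from the configuration example, https://localhost:4443/data/products maps to graphs/data/products.nt.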

Key Considerations:

  • WireMock allows testing without real GitHub API calls (fast CI, no rate limits)
  • Need to make API_BASE configurable in GitHubClient for testing
  • Test both create (no existing SHA) and update (existing SHA) paths
  • Verify commit messages include timestamp and agent info

Files:

  • src/test/java/com/atomgraph/linkeddatahub/client/GitHubClientTest.java
  • src/test/java/com/atomgraph/linkeddatahub/server/service/GraphVersioningServiceTest.java

Estimated Effort: 16 hours


2. Integration Tests (Real GitHub Test Repository)

Approach: Test against real GitHub API using dedicated test repository.

Setup Requirements:

  • Create test repository: AtomGraph/ldh-test-versioning
  • Use environment variable GITHUB_TOKEN for authentication
  • Maven profile integration-tests with Failsafe plugin
  • Tests run with mvn verify -P integration-tests
  • Use JUnit @Tag("integration") and assumeTrue() to skip if no token

Test Scenarios:

  • Full workflow: Create → Update → Retrieve v1 → Retrieve v2 → Delete
  • Concurrent updates: 5 threads simultaneously updating same file (test GitHub's optimistic locking)
  • Large files: Test with graphs >1MB to verify Base64 encoding doesn't break
  • Pagination: If we later support listing all commits (TimeMap), test repos with >100 commits

Key Considerations:

  • Tests hit real GitHub API (slower, uses rate limit)
  • Each test run should use unique file paths (timestamp-based)
  • Cleanup after tests to avoid filling test repository
  • Verify GitHub's SHA requirement prevents lost updates
  • Test datetime resolution (finding commits by timestamp)

Files:

  • src/test/java/com/atomgraph/linkeddatahub/integration/VersioningIntegrationTest.java

Estimated Effort: 12 hours


3. HTTP Tests (Shell Scripts)

Approach: End-to-end tests following LinkedDataHub's existing http-tests/ patterns.

Test Scenarios:

  1. Create versioned graph - PUT to end-user app, verify GitHub commit appears
  2. Update versioned graph - PUT again, verify 2nd commit exists
  3. Retrieve historical version - GET with ?version=<sha>, verify old content
  4. Versioning disabled - PUT to admin app, verify NO GitHub commit
  5. GitHub unavailable - Invalid token, verify HTTP 200 still returned (graceful degradation)
  6. Performance test - Create 50 graphs in parallel, measure throughput

Key Considerations:

  • Use gh CLI to verify commits in GitHub (easier than curl + JSON parsing)
  • Wait 5-10 seconds after operations for async commits to complete
  • Store expected commit count before tests, compare after
  • Test both end-user app (versioning ON) and admin app (versioning OFF)
  • Verify Varnish doesn't cache version responses incorrectly
  • Test commit messages contain graph URI, HTTP method, agent, timestamp

Setup:

  • Configure versioning in system.trig (requires app restart)
  • Set GITHUB_TOKEN environment variable
  • Use http-tests/versioning/setup.sh for common test environment

Files:

  • http-tests/versioning/setup.sh - Environment setup
  • http-tests/versioning/01-create-versioned-graph.sh
  • http-tests/versioning/02-update-versioned-graph.sh
  • http-tests/versioning/03-retrieve-historical-version.sh
  • http-tests/versioning/04-versioning-disabled.sh
  • http-tests/versioning/05-github-unavailable.sh (manual test)
  • http-tests/versioning/performance-test.sh
  • http-tests/versioning/cleanup.sh

Estimated Effort: 10 hours


4. Performance & Load Tests

Approach: Measure async commit throughput and identify bottlenecks.

Test Scenarios:

  • Create 50-100 graphs in rapid succession (parallel curl)
  • Measure graphs/second throughput
  • Verify all async commits eventually complete (wait 30-60 seconds, check GitHub)
  • Monitor ExecutorService queue size
  • Test with large graphs (1000+ triples) to measure serialization time

Key Metrics:

  • HTTP response time (should be <100ms, not blocked by GitHub)
  • GitHub commit time (measured separately, expected 1-2 seconds per commit)
  • Success rate (% of commits that succeed vs fail)
  • Memory usage with large queue of pending commits

Key Considerations:

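  • GitHub API rate limit: 5000 requests/hour ≈ 83/minute; because each update needs a GET for the current SHA followed by a PUT, that is roughly 41 graph updates per minute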
  • Our async design means HTTP responses shouldn't be affected by GitHub speed
  • Need retry queue for failed commits (Phase 2)
  • Large graphs may need chunking or Git LFS (future enhancement)
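
The non-blocking, best-effort behaviour being measured here could be sketched as below. BestEffortCommitter and its methods are hypothetical names; the commit itself is passed in as a Runnable so that the sketch stays independent of GitHubClient. The two counters are one possible source for the success-rate metric listed above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: runs commits in the background and never propagates their failures. */
final class BestEffortCommitter {
    private final ExecutorService executor;
    private final AtomicInteger succeeded = new AtomicInteger();
    private final AtomicInteger failed = new AtomicInteger();

    BestEffortCommitter(ExecutorService executor) {
        this.executor = executor;
    }

    /** Submits a commit; the returned future completes normally even if the commit throws. */
    CompletableFuture<Void> submit(Runnable commit) {
        return CompletableFuture.runAsync(commit, executor)
            .<Void>handle((ignored, error) -> {
                if (error == null) {
                    succeeded.incrementAndGet();
                } else {
                    failed.incrementAndGet(); // a retry queue could pick the commit up here
                }
                return null;
            });
    }

    int succeededCount() { return succeeded.get(); }

    int failedCount() { return failed.get(); }
}
```

The request thread only pays for the submit() call, so HTTP response time stays independent of how long GitHub takes to answer.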

Files:

  • http-tests/versioning/performance-test.sh

Estimated Effort: 6 hours


Testing Summary

| Test Type | Purpose | Tools | Effort |
|---|---|---|---|
| Unit Tests | Isolated component testing | JUnit, Mockito, WireMock | 16h |
| Integration Tests | Real GitHub API validation | JUnit, real GitHub repo | 12h |
| HTTP Tests | End-to-end LinkedDataHub testing | Bash, curl, gh CLI | 10h |
| Performance Tests | Throughput and scalability | Bash, parallel curl | 6h |
| Total | | | 44h |

CI/CD Integration

GitHub Actions Workflow:

  • Run unit tests on every push (fast, no GitHub needed)
  • Run integration tests on push to main branch only (uses secrets.VERSIONING_TEST_TOKEN)
  • HTTP tests require full LinkedDataHub deployment (manual or separate CI job)

Maven Profiles:

  • Default: Unit tests only (mvn test)
  • Integration: mvn verify -P integration-tests (requires GITHUB_TOKEN env var)

Test Dependencies

New Maven test dependencies needed:

  • com.github.tomakehurst:wiremock-jre8:2.35.0 (test scope)
  • JUnit 5 and Mockito (likely already present)

Future Test Enhancements (Phase 3)

  • Memento protocol compliance testing (Accept-Datetime header)
  • TimeGate redirect behavior
  • TimeMap link format validation
  • ETag matching (git blob SHA)
  • Cross-browser XSLT rendering with historical versions

Success Criteria

Phase 1 (MVP)

  • ✅ Graph PUT/POST/PATCH/DELETE triggers GitHub commit
  • ✅ N-Triples files appear in GitHub repository
  • ✅ Commit messages include timestamp and agent info
  • ✅ Failed GitHub commits don't block HTTP responses
  • ✅ Configuration via RDF in system.trig works
  • ✅ Per-dataspace enable/disable works

Phase 2 (Production-Ready)

  • ✅ Historical versions retrievable via ?version=sha
  • ✅ Failed commits retry automatically
  • ✅ HTTP tests validate complete workflow
  • ✅ Monitoring shows commit success rates

Phase 3 (Full Memento)

  • ✅ Accept-Datetime returns historical versions
  • ✅ TimeGate redirects to appropriate Memento
  • ✅ TimeMap lists all versions
  • ✅ All Memento headers present and correct
  • ✅ ETag = git blob SHA

Future Enhancements

Beyond the initial implementation:

  1. GitHub Apps Authentication - More secure than personal tokens
  2. Differential Commits - Commit only triples that changed (requires SPARQL diff)
  3. Git LFS Support - For graphs >100MB
  4. Batch Commits - Group multiple small changes into single commits
  5. Branch-per-Dataspace - Isolate different applications
  6. Webhook Integration - GitHub → LinkedDataHub sync for external changes
  7. RDF-based TimeMap - Return TimeMap as RDF (not just link-format)
  8. Provenance Integration - Link to existing ProvenanceFilter metadata
  9. Signed Commits - GPG signatures for audit trail
  10. Multi-Repository Support - Different repos per graph category


Document Version: 1.2 Last Updated: 2026-01-07 Status: Ready for Implementation

Changelog:

  • v1.2: Added DOAP vocabulary for repository configuration; added conflict detection (Phase 2.1); renumbered Phase 2 items
  • v1.1: Replaced kohsuke GitHub library with custom Jersey client; updated to use existing dependencies
  • v1.0: Initial implementation plan
