CI/CD Pipelines

After configuring the tooling environment to get a release of xtriples-micro, it can be used in a CI/CD pipeline.

GitLab

Here's what the CI/CD pipelines at SCDH Münster look like. The directory structure is shown here.
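
As an approximation, the layout can also be reconstructed from the paths used in the pipeline definitions below (a sketch; the TEI sources live elsewhere in the repository):

.
├── .gitlab-ci.yml               # tooling, validate, build
├── .gitlab/
│   └── ci/
│       ├── build.yaml           # build sub-pipeline
│       ├── staging.yaml         # staging deployment sub-pipeline
│       └── production.yaml      # production deployment sub-pipeline
└── resources/                   # $TOOLING
    ├── mvnw                     # Maven wrapper run by the tooling job
    ├── ci-validation.properties
    ├── ci-prod.properties
    └── target/                  # built once per pipeline, shared as artifact
        └── bin/ant.sh           # $ANT_CMD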

Setup and Tooling

This is the beginning of a .gitlab-ci.yml. It first defines some variables, e.g. which container image to use for getting Java. The tooling environment is set up only once per pipeline run, in the tooling job, and the tools in $TOOLING/target are stored as an artifact for the subsequent jobs in the pipeline.

variables:
  JAVA_DOCKER_IMAGE: eclipse-temurin:17-alpine
  MAVEN_OPTS: >-
    -Dhttps.protocols=TLSv1.2
    -Dmaven.repo.local=$CI_PROJECT_DIR/.m2/repository
    -Dorg.slf4j.simpleLogger.showDateTime=true
    -Djava.awt.headless=true
  MAVEN_CLI_OPTS: >-
    --batch-mode
    --errors
    --fail-at-end
    --show-version
    --no-transfer-progress
    -DinstallAtEnd=true
    -DdeployAtEnd=true
  TOOLING: resources
  ANT_CMD: $TOOLING/target/bin/ant.sh

cache:
  paths:
    - $CI_PROJECT_DIR/.m2/repository

stages:
  - setup
  - test
  - build
  - deploy

# Set up tooling and store target folder as artifact for subsequent
# jobs.
tooling:
  stage: setup
  image: $JAVA_DOCKER_IMAGE
  script:
    - cd $TOOLING
    - ./mvnw $MAVEN_CLI_OPTS clean package
    - ls -l target/bin
  artifacts:
    paths:
      - $TOOLING/target
    expire_in: 1 week
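
The same setup can be reproduced locally with the commands the job runs (a sketch; $TOOLING is resources, as defined in the variables above):

cd resources
./mvnw --batch-mode clean package
ls -l target/bin    # should contain ant.sh, i.e. $ANT_CMD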

Validation

There's only one other job in .gitlab-ci.yml that is run on every push: it validates the TEI-XML source files. This job generates a human-readable validation report as well as lists of the non-well-formed and the invalid files in the repository. These lists are important for all optional downstream jobs, which are run on valid files only.

This validation is encapsulated in its own generic project, which is installed as a dependency just like xtriples-micro. It will soon be made available as TEI Validation Reports.
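
For illustration, a minimal sketch of how this dependency might be declared in the tooling project's pom.xml; the version is a placeholder, and the tooling build is assumed to unpack the artifact to $TOOLING/target/dependencies/tei-validation-reports, as the paths in the validate job suggest:

<!-- sketch only; the version is a placeholder -->
<dependency>
  <groupId>de.uni-muenster.scdh.tei</groupId>
  <artifactId>tei-validation-reports</artifactId>
  <version>0.0.0</version>
</dependency>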

Notice that the tooling setup is available in this job, since it declares that it needs the artifacts from the tooling job!

# Use the maven dependency
# de.uni-muenster.scdh.tei:tei-validation-reports for validation
# against the ODD and for generating a validation report. The
# project-specific properties are defined in resources/ci-validation.properties
validate:
  stage: test
  image: $JAVA_DOCKER_IMAGE
  needs:
    - tooling
  script:
    - $ANT_CMD -propertyfile $TOOLING/ci-validation.properties -f $TOOLING/target/dependencies/tei-validation-reports/build.xml info
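    # the `|| report` below keeps the job from failing when validation
    # finds errors, so that the report and the bad-file lists are still built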
    - $ANT_CMD -propertyfile $TOOLING/ci-validation.properties -f $TOOLING/target/dependencies/tei-validation-reports/build.xml validate || $ANT_CMD -propertyfile $TOOLING/ci-validation.properties -f $TOOLING/target/dependencies/tei-validation-reports/build.xml report
    - $ANT_CMD -propertyfile $TOOLING/ci-validation.properties -f $TOOLING/target/dependencies/tei-validation-reports/build.xml report
    - $ANT_CMD -propertyfile $TOOLING/ci-validation.properties -f $TOOLING/target/dependencies/tei-validation-reports/build.xml bad-files
    - ls -l validation-report
  artifacts:
    paths:
      - validation-report/report.html
      - validation-report/non-wellformed.txt
      - validation-report/bad-files.txt
    expire_in: 1 week
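
Downstream jobs restrict processing to valid files based on these lists; the Ant targets used later are meant to consume them directly. As a mere illustration of the idea, here is a hypothetical shell sketch (assuming one repository-relative path per line in bad-files.txt and the TEI sources under tei/):

# hypothetical illustration, not part of the pipeline: compute the set
# of valid XML files by removing everything listed in bad-files.txt
find tei -name '*.xml' | sort > all-files.txt
sort validation-report/bad-files.txt > bad-files-sorted.txt
comm -23 all-files.txt bad-files-sorted.txt > valid-files.txt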

Build

And there's only one other job in the pipeline definition in .gitlab-ci.yml: build. It is triggered manually and then starts sub-pipelines from the .gitlab/ci/ directory.

build:
  stage: build
  trigger:
    include: .gitlab/ci/build.yaml
    strategy: depend
  variables:
    INITIAL_PIPELINE_ID: $CI_PIPELINE_ID
  when: manual

The sub-pipeline in .gitlab/ci/build.yaml looks like this; the INITIAL_PIPELINE_ID variable passed in by the build job lets its jobs fetch artifacts from the triggering pipeline:

dataset:
  stage: build
  image: $JAVA_DOCKER_IMAGE
  needs:
    - pipeline: $INITIAL_PIPELINE_ID
      job: tooling
    - pipeline: $INITIAL_PIPELINE_ID
      job: validate
  before_script:
    - cd $TOOLING
    - ls -l target/bin
    - apk update
    - apk add libc6-compat # required for running native executables in alpine
  script:
    - $ANT_CMD -propertyfile ci-prod.properties tei-dist
    - $ANT_CMD -propertyfile ci-prod.properties html-dist
    - $ANT_CMD -propertyfile ci-prod.properties knowledge-graph
    - $ANT_CMD -propertyfile ci-prod.properties solr
    - $ANT_CMD -propertyfile ci-prod.properties labels
  artifacts:
    paths:
      - dist
      - graph/knowledge-graph.n3
      - graph/knowledge-graph.json
      - solr/merged.json
      - labels
      - validation-report/report.html
      - validation-report/non-wellformed.txt
      - validation-report/bad-files.txt
    expire_in: 1 day


staging:
  stage: deploy
  trigger:
    include: .gitlab/ci/staging.yaml
    strategy: depend
  variables:
    BUILD_PIPELINE_ID: $CI_PIPELINE_ID
  when: manual

    
production:
  stage: deploy
  trigger:
    include: .gitlab/ci/production.yaml
    strategy: depend
  variables:
    BUILD_PIPELINE_ID: $CI_PIPELINE_ID
  when: manual

The job dataset is the workhorse of the whole pipeline. Again, it needs the tooling from the tooling job of the initial pipeline, and it also needs the lists of invalid files from the validate job. You can see several Ant runs in the script section; the targets are tei-dist, html-dist, knowledge-graph, solr, and labels.

As you can guess, knowledge-graph is the Ant target for extracting the RDF-based knowledge graph from the edition's TEI files using xtriples-micro.
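
Once the tooling has been built, the extraction can also be reproduced locally (a sketch, not an official invocation):

cd resources        # $TOOLING
./target/bin/ant.sh -propertyfile ci-prod.properties knowledge-graph
# according to the artifact paths above, the results end up in
# graph/knowledge-graph.n3 and graph/knowledge-graph.json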

The two other jobs, staging and production, take the generated datasets and first deploy them to the staging environment, which is a duplicate of the production systems. Only if this succeeds are the datasets deployed to the production systems.

The production systems include web servers (GitLab Pages) for the TEI sources and HTML derivatives, web APIs (DTS, TextAPI), access to the knowledge graph, and a Solr search engine.
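
The deployment sub-pipelines themselves are not shown here. A hypothetical sketch of a job in .gitlab/ci/staging.yaml, following the same cross-pipeline needs pattern as build.yaml (the job name and the deployment commands are assumptions; they depend on the target systems):

deploy-staging:
  stage: deploy
  image: $JAVA_DOCKER_IMAGE
  needs:
    # fetch the datasets from the dataset job of the build sub-pipeline
    - pipeline: $BUILD_PIPELINE_ID
      job: dataset
  script:
    # hypothetical placeholder: the real job would push dist, graph,
    # solr and labels to the staging web server, triple store etc.
    - ls -l dist graph solr labels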
