CI CD Pipelines
After configuring the tooling environment to get a release of
xtriples-micro, it can be used in a CI/CD pipeline.
Here is what the CI/CD pipelines at SCDH Münster look like. The
directory structure is shown below.
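A rough sketch of the relevant files, reconstructed from the paths
used in the snippets below (the actual repository contains more than
this, of course):

.gitlab-ci.yml
.gitlab/
  ci/
    build.yaml
    staging.yaml
    production.yaml
resources/                      # the tooling environment ($TOOLING)
  mvnw                          # Maven wrapper used by the tooling job
  ci-validation.properties
  ci-prod.properties
  target/                       # built by the tooling job, stored as artifact
    bin/
      ant.sh
    dependencies/
      tei-validation-reports/
        build.xml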
This is the beginning of a .gitlab-ci.yml. It first defines some
variables, among them the container image that provides Java. The
tooling environment is set up only once per pipeline run, in the
tooling job, and the tools in $TOOLING/target are stored as an
artifact for the subsequent jobs in the pipeline.
variables:
  JAVA_DOCKER_IMAGE: eclipse-temurin:17-alpine
  MAVEN_OPTS: >-
    -Dhttps.protocols=TLSv1.2
    -Dmaven.repo.local=$CI_PROJECT_DIR/.m2/repository
    -Dorg.slf4j.simpleLogger.showDateTime=true
    -Djava.awt.headless=true
  MAVEN_CLI_OPTS: >-
    --batch-mode
    --errors
    --fail-at-end
    --show-version
    --no-transfer-progress
    -DinstallAtEnd=true
    -DdeployAtEnd=true
  TOOLING: resources
  ANT_CMD: $TOOLING/target/bin/ant.sh

cache:
  paths:
    - $CI_PROJECT_DIR/.m2/repository

stages:
  - setup
  - test
  - build
  - deploy

# Set up tooling and store target folder as artifact for subsequent
# jobs.
tooling:
  stage: setup
  image: $JAVA_DOCKER_IMAGE
  script:
    - cd $TOOLING
    - ./mvnw $MAVEN_CLI_OPTS clean package
    - ls -l target/bin
  artifacts:
    paths:
      - $TOOLING/target
    expire_in: 1 week

There's only one other job in .gitlab-ci.yml that is run on every
push; it validates the TEI-XML source files. This job generates a
human-readable validation report and lists of the non-well-formed and
invalid files in the repository. These lists are important for all
optional downstream jobs, which are only run on valid files.
This validation is encapsulated in a generic project of its own,
which is installed as a dependency just like xtriples-micro. It will
soon be made available as TEI Validation Reports.
Notice that the tooling setup is available in this job, since it
declares that it needs the artifacts from the tooling job!
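The project-specific properties mentioned in the comment below live
in resources/ci-validation.properties. They might look something like
this; the key names here are purely hypothetical and only illustrate
the idea of a per-project configuration file (consult the TEI
Validation Reports documentation for the actual keys):

# ci-validation.properties (hypothetical keys, for illustration only)
source.dir=tei
schema.file=schema/edition.rng
report.dir=validation-report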
# Use the maven dependency
# de.uni-muenster.scdh.tei:tei-validation-reports for validation
# against the ODD and for generating a validation report. The
# project-specific properties are defined in resources/ci-validation.properties
validate:
  stage: test
  image: $JAVA_DOCKER_IMAGE
  needs:
    - tooling
  script:
    - $ANT_CMD -propertyfile $TOOLING/ci-validation.properties -f $TOOLING/target/dependencies/tei-validation-reports/build.xml info
    # Do not stop when validation fails: generate the report instead,
    # so that the job can go on and list the bad files below.
    - $ANT_CMD -propertyfile $TOOLING/ci-validation.properties -f $TOOLING/target/dependencies/tei-validation-reports/build.xml validate || $ANT_CMD -propertyfile $TOOLING/ci-validation.properties -f $TOOLING/target/dependencies/tei-validation-reports/build.xml report
    - $ANT_CMD -propertyfile $TOOLING/ci-validation.properties -f $TOOLING/target/dependencies/tei-validation-reports/build.xml report
    - $ANT_CMD -propertyfile $TOOLING/ci-validation.properties -f $TOOLING/target/dependencies/tei-validation-reports/build.xml bad-files
    - ls -l validation-report
  artifacts:
    paths:
      - validation-report/report.html
      - validation-report/non-wellformed.txt
      - validation-report/bad-files.txt
    expire_in: 1 week

And there's only one other job in the pipeline definition in
.gitlab-ci.yml: build. It is only triggered manually and then calls
sub-pipelines from the .gitlab/ci/ directory.
build:
  stage: build
  trigger:
    include: .gitlab/ci/build.yaml
    strategy: depend
  variables:
    INITIAL_PIPELINE_ID: $CI_PIPELINE_ID
  when: manual

The sub-pipeline in .gitlab/ci/build.yaml looks like this:
dataset:
  stage: build
  image: $JAVA_DOCKER_IMAGE
  # Fetch the artifacts of the tooling and validate jobs from the
  # parent pipeline (cross-pipeline needs).
  needs:
    - pipeline: $INITIAL_PIPELINE_ID
      job: tooling
    - pipeline: $INITIAL_PIPELINE_ID
      job: validate
  before_script:
    - cd $TOOLING
    - ls -l target/bin
    - apk update
    - apk add libc6-compat # required for running native executables in alpine
  script:
    - $ANT_CMD -propertyfile ci-prod.properties tei-dist
    - $ANT_CMD -propertyfile ci-prod.properties html-dist
    - $ANT_CMD -propertyfile ci-prod.properties knowledge-graph
    - $ANT_CMD -propertyfile ci-prod.properties solr
    - $ANT_CMD -propertyfile ci-prod.properties labels
  artifacts:
    paths:
      - dist
      - graph/knowledge-graph.n3
      - graph/knowledge-graph.json
      - solr/merged.json
      - labels
      - validation-report/report.html
      - validation-report/non-wellformed.txt
      - validation-report/bad-files.txt
    expire_in: 1 day

staging:
  stage: deploy
  trigger:
    include: .gitlab/ci/staging.yaml
    strategy: depend
  variables:
    BUILD_PIPELINE_ID: $CI_PIPELINE_ID
  when: manual

production:
  stage: deploy
  trigger:
    include: .gitlab/ci/production.yaml
    strategy: depend
  variables:
    BUILD_PIPELINE_ID: $CI_PIPELINE_ID
  when: manual

The job dataset is the workhorse of the whole pipeline. Again, it
needs the tooling from the tooling job of the initial pipeline and
also needs the lists of invalid files from the validation job. You
can see several Ant runs in the script section; the targets are
tei-dist, html-dist, knowledge-graph, solr, and labels.
As you can guess, knowledge-graph is the Ant target for extracting
the RDF-based knowledge graph from the edition's TEI files using
xtriples-micro.
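For debugging outside CI, the same targets can presumably be run
locally once the tooling has been built, roughly like this (assuming
Java 17 is installed and a suitable properties file is at hand):

cd resources
./mvnw clean package                # build the tooling once
target/bin/ant.sh -propertyfile ci-prod.properties knowledge-graph
# likewise for tei-dist, html-dist, solr and labels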
The two other jobs, staging and production, take the generated
datasets and first deploy them to the staging environment, which is a
duplicate of the production systems. Only if this is successful are
the datasets deployed to the production systems.
The production systems include web servers (GitLab Pages) for the TEI
sources and HTML derivatives, web APIs (DTS, TextAPI), access to the
knowledge graph, and a Solr search engine.
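The deploy sub-pipelines themselves are not shown here. As a rough,
hypothetical sketch, a job in .gitlab/ci/staging.yaml could fetch the
dataset artifacts from the build pipeline via BUILD_PIPELINE_ID and
then push them to the staging servers (the actual deployment steps
depend on the target systems and are only hinted at here):

publish:
  stage: deploy
  image: $JAVA_DOCKER_IMAGE
  needs:
    - pipeline: $BUILD_PIPELINE_ID
      job: dataset
  script:
    - ls -l dist graph solr labels   # artifacts fetched from the build pipeline
    # upload to the staging web server, Solr instance etc. goes here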