This README describes the available endpoints of the DIMAG IngestList API Wrapper and shows how to quickly try them out.
Tip: The examples are designed so you can copy-paste and run them directly in your terminal.
A quick, skimmable summary of what this project provides.
- API for digital preservation jobs:
  - Validate or Identify files via a single endpoint (`POST /api/create`).
  - Works with existing server files (remote) and direct uploads (multipart).
  - Inspect processing results via Jobs (`GET /api/jobs`, `GET /api/job/{id}`).
- Authentication with JWT: simple login (`POST /api/login`) returns a token for subsequent calls.
- Two operation types: Validate and Identify, unified request model.
- Developer-friendly docs: copy-pasteable curl snippets and end-to-end workflows.
- Configuration via YAML with sensible defaults and auto-creation of `config.yaml` if missing.
- Database via GORM: SQLite for local/dev; PostgreSQL for production/Kubernetes.
- Background processing:
  - robfig/cron-based scheduler (configurable interval) to process queued tasks.
  - Worker pool with configurable parallelism and automatic cleanup of finished work.
- File storage management: local storage directory for uploads and processing artifacts.
- Prometheus metrics endpoint `/metrics` with custom business and storage metrics, plus HTTP counters/histograms.
- Containerized application: multi-stage Docker/Podman build and ready-to-run images.
- Kubernetes-ready: manifests for PostgreSQL and guidance for deploying the app (Deployment/Service/Ingress).
- Clear project structure and major dependencies (Gin, GORM, robfig/cron, JWT, Prometheus, YAML, UUID).
- Feedback & contributions welcome: open issues/PRs; MIT-licensed for hassle-free adoption.
- Project Structure
- Dependencies
- Notes
- Configuration
- Metrics (Prometheus)
- API
- Container Image (Docker/Podman)
- Kubernetes
- Feedback & Contributions
- License
High-level overview of the repository layout:
.
├── backend          # Go backend service
│   ├── cmd/server   # Main entry point
│   ├── internal     # Application layers (services, infra, worker, scheduler)
│   ├── pkg/config   # YAML config loader and defaults
│   └── Dockerfile   # Multi-stage image build
├── conf             # Example or local configuration (mount into container)
├── frontend         # (Reserved for UI, if applicable)
├── k8s              # Kubernetes manifests (Postgres)
├── scripts          # Helper scripts
└── README.md        # This documentation
Major libraries used by the backend (see backend/go.mod for exact versions):
- gin-gonic/gin: HTTP web framework (routing, middleware)
- gin-contrib/cors: CORS middleware for Gin
- gorm.io/gorm: ORM used for persistence
- gorm.io/driver/sqlite: SQLite driver (default dev mode)
- gorm.io/driver/postgres: PostgreSQL driver (for production/K8s option)
- robfig/cron/v3: Cron-like scheduler used for background workers (`task_scheduler.interval`)
- golang-jwt/jwt/v5: JWT creation and validation for auth
- prometheus/client_golang: Prometheus metrics instrumentation
- gopkg.in/yaml.v3: YAML configuration parsing
- google/uuid: UUID generation for entities/jobs
Notes:
- Go toolchain is defined via the container image (`golang:1.25-alpine`) and `backend/go.mod` (go 1.24.x).
- To update or add a dependency locally: `cd backend && go get <module>@latest && go mod tidy`.
- JWT tokens have an expiration (`exp` claim). After expiry, log in again to obtain a new token.
- Both operation types (Validate and Identify) use the same endpoint `/api/create`; the operation type is controlled via the `type` parameter.
- In all examples, replace variables like `$BASE_URL` and `$TOKEN` with your values if you don't use environment variables.
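
The expiry note above can be checked locally before making a call. The following is a minimal sketch, assuming GNU `base64` and a token whose payload carries a numeric `exp` claim. The function names `jwt_exp` and `jwt_is_expired` are illustrative and not part of the project, and the signature is not verified.

```shell
# Print the numeric "exp" claim of a JWT, or nothing if it cannot be read.
jwt_exp() {
  local payload
  # The payload is the second dot-separated segment, base64url-encoded.
  payload=$(printf '%s' "$1" | cut -d '.' -f 2 | tr '_-' '/+')
  # Restore the '=' padding that base64url omits.
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do payload="${payload}="; done
  printf '%s' "$payload" | base64 -d 2>/dev/null \
    | sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Succeed (exit 0) if the token is expired or its exp claim is unreadable.
# $2 optionally overrides "now" (Unix seconds), which helps when testing.
jwt_is_expired() {
  local exp now
  exp=$(jwt_exp "$1")
  now=${2:-$(date +%s)}
  [ -z "$exp" ] || [ "$exp" -le "$now" ]
}

# Usage:
#   if jwt_is_expired "$TOKEN"; then echo "log in again"; fi
```

The server remains the authority on token validity; this only avoids sending a request that is likely to return 401.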
- Login returns `401 Unauthorized`:
  - Check credentials (`email`, `password`).
  - Ensure `Content-Type: application/json` is set.
- Upload fails (`400 Bad Request`):
  - Make sure you use `multipart/form-data` and the field `file=@/path/to/file` exists.
  - Verify that `type` is exactly `Validate` or `Identify` (case-sensitive!).
- Remote operations return `404`:
  - Is the server path in `filename` correct? Does the file exist on the server?
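
To see which of the cases above applies, it can help to look at the HTTP status code alone. This sketch uses curl's `-w '%{http_code}'` output option. The helper names and the operation labels `login`, `upload` and `remote` are chosen for this example and are not API values.

```shell
# Run curl with the given arguments and print only the HTTP status code.
http_status() {
  curl -sS -o /dev/null -w '%{http_code}' "$@"
}

# Map an operation label and a status code to the matching hint from the list above.
explain_status() {
  case "$1:$2" in
    login:401)  echo "Check email/password and send Content-Type: application/json." ;;
    upload:400) echo "Use multipart/form-data with a file field; type must be exactly Validate or Identify." ;;
    remote:404) echo "Check the server path in filename and that the file exists on the server." ;;
    *)          echo "No hint for $1 with status $2." ;;
  esac
}

# Usage:
#   code=$(http_status -X POST "$BASE_URL/api/login" \
#     -H "Content-Type: application/json" \
#     -d '{"email":"user","password":"password"}')
#   explain_status login "$code"
```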
This application reads configuration from a YAML file. If no file exists, a default file is created with sane defaults.
- Search order for `config.yaml`:
  - `./config.yaml`
  - `./config/config.yaml`
  - `/etc/IngestListApiWrapper/config.yaml`
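
The search order can be mirrored in the shell to check which file would likely be picked up from the current directory. This is a sketch only: `find_config` is a hypothetical helper, and it assumes the relative paths are resolved against the working directory of the server process.

```shell
# Print the first existing config file in the documented search order.
# Returns non-zero if none of the candidates exists.
find_config() {
  local candidate
  for candidate in ./config.yaml ./config/config.yaml /etc/IngestListApiWrapper/config.yaml; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Usage:
#   find_config || echo "no config found; a default file would be created"
```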
Default values are used when the file is created:
server:
  host: "localhost"
  port: "8080"
database:
  driver: "sqlite"   # use "postgres" to connect to Postgres
  dsn: "tasks.db"    # for Postgres: e.g. "host=... user=... password=... dbname=... sslmode=disable"
  max_open: 10
  max_idle: 5
task_scheduler:
  interval: "*/30 * * * * *"   # every 30 seconds
  file_storage_path: "data"
  max_workers: 3
security:
  secret_key: "your-secret-key"

Tips:
- For PostgreSQL, set `database.driver: postgres` and provide a proper `dsn`.
- In containers, mount your config to `/app/config/config.yaml` or `/etc/IngestListApiWrapper/config.yaml`.
- The HTTP server listens on `server.host:server.port` (default `localhost:8080`).
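
For the PostgreSQL tip, the DSN can be assembled from its parts. This sketch follows the key=value shape shown in the default config comment and assumes `driver` and `dsn` both sit under `database`. The helper names are hypothetical, and values are assumed to contain no spaces, quotes or backslashes.

```shell
# Build a key=value DSN in the shape shown in the default config comment.
# Arguments: host, user, password, dbname.
pg_dsn() {
  printf 'host=%s user=%s password=%s dbname=%s sslmode=disable\n' "$1" "$2" "$3" "$4"
}

# Print a database section that could be pasted into config.yaml.
pg_database_section() {
  printf 'database:\n  driver: "postgres"\n  dsn: "%s"\n' "$(pg_dsn "$@")"
}

# Usage (all values are placeholders):
#   pg_database_section db.example.internal ilwrapper "$PGPASSWORD" ilwrapper
```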
The service exposes Prometheus metrics at the endpoint /metrics. Besides standard Go/HTTP metrics, it publishes custom business and storage metrics.
- Endpoint: `GET /metrics`
- Behavior: right before responding, the server refreshes business metrics from the database and filesystem via the `MetricsService`, so values are up-to-date at scrape time.

Custom metrics exposed:
- `ingestlist_tasks_total` (gauge): Total number of tasks in the database
- `ingestlist_tasks_by_status{status}` (gauge): Number of tasks grouped by status
- `ingestlist_storage_files_count` (gauge): Count of files in the storage directory
- `ingestlist_storage_size_bytes` (gauge): Total size of files in the storage directory (bytes)
- `ingestlist_http_requests_total{method,endpoint,status}` (counter): Total HTTP requests
- `ingestlist_http_request_duration_seconds{method,endpoint}` (histogram): HTTP request duration; exposed as `_bucket`, `_sum`, `_count`
Quick check with curl:
curl -sS http://localhost:8080/metrics | head -n 40

Example Prometheus scrape configuration:
scrape_configs:
  - job_name: "ilwrapper"
    static_configs:
      - targets: ["ilwrapper:8080"]  # or "localhost:8080" when running locally
    metrics_path: /metrics
    scrape_interval: 15s

Example Grafana/PromQL queries:
- Requests per second by endpoint (last 5m): `sum by (endpoint) (rate(ingestlist_http_requests_total[5m]))`
- Tasks by status: `sum by (status) (ingestlist_tasks_by_status)`
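
Without a Prometheus server at hand, a single value can also be read straight from the scrape output. This sketch assumes the standard Prometheus text exposition format (`series value` per line) and label values without spaces. `metric_value` is a hypothetical helper name.

```shell
# Read Prometheus text exposition on stdin and print the value of one series.
# $1 is the full series name, including the label set if there is one.
# Exits non-zero when the series is not present.
metric_value() {
  awk -v series="$1" '$1 == series { print $2; found = 1; exit } END { exit !found }'
}

# Usage:
#   curl -sS http://localhost:8080/metrics | metric_value ingestlist_tasks_total
#   curl -sS http://localhost:8080/metrics | metric_value 'ingestlist_tasks_by_status{status="<status>"}'
```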
# 1) Set the base URL (adapt to your environment if needed)
export BASE_URL="http://dimagapps-ilwrapper"
# 2) Log in and store the token in $TOKEN
TOKEN=$(curl -sS -X POST "$BASE_URL/api/login" \
  -H "Content-Type: application/json" \
  -d '{"email":"user","password":"password"}' | jq -r '.token')
# 3) Example: run Remote-Validate
curl -sS -X POST "$BASE_URL/api/create" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"filename":"data/example.pdf","type":"Validate"}' | jq .

It's convenient to set an environment variable for the base URL:
export BASE_URL="http://dimagapps-ilwrapper"

Tools used in the examples:
- curl (for example requests)
- optional: jq (for pretty JSON output and easy token parsing)
The API uses Bearer token authentication (JWT). After logging in, you'll receive a JWT that must be sent with subsequent
requests in the Authorization header: `Authorization: Bearer <TOKEN>`.
Authenticate a user and receive a JWT token.
Endpoint: POST /api/login
Headers:
Content-Type: application/json
Request Body:
{
  "email": "user",
  "password": "password"
}
Response (example 200):
{
  "token": "<JWT>"
}
Status codes: 200, 400, 401
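
If jq is not installed, the login call can still be wrapped in a small function. This is a sketch under the assumption that the response is a flat JSON object with a `token` string, as in the example response. `json_escape`, `extract_token` and `api_login` are hypothetical names, and the escaping covers only backslashes and double quotes.

```shell
# Escape backslashes and double quotes so a value can sit inside a JSON string.
# Control characters and newlines are not handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Pull the "token" string out of a login response on stdin without jq.
extract_token() {
  sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Log in with email ($1) and password ($2) and print the token.
# Fails if curl reports an HTTP error (--fail) or no token is found.
api_login() {
  local body token
  body=$(printf '{"email":"%s","password":"%s"}' "$(json_escape "$1")" "$(json_escape "$2")")
  token=$(curl -sS --fail -X POST "$BASE_URL/api/login" \
    -H "Content-Type: application/json" -d "$body" | extract_token)
  [ -n "$token" ] && printf '%s\n' "$token"
}

# Usage:
#   TOKEN=$(api_login user password) || echo "login failed" >&2
```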
# cURL example:
curl -sS -X POST "$BASE_URL/api/login" \
  -H "Content-Type: application/json" \
  -d '{"email":"user","password":"password"}' | jq .
# Store token in a variable (if jq is available):
TOKEN=$(curl -sS -X POST "$BASE_URL/api/login" \
  -H "Content-Type: application/json" \
  -d '{"email":"user","password":"password"}' | jq -r '.token')

Validate a file that already exists on the server.
Endpoint: POST /api/create
Headers:
Authorization: Bearer <TOKEN>
Content-Type: application/json
Request Body:
{
  "filename": "data/path/to/file.pdf",
  "type": "Validate"
}
Status codes: 202 (Accepted), 400, 401, 404
# cURL example:
curl -sS -X POST "$BASE_URL/api/create" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"filename":"data/example.pdf","type":"Validate"}' | jq .

Identify the file format of a file that already exists on the server.
Endpoint: POST /api/create
Headers:
Authorization: Bearer <TOKEN>
Content-Type: application/json
Request Body:
{
  "filename": "data/path/to/file.any",
  "type": "Identify"
}
Status codes: 202 (Accepted), 400, 401, 404
# cURL example:
curl -sS -X POST "$BASE_URL/api/create" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"filename":"data/example.pdf","type":"Identify"}' | jq .

Validate an uploaded file.
Endpoint: POST /api/create
Headers:
Authorization: Bearer <TOKEN>
Content-Type: multipart/form-data
Form Data:
type=Validate
file=@/path/to/file.pdf
Status codes: 202 (Accepted), 400, 401
# cURL example:
curl -sS -X POST "$BASE_URL/api/create" \
  -H "Authorization: Bearer $TOKEN" \
  -F "type=Validate" \
  -F "file=@/path/to/your/file.pdf" | jq .

Identify the file format of an uploaded file.
Endpoint: POST /api/create
Headers:
Authorization: Bearer <TOKEN>
Content-Type: multipart/form-data
Form Data:
type=Identify
file=@/path/to/file.any
Status codes: 202 (Accepted), 400, 401
# cURL example:
curl -sS -X POST "$BASE_URL/api/create" \
  -H "Authorization: Bearer $TOKEN" \
  -F "type=Identify" \
  -F "file=@/path/to/your/file.html" | jq .

Fetch all jobs.
Endpoint: GET /api/jobs
Headers:
Authorization: Bearer <TOKEN>
Status codes: 200, 401
# cURL example:
curl -sS -X GET "$BASE_URL/api/jobs" \
  -H "Authorization: Bearer $TOKEN" | jq .

Fetch a specific job by its ID.
Endpoint: GET /api/job/{id}
Headers:
Authorization: Bearer <TOKEN>
Path Parameter:
id (integer): Job ID
Status codes: 200, 401, 404
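
Since `id` is an integer path parameter, a wrapper can reject malformed IDs before sending anything. This sketch assumes IDs are non-negative; `get_job` is a hypothetical helper name.

```shell
# Fetch one job by ID; rejects IDs that are not plain non-negative integers.
get_job() {
  case "$1" in
    ''|*[!0-9]*)
      echo "job id must be an integer, got: '$1'" >&2
      return 1
      ;;
  esac
  curl -sS -X GET "$BASE_URL/api/job/$1" \
    -H "Authorization: Bearer $TOKEN"
}

# Usage:
#   get_job 5
```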
# cURL example:
curl -sS -X GET "$BASE_URL/api/job/5" \
  -H "Authorization: Bearer $TOKEN" | jq .

The API supports the following operation types:
- Validate: Validate a file
- Identify: Identify the file format
End-to-end workflow with a file on the server:
- Log in (get token)
TOKEN=$(curl -sS -X POST "$BASE_URL/api/login" \
  -H "Content-Type: application/json" \
  -d '{"email":"user","password":"password"}' | jq -r '.token')
- Validate file (remote)
curl -sS -X POST "$BASE_URL/api/create" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"filename":"data/example.pdf","type":"Validate"}' | jq .
- Fetch jobs
curl -sS -X GET "$BASE_URL/api/jobs" \
  -H "Authorization: Bearer $TOKEN" | jq .

End-to-end workflow with an upload:
- Log in
TOKEN=$(curl -sS -X POST "$BASE_URL/api/login" \
  -H "Content-Type: application/json" \
  -d '{"email":"user","password":"password"}' | jq -r '.token')
- Upload and validate file
curl -sS -X POST "$BASE_URL/api/create" \
  -H "Authorization: Bearer $TOKEN" \
  -F "type=Validate" \
  -F "file=@/path/to/file.pdf" | jq .
- Fetch a specific job
curl -sS -X GET "$BASE_URL/api/job/5" \
  -H "Authorization: Bearer $TOKEN" | jq .

Remote operations:
- The file must already exist on the server
- Use `Content-Type: application/json`
- File path is passed in the `filename` parameter

Upload operations:
- The file is uploaded with the request
- Use `Content-Type: multipart/form-data`
- File is provided as the `file` form field
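
The difference between the two variants can be captured in two small wrappers around the same endpoint. This is a sketch with hypothetical helper names (`check_type`, `create_remote`, `create_upload`). It assumes `$BASE_URL` and `$TOKEN` are set and that the file name needs no JSON escaping.

```shell
# Accept only the two documented, case-sensitive operation types.
check_type() {
  case "$1" in
    Validate|Identify) return 0 ;;
    *) echo "type must be exactly Validate or Identify, got: $1" >&2; return 1 ;;
  esac
}

# Remote: the file already exists on the server; its path goes into a JSON body.
# Arguments: server-side filename, operation type.
create_remote() {
  check_type "$2" || return 1
  curl -sS -X POST "$BASE_URL/api/create" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $TOKEN" \
    -d "$(printf '{"filename":"%s","type":"%s"}' "$1" "$2")"
}

# Upload: the local file travels with the request as the "file" form field.
# curl sets the multipart Content-Type header itself when -F is used.
# Arguments: local path, operation type.
create_upload() {
  check_type "$2" || return 1
  if [ ! -r "$1" ]; then
    echo "cannot read file: $1" >&2
    return 1
  fi
  curl -sS -X POST "$BASE_URL/api/create" \
    -H "Authorization: Bearer $TOKEN" \
    -F "type=$2" \
    -F "file=@$1"
}

# Usage:
#   create_remote data/example.pdf Validate
#   create_upload /path/to/file.pdf Identify
```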
This project is a containerized application. A multi-stage build is provided at backend/Dockerfile.
Build (Docker):
# from repository root
docker build -f backend/Dockerfile -t ilwrapper:latest .
# Optional: Configure HTTP(S) proxy for bundled tools at build-time
docker build -f backend/Dockerfile \
  --build-arg PROXY_HOST="your.proxy" \
  --build-arg PROXY_PORT="3128" \
  -t ilwrapper:latest .

Build (Podman):
podman build -f backend/Dockerfile -t ilwrapper:latest .

Run:
# Minimal run; maps 8080
docker run --rm -p 8080:8080 --name ilwrapper ilwrapper:latest
# With mounted config and data directory
docker run --rm -p 8080:8080 \
  -v "$(pwd)/conf/config.yaml:/app/config/config.yaml:ro" \
  -v "$(pwd)/data:/app/data" \
  --name ilwrapper ilwrapper:latest

Run with Podman (analogous):
podman run --rm -p 8080:8080 --name ilwrapper ilwrapper:latest

Notes:
- The image exposes port `8080` and runs the server binary.
- The container working directory is `/app`.
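
After starting the container in the background, it can be useful to wait until the server answers before sending requests. This sketch polls a URL with curl. `wait_for_http` is a hypothetical helper, and using `/metrics` as the probe assumes that endpoint is reachable without a token.

```shell
# Poll a URL until it answers with a non-error HTTP status.
# $1 = URL, $2 = max attempts (default 30), $3 = delay in seconds (default 1).
wait_for_http() {
  local url="$1" attempts="${2:-30}" delay="${3:-1}" i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -sS --fail -o /dev/null "$url" 2>/dev/null; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "gave up waiting for $url after $attempts attempts" >&2
  return 1
}

# Usage:
#   docker run -d --rm -p 8080:8080 --name ilwrapper ilwrapper:latest
#   wait_for_http http://localhost:8080/metrics && echo "server is up"
```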
The k8s/ directory currently contains manifests to provision a PostgreSQL database:
- `k8s/1-database-secret.yml` (credentials as a Kubernetes Secret)
- `k8s/1-database-statefulset.yml` (PostgreSQL StatefulSet and Service)

Quick start:
# Create namespace (once)
kubectl create namespace ilwrapper
# Apply database manifests
kubectl apply -n ilwrapper -f k8s/1-database-secret.yml
kubectl apply -n ilwrapper -f k8s/1-database-statefulset.yml
# Check
kubectl get all -n ilwrapper

Next steps:
- Create and apply a Deployment and Service for ILWrapper pointing to your built image `ilwrapper:latest` (or a pushed registry tag).
- Configure the application to use Postgres by setting `database.driver: postgres` and a Postgres `dsn` via config (see Configuration above). Consider mounting the config with a ConfigMap or Secret.
- Optionally add an Ingress or LoadBalancer Service for external access.
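
As a starting point for the first step, a Deployment and a Service can be created imperatively with kubectl. This is only a sketch: `deploy_ilwrapper` and the resource name `ilwrapper` are choices made for this example. It does not mount any configuration, so the app would presumably run with its defaults until a ConfigMap or Secret is added through a manifest.

```shell
# Create a Deployment and a ClusterIP Service for the app, then wait for rollout.
# $1 = namespace, $2 = image reference that the cluster can pull.
deploy_ilwrapper() {
  local ns="$1" image="$2"
  kubectl create deployment ilwrapper -n "$ns" --image="$image" --port=8080 &&
  kubectl expose deployment ilwrapper -n "$ns" --port=8080 --target-port=8080 &&
  kubectl rollout status deployment/ilwrapper -n "$ns" --timeout=120s
}

# Usage:
#   deploy_ilwrapper ilwrapper ilwrapper:latest
```

For anything beyond a first trial, exporting the generated objects to YAML (`kubectl get deployment ilwrapper -n ilwrapper -o yaml`) and keeping them next to the database manifests is likely easier to maintain.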
Feedback, ideas, and contributions are very welcome!
- Open an issue for bugs or feature requests.
- Fork the repo and submit a pull request for improvements.
- If unsure where to start, feel free to propose changes first in an issue.
Please keep changes small and focused where possible. Add context to PRs (what, why) and include examples or screenshots if relevant.
This project is licensed under the MIT License. See the LICENSE file for details.