This is a prototype Train Coordinator service that orchestrates workflows and enforces access control for privacy-preserving data analysis in federated healthcare environments.
This project extends the FAIR Data Train (FDT) framework by integrating:
- Common Workflow Language (CWL) for multi-step workflow execution.
- Open Digital Rights Language (ODRL) for fine-grained data access control.
- Automated multi-step analysis — chain multiple trains with conditional execution.
- Policy-based access control — enforce data retrieval and usage policies with ODRL.
- Reusable workflows — built using CWL for portability and reproducibility.
| Endpoint | Method | Description |
|---|---|---|
| `http://localhost:6060/api/analysis/execute` | POST | Executes a CWL workflow and returns the result. |
| `http://localhost:6060/api/analysis/odrl-execute` | POST | Executes an ODRL policy request and returns the corresponding SPARQL query. (Note: CWL workflows call this endpoint internally when verifying train permissions.) |
The `odrl-execute` endpoint expects a raw request body describing an ODRL request in Turtle.
Example payload:
```turtle
@prefix odrl: <http://www.w3.org/ns/odrl/2/> .
@prefix ex: <http://example.org/> .

<http://example.org/request:se-query>
    a odrl:Request ;
    odrl:uid <https://www.wikidata.org/wiki/Q25670> ;
    odrl:profile <https://www.wikidata.org/wiki/Q4382010> ;
    odrl:permission [
        odrl:target <http://example.org/graph/extract_data> ;
        odrl:assignee ex:researcher ;
        odrl:action odrl:read
    ] .
```
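As a minimal sketch of a client for this endpoint, the request above can be posted with Python's standard library. The payload and URL come from the example; the `text/turtle` content type and the client code itself are illustrative assumptions, not part of the repository.

```python
import urllib.request

# Endpoint from the table above; assumed to accept a raw Turtle body.
ODRL_ENDPOINT = "http://localhost:6060/api/analysis/odrl-execute"

# The example ODRL request payload, verbatim.
payload = b"""@prefix odrl: <http://www.w3.org/ns/odrl/2/> .
@prefix ex: <http://example.org/> .

<http://example.org/request:se-query>
    a odrl:Request ;
    odrl:uid <https://www.wikidata.org/wiki/Q25670> ;
    odrl:profile <https://www.wikidata.org/wiki/Q4382010> ;
    odrl:permission [
        odrl:target <http://example.org/graph/extract_data> ;
        odrl:assignee ex:researcher ;
        odrl:action odrl:read
    ] .
"""

# Build the POST request; the text/turtle content type is an assumption.
request = urllib.request.Request(
    ODRL_ENDPOINT,
    data=payload,
    headers={"Content-Type": "text/turtle"},
    method="POST",
)

def send(req: urllib.request.Request) -> str:
    """Send the request; requires the coordinator to be running locally."""
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

# print(send(request))  # uncomment once the service is running
```

If the request is permitted by the configured policies, the response body is the SPARQL query corresponding to the requested data.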
The prototype stores its workflows and local RDF-based data stations in `src/main/resources/workflow`.
This folder contains:
- CWL Workflow files — define the steps, inputs, and conditional execution logic for the analysis.
- RDF files — serve as local data stations from which the workflow extracts data using SPARQL queries.
The ODRL engine configuration and policies used for access control are located in `src/main/resources/ODRL`.
This folder contains:
- Policies — define permissions, prohibitions, and obligations for data access.
- Engine — the policy engine that validates access requests against the internal policies and, if access is allowed, returns the corresponding SPARQL query file to execute.
- Data — holds a SPARQL query for the requested data, used when a data access request is accepted.
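For illustration, a query file returned by the engine might look like the following. This is a hypothetical sketch, not a file from the repository; the `ex:drug` and `ex:sideEffect` predicates are invented for the example.

```sparql
PREFIX ex: <http://example.org/>

# Hypothetical query: select drug / side-effect pairs
# from a local RDF data station.
SELECT ?drug ?sideEffect
WHERE {
  ?report ex:drug ?drug ;
          ex:sideEffect ?sideEffect .
}
```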
The `output/` folder stores the resulting data generated by workflow execution.
The `ui/` folder contains a Streamlit-based interface for visualizing the data produced in the `output/` folder as a chart.
Make sure you have Python 3.9+ installed, then install the required libraries:

```shell
pip install streamlit streamlit-agraph rdflib pandas
```

After installing the required libraries, run the script with:

```shell
streamlit run ui/sideEffect_chart.py
```

Note: on some Windows setups, running `streamlit run` directly may fail with an error. If that happens, try the following alternative command:

```shell
python -m streamlit run ui/sideEffect_chart.py
```

- Java JDK: 17+
- Maven: 3.6+
- Python: 3.9+
- Docker: latest
```shell
git clone to/do
cd TrainCoordinator
```

```shell
cd scripts

# build each image
docker build -t extract-adr .
docker build -t extract-lareb-data .
docker build -t extract-vigi-data .
docker build -t extract-sideeff-data .
```

```shell
python3 -m pip install --upgrade pip
pip install cwltool
```

```shell
mvn clean package
mvn spring-boot:run
```