FAIRDataTeam/TrainOrchestrator

Train Coordinator 🚆 (Prototype)

This is a prototype Train Coordinator service designed for orchestrating workflows and enforcing access control in privacy-preserving data analysis for federated healthcare environments.

This project extends the FAIR Data Train (FDT) framework by integrating the features listed below.


📌 Features

  • Automated multi-step analysis — chain multiple trains with conditional execution.
  • Policy-based access control — enforce data retrieval policies using ODRL.
  • Reusable workflows — built with CWL for portability and reproducibility.
The service exposes the following HTTP endpoints:

  • POST http://localhost:6060/api/analysis/execute — Executes a CWL workflow and returns the result.
  • POST http://localhost:6060/api/analysis/odrl-execute — Executes an ODRL policy request and returns the corresponding SPARQL query. (Note: CWL workflows internally call this endpoint when verifying train permissions.)

/api/analysis/odrl-execute Payload Format

This endpoint expects the request body as raw Turtle (RDF) describing an ODRL request.
Example payload:

@prefix odrl:   <http://www.w3.org/ns/odrl/2/> .
@prefix ex:     <http://example.org/> .

<http://example.org/request:se-query>
    a odrl:Request ;
    odrl:uid <https://www.wikidata.org/wiki/Q25670> ;
    odrl:profile <https://www.wikidata.org/wiki/Q4382010> ;

    odrl:permission [
        odrl:target <http://example.org/graph/extract_data> ;
        odrl:assignee ex:researcher ;
        odrl:action odrl:read
    ] .
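As an illustration, the endpoint can be called from Python roughly as sketched below. The base URL matches the endpoint table above, but the `Content-Type` header and the helper name `post_odrl_request` are assumptions for this example, not part of the project:

```python
import urllib.request

# The example ODRL request from above, as a raw Turtle string.
ODRL_REQUEST = """\
@prefix odrl:   <http://www.w3.org/ns/odrl/2/> .
@prefix ex:     <http://example.org/> .

<http://example.org/request:se-query>
    a odrl:Request ;
    odrl:uid <https://www.wikidata.org/wiki/Q25670> ;
    odrl:profile <https://www.wikidata.org/wiki/Q4382010> ;

    odrl:permission [
        odrl:target <http://example.org/graph/extract_data> ;
        odrl:assignee ex:researcher ;
        odrl:action odrl:read
    ] .
"""

def post_odrl_request(payload: str,
                      url: str = "http://localhost:6060/api/analysis/odrl-execute") -> str:
    """POST a raw ODRL request body and return the response text
    (the SPARQL query, if the request is permitted)."""
    req = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},  # assumed content type
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

# Usage (requires the service to be running locally):
# sparql_query = post_odrl_request(ODRL_REQUEST)
# print(sparql_query)
```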

📂 File Paths

The prototype stores its workflows and local RDF-based data stations in: src/main/resources/workflow

This folder contains:

  • CWL Workflow files — define the steps, inputs, and conditional execution logic for the analysis.
  • RDF files — serve as local data stations from which the workflow extracts data using SPARQL queries.
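For readers unfamiliar with CWL, a minimal CommandLineTool looks like the fragment below. This is an illustrative example only, not one of the project's workflow files:

```yaml
cwlVersion: v1.2
class: CommandLineTool
baseCommand: echo
inputs:
  message:
    type: string
    inputBinding:
      position: 1
outputs:
  out:
    type: stdout
stdout: message.txt
```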

ODRL Engine

The ODRL engine configuration and policies used for access control are located in: src/main/resources/ODRL

This folder contains:

  • Policies — define permissions, prohibitions, and obligations for data access.
  • Engine — policy engine that validates access requests against internal policies and, if allowed, returns the corresponding SPARQL query file to execute.
  • Data — holds a SPARQL query for the requested data, returned when a data access request is accepted.
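Conceptually, the engine's allow/deny decision is a match between the incoming request and a stored policy. The sketch below illustrates this with plain dictionaries; the actual engine operates on ODRL RDF graphs, and the `query_file` path is hypothetical:

```python
# Simplified sketch of policy-based request checking.
# Field names mirror the ODRL terms used in the example request;
# the real engine parses RDF policies rather than dictionaries.

POLICY = {
    "target": "http://example.org/graph/extract_data",
    "assignee": "http://example.org/researcher",
    "action": "http://www.w3.org/ns/odrl/2/read",
    "query_file": "ODRL/Data/extract_data.rq",  # hypothetical path
}

def evaluate_request(request: dict, policy: dict = POLICY):
    """Return the SPARQL query file if the request matches the
    policy's target, assignee, and action; otherwise None."""
    if all(request.get(k) == policy[k] for k in ("target", "assignee", "action")):
        return policy["query_file"]
    return None

allowed = evaluate_request({
    "target": "http://example.org/graph/extract_data",
    "assignee": "http://example.org/researcher",
    "action": "http://www.w3.org/ns/odrl/2/read",
})
print(allowed)  # prints the query file path for a permitted request
```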

Output Directory

The output folder stores the resulting data generated by the workflow execution.

UI Directory

The ui folder contains a Streamlit-based interface for visualizing the data produced in the output/ folder as a chart.
Make sure you have Python 3.9+ installed, then install the required libraries:

pip install streamlit streamlit-agraph rdflib pandas

After installing the required libraries, run the script with:

streamlit run ui/sideEffect_chart.py

Note: On some Windows setups, running streamlit run directly may fail (for example, when the Streamlit executable is not on your PATH). If that happens, try the following alternative command:

python -m streamlit run ui/sideEffect_chart.py

🧩 Installation & Setup (Prototype)

Note:
This prototype only runs on Linux-based operating systems (Debian/Ubuntu).

1) Prerequisites

  • Java JDK: 17+
  • Maven: 3.6+
  • Python: 3.9+
  • Docker: latest

2) Clone the repository

git clone to/do
cd TrainCoordinator

3) Dockerize python scripts

cd scripts
# build each image
docker build -t extract-adr .           
docker build -t extract-lareb-data .    
docker build -t extract-vigi-data .     
docker build -t extract-sideeff-data . 

4) Install 'cwltool'

python3 -m pip install --upgrade pip
python3 -m pip install cwltool

5) Run the service

mvn clean package
mvn spring-boot:run

About

Component to orchestrate more complex FAIR Data Train runs
