An end-to-end scientific ML job orchestration platform developed as part of my bachelor’s thesis and Alisher Layik’s bachelor’s thesis at CTU FIT (ČVUT FIT), designed to ingest, preprocess, analyze, and manage large volumes of astronomical spectra (LAMOST FITS files) through human-in-the-loop machine learning workflows.
The core goal of this system is to streamline and automate the full lifecycle of spectroscopic data processing, from raw file ingestion to active-learning–driven model refinement, while providing a unified API and web UI for monitoring, controlling, and labeling ML jobs.
In particular, this platform addresses the following needs and research objectives:
- 💾 **High-Throughput FITS Ingestion & Preprocessing**
  - LAMOST (Large Sky Area Multi-Object Fiber Spectroscopic Telescope) releases millions of raw `FITS` files containing stellar spectra. Before any classification or analysis can occur, these spectra must be read, normalized, and interpolated onto a uniform wavelength grid. The “Data Preprocessing” pipeline in the system handles (see the sketch after this list):
    - Reading raw `FITS` headers and data arrays with Astropy.
    - Interpolating flux values across a common wavelength range (e.g., 3800 Å to 9000 Å) with Scikit-Learn.
    - Min–max scaling of flux measurements with Scikit-Learn.
    - Writing the preprocessed spectra into a single, consolidated `HDF5` file for downstream tasks with h5py.
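A minimal sketch of those preprocessing steps, assuming the standard LAMOST low-resolution FITS layout (flux in row 0, wavelength in row 2 of the primary HDU) and using NumPy’s `np.interp` in place of the pipeline’s Scikit-Learn interpolation; all function and dataset names here are illustrative:

```python
import numpy as np
import h5py
from astropy.io import fits
from sklearn.preprocessing import minmax_scale

# Assumed common grid matching the 3800-9000 Å range mentioned above.
TARGET_WAVELENGTHS = np.linspace(3800.0, 9000.0, 3000)

def preprocess_spectrum(path: str) -> np.ndarray:
    """Read one LAMOST FITS file, interpolate onto the common grid, and scale."""
    with fits.open(path) as hdul:
        data = hdul[0].data
        flux, wavelength = data[0], data[2]  # layout assumed, see lead-in
    # Interpolate flux onto the shared wavelength grid.
    interpolated = np.interp(TARGET_WAVELENGTHS, wavelength, flux)
    # Min-max scale the flux into [0, 1].
    return minmax_scale(interpolated)

def write_consolidated(paths: list[str], out_path: str) -> None:
    """Write all preprocessed spectra into one consolidated HDF5 file."""
    spectra = np.stack([preprocess_spectrum(p) for p in paths])
    with h5py.File(out_path, "w") as f:
        f.create_dataset("wavelengths", data=TARGET_WAVELENGTHS)
        f.create_dataset("fluxes", data=spectra)
```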
- 💠 **Active Learning for Spectral Classification**
  - Even with large-scale labeled datasets, manual labeling of rare or ambiguous spectral classes remains labor-intensive. The “Active ML” pipeline (see the uncertainty-sampling sketch after this list):
    - Trains a 1D-CNN (built in TensorFlow & Keras) on any existing labeled spectra.
    - Uses uncertainty metrics (entropy of softmax outputs) to identify spectra that should be sent to an expert “oracle” for manual labeling.
    - Computes performance-estimation sets and candidate sets based on user-defined classes.
    - Iteratively refines the training corpus by integrating newly labeled spectra, retraining, and selecting the next batch for expert review.
    - Outputs intermediate artifacts (`HDF5`, `JSON`) for visualization and for use by the frontend with h5py.
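A short sketch of the entropy-based selection step named above, assuming a trained Keras classifier and a pool of unlabeled, preprocessed spectra (names are illustrative):

```python
import numpy as np
import tensorflow as tf

def select_for_oracle(model: tf.keras.Model, pool: np.ndarray, batch_size: int = 100) -> np.ndarray:
    """Return indices of the pool spectra the model is least certain about."""
    # Softmax class probabilities for every unlabeled spectrum.
    probs = model.predict(pool, verbose=0)
    # Shannon entropy of the softmax output; higher entropy = more uncertain.
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    # Pick the batch_size most uncertain spectra for expert labeling.
    return np.argsort(entropy)[-batch_size:]
```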
- 👩🏼‍💻 **Human-in-the-Loop Labelling Workflow**
  - To ensure high classification accuracy on edge-case spectra (e.g., double-peak emission lines), the system supports (see the callback sketch after this list):
    - Automatic batch initialization of labeling jobs (via an HTTPX callback to the back-end API).
    - A React-based web UI to display selected spectra (with flux vs. wavelength plots) and collect “oracle” labels.
    - Tracking of labeling status and iteration counts so that domain experts can focus on the spectra most beneficial for model improvement.
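A sketch of what the worker-to-API callback could look like; the endpoint path, payload shape, and service URL are assumptions, not the actual API contract (see the generated docs at `/docs` for the real routes):

```python
import httpx

API_BASE_URL = "http://ml-job-api:10100"  # assumed service URL within the compose network

def notify_labeling_batch(job_id: str, spectrum_ids: list[str]) -> None:
    """Tell the API that a new batch of spectra awaits oracle labeling."""
    # Hypothetical endpoint; the real routes are defined in ml-job-api.
    response = httpx.post(
        f"{API_BASE_URL}/labellings",
        json={"job_id": job_id, "spectrum_ids": spectrum_ids},
        timeout=3.0,  # mirrors API_CONNECTION_TIMEOUT in .env
    )
    response.raise_for_status()
```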
- 🖥️ **API & Web UI for Job Management**
  - Researchers need a centralized way to (see the endpoint sketch after this list):
    - Submit new data preprocessing or active-learning jobs (via REST endpoints) with FastAPI and React.
    - Monitor job statuses, timestamps, and logs in real time.
    - Browse raw and preprocessed spectra metadata, view plots, and download files with FastAPI, aiofiles, Astropy, and Plotly.
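A minimal sketch of a job-submission endpoint in the spirit of the list above; the route, schema fields, broker URL, and Celery task name are all illustrative, the real ones live in `ml-job-api`:

```python
from fastapi import FastAPI
from pydantic import BaseModel
from celery import Celery

app = FastAPI()
# Broker URL is an assumption; the real value comes from .env / Pydantic settings.
celery_app = Celery(broker="amqp://guest:guest@ml-job-queue:5672//")

class JobRequest(BaseModel):
    job_type: str          # e.g., "preprocessing" or "active_ml"
    parameters: dict = {}

@app.post("/jobs")
async def submit_job(request: JobRequest):
    """Dispatch the requested job to a Celery worker via RabbitMQ."""
    # Hypothetical task name; the worker registers the actual task.
    result = celery_app.send_task(
        "run_job",
        kwargs={"job_type": request.job_type, "parameters": request.parameters},
        queue="jobs",  # matches JOB_QUEUE in .env
    )
    return {"task_id": result.id, "status": "PENDING"}
```

The matching worker-side task is sketched after the components overview further below.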
- 🪛 **Scalable, Containerized DevOps Stack**
  - To simplify deployment and reproducibility across environments (see the settings sketch after this list):
    - The entire platform is containerized with Docker: PostgreSQL for metadata persistence, RabbitMQ as the job message broker, FastAPI for the backend, Celery workers for computations, and React for the frontend.
    - A single `docker-compose.yml` file spins up all services with a one-line command.
    - Environment-driven configuration (via `.env` files and Pydantic settings) allows seamless switching between local development, testing, and production clusters.
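A sketch of such environment-driven configuration, assuming the `pydantic-settings` package (the repo may equally use Pydantic v1’s built-in `BaseSettings`); field names mirror keys from the `.env` example further below:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    """Values are loaded from environment variables or a .env file at startup."""
    model_config = SettingsConfigDict(env_file=".env")

    debug: bool = False
    api_port: int = 10100
    job_queue: str = "jobs"
    api_connection_timeout: int = 3

settings = Settings()  # e.g., DEBUG=True in the environment sets settings.debug
```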
By integrating these components, ML Job Manager provides a robust research prototype and platform implementation for anyone working on large-scale astronomical spectroscopy or active learning in scientific contexts, as well as an end-to-end ML workflow orchestration skeleton for other scientific projects.
Monorepo containing three components for an end-to-end ML job orchestration platform:

```
ml-job-manager/
├── images/
├── ml-job-api/        ← ML Job API: FastAPI microservice (REST API)
├── ml-job-ui/         ← ML Job UI: React frontend (web interface + spectra visualizations)
├── ml-job-worker/     ← ML Job Worker: Celery worker (data preprocessing & active ML)
├── docker-compose.yml
├── LICENSE
└── README.md
```
- **ML Job API:**
  – CRUD endpoints for jobs, labellings, spectra, and file storage.
  – Async `PostgreSQL` persistence, `Alembic` migrations.
  – `Celery` integration for dispatching jobs.
- **ML Job Worker:**
  – `Celery` jobs: Data Preprocessing & Active ML pipelines (a minimal task sketch follows this list).
  – `TensorFlow` CNN, `Scikit-Learn` utilities (SMOTE, t-SNE).
  – `HTTPX` callbacks to ML Job API.
- **ML Job UI:**
  – `React` + `Tailwind` web frontend.
  – Live job status, spectra view, labelling workflow.
- **DevOps:**
  – `Docker` & `Docker Compose` for full-stack local development.
  – Environment-driven configuration via `.env` and Pydantic.
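A minimal sketch of the worker side, pairing with the job-submission endpoint sketched earlier; the task name, broker URL, and callback route are assumptions, the real tasks live in `ml-job-worker`:

```python
import httpx
from celery import Celery

# Broker URL is illustrative; the real one is assembled from .env values.
worker = Celery(broker="amqp://guest:guest@ml-job-queue:5672//")

@worker.task(name="run_job")
def run_job(job_type: str, parameters: dict) -> None:
    """Run a pipeline, then report completion back to the ML Job API over HTTPX."""
    if job_type == "preprocessing":
        ...  # read FITS, interpolate, scale, write HDF5
    elif job_type == "active_ml":
        ...  # train CNN, select uncertain spectra, emit HDF5/JSON artifacts
    # Hypothetical callback route; see the API's /docs for the real one.
    httpx.patch(
        "http://ml-job-api:10100/jobs/status",
        json={"status": "COMPLETED"},
        timeout=3.0,
    )
```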
- `Docker` & `Docker Compose` ≥ v2.0.
- NVIDIA GPU for ML Job Worker `TensorFlow` CNN computations.
Clone the repo:

```bash
git clone https://github.com/bursasha/ml-job-manager.git
cd ml-job-manager
```

Create a `.env` in the project root (see `.env.example` for all keys):
```env
DEBUG=True
FILES_DIR_PATH=...
SPECTRA_DIR_PATH=...
JOB_QUEUE=jobs
#
UI_PORT=10000
#
API_PORT=10100
ENGINE_CONNECTION_TIMEOUT=3
#
WORKER_PORT=10200
BROKER_CONNECTION_TIMEOUT=3
API_CONNECTION_TIMEOUT=3
#
POSTGRES_USER=...
POSTGRES_PASSWORD=...
POSTGRES_DB=...
#
RABBITMQ_MANAGEMENT_PORT=10300
RABBITMQ_DEFAULT_USER=...
RABBITMQ_DEFAULT_PASS=...
```
Bring up the entire stack:

```bash
docker compose up
```

This will launch the following services:

- ML Job UI (`React`)
- ML Job API (`FastAPI`)
- ML Job Worker (`Celery`)
- ML Job Queue (`RabbitMQ`)
- ML Job DB (`PostgreSQL`)
You can now:

- Visit the UI at http://localhost:10000
- Browse API docs at http://localhost:10100/docs
Stop:

```bash
docker compose stop
```

Remove:

```bash
docker compose down
```

Follow logs per service:

```bash
docker compose logs -f ml-job-ui
docker compose logs -f ml-job-api
docker compose logs -f ml-job-worker
docker compose logs -f ml-job-queue
docker compose logs -f ml-job-db
```
This work is made available under the terms of a non-exclusive authorization per Act No. 121/2000 Coll., Copyright Act, and Section 2373(2) of Act No. 89/2012 Coll., Civil Code (see the ML Job Manager LICENSE for the full text).