Satellite Fleet Health Management System
Real-time ISS telemetry monitoring with ML-powered diagnostics
Continuous Operations Network for Satellite Telemetry Evaluation, Life-cycle Analysis, Tracking, Intelligence, Operations, and Notification
CONSTELLATION monitors the International Space Station's attitude control and communications subsystems using real-time telemetry from NASA's public Lightstreamer feed. The system provides:
Real-Time Telemetry Processor
- AWS Lambda function subscribing to NASA Lightstreamer
- Filters for attitude control and communications subsystems
- Writes to DynamoDB for real-time access
- Archives to S3 for historical analysis
Parameters Monitored:
Attitude Control (Reaction Wheels & CMGs):
- USLAB000084: Reaction Wheel Assembly (RWA) speed
- USLAB000085: RWA bearing temperature
- USLAB000086: RWA current draw
- USLAB000087: CMG (Control Moment Gyroscope) momentum
- Attitude quaternions (pitch, roll, yaw)
- Rate gyro outputs
Communications:
- S-band transponder power levels
- Ku-band signal strength
- Antenna pointing accuracy
- Data throughput metrics
- Ground station contact windows
- Communication link quality indicators
Cross-Cutting:
- Power system voltage/current (affects both subsystems)
- Thermal readings (reaction wheel bearings, transmitter temps)
- Time-on-orbit (cumulative degradation tracking)
Time Series Features:
- Rolling statistics (mean, std, min, max) over multiple windows (1hr, 6hr, 24hr, 7day)
- Rate of change calculations
- Autocorrelation features
- Fourier transform for periodic patterns
- Lag features (t-1, t-6, t-24 for hourly data)
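The rolling statistics, rate-of-change, and lag features listed above can be sketched without third-party dependencies. This is a minimal illustration, not the project's feature pipeline: the function names and the sample temperatures are hypothetical, and the window is expressed in samples (so with hourly data, `window=6` would correspond to the 6hr window).

```python
from collections import deque
from statistics import mean, pstdev


def rolling_features(values, window):
    """Return one dict of stats per sample over up to `window` trailing samples."""
    buf = deque(maxlen=window)  # oldest sample drops out automatically
    out = []
    for v in values:
        buf.append(v)
        out.append({
            "mean": mean(buf),
            "std": pstdev(buf),  # population std; 0.0 for a single sample
            "min": min(buf),
            "max": max(buf),
        })
    return out


def lag(values, k):
    """Value k steps earlier, or None where no history exists yet."""
    return [None if i < k else values[i - k] for i in range(len(values))]


def rate_of_change(values):
    """First difference per step; the first sample has no predecessor."""
    return [None if i == 0 else values[i] - values[i - 1]
            for i in range(len(values))]


# Hypothetical hourly bearing temperatures (deg C), for illustration only.
temps = [20.0, 21.0, 23.0, 22.0, 26.0]
feats = rolling_features(temps, window=3)
lag_1 = lag(temps, 1)
deltas = rate_of_change(temps)
```

A production version would more likely use a dataframe library with time-based windows so that gaps in the telemetry do not silently stretch a window.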
Domain-Specific Features:
- Reaction wheel friction coefficient (derived from speed vs. current)
- Thermal cycling count (number of orbital day/night transitions)
- Momentum accumulation rate
- Communication link budget margin
- Signal degradation trends
- Anomaly persistence scores
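As one example of a domain-specific feature, the thermal cycling count can be computed by counting day/night transitions. The sketch below assumes a per-sample boolean sunlight flag is already available; that flag and the function name are hypothetical and not something the telemetry feed is confirmed to provide.

```python
def count_thermal_cycles(in_sunlight):
    """Count day/night transitions in a sequence of booleans (True = sunlit)."""
    return sum(1 for prev, cur in zip(in_sunlight, in_sunlight[1:]) if prev != cur)


# Hypothetical flag series: sunlit, sunlit, eclipse, eclipse, sunlit, eclipse.
flags = [True, True, False, False, True, False]
cycles = count_thermal_cycles(flags)  # three transitions in this series
```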
Engineering Calculations:
- Power efficiency ratios
- Thermal dissipation rates
- Bearing wear indicators
- Transmitter efficiency
- Pointing error accumulation
Model 1: Anomaly Detection (Isolation Forest + LSTM Autoencoder)
Purpose: Real-time detection of unusual telemetry patterns
Approach:
- Isolation Forest for fast, lightweight anomaly flagging
- LSTM Autoencoder for complex temporal anomaly detection
- Ensemble voting for final anomaly score
Training Data:
- Nominal operational periods (confirmed healthy operation)
- Labeled anomalies from NASA incident reports
Metrics: Precision, Recall, F1-Score, False Positive Rate
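The ensemble step could be as simple as a weighted average of the two detectors' scores (a form of soft voting). The sketch below is one possible reading of "ensemble voting", not the project's confirmed method. It assumes both scores have already been scaled to [0, 1]; the weights and the 0.6 threshold are hypothetical placeholders that would need tuning against the false positive rate.

```python
def ensemble_anomaly_score(iforest_score, autoencoder_score, weights=(0.5, 0.5)):
    """Weighted average of two anomaly scores, each assumed to lie in [0, 1]."""
    w_if, w_ae = weights
    total = w_if + w_ae
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_if * iforest_score + w_ae * autoencoder_score) / total


def is_anomaly(iforest_score, autoencoder_score, threshold=0.6):
    """Flag a sample when the combined score reaches the (hypothetical) threshold."""
    return ensemble_anomaly_score(iforest_score, autoencoder_score) >= threshold


# Illustrative scores only: a strong Isolation Forest signal and a moderate
# autoencoder reconstruction-error signal combine to roughly 0.7.
combined = ensemble_anomaly_score(0.9, 0.5)
flagged = is_anomaly(0.9, 0.5)
```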
Model 2: Degradation Forecasting (Temporal Fusion Transformer)
Purpose: Predict subsystem performance degradation over time
Targets:
- Reaction wheel bearing temperature trend
- Solar panel output decline
- Battery capacity fade
- Communication signal strength degradation
Features: Time series telemetry + orbital mechanics (radiation exposure, thermal cycling)
Output: Forecasted parameter values with confidence intervals (7, 30, 90 days ahead)
Model 3: Survival Analysis (Cox Proportional Hazards)
Purpose: Estimate time-to-failure for critical components
Approach:
- Cox model for component-level survival curves
- Censored data handling for components still operational
- Hazard ratios for risk factors (high temps, usage patterns)
Output: Probability of failure within time windows (30d, 60d, 90d, 180d)
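Under a Cox model, the survival function for a component with covariates x is S(t | x) = S0(t) ^ exp(beta . x), so the failure probability within a window is 1 - S(t | x). The sketch below applies that relation to the four output windows. The baseline survival values, the coefficient, and the covariate are made up for illustration; in practice they would come from a fitted model.

```python
import math


def failure_probability(baseline_survival, coefficients, covariates):
    """P(failure by t | x) = 1 - S0(t) ** exp(beta . x) under a Cox model."""
    linear_predictor = sum(b * x for b, x in zip(coefficients, covariates))
    hazard_ratio = math.exp(linear_predictor)
    return {
        window: 1.0 - s0 ** hazard_ratio
        for window, s0 in baseline_survival.items()
    }


# Hypothetical baseline survival S0(t) per window and a single coefficient
# for a standardized bearing-temperature covariate. Illustrative values only.
baseline = {"30d": 0.99, "60d": 0.97, "90d": 0.95, "180d": 0.90}
failure_probs = failure_probability(baseline, coefficients=[0.7], covariates=[1.5])
```

With a positive coefficient, a covariate above zero raises every window's failure probability relative to the baseline, which is the hazard-ratio effect described above.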
Model 4: Fault Classification (XGBoost)
Purpose: Diagnose root cause when anomalies occur
Classes:
- Thermal stress
- Mechanical wear (bearings, gimbals)
- Electrical fault
- Software/command error
- External disturbance (debris impact, space weather)
- Normal operational variation
Features: Anomaly signatures, subsystem interactions, environmental context
Output: Ranked list of probable causes with confidence scores
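Producing the ranked output from per-class probabilities is a small sorting step. The sketch below assumes the classifier already returns one probability per class; the function name, class keys, and probability values are hypothetical.

```python
def rank_causes(class_probabilities, top_k=3):
    """Return the top_k (label, probability) pairs, most probable first."""
    ranked = sorted(class_probabilities.items(),
                    key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical classifier output for a single anomaly (sums to 1.0).
class_probs = {
    "thermal_stress": 0.48,
    "mechanical_wear": 0.27,
    "electrical_fault": 0.10,
    "software_command_error": 0.05,
    "external_disturbance": 0.04,
    "normal_variation": 0.06,
}
top_causes = rank_causes(class_probs)
```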
Constraint Satisfaction Problem:
Variables:
- Maintenance task list (derived from predictions)
- Available maintenance windows
- Crew availability (for ISS; ground station access for unmanned satellites)
- Orbital position constraints
- Mission priority levels
Constraints:
- Ground station contact requirements
- Crew schedule conflicts
- Tool/equipment availability
- Task dependencies (some maintenance requires others first)
- Safety margins (don't defer critical items)
Objective Function:
- Minimize risk-weighted maintenance delay
- Balance urgency vs. operational disruption
- Optimize crew time utilization
Algorithm: Mixed Integer Programming (MIP) using PuLP or Google OR-Tools
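To make the variables, constraints, and objective concrete, here is a tiny exhaustive-search stand-in for the scheduling problem. It is deliberately not a MIP and does not use PuLP or OR-Tools; it enumerates every assignment, which is only workable for a handful of tasks. The task names, risk weights, durations, and window capacities are hypothetical, and the cost (risk weight times window index) is just one simple way to express risk-weighted delay.

```python
from itertools import product


def schedule(tasks, window_capacity, dependencies=()):
    """Exhaustively search task-to-window assignments.

    tasks: {name: (risk_weight, duration_hours)}
    window_capacity: hours available per window, in chronological order
    dependencies: (before, after) pairs; `after` must land in a later window
    Returns (cost, assignment) minimizing sum(risk_weight * window_index),
    or None if no feasible assignment exists.
    """
    names = list(tasks)
    best = None
    for choice in product(range(len(window_capacity)), repeat=len(names)):
        assignment = dict(zip(names, choice))
        # Capacity constraint: total task hours per window.
        used = [0.0] * len(window_capacity)
        for name, w in assignment.items():
            used[w] += tasks[name][1]
        if any(u > cap for u, cap in zip(used, window_capacity)):
            continue
        # Dependency constraint: prerequisite strictly earlier.
        if any(assignment[a] >= assignment[b] for a, b in dependencies):
            continue
        cost = sum(tasks[n][0] * w for n, w in assignment.items())
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


# Hypothetical tasks: (risk weight, duration in hours).
tasks = {
    "rwa_inspection": (5.0, 3.0),
    "antenna_recal": (2.0, 2.0),
    "thermal_check": (1.0, 2.0),
}
result = schedule(tasks, window_capacity=[4.0, 4.0])
```

In this example the highest-risk task takes the first window and the other two share the second. A MIP formulation would encode the same assignment with binary variables per (task, window) pair and hand the search to a solver.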
┌─────────────────────────────────────────────────────────────────┐
│ DATA INGESTION LAYER │
├─────────────────────────────────────────────────────────────────┤
│ NASA Lightstreamer → Lambda (Real-time) → DynamoDB │
│ Historical Archive → S3 Data Lake │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ FEATURE ENGINEERING LAYER │
├─────────────────────────────────────────────────────────────────┤
│ • Time series windowing │
│ • Statistical feature extraction │
│ • Subsystem correlation analysis │
│ • Degradation rate calculation │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ ML MODEL SUITE │
├─────────────────────────────────────────────────────────────────┤
│ Anomaly Detection → Isolation Forest / Autoencoder │
│ Degradation Forecast → LSTM / Temporal Fusion Transformer │
│ Survival Analysis → Cox Proportional Hazards / Weibull │
│ Fault Classification → Random Forest / XGBoost │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ OPERATIONAL LAYER │
├─────────────────────────────────────────────────────────────────┤
│ Health Scoring → Maintenance Scheduling → Alert Generation │
│ Dashboard (Streamlit) → CloudWatch Monitoring │
└─────────────────────────────────────────────────────────────────┘
### Model Training Strategy
Local Development:
- Jupyter notebooks for experimentation
- GPU-enabled local training for initial model development
- Small data samples for rapid iteration
Production Training:
- AWS SageMaker for full dataset training
- Hyperparameter tuning with SageMaker Automatic Model Tuning
- Distributed training for large models
- Model versioning with MLflow
Anomaly Detection:
- Precision, Recall, F1-Score
- False Positive Rate (critical for operational systems)
- Detection latency
- ROC-AUC, PR-AUC
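The threshold-based metrics above follow directly from the confusion counts. A minimal sketch, assuming binary labels where 1 marks an anomaly; the label vectors are invented for illustration, and ratios with an empty denominator are reported as 0.0 by convention here.

```python
def detection_metrics(y_true, y_pred):
    """Precision, recall, F1, and false positive rate from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


# Hypothetical ground truth and predictions (1 = anomaly).
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
metrics = detection_metrics(y_true, y_pred)
```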
Degradation Forecasting:
- RMSE, MAE, MAPE
- Prediction interval coverage
- Directional accuracy (did we predict the trend correctly?)
- Forecast horizon performance (7d vs 30d vs 90d)
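The point-error metrics and prediction interval coverage can be computed together for a single forecast horizon. The sketch below assumes equal-length lists and no zero values in `actual` (MAPE is undefined otherwise); the numbers are invented for illustration.

```python
import math


def forecast_metrics(actual, predicted, lower, upper):
    """RMSE, MAE, MAPE (percent), and prediction interval coverage."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # Assumes no actual value is zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    # Fraction of actual values that fall inside [lower, upper].
    coverage = sum(1 for a, lo, hi in zip(actual, lower, upper)
                   if lo <= a <= hi) / n
    return {"rmse": rmse, "mae": mae, "mape": mape, "coverage": coverage}


# Hypothetical bearing temperatures (deg C) and a forecast with intervals.
actual = [40.0, 42.0, 44.0, 50.0]
predicted = [41.0, 42.0, 43.0, 46.0]
lower = [39.0, 40.0, 41.0, 44.0]
upper = [43.0, 44.0, 45.0, 48.0]
fc_metrics = forecast_metrics(actual, predicted, lower, upper)
```

Running this once per horizon (7d, 30d, 90d) gives the per-horizon comparison; coverage well below the nominal interval level suggests the intervals are too narrow.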
Survival Analysis:
- Concordance index (C-index)
- Brier score
- Calibration plots (predicted vs observed survival)
- Time-dependent AUC
Fault Classification:
- Accuracy, Precision, Recall per class
- Confusion matrix
- Top-k accuracy (are correct diagnoses in top 3 predictions?)
# Clone repository
git clone <your-repo-url>
cd constellation
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Start telemetry collection
python -m src.ingestion.collect_telemetry
Data will be saved to data/raw/ in date-partitioned Parquet files.
This project demonstrates production-grade ML engineering capabilities, including distributed training infrastructure, experiment management, and systematic research methodology. All code and documentation are available for technical review.
Data Science | ML Engineering