🧬 GNN Challenge: cfRNA → Placenta Inductive GNN for Maternal-Fetal Health Prediction

Scientific Focus

Inductive graph learning across cfRNA and placental transcriptomics to detect maternal-fetal health issues.
Learn transferable representations that generalize to unseen samples and domains rather than treating each dataset independently.

Alignment with BASIRA Lab's Mission

Prioritizes robust generalization across heterogeneous datasets.
Uses compute-efficient, non–data-hungry graph learning methods that can run on standard hardware.

Inspiration from GNN Literature

Draws from studies on inductive learning, message passing, and representation transfer.
Model design follows DGL Lectures 1.1–4.6, covering:
- Graph construction from tabular data
- Node feature encoding
- Neighborhood aggregation (GraphSAGE-style inductive updates)
- Mini-batch training via neighborhood sampling
- Inductive inference on unseen nodes
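As a rough illustration of the neighborhood aggregation step listed above, the following is a minimal NumPy sketch of one GraphSAGE-style layer with a mean aggregator. It is not taken from the challenge code: the function name `sage_mean_layer`, the weight shapes, and the toy graph are all assumptions made for illustration. The point is that the layer learns weights rather than per-node embeddings, so the same weights can be applied to nodes that were never seen during training.

```python
import numpy as np

def sage_mean_layer(h, neighbors, w_self, w_neigh):
    """One GraphSAGE-style layer with a mean aggregator and ReLU (illustrative).

    h         : (n_nodes, d_in) node features
    neighbors : list of integer index arrays, one per node
    w_self    : (d_in, d_out) weight applied to the node's own features
    w_neigh   : (d_in, d_out) weight applied to the averaged neighbor features
    """
    agg = np.zeros_like(h)
    for i, nbrs in enumerate(neighbors):
        if len(nbrs) > 0:
            agg[i] = h[nbrs].mean(axis=0)  # isolated nodes keep a zero vector
    return np.maximum(h @ w_self + agg @ w_neigh, 0.0)

# Toy graph: 5 nodes with 4 features each; node 4 has no neighbors.
rng = np.random.default_rng(0)
h = rng.normal(size=(5, 4))
neighbors = [
    np.array([1, 2]),
    np.array([0]),
    np.array([0, 3]),
    np.array([2]),
    np.array([], dtype=int),
]
w_self = rng.normal(size=(4, 3))
w_neigh = rng.normal(size=(4, 3))

out = sage_mean_layer(h, neighbors, w_self, w_neigh)
print(out.shape)
```

A real model would stack two or more such layers and train the weights with a classification loss; libraries such as DGL provide equivalent, optimized layers.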

Dataset Source and Description

Source

Publicly available on Gene Expression Omnibus (GEO), maintained by the NIH.

Datasets Used

Maternal plasma cfRNA data: GSE192902
Placental RNA-seq data: GSE234729
Features: 6,000 harmonized gene expression features shared across the two sample types (maternal plasma cfRNA and placental tissue)

Training and Test Data

Training Data: 209–210 cfRNA samples (balanced)
Test Data: 123–124 placenta samples (inductive, unseen during training)
Classes: 0 = Control, 1 = Preeclampsia

Purpose and Integration Goal

Identify and validate cfRNA biomarkers for early prediction of preeclampsia, often before clinical symptoms appear.
Support research in maternal-fetal health and early detection of preeclampsia.
Integrate gene expression and clinical metadata to capture subtle risk patterns while handling noisy and imbalanced data for robust and equitable predictions.

Dataset Construction and Preprocessing (`build_dataset.ipynb`) and Kaggle

Objective: Ensure structural compatibility for graph construction and inductive learning by handling expression data, parsing and cleaning metadata, and fusing expression with metadata.

Steps Implemented:

Expression Data Handling: Load and align expression matrices (sample × gene).
Metadata Parsing and Cleaning: Normalize clinical and biological attributes; clean string-based metadata.
Expression–Metadata Fusion: Merge expression and metadata tables using sample IDs to form node-level feature matrices.
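The three steps above can be sketched with pandas roughly as follows. This is a toy example, not the contents of `build_dataset.ipynb`: the column names (`sample_id`, `condition`), the gene names, and the cleaning rules are assumptions chosen for illustration. Only the label encoding (0 = Control, 1 = Preeclampsia) comes from this README.

```python
import pandas as pd

# Toy expression matrix (sample x gene); gene names are illustrative.
expr = pd.DataFrame(
    {"GENE_A": [1.2, 0.4, 3.1], "GENE_B": [0.0, 2.5, 1.1]},
    index=pd.Index(["S1", "S2", "S3"], name="sample_id"),
)

# Toy metadata with messy strings and one sample (S4) lacking expression data.
meta = pd.DataFrame(
    {
        "sample_id": ["S1", "S2 ", "s3", "S4"],
        "condition": [" Control", "preeclampsia", "Preeclampsia ", "control"],
    }
)

# Clean string-based metadata: trim whitespace and normalise case.
meta["sample_id"] = meta["sample_id"].str.strip().str.upper()
meta["condition"] = meta["condition"].str.strip().str.lower()
meta["label"] = meta["condition"].map({"control": 0, "preeclampsia": 1})

# Inner join on sample ID keeps only samples present in both tables;
# validate="one_to_one" raises if a sample ID is duplicated on either side.
nodes = expr.reset_index().merge(
    meta, on="sample_id", how="inner", validate="one_to_one"
)
print(nodes)
```

Each row of `nodes` then corresponds to one graph node, with gene columns as expression features and the remaining columns as metadata.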

Dataset Properties and Complexity:

Small enough for local download yet challenging: high-dimensional features, rich but noisy metadata, biological heterogeneity.

Constraints:

No external data
Fixed feature space
Inductive setting: test samples unseen during training
Feasible on standard hardware

Deliberate Complexity:

Noise and missingness in metadata
Unbalanced disease labels
Predictive patterns emerge only through neighborhood aggregation
Large feature space vs. sample size requiring inductive bias
Cross-dataset domain shift (cfRNA vs. placenta) requiring generalizable representations

Advanced GNN Implementation (`advanced_GNN_model.py`)

Objective: Implement an advanced inductive GNN for cfRNA → placenta prediction, ensuring generalizable node representations and inductive learning.

Key Components:

Graph Construction: Build hetero-graphs using similarity and ancestry edges.
Node Feature Encoding: Integrate gene expression and metadata into node-level features.
Neighborhood Aggregation: GraphSAGE-style layers with BatchNorm and ReLU for neighbor information propagation.
Mini-Batch Training: Use neighborhood sampling for efficient training on large graphs.
Inductive Inference: Generate predictions for unseen placenta nodes without label leakage.
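To make the graph construction, sampling, and inductive inference components more concrete, here is a small NumPy sketch under stated assumptions. It covers similarity edges only (ancestry edges are omitted), uses cosine k-nearest neighbors as the similarity measure, and attaches each test node only to training nodes. None of these choices is confirmed to match `advanced_GNN_model.py`, and the helper names `cosine_knn` and `sample_neighbors` are hypothetical. Edges are built from features alone, so no test labels are involved.

```python
import numpy as np

def cosine_knn(queries, references, k, exclude_self=False):
    """For each query row, return indices of its k most cosine-similar reference rows."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    r = references / np.linalg.norm(references, axis=1, keepdims=True)
    sim = q @ r.T
    if exclude_self:  # only valid when queries and references are the same matrix
        np.fill_diagonal(sim, -np.inf)
    return np.argsort(-sim, axis=1)[:, :k]

def sample_neighbors(neighbor_idx, fanout, rng):
    """Uniformly sample `fanout` neighbors per node without replacement."""
    return np.stack(
        [rng.choice(row, size=fanout, replace=False) for row in neighbor_idx]
    )

rng = np.random.default_rng(0)
x_train = rng.normal(size=(20, 8))  # stand-in for cfRNA node features
x_test = rng.normal(size=(6, 8))    # stand-in for placenta node features

# Training graph: each cfRNA node links to its 5 nearest cfRNA nodes.
train_nbrs = cosine_knn(x_train, x_train, k=5, exclude_self=True)

# Inductive step: unseen placenta nodes attach to training nodes only.
test_nbrs = cosine_knn(x_test, x_train, k=5)

# Mini-batch training would aggregate over a sampled subset of neighbors.
batch_nbrs = sample_neighbors(train_nbrs, fanout=3, rng=rng)
print(train_nbrs.shape, test_nbrs.shape, batch_nbrs.shape)
```

Sampling a fixed fanout per node bounds the cost of each aggregation step, which is what keeps mini-batch training feasible as the graph grows.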

📝 Citation

If you use this challenge or dataset in your research, please cite:

@dataset{gnn_challenge_2026,
  title={GNN Challenge: cfRNA → Placenta Inductive GNN for Maternal-Fetal Health Prediction},
  author={Mubaraq Onipede},
  year={2026},
  url={https://github.com/Mubarraqqq/gnn-challenge}
}

📄 License

See LICENSE file for details.

Challenge Status: ✅ Active
Leaderboard: Live & Auto-updating
Submissions: Open via GitHub PRs
Last Updated: January 7, 2026

Good luck! 🚀 We look forward to your submissions!
