omicsGAN is a generative adversarial network (GAN) based framework that integrates two omics datasets along with their interaction network. It generates one synthetic dataset for each omics profile, which can result in better phenotype prediction.
- NumPy (>=1.17.2)
- pandas (>=0.25.1)
- scikit-learn (>=0.21.3)
- PyTorch (pytorch version >=1.5.0, torchvision version >=0.6.0)
Sample datasets for breast cancer phenotype prediction are available below.
mRNA expression: https://drive.google.com/file/d/1u-tmptVnm9yAjYGiby1FWAIsire3g_QF/view?usp=sharing
miRNA expression: https://drive.google.com/file/d/18c2efgsuYm2GZu9XxqpwrnGqZLvcFXIB/view?usp=sharing
interaction network: https://drive.google.com/file/d/13AssxLZQdta4O-9bQhHaSgSuaslnJceO/view?usp=sharing
label data: https://drive.google.com/file/d/10SWmhoRVb_8sIw2JGeSHorHJiMMVy7n_/view?usp=sharing
omicsGAN.py: The only script users need to run. It generates synthetic data through omicsGAN using the command line arguments described below.
omics1.py: Called from omicsGAN.py; updates the first omics data.
omics2.py: Called from omicsGAN.py; updates the second omics data.
Users need to download all data necessary for a cancer analysis into the same folder as the three scripts. The updated omics datasets will be saved in the same folder as well.
Input data: All input data must be in CSV format.
omics data: Omics datasets should be in feature-by-sample format, with the first column holding the feature names and the first row holding the sample names. Example images of omics data are attached below.
Interaction network: The interaction network should be in first-omics-by-second-omics format. The first column should hold the feature names of the first omics data and the first row should hold the feature names of the second omics data.
Label: Label data should be a column vector with each row corresponding to a sample. The classifier is designed for binary classification only. For multi-class classification, the SVM can be modified accordingly.
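The layout rules above can be checked before running omicsGAN. The following is a minimal sketch, not part of the omicsGAN code: the helper names `read_names` and `check_layout` and the toy data are hypothetical, and it uses only the standard library `csv` module. It also assumes that both omics files list the same samples in the same order and that the network's row and column names match the omics feature names exactly, which the format description suggests but does not state.

```python
import csv
import io


def read_names(handle):
    """Return (row_names, column_names) of a CSV whose first row and
    first column hold names, as described for the omics and network files."""
    rows = [r for r in csv.reader(handle) if r]
    column_names = rows[0][1:]
    row_names = [r[0] for r in rows[1:]]
    return row_names, column_names


def check_layout(omics1, omics2, network, n_labels):
    """Return a list of layout problems; an empty list means none were found."""
    features1, samples1 = omics1
    features2, samples2 = omics2
    net_rows, net_cols = network
    problems = []
    # Assumption: both omics files list the same samples in the same order.
    if samples1 != samples2:
        problems.append("omics files have different sample columns")
    # Network rows are first-omics features, columns are second-omics features.
    # Assumption: the name sets should match exactly.
    if set(net_rows) != set(features1):
        problems.append("network rows do not match first omics features")
    if set(net_cols) != set(features2):
        problems.append("network columns do not match second omics features")
    # The label file should hold one label per sample.
    if n_labels != len(samples1):
        problems.append("label count does not match sample count")
    return problems


# Toy stand-ins for the real files (hypothetical names and values).
mrna_csv = "gene,S1,S2,S3\nG1,0.1,0.2,0.3\nG2,1.0,1.1,1.2\n"
mirna_csv = "mirna,S1,S2,S3\nM1,5,6,7\n"
network_csv = "gene,M1\nG1,1\nG2,0\n"

omics1 = read_names(io.StringIO(mrna_csv))
omics2 = read_names(io.StringIO(mirna_csv))
network = read_names(io.StringIO(network_csv))

print(check_layout(omics1, omics2, network, n_labels=3))  # prints []
```

For real files, replace the `io.StringIO` objects with `open("mRNA.csv", newline="")` and similar, and pass the number of rows in the label file as `n_labels`.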
Command
The arguments are given in this order: omicsGAN.py, total number of updates (K), first omics dataset, second omics dataset, interaction network, label
For example, to generate synthetic mRNA and miRNA expression from the provided datasets, use the following command:
Sample command: omicsGAN.py 5 mRNA.csv miRNA.csv bipartite_targetscan_gene.csv label.csv
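When omicsGAN is driven from another Python script, the argument order can be assembled programmatically. This is a minimal sketch under stated assumptions: `build_command` is a hypothetical helper, not part of omicsGAN, and it assumes the script is launched with the current Python interpreter and that K is a positive integer.

```python
import sys


def build_command(k, omics1, omics2, network, label, script="omicsGAN.py"):
    """Assemble the argument list in the order the README gives:
    script, K, first omics, second omics, interaction network, label."""
    k = int(k)
    if k < 1:
        # Assumption: the number of updates must be at least 1.
        raise ValueError("K must be a positive integer")
    return [sys.executable, script, str(k), omics1, omics2, network, label]


cmd = build_command(
    5, "mRNA.csv", "miRNA.csv", "bipartite_targetscan_gene.csv", "label.csv"
)
# Show the command without the interpreter path.
print(" ".join(cmd[1:]))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the folder that contains the three scripts and the data files.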
For any concerns or further assistance, contact t.ahmed@knights.ucf.edu