bnusss/inClust

 
 

A deep generative framework with embedded vector arithmetic and classifier for sample generation, label transfer, and clustering of transcriptome data

This repository contains the official Keras implementation of:

A deep generative framework with embedded vector arithmetic and classifier for sample generation, label transfer, and clustering of transcriptome data

Requirements

  • Python 3.6
  • conda 4.4.10
  • keras 2.2.4
  • tensorflow 1.11.0

1. Model training

  • Input
Input1: num_samples * num_genes (--inputdata)
Input2: num_samples * num_covariates (--input_covariates)
Input3: num_samples * num_celltype (--inputcelltype)
        supervised: one-hot vector for each sample
        semi-supervised: num_celltype is set larger than the number of cell types in the reference dataset
                         one-hot vector for each sample in the reference dataset
                         zero vector for each sample in the query dataset
        unsupervised: num_celltype is a hyperparameter, the number of clusters chosen by the user
                      zero vector for each sample
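
The three layouts of Input3 described above can be sketched with NumPy as follows. This is a minimal illustration, not code from the repository: the helper `one_hot`, the use of `-1` to mark unlabeled query samples, the `float32` dtype, and the toy label values are all assumptions made for the example.

```python
import numpy as np


def one_hot(labels, num_celltype):
    """Return a (num_samples, num_celltype) matrix; label -1 yields a zero row.

    Hypothetical helper: the -1 convention for unlabeled samples is an
    assumption of this sketch, not something inClust.py defines.
    """
    labels = np.asarray(labels, dtype=int)
    matrix = np.zeros((labels.shape[0], num_celltype), dtype=np.float32)
    known = labels >= 0
    matrix[np.nonzero(known)[0], labels[known]] = 1.0
    return matrix


# Supervised: every sample is labeled, one column per cell type.
supervised = one_hot([0, 2, 1, 2], num_celltype=3)

# Semi-supervised: reference samples are one-hot, query samples (-1) are zero rows.
# num_celltype (5) is larger than the 3 cell types in the reference labels.
semi_supervised = one_hot([0, 2, 1, -1, -1], num_celltype=5)

# Unsupervised: every row is zero; num_celltype is the chosen number of clusters.
unsupervised = np.zeros((4, 10), dtype=np.float32)
```

An array built this way could be written with `np.save` and passed through `--inputcelltype`, assuming the script reads a plain `.npy` array as the example file names suggest.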
  • About this article
#Arguments:
#'--inputdata', type=str, default='data/training_data/Fig3_scgen_count7000r.npz', help='address for input data')
#'--input_covariates', type=str, default='data/training_data/Fig3_scgen_study_condition.npy', help='address for covariate (e.g. batch)')
#'--inputcelltype', type=str, default='data/training_data/Fig3_scgen_cell_type.npy', help='address for celltype label')
#'--randoms', type=int, default=30, help='random number to split dataset')
#'--permute_input', type=str, default='T', help='whether permute the input')

#'--dim_latent', type=int, default=50, help='dimension of latent space')
#'--dim_intermediate', type=int, default=200, help='dimension of intermediate layer')
#'--activation', type=str, default='relu', help='activation function: relu or tanh')
#'--arithmetic', type=str, default='minus', help='arithmetic: minus or plus')

#'--batch_size', type=int, default=500, help='training parameters_batch_size')
#'--epochs', type=int, default=50, help='training parameters_epochs')

#'--training', type=str, default='T', help='training model(T) or loading model(F) ')
#'--weights', type=str, default='data/weights_and_results/Fig3_demo.weight', help='trained weights')

#'--mode', type=str, default='supervised', help='mode: supervised, semi_supervised, unsupervised, user_defined')

#'--reconstruction_loss', type=int, default=5, help='The reconstruction loss for VAE')
#'--kl_cross_loss', type=int, default=1, help='')
#'--prior_distribution_loss', type=int, default=1, help='The assumption that prior distribution is uniform distribution')
#'--label_cross_loss', type=int, default=50, help='Loss for integrating label information into the model')
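
The commented fragments above read like `argparse` declarations. A minimal sketch of how a subset of them could be declared and parsed is given below. It is a reconstruction for illustration only: the function `build_parser` and the helper `is_true` are hypothetical, and whether inClust.py interprets the `T`/`F` strings this way is an assumption.

```python
import argparse


def build_parser():
    """Declare a subset of the flags listed in this README (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of a subset of the inClust.py flags")
    parser.add_argument("--mode", type=str, default="supervised",
                        help="mode: supervised, semi_supervised, unsupervised, user_defined")
    parser.add_argument("--training", type=str, default="T",
                        help="training model (T) or loading model (F)")
    parser.add_argument("--dim_latent", type=int, default=50,
                        help="dimension of latent space")
    parser.add_argument("--epochs", type=int, default=50,
                        help="training epochs")
    return parser


def is_true(flag):
    """Hypothetical helper: read the 'T'/'F' string convention as a bool."""
    return flag == "T"


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--mode=unsupervised", "--training=F"])
```

Flags that are not given on the command line keep their defaults, so `args.dim_latent` stays at 50 here.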


For supervised mode
python inClust.py --inputdata=data/training_data/Fig3_PBMC_count7000r.npz --input_covariates=data/training_data/Fig3_PBMC_study_condition.npy --inputcelltype=data/training_data/Fig3_PBMC_cell_type.npy --mode=supervised

For semi_supervised mode
python inClust.py --inputdata=data/training_data/Fig5_heart_count.npz --input_covariates=data/training_data/Fig5_heart_batch.npy --inputcelltype=data/training_data/Fig5_heart_label_semi.npy --mode=semi_supervised --permute_input=F

For unsupervised mode
python inClust.py --inputdata=data/training_data/Fig7_10X.npy --input_covariates=data/training_data/Fig7_10X_neighbor.npy --inputcelltype=data/training_data/Fig7_10X_label.npy --mode=unsupervised  --activation=tanh --arithmetic=plus --independent_embed=F --training=F --weights=data/weights_and_results/Fig7_10X_demo.weight
  • Further Explore
Testing your own dataset
python inClust.py --inputdata=your_data --input_covariates=your_input_covariates --inputcelltype=your_inputcelltype --mode=your_mode
  • Trained weights
results/training.weight

2. Analysis

  • Demo -- About this article

The following commands generate the analysis data reported in the main text.

For supervised mode
python inClust.py --inputdata=data/training_data/Fig3_PBMC_count7000r.npz --input_covariates=data/training_data/Fig3_PBMC_study_condition.npy --inputcelltype=data/training_data/Fig3_PBMC_cell_type.npy --mode=supervised --training=F --weights=data/weights_and_results/Fig3_PBMC_demo.weight

For semi_supervised mode
python inClust.py --inputdata=data/training_data/Fig5_heart_count.npz --input_covariates=data/training_data/Fig5_heart_batch.npy --inputcelltype=data/training_data/Fig5_heart_label_semi.npy --mode=semi_supervised --training=F --weights=data/weights_and_results/Fig5_heart_demo.weight

For unsupervised mode

python inClust.py --inputdata=data/training_data/Fig7_10X.npy --input_covariates=data/training_data/Fig7_10X_neighbor.npy --inputcelltype=data/training_data/Fig7_10X_label.npy --mode=unsupervised  --arithmetic=plus --independent_embed=F --training=F --weights=data/weights_and_results/Fig7_10X_demo.weight
  • Output

The output is in the results folder, including

  • two latent space representations
mean_vector.npy
Low_dimnesion_vector.npy
batch_vector.npy
  • predicted label for each sample
predict_labels.csv
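
The output files listed above can be read back with NumPy and the standard `csv` module. The sketch below is an assumption-laden illustration: the function `load_outputs` is hypothetical, the file names are copied verbatim from the list above, and because the column layout of `predict_labels.csv` is not documented here, the rows are returned unparsed.

```python
import csv
import os

import numpy as np


def load_outputs(results_dir="results"):
    """Load the .npy arrays and the raw CSV rows found in results_dir.

    Hypothetical helper; it only assumes the file names given in this README.
    """
    arrays = {}
    for name in ("mean_vector", "Low_dimnesion_vector", "batch_vector"):
        arrays[name] = np.load(os.path.join(results_dir, name + ".npy"))
    with open(os.path.join(results_dir, "predict_labels.csv"), newline="") as handle:
        rows = list(csv.reader(handle))
    return arrays, rows
```

Calling `arrays, rows = load_outputs("results")` after a run and checking `arrays["mean_vector"].shape` is a quick way to confirm that the number of rows matches the number of input samples, if the outputs are indeed stored one row per sample.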

3. Reproducing figures

demo_reproducing_figures.ipynb

VAE and variant implementations: https://github.com/bojone/vae
