Variables in the training data missing in newdata #1

Open
@karamveerverma37

Description

Hi,
I am trying to run REPTILE with the pre-trained model mm_model_coreMarks.reptile on methylation data. Could there be a problem with how I generated the bigWig files? I have methylation base-call BED files containing chromosome, start, end, and methylation rate. I converted them to bigWig files using the following commands:
awk '{printf "%s\t%d\t%d\t%2.3f\n" , $1,$2,$3,$4}' myBed.bed > myFile.bedgraph
sort -k1,1 -k2,2n myFile.bedgraph > myFile_sorted.bedgraph
bedGraphToBigWig myFile_sorted.bedgraph myChrom.sizes myBigWig.bw
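
Before running bedGraphToBigWig, the sorted bedGraph can be sanity-checked. The following is a minimal Python sketch, not part of REPTILE or the UCSC tools; the function name `check_bedgraph` is hypothetical. It assumes a tab-separated file with four fields per line and flags rows that are malformed, have start not less than end (which can happen if base calls are stored as single positions), or overlap the previous interval on the same chromosome.

```python
def check_bedgraph(lines):
    """Return a list of (line_number, message) problems found in bedGraph lines.

    Hypothetical helper for illustration; assumes tab-separated
    chrom, start, end, value and a file already sorted by chrom and start.
    """
    problems = []
    prev_end = {}  # largest interval end seen so far on each chromosome
    for n, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            problems.append((n, "expected 4 tab-separated fields"))
            continue
        chrom, start, end, value = fields
        try:
            start, end = int(start), int(end)
            float(value)
        except ValueError:
            problems.append((n, "non-numeric start, end, or value"))
            continue
        if start >= end:
            problems.append((n, "start is not less than end"))
        if start < prev_end.get(chrom, 0):
            problems.append((n, "overlaps or precedes the previous interval"))
        prev_end[chrom] = max(end, prev_end.get(chrom, 0))
    return problems
```

For the file produced above, this could be called as `check_bedgraph(open("myFile_sorted.bedgraph"))`; an empty list means none of these particular problems were found, not that the file is guaranteed to be valid.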

I tried the Meth epimark alone as well as all four marks (H3K4me1, etc.) given for the mm_model_coreMarks.reptile model. REPTILE_preprocess.py produces the file preprocessed.region_with_epimark.tsv, which looks like this:
chr start end id Meth_E4 H3K4me1_E4 H3K4me3_E4 H3K27ac_E4
chr1 0 2000 bin_0 0.0 0.0 0.0 0.0
chr1 100 2100 bin_1 0.0 0.0 0.0 0.0
chr1 200 2200 bin_2 0.0 0.0 0.0 0.0
chr1 300 2300 bin_3 0.0 0.0 0.0 0.0
chr1 400 2400 bin_4 0.0 0.0 0.0 0.0
chr1 500 2500 bin_5 0.0 0.0 0.0 0.0
chr1 600 2600 bin_6 0.0 0.0 0.0 0.0
chr1 700 2700 bin_7 0.0 0.0 0.0 0.0
chr1 800 2800 bin_8 0.0 0.0 0.0 0.0
chr1 900 2900 bin_9 0.0 0.0 0.0 0.0
chr1 1000 3000 bin_10 0.0 0.0 0.0 0.0
.
.
chr1 3211200 3213200 bin_32112 5.0 5.0 5.0 5.0
chr1 3211300 3213300 bin_32113 5.0 5.0 5.0 5.0
chr1 3211400 3213400 bin_32114 5.0 5.0 5.0 5.0
chr1 3211500 3213500 bin_32115 4.0 4.0 4.0 4.0
chr1 3211600 3213600 bin_32116 3.3 3.3 3.3 3.3
chr1 3211700 3213700 bin_32117 2.54545 2.54545 2.54545 2.54545
chr1 3211800 3213800 bin_32118 2.69231 2.69231 2.69231 2.69231
chr1 3211900 3213900 bin_32119 3.0 3.0 3.0 3.0
chr1 3212000 3214000 bin_32120 2.85714 2.85714 2.85714 2.85714
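
In the rows shown above, all four mark columns carry the same value on every line. That pattern would be expected if the same bigWig file had been supplied for each mark, although which files were listed in data_info_file2 is not shown here, so this is only a guess. A minimal Python sketch for checking the whole file follows; the function name `identical_mark_columns` is hypothetical, and it assumes the first four columns are chr, start, end, and id, as in the header above.

```python
def identical_mark_columns(lines, n_coord_cols=4):
    """Return pairs of mark columns whose values match on every row.

    Hypothetical helper for illustration; assumes a whitespace- or
    tab-separated table whose first line is a header and whose first
    n_coord_cols columns are coordinates and the bin id.
    """
    rows = [line.split() for line in lines if line.strip()]
    header, data = rows[0], rows[1:]
    marks = header[n_coord_cols:]
    # Collect each mark column as a tuple of its raw string values.
    columns = {
        name: tuple(row[n_coord_cols + i] for row in data)
        for i, name in enumerate(marks)
    }
    pairs = []
    for i, first in enumerate(marks):
        for second in marks[i + 1:]:
            if columns[first] == columns[second]:
                pairs.append((first, second))
    return pairs
```

If every pair of marks is reported, the columns carry no mark-specific signal, and the inputs to REPTILE_preprocess.py are worth rechecking before scoring.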

Now when I run the compute score command:
REPTILE_compute_score.R -i data_info_file2 -m mm_model_coreMarks.reptile -a tmp/mm39_w2kb_s100bp_preprocessed.region_with_epimark.tsv -s E4 -o tmp/E4__compute_pred

I get the following error:
Error in predict.randomForest(reptile_classifier, epimark, type = "prob") :
variables in the training data missing in newdata
Calls: reptile_predict_genome_wide ... reptile_predict_one_mode -> predict -> predict.randomForest
Execution halted
Is there a trained model available that uses only DNA methylation data to predict enhancers?
Note: I tried both genome-wide and region-specific prediction.
