Description
Hi,
I am trying to run REPTILE with the pre-trained model mm_model_coreMarks.reptile on DNA methylation data. Could there be a problem with how I generated the bigWig files? I have methylation base-call BED files containing the chromosome, start, end and methylation rate, and I converted them to bigWig with the following commands:
awk '{printf "%s\t%d\t%d\t%2.3f\n" , $1,$2,$3,$4}' myBed.bed > myFile.bedgraph
sort -k1,1 -k2,2n myFile.bedgraph > myFile_sorted.bedgraph
bedGraphToBigWig myFile_sorted.bedgraph myChrom.sizes myBigWig.bw
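bedGraphToBigWig expects its input sorted by chromosome and start, with non-overlapping intervals on chromosomes that appear in the chrom.sizes file. A quick check like the one below can catch those input problems before conversion and also prints the minimum and maximum value in column 4, which shows whether the methylation rates are on a 0 to 1 or a 0 to 100 scale. This is a minimal sketch: the function name check_bedgraph is hypothetical, and it assumes the 4-column bedGraph written by the awk command above and a non-empty chrom.sizes file.

```shell
# check_bedgraph SORTED_BEDGRAPH CHROM_SIZES   (hypothetical helper)
# Prints one tab-separated line per problem found, then the min and max
# value in column 4. Returns 1 if any problem was found, 0 otherwise.
# Assumes CHROM_SIZES is non-empty (it is read first via FNR == NR).
check_bedgraph() {
  awk -v OFS='\t' '
    # First file (chrom.sizes): remember each chromosome length.
    FNR == NR { len[$1] = $2 + 0; next }
    # Skip UCSC "track"/"browser" header lines if present.
    /^(track|browser)/ { next }
    {
      n++
      if (!($1 in len)) { print "unknown_chrom", FNR, $1; bad++ }
      else if ($3 + 0 > len[$1]) { print "past_chrom_end", FNR, $1; bad++ }
      # Within a chromosome, each start must not precede the previous end.
      if ($1 == prev_chr && $2 + 0 < prev_end) { print "overlap_or_unsorted", FNR, $1; bad++ }
      # Track the range of values in column 4.
      v = $4 + 0
      if (n == 1 || v < min) min = v
      if (n == 1 || v > max) max = v
      prev_chr = $1; prev_end = $3 + 0
    }
    END {
      print "value_range", min, max
      exit (bad > 0)
    }
  ' "$2" "$1"
}
```

For example, `check_bedgraph myFile_sorted.bedgraph myChrom.sizes` before running bedGraphToBigWig. The check only reports the value scale; it cannot tell you which scale REPTILE expects.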
I tried using the Meth epimark alone, as well as all four marks (H3K4me1, etc.) listed for the mm_model_coreMarks.reptile model. The output of REPTILE_preprocess.py is a preprocessed.region_with_epimark.tsv file, which looks like this:
chr start end id Meth_E4 H3K4me1_E4 H3K4me3_E4 H3K27ac_E4
chr1 0 2000 bin_0 0.0 0.0 0.0 0.0
chr1 100 2100 bin_1 0.0 0.0 0.0 0.0
chr1 200 2200 bin_2 0.0 0.0 0.0 0.0
chr1 300 2300 bin_3 0.0 0.0 0.0 0.0
chr1 400 2400 bin_4 0.0 0.0 0.0 0.0
chr1 500 2500 bin_5 0.0 0.0 0.0 0.0
chr1 600 2600 bin_6 0.0 0.0 0.0 0.0
chr1 700 2700 bin_7 0.0 0.0 0.0 0.0
chr1 800 2800 bin_8 0.0 0.0 0.0 0.0
chr1 900 2900 bin_9 0.0 0.0 0.0 0.0
chr1 1000 3000 bin_10 0.0 0.0 0.0 0.0
.
.
chr1 3211200 3213200 bin_32112 5.0 5.0 5.0 5.0
chr1 3211300 3213300 bin_32113 5.0 5.0 5.0 5.0
chr1 3211400 3213400 bin_32114 5.0 5.0 5.0 5.0
chr1 3211500 3213500 bin_32115 4.0 4.0 4.0 4.0
chr1 3211600 3213600 bin_32116 3.3 3.3 3.3 3.3
chr1 3211700 3213700 bin_32117 2.54545 2.54545 2.54545 2.54545
chr1 3211800 3213800 bin_32118 2.69231 2.69231 2.69231 2.69231
chr1 3211900 3213900 bin_32119 3.0 3.0 3.0 3.0
chr1 3212000 3214000 bin_32120 2.85714 2.85714 2.85714 2.85714
Now when I run the compute score command:
REPTILE_compute_score.R -i data_info_file2 -m mm_model_coreMarks.reptile -a tmp/mm39_w2kb_s100bp_preprocessed.region_with_epimark.tsv -s E4 -o tmp/E4__compute_pred
I get the following error:
Error in predict.randomForest(reptile_classifier, epimark, type = "prob") :
variables in the training data missing in newdata
Calls: reptile_predict_genome_wide ... reptile_predict_one_mode -> predict -> predict.randomForest
Execution halted
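The randomForest message means the data frame passed to predict() lacks at least one predictor variable the classifier was trained on, so the mark columns in the preprocessed table have to line up with the marks the model was trained with. If the classifier object can be loaded in R, `rownames(reptile_classifier$importance)` lists the variable names randomForest recorded. Assuming REPTILE derives the variable names by removing the `_<sample>` suffix from the header fields (an assumption, not confirmed from REPTILE's source), a sketch like the following reports which expected marks are absent. The function name missing_marks is hypothetical.

```shell
# missing_marks TABLE_TSV SAMPLE MARK1 [MARK2 ...]   (hypothetical helper)
# Columns 5+ of the header are assumed to be named <mark>_<sample>.
# Prints "missing: <mark>" for each expected mark not found; returns 1 if any.
missing_marks() {
  local tsv="$1" sample="$2" present mark status=0
  shift 2
  # Header fields from column 5 on, with the "_<sample>" suffix removed.
  present=$(head -n 1 "$tsv" | awk -v s="_$sample" '{
    for (i = 5; i <= NF; i++) {
      name = $i
      if (substr(name, length(name) - length(s) + 1) == s)
        name = substr(name, 1, length(name) - length(s))
      print name
    }
  }')
  for mark in "$@"; do
    # Exact, fixed-string, whole-line match against the present marks.
    if ! printf '%s\n' "$present" | grep -qxF -- "$mark"; then
      echo "missing: $mark"
      status=1
    fi
  done
  return "$status"
}
```

For example, `missing_marks tmp/mm39_w2kb_s100bp_preprocessed.region_with_epimark.tsv E4 MARK_A MARK_B`, where MARK_A and MARK_B are placeholders for the marks mm_model_coreMarks.reptile was actually trained on.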
Is there a pre-trained model available that uses only DNA methylation data to predict enhancers?
Note: I tried both genome-wide and region-specific prediction.