Notes for future users of monorail #10
Comments
Thanks @davemcg for the thorough notes on your instantiation of monorail and the input into recount3!
How is it possible that I only get read counts in the F006 gene sums files and not in any of the others? Can you help me, please?
@davemcg Just need to say thank you for these notes! I was having tremendous trouble with an error like this
and your clue about:
also seems to apply to study and sample names - a sample id like
Thanks @lonsbio. If it helps, I have a standalone Snakefile which I use now. It is for my own purposes, so it is very much not "plug and play". But it can be adapted for personal use without a ton of trouble (if you are semi-familiar with Snakemake).
@davemcg I went ahead and linked these thorough notes and your Snakerail repo in the main README. Please let me know if that's an issue for you.
Current link (the dir also contains the shell scripts I used): https://github.com/davemcg/metaRPE/blob/main/recount3/README.md
Reposted with a few tweaks to make it a bit more general:
How to run monorail pump and unify
Observations, some unfounded and poorly understood
- e.g. A1_1_R1.fastq.gz and A1_2_R2.fastq.gz seem to cause issues with `unify`, as `pump` uses the last two characters of the name (in this case _1) as a subfolder, which seems to cause issues with some concatenation steps later.
- `unify` will use ALL THE FOLDERS/SAMPLES in the `pump` output as its input, so you cannot "mix and match" custom unify output by just messing with the sample metadata file.
Workflow
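Given the naming observation above, a pre-flight check before launching pump may save a failed run. The sketch below is only an illustration: `check_fastq_names` is a hypothetical helper, and it assumes files are named `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz`. It flags any sample ID that still contains an underscore.

```shell
# Hypothetical helper (not part of monorail): flag FASTQ files whose sample ID,
# i.e. the part before _R1/_R2, contains an underscore (e.g. A1_1_R1.fastq.gz).
# Assumes the naming scheme <sample>_R1.fastq.gz / <sample>_R2.fastq.gz.
check_fastq_names() {
  status=0
  for f in "$@"; do
    base=$(basename "$f")
    sample=${base%_R[12].fastq.gz}   # strip the read suffix
    case "$sample" in
      *_*)
        echo "suspicious sample name: $sample ($base)" >&2
        status=1
        ;;
    esac
  done
  return "$status"
}
```

Something like `check_fastq_names fastq/*.fastq.gz` returns non-zero if any name looks risky, so it can gate the pump step in a script.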
cd /data/mcgaugheyd/projects/nei/bharti/metaRPE
cp /home/mcgaugheyd/metaRPE/recount3/get_image.sh . ; bash get_image.sh # get both pump and unify singularity images
bash ~/git/monorail-external/get_unify_refs.sh hg38 # unify refs
bash ~/git/monorail-external/get_human_ref_indexes.sh
mkdir references; mv hg38 references/; mv hg38_unify references/ # mv pump and unify ref folders to subfolder
cp /home/mcgaugheyd/metaRPE/recount3/run_pump.sh . ;cp /home/mcgaugheyd/metaRPE/recount3/pump_commands.sh .
bash pump_commands.sh # runs all pump jobs (invoking the run_pump.sh script)
mkdir pump_output; rsync --progress -rav pump/*/output/ pump_output # consolidate the pump outputs to one directory
mkdir unify_output; mv recount-unify_1.0.9.sif unify_output; cd unify_output; cp /home/mcgaugheyd/git/metaRPE/data/recount_sample_metadata.tsv . ; cp /home/mcgaugheyd/metaRPE/recount3/run_unify.sh .
sbatch --cpus-per-task 6 --mem=32G run_unify.sh
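Because `unify` picks up every folder in the `pump` output (see the observations above), one way to run it on a subset is to stage only the wanted samples into a fresh directory and point `unify` there. This is a sketch under assumptions: `stage_pump_subset` is a hypothetical helper, and it assumes one folder per sample directly under the consolidated pump output, which may not match your layout.

```shell
# Hypothetical helper: copy only the named sample folders from a consolidated
# pump output directory into a fresh directory to use as unify input.
# Assumes one folder per sample directly under the source directory.
stage_pump_subset() {
  src=$1; dest=$2; shift 2
  mkdir -p "$dest" || return 1
  for sample in "$@"; do
    if [ ! -d "$src/$sample" ]; then
      echo "missing pump output for sample: $sample" >&2
      return 1
    fi
    cp -R "$src/$sample" "$dest/" || return 1
  done
}
```

For example, `stage_pump_subset pump_output pump_subset A1 B2` would leave only `A1` and `B2` under `pump_subset` (the sample names here are made up).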
How to munge monorail-unify output into recount3 format
Why bother? Well, the unify output is not straight counts, but rather the sum of the base-pair-level coverage. recount3 has tooling to do the conversion. I briefly looked into whether I could do the transform "directly", but after a few minutes of traversing their code it seemed easier to just build the RangedSummarizedExperiment (RSE) data structure by moving the unify outputs around. Plus, the RSE is a more portable and useful format should someone else want to use the data.
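To see why a coverage sum is not a count: a 100 bp read lying entirely inside a gene adds 100 to that gene's coverage sum, so ten such reads give 1000. The toy calculation below just divides by an assumed read length. It is a back-of-the-envelope illustration, not what recount3 does internally, and `approx_read_count` is a made-up name.

```shell
# Rough illustration only: approximate a read count from a base-pair coverage
# sum by dividing by an assumed average mapped read length.
approx_read_count() {
  awk -v cov="$1" -v len="$2" 'BEGIN { printf "%.0f\n", cov / len }'
}

approx_read_count 1000 100   # ten 100 bp reads; prints 10
```

For real data, use the recount3 tooling mentioned above rather than a shortcut like this.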
OK, so super briefly: you have to move the unify outputs around a bit (to my local computer in my example) so recount3 can import the data and make the RSE, and then you can run (in R) `recount3::transform_counts` to get the straight counts for downstream use.
Their example directory is here: http://snaptron.cs.jhu.edu/data/temp/recount3test/
Fairly complete instructions: https://github.com/langmead-lab/monorail-external#loading-custom-unifier-runs-into-recount3
My directory structure (gaps are where I've deleted lines to make a touch shorter to view):
Notes:
- The `annotation` folder is made by `wget`ing the files from Langmead and co:
  wget http://duffel.rail.bio/recount3/human/new_annotations/gene_sums/human.gene_sums.G026.gtf.gz
- `metaRPE` is my project name. monorail/recount tend to use `sra` as theirs.
- The `data_sources` folder is filled by `rsync`ing various unify output folders from /data/mcgaugheyd/projects/nei/bharti/metaRPE/unify_output
  - `base_sums` is currently empty, as recount3 doesn't need the bigWig files (which are in the pump outputs)
  - `exon_sums_per_study` renamed to `exon_sums`
  - `gene_sums_per_study` renamed to `gene_sums`
  - `junction_counts_per_study` renamed to `junctions`
  - `metadata` doesn't need to be renamed
  - `metaRPE.recount_project.MD.gz` is made with:
zcat metadata/*/*/*recount_project.* | head -n 1 | gzip > metadata/metaRPE.recount_project.MD.gz # this grabs just the header
zcat metadata/*/*/*recount_project.* | grep -v rail_id | gzip >> metadata/metaRPE.recount_project.MD.gz # copy the rest of the meta sans the headers
- The `home_index` file is just a text file with `data_sources/metaRPE` in it
- Replace `metaRPE` with whatever your "project" name is (again, `sra` is commonly used by the monorail/recount team)
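The renames in the notes above can be scripted. The following is a minimal sketch, not the author's script: `stage_recount3_layout` is a hypothetical helper, it uses `cp -R` where the notes use `rsync`, and it assumes `home_index` sits next to `data_sources` and that the target project folder does not exist yet. Check the result against the recount3 example directory before relying on it.

```shell
# Hypothetical helper: rearrange a unify output directory into the folder
# names described in the notes. Placing home_index next to data_sources is an
# assumption; the target project folder is assumed not to exist beforehand.
stage_recount3_layout() {
  unify_out=$1; root=$2; project=$3
  target="$root/data_sources/$project"
  mkdir -p "$target/base_sums" || return 1   # left empty; bigWigs are not needed
  # <unify folder>:<recount3 folder> pairs, following the renames in the notes
  for pair in exon_sums_per_study:exon_sums \
              gene_sums_per_study:gene_sums \
              junction_counts_per_study:junctions \
              metadata:metadata; do
    from=${pair%%:*}
    to=${pair##*:}
    if [ ! -d "$unify_out/$from" ]; then
      echo "missing unify output folder: $from" >&2
      return 1
    fi
    cp -R "$unify_out/$from" "$target/$to" || return 1
  done
  # home_index is a one-line text file naming the project folder
  printf 'data_sources/%s\n' "$project" > "$root/home_index"
}
```

A call such as `stage_recount3_layout unify_output recount3_local metaRPE` would produce `recount3_local/data_sources/metaRPE/` plus `recount3_local/home_index`; the combined `metaRPE.recount_project.MD.gz` still has to be built with the `zcat` commands from the notes.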