Description
I am phasing a large (n=172k) sample of parents and offspring (some duos, some trios), but I only need the phased genotypes for the offspring.
I start by running the phasing jobs (with the chromosomes split into chunks) using shapeit5 with --pedigree on an HPC cluster. The initial whole-sample output files are written to a local scratch filesystem, where they are deleted as soon as the job finishes. I then use bcftools view -S to subset these results to just the offspring, and save that smaller results chunk on the cluster's network filesystem, where files persist past the end of the job.
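For concreteness, here is a rough sketch of the subsetting step. It assumes the pedigree file has three whitespace-separated columns (offspring, father, mother, with NA for a missing parent), which should be checked against the file actually passed to --pedigree. The function names and file paths are hypothetical placeholders, not my exact commands.

```shell
set -euo pipefail

# Write the unique offspring IDs (assumed to be column 1 of the pedigree
# file) to a sample list that bcftools view -S can read.
write_offspring_list() {
    local ped="$1" out="$2"
    awk 'NF { print $1 }' "$ped" | sort -u > "$out"
}

# Subset one phased chunk to the offspring and index the result.
# Defined only, not called here; requires bcftools on PATH.
# All three arguments are hypothetical paths.
subset_chunk() {
    local in_bcf="$1" samples="$2" out_bcf="$3"
    bcftools view -S "$samples" -Ob -o "$out_bcf" "$in_bcf"
    bcftools index "$out_bcf"
}
```

Within each job, subset_chunk would be called with the scratch-filesystem chunk as input and a path on the network filesystem as output.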
After all jobs have finished running, I try to use ligate with the --pedigree flag on the offspring-only results chunks. Despite using --pedigree, it detects the offspring samples as non-scaffolded, haplotype order gets swapped, and sometimes chunks from the maternal and paternal haplotypes are incorrectly combined as if they were in phase.
Is the behavior of ligate for a file where 100% of the samples are scaffolded just the same as bcftools concat -a -d all, or would there still be a reason to prefer ligate? If there is still a reason to prefer ligate, is there a way to get it to treat offspring as scaffolded (e.g., refrain from swapping haplotypes around) even when the parents are no longer in the data?