
bedtools coverage fails when gff file and genome file are not sorted the same way #1037

Closed

Description

Check Documentation

I have checked the following places for your error:

Description of the bug

Bedtools coverage fails when the sequence (chromosome) order of the GFF file differs from the order in the genome file passed with -g, which the pipeline builds from the BAM header.

Steps to reproduce

Steps to reproduce the behaviour:

  1. Use the following flags in an Eager run:
--fasta "https://ftp.ensembl.org/pub/grch37/current/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.dna_sm.primary_assembly.fa.gz" \
--anno_file "https://ftp.ensembl.org/pub/grch37/current/gff3/homo_sapiens/Homo_sapiens.GRCh37.87.gff3.gz" \
--run_bedtools_coverage
  2. See error:
Error executing process > 'bedtools (AB_libmerged)'

Caused by:
  Process `bedtools (AB_libmerged)` terminated with an error exit status (1)

Command executed:

  ## Create genome file from bam header
  samtools view -H AB_udghalf_libmerged_rmdup.bam | grep '@SQ' | sed 's#@SQ	SN:\|LN:##g' > genome.txt
  
  ##  Run bedtools
  bedtools coverage -nonamecheck -g genome.txt -sorted -a Homo_sapiens.GRCh37.87.gff3 -b AB_udghalf_libmerged_rmdup.bam | pigz -p 1 > "AB_udghalf_libmerged_rmdup".breadth.gz
  bedtools coverage -nonamecheck -g genome.txt -sorted -a Homo_sapiens.GRCh37.87.gff3 -b AB_udghalf_libmerged_rmdup.bam -mean | pigz -p 1 > "AB_udghalf_libmerged_rmdup".depth.gz

Command exit status:
  1

Command output:
  (empty)

Command error:
  Error: Sorted input specified, but the file Homo_sapiens.GRCh37.87.gff3 has the following record with a different sort order than the genomeFile genome.txt
  GL000192.1	GRCh37	supercontig	1	547496	.	.	.	ID=supercontig:GL000192.1;Alias=NT_167207.1
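The genome file is built from the @SQ lines of the BAM header, so its sequence order follows the reference, while the GFF3 evidently lists sequences in a different order. A rough way to find the first offending record before running bedtools is sketched below. check_gff_order is a hypothetical helper, not part of bedtools or nf-core/eager. It only compares sequence names (GFF column 1), whereas bedtools -sorted also expects start positions to ascend within each sequence.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of bedtools or nf-core/eager): report the first
# GFF record whose sequence name (column 1) is missing from genome.txt or comes
# earlier in genome.txt than the sequence of a preceding record.
# Only sequence order is checked; start positions within a sequence are not.
check_gff_order() {
  local genome="$1" gff="$2"
  awk -F '\t' '
    # First file (genome.txt): remember the position of each sequence name.
    NR == FNR { rank[$1] = FNR; next }
    # Skip GFF3 comment/directive lines (e.g. "##gff-version 3", "###") and blanks.
    /^#/ || NF == 0 { next }
    {
      if (!($1 in rank)) { print "not in genome file: " $0; bad = 1; exit }
      if (rank[$1] < prevrank) { print "out of order: " $0; bad = 1; exit }
      prevrank = rank[$1]
    }
    # Exit status 1 if a problem was found, 0 otherwise.
    END { exit (bad ? 1 : 0) }
  ' "$genome" "$gff"
}
```

For example, check_gff_order genome.txt Homo_sapiens.GRCh37.87.gff3 prints the first problematic record and returns a non-zero status if the two files disagree.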

Expected behaviour

I expect bedtools coverage to complete successfully.
I was able to work around this by removing the -sorted flag, so that bedtools no longer requires the GFF and the genome file to share the same sort order.
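An alternative to dropping -sorted would be to reorder the annotation so it follows the genome file, keeping bedtools' chromosome-sweep mode (without -sorted, bedtools uses its in-memory algorithm, which can need more memory on large inputs). bedtools sort documents a -g option for sorting by the chromosome order of a genome file; the sketch below does the same with plain awk and sort, assuming a tab-separated GFF. sort_gff_like_genome is a hypothetical helper. It drops "#" comment and directive lines, plus any record on a sequence that is not listed in genome.txt.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of bedtools or nf-core/eager): reorder a GFF so
# that sequences follow the order of genome.txt, with records sorted by start
# position (GFF column 4) within each sequence.
# Drops "#" comment/directive lines, blank lines, and records on sequences that
# are not listed in genome.txt.
sort_gff_like_genome() {
  local genome="$1" gff="$2"
  local tab
  tab="$(printf '\t')"
  awk -F '\t' -v OFS='\t' '
    # First file (genome.txt): remember the position of each sequence name.
    NR == FNR { rank[$1] = FNR; next }
    /^#/ || NF == 0 { next }
    # Prefix each kept record with its sequence rank, so start becomes field 5.
    ($1 in rank) { print rank[$1], $0 }
  ' "$genome" "$gff" |
    LC_ALL=C sort -s -t "$tab" -k1,1n -k5,5n |
    cut -f 2-
}
```

Usage would be along the lines of sort_gff_like_genome genome.txt Homo_sapiens.GRCh37.87.gff3 > annotation.sorted.gff3 (a hypothetical file name), then passing that file to bedtools coverage -sorted -g genome.txt. The -s flag keeps features that share a start position in their original relative order.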

Log files

Have you provided the following extra information/files:

  • The command used to run the pipeline
  • The .nextflow.log file
  • The exact error:

System

  • Hardware: HPC
  • Executor: PBSpro
  • OS: RHEL
  • Version: 7.6

Nextflow Installation

  • Version: 21.10.6 build 5660

Container engine

  • Engine: Singularity
  • version: 3.5.3
  • Image tag: nfcore/eager:2.5.0

Additional context

I fixed the problem by removing the -sorted flag from the command; see #1036.
A similar change should be made to the DSL2 version of the bedtools module (line 24 of modules/nf-core/bedtools/main.nf), but I haven't tested it.

Thanks, Ido


Metadata

Labels: bug (Something isn't working)