Skip to content

Error generating CE flat file - Empty Divergent Region Bed #8

@mckeowr1

Description

@mckeowr1

The nf pipeline terminated during the make_flat_file process. The Rscript failed and threw this erorr

  Joining, by = c("CHROM", "POS", "REF", "ALT", "ANNOTATION")
  Warning message:
  Expected 7 pieces. Missing pieces filled with `NA` in 427215 rows [3, 4, 5, 6, 67, 72, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 199, 200, 201, 356, ...]. 
  Error in `[.data.frame`(x, i, j, drop) : undefined columns selected
  Calls: %>% ... <Anonymous> -> fuzzy_join -> [ -> [.data.table -> [.data.frame
  In addition: Warning message:
  In data.table::fread(args[4], col.names = c("chrom", "start", "end")) :
    File 'div.bed' has size 0. Returning a NULL data.table.
  Execution halted

It looks like its happening here when we add the divergent region status for CE. This sense that this failure is CE specific.

if(args[5] == "c_elegans") {
    divergent <- data.table::fread(args[4], 
                                   col.names = c("chrom", "start", "end"))
    
    #Condense Overlapping Regions (If portion of regions shared by >1 strian
    condensed <- divergent %>%
      dplyr::mutate(DIVERGENT = "D") # Add marker to Divergent Region
    
    #Join the data - if a position is within a divergent region Divergent tag is added
    
    test <- table_ready %>%
      dplyr::mutate(start = POS, end = POS)
    join <- fuzzyjoin::genome_join(test, condensed, by = c(
      "CHROM" = "chrom", 
      "start" = "start", 
      "end" = "end"), 
      mode = "left") %>%
      dplyr::select(-start.x, -start.y, -end.x, -end.y, -chrom)
    
    # join <- fuzzyjoin::genome_join(table_ready, condensed,  by = c(
    #   "CHROM" = "chrom",
    #   "POS" = "start",
    #   "POS" = "end"),
    #   mode = "left") %>%
    #   select(-chrom, -start, -end)
} else {
    join <- table_ready %>%
        dplyr::mutate(DIVERGENT = NA)
}

It seems that we are just tring to join a NULL data.table (whatever that is) with our annotation and that just doens't work. This is probably a problem with the channel-inputs.

Input tuple for make_flat_file:

tuple file(bcsq_samples_parsed), file(div_file), file(bcsq_scores), file(wbgene_names)

Current channel wrangling

bcsq_parse_samples.out
.combine(bcsq_parse_scores.out)
.combine(Channel.fromPath("${reference_dir}/csq/${params.species}.gff")) | make_flat_file

output for bcsq_parse_samples

output:
    tuple file("BCSQ_samples_parsed.tsv"), file("div.bed")

output for bcsq_parse_scores

output:
    path "BCSQ_scores_parsed.tsv"

Our current input tuple should be:

file("BCSQ_samples_parsed.tsv"), file("div.bed") path "BCSQ_scores_parsed.tsv" Channel.fromPath("${reference_dir}/csq/${params.species}.gff"

So it looks like the input cardinality matches.

The other potential source of the issue is that we are not loading the divergent region file. Lets check if it was the correct file in the bcsq_parse_samples process....

Its also blank. Well thats because I didn't specify one. This is a bug that we should fix. The make_flat_file script automatically tries to join the divergent region bed, even if the file is blank when the species ID is C. elegans. We should just have the script check if it can load the data or not. If not then print an erorr message and append NA for the divergent column.

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions