Error generating CE flat file - Empty Divergent Region Bed

The nf pipeline terminated during the `make_flat_file` process. The Rscript failed and threw this erorr 

```
  Joining, by = c("CHROM", "POS", "REF", "ALT", "ANNOTATION")
  Warning message:
  Expected 7 pieces. Missing pieces filled with `NA` in 427215 rows [3, 4, 5, 6, 67, 72, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 199, 200, 201, 356, ...]. 
  Error in `[.data.frame`(x, i, j, drop) : undefined columns selected
  Calls: %>% ... <Anonymous> -> fuzzy_join -> [ -> [.data.table -> [.data.frame
  In addition: Warning message:
  In data.table::fread(args[4], col.names = c("chrom", "start", "end")) :
    File 'div.bed' has size 0. Returning a NULL data.table.
  Execution halted
```
It looks like its happening here when we add the divergent region status for CE. This sense that this failure is CE specific. 

```
if(args[5] == "c_elegans") {
    divergent <- data.table::fread(args[4], 
                                   col.names = c("chrom", "start", "end"))
    
    #Condense Overlapping Regions (If portion of regions shared by >1 strian
    condensed <- divergent %>%
      dplyr::mutate(DIVERGENT = "D") # Add marker to Divergent Region
    
    #Join the data - if a position is within a divergent region Divergent tag is added
    
    test <- table_ready %>%
      dplyr::mutate(start = POS, end = POS)
    join <- fuzzyjoin::genome_join(test, condensed, by = c(
      "CHROM" = "chrom", 
      "start" = "start", 
      "end" = "end"), 
      mode = "left") %>%
      dplyr::select(-start.x, -start.y, -end.x, -end.y, -chrom)
    
    # join <- fuzzyjoin::genome_join(table_ready, condensed,  by = c(
    #   "CHROM" = "chrom",
    #   "POS" = "start",
    #   "POS" = "end"),
    #   mode = "left") %>%
    #   select(-chrom, -start, -end)
} else {
    join <- table_ready %>%
        dplyr::mutate(DIVERGENT = NA)
}
```
It seems that we are just tring to join a NULL data.table (whatever that is) with our annotation and that just doens't work. This is probably a problem with the channel-inputs.

Input tuple for `make_flat_file`:
```
tuple file(bcsq_samples_parsed), file(div_file), file(bcsq_scores), file(wbgene_names)
```

Current channel wrangling

```
bcsq_parse_samples.out
.combine(bcsq_parse_scores.out)
.combine(Channel.fromPath("${reference_dir}/csq/${params.species}.gff")) | make_flat_file
```

output for `bcsq_parse_samples` 
```
output:
    tuple file("BCSQ_samples_parsed.tsv"), file("div.bed")
```
output for `bcsq_parse_scores`
```
output:
    path "BCSQ_scores_parsed.tsv"
```
Our current input tuple should be: 
```
file("BCSQ_samples_parsed.tsv"), file("div.bed") path "BCSQ_scores_parsed.tsv" Channel.fromPath("${reference_dir}/csq/${params.species}.gff"
```
So it looks like the input cardinality matches. 

The other potential source of the issue is that we are not loading the divergent region file. Lets check if it was the correct file in the `bcsq_parse_samples` process....

Its also blank. Well thats because I didn't specify one. This is a bug that we should fix. The `make_flat_file` script automatically tries to join the divergent region bed, even if the file is blank when the species ID is C. elegans. We should just have the script check if it can load the data or not. If not then print an erorr message and append NA for the divergent column.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Error generating CE flat file - Empty Divergent Region Bed #8

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Error generating CE flat file - Empty Divergent Region Bed #8

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions