[BUG] sqanti3_filter.py is not filtering according to my custom rules #322

Open
2 tasks done
dudududu12138 opened this issue Sep 3, 2024 · 1 comment
Labels
rules filter SQ3 rules filter related issues

Comments

@dudududu12138

Is there an existing issue for this?

  • I have searched the existing issues

Have you loaded the SQANTI3.env conda environment?

  • I have loaded the SQANTI3.env conda environment

Problem description

Hi, thanks for this useful tool. I recently ran into a weird problem. I first ran sqanti3_qc.py and then ran sqanti3_filter.py to filter out transcripts that don't match my rules.
Below are my custom filtering rules:

{
    "full-splice_match": [
        {
            "perc_A_downstream_TTS":[0,59]
        }
    ],
    "rest":[
        {
            "perc_A_downstream_TTS":[0,59],
            "RTS_stage":"FALSE",
            "all_canonical":"canonical",
            "predicted_NMD":"FALSE"
        },
        {
            "perc_A_downstream_TTS":[0,59],
            "RTS_stage":"FALSE",
            "all_canonical":"canonical",
            "predicted_NMD":"NA"
        },
        {
            "perc_A_downstream_TTS":[0,59],
            "RTS_stage":"FALSE",
            "all_canonical":"NA",
            "predicted_NMD":"FALSE"
        },
        {
            "perc_A_downstream_TTS":[0,59],
            "RTS_stage":"FALSE",
            "all_canonical":"NA",
            "predicted_NMD":"NA"
        }           
    ]
}
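For readers comparing the four "rest" rule sets: they differ only in all_canonical and predicted_NMD, so they amount to every combination of {"canonical", "NA"} with {"FALSE", "NA"}. The Python sketch below only rebuilds the same JSON from that observation; it says nothing about how sqanti3_filter.py interprets the "NA" values, and the variable names are illustrative.

```python
import itertools
import json

# Conditions shared by every "rest" rule set in the file above.
common = {"perc_A_downstream_TTS": [0, 59], "RTS_stage": "FALSE"}

# One rule set per combination of the two columns that are allowed to be "NA".
rest = [
    {**common, "all_canonical": canon, "predicted_NMD": nmd}
    for canon, nmd in itertools.product(["canonical", "NA"], ["FALSE", "NA"])
]

rules = {
    "full-splice_match": [{"perc_A_downstream_TTS": [0, 59]}],
    "rest": rest,
}

print(json.dumps(rules, indent=4))
```

Writing the rules this way makes it easier to see that the intent is "canonical or NA" and "FALSE or NA", rather than four unrelated rules.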

Code sample

This is my code:

sqanti3_filter.py rules ${input}_classification.txt \
 -j filtering.json \
 --isoforms ${input}_corrected.fasta    \
 --gtf ${input}_corrected.gtf \
 --faa ${input}_corrected.faa \
 --skip_report \
 -d $output -o syfilter 

Error

Some transcripts fulfil my criteria and should be classified as isoforms, but they are filtered out by SQANTI3. Here is an example:
This isoform was filtered out because of all_canonical.
[screenshot 1725346759126]
But I checked classification.txt and junctions.txt, and all junctions in this isoform are canonical.
Below is the information for this isoform in the two files:

  • classification.txt: [screenshot 1725346961176]

  • junctions.txt: [screenshot 1725347133908]

So why were such isoforms filtered out?
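The manual check described above (looking up every junction of one isoform and confirming it is canonical) could be scripted roughly as follows. This is a minimal sketch, not part of SQANTI3: it assumes the junctions file is tab-separated with columns named "isoform" and "canonical", and the isoform IDs and rows in the toy table are made up for illustration.

```python
import csv
import io


def junction_canonical_status(junctions_file, isoform_id,
                              id_col="isoform", canon_col="canonical"):
    """Return the canonical-status value of every junction of one isoform.

    The column names are assumptions about the junctions file layout.
    """
    reader = csv.DictReader(junctions_file, delimiter="\t")
    return [row[canon_col] for row in reader if row[id_col] == isoform_id]


# Toy stand-in for a junctions file (hypothetical IDs and values).
toy_junctions = (
    "isoform\tjunction_number\tcanonical\n"
    "PB.1.1\tjunction_1\tcanonical\n"
    "PB.1.1\tjunction_2\tcanonical\n"
    "PB.2.1\tjunction_1\tnon_canonical\n"
)

statuses = junction_canonical_status(io.StringIO(toy_junctions), "PB.1.1")
# An isoform with no junctions (mono-exon) yields an empty list, so it is
# reported separately instead of being counted as "all canonical".
has_junctions = bool(statuses)
all_canonical = has_junctions and all(s == "canonical" for s in statuses)
print(statuses, all_canonical)
```

With a real file, open it with `open(path, newline="")` and pass the handle in place of the `io.StringIO` object.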

Anything else?

No response

@dudududu12138 dudududu12138 added the triage For developers to check label Sep 3, 2024
@dudududu12138 dudududu12138 changed the title [BUG] sqanti3_filter.py doesn't match my filtering rules [BUG] sqanti3_filter.py is not filtering according to my custom rules Sep 3, 2024

dudududu12138 commented Sep 4, 2024

Hi, I have checked your source code in utilities/filter/rules_filter_functions.R and I found the reason. The code first checks whether the value in the isoform_info column is NA (if (! is.na(isoform_info[rules[i, "column"]]))). If it is NA, the transcript is classified as an Artifact. But my custom rules include some NA rules: for example, the all_canonical column may be NA so that mono-exon isoforms can pass, and the same goes for the predicted_NMD column. I think the code favors coding transcripts, while ncRNAs will be incorrectly filtered out. Below is the relevant section of your code:

if (! is.na(isoform_info[rules[i, "column"]])){
  if (rules[i, "type"] == "Min_Threshold"){
    if (as.numeric(isoform_info[rules[i, "column"]]) < as.numeric(rules[i, "rule"])){
      is_isoform=FALSE
      break
    }
  }else if (rules[i, "type"] == "Max_Threshold"){
    if (as.numeric(isoform_info[rules[i, "column"]]) > as.numeric(rules[i, "rule"])){
      is_isoform=FALSE
      break
    }
  }else if (rules[i, "type"] == "Category"){
    cat_rules <- rules[rules$column == rules[i, "column"], ]
    if ( ! tolower(isoform_info[rules[i, "column"]]) %in% cat_rules[,"rule"]){
      is_isoform=FALSE
      break
    }
  }
}else{
  is_isoform=FALSE
  break
}
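To make the effect of that NA guard concrete, here is a small Python sketch that mirrors the control flow of the R snippet above and compares it with one possible NA-aware variant. It is not SQANTI3's code: the function names, the (column, type, rule) tuple layout, the lowercase rule values, and the example isoform are all assumptions made for illustration, and `None` stands in for R's NA.

```python
NA = None  # stand-in for R's NA


def allowed_categories(rules, column):
    """Collect every Category rule value defined for one column."""
    return {rule for col, kind, rule in rules if col == column and kind == "Category"}


def passes_reported(isoform, rules):
    """Mirror the quoted loop: an NA value fails before any rule is compared."""
    for column, kind, rule in rules:
        value = isoform.get(column)
        if value is NA:
            return False  # corresponds to the final else branch in the R code
        if kind == "Min_Threshold" and float(value) < float(rule):
            return False
        if kind == "Max_Threshold" and float(value) > float(rule):
            return False
        if kind == "Category" and str(value).lower() not in allowed_categories(rules, column):
            return False
    return True


def passes_na_aware(isoform, rules):
    """Variant: for Category rules, treat NA as the literal category "na"."""
    for column, kind, rule in rules:
        value = isoform.get(column)
        if kind == "Category":
            observed = "na" if value is NA else str(value).lower()
            if observed not in allowed_categories(rules, column):
                return False
        elif value is NA:
            return False  # thresholds still cannot be checked against NA
        elif kind == "Min_Threshold" and float(value) < float(rule):
            return False
        elif kind == "Max_Threshold" and float(value) > float(rule):
            return False
    return True


# Fourth "rest" rule set from the issue, flattened into rows (assumed layout).
rule_set = [
    ("perc_A_downstream_TTS", "Min_Threshold", 0),
    ("perc_A_downstream_TTS", "Max_Threshold", 59),
    ("RTS_stage", "Category", "false"),
    ("all_canonical", "Category", "na"),
    ("predicted_NMD", "Category", "na"),
]

# Hypothetical mono-exon, non-coding isoform: no junctions, no NMD prediction.
mono_exon = {"perc_A_downstream_TTS": 20, "RTS_stage": "FALSE",
             "all_canonical": NA, "predicted_NMD": NA}

print(passes_reported(mono_exon, rule_set))  # False
print(passes_na_aware(mono_exon, rule_set))  # True
```

Under these assumptions, an explicit "NA" category rule can never match in the first function, because the NA guard returns before the Category comparison is reached. That matches the behaviour described in this comment.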

@carolinamonzo carolinamonzo added rules filter SQ3 rules filter related issues and removed triage For developers to check labels Oct 11, 2024