MultiQC does not recognize raw data (txt) from FastQC when in a collection #114

Open
jennaj opened this issue May 7, 2018 · 27 comments · Fixed by galaxyproject/galaxy#6300
Assignees
Labels
bug fixathon 06/18 Fixathon June 2018 functionality usegalaxy.org tool/dependency/function fix usegalaxy.org test/retest-fail failed retest

Comments

@jennaj
Member

jennaj commented May 7, 2018

Problem: The select menu does not find the FastQC collection input. However, it does find other collections and individual datasets with the txt datatype. Both tool versions are impacted, and this was not a problem before the 18.05 pre-release afaik.

Test history: https://usegalaxy.org/u/jen/h/test-history-trimmomatic-trim-galore

dataset 46 should be in the select list:

screen shot 2018-05-07 at 11 32 34 am

tool form with other txt collection data in the history: Drilling down into dataset 46 shows that the correct datatype is assigned.

screen shot 2018-05-07 at 11 37 08 am

screen shot 2018-05-07 at 11 38 05 am

@jennaj jennaj added the bug label May 7, 2018
@jennaj jennaj mentioned this issue May 11, 2018
57 tasks
@jennaj jennaj added the install usegalaxy.org tool install usegalaxy.org requested label May 16, 2018
@jennaj
Member Author

jennaj commented May 16, 2018

Retesting

@jennaj jennaj added the test/retest-do active tests label May 16, 2018
@martenson martenson removed the install usegalaxy.org tool install usegalaxy.org requested label May 21, 2018
@jennaj jennaj added the functionality usegalaxy.org tool/dependency/function fix usegalaxy.org label May 21, 2018
@nekrut nekrut added the fixathon 06/18 Fixathon June 2018 label Jun 4, 2018
@martenson martenson self-assigned this Jun 4, 2018
@jennaj
Member Author

jennaj commented Jun 4, 2018

Still a problem on retest. Please change the tags back when fixed and I'll test again.

@jennaj jennaj added test/retest-fail failed retest and removed test/retest-do active tests labels Jun 4, 2018
@martenson
Member

You made it a list of pairs (hid 46 on first picture) instead of a list. I think multiqc has no understanding of a list of pairs.

@martenson martenson added test/retest-fail failed retest and removed test/retest-fail failed retest labels Jun 4, 2018
@blankenberg
Member

Which seems like a collection-in-tool framework problem, because you'll want to run a set of paired end datasets through fastQC and then through multiQC.

from multiqc:

                        <param name="input" type="data" format="txt" multiple="true" label="FastQC output">
                            <validator type="expression" message="MultiQC does not accept the HTML report generated by FastQC, only the Raw Data">value is not None and value.extension != "html"</validator>
                        </param>

It should accept multiple datasets, which I'd claim a list:paired contains. Of course, at the framework level, it's a bit ambiguous how to handle this, e.g. batch for each sub-list/pair separately, or one tool run for all data -- there should probably be multiple runtime-configurable options.

fastQC works because it doesn't have multiple="true" and it allows access to batch mode.

The current workaround is to flatten the collection manually into a list with collection operations, but this shouldn't really be necessary.
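To illustrate what flattening means here, the following is a minimal sketch in plain Python, not Galaxy code. The nested dictionary is only a hypothetical stand-in for a list:paired collection, and the function name, dataset names, and the "_" used to join identifiers are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a list:paired collection:
# sample name -> {"forward": dataset, "reverse": dataset}
PairedList = Dict[str, Dict[str, str]]


def flatten_list_of_pairs(collection: PairedList) -> List[Tuple[str, str]]:
    """Return a flat list of (element identifier, dataset) tuples."""
    flat = []
    for sample, pair in collection.items():
        for direction in ("forward", "reverse"):
            # Joining the identifiers with "_" is an assumption of this sketch.
            flat.append((f"{sample}_{direction}", pair[direction]))
    return flat


# Made-up FastQC raw data outputs for two paired-end samples.
fastqc_raw = {
    "sampleA": {"forward": "A_R1.txt", "reverse": "A_R2.txt"},
    "sampleB": {"forward": "B_R1.txt", "reverse": "B_R2.txt"},
}
flat = flatten_list_of_pairs(fastqc_raw)
```

After flattening, every dataset sits at the same level, which is the shape a multiple="true" data input can consume in a single tool run.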

@jennaj
Member Author

jennaj commented Jun 4, 2018

Also reported here: galaxyproject/galaxy#5875

@natefoo
Member

natefoo commented Jun 5, 2018

Looks like it's fixed in galaxyproject/galaxy#6255, it just needs to be backported to 18.05. Test was updated yesterday so we should be able to test there.

@mvdbeek
Member

mvdbeek commented Jun 5, 2018

So with that fix it's going to map over list:pairs if you have a list of pairs, meaning you get one multiqc report per pair.
Most likely you still need to unpack your list of pairs in most circumstances. This is because fastqc, the software (not the wrapper), is not paired-end aware. A paired-end aware QC tool would take in a pair and produce a single dataset.
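The difference between mapping over the pairs and running once on an unpacked list can be sketched as follows. This is an illustration in plain Python under stated assumptions, not Galaxy's scheduling code: the dictionary is a hypothetical stand-in for a list of pairs, and both function names are made up for the example.

```python
from typing import Dict, List

# Hypothetical stand-in for a list of pairs:
# sample name -> {"forward": dataset, "reverse": dataset}
PairedList = Dict[str, Dict[str, str]]


def jobs_when_mapped_over(collection: PairedList) -> List[List[str]]:
    """One job per pair: each job sees only that pair's two datasets."""
    return [[pair["forward"], pair["reverse"]] for pair in collection.values()]


def jobs_when_unpacked(collection: PairedList) -> List[List[str]]:
    """One job in total: every dataset is passed to a single run."""
    all_inputs = [
        dataset
        for pair in collection.values()
        for dataset in (pair["forward"], pair["reverse"])
    ]
    return [all_inputs]


fastqc_raw = {
    "sampleA": {"forward": "A_R1.txt", "reverse": "A_R2.txt"},
    "sampleB": {"forward": "B_R1.txt", "reverse": "B_R2.txt"},
}
mapped = jobs_when_mapped_over(fastqc_raw)  # two jobs, two inputs each
single = jobs_when_unpacked(fastqc_raw)     # one job, four inputs
```

With mapping, each job would yield its own report covering one pair; a single combined report across all samples requires the unpacked form.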

@natefoo
Member

natefoo commented Jun 5, 2018

Assuming I did this right, it still doesn't see this collection: https://test.galaxyproject.org/u/nate/h/multiqc-input-test

@natefoo
Member

natefoo commented Jun 5, 2018

@mvdbeek Hrm, should the collection be selectable as an input now though?

@mvdbeek
Member

mvdbeek commented Jun 5, 2018

Yes, did you rebuild the client ?

@martenson
Member

@mvdbeek the client builds on-deploy for Test, it should be up to date

@mvdbeek
Member

mvdbeek commented Jun 5, 2018

The input datasets are deleted for me, so that may be an issue:
screen shot 2018-06-05 at 16 29 26

@martenson
Member

they are not deleted for me, both before and after import... 😭

screenshot 2018-06-05 10 31 25
screenshot 2018-06-05 10 31 06

@mvdbeek
Member

mvdbeek commented Jun 5, 2018

It just switched for me as well.

@mvdbeek
Member

mvdbeek commented Jun 5, 2018

there were some more related fixes, though they should all be in dev

@martenson
Member

Test is on galaxyproject/galaxy@6911153

@mvdbeek
Member

mvdbeek commented Jun 5, 2018

Yeah, something isn't working locally either

@mvdbeek
Member

mvdbeek commented Jun 5, 2018

Also there is a boolean parameter at the bottom that is being swallowed:
screen shot 2018-06-05 at 16 49 31

mvdbeek added a commit to mvdbeek/galaxy that referenced this issue Jun 5, 2018
This was hardcoded to work for lists, instead we pick
the type to reduce based on the most deeply nested collection.
This addresses galaxyproject/usegalaxy-playbook#114
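One way to read "the most deeply nested collection" is as the last component of a colon-separated collection type such as list:paired. The sketch below illustrates that reading only; it is not the code from the commit, and the function name is hypothetical.

```python
def innermost_collection_type(collection_type: str) -> str:
    """Return the last component of a colon-separated collection type."""
    # "list:paired" -> "paired"; a plain "list" is returned unchanged.
    return collection_type.rsplit(":", 1)[-1]


reduce_type_for_list = innermost_collection_type("list")
reduce_type_for_nested = innermost_collection_type("list:paired")
```

Under this reading, a check hardcoded to "list" would miss any nested collection whose innermost level is something else.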
@jennaj
Member Author

jennaj commented Jun 5, 2018

Just re-tested at test.galaxyproject.org, still a problem, might be expected.

Test history: https://test.galaxyproject.org:/u/jenjackson/h/test-history-multiqc-at-testgalaxyproject

Tested all three collection types, none have the .txt FastQC recognized by the tool yet.

screen shot 2018-06-05 at 11 17 35 am

So it is not just "paired list" or "pair" collections that go undetected by MultiQC, but also collection type "list".

screen shot 2018-06-05 at 11 22 12 am

@jennaj
Member Author

jennaj commented Jun 5, 2018

Added tags to the tests so others can better understand what those are.

I tried to drag and drop out of a collection. One worked OK (forward); the other shows up weird in the select list. Is that a known issue or just something buggy on Test? I'll see if I can reproduce on org.

screen shot 2018-06-05 at 11 32 24 am

@jennaj
Member Author

jennaj commented Jun 5, 2018

At Main/org, the same issues occur with dropped datasets, and it is actually worse: all show up in the select list as "Dropped: XXXXXX" rather than the dataset name with (hidden). Does this need a separate ticket, or is it a known issue? @jmchilton

test history: https://usegalaxy.org/u/jen/h/test-history-multiqc-hidden-fastqc-rawdata, dataset 48 is what I tested with dragging the collection files over into MultiQC

screen shot 2018-06-05 at 11 41 40 am

@jennaj
Member Author

jennaj commented Jun 5, 2018

Related; this should probably be reopened rather than left closed: galaxyproject/tools-iuc#1658

@martenson
Member

simple list of raw data from fastqc works fine for me with multiqc, https://usegalaxy.org/u/martenson/h/unnamed-history-2

@jennaj
Member Author

jennaj commented Jun 5, 2018

Interesting, dataset 32 is basically the same content and doesn't work. Yours was a new list collection of merged txt output; mine was produced from a list of fastq datasets that was run through FastQC as collection list input.

Not sure what the difference is under the hood, but is a really good test/comparison. Maybe will help figure out what is going wrong.

@mvdbeek
Member

mvdbeek commented Jun 7, 2018

Can we create one issue per issue? I see different servers with different Galaxy versions and different inputs, dragged or not dragged. This is very confusing.
The fact that regular lists can't be selected with multidata inputs (as is the case for multiqc) on dev should be fixed by galaxyproject/galaxy#6300.

Selecting pairs or a list of pairs in a multidata input can be made possible, but it is probably not the right thing to do, as it wouldn't do what you would like it to do. The correct thing is to use a QC tool that is paired-end aware and outputs a single report per fastq pair, or to unzip the collection and use a regular list.

The dragging issues are separate, and if there's no issue on the main galaxy repo we should create an issue there.

@jennaj
Member Author

jennaj commented Jun 19, 2018

This isn't on main yet (cannot see the collection txt files). Could we leave this open in usegalaxy-playbook until the changes are implemented, please? Just in case something else comes up during integration testing.

@jennaj jennaj reopened this Jun 19, 2018
@jennaj
Member Author

jennaj commented Jun 19, 2018

The dragging and dropping problem will need a ticket. I'll make one if someone else hasn't. I wasn't sure before whether it was a Main, Test, or Galaxy problem.

6 participants