Skip to content

Conversation

@charles-cowart
Copy link
Collaborator

Initial query and code for shotgun prep info file generation. My assumption is that we will be tweaking the columns, format of the output, etc. and getting approval from Gail, etc. before merging.

@charles-cowart
Copy link
Collaborator Author

Partial implementation of final output using DataFrames instead of CSV for munging. Will pick it up in the morning.

Charles Cowart added 9 commits May 9, 2019 13:17
Work in Progress
Partial implementation of munging using DataFrame.
All columns are present and filled out. There are some partial null
columns where rows represent control data; this will be resolved
in a later push.
@wasade
Copy link
Member

wasade commented May 9, 2019

@charles-cowart, note I think this needs to be rebased

@charles-cowart
Copy link
Collaborator Author

Sample output from the generate_metagenomics_prep_info_file() method. This is currently being used as the expected result for testing.

shotgun_prep_info_sample_output.txt

@wasade wasade mentioned this pull request May 10, 2019

for d in results:
# assert that study_id and orig_name are only None when both
# are None; making this a control entry and not a sample entry
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the problem is that this isn't necessarily true--at least not currently. We definitely DO have real, experimental samples coming through and getting plated that happen NOT to be in qiita (which is all that these two pieces of being None indicates). What is this field being used for? Should we name it more correct, like "is_not_in_qiita" or something? I wouldn't want to use it for anything that required that its contents really, truly be controls ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the reason this line works, at least for the test data, is that the raw results from the SQL query can be organized into two groups. The first group is 288 in number, looks like sample data, and has no missing values.

The second group number 92 and uniformly have NULL values for the following five columns:
['orig_name', 'study_id', 'sample_id', 'project_name', 'experiment_design_description']

Hence, rows where both study_id and 'orig_name' are NULL matches all of the second set and none of the first.

It appears I arrived at this by negating the condition we use to test for a 'sample row' in Amplicon prep info file generation:
if result['orig_name2'] is not None and study_id is not None:
Note that orig_name2 is the same as orig_name at this point in the code, and both orig_name and study_id have not been modified since coming off the DB. However, I got it wrong; the negation should be if orig_name is None OR study_id is None.

With respect to the test data, it's amusing that both statements will work equally well in this particular case. IMHO, we could have an is_control column if determining which row represents a control is potentially fuzzy. I could also determine what is a control (or at least not a sample) by parsing the content column, and targeting rows with content starting with 'vibrio' or 'blank' or another word in the table.

I'll leave this one alone until @AmandaBirmingham comments, and then I can make the change easily enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @charles-cowart

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please
(a) take the creation of the is_control field out. It is not clearly named and thus has the potential to lead to gnarly bugs down the road when someone inaccurately assumes it is what it says it is (especially since our existing test conditions don't exercise the ways in which it secretly isn't what it says it is). Since it is currently used in only one place, we have the option to just drop it until we come up with a better naming/handling of this weird "maybe controls but maybe just unloved experimental samples" situation.
(b) replace the current use of is_control to decide when to modify orig_name2 with the check we use for that in generate_amplicon_prep_info:

if result['orig_name2'] is not None and study_id is not None:

As I recall, it took you and me about a week of back-and-forth to get this condition right, and so I don't want to mess with it again (at least not till we do a fuller refactor someday).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too true! 😆

Fair enough, although it seems like we could really benefit from easily identifying states. It is something I hope we can get back to at some point.

Copy link
Collaborator

@AmandaBirmingham AmandaBirmingham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is looking really good! I think the last noteworthy thing to put in is the small front-end change. Right now, users only see the option to download prep sheets for amplicon sequencing runs:

Screen Shot 2019-05-09 at 8 30 33 PM

This limitation is implemented here (look, we even remembered to reference this issue :D )

https://github.com/jdereus/labman/blob/7bb9106555143bb4877c49516e92c792ea9349a5/labcontrol/gui/templates/sequence_run_list.html#L27-L35

I suggest that we remove the if, replacing the entire block referenced above with

        var preparationSheets = "<a href='/process/sequencing/" +
            row[sequencing_process_id] +
            "/preparation_sheets' class='btn btn-success'>" +
            "<span class='glyphicon glyphicon-download'></span> " +
            "Download Preparation Sheets</a>";

@charles-cowart , can you make this change in your PR?

Note: this change this means that IF in the future we add a new kind of assay (like the BS new type of assay shown in the screenshot, which is of the test database :) AND we fail to implement support for it in SequencingProcess.generate_prep_information, THEN users will see a "Download Preparation Sheets" button for sequencing runs of that new assay type, but if they click the button they will get an error message (because SequencingProcess.generate_prep_information throws a ValueError for any unsupported assay type). However, there would be no possibility of generating garbage output (just an ugly error) and we really should be supporting prep sheets going forward for every new assay type, so I'm ok with this minimal risk.

@charles-cowart
Copy link
Collaborator Author

@AmandaBirmingham Thanks! Will do!

Added additional requested changes, as well as a few statements to write
both raw and final output to disk for debugging purposes. Will remove
from final version. Note that this comment in this issue needs to be
resolved to everyone's satisfaction:

biocore#502 (comment)
@wasade
Copy link
Member

wasade commented May 10, 2019

I'm 👍 pending the reply / edit to this comment

@charles-cowart
Copy link
Collaborator Author

@AmandaBirmingham @wasade Thanks for all your comments! I put the most recent request changes in, and it's currently going through Travis...

@AmandaBirmingham
Copy link
Collaborator

AmandaBirmingham commented May 10, 2019

This looks good to me! @charles-cowart can I just confirm that you've run LabControl locally and verified that you can now see and successfully use the "Download Preparation Sheets" button for the an existing shotgun sequencing run?

@charles-cowart
Copy link
Collaborator Author

Thanks @AmandaBirmingham ! Yup, downloads as expected. :)

@wasade wasade merged commit 8dda144 into biocore:master May 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants