-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make note of places where certain column/etc. names can't be used #55
Comments
|
This should get things ready for #31 nicely. The code now has a good comment explaining it, but the tldr is that now we only associate metadata with the sample dataset. Feature counts and their column names are stored in another dataset within the sample plot JSON (and these dataset names have a prefix that should prevent them somehow being overwritten -- #55). Also! The sample plot now is just referred to as sample_plot or sample_chart, instead of sample_logratio_plot or sample_logratio_chart. the initial reason I did it the long way at first was to prevent confusion (e.g. to make clear that "no this isn't just a sample Vega plot I found somewhere, this is a plot of samples"), but I think that it's definitely clear by this point. ALSO! The sample ID is now referred to as just "Sample ID" (since we're only applying column mapping to the feature columns). This means that #23 should be a lot closer: now the tooltip will say something like "Sample ID: 333" or something instead of just "0: 333". I'd still like to make the headers in the rank plot clearer, but maybe that's for another day. In order to restore the prior available functionality (since the web interface is going to be broken right now due to not knowing where the counts are), I need to alter the web interface to: -Properly relate features from the rank plot to the sample plot's JSON's data -use above to fix feature searching -use above to fix select microbes (features) stuff And in order to complete #31, I need to alter the web visualization interface so that it: -Can detect all of the metadata columns (just look at the entries of a sample point) -Has controls for changing the x-axis metadata column -Has controls for changing the color metadata column
Similarly, if the user happens to have ranks in their input called "x" or "Classification" or "Feature ID", this will break. |
This might require some tweaking regarding implications for #55. Also the code is a bit ugly. Anyway, I just gotta adjust the viz interface to handle this stuff properly: -detect all given ranks -use signal to set up a bound input with options being all the given rank values, and set this signal to the y-axis field (and title? Maybe?) -if not already done: on signal change (we can use view.addSignalListener() or whatever), re-sort the feature ranks in ascending order. This could be as simple as just adjusting each feature's x value via change(). Once that's properly hammered out, #46 will be done -- there's a lot less complexity associated with that than there is with #31.
This helped me realize we were leaving NaNs in feature ranking/metadata DFs (see prev commit for context), and also now we're officially testing the corner case of non-unique ranking and fm ID cols (#55).
note for later -- there's also some stuff in the |
This allows users to have "Classification" fields in their feature rankings / metadata. I imagine that's a lot more likely than "qurro_classification" fields being there :) Next up: checking before inserting this column
Related to #55. I guess the best thing to do would be to do this sorta checking up front, honestly; probably in process_input(). That'd have the extra advantage of being more easy to test.
Previously, when the Classification field was literally just called "Classification", we didn't need to do anything with the title. However, now it makes sense to use a more human-readable title than "qurro_classification" (#55).
We can unit-test this function, which will be sufficient to be done with #55!
We're not done here yet. Things to consider doing before closing this issue: - update readme re these restrictions (not super important since an explanatory error will be given up front if the user supplies inputs with these invalid names -- also, i imagine most/all of these inputs are uncommon) - check feature metadata to make sure it doesn't contain weird stuff for our GNPS-reading code? Honestly, not sure this is possible to do at this point in the code; we might need to modify read_gnps_feature_metadata_file() to do these checks (and that code is old and probably-outdated anyway).
in particular, about the now-required cols (the restricted cols were already implied in the #55 description)
Summarizing this stuff
The user can't have sample metadata fields with the following names:
Sample ID
(except for the leftmost column, which we use as the index, of course)qurro_balance
Also, there can't be feature rankings (e.g. "Intercept", "Rank 1") with the following names:
Feature ID
(except for leftmost column)qurro_classification
qurro_x
LibraryID
(if-gnps
passed)And there can't be feature metadata fields with the following names:
Feature ID
(except for leftmost column)qurro_classification
qurro_x
qurro_trunc_feature_id
(if-gnps
passed)qurro_full_feature_id
(if-gnps
passed)Looks like, when we call
reset_index()
on the sample metadata and rank data, an error will be thrown if a column with that name already exists. SoFeature ID
andSample ID
should be ok -- these cases won't fail silently (although it's a good idea to verify this in a test).Also, my code checks for collisions w/r/t
qurro_balance
in the sample metadata, and between feature metadata/ranking field names (this is tested intest_df_utils.test_merge_feature_metadata
!). So the main thing left to worry about isClassification
not being overwritten, and then we should add in some tests that check ideally all of these cases to verify an error is thrown.Classification
withqurro_
, like I did forbalance
.Classification
will be overwrittenThe text was updated successfully, but these errors were encountered: