Make note of places where certain column/etc. names can't be used #55

fedarko · 2019-02-23T00:36:19Z

Summarizing this stuff

The user can't have sample metadata fields with the following names:

Sample ID (except for the leftmost column, which we use as the index, of course)
qurro_balance

Also, there can't be feature rankings (e.g. "Intercept", "Rank 1") with the following names:

Feature ID (except for leftmost column)
qurro_classification
qurro_x
LibraryID (if -gnps passed)
anything in the feature metadata (now that More sophisticated feature metadata annotation #132 is done)

And there can't be feature metadata fields with the following names:

Feature ID (except for leftmost column)
qurro_classification
qurro_x
anything in the feature rankings
qurro_trunc_feature_id (if -gnps passed)
qurro_full_feature_id (if -gnps passed)

Looks like, when we call reset_index() on the sample metadata and rank data, an error will be thrown if a column with that name already exists. So Feature ID and Sample ID should be ok -- these cases won't fail silently (although it's a good idea to verify this in a test).

Also, my code checks for collisions w/r/t qurro_balance in the sample metadata, and between feature metadata/ranking field names (this is tested in test_df_utils.test_merge_feature_metadata!). So the main thing left to worry about is Classification not being overwritten, and then we should add in some tests that check ideally all of these cases to verify an error is thrown.

Potentially prefix Classification with qurro_, like I did for balance.
Check if Classification will be overwritten
Add unit tests to verify that this sorta stuff will result in errors thrown
Make note of these naming restrictions in the README

The text was updated successfully, but these errors were encountered:

fedarko · 2019-02-23T02:14:26Z

~~This also applies when reset_index() is invoked, in the case that index is an extant column in the DataFrame for which this is called?~~ Update: yeah, but we rename the index first so this isn't a problem.

This should get things ready for #31 nicely. The code now has a good comment explaining it, but the tldr is that now we only associate metadata with the sample dataset. Feature counts and their column names are stored in another dataset within the sample plot JSON (and these dataset names have a prefix that should prevent them somehow being overwritten -- #55). Also! The sample plot now is just referred to as sample_plot or sample_chart, instead of sample_logratio_plot or sample_logratio_chart. the initial reason I did it the long way at first was to prevent confusion (e.g. to make clear that "no this isn't just a sample Vega plot I found somewhere, this is a plot of samples"), but I think that it's definitely clear by this point. ALSO! The sample ID is now referred to as just "Sample ID" (since we're only applying column mapping to the feature columns). This means that #23 should be a lot closer: now the tooltip will say something like "Sample ID: 333" or something instead of just "0: 333". I'd still like to make the headers in the rank plot clearer, but maybe that's for another day. In order to restore the prior available functionality (since the web interface is going to be broken right now due to not knowing where the counts are), I need to alter the web interface to: -Properly relate features from the rank plot to the sample plot's JSON's data -use above to fix feature searching -use above to fix select microbes (features) stuff And in order to complete #31, I need to alter the web visualization interface so that it: -Can detect all of the metadata columns (just look at the entries of a sample point) -Has controls for changing the x-axis metadata column -Has controls for changing the color metadata column

fedarko · 2019-02-25T21:36:16Z

Similarly, if the user happens to have ranks in their input called "x" or "Classification" or "Feature ID", this will break.

This might require some tweaking regarding implications for #55. Also the code is a bit ugly. Anyway, I just gotta adjust the viz interface to handle this stuff properly: -detect all given ranks -use signal to set up a bound input with options being all the given rank values, and set this signal to the y-axis field (and title? Maybe?) -if not already done: on signal change (we can use view.addSignalListener() or whatever), re-sort the feature ranks in ascending order. This could be as simple as just adjusting each feature's x value via change(). Once that's properly hammered out, #46 will be done -- there's a lot less complexity associated with that than there is with #31.

This helped me realize we were leaving NaNs in feature ranking/metadata DFs (see prev commit for context), and also now we're officially testing the corner case of non-unique ranking and fm ID cols (#55).

fedarko · 2019-07-07T04:21:09Z

note for later -- there's also some stuff in the read_gnps_feature_metadata_file function that requires certain column name(s)? to not be present in the metadata file. that may or may not be temporary.

This allows users to have "Classification" fields in their feature rankings / metadata. I imagine that's a lot more likely than "qurro_classification" fields being there :) Next up: checking before inserting this column

Related to #55. I guess the best thing to do would be to do this sorta checking up front, honestly; probably in process_input(). That'd have the extra advantage of being more easy to test.

Previously, when the Classification field was literally just called "Classification", we didn't need to do anything with the title. However, now it makes sense to use a more human-readable title than "qurro_classification" (#55).

We can unit-test this function, which will be sufficient to be done with #55!

We're not done here yet. Things to consider doing before closing this issue: - update readme re these restrictions (not super important since an explanatory error will be given up front if the user supplies inputs with these invalid names -- also, i imagine most/all of these inputs are uncommon) - check feature metadata to make sure it doesn't contain weird stuff for our GNPS-reading code? Honestly, not sure this is possible to do at this point in the code; we might need to modify read_gnps_feature_metadata_file() to do these checks (and that code is old and probably-outdated anyway).

in particular, about the now-required cols (the restricted cols were already implied in the #55 description)

fedarko added enhancement New feature or request important Things that are critical for getting Qurro in a working/useful state labels Feb 23, 2019

fedarko self-assigned this Feb 23, 2019

fedarko added a commit that referenced this issue Feb 23, 2019

Use prefix for balance col & col_names dataset #55

adb4a1d

fedarko added a commit that referenced this issue May 8, 2019

MAINT: prefix rank "x" field w/ rankratioviz_ #55

dad104b

fedarko added this to the v0.2.0 milestone Jun 28, 2019

fedarko added a commit that referenced this issue Jul 8, 2019

MAINT: Add check_column_names() for #55 checking

dfde93c

We can unit-test this function, which will be sufficient to be done with #55!

fedarko added a commit that referenced this issue Jul 8, 2019

MAINT: Slightly alter messages for #55-type errors

ef21573

fedarko closed this as completed in 9dc36e7 Jul 8, 2019

fedarko added a commit that referenced this issue Jul 8, 2019

REL: update changelog re: GNPS stuff

b0031af

in particular, about the now-required cols (the restricted cols were already implied in the #55 description)

fedarko mentioned this issue Jul 10, 2019

qurro_x isn't checked for in the feature ranks/metadata restricted fields #183

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make note of places where certain column/etc. names can't be used #55

Make note of places where certain column/etc. names can't be used #55

fedarko commented Feb 23, 2019 •

edited

Loading

fedarko commented Feb 23, 2019 •

edited

Loading

fedarko commented Feb 25, 2019

fedarko commented Jul 7, 2019

Make note of places where certain column/etc. names can't be used #55

Make note of places where certain column/etc. names can't be used #55

Comments

fedarko commented Feb 23, 2019 • edited Loading

Summarizing this stuff

fedarko commented Feb 23, 2019 • edited Loading

fedarko commented Feb 25, 2019

fedarko commented Jul 7, 2019

fedarko commented Feb 23, 2019 •

edited

Loading

fedarko commented Feb 23, 2019 •

edited

Loading