Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make note of places where certain column/etc. names can't be used #55

Closed
3 of 4 tasks
fedarko opened this issue Feb 23, 2019 · 3 comments
Closed
3 of 4 tasks

Make note of places where certain column/etc. names can't be used #55

fedarko opened this issue Feb 23, 2019 · 3 comments
Assignees
Labels
enhancement New feature or request important Things that are critical for getting Qurro in a working/useful state
Milestone

Comments

@fedarko
Copy link
Collaborator

fedarko commented Feb 23, 2019

Summarizing this stuff

The user can't have sample metadata fields with the following names:

  • Sample ID (except for the leftmost column, which we use as the index, of course)
  • qurro_balance

Also, there can't be feature rankings (e.g. "Intercept", "Rank 1") with the following names:

And there can't be feature metadata fields with the following names:

  • Feature ID (except for leftmost column)
  • qurro_classification
  • qurro_x
  • anything in the feature rankings
  • qurro_trunc_feature_id (if -gnps passed)
  • qurro_full_feature_id (if -gnps passed)

Looks like, when we call reset_index() on the sample metadata and rank data, an error will be thrown if a column with that name already exists. So Feature ID and Sample ID should be ok -- these cases won't fail silently (although it's a good idea to verify this in a test).

Also, my code checks for collisions w/r/t qurro_balance in the sample metadata, and between feature metadata/ranking field names (this is tested in test_df_utils.test_merge_feature_metadata!). So the main thing left to worry about is Classification not being overwritten, and then we should add in some tests that check ideally all of these cases to verify an error is thrown.

  • Potentially prefix Classification with qurro_, like I did for balance.
  • Check if Classification will be overwritten
  • Add unit tests to verify that this sorta stuff will result in errors thrown
  • Make note of these naming restrictions in the README
@fedarko fedarko added enhancement New feature or request important Things that are critical for getting Qurro in a working/useful state labels Feb 23, 2019
@fedarko fedarko self-assigned this Feb 23, 2019
@fedarko
Copy link
Collaborator Author

fedarko commented Feb 23, 2019

This also applies when reset_index() is invoked, in the case that index is an extant column in the DataFrame for which this is called? Update: yeah, but we rename the index first so this isn't a problem.

fedarko added a commit that referenced this issue Feb 23, 2019
This should get things ready for #31 nicely.

The code now has a good comment explaining it, but the tldr is
that now we only associate metadata with the sample dataset. Feature
counts and their column names are stored in another dataset within
the sample plot JSON (and these dataset names have a prefix that
should prevent them somehow being overwritten -- #55).

Also! The sample plot now is just referred to as sample_plot or
sample_chart, instead of sample_logratio_plot or
sample_logratio_chart. the initial reason I did it the long way at
first was to prevent confusion (e.g. to make clear that "no this isn't
just a sample Vega plot I found somewhere, this is a plot of
samples"), but I think that it's definitely clear by this point.

ALSO! The sample ID is now referred to as just "Sample ID" (since
we're only applying column mapping to the feature columns). This means
that #23 should be a lot closer: now the tooltip will say something
like "Sample ID: 333" or something instead of just "0: 333". I'd still
like to make the headers in the rank plot clearer, but maybe that's
for another day.

In order to restore the prior available functionality (since the web
interface is going to be broken right now due to not knowing where
the counts are), I need to alter the web interface to:
  -Properly relate features from the rank plot to the sample plot's
   JSON's data
  -use above to fix feature searching
  -use above to fix select microbes (features) stuff

And in order to complete #31, I need to alter the web visualization
interface so that it:
  -Can detect all of the metadata columns (just look at the entries
   of a sample point)
  -Has controls for changing the x-axis metadata column
  -Has controls for changing the color metadata column
@fedarko
Copy link
Collaborator Author

fedarko commented Feb 25, 2019

Similarly, if the user happens to have ranks in their input called "x" or "Classification" or "Feature ID", this will break.

fedarko added a commit that referenced this issue Feb 25, 2019
This might require some tweaking regarding implications for #55.
Also the code is a bit ugly.

Anyway, I just gotta adjust the viz interface to handle this stuff
properly:

-detect all given ranks
-use signal to set up a bound input with options being all the
 given rank values, and set this signal to the y-axis field (and
 title? Maybe?)
-if not already done: on signal change (we can use
 view.addSignalListener() or whatever), re-sort the feature ranks
 in ascending order. This could be as simple as just adjusting each
 feature's x value via change().

Once that's properly hammered out, #46 will be done -- there's a lot
less complexity associated with that than there is with #31.
@fedarko fedarko added this to the v0.2.0 milestone Jun 28, 2019
fedarko added a commit that referenced this issue Jul 4, 2019
This helped me realize we were leaving NaNs in feature
ranking/metadata DFs (see prev commit for context), and also
now we're officially testing the corner case of non-unique ranking
and fm ID cols (#55).
@fedarko
Copy link
Collaborator Author

fedarko commented Jul 7, 2019

note for later -- there's also some stuff in the read_gnps_feature_metadata_file function that requires certain column name(s)? to not be present in the metadata file. that may or may not be temporary.

fedarko added a commit that referenced this issue Jul 8, 2019
This allows users to have "Classification" fields in their feature
rankings / metadata. I imagine that's a lot more likely than
"qurro_classification" fields being there :)

Next up: checking before inserting this column
fedarko added a commit that referenced this issue Jul 8, 2019
Related to #55.

I guess the best thing to do would be to do this sorta checking up
front, honestly; probably in process_input(). That'd have the extra
advantage of being more easy to test.
fedarko added a commit that referenced this issue Jul 8, 2019
Previously, when the Classification field was literally just called
"Classification", we didn't need to do anything with the title.
However, now it makes sense to use a more human-readable title than
"qurro_classification" (#55).
fedarko added a commit that referenced this issue Jul 8, 2019
We can unit-test this function, which will be sufficient to be
done with #55!
fedarko added a commit that referenced this issue Jul 8, 2019
We're not done here yet. Things to consider doing before closing this
issue:

- update readme re these restrictions (not super important since
  an explanatory error will be given up front if the user supplies
  inputs with these invalid names -- also, i imagine most/all of
  these inputs are uncommon)

- check feature metadata to make sure it doesn't contain weird stuff
  for our GNPS-reading code? Honestly, not sure this is possible to do
  at this point in the code; we might need to modify
  read_gnps_feature_metadata_file() to do these checks (and that code
  is old and probably-outdated anyway).
@fedarko fedarko closed this as completed in 9dc36e7 Jul 8, 2019
fedarko added a commit that referenced this issue Jul 8, 2019
in particular, about the now-required cols (the restricted cols
were already implied in the #55 description)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request important Things that are critical for getting Qurro in a working/useful state
Projects
None yet
Development

No branches or pull requests

1 participant