In the search feature field selectors, indicate which fields are metadata and which are rankings #191

Closed
fedarko opened this issue Jul 15, 2019 · 5 comments
Labels: enhancement (New feature or request)

Comments

@fedarko
Collaborator

fedarko commented Jul 15, 2019

This will help alleviate confusion, especially with Songbird ranking names that are just metadata fields. I don't want people thinking "oh, I'm going to select all features with a pH > 5" (not that that really makes much sense); it should be clear instead that "oh, I'm going to select all features with a differential for the pH field > 5".
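A minimal sketch of the idea, assuming the selector's options are built from two separate lists of field names. The function names and the group headings ("Feature Metadata", "Feature Rankings") are hypothetical and are not taken from Qurro's code:

```python
from typing import Dict, List


def group_field_options(
    metadata_fields: List[str], ranking_fields: List[str]
) -> Dict[str, List[str]]:
    """Group searchable field names under explicit headings.

    The heading names are illustrative assumptions.
    """
    return {
        "Feature Metadata": list(metadata_fields),
        "Feature Rankings": list(ranking_fields),
    }


def render_options(groups: Dict[str, List[str]]) -> List[str]:
    """Flatten the groups into display lines, one heading per group."""
    lines = []
    for heading, fields in groups.items():
        lines.append(f"[{heading}]")
        lines.extend(f"  {field}" for field in fields)
    return lines


# "pH" appears under the rankings heading, so it reads as a differential
# for the pH field rather than as a pH value of each feature.
groups = group_field_options(["Feature ID", "Taxonomy"], ["pH", "Temperature"])
option_lines = render_options(groups)
```

With the headings in place, a ranking named after a metadata column can no longer be mistaken for a property of the features themselves.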

@fedarko fedarko added the enhancement New feature or request label Jul 15, 2019
@fedarko fedarko self-assigned this Jul 15, 2019
@fedarko fedarko added this to the v0.3.0 milestone Jul 15, 2019
fedarko added a commit to fedarko/qurro that referenced this issue Jul 16, 2019
Just covering a silly corner case for the new functionality of
populateSelect() now that biocore#191 is done.

Happy with how this commit's code works -- should really change many
of the js tests to check the actual error messages thrown; chai makes
testing this super easy
@mortonjt

mortonjt commented Jul 19, 2019

Something that could help with exploration is to enable searching by ranks in the Selecting Features window.

image

So if the user wants to focus on the top 10 species and the bottom 10 species, they should be able to do this via something like filtering by rank as follows (the actual querying terminology will need some work).

<10 or -10>

This could be applied to also filter by log fold change (the y axis / the actual coefficients)
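A rough sketch of what filtering by rank could look like, assuming each feature has a single coefficient for the selected ranking. The function names and the example data are made up for illustration:

```python
from typing import Dict, List, Tuple


def rank_features(coefficients: Dict[str, float]) -> List[str]:
    """Return feature IDs sorted from lowest to highest coefficient."""
    return sorted(coefficients, key=lambda fid: coefficients[fid])


def select_extremes(
    coefficients: Dict[str, float], k: int
) -> Tuple[List[str], List[str]]:
    """Return the k lowest-ranked and k highest-ranked features.

    k is capped at half the number of features so the two groups
    never overlap.
    """
    ranked = rank_features(coefficients)
    k = min(k, len(ranked) // 2)
    return ranked[:k], ranked[len(ranked) - k:]


# Hypothetical coefficients for five features.
example = {"A": -2.0, "B": 0.5, "C": 1.5, "D": -0.3, "E": 3.0}
lowest, highest = select_extremes(example, 2)
```

Filtering by the coefficients themselves (the y-axis values) would instead compare each value against a threshold, without sorting.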

On another note, the y-axis in the rank plot should say log-fold change, not Rank: Axis 1

image

The x-axis is the ranks, but "sorted features" is also fine.

@fedarko
Collaborator Author

fedarko commented Jul 19, 2019

Thanks @mortonjt! Let me make sure we're on the same page here.

re: New Features

First off, have you tried out the newest version of Qurro yet (in the open PR I have, which closes this issue)? This supports searching by the rank plot's y-axis values -- so not the actual ranks yet (i.e. x = 1, x = 2, x = 3, ...), but you can at least do something like log(all features with a given coefficient > y0 / all features with a given coefficient < y1). I'd like to add this sort of relative searching in the future, though (which is why the linked PR doesn't fully address #97).
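To make the threshold-based search concrete, here is a small sketch. It assumes the log-ratio for a sample is the natural log of the summed counts of the numerator features over the summed counts of the denominator features; the function names and data are hypothetical, not Qurro's actual implementation:

```python
import math
from typing import Dict, List, Optional, Tuple


def select_by_threshold(
    coefficients: Dict[str, float], y0: float, y1: float
) -> Tuple[List[str], List[str]]:
    """Numerator: coefficient > y0. Denominator: coefficient < y1."""
    numerator = [f for f, c in coefficients.items() if c > y0]
    denominator = [f for f, c in coefficients.items() if c < y1]
    return numerator, denominator


def sample_log_ratio(
    counts: Dict[str, float], numerator: List[str], denominator: List[str]
) -> Optional[float]:
    """Log-ratio of summed counts for one sample.

    Returns None when either side sums to zero, since the log-ratio
    is undefined in that case.
    """
    num = sum(counts.get(f, 0) for f in numerator)
    den = sum(counts.get(f, 0) for f in denominator)
    if num <= 0 or den <= 0:
        return None
    return math.log(num / den)


# Hypothetical coefficients and one sample's counts.
coefs = {"A": -2.0, "B": 0.5, "C": 1.5, "D": -0.3}
top, bottom = select_by_threshold(coefs, y0=1.0, y1=0.0)
ratio = sample_log_ratio({"A": 5, "C": 20, "D": 5}, top, bottom)
```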

Here's a screenshot of what the searching interface looks like in Qurro v0.3.0 --

[Screenshot: the searching interface in Qurro v0.3.0]

In the future I'll probably add in something like Currently Selected Rank as a search option right below Feature ID, and then users can just do normal numeric searching on that (i.e. less than or equal to 10 to select the 10 lowest-ranked features).

So I think we're both in agreement there. The other thing you've brought up is the terminology—and if anything I'm saying in the application is off/incorrect, I definitely want to fix that!

re: Terminology Stuff

Rank plot y-axis

In the earliest earliest version of "rankratioviz" :), I think the y-axis said something to the effect of "log-fold change". I ended up changing this title to make it clear that the actual y-values of each feature in the rank plot change when you select a different ranking. So, right now, this says something like Rank: ASDF, where ASDF is just whatever the ID is of the currently selected ranking. (As mentioned in the first comment in this issue, I don't want to just display the ranking name by itself, since I don't want users getting confused and thinking "oh, this indicates the level of pH for each feature somehow".)

  • This makes a bit more sense with differentials (where the ID of the ranking usually involves sample metadata somehow), since you'd get something like Rank: Temperature for the red sea dataset.
  • For DEICODE output (where the rankings are just feature loadings from the biplot), I used to just default to integer IDs for each PC of the loadings (so this would be something like Rank: 0, Rank: 1, etc.). I changed this a couple of weeks back so that loadings are labelled Axis 1, Axis 2, etc. (similar to Emperor's naming conventions).
  • Since Qurro can accept arbitrary feature loadings/differentials as the input, I'm hesitant to just say "log-fold change" for every rank plot y-axis—if there's another tool that could ostensibly generate differentials in an idiosyncratic way, I'd still like to be able to render those differentials even if they don't necessarily correspond to log-fold changes.
  • one possible solution: instead of saying Rank: ASDF, say Loading: ASDF or Differential: ASDF depending on whatever the format of the input ranks was?
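The last bullet's suggestion could be sketched roughly as follows. The `ranking_type` values ("differential", "loading") and the function name are assumptions made for this example:

```python
def y_axis_title(ranking_id: str, ranking_type: str) -> str:
    """Build a y-axis title from the ranking's ID and input format.

    Falls back to the current "Rank" prefix for unrecognized formats.
    """
    prefixes = {"differential": "Differential", "loading": "Loading"}
    prefix = prefixes.get(ranking_type, "Rank")
    return f"{prefix}: {ranking_id}"


songbird_title = y_axis_title("Temperature", "differential")
deicode_title = y_axis_title("Axis 1", "loading")
fallback_title = y_axis_title("ASDF", "unknown")
```

The fallback keeps arbitrary inputs renderable even when the tool that produced them is not recognized.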

Rank plot x-axis

I guess I could make this say something like Ranked Features by ASDF, where again ASDF is the current ranking's ID. But it'd probably make sense to also update the rank plot title so it isn't super redundant.

tl;dr

Let me know what you think of this stuff, or if I'm looking at this the wrong way. Happy to adjust plans/change around wordings to make this more useful + correct.

@mortonjt

ok - that is very cool, filtering by the coefficients should address the original issue

Regarding the rank plot y-axis, maybe we can have a note to say that this is just the default? I understand your concern. For the original applications, the inputs are in units of log-fold change (i.e. DEICODE / songbird / ALDEx2 / stray / DESeq2, ...).

That being said, there are definitely exceptions to this rule. Both mmvec and LDA will not return log-fold units, and there are probably many others.

How about this - maybe we can change it to something like (<covariate> magnitude)? That way it'll be somewhat flexible. Because rank on the y-axis is a bit misleading.

In the meantime - we could also add in Before/After screenshots of how to change the title and y axis in the Vega editor. These screenshots could even be added to the wiki to show how plots can be customized.
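As a rough illustration of that kind of customization, the same change can be made programmatically on an exported Vega-Lite specification. The spec below is a toy stand-in, not Qurro's real rank plot JSON, and the helper name is hypothetical; only the top-level `title` and `encoding.y.title` properties are standard Vega-Lite:

```python
import copy
from typing import Any, Dict, Optional


def retitle_rank_plot(
    spec: Dict[str, Any], y_title: str, plot_title: Optional[str] = None
) -> Dict[str, Any]:
    """Return a copy of a Vega-Lite spec with a new y-axis (and plot) title."""
    new_spec = copy.deepcopy(spec)  # leave the caller's spec untouched
    new_spec.setdefault("encoding", {}).setdefault("y", {})["title"] = y_title
    if plot_title is not None:
        new_spec["title"] = plot_title
    return new_spec


# Toy spec for illustration only.
rank_plot_spec = {
    "title": "Feature Ranks",
    "mark": "bar",
    "encoding": {
        "x": {"field": "x", "type": "ordinal"},
        "y": {"field": "Axis 1", "type": "quantitative", "title": "Rank: Axis 1"},
    },
}
updated = retitle_rank_plot(rank_plot_spec, "Axis 1 magnitude")
```

In the Vega editor itself, the equivalent step is editing those same two `title` values in the JSON by hand.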

@fedarko
Collaborator Author

fedarko commented Jul 19, 2019

(sorry for taking a while to respond -- just got lunch)

Understood. It's good to know that for most of these tools, differentials/loadings are in log-fold change.

I like the suggestion of using Magnitude. Should be pretty trivial to make that change, at least for the y-axis title.

As a semirelated question: is saying "Ranking" (not "Rank") in terms of the differential/loading fields in general ok? Most of Qurro's code and docs refer to the differential/loading fields (i.e. columns in the Songbird differentials.tsv) as "Feature Rankings". Can adjust this if it's not the ideal verbiage (better now than later :)

@fedarko
Collaborator Author

fedarko commented Jul 19, 2019

Update: just pushed a commit to the PR (140a57e) -- this changes the y-axis to now say e.g. Magnitude: Axis 1 or Magnitude: Temperature instead of Rank: Axis 1 or Rank: Temperature. Thanks Jamie!

@fedarko fedarko closed this as completed in b5349ba Aug 2, 2019