Gosset / Tricot - Describe. #9442
lilyclements
started this conversation in
Tricot Analysis
@rdstern here is a full script for the Gosset vignette (1) up until the Modelling. I think what we would like for the first iteration of the menu is the ability to run this in R-Instat. I've attached the Full Script at the end in case you'd rather have it all as one. Otherwise, here it is broken up with comments throughout.
I've also amended this to use Kaue's scripts he shared with us on the trainings.
Importing and Rearranging the Data
This is all very straightforward and do-able in R-Instat. This rearrangement is relevant for this data type, but not for all tricot data shapes.
There are data formats in the tricot analyses which require much more manipulation to get them into a consistent shape. That is not relevant for this vignette, but the shape the data is rearranged to is one which David suggested, and which I strongly agree with. I'm not sure how we want to offer that rearrangement in R-Instat yet. I should discuss that with David; it might be that he sees it as something that happens automatically in the defining part of the dialog (if you define certain columns, clearly the data is in the X format, and so we can rearrange it to the Y format).
There is also the option to import from ClimMob using their ClimMobTools package. Kaue has shared this in script 2 of his training materials (02_fetch_and_merge_data.R). I've written up an issue in #9400 on how we can incorporate that, and added amendments now that we have these scripts from Kaue.
Defining the Data
As I said above there are data formats that the tricot analyses have which require much more manipulation to get it into a consistent shape. I will get the names of the columns in that shape if we want that rearrangement to occur at this part for it to be in a consistent format.
Currently from this vignette and from the second training script, we want to define:
For our Traits, we have a multiple receiver. I discussed this part with David, and we agreed that adding a "select" for these trait columns can be a function that automatically occurs here when defining:
Create Rankings Object
A ranking object is created for the tricot analyses. I can see two main types: ranking and grouped_ranking. This is then used in other dialogs throughout. I see this as a bit like the "Create Survival Object" for survival data. So we have a dialog where you create the rankings (or grouped_rankings) object (and we should add ranking and grouped_ranking as objects in R-Instat, like how "surv" is a survival object).
I'll put a draft idea for this dialog here:
I've added a "Store as DF" checkbox, since looking at the training materials there is an instance where they View the rankings object as a data frame. It can be accessed as just a column this way when modelling. Otherwise, it can be an object you don't see, like a key or link.
In addition, for "Traits", can we have an option for it to be either a column (e.g., the "Overall" column) or a select object (e.g., the traits select)? It will return a list of rankings objects if it is a select, which we can save as a "list_rankings" object perhaps? (or list_grouped_rankings if grouped is checked)
It will return just a single ranking if it is a single column, which we can save as a "rankings" object perhaps? (or grouped_rankings if grouped is checked)
I can see four objects being returned:
(Unless we always return a list, even if it's a list of size one. But what if it's a column? Then it's a single ranking object. Otherwise we have a select or multiple ranking objects, i.e., our "list".)
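To make the four proposed return types concrete, here is a minimal sketch of the naming logic described above. It is written in Python purely for illustration and is not R-Instat or gosset code. The function name `ranking_object_type` and the trait names "Yield" and "Taste" are hypothetical; the four type names are the ones suggested in this post.

```python
from typing import Sequence, Union


def ranking_object_type(traits: Union[str, Sequence[str]], grouped: bool = False) -> str:
    """Return the proposed object type name for the result of the rankings dialog.

    traits: a single column name (str) or a select, i.e. several column names.
    grouped: whether the "grouped" checkbox is ticked.
    """
    base = "grouped_rankings" if grouped else "rankings"
    # A single column gives one object; a select gives a list of objects.
    if isinstance(traits, str):
        return base
    return "list_" + base


# "Overall" is from the post; "Yield" and "Taste" are made-up trait names.
for traits in ("Overall", ["Overall", "Yield", "Taste"]):
    for grouped in (False, True):
        print(traits, grouped, "->", ranking_object_type(traits, grouped))
```

If we decided to always return a list instead, the `isinstance` branch would disappear and only the `list_` names would remain.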
Prepare Parts
The "trial" data is of course clean, as it is one of their package data sets. I have asked Kaue about the data used in the training (at the very least it would be nice to know what sort of data the participants come with so we can clean it). I know they've previously said that ClimMob is quite good at the cleaning stage.
From the training data scripts I can see a few dialogs which are relevant
Other dialogs, for describe, perhaps:
Describe: Correlations
This dialog gives the Kendall correlation between overall appreciation (baseline level) and the other traits in the trial. There is also an additional plot option.
Note that this is from the Gosset Vignette. These correlations are also run in the third R script provided by Kaue for the training. I have tried this script with that data, and it gives the same results (I've only slightly amended the R code to fit the tidyverse format).
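As a reminder of what the statistic measures, here is a minimal sketch of a plain Kendall tau between a baseline ranking and other trait rankings. It is written in Python for illustration only; it is the textbook tau-a on two rank vectors, not the gosset::kendallTau implementation, which works on rankings objects and may differ in detail. The trait names other than the overall baseline, and all the rank values, are made up.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a between two equal-length rank vectors.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Illustrative ranks of five items; the baseline is overall appreciation.
overall = [1, 2, 3, 4, 5]
other_traits = {
    "Yield": [1, 2, 3, 4, 5],  # same order as the baseline
    "Taste": [5, 4, 3, 2, 1],  # fully reversed
    "Vigor": [2, 1, 3, 4, 5],  # one swapped pair
}

taus = {name: kendall_tau(ranks, overall) for name, ranks in other_traits.items()}
print(taus)
```

A value near 1 means the trait orders the items much like the overall appreciation does, and a value near -1 means it orders them the opposite way.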
To add this into R-Instat, could we amend the Correlations dialog to have a third tab? This new tab gives the correlation of a rankings object:
kendallTau_bootstrap graphic if you click a checkbox for this display.
kendallTau function and returns a table.
Side notes:
kendallTau_permute which isn't covered in the vignette. I get an error with this stating that it is not supported on Windows. I will check with Kaue at a later date on this, but I assume this is not used. Update: I cannot see it in the training scripts shared by Kaue.
gosset::kendallTau. Except I'm not sure if there can be missing things in a rankings object! I should talk to Kaue and explore that a bit more.
EDIT: This next bit is for modelling. I will move this over to modelling when I've opened a discussion on it!
Describe: Performance of Varieties across Traits
This is a function to visualise the performance of the different varieties across the multiple traits. The values represented in a worth map are log-worth estimates. In this, we fit a model to each of our traits, giving a list of models. But we don't use the models or look at them - like in an ANOVA how we make a linear model, but look at the descriptive side of it in the Describe menu.
This outputs a graph where we can visualise how much each variety affects that trait's ranking of "1-2-3".
If a variety is likely to give a lower rank, it is a darker brown colour, and if it is likely to give a higher rank it is a bluer colour. For example, if we run percentages, look at the trial_by_trait_item_rank data frame, and look just at OverallAppreciation, we can see that "SX" tends to be ranked 1 out of its three rankings (and hence is a dark blue in the plot), and INT F is ranked 1 the fewest times (and hence is a dark brown).
I'm not sure how best to fit this in yet. It's just a simple graph, but not correlations. It just helps show the relationship between the different varieties and traits. I haven't seen anything similar in other vignettes that it could tie in with. Could it be an option in a correlation dialog or a describe/summarise dialog? We could also offer other tabular statistics alongside. Let me know what you think.
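The percentage check mentioned above (how often each variety is ranked 1 for a given trait) can be sketched as follows. This is a Python illustration only, not the script from the vignette: the record layout (id, item, trait, rank), the function name `first_place_share`, the item "Variety C" and all the rank values are assumptions made up for the example.

```python
from collections import Counter


def first_place_share(records, trait):
    """Percentage of appearances in which each item was ranked 1 for one trait."""
    appearances = Counter()
    firsts = Counter()
    for rec in records:
        if rec["trait"] != trait:
            continue
        appearances[rec["item"]] += 1
        if rec["rank"] == 1:
            firsts[rec["item"]] += 1
    return {item: 100.0 * firsts[item] / n for item, n in appearances.items()}


# Made-up long-format data: three participants each rank three varieties.
toy_ranks = [
    (1, "SX", 1), (1, "INT F", 3), (1, "Variety C", 2),
    (2, "SX", 1), (2, "INT F", 2), (2, "Variety C", 3),
    (3, "SX", 2), (3, "INT F", 3), (3, "Variety C", 1),
]
records = [
    {"id": pid, "item": item, "trait": "OverallAppreciation", "rank": rank}
    for pid, item, rank in toy_ranks
]

shares = first_place_share(records, "OverallAppreciation")
for item, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{item}: ranked 1 in {pct:.1f}% of its appearances")
```

A table like this could be one of the tabular statistics offered alongside the plot.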
I will look into how this fits into the Training scripts. That might provide more insights into how and where this can fit in! Especially as I can see some other functions that they look at alongside it -- it might be that it is appropriate in the modelling side.
Full Script (Gosset Vignette, for the R Code run in here)
vignette_1_start.zip