Modelling and data in Tricot #9475
Replies: 3 comments 1 reply
-
Now to compare our three models:
In addition:
-
I am working on a summary of tricot models here.
-
@lilyclements I have commented on the script and data in the current Day 3 script.
Now to the data. I suggest we include these data in our teaching, starting from Day 1! So I wonder how we can do that most easily. The ideal would be if they added the data into one of their packages as another dataset. Ideally it would produce all 4 (or 5) data frames described below.

The first dataset, called trial, has 4 variables and 15,039 rows. It is a nice example of very clean data, and I made all 4 variables into factors. But it could load as in the script, so that would be the first step. @lilyclements I realise I don't yet understand the data. The measurement is the ranks: 1, 2, and 3. What's the "winner"? Is it 1, as the first rank, or 3, as the highest number?

The second is called covar. It has 11 variables and 557 rows. This provides potential covariates for the ranks in the trial data.

There are 10 varieties, and the next dataset provides variety-level data. (I don't know from where, but it is very nice to have this level of information. I hope the background can be included in the documentation about these data.)

The data frame called dat is just a pivot-wider of the trial data. (We don't need that, as it is easy to produce.)

Next is a dataset at the trait level called kendall-rankings. I don't know why the overall results are absent, and I am not clear what we learn from these data. (I assume this is produced from their analyses, so we would not include it in the data.)

Finally, so far, we get to the climatic data, and they are in an odd shape. There are 577 rows - presumably one for each of the trials. There are 581 columns, one for each day of the record, to span the planting dates of each trial. I assume the rows are in the same order as the trials, but I would have much welcomed an ID variable as an indication of good practice. I was relieved that we will be handling the climatic covariates differently. However, it is probably worth reshaping these data and including the resulting data frame.

I suggest there are just 77 different pixels of data, i.e. on average several trials make use of the same climatic data, and I suggest we might include this information in our version of the climatic dataset. They won't necessarily have the same climatic summary, because their planting dates could be different, but we should still include that variable for completeness. I wonder how it relates to the 6 different trials? I have now included a reshaped chirps dataframe, together with the additional pixel variable. I am assuming we would not have both shapes of climatic data.
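The reshaping and pixel-grouping described above could be sketched in R as follows. This is only an illustration: the object name `chirps` and the column names `id`, `pixel`, `day`, and `rain` are assumptions, not names from the shared scripts.

```r
library(dplyr)
library(tidyr)

# Sketch, assuming `chirps` is the 577 x 581 wide data frame of daily rainfall.
# Trials falling in the same pixel have identical rainfall series, so a pixel
# ID can be derived by matching rows against the unique rows.
key <- apply(chirps, 1, paste, collapse = "|")
chirps$pixel <- match(key, unique(key))

chirps_long <- chirps |>
  mutate(id = row_number()) |>   # assumes rows follow the trial order
  pivot_longer(-c(id, pixel), names_to = "day", values_to = "rain")
```

The long shape gives one row per trial per day, which is usually easier to summarise over each trial's growing window than the wide matrix.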
-
I can see modelling in Scripts 2 and 3 shared by Kaue.
From Script 2:
Model 1. Plackett-Luce Model
Fitting Plackett-Luce models to a list of rankings (`rankings_list`). This means we are fitting a separate Plackett-Luce model for each ranking in the list. These models estimate a worth parameter for each item being ranked.
We look at `summary`, `coef`, `qvcalc(itempar(.x))`, `reliability`, `worth_map`, and `plot_logworth`.
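A minimal sketch of this step with the PlackettLuce and qvcalc packages; the structure of `rankings_list` (a named list of rankings objects, one per trait) is assumed from the script:

```r
library(PlackettLuce)
library(qvcalc)

# Fit one Plackett-Luce model per set of rankings (e.g. one per trait)
mods <- lapply(rankings_list, PlackettLuce)

summary(mods[["overall"]])            # worth estimates on the log scale
coef(mods[["overall"]], log = FALSE)  # worth parameters on the untransformed scale
qvcalc(itempar(mods[["overall"]]))    # quasi-variances for pairwise comparisons
```

(`reliability`, `worth_map`, and `plot_logworth` come from the gosset package and take fitted models like these as input.)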
Model 2: Tree-Based Model for Rankings & Chocolate Consumption
We look at `plot`, `top_items`, `node_rules`, and `regret`.
This shows the decision tree structure, top items in each split, and the rules defining the tree splits.
regret?
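A sketch of such a tree with `pltree()` from PlackettLuce. The rankings object `R`, the use of `covar` as the covariate data, and the control settings are assumptions for illustration, not the script's actual call:

```r
library(PlackettLuce)

# Group the rankings so each group matches one row of covariates in `covar`
G <- group(R, index = seq_len(nrow(covar)))

# Model-based recursive partitioning: split on covariates where the
# Plackett-Luce worth estimates differ significantly
tree <- pltree(G ~ ., data = covar, minsize = 30, alpha = 0.05)
plot(tree)  # tree structure with item worths in the terminal nodes
```

Plotting the fitted tree shows the splits and the estimated worths within each terminal node, which is what the `plot`/`node_rules` output summarises.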
Models 3 and 4
Model 3: PLADMM Model Using Overall Rankings (`rankings_list[["overall"]]`). We look at the `summary`. We compare to our `overall` model from Model 1 to check differences in parameter estimates between PLADMM and PL.

Model 4: PLADMM Model Using Trait Features (`rankings_list[["overall"]]`).
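The PLADMM fit might look like the sketch below, using `pladmm()` from PlackettLuce. The data frame `features` and the columns `trait1` and `trait2` are hypothetical stand-ins for the trait features in Script 3:

```r
library(PlackettLuce)

# Hypothetical `features`: one row per variety, columns of trait features.
# PLADMM expresses the log-worths as a linear function of these features.
fit <- pladmm(rankings_list[["overall"]], ~ trait1 + trait2, data = features)
summary(fit)

# Standard PL fit on the same rankings, for the comparison mentioned above
coef(PlackettLuce(rankings_list[["overall"]]))
```

Comparing the two sets of estimates shows how much is lost (or gained) by constraining the worths through the trait features.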