Note: this is open for discussion, currently on hold pending feedback from people who will collect categorical data.
Some traits are categorical - these are commonly used by breeders. The species table imported from USDA Plants has a number of categorical traits (e.g. 'PropagatedBySeed').
For the TERRAREF program, we need to allow categorical variables, but have decided to track these in traits in order to capture the who / where / when metadata (this is not captured in the species table). Categorical traits include some traits that could be quantified, but are not quantified in practice, such as 'seed color', 'maturity class' etc.
Proposed solution:
To support the collection of such data we will create two new fields in the variables table:
- 'categorical' of type boolean
- 'options' of type array
  - arrays will have fields value, name, definition
When returning data, we can use a lookup on 'options' to translate the stored value into its name. As an example, the variable maturity class would be recorded as:
| field | value |
| --- | --- |
| id | 999 |
| description | maturity class |
| units | NULL |
| notes | Maximum rate of RuBP regeneration. |
| name | maturity_class |
| max | 1 |
| min | 0 |
| type | trait |
| categorical | TRUE |
| options | listed in the next table |

| value | option_name | definition |
| --- | --- | --- |
| 0 | early | senesces in < 100 days after planting |
| 1 | late | senesces in > 100 days after planting |
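
To make the lookup concrete, here is a minimal Python sketch of how a stored value could be decoded on return. It is only an illustration: the dictionary layout, the `decode` helper, and the choice to raise on unknown values are assumptions for discussion, not existing database or API code. The option keys follow the example record for maturity class.

```python
# Hypothetical in-memory copy of the example variables record; the real
# record would live in the variables table.
maturity_class = {
    "id": 999,
    "name": "maturity_class",
    "categorical": True,
    "options": [
        {"value": 0, "option_name": "early",
         "definition": "senesces in < 100 days after planting"},
        {"value": 1, "option_name": "late",
         "definition": "senesces in > 100 days after planting"},
    ],
}


def decode(variable, stored_value):
    """Translate a stored value into its option name (assumed helper)."""
    if not variable.get("categorical"):
        # Non-categorical variables are returned unchanged.
        return stored_value
    lookup = {opt["value"]: opt["option_name"] for opt in variable["options"]}
    if stored_value not in lookup:
        raise ValueError(
            f"{stored_value!r} is not a valid option for {variable['name']}"
        )
    return lookup[stored_value]


print(decode(maturity_class, 1))  # late
```

Storing the numeric value and decoding on output keeps the traits table numeric, while the who / where / when metadata stays attached to each record.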
Other options
- add new columns to cultivars for each characteristic (this is not normalized, requires migrations)
- add a cultivars_characteristics and characteristics table (to prevent > 90% sparse table like species)
- figure out how to convert all categorical traits to numeric (e.g. in the above example, record 'senescence' as an observation, with days after planting computed as observation date - planting date)
- categories such as 'maturity group' can be computed 'on the fly'
- (can be combined with above) use BMS to store categorical variables.
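
As a rough sketch of the "convert to numeric, compute categories on the fly" option, the Python below derives the maturity class from a senescence observation date and a planting date. The function names and the 100-day default are illustrative assumptions. The definitions in this proposal do not say how exactly 100 days should be classified, so treating it as 'late' here is also an assumption.

```python
from datetime import date


def days_after_planting(planting_date, observation_date):
    """Numeric trait: observation date minus planting date, in days."""
    return (observation_date - planting_date).days


def classify_maturity(planting_date, senescence_date, threshold_days=100):
    """Derive the category from the numeric trait (hypothetical helper).

    Assumption: exactly `threshold_days` is classed as 'late'.
    """
    dap = days_after_planting(planting_date, senescence_date)
    return "early" if dap < threshold_days else "late"


planted = date(2016, 5, 1)
print(days_after_planting(planted, date(2016, 7, 30)))  # 90
print(classify_maturity(planted, date(2016, 7, 30)))    # early
print(classify_maturity(planted, date(2016, 9, 1)))     # late
```

With this approach only the senescence observation is stored, and the threshold can change later without rewriting any records.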
I would appreciate feedback from @nfahlgren, @terraref/standards-committee.