Duplicated entries at the lowest level of the hierarchy cause inaccurate errors #45

Hi @richardjtelford, I got a few related error reports from a user today. Here's a reproducible example of the first:

If the community dataset supplied to traitstrap contains grouping levels that aren't included in the hierarchy, the result can be a misleading error. For example, a user collected data at the nested levels of site, plot, and aspect, where each plot has both an east and a west aspect. They then decide they don't care about aspect and only want to analyze site and plot, so they specify the hierarchy `c("site", "plot")` without making any changes to the data:

```r
require(traitstrap)  
#> Loading required package: traitstrap
#> Warning: package 'traitstrap' was built under R version 4.3.2
require(dplyr)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
require(tidyr)
#> Loading required package: tidyr
require(ggplot2)
#> Loading required package: ggplot2
#> Warning: package 'ggplot2' was built under R version 4.3.1
require(purrr)
#> Loading required package: purrr
#> Warning: package 'purrr' was built under R version 4.3.2

data(community)
data(trait)

community <- community |>
  filter(
    PlotID %in% c("A", "B"),
    Site == 1
  )

# simulate a lower-level grouping variable (aspect) that isn't used in traitstrap
community <- community %>%
  mutate(aspect = "W") %>%
  bind_rows(community %>% mutate(aspect = "E"))


filled_traits <- trait_fill(
  comm = community,
  traits = trait,
  scale_hierarchy = c("Site", "PlotID"),
  taxon_col = "Taxon", value_col = "Value",
  trait_col = "Trait", abundance_col = "Cover",
  complete_only = TRUE, leaf_id = "ID"
)

boot_traits <- trait_multivariate_bootstrap(filled_traits,
                                            fun = cor,
                                            nrep = 10,
                                            sample_size = 100
)
#> Error in trait_multivariate_bootstrap(filled_traits, fun = cor, nrep = 10, : Some leaves with incomplete set of traits.
#>          Please run trait_fill() with complete_only set to TRUE.

# The error says "incomplete set of traits", but each leaf is actually duplicated (n = 2)

filled_traits %>%
  group_by(Site, PlotID, Taxon, Trait, ID) %>%
  summarise(n = n())
#> `summarise()` has grouped output by 'Site', 'PlotID', 'Taxon', 'Trait'. You can
#> override using the `.groups` argument.
#> # A tibble: 279 x 6
#> # Groups:   Site, PlotID, Taxon, Trait [45]
#>    Site  PlotID Taxon             Trait                 ID          n
#>    <chr> <chr>  <chr>             <chr>                 <chr>   <int>
#>  1 1     A      bistorta vivipara Leaf_Thickness_Ave_mm ARP2422     2
#>  2 1     A      bistorta vivipara Leaf_Thickness_Ave_mm ARQ4344     2
#>  3 1     A      bistorta vivipara Leaf_Thickness_Ave_mm ARR3874     2
#>  4 1     A      bistorta vivipara Leaf_Thickness_Ave_mm ASE0575     2
#>  5 1     A      bistorta vivipara Leaf_Thickness_Ave_mm ASI1602     2
#>  6 1     A      bistorta vivipara Leaf_Thickness_Ave_mm ASJ7230     2
#>  7 1     A      bistorta vivipara Leaf_Thickness_Ave_mm ASN6300     2
#>  8 1     A      bistorta vivipara Leaf_Thickness_Ave_mm ASR3787     2
#>  9 1     A      bistorta vivipara Leaf_Thickness_Ave_mm ATZ0228     2
#> 10 1     A      bistorta vivipara Leaf_Thickness_Ave_mm AUA5793     2
#> # i 269 more rows

```

The error suggests that data are missing, but the actual cause is too much data: every leaf appears twice, because each taxon occurs once per aspect in the community data. The same error is raised whenever the number of traits found differs from the number expected, whether it is too high or too low.
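On the user's side, one possible workaround is to collapse the community data to the levels named in `scale_hierarchy` before calling `trait_fill()`. The sketch below uses a toy data frame with the same column names as the example above. Summing `Cover` across aspects is an assumption; a mean may be more appropriate depending on how cover was recorded.

```r
library(dplyr)

# Toy community data: each taxon is recorded once per aspect within a plot
community_toy <- tibble(
  Site   = "1",
  PlotID = rep(c("A", "B"), each = 4),
  Taxon  = rep(c("bistorta vivipara", "salix polaris"), times = 4),
  aspect = rep(rep(c("E", "W"), each = 2), times = 2),
  Cover  = c(5, 10, 3, 8, 2, 6, 4, 1)
)

# Collapse over aspect so each taxon occurs once per Site/PlotID.
# Assumption: cover can be summed across aspects.
community_collapsed <- community_toy |>
  group_by(Site, PlotID, Taxon) |>
  summarise(Cover = sum(Cover), .groups = "drop")
```

The collapsed data frame can then be passed as `comm` with `scale_hierarchy = c("Site", "PlotID")`.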

I suggest we add a check to `trait_fill()` to make sure that each combination of taxon and the lowest level of the hierarchy appears at most once in the community data.
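A rough sketch of what such a check could look like. `check_unique_taxa()` is a hypothetical helper, not an existing traitstrap function, and the wording of the message is only a suggestion; the argument names mirror those of `trait_fill()`.

```r
library(dplyr)

# Hypothetical helper (not part of traitstrap): stops if any taxon occurs more
# than once within a combination of the scale_hierarchy levels.
check_unique_taxa <- function(comm, scale_hierarchy, taxon_col) {
  duplicates <- comm |>
    group_by(across(all_of(c(scale_hierarchy, taxon_col)))) |>
    summarise(n_rows = n(), .groups = "drop") |>
    filter(n_rows > 1)

  if (nrow(duplicates) > 0) {
    stop(
      nrow(duplicates), " taxon/", paste(scale_hierarchy, collapse = "/"),
      " combination(s) occur more than once in comm. ",
      "Does comm contain a grouping level that is missing from scale_hierarchy?",
      call. = FALSE
    )
  }
  invisible(comm)
}

# Toy data: one taxon recorded on both aspects of the same plot
comm_toy <- tibble(
  Site   = "1",
  PlotID = "A",
  Taxon  = c("bistorta vivipara", "bistorta vivipara"),
  aspect = c("E", "W"),
  Cover  = c(5, 3)
)

# Fails when aspect is left out of the hierarchy; the message is captured here
msg <- tryCatch(
  check_unique_taxa(comm_toy, c("Site", "PlotID"), "Taxon"),
  error = function(e) conditionMessage(e)
)

# Passes when aspect is included in the hierarchy
ok <- check_unique_taxa(comm_toy, c("Site", "PlotID", "aspect"), "Taxon")
```

Failing early in `trait_fill()` would point the user at the community data rather than at the traits.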

We may also want to amend the errors in `trait_multivariate_bootstrap()` to differentiate between too many and too few data, as these have different causes.
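As an illustration only, the two cases could be separated along these lines. `check_leaf_completeness()` is a hypothetical helper and does not reflect how `trait_multivariate_bootstrap()` currently performs its check. It assumes long-format trait data in which a complete leaf has exactly one row per trait.

```r
library(dplyr)

# Hypothetical helper (not traitstrap's actual implementation): compares the
# number of rows per leaf with the number of distinct traits in the data.
check_leaf_completeness <- function(filled_traits, leaf_cols, trait_col) {
  n_traits <- n_distinct(filled_traits[[trait_col]])

  counts <- filled_traits |>
    group_by(across(all_of(leaf_cols))) |>
    summarise(n_values = n(), .groups = "drop")

  if (any(counts$n_values > n_traits)) {
    stop("Some leaves have more values than traits. ",
         "Check for duplicated taxa in the community data.", call. = FALSE)
  }
  if (any(counts$n_values < n_traits)) {
    stop("Some leaves with incomplete set of traits. ",
         "Please run trait_fill() with complete_only set to TRUE.", call. = FALSE)
  }
  invisible(filled_traits)
}

# Toy data: two leaves, each measured for two traits
leaves <- tibble(
  PlotID = "A",
  ID     = rep(c("leaf1", "leaf2"), each = 2),
  Trait  = rep(c("Thickness", "Area"), times = 2),
  Value  = c(0.2, 3.1, 0.3, 2.8)
)

too_many <- bind_rows(leaves, leaves)  # every leaf duplicated
too_few  <- leaves[-1, ]               # leaf1 is missing one trait

msg_many <- tryCatch(
  check_leaf_completeness(too_many, c("PlotID", "ID"), "Trait"),
  error = function(e) conditionMessage(e)
)
msg_few <- tryCatch(
  check_leaf_completeness(too_few, c("PlotID", "ID"), "Trait"),
  error = function(e) conditionMessage(e)
)
```

The two messages point to different fixes: duplicates originate in the community data, while incomplete leaves are handled by `complete_only = TRUE`.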

Happy to make these changes if they seem reasonable.

