Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get_SDA_*(): Local SDA-style queries #246

Merged
merged 19 commits into from
May 13, 2022
Merged

get_SDA_*(): Local SDA-style queries #246

merged 19 commits into from
May 13, 2022

Conversation

brownag
Copy link
Member

@brownag brownag commented May 2, 2022

Work in Progress:

  • Implements dsn and SQLite<->T-SQL compatible syntax for get_SDA_*() methods

TODO:

  • Why is SQLite GROUP_CONCAT() so slow when run locally?
    • Evaluate alternate FOR XML syntax options
  • Generalize "TOP 1" -> "LIMIT 1" syntax conversion
  • Generalize temporary table conversion

 - currently only get_SDA_coecoclass(), get_SDA_cosurfmorph(), get_SDA_muaggatt(), get_SDA_pmgroupname(), get_SDA_property(method="none") have SQLite compatible syntax
 - still working on T-SQL/SQLite cross-compatibility
@dylanbeaudette
Copy link
Member

Nice, thanks for taking this on. I tested the get / createSSURGO code over the weekend and the resulting gpkg looked great. Can we point these queries to one of those databases as well as any sqlite DB with SSURGO schema?

@brownag
Copy link
Member Author

brownag commented May 3, 2022

Can we point these queries to one of those databases as well as any sqlite DB with SSURGO schema?

Yes, but it presently won't work for several queries that use temporary tables. Haven't come up with a workaround yet.

brownag and others added 3 commits May 3, 2022 17:11
 - some new non-exported helper functions for FGDB
- mirror get_SDA_interpretation type result for NASIS SSURGO exports etc.

- mdb2sqlite
@brownag brownag mentioned this pull request May 4, 2022
@brownag
Copy link
Member Author

brownag commented May 5, 2022

Added a fix for another edge case reported by @dylanbeaudette.

In some instances when a weighted average was calculated the "rated percent" used to correct results for the proportion of the mapunit that had "averageable values" did not account for possibility of some horizons missing data in that depth interval. The rating incorrectly assumed that 100% of the mapunit / horizons were rated in these cases, resulting in values diluted by averaging in 0 values.

For example STATSGO MUKEY 660955 all the components are (apparently) minor components (per majcompflag), the dominant component Rindge has organic horizons without pH, but other components do have pH reported in [0,25]. The "expected" value is a pH that is not diluted by the lack of data for the Rindge components (i.e. using just 0 to 25cm data data from Gazwell, Egbert, Sailboat, and Columbia)

Before:

get_SDA_property("ph1to1h2o_r", "weighted average", top = 0, bottom = 25, mukeys = 660955, include_minors = TRUE)
#>  areasymbol musym                       muname  mukey ph1to1h2o_r
#> 1         US  s852 Rindge-Gazwell-Egbert (s852) 660955        2.49

After:

get_SDA_property("ph1to1h2o_r", "weighted average", top = 0, bottom = 25, mukeys = 660955, include_minors = TRUE)  # #> #> areasymbol musym                       muname  mukey ph1to1h2o_r
#> 1         US  s852 Rindge-Gazwell-Egbert (s852) 660955        6.24

Matching expected value: (0.25*5.8+0.06*7+0.05*7+0.02*6.7+0.02*7)/0.4 = 6.235

Un-aggregated data:

soilDB::get_SDA_property("ph1to1h2o_r", "none", mukeys = 660955, include_minors = TRUE)[,c("compname","comppct_r","hzdept_r","hzdepb_r","ph1to1h2o_r")]
#>    compname comppct_r hzdept_r hzdepb_r ph1to1h2o_r
#> 1    Rindge        35        0       33          NA
#> 2    Rindge        35       33      152          NA
#> 3   Gazwell        25        0       76         5.8
#> 4   Gazwell        25       76       91         6.1
#> 5   Gazwell        25       91      152         5.3
#> 6    Rindge        14        0       38          NA
#> 7    Rindge        14       38      152          NA
#> 8    Egbert         6        0       48         7.0
#> 9    Egbert         6       48       84         7.0
#> 10   Egbert         6       84      152         7.3
#> 11   Egbert         5        0       79         7.0
#> 12   Egbert         5       79      117         7.0
#> 13   Egbert         5      117      152         7.3
#> 14   Rindge         5        0       33          NA
#> 15   Rindge         5       33      152          NA
#> 16   Venice         4        0       30          NA
#> 17   Venice         4       30      152          NA
#> 18 Sailboat         2        0       41         6.7
#> 19 Sailboat         2       41       71         7.2
#> 20 Sailboat         2       71       86         7.9
#> 21 Sailboat         2       86      157         7.9
#> 22 Columbia         2        0       41         7.0
#> 23 Columbia         2       41       58         7.0
#> 24 Columbia         2       58      104         7.0
#> 25 Columbia         2      104      152         7.0
#> 26   Rindge         1        0      152          NA
#> 27   Venice         1        0       30          NA
#> 28   Venice         1       30      152          NA

@brownag
Copy link
Member Author

brownag commented May 6, 2022

Another fix: covers cases where some horizons (but not all) within rated components / depth interval are missing data for property of interest. Also omits some unneeded logic pertaining to soil texture--leftovers from prior filtering out organic materials/bedrock--that interfere with getting STATSGO component horizons without an RV texture (!!)

Rather than interpreting (some) missing data horizons as having data value of zero, weighting is adjusted for calculating the component-specific portion of the weighted average. Horizon-specific thickness weight and total thickness (over whole profile) are reduced--essentially just using the part(s) of each profile that has data.

I think that it is questionable what the "right" answer is under these circumstances, but this result should be better/more defensible than assuming NA == 0 in most cases which generally dilutes the mapunit-level result.

Before

#>   areasymbol musym                                              muname  mukey   ec_l
#> 1         US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517  1.13

After

library(soilDB)
get_SDA_property(
  "ec_l",
  method = "weighted average",
  mukey = 659517          ,
  top = 0,
  bottom = 25,
  include_minors = TRUE,
)
#>   areasymbol musym                                              muname  mukey
#> 1         US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#>   ec_l
#> 1 1.61

X <- get_SDA_property(
  "ec_l",
  method = "none",
  mukey = 659517          ,
  top = 0,
  bottom = 25,
  include_minors = TRUE
) |> 
  subset(hzdept_r <= 25)
X
#>    areasymbol musym                                              muname  mukey
#> 1          US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 2          US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 5          US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 6          US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 7          US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 8          US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 9          US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 10         US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 11         US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 12         US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 13         US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 14         US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#>      compname compkind comppct_r majcompflag    cokey    chkey compname.1
#> 1      Uffens   Series        25         No  14182030 40788857     Uffens
#> 2      Uffens   Series        25         No  14182030 40788854     Uffens
#> 5   Panitchen   Series        25         No  14182032 40788860  Panitchen
#> 6   Panitchen   Series        25         No  14182032 40788861  Panitchen
#> 7   Dominguez   Series        20         No  14182035 40788867  Dominguez
#> 8   Dominguez   Series        20         No  14182035 40788868  Dominguez
#> 9      Happle   Series        15         No  14182034 40788865     Happle
#> 10     Happle   Series        15         No  14182034 40788866     Happle
#> 11  Youngston   Series        10         No  14182031 40788858  Youngston
#> 12  Youngston   Series        10         No  14182031 40788859  Youngston
#> 13 Cowestglen   Series         5         No  14182033 40788862 Cowestglen
#> 14 Cowestglen   Series         5         No  14182033 40788863 Cowestglen
#>    comppct_r.1 majcompflag.1 hzdept_r hzdepb_r ec_l
#> 1           25           No         0       13    0
#> 2           25           No        13       51    4
#> 5           25           No         0       15   NA
#> 6           25           No        15      152    4
#> 7           20           No         0       23    0
#> 8           20           No        23      152    0
#> 9           15           No         0       18    0
#> 10          15           No        18      152    0
#> 11          10           No         0       10    0
#> 12          10           No        10      152    2
#> 13           5           No         0       23    0
#> 14           5           No        23      107    2

Same results are obtained doing aggregation in R

library(aqp)
depths(X) <- cokey ~ hzdept_r + hzdepb_r
site(X) <- ~ comppct_r
# X$ec_l[is.na(X$ec_l)] <- 0 # alternate NA assumption for comparison
x2 <- trunc(X, 0, 25) |>
  mutate_profile(na_thk = sum((hzdepb_r - hzdept_r)[is.na(ec_l)]),
                 ec_025 = sum(na.omit((hzdepb_r - hzdept_r) * ec_l)) / (25 - na_thk))
sum(x2$comppct_r / 100 * x2$ec_025)
#> [1] 1.608

Under the "alternate" assumption (where horizons with some data elements NULL are 0) results in 1.008.

@brownag brownag marked this pull request as ready for review May 13, 2022 20:59
@brownag
Copy link
Member Author

brownag commented May 13, 2022

I am going to merge this PR because of all the little bug fixes and getting the simpler queries working with SQLite. I think I have figured out an approach to handle TODO item 3, that with indexing of the temp tables may resolve item 1.

Notably most get_SDA_interpretation() and get_SDA_property() aggregation methods still will not work with SQLite sources: this will require a significant refactor. I am going to create an issue to track my ideas on future updates on those, and will do another PR when I have a prototype

@brownag brownag merged commit 8b7d035 into master May 13, 2022
@brownag brownag deleted the localSDA branch May 14, 2022 00:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants