`get_SDA_*()`: Local SDA-style queries #246

brownag · 2022-05-02T23:59:58Z

Work in Progress:

Implements dsn and SQLite<->T-SQL compatible syntax for get_SDA_*() methods

TODO:

Why is SQLite GROUP_CONCAT() so slow when run locally?
- Evaluate alternate FOR XML syntax options
Generalize "TOP 1" -> "LIMIT 1" syntax conversion
Generalize temporary table conversion

- currently only get_SDA_coecoclass(), get_SDA_cosurfmorph(), get_SDA_muaggatt(), get_SDA_pmgroupname(), get_SDA_property(method="none") have SQLite compatible syntax

- still working on T-SQL/SQLite cross-compatibility

dylanbeaudette · 2022-05-03T16:09:58Z

Nice, thanks for taking this on. I tested the get / createSSURGO code over the weekend and the resulting gpkg looked great. Can we point these queries to one of those databases as well as any sqlite DB with SSURGO schema?

brownag · 2022-05-03T16:42:43Z

Can we point these queries to one of those databases as well as any sqlite DB with SSURGO schema?

Yes, but it presently won't work for several queries that use temporary tables. Haven't come up with a workaround yet.

- some new non-exported helper functions for FGDB

- mirror get_SDA_interpretation type result for NASIS SSURGO exports etc. - mdb2sqlite

Add get_SSURGO_utils.R

….mukey` or `component.cokey`

brownag · 2022-05-05T22:49:21Z

Added a fix for another edge case reported by @dylanbeaudette.

In some instances when a weighted average was calculated the "rated percent" used to correct results for the proportion of the mapunit that had "averageable values" did not account for possibility of some horizons missing data in that depth interval. The rating incorrectly assumed that 100% of the mapunit / horizons were rated in these cases, resulting in values diluted by averaging in 0 values.

For example STATSGO MUKEY 660955 all the components are (apparently) minor components (per majcompflag), the dominant component Rindge has organic horizons without pH, but other components do have pH reported in [0,25]. The "expected" value is a pH that is not diluted by the lack of data for the Rindge components (i.e. using just 0 to 25cm data data from Gazwell, Egbert, Sailboat, and Columbia)

Before:

get_SDA_property("ph1to1h2o_r", "weighted average", top = 0, bottom = 25, mukeys = 660955, include_minors = TRUE)
#>  areasymbol musym                       muname  mukey ph1to1h2o_r
#> 1         US  s852 Rindge-Gazwell-Egbert (s852) 660955        2.49

After:

get_SDA_property("ph1to1h2o_r", "weighted average", top = 0, bottom = 25, mukeys = 660955, include_minors = TRUE)  # #> #> areasymbol musym                       muname  mukey ph1to1h2o_r
#> 1         US  s852 Rindge-Gazwell-Egbert (s852) 660955        6.24

Matching expected value: (0.25*5.8+0.06*7+0.05*7+0.02*6.7+0.02*7)/0.4 = 6.235

Un-aggregated data:

soilDB::get_SDA_property("ph1to1h2o_r", "none", mukeys = 660955, include_minors = TRUE)[,c("compname","comppct_r","hzdept_r","hzdepb_r","ph1to1h2o_r")]
#>    compname comppct_r hzdept_r hzdepb_r ph1to1h2o_r
#> 1    Rindge        35        0       33          NA
#> 2    Rindge        35       33      152          NA
#> 3   Gazwell        25        0       76         5.8
#> 4   Gazwell        25       76       91         6.1
#> 5   Gazwell        25       91      152         5.3
#> 6    Rindge        14        0       38          NA
#> 7    Rindge        14       38      152          NA
#> 8    Egbert         6        0       48         7.0
#> 9    Egbert         6       48       84         7.0
#> 10   Egbert         6       84      152         7.3
#> 11   Egbert         5        0       79         7.0
#> 12   Egbert         5       79      117         7.0
#> 13   Egbert         5      117      152         7.3
#> 14   Rindge         5        0       33          NA
#> 15   Rindge         5       33      152          NA
#> 16   Venice         4        0       30          NA
#> 17   Venice         4       30      152          NA
#> 18 Sailboat         2        0       41         6.7
#> 19 Sailboat         2       41       71         7.2
#> 20 Sailboat         2       71       86         7.9
#> 21 Sailboat         2       86      157         7.9
#> 22 Columbia         2        0       41         7.0
#> 23 Columbia         2       41       58         7.0
#> 24 Columbia         2       58      104         7.0
#> 25 Columbia         2      104      152         7.0
#> 26   Rindge         1        0      152          NA
#> 27   Venice         1        0       30          NA
#> 28   Venice         1       30      152          NA

@dylanbeaudette

…izons (adjusts RATING_PCT accordingly) - found by @dylanbeaudette, e.g. MUKEY 660955

brownag · 2022-05-06T21:33:37Z

Another fix: covers cases where some horizons (but not all) within rated components / depth interval are missing data for property of interest. Also omits some unneeded logic pertaining to soil texture--leftovers from prior filtering out organic materials/bedrock--that interfere with getting STATSGO component horizons without an RV texture (!!)

Rather than interpreting (some) missing data horizons as having data value of zero, weighting is adjusted for calculating the component-specific portion of the weighted average. Horizon-specific thickness weight and total thickness (over whole profile) are reduced--essentially just using the part(s) of each profile that has data.

I think that it is questionable what the "right" answer is under these circumstances, but this result should be better/more defensible than assuming NA == 0 in most cases which generally dilutes the mapunit-level result.

Before

#>   areasymbol musym                                              muname  mukey   ec_l
#> 1         US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517  1.13

After

library(soilDB)
get_SDA_property(
  "ec_l",
  method = "weighted average",
  mukey = 659517          ,
  top = 0,
  bottom = 25,
  include_minors = TRUE,
)
#>   areasymbol musym                                              muname  mukey
#> 1         US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#>   ec_l
#> 1 1.61

X <- get_SDA_property(
  "ec_l",
  method = "none",
  mukey = 659517          ,
  top = 0,
  bottom = 25,
  include_minors = TRUE
) |> 
  subset(hzdept_r <= 25)
X
#>    areasymbol musym                                              muname  mukey
#> 1          US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 2          US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 5          US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 6          US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 7          US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 8          US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 9          US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 10         US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 11         US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 12         US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 13         US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 14         US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#>      compname compkind comppct_r majcompflag    cokey    chkey compname.1
#> 1      Uffens   Series        25         No  14182030 40788857     Uffens
#> 2      Uffens   Series        25         No  14182030 40788854     Uffens
#> 5   Panitchen   Series        25         No  14182032 40788860  Panitchen
#> 6   Panitchen   Series        25         No  14182032 40788861  Panitchen
#> 7   Dominguez   Series        20         No  14182035 40788867  Dominguez
#> 8   Dominguez   Series        20         No  14182035 40788868  Dominguez
#> 9      Happle   Series        15         No  14182034 40788865     Happle
#> 10     Happle   Series        15         No  14182034 40788866     Happle
#> 11  Youngston   Series        10         No  14182031 40788858  Youngston
#> 12  Youngston   Series        10         No  14182031 40788859  Youngston
#> 13 Cowestglen   Series         5         No  14182033 40788862 Cowestglen
#> 14 Cowestglen   Series         5         No  14182033 40788863 Cowestglen
#>    comppct_r.1 majcompflag.1 hzdept_r hzdepb_r ec_l
#> 1           25           No         0       13    0
#> 2           25           No        13       51    4
#> 5           25           No         0       15   NA
#> 6           25           No        15      152    4
#> 7           20           No         0       23    0
#> 8           20           No        23      152    0
#> 9           15           No         0       18    0
#> 10          15           No        18      152    0
#> 11          10           No         0       10    0
#> 12          10           No        10      152    2
#> 13           5           No         0       23    0
#> 14           5           No        23      107    2

Same results are obtained doing aggregation in R

library(aqp)
depths(X) <- cokey ~ hzdept_r + hzdepb_r
site(X) <- ~ comppct_r
# X$ec_l[is.na(X$ec_l)] <- 0 # alternate NA assumption for comparison
x2 <- trunc(X, 0, 25) |>
  mutate_profile(na_thk = sum((hzdepb_r - hzdept_r)[is.na(ec_l)]),
                 ec_025 = sum(na.omit((hzdepb_r - hzdept_r) * ec_l)) / (25 - na_thk))
sum(x2$comppct_r / 100 * x2$ec_025)
#> [1] 1.608

Under the "alternate" assumption (where horizons with some data elements NULL are 0) results in 1.008.

…el result (compname, comppct, majcompflag)

brownag · 2022-05-13T21:06:17Z

I am going to merge this PR because of all the little bug fixes and getting the simpler queries working with SQLite. I think I have figured out an approach to handle TODO item 3, that with indexing of the temp tables may resolve item 1.

Notably most get_SDA_interpretation() and get_SDA_property() aggregation methods still will not work with SQLite sources: this will require a significant refactor. I am going to create an issue to track my ideas on future updates on those, and will do another PR when I have a prototype

brownag added 3 commits May 2, 2022 17:03

get_SDA_*: add support for dsn argument

149d358

- currently only get_SDA_coecoclass(), get_SDA_cosurfmorph(), get_SDA_muaggatt(), get_SDA_pmgroupname(), get_SDA_property(method="none") have SQLite compatible syntax

rename get_SDA_interpretations.R

b155f5d

Add local-SDA-demo.R

7859809

- still working on T-SQL/SQLite cross-compatibility

brownag force-pushed the localSDA branch from 85ba556 to 7859809 Compare May 3, 2022 00:03

brownag added 3 commits May 2, 2022 19:45

get_SDA_cosurfmorph: fix CONCAT

bce1bf0

get_SDA_interpretation: TOP 1 / LIMIT 1

63f62a6

get_SDA_hydric/get_SDA_pmgroupname

e371016

brownag and others added 3 commits May 3, 2022 17:11

createSSURGO: updates

2611320

- some new non-exported helper functions for FGDB

docs

dd92152

SSURGO schema-centric tools

06639fa

- mirror get_SDA_interpretation type result for NASIS SSURGO exports etc. - mdb2sqlite

brownag mentioned this pull request May 4, 2022

Add get_SSURGO_utils.R #247

Merged

brownag added 6 commits May 3, 2022 17:18

Merge pull request #247 from ncss-tech/accessSSURGOexport

1705aaf

Add get_SSURGO_utils.R

move

84010ee

createSSURGO: fixes

bdd6acf

Add compkind to method="none" (component or horizon level) output

e37e256

Fix for ambiguous by variables such as mukey => use e.g. `mapunit…

55a7c7f

….mukey` or `component.cokey`

Merge branch 'master' into localSDA

8a436d8

brownag added 3 commits May 5, 2022 15:50

get_SDA_property: fix for cases where a property is NULL for some hor…

ec28f15

…izons (adjusts RATING_PCT accordingly) - found by @dylanbeaudette, e.g. MUKEY 660955

Simplify NOT NULL logic

6c2fdd0

More updates for get_SDA_property

974e812

get_SDA_property: remove duplicated site-level columns in horizon-lev…

bc9403b

…el result (compname, comppct, majcompflag)

brownag marked this pull request as ready for review May 13, 2022 20:59

brownag merged commit 8b7d035 into master May 13, 2022

brownag mentioned this pull request May 13, 2022

Implement SQLite/T-SQL compatible syntax for get_SDA_*() methods #250

Open

2 tasks

brownag deleted the localSDA branch May 14, 2022 00:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`get_SDA_*()`: Local SDA-style queries #246

`get_SDA_*()`: Local SDA-style queries #246

brownag commented May 2, 2022 •

edited

Loading

dylanbeaudette commented May 3, 2022

brownag commented May 3, 2022

brownag commented May 5, 2022

brownag commented May 6, 2022

brownag commented May 13, 2022

get_SDA_*(): Local SDA-style queries #246

get_SDA_*(): Local SDA-style queries #246

Conversation

brownag commented May 2, 2022 • edited Loading

dylanbeaudette commented May 3, 2022

brownag commented May 3, 2022

brownag commented May 5, 2022

brownag commented May 6, 2022

Before

brownag commented May 13, 2022

`get_SDA_*()`: Local SDA-style queries #246

`get_SDA_*()`: Local SDA-style queries #246

brownag commented May 2, 2022 •

edited

Loading