-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
get_SDA_*()
: Local SDA-style queries
#246
Conversation
- currently only get_SDA_coecoclass(), get_SDA_cosurfmorph(), get_SDA_muaggatt(), get_SDA_pmgroupname(), get_SDA_property(method="none") have SQLite compatible syntax
- still working on T-SQL/SQLite cross-compatibility
Nice, thanks for taking this on. I tested the get / createSSURGO code over the weekend and the resulting gpkg looked great. Can we point these queries to one of those databases as well as any sqlite DB with SSURGO schema? |
Yes, but it presently won't work for several queries that use temporary tables. Haven't come up with a workaround yet. |
- some new non-exported helper functions for FGDB
- mirror get_SDA_interpretation type result for NASIS SSURGO exports etc. - mdb2sqlite
Add get_SSURGO_utils.R
….mukey` or `component.cokey`
Added a fix for another edge case reported by @dylanbeaudette. In some instances when a weighted average was calculated the "rated percent" used to correct results for the proportion of the mapunit that had "averageable values" did not account for possibility of some horizons missing data in that depth interval. The rating incorrectly assumed that 100% of the mapunit / horizons were rated in these cases, resulting in values diluted by averaging in 0 values. For example STATSGO MUKEY 660955 all the components are (apparently) minor components (per majcompflag), the dominant component Rindge has organic horizons without pH, but other components do have pH reported in [0,25]. The "expected" value is a pH that is not diluted by the lack of data for the Rindge components (i.e. using just 0 to 25cm data data from Gazwell, Egbert, Sailboat, and Columbia) Before:
After: get_SDA_property("ph1to1h2o_r", "weighted average", top = 0, bottom = 25, mukeys = 660955, include_minors = TRUE) # #> #> areasymbol musym muname mukey ph1to1h2o_r
#> 1 US s852 Rindge-Gazwell-Egbert (s852) 660955 6.24 Matching expected value: Un-aggregated data: soilDB::get_SDA_property("ph1to1h2o_r", "none", mukeys = 660955, include_minors = TRUE)[,c("compname","comppct_r","hzdept_r","hzdepb_r","ph1to1h2o_r")]
#> compname comppct_r hzdept_r hzdepb_r ph1to1h2o_r
#> 1 Rindge 35 0 33 NA
#> 2 Rindge 35 33 152 NA
#> 3 Gazwell 25 0 76 5.8
#> 4 Gazwell 25 76 91 6.1
#> 5 Gazwell 25 91 152 5.3
#> 6 Rindge 14 0 38 NA
#> 7 Rindge 14 38 152 NA
#> 8 Egbert 6 0 48 7.0
#> 9 Egbert 6 48 84 7.0
#> 10 Egbert 6 84 152 7.3
#> 11 Egbert 5 0 79 7.0
#> 12 Egbert 5 79 117 7.0
#> 13 Egbert 5 117 152 7.3
#> 14 Rindge 5 0 33 NA
#> 15 Rindge 5 33 152 NA
#> 16 Venice 4 0 30 NA
#> 17 Venice 4 30 152 NA
#> 18 Sailboat 2 0 41 6.7
#> 19 Sailboat 2 41 71 7.2
#> 20 Sailboat 2 71 86 7.9
#> 21 Sailboat 2 86 157 7.9
#> 22 Columbia 2 0 41 7.0
#> 23 Columbia 2 41 58 7.0
#> 24 Columbia 2 58 104 7.0
#> 25 Columbia 2 104 152 7.0
#> 26 Rindge 1 0 152 NA
#> 27 Venice 1 0 30 NA
#> 28 Venice 1 30 152 NA |
…izons (adjusts RATING_PCT accordingly) - found by @dylanbeaudette, e.g. MUKEY 660955
Another fix: covers cases where some horizons (but not all) within rated components / depth interval are missing data for property of interest. Also omits some unneeded logic pertaining to soil texture--leftovers from prior filtering out organic materials/bedrock--that interfere with getting STATSGO component horizons without an RV texture (!!) Rather than interpreting (some) missing data horizons as having data value of zero, weighting is adjusted for calculating the component-specific portion of the weighted average. Horizon-specific thickness weight and total thickness (over whole profile) are reduced--essentially just using the part(s) of each profile that has data. I think that it is questionable what the "right" answer is under these circumstances, but this result should be better/more defensible than assuming NA == 0 in most cases which generally dilutes the mapunit-level result. Before
After library(soilDB)
get_SDA_property(
"ec_l",
method = "weighted average",
mukey = 659517 ,
top = 0,
bottom = 25,
include_minors = TRUE,
)
#> areasymbol musym muname mukey
#> 1 US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> ec_l
#> 1 1.61
X <- get_SDA_property(
"ec_l",
method = "none",
mukey = 659517 ,
top = 0,
bottom = 25,
include_minors = TRUE
) |>
subset(hzdept_r <= 25)
X
#> areasymbol musym muname mukey
#> 1 US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 2 US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 5 US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 6 US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 7 US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 8 US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 9 US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 10 US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 11 US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 12 US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 13 US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> 14 US s1196 Youngston-Uffens-Panitchen-Happle-Dominguez (s1196) 659517
#> compname compkind comppct_r majcompflag cokey chkey compname.1
#> 1 Uffens Series 25 No 14182030 40788857 Uffens
#> 2 Uffens Series 25 No 14182030 40788854 Uffens
#> 5 Panitchen Series 25 No 14182032 40788860 Panitchen
#> 6 Panitchen Series 25 No 14182032 40788861 Panitchen
#> 7 Dominguez Series 20 No 14182035 40788867 Dominguez
#> 8 Dominguez Series 20 No 14182035 40788868 Dominguez
#> 9 Happle Series 15 No 14182034 40788865 Happle
#> 10 Happle Series 15 No 14182034 40788866 Happle
#> 11 Youngston Series 10 No 14182031 40788858 Youngston
#> 12 Youngston Series 10 No 14182031 40788859 Youngston
#> 13 Cowestglen Series 5 No 14182033 40788862 Cowestglen
#> 14 Cowestglen Series 5 No 14182033 40788863 Cowestglen
#> comppct_r.1 majcompflag.1 hzdept_r hzdepb_r ec_l
#> 1 25 No 0 13 0
#> 2 25 No 13 51 4
#> 5 25 No 0 15 NA
#> 6 25 No 15 152 4
#> 7 20 No 0 23 0
#> 8 20 No 23 152 0
#> 9 15 No 0 18 0
#> 10 15 No 18 152 0
#> 11 10 No 0 10 0
#> 12 10 No 10 152 2
#> 13 5 No 0 23 0
#> 14 5 No 23 107 2 Same results are obtained doing aggregation in R
Under the "alternate" assumption (where horizons with some data elements NULL are 0) results in |
…el result (compname, comppct, majcompflag)
I am going to merge this PR because of all the little bug fixes and getting the simpler queries working with SQLite. I think I have figured out an approach to handle TODO item 3, that with indexing of the temp tables may resolve item 1. Notably most |
Work in Progress:
dsn
and SQLite<->T-SQL compatible syntax forget_SDA_*()
methodsTODO:
GROUP_CONCAT()
so slow when run locally?FOR XML
syntax options