Handling NAs in covariate data #23

Open
sigmafelix opened this issue Feb 28, 2024 · 7 comments

Comments

@sigmafelix
Collaborator

  • Some NAs are not true zeros; they come from categories that do not exist at a location (e.g., soil chemistry)
  • Points not covered by a thematic layer also produce NAs in the covariate data
  • NAs that are truly zero will be replaced with zero
  • Open question: impute the remaining NAs, or exclude the affected fields?
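
To make the zero-replacement rule concrete, here is a minimal sketch in plain Python. It assumes someone maintains an explicit list of fields where NA really means zero; the field names (`nass_corn_prop`, `soil_As`) are hypothetical and not taken from the actual table. Fields outside that list keep their NAs so they can be imputed or dropped later.

```python
# Hypothetical list of fields where a missing value means "none present".
# Which fields belong here is an assumption to be decided per data source.
TRUE_ZERO_FIELDS = {"nass_corn_prop"}


def fill_true_zeros(rows, true_zero_fields):
    """Return copies of rows with None replaced by 0.0 in true-zero fields only."""
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for field in true_zero_fields:
            if field in new_row and new_row[field] is None:
                new_row[field] = 0.0
        filled.append(new_row)
    return filled


# Illustrative records, not real data.
rows = [
    {"site": "A", "nass_corn_prop": None, "soil_As": 4.2},
    {"site": "B", "nass_corn_prop": 0.31, "soil_As": None},
]
filled = fill_true_zeros(rows, TRUE_ZERO_FIELDS)
```

The unmeasured `soil_As` value for site B stays as `None`, which keeps the "unknown" cases separate from the "zero" cases.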
@kyle-messier
Collaborator

@sigmafelix If a category exists at some locations but is non-existent at a given location, is it not a true zero?

@sigmafelix
Collaborator Author

@Spatiotemporal-Exposures-and-Toxicology I think these values are unmeasured (unknown) rather than true zeros. For example, in the soil chemistry data, some locations have measurements for tens of elements while others have only a handful. The latter will have NAs in the fields for elements that were measured only at the former locations.
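
One way to see how uneven the measurements are is to compute the share of NAs per field and use it to decide which fields to keep. This is only a sketch: the element names, the values, and the 0.5 cutoff are illustrative assumptions, not numbers from the soil chemistry data.

```python
def na_fraction(rows, fields):
    """Fraction of rows in which each field is missing (None or absent)."""
    n = len(rows)
    return {f: sum(1 for r in rows if r.get(f) is None) / n for f in fields}


def split_fields(fractions, max_na=0.5):
    """Split fields into (keep, drop) by an assumed NA-fraction cutoff."""
    keep = sorted(f for f, frac in fractions.items() if frac <= max_na)
    drop = sorted(f for f, frac in fractions.items() if frac > max_na)
    return keep, drop


# Illustrative samples: "Se" was measured at only one of four locations.
samples = [
    {"As": 4.2, "Pb": 11.0, "Se": 0.3},
    {"As": 3.9, "Pb": None, "Se": None},
    {"As": 5.1, "Pb": 9.5, "Se": None},
    {"As": None, "Pb": 10.2, "Se": None},
]
fractions = na_fraction(samples, ["As", "Pb", "Se"])
keep, drop = split_fields(fractions, max_na=0.5)
```

Fields in `keep` would be candidates for imputation, and fields in `drop` candidates for exclusion.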

@kyle-messier
Collaborator

@sigmafelix can you direct me to the source of the soil chemistry data generation?

@sigmafelix
Collaborator Author

@sigmafelix
Collaborator Author

@Spatiotemporal-Exposures-and-Toxicology data_AZO_covariates.qs (./output in the project directory) contains HUC-8, HUC-10, and HUC-12 level terraClimate and PRISM covariates. Point-based fields were removed. For terraClimate variables, we need to decide which fields should be summed and which averaged. Since the qs data file currently has all of these fields, we could remove the sum or mean fields that do not suit a given variable. My suggestion for selecting variables is:

  • sum: aet, def, pet, ppt, q, soil, swe (mean?)
  • mean: PDSI, srad, tmax (max?), tmin (min?), vap, vpd, ws
# aet (Actual Evapotranspiration, monthly total), units = mm
# def (Climate Water Deficit, monthly total), units = mm
# PDSI (Palmer Drought Severity Index, at end of month), units = unitless
# pet (Potential evapotranspiration, monthly total), units = mm
# ppt (Precipitation, monthly total), units = mm
# q (Runoff, monthly total), units = mm
# soil (Soil Moisture, total column - at end of month), units = mm
# srad (Downward surface shortwave radiation), units = W/m2
# swe (Snow water equivalent - at end of month), units = mm
# tmax (Max Temperature, average for month), units = C
# tmin (Min Temperature, average for month), units = C
# vap (Vapor pressure, average for month), units = kPa
# vpd (Vapor Pressure Deficit, average for month), units = kPa
# ws (Wind speed, average for month), units = m/s
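
The sum/mean split above could be written down as a lookup, roughly as in the sketch below. It assumes the aggregation runs over a sequence of monthly values; the thread does not say whether the aggregation is temporal or spatial, so treat that as an assumption. The open questions (mean for swe, max for tmax, min for tmin) are not handled here, and `aggregate` is a hypothetical helper name.

```python
import math

# Variable-to-rule assignment as proposed in the list above.
SUM_VARS = {"aet", "def", "pet", "ppt", "q", "soil", "swe"}
MEAN_VARS = {"PDSI", "srad", "tmax", "tmin", "vap", "vpd", "ws"}


def aggregate(var, values):
    """Sum or average the non-missing values according to the variable's rule."""
    valid = [v for v in values if v is not None]
    if not valid:
        return None  # nothing measured: stay missing rather than report 0
    if var in SUM_VARS:
        # Skipping missing months makes the total a lower bound.
        return math.fsum(valid)
    if var in MEAN_VARS:
        return math.fsum(valid) / len(valid)
    raise KeyError(f"no aggregation rule for {var!r}")


# Illustrative values only.
ppt_total = aggregate("ppt", [10.0, 20.0, None, 30.0])
tmax_mean = aggregate("tmax", [10.0, 20.0, 30.0])
all_missing = aggregate("ws", [None, None])
```

Returning `None` when every value is missing keeps an unmeasured unit from turning into a zero total.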

@kyle-messier
Collaborator

@sigmafelix I like your recommendations - I think we should keep it simple and only do sum or mean.

As for the point-based fields: are there no exact pixel extractions anymore? I'm good with that, but just wanted to check.

@sigmafelix
Collaborator Author

@Spatiotemporal-Exposures-and-Toxicology
For terraClimate and PRISM, there are no pixel extractions. Soil chemistry, aquifer (rock type), geology unit type, and pesticide estimates (county level) are extracted at point locations. NASS variables were converted to proportions. Per our decision, I excluded unnecessary terraClimate variables from the table and cleaned field names to align with the prefix table (data_AZO_covariates_prefixes.csv). The result is saved as data_AZO_covariates_cleaned_03032024.qs (all in ddn).
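
For reference, a proportion conversion like the one applied to the NASS variables can be sketched as below. This assumes each category is divided by the total across categories within a unit, which may differ from the denominator actually used; the category names and areas are made up.

```python
def to_proportions(areas):
    """Convert per-category areas to shares of their total."""
    total = sum(areas.values())
    if total == 0:
        # No area in any category: every share is a true zero, not an NA.
        return {name: 0.0 for name in areas}
    return {name: value / total for name, value in areas.items()}


# Illustrative areas for one unit.
props = to_proportions({"corn": 30.0, "soy": 10.0, "other": 60.0})
empty = to_proportions({"corn": 0.0, "soy": 0.0})
```

The zero-total branch is one case where replacing with zero is safe, because the category was assessed and found absent.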

2 participants