DBH checks #24

Open
ValentineHerr opened this issue Apr 11, 2023 · 4 comments
@ValentineHerr
Member

We need to align our thresholds between the app and the GitHub actions, so @jess-shue will try to add the following statements to the app's DBH checks (on top of the absolute checks of -0.5cm for negative and 4cm for positive).

dbh_previous*0.75 > dbh_current & !is.na(dbh_previous) & !is.na(dbh_current) # suspiciousNegativeGrowth
dbh_previous*1.92 < dbh_current & !is.na(dbh_previous) & !is.na(dbh_current) # suspiciousPositiveGrowth
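
For illustration, the four checks could be bundled into one helper. This is only a sketch: `flag_dbh` and its argument names are hypothetical, DBH is assumed to be in cm, and the absolute limits are assumed to apply to the change `dbh_current - dbh_previous`, which this thread does not spell out.

```r
# Hypothetical helper combining the absolute and relative DBH checks.
# Assumption: the absolute limits (-0.5 cm, 4 cm) apply to the change
# dbh_current - dbh_previous.
flag_dbh <- function(dbh_previous, dbh_current,
                     abs_neg = -0.5, abs_pos = 4,
                     rel_neg = 0.75, rel_pos = 1.92) {
  both_present <- !is.na(dbh_previous) & !is.na(dbh_current)
  change <- dbh_current - dbh_previous
  data.frame(
    absoluteNegative = both_present & change < abs_neg,
    absolutePositive = both_present & change > abs_pos,
    suspiciousNegativeGrowth = both_present & dbh_previous * rel_neg > dbh_current,
    suspiciousPositiveGrowth = both_present & dbh_previous * rel_pos < dbh_current
  )
}

# Made-up measurements: a small tree that doubled, a large shrinkage,
# a large absolute gain, and a tree with no previous DBH.
flags <- flag_dbh(dbh_previous = c(1.0, 10.0, 20.0, NA),
                  dbh_current  = c(2.0,  7.0, 25.0, 5.0))
print(flags)
```

In this made-up example the first tree (1 cm to 2 cm) is caught only by the relative positive check, and the tree with a missing previous DBH is never flagged.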
@teixeirak
Member

@ValentineHerr and @jess-shue , I don't think there's any point in programming these into both the GIS app and GitHub actions. I think what makes most sense is:

1- GIS app flags absolute and possibly also relative anomalies in what has to be a relatively simple formulation that will not capture all anomalies
2- GitHub actions can be programmed to flag anomalies defined by size- and species- specific functions derived from past data. This would be ideal but is a bit complicated and not essential.
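
One possible reading of option 2, sketched under heavy assumptions: derive lower and upper growth bounds per species and size class from past censuses, then flag new measurements that fall outside them. The column names, size-class breaks, quantile levels, and the functions `growth_bounds` and `flag_by_group` are all hypothetical, and the data are made up.

```r
# Illustrative size classes in cm (an assumption, not from the project).
size_breaks <- c(0, 10, 50, Inf)

# Empirical growth bounds per species and size class from past data.
growth_bounds <- function(past, probs = c(0.01, 0.99)) {
  past$growth <- past$dbh_current - past$dbh_previous
  past$size_class <- as.character(cut(past$dbh_previous, size_breaks))
  lower <- aggregate(growth ~ species + size_class, data = past,
                     FUN = function(g) quantile(g, probs[1], names = FALSE))
  upper <- aggregate(growth ~ species + size_class, data = past,
                     FUN = function(g) quantile(g, probs[2], names = FALSE))
  names(lower)[3] <- "lower"
  names(upper)[3] <- "upper"
  merge(lower, upper, by = c("species", "size_class"))
}

# Flag new measurements outside the bounds of their species/size group.
# Groups with no past data are left unflagged here.
flag_by_group <- function(new, bounds) {
  new$growth <- new$dbh_current - new$dbh_previous
  new$size_class <- as.character(cut(new$dbh_previous, size_breaks))
  out <- merge(new, bounds, by = c("species", "size_class"), all.x = TRUE)
  out$flag <- !is.na(out$lower) & !is.na(out$growth) &
    (out$growth < out$lower | out$growth > out$upper)
  out
}

# Made-up past data for two hypothetical species codes.
prev <- c(2, 4, 6, 8, 9, 12, 20, 30, 40, 45)
past <- data.frame(
  species = rep(c("A", "B"), each = 5),
  dbh_previous = prev,
  dbh_current = prev + c(0.1, 0.2, 0.3, 0.2, 0.1, 0.5, 1.0, 1.5, 1.0, 0.8)
)
bounds <- growth_bounds(past)

new <- data.frame(tag = 1:4,
                  species = c("A", "A", "B", "C"),
                  dbh_previous = c(5, 5, 25, 5),
                  dbh_current = c(5.2, 7, 26, 9))
res <- flag_by_group(new, bounds)
res <- res[order(res$tag), ]
print(res[, c("tag", "species", "growth", "lower", "upper", "flag")])
```

Note that a species with no history (tag 4 here) gets no bounds and so is not flagged, which is one reason a simple fallback check would still be needed alongside this approach.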

@jess-shue and @mitreds , how important do you think number 2 would be in catching errors?

@ValentineHerr
Member Author

I agree that there is no use in having the same check in both.

But I think it is worth adding to the app's checks, because the GitHub Actions workflow is already flagging suspicious DBH increases that the app's system did not flag. We don't want to end up with too many trees to go back and check.

The flags are mostly for small trees that doubled in size, which, I admit, probably won't be flagged by a size- and species- specific function, but fine tuning that function will take time, and the more we have the field crew add the "I double checked in the field" code, the less we will need them to go back to the field later to re-find the tree and check the measurement.
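
A quick calculation shows why the relative flags fall mostly on small trees. It assumes, as an interpretation, that the absolute limits apply to the change `dbh_current - dbh_previous` in cm; the variable names are illustrative.

```r
# Thresholds quoted earlier in this thread.
abs_pos <- 4;    rel_pos <- 1.92
abs_neg <- -0.5; rel_neg <- 0.75

# The relative positive check flags growth > (rel_pos - 1) * dbh_previous,
# so it is stricter than the absolute check only below this previous DBH:
crossover_pos <- abs_pos / (rel_pos - 1)    # about 4.35 cm

# The relative negative check flags shrinkage > (1 - rel_neg) * dbh_previous,
# so it is stricter than the absolute check only below this previous DBH:
crossover_neg <- -abs_neg / (1 - rel_neg)   # 2 cm

# Smallest growth (cm) that triggers either positive flag, by previous DBH.
sizes <- c(1, 2, 5, 10, 50)
min_flagged_growth <- pmin(abs_pos, (rel_pos - 1) * sizes)
print(data.frame(dbh_previous = sizes, min_flagged_growth))
```

Under these assumptions, the relative positive check adds new flags only for trees with a previous DBH under roughly 4.35 cm. Above that size, the 4 cm absolute limit is already the stricter of the two.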

Also, thinking out loud, I am wondering about the implications of having such a fine mesh to detect DBH errors when previous censuses didn't have one. The variability of measurement error will be reduced. Would that matter for analysis in the future?

@teixeirak
Member

I completely agree that it's best to program checks into the app whenever possible. It's always easiest to catch the errors while they're at the tree.

I agree that reducing the measurement error will be a change from previous censuses, but I think it's purely a positive change. The question is whether it would become too burdensome for the crew to go back and check suspicious measurements. I think it will take a bit of trial to optimize the criteria for flagging potential errors, and there's definitely some philosophical calls there (what level of error is tolerable? what's the optimal tradeoff between error-free data and greater efficiency?).

@teixeirak
Member

teixeirak commented Jul 12, 2023

I just spoke with Madeleine Udell (Stanford) about turning this issue into a project for data science MS students at Stanford (https://docs.google.com/document/d/18uXErzdAAf8DYM67JZVykz6wyXPhNS1ObTy4cLFTIqs/edit). The idea would be to use unsupervised machine learning to detect outliers.
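
The thread does not say which method such a project would use. As a very simple stand-in for unsupervised outlier detection, here is a robust z-score (median and MAD) applied to growth. The function name `robust_outliers`, the cutoff of 3.5, and the data are assumptions for illustration only.

```r
# Robust z-score outlier detection on DBH growth (median / MAD).
# Illustration only; not the method proposed for the project.
robust_outliers <- function(growth, cutoff = 3.5) {
  center <- median(growth, na.rm = TRUE)
  spread <- mad(growth, na.rm = TRUE)  # MAD scaled by 1.4826 (base R default)
  z <- (growth - center) / spread
  !is.na(z) & abs(z) > cutoff          # missing growth is never flagged
}

# Made-up growth values (cm) with two obvious anomalies and one missing value.
growth <- c(0.2, 0.3, 0.1, 0.4, 0.25, 6.0, -2.5, NA)
outlier <- robust_outliers(growth)
print(data.frame(growth, outlier))
```

Unlike fixed thresholds, this needs no labels or hand-picked limits beyond the cutoff, but it treats all trees as one population. A real project would presumably account for size and species.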
