Temporal dimension in calc_<> functions #112

eva0marques · 2024-07-26T18:50:55Z

I am writing process_ and calc_ functions for other covariates that I need in my own project.
I would like to open a discussion on the spatiotemporal case.

Let's say I want to build an AI model to predict temperature at several locations × timestamps. I need to extract spatial covariates (easy) but also spatiotemporal ones.

In my ideal world, I would:

- create a SpatVector, data.frame, or sf/sftime object with both geometry and time columns, and pass it as the `locs` parameter;
- use calc_ functions to add one column per covariate (spatial or spatiotemporal). The calc_ functions for spatiotemporal covariates would handle the time dimension according to the user's criteria. For example, if geophysical model outputs are available every 3 days and my predictions are daily, calc_ downscales the temporal resolution; it can also do the opposite if I have hourly data.

It would look like this:

```r
my_spacetime_sample |>
  calc_era5() |>
  calc_nlcd() |>
  calc_gmted() |>
  ...
```

For now, calc_ functions are not designed with the temporal dimension in mind. They assume that `locs` is a spatial data frame without a time column. When calculating spatiotemporal covariates, they extract the entire time series from `from`. But if `locs` already has a time column (e.g., one created by calculating another spatiotemporal covariate), the result becomes a mess.
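To make the problem concrete, here is a toy base R illustration with made-up data (the `site_id` key and all column names are assumptions, not amadeus output). It shows what happens when a covariate extracted as a full time series per location is joined onto a `locs` table that already has a time column: joining on location alone multiplies rows, while joining on location and time keeps one row per location × timestamp.

```r
# Toy data (hypothetical): locs already carries a time column from a
# previous spatiotemporal covariate.
locs <- data.frame(
  site_id = rep(c("A", "B"), each = 2),
  time = as.Date(c("2024-07-01", "2024-07-02", "2024-07-01", "2024-07-02")),
  covar_1 = c(20.1, 21.3, 18.7, 19.2)
)
# A second spatiotemporal covariate extracted as the full series at each site
ts_covar <- data.frame(
  site_id = rep(c("A", "B"), each = 3),
  time = rep(as.Date(c("2024-07-01", "2024-07-02", "2024-07-03")), 2),
  covar_2 = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
)
# Joining on space only: every locs row meets every time step of its site
naive <- merge(locs, ts_covar, by = "site_id")
nrow(naive)    # 12 rows instead of 4, with time.x and time.y columns
# Joining on space and time keeps one row per location x timestamp
aligned <- merge(locs, ts_covar, by = c("site_id", "time"))
nrow(aligned)  # 4
```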

In summary, I see the following limitations in the current version of calc_:

- we cannot chain several calc_ functions (i.e., pass the output of one calc_ function as the input of another) once spatiotemporal covariates are involved;
- unlike the spatial dimension, the temporal dimension is not fine-tuned during extraction;
- the user still has a lot of work to do to merge all covariates into a single spatiotemporal table, especially when covariates are not indexed on the same time steps.

It is not urgent, of course, but I think it would be worth addressing in the future to make amadeus easier to use.


eva0marques · 2024-07-26T19:09:50Z

My suggestions to improve this situation:

- add a `time_column` parameter to the calc_ functions for spatiotemporal covariates (narr, geos, hms, gridmet, terraclimate): a character string naming the time column in `locs`;
- check that `time_column` exists in `locs` (I would also rename `locs` to `sample`, `points`, or something more general rather than explicitly spatial) and that its format is correct (e.g., POSIXct with date and time);
- add a parameter for the time-matching method (nearest, downscale, mean, median, preceding, following...);
- create a function that extracts values at each timestamp using the chosen method:
  1. extract the full time series at each loc;
  2. create and use a function `find_time(time_pts, time_cov, method)`;
  3. for each loc × time, extract the value at the matching covariate date.
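A minimal sketch of the validation step and of `find_time()`, assuming `locs` is a data.frame (or sf object) and that times are POSIXct. Both helpers are hypothetical (neither exists in amadeus), and only the nearest, preceding, and following methods are covered; mean, median, or downscaling would need an aggregation step on top of this. `find_time()` returns, for each point time, the index of the matching covariate time, using base R `findInterval()` over the sorted covariate times.

```r
# Hypothetical helpers sketching the proposal; neither exists in amadeus.

# Check that `time_column` exists in `locs` and holds POSIXct values.
check_time_column <- function(locs, time_column) {
  if (!time_column %in% names(locs)) {
    stop("Column '", time_column, "' not found in locs.")
  }
  if (!inherits(locs[[time_column]], "POSIXct")) {
    stop("Column '", time_column, "' must be POSIXct.")
  }
  invisible(TRUE)
}

# For each point time, return the index of the matching covariate time.
# NA means no match (e.g. no covariate time precedes the point time).
find_time <- function(time_pts, time_cov,
                      method = c("nearest", "preceding", "following")) {
  method <- match.arg(method)
  x <- as.numeric(time_pts)
  v <- as.numeric(time_cov)
  stopifnot(length(v) > 0, !is.unsorted(v))
  # lo: index of the last covariate time <= point time (0 when none)
  lo <- findInterval(x, v)
  # hi: index of the first covariate time >= point time
  hi <- ifelse(lo > 0 & v[pmax(lo, 1L)] == x, lo, lo + 1L)
  lo[lo == 0L] <- NA
  hi[hi > length(v)] <- NA
  switch(method,
    preceding = lo,
    following = hi,
    nearest = {
      d_lo <- ifelse(is.na(lo), Inf, x - v[lo])
      d_hi <- ifelse(is.na(hi), Inf, v[hi] - x)
      ifelse(d_lo <= d_hi, lo, hi)  # ties resolve to the preceding time
    }
  )
}
```

For step 3, the returned indices can then select values from each location's extracted series, e.g. `series_values[find_time(locs$time, series_times, "nearest")]`, where `series_values` and `series_times` are hypothetical names for one location's extracted values and their timestamps.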

mitchellmanware · 2024-08-01T14:46:38Z

- Temporal summaries + download inputs
- Data frame "inflation" for static spatial variables
- Synchronize calc_* functions so that the output of calc_1 is used as `locs` in calc_2

```r
calc_1() |>
  calc_2() |>
  calc_3()
```

eva0marques · 2024-08-01T15:04:31Z

In calc_ pipes, it would be easier to distinguish spatiotemporal points from spatial points 🤔 (and possibly include an inflate function to go from the spatial pipe to the spatiotemporal one):

If the goal is to create a data table to feed AI models:

```r
my_spatial_sample |>
  calc_nlcd() |>
  calc_gmted() |>
  ... |>
  inflate_to_spatiotemporal(timestamps) |>
  calc_era5() |>
  calc_modis() |>
  ...
```
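A minimal sketch of what `inflate_to_spatiotemporal()` could do, assuming the spatial sample is a plain data.frame (an sf or SpatVector object would need its own handling). The function name comes from the pipeline above, but the `time_column` argument and the implementation are assumptions: every row, with its static covariates, is repeated once per timestamp through the Cartesian product that `merge(..., by = NULL)` returns.

```r
# Hypothetical sketch; not an amadeus function.
inflate_to_spatiotemporal <- function(sample, timestamps, time_column = "time") {
  # refuse to overwrite an existing time column
  stopifnot(!time_column %in% names(sample))
  times <- data.frame(timestamps)
  names(times) <- time_column
  # by = NULL: Cartesian product, one row per sample row x timestamp
  merge(sample, times, by = NULL)
}

# Usage: three sites with a static covariate, inflated over two days
sample <- data.frame(site_id = c("A", "B", "C"), elevation = c(10, 250, 40))
timestamps <- as.POSIXct(c("2024-07-01", "2024-07-02"), tz = "UTC")
inflated <- inflate_to_spatiotemporal(sample, timestamps)
```

The static covariates are computed once and only repeated afterwards, which is why the spatial calc_ steps can sit before the inflation step in the pipe.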

If the goal is to store the calculated points efficiently:

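```r
my_spatial_sample |>
  calc_nlcd() |>
  calc_gmted() |>
  ... |>
  saveRDS("spatial_covariates.rds")  # illustrative file name

my_spatiotemporal_sample |>
  calc_era5() |>
  calc_modis() |>
  ... |>
  saveRDS("spatiotemporal_covariates.rds")  # illustrative file name
```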

mitchellmanware · 2024-08-01T15:11:36Z

One option is to update the static calc_ functions with an `inflate` parameter. If `inflate = TRUE`, the function returns a spatiotemporal data frame (the "feed AI models" example); if `inflate = FALSE`, it returns a list containing a vector of dates and a single spatial data frame (the "efficiency" example).

Either way, refactoring the calc_ functions to retain the columns of `locs` so that the output can be used in a pipe should not be too difficult.
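A toy sketch of the "retain columns of `locs`" idea, with hypothetical names: `calc_toy()`, the `site_id` key, and the lookup tables are assumptions, not the amadeus API (real calc_ functions take `from` first). Each step left-joins its covariate onto `locs`, so columns added by earlier steps survive and the output can feed the next step in a native pipe (R >= 4.1).

```r
# Hypothetical pipe-friendly calc_ step: `locs` first, covariates appended.
calc_toy <- function(locs, from, by = "site_id") {
  # left join: every row and column of `locs` is kept
  out <- merge(locs, from, by = by, all.x = TRUE, sort = FALSE)
  # duplicated keys in `from` would silently add rows, so guard against it
  stopifnot(nrow(out) == nrow(locs))
  out
}

locs <- data.frame(site_id = c("A", "B"))
elevation <- data.frame(site_id = c("A", "B"), elevation = c(120, 35))
land_cover <- data.frame(site_id = c("A", "B"), forest_pct = c(0.4, 0.1))

# Output of one step becomes `locs` of the next
result <- locs |>
  calc_toy(from = elevation) |>
  calc_toy(from = land_cover)
```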

mitchellmanware · 2024-08-01T15:13:06Z

Something like this:

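```r
# `covar` (static covariates per location) and `dates` stand in for the
# objects a calc_ function would hold at this point; names are illustrative.
return_inflated <- function(covar, dates, inflate = TRUE) {
  if (inflate) {
    message("Returning a data.frame with ... because inflate = TRUE")
    # no shared columns + by = NULL: every location x date combination
    inflated <- merge(covar, data.frame(time = dates), by = NULL)
    return(inflated)
  } else {
    message("Returning a list with ... because inflate = FALSE")
    return(list(dates = dates, data = covar))
  }
}
```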

eva0marques · 2024-08-01T15:17:50Z

Yes, that is also an interesting solution. I would still make the inflate() function available to amadeus users, because they might want to use it separately. For example, you could store the non-inflated sample, reload it, and apply the inflate function without recalculating everything.

sigmafelix · 2024-08-07T14:41:46Z

@eva0marques

Sorry I am late to the discussion. As @mitchellmanware suggested, I think a hands-on solution is to add a few lines to calc_return_locs along with an inflate argument. One thing to consider is how the "full" space-time combinations are inferred or furnished. This can be implemented either with a fixed set of field names (i.e., lon, lat, and time) or with an additional argument for a full space-time combination template (built with expand.grid, for example). I think the former is the more hands-on solution, since we can easily use set operations to detect the common field names that determine what to join and what to expand. I have already added some functions to do this in beethoven, so I'd be happy to update whichever functions we agree on to implement this functionality.
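A minimal base R sketch of the fixed-field-name approach, assuming the key fields are literally named `lon`, `lat`, and `time`. The helper is hypothetical and is not the beethoven implementation: it detects the shared key fields with `intersect()` and joins on them, so a static table (no `time` field) is repeated across every timestamp of a spatiotemporal table. The spatiotemporal table itself is built as a full space-time template with `expand.grid()`, the alternative mentioned above.

```r
# Hypothetical helper: join two tables on whichever fixed key fields they share.
join_on_common_fields <- function(x, y, keys = c("lon", "lat", "time")) {
  common <- intersect(intersect(names(x), names(y)), keys)
  if (length(common) == 0) {
    stop("No shared key fields among: ", paste(keys, collapse = ", "))
  }
  # left join on the shared fields only: a y row without `time`
  # is matched to every time of x at the same location
  merge(x, y, by = common, all.x = TRUE)
}

# Full space-time template built with expand.grid (2 locations x 2 days)
st <- expand.grid(
  lon = c(-78.9, -78.6), lat = 35.9,
  time = as.Date(c("2024-07-01", "2024-07-02"))
)
st$temp <- c(20.1, 18.7, 21.3, 19.2)
# Static covariate without a time field
static <- data.frame(lon = c(-78.9, -78.6), lat = 35.9, elevation = c(120, 35))

# Shared fields are lon and lat, so static values repeat for each time
out <- join_on_common_fields(st, static)
```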

sigmafelix · 2024-08-07T15:03:51Z

As a side note, if we aim to make calc_* functions pipeable, the default value of inflate (or the equivalent argument) should be TRUE.

eva0marques · 2024-08-07T18:17:13Z

I've implemented my idea (from my comment above) in my own project because it was the most optimized and flexible setup. It works pretty well; I can share my feedback if you are interested.

eva0marques assigned eva0marques, sigmafelix, MAKassien, kyle-messier and mitchellmanware Jul 26, 2024

kyle-messier mentioned this issue on Sep 22, 2024 in 1.1.0 checklist #125 (open, 5 tasks)
