Add listcol functionality #8

boshek · 2017-04-05T20:44:27Z

Not for the upcoming CRAN submission. I'm uncertain how to ultimately implement this, as some users might not like it. Perhaps just as a listcol = TRUE argument in weather().

steffilazerte · 2017-04-07T12:45:09Z

How did you envision this looking in the end? Would the user be returned a data frame with a column of station ids and a column of nested station data (as below)?

We could just show users how to do it themselves:

Unnested

w <- weather(station_ids = c(48568, 50309), start = "2017-01-01", end = "2017-01-15")
w

# A tibble: 720 × 35
   station_name station_id   prov   lat    lon       date                time  year month   day  hour  qual       weather  hmdx
*         <chr>      <dbl> <fctr> <dbl>  <dbl>     <date>              <dttm> <chr> <chr> <chr> <chr> <chr>         <chr> <dbl>
1   FREDERICTON      48568     NA 45.87 -66.54 2017-01-01 2017-01-01 00:00:00  2017    01    01 00:00     ‡          Snow    NA
2   FREDERICTON      48568     NA 45.87 -66.54 2017-01-01 2017-01-01 01:00:00  2017    01    01 01:00     ‡          Snow    NA
3   FREDERICTON      48568     NA 45.87 -66.54 2017-01-01 2017-01-01 02:00:00  2017    01    01 02:00     ‡          Snow    NA
4   FREDERICTON      48568     NA 45.87 -66.54 2017-01-01 2017-01-01 03:00:00  2017    01    01 03:00     ‡ Moderate Snow    NA
5   FREDERICTON      48568     NA 45.87 -66.54 2017-01-01 2017-01-01 04:00:00  2017    01    01 04:00     ‡          Snow    NA
6   FREDERICTON      48568     NA 45.87 -66.54 2017-01-01 2017-01-01 05:00:00  2017    01    01 05:00     ‡          Snow    NA
7   FREDERICTON      48568     NA 45.87 -66.54 2017-01-01 2017-01-01 06:00:00  2017    01    01 06:00     ‡          Snow    NA
8   FREDERICTON      48568     NA 45.87 -66.54 2017-01-01 2017-01-01 07:00:00  2017    01    01 07:00     ‡          Snow    NA
9   FREDERICTON      48568     NA 45.87 -66.54 2017-01-01 2017-01-01 08:00:00  2017    01    01 08:00     ‡          Snow    NA
10  FREDERICTON      48568     NA 45.87 -66.54 2017-01-01 2017-01-01 09:00:00  2017    01    01 09:00     ‡          Snow    NA
# ... with 710 more rows, and 21 more variables: hmdx_flag <chr>, pressure <dbl>, pressure_flag <chr>, rel_hum <dbl>,
#   rel_hum_flag <chr>, temp <dbl>, temp_dew <dbl>, temp_dew_flag <chr>, temp_flag <chr>, visib <dbl>, visib_flag <chr>,
#   wind_chill <dbl>, wind_chill_flag <chr>, wind_dir <dbl>, wind_dir_flag <chr>, wind_spd <dbl>, wind_spd_flag <chr>, elev <dbl>,
#   climat_id <chr>, WMO_id <chr>, TC_id <chr>

Nested

w <- weather(station_ids = c(48568, 50309), start = "2017-01-01", end = "2017-01-15") %>%
     tidyr::nest(-station_id)
w
# A tibble: 2 × 2
  station_id                data
       <dbl>              <list>
1      48568 <tibble [360 × 34]>
2      50309 <tibble [360 × 34]>

But as you point out, a simple TRUE/FALSE argument could be used right at the end of the weather() function to apply nest(), which could be a nice convenience. However, I would suggest that the default be FALSE, as it's somewhat advanced functionality.
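A minimal sketch of that idea, assuming a hypothetical helper (finish_weather() is made up for illustration and stands in for the last step of weather(); listcol is just the argument name floated above, and the toy data frame replaces a real download). It uses the newer tidyr::nest(data = ...) syntax:

```r
library(tidyr)

# Hypothetical helper standing in for the final step of weather().
# `listcol` is the argument name proposed in this thread, not a confirmed API.
finish_weather <- function(w, listcol = FALSE) {
  if (listcol) {
    # Nest every column except station_id into a list-column called `data`
    w <- tidyr::nest(w, data = -station_id)
  }
  w
}

# Toy records standing in for downloaded weather data
toy <- data.frame(
  station_id = rep(c(48568, 50309), each = 3),
  temp = c(-5.1, -4.8, -4.2, -3.0, -2.7, -2.5)
)

flat <- finish_weather(toy)                    # default: ordinary data frame
nested <- finish_weather(toy, listcol = TRUE)  # one row per station
```

With the default left at FALSE, existing code that expects a flat data frame would keep working unchanged.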

Is this the sort of thing you had in mind?

boshek · 2017-04-07T15:56:22Z

Agreed completely about the default. And your example is almost it, but more like this:

> w <- weather(station_ids = c(48568, 50309), start = "2017-01-01", end = "2017-01-15") %>%
+   tidyr::nest(-station_name,-station_id,-lat,-lon)
> 
> w
# A tibble: 2 × 5
    station_name station_id   lat    lon                data
           <chr>      <dbl> <dbl>  <dbl>              <list>
1    FREDERICTON      48568 45.87 -66.54 <tibble [360 × 31]>
2 MONCTON INTL A      50309 46.11 -64.68 <tibble [360 × 31]>

The thing I need to think about a little more is that we will need some conditionals depending on whether the data is hourly, daily, etc. We could also do it like this:

> w_day <- weather(station_ids = c(48568, 50309), start = "2017-01-01", end = "2017-01-15") %>%
+   tidyr::nest(-station_name,-station_id,-lat,-lon, -date)
> 
> w_day
# A tibble: 30 × 6
   station_name station_id   lat    lon       date               data
          <chr>      <dbl> <dbl>  <dbl>     <date>             <list>
1   FREDERICTON      48568 45.87 -66.54 2017-01-01 <tibble [24 × 30]>
2   FREDERICTON      48568 45.87 -66.54 2017-01-02 <tibble [24 × 30]>
3   FREDERICTON      48568 45.87 -66.54 2017-01-03 <tibble [24 × 30]>
4   FREDERICTON      48568 45.87 -66.54 2017-01-04 <tibble [24 × 30]>
5   FREDERICTON      48568 45.87 -66.54 2017-01-05 <tibble [24 × 30]>
6   FREDERICTON      48568 45.87 -66.54 2017-01-06 <tibble [24 × 30]>
7   FREDERICTON      48568 45.87 -66.54 2017-01-07 <tibble [24 × 30]>
8   FREDERICTON      48568 45.87 -66.54 2017-01-08 <tibble [24 × 30]>
9   FREDERICTON      48568 45.87 -66.54 2017-01-09 <tibble [24 × 30]>
10  FREDERICTON      48568 45.87 -66.54 2017-01-10 <tibble [24 × 30]>
# ... with 20 more rows

That would help with manipulation. The devil is in the details here, though I totally agree that the default should still just be a normal data frame (well, a tibble, I guess).

steffilazerte · 2017-04-10T12:38:47Z

Yes those would be nice ways of keeping the data organized.

One option would be to set the default to 'date' for hourly or daily data and 'month' or 'year' for monthly data. We could even make the listcols argument specify the grouping structure with three potential options: listcols = "day", listcols = "month", listcols = "year" (defaulting to one level up from the interval argument, i.e. interval = "hour", listcols = "day" or interval = "day", listcols = "month", etc.)
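As a rough sketch of the "one level up" default, assuming two hypothetical helpers (default_listcols() and group_key() are names made up here, not part of the package) and using only base R:

```r
# Hypothetical lookup: the default grouping is one level above the interval
default_listcols <- function(interval = c("hour", "day", "month")) {
  interval <- match.arg(interval)
  switch(interval, hour = "day", day = "month", month = "year")
}

# Hypothetical helper: turn dates into a grouping key at the requested level
group_key <- function(date, listcols = c("day", "month", "year")) {
  listcols <- match.arg(listcols)
  fmt <- switch(listcols, day = "%Y-%m-%d", month = "%Y-%m", year = "%Y")
  format(date, fmt)
}

dates <- as.Date(c("2017-01-01", "2017-01-02", "2017-02-01"))

# Daily data defaults to grouping by month
keys <- group_key(dates, default_listcols("day"))
keys  # "2017-01" "2017-01" "2017-02"
```

The resulting key could then be used as the column left out of nest(), so each list element holds one day, month, or year of records.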

We could add checks to prevent monthly data from being grouped at the daily level, but even if we didn't, I don't think it would result in an error, just a list of single-row data frames corresponding to the first of the month.
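Such a check might look something like this. It is only a sketch: check_listcols() is a hypothetical name, and treating a grouping finer than the interval as an error (rather than a warning) is an assumption:

```r
# Hypothetical validation: refuse a grouping that is finer than the data interval
check_listcols <- function(interval, listcols) {
  # Ordered from finest to coarsest
  lv <- c("hour", "day", "month", "year")
  if (!interval %in% lv || !listcols %in% lv) {
    stop("interval and listcols must each be one of: ",
         paste(lv, collapse = ", "), call. = FALSE)
  }
  if (match(listcols, lv) < match(interval, lv)) {
    stop("Cannot group '", interval, "' data by '", listcols, "'", call. = FALSE)
  }
  invisible(TRUE)
}

ok <- check_listcols("hour", "day")                      # passes silently
bad <- try(check_listcols("month", "day"), silent = TRUE) # caught error
```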

Lots of different options; I know what you mean about having to think about it! Alternatively, if there isn't one really good default use case, we could simply add the options to the vignette/readme to illustrate how people can organize their data if they wish.

boshek · 2017-04-10T21:19:03Z

This is a great idea. Do you think we should add this ahead of or after a CRAN submission?

steffilazerte · 2017-04-10T21:31:56Z

Let's go ahead and add this before CRAN. I don't think it'll take too long, and I still have a couple of other things I need to figure out before the submission anyway.

boshek · 2017-04-17T15:41:04Z

Addressed with pull request #14

boshek self-assigned this Apr 5, 2017

boshek closed this as completed Apr 17, 2017
