Add listcol functionality #8

Closed
boshek opened this issue Apr 5, 2017 · 6 comments

@boshek
Collaborator

boshek commented Apr 5, 2017

Not for the upcoming CRAN submission. I'm uncertain how to ultimately implement this, as some users might not like it. Perhaps just as a listcol = TRUE argument in weather().

@boshek boshek self-assigned this Apr 5, 2017
@steffilazerte
Member

How did you envision this looking in the end? Would the user be returned a data frame with a column of station ids and a list column holding each station's data (as below)?

We could just show users how to do it themselves:

Unnested

w <- weather(station_ids = c(48568, 50309), start = "2017-01-01", end = "2017-01-15")
w

# A tibble: 720 × 35
   station_name station_id   prov   lat    lon       date                time  year month   day  hour  qual       weather  hmdx
*         <chr>      <dbl> <fctr> <dbl>  <dbl>     <date>              <dttm> <chr> <chr> <chr> <chr> <chr>         <chr> <dbl>
1   FREDERICTON      48568     NA 45.87 -66.54 2017-01-01 2017-01-01 00:00:00  2017    01    01 00:00     ‡          Snow    NA
2   FREDERICTON      48568     NA 45.87 -66.54 2017-01-01 2017-01-01 01:00:00  2017    01    01 01:00     ‡          Snow    NA
3   FREDERICTON      48568     NA 45.87 -66.54 2017-01-01 2017-01-01 02:00:00  2017    01    01 02:00     ‡          Snow    NA
4   FREDERICTON      48568     NA 45.87 -66.54 2017-01-01 2017-01-01 03:00:00  2017    01    01 03:00     ‡ Moderate Snow    NA
5   FREDERICTON      48568     NA 45.87 -66.54 2017-01-01 2017-01-01 04:00:00  2017    01    01 04:00     ‡          Snow    NA
6   FREDERICTON      48568     NA 45.87 -66.54 2017-01-01 2017-01-01 05:00:00  2017    01    01 05:00     ‡          Snow    NA
7   FREDERICTON      48568     NA 45.87 -66.54 2017-01-01 2017-01-01 06:00:00  2017    01    01 06:00     ‡          Snow    NA
8   FREDERICTON      48568     NA 45.87 -66.54 2017-01-01 2017-01-01 07:00:00  2017    01    01 07:00     ‡          Snow    NA
9   FREDERICTON      48568     NA 45.87 -66.54 2017-01-01 2017-01-01 08:00:00  2017    01    01 08:00     ‡          Snow    NA
10  FREDERICTON      48568     NA 45.87 -66.54 2017-01-01 2017-01-01 09:00:00  2017    01    01 09:00     ‡          Snow    NA
# ... with 710 more rows, and 21 more variables: hmdx_flag <chr>, pressure <dbl>, pressure_flag <chr>, rel_hum <dbl>,
#   rel_hum_flag <chr>, temp <dbl>, temp_dew <dbl>, temp_dew_flag <chr>, temp_flag <chr>, visib <dbl>, visib_flag <chr>,
#   wind_chill <dbl>, wind_chill_flag <chr>, wind_dir <dbl>, wind_dir_flag <chr>, wind_spd <dbl>, wind_spd_flag <chr>, elev <dbl>,
#   climat_id <chr>, WMO_id <chr>, TC_id <chr>

Nested

w <- weather(station_ids = c(48568, 50309), start = "2017-01-01", end = "2017-01-15") %>%
     tidyr::nest(-station_id)
w
# A tibble: 2 × 2
  station_id                data
       <dbl>              <list>
1      48568 <tibble [360 × 34]>
2      50309 <tibble [360 × 34]>

But as you point out, a simple TRUE/FALSE argument could be used right at the end of the weather function to apply the nest() function, which could be a nice convenience. However, I would suggest that the default be FALSE, as it's a bit of an advanced feature.

Is this the sort of thing you had in mind?
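
A minimal sketch of how such a convenience argument might behave, assuming a hypothetical helper `nest_by_station()` rather than the actual `weather()` internals. It uses a small made-up data frame in place of downloaded data, and the `data = ...` form of `tidyr::nest()` from tidyr 1.0 or later (an assumption; the examples above use the older `nest(-station_id)` form):

```r
# Hypothetical helper, not part of the package: optionally nest weather data
# by station, leaving the flat data frame as the default.
nest_by_station <- function(df, listcol = FALSE) {
  if (!listcol) return(df)
  tidyr::nest(df, data = -tidyselect::all_of("station_id"))
}

# Made-up stand-in for the output of weather()
toy <- data.frame(
  station_id = rep(c(48568, 50309), each = 3),
  hour = rep(0:2, times = 2),
  temp = c(-5.1, -5.4, -6.0, -3.2, -3.5, -3.9)
)

flat   <- nest_by_station(toy)                  # returned unchanged
nested <- nest_by_station(toy, listcol = TRUE)  # one row per station
```

Keeping `listcol = FALSE` as the default means existing code that expects a flat data frame would be unaffected.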

@boshek
Collaborator Author

boshek commented Apr 7, 2017

Agreed completely about what the default should be. And your example is almost it, but more like this:

> w <- weather(station_ids = c(48568, 50309), start = "2017-01-01", end = "2017-01-15") %>%
+   tidyr::nest(-station_name,-station_id,-lat,-lon)
> 
> w
# A tibble: 2 × 5
    station_name station_id   lat    lon                data
           <chr>      <dbl> <dbl>  <dbl>              <list>
1    FREDERICTON      48568 45.87 -66.54 <tibble [360 × 31]>
2 MONCTON INTL A      50309 46.11 -64.68 <tibble [360 × 31]>

The thing I need to think about a little more is that we will need some conditionals depending on whether the data is hourly, daily, etc. We could also do it like this:

> w_day <- weather(station_ids = c(48568, 50309), start = "2017-01-01", end = "2017-01-15") %>%
+   tidyr::nest(-station_name,-station_id,-lat,-lon, -date)
> 
> w_day
# A tibble: 30 × 6
   station_name station_id   lat    lon       date               data
          <chr>      <dbl> <dbl>  <dbl>     <date>             <list>
1   FREDERICTON      48568 45.87 -66.54 2017-01-01 <tibble [24 × 30]>
2   FREDERICTON      48568 45.87 -66.54 2017-01-02 <tibble [24 × 30]>
3   FREDERICTON      48568 45.87 -66.54 2017-01-03 <tibble [24 × 30]>
4   FREDERICTON      48568 45.87 -66.54 2017-01-04 <tibble [24 × 30]>
5   FREDERICTON      48568 45.87 -66.54 2017-01-05 <tibble [24 × 30]>
6   FREDERICTON      48568 45.87 -66.54 2017-01-06 <tibble [24 × 30]>
7   FREDERICTON      48568 45.87 -66.54 2017-01-07 <tibble [24 × 30]>
8   FREDERICTON      48568 45.87 -66.54 2017-01-08 <tibble [24 × 30]>
9   FREDERICTON      48568 45.87 -66.54 2017-01-09 <tibble [24 × 30]>
10  FREDERICTON      48568 45.87 -66.54 2017-01-10 <tibble [24 × 30]>
# ... with 20 more rows

Which would help with manipulation. The devil is in the details here, though I totally agree that the default should still just be a normal data frame (well, a tibble, I guess).
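
As a rough illustration of the kind of manipulation that per-day nesting allows, the sketch below summarises each nested tibble. The data frame is made up (it is not real `weather()` output), the `temp_mean` column is a hypothetical name, and the `data = ...` form of `tidyr::nest()` assumes tidyr 1.0 or later:

```r
# Made-up hourly readings standing in for weather() output
toy <- data.frame(
  station_id = 48568,
  date = rep(as.Date(c("2017-01-01", "2017-01-02")), each = 3),
  temp = c(-5, -6, -7, -1, -2, -3)
)

# One row per station and day, with that day's readings in `data`
w_day <- tidyr::nest(toy, data = -tidyselect::all_of(c("station_id", "date")))

# Summarise each nested tibble, here the mean temperature per day
w_day$temp_mean <- vapply(w_day$data, function(d) mean(d$temp), numeric(1))
```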

@steffilazerte
Member

Yes those would be nice ways of keeping the data organized.

One option would be to set the default to 'date' for hourly or daily data and 'month' or 'year' for monthly data. We could even make the listcols argument specify the grouping structure with three potential options: listcols = "day", listcols = "month", listcols = "year" (defaulting to one level up from the interval argument, i.e. interval = "hour", listcols = "day" or interval = "day", listcols = "month", etc.)

We could add checks to prevent monthly data from being grouped at the daily level, but even if we didn't, I don't think it would result in an error, just a list of single-row data frames corresponding to the first of the month.
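
The default and the check described above could be sketched roughly as follows. `resolve_listcols()` is a hypothetical helper name, and the set of interval values ("hour", "day", "month") is an assumption based on the options mentioned in this thread:

```r
# Hypothetical helper: pick the grouping level for the list column.
# Levels are ordered from finest to coarsest.
resolve_listcols <- function(interval = c("hour", "day", "month"),
                             listcols = NULL) {
  interval <- match.arg(interval)
  levels <- c("hour", "day", "month", "year")
  pos <- match(interval, levels)

  # Default: one level up from the interval
  if (is.null(listcols)) return(levels[pos + 1])

  listcols <- match.arg(listcols, levels[-1])
  # Reject groupings that are not coarser than the data itself
  if (match(listcols, levels) <= pos) {
    stop("`listcols` must be coarser than `interval`", call. = FALSE)
  }
  listcols
}

resolve_listcols("hour")           # "day"
resolve_listcols("day")            # "month"
resolve_listcols("hour", "month")  # "month"
# resolve_listcols("month", "day") would raise an error
```

Whether an invalid combination should be an error or just a warning is a design choice; as noted above, nesting monthly data by day would still run, it just would not be very useful.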

Lots of different options; I know what you mean about having to think about it! Alternatively, if there isn't one really good default use case, we could simply add the options to the vignette/readme, to illustrate how people can organize their data if they wish.

@boshek
Collaborator Author

boshek commented Apr 10, 2017

This is a great idea. Do you think we should add this before or after a CRAN submission?

@steffilazerte
Member

Let's go ahead and add this before CRAN. I don't think it'll take too long, and I still have a couple of other things I need to figure out before the submission anyway.

@boshek
Collaborator Author

boshek commented Apr 17, 2017

Addressed with pull request #14

@boshek boshek closed this as completed Apr 17, 2017