This repository has been archived by the owner on May 10, 2022. It is now read-only.

Blog post: weathercan #156

Merged
merged 8 commits into from
Mar 6, 2018


steffilazerte commented Mar 2, 2018

Hi @stefaniebutland,

Here's the blog post for weathercan. I cover 3 different ways of integrating data from weathercan with other sources along with a note about general things to remember and a note on reproducibility.

This might be a bit long. Sam (@boshek) suggested the interpolation section could be dropped. I still like that section, but would be fine to omit it if you think it doesn't suit or if the article should be shorter.

I put the date in as today (I couldn't preview it otherwise), so I'll change that at the end or when I add in the proper topicid.

sckott commented Mar 2, 2018

👍 can see it here https://deploy-preview-156--ropensci.netlify.com/blog/2018/03/02/weathercan/

@stefaniebutland

Thanks for the pull request @steffilazerte.

I will review this in depth and give you feedback before Monday (asap!). Your comments above are helpful for that. I'll confirm the publication date then - aiming for Tues Mar 6th since Charles' post is ready to go for the following week. fyi its preview is here: https://deploy-preview-152--ropensci.netlify.com/blog/2018/03/13/ode-to-testing/. Please do comment on that post if you think anything is unclear.

I add the topicid just before posting, so you can ignore that.

stefaniebutland left a comment

@steffilazerte Thank you for taking the time to write this post and @boshek for reviewing a draft.

You can make the date 2018-03-06 (or 07 if you think you won't be finished till Wednesday). Don't forget to update the folder name of the images.

I've added some comments throughout. My main feedback is that the post would benefit from a clear description of what each example will do, i.e. what question you can address by combining the data. Maybe use the headings to lure people in: rather than "Linear interpolation", try to convey what you'll do with interpolation in the example (challenging, I know).

Happy to help further if needed. I will add the topicid before publishing the post.


This is one of the reasons why, when I designed [`weathercan`](http://github.com/ropensci/weathercan), I tried as hard as possible to make it simple and straightforward. `weathercan` is an R package designed to make it easy to access historical weather data from [Environment and Climate Change Canada (ECCC)](http://climate.weather.gc.ca/historical_data/search_historic_data_e.html). It downloads, combines, cleans, and transforms the data from multiple stations and across long time frames. So when you access ECCC data, you get everything in one dataset. Nifty, eh?

However, there is a certain point at which the user has to take matters into their own hands. Although downloading data with `weathercan` is fairly straight forward, weather data often needs to be integrated into other data sets. Depending on the other data this can be a tricky step. You may want to combine `weathercan` data with other types of measurements (e.g., river water samples on a specific day), or summarize and join it with data on other scales (e.g. temporal or spatial).

A couple of sentences are similar enough in tone that you could omit one of them or revise to condense.

  • "Depending on the other data this can be a tricky step. "
  • "Integrating weathercan data can be straightforward, but often it’s a bit convoluted."


Maybe you could combine and condense the paragraphs that start "However, there is a certain point ..." and "Integrating weathercan data can be straightforward,..."

library(weathercan)
```

2. Look at the built in `stations` data set to find the specific stations you're interested in (you can also use the `stations_search()` function). Here, we'll use the [`dplyr`](http://dplyr.tidyverse.org/) package (part of [`tidyverse`](https://www.tidyverse.org)) to `filter()` stations to only those in the province of Manitoba, at daily intervals, and which have an end date of at least 2018 (which likely means it's still operational). Note that we'll also be removing some columns (`prov`, `climate_id`, `WMO_id`, `TC_id`) just for clarity.

our fault - notice in preview that the #2 in list doesn't align with top text of that list item. Sorry about that, it's on our to-fix list

A simple example
----------------

Where things get tricky are in the specific use cases. For example, what if you have multiple sites and multiple days for which you want a measure of temperature and precipitation?

You could start this para like this: "What if you have multiple sites and multiple days for which you want a measure of temperature and precipitation?" and it sounds stronger without the "...things get tricky..."


Start this section with a clear statement of what you will do / accomplish in the example (i.e. you're going to combine weather and stream data from some stations near a site) so the reader doesn't get lost.


Where things get tricky are in the specific use cases. For example, what if you have multiple sites and multiple days for which you want a measure of temperature and precipitation?

We'll be using pipes (`%>%`) throughout our examples, so checkout the [chapter on Pipes in R for Data Science](http://r4ds.had.co.nz/pipes.html) if you need a refresher or introduction.

checkout -> 2 words


Where things get tricky are in the specific use cases. For example, what if you have multiple sites and multiple days for which you want a measure of temperature and precipitation?

We'll be using pipes (`%>%`) throughout our examples, so checkout the [chapter on Pipes in R for Data Science](http://r4ds.had.co.nz/pipes.html) if you need a refresher or introduction.

the line about pipes disrupts the explanation of the example. You could probably leave it out


- First filter to only include Jan 1st 2018
- Convert to spatial data using the lat/lon for each station
- Finally convert to the same CRS as the ecological data

what is CRS?
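
For readers with the same question: a CRS (coordinate reference system) defines how coordinate values map to locations on the Earth, so two spatial data sets have to share one before they can be overlaid. The three steps quoted above might be sketched roughly as follows. This assumes the `sf` package, a toy `weather_toy` data frame (station names and coordinates are borrowed from the station listing quoted later in this thread, the dates are illustrative), and an arbitrary target CRS (EPSG:3347, picked only for illustration, since the post's ecological data and its CRS are not shown here).

```r
library(dplyr)
library(sf)

# Toy stand-in for downloaded weather data (hypothetical object name).
weather_toy <- data.frame(
  station_name = rep(c("EMERSON AUTO", "GRETNA (AUT)"), each = 2),
  lat = rep(c(49.0, 49.0), each = 2),
  lon = rep(c(-97.2, -97.6), each = 2),
  date = rep(as.Date(c("2018-01-01", "2018-01-02")), times = 2)
)

weather_sf <- weather_toy %>%
  # 1. Keep only Jan 1st 2018
  filter(date == as.Date("2018-01-01")) %>%
  # 2. Convert to spatial points; lat/lon are assumed to be WGS84 (EPSG:4326)
  st_as_sf(coords = c("lon", "lat"), crs = 4326) %>%
  # 3. Re-project to the (assumed) CRS of the other data set
  st_transform(crs = 3347)
```

In practice the last step would more likely take the CRS straight from the other layer, e.g. `st_transform(crs = st_crs(other_layer))`, where `other_layer` is a placeholder for whatever spatial object the weather data is being matched to.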

Combining data in general
-------------------------

I hope these examples will help guide you in the many ways in which you can integrate `weathercan` data into other data sets. There are many different types of data to integrate, but generally, the same principals apply to merging `weathercan` data as to merging all data:

principles


- Make sure your data is summarized to the appropriate level first (i.e. don't try to merge hourly data with yearly data)
- Make sure you join data by the correct columns (i.e. include your index columns as well as the appropriate time/date column)
- Often, you'll need to make an intermediate, `index`, data frame in which you link sites or observations to a `station_id`, this can then be used to link specific weather observations to your other measurements

This one is worded less generally
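
To make the three tips quoted above concrete, here is a minimal sketch using `dplyr`. All object names (`sites`, `index`, `hourly`) and all measurement values are made up for illustration and do not come from the post; only the two `station_id` values are taken from the station listing quoted later in this thread.

```r
library(dplyr)

# Hypothetical site measurements (daily resolution)
sites <- data.frame(
  site = c("A", "A", "B"),
  date = as.Date(c("2018-01-01", "2018-01-02", "2018-01-01")),
  water_temp = c(1.2, 1.5, 0.8)
)

# Intermediate index data frame: links each site to a weather station
index <- data.frame(site = c("A", "B"), station_id = c(48068, 3605))

# Hypothetical hourly weather data (two readings per day per station)
times <- as.POSIXct(c("2018-01-01 00:00", "2018-01-01 12:00",
                      "2018-01-02 00:00", "2018-01-02 12:00"), tz = "UTC")
hourly <- data.frame(
  station_id = rep(c(48068, 3605), each = 4),
  time = rep(times, times = 2),
  temp = c(-20, -16, -18, -14, -22, -18, -21, -17)
)

# Tip 1: summarize to the level of the other data (hourly -> daily)
daily <- hourly %>%
  mutate(date = as.Date(time)) %>%
  group_by(station_id, date) %>%
  summarize(mean_temp = mean(temp), .groups = "drop")

# Tips 2 and 3: join via the index, then by station AND date
combined <- sites %>%
  left_join(index, by = "site") %>%
  left_join(daily, by = c("station_id", "date"))
```

Joining on `station_id` alone would repeat every site measurement once per weather date, which is why the date column is part of the join key.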


I would also highly recommend documenting the process by which you join your data. This makes it easy for you to keep track of what you've done and makes your work reproducible.

You can easily document this process by keeping an R script with all the coding steps and then using the "Knit" button in RStudio (or use [`Rmarkdown`](https://rmarkdown.rstudio.com) directly). If you like a more polished document, consider using `roxygen` comments `#'` rather than regular R comments `#` directly in your R script. `roxygen` comments allow you to write your comments in Markdown which is then converted to regular or marked up text in the html/pdf file. Your .R script would look something like this (including `devtools::session_info()` at the end spits out information on your version of R and any packages loaded):

Try to minimize use of words like "easily" since for beginners, something like that won't be easy. You could omit the word in this case.


Having this technical paragraph at the end that is about general R stuff and not about weathercan takes the focus away. Better to consider omitting it.


I love working with R and have been sharing the love with my friends and colleagues for almost seven years now. I'm one of those really annoying people whose response to most analysis-related questions is "You can do that in R! Five minutes, tops!" or "Three lines of code, I swear!" The problem was that I invariably spent an hour or more showing people how to get the data, load the data, clean the data, transform the data, and join the data, before we could even start the "five minute analysis". With the advent of [`tidyverse`](https://www.tidyverse.org) data manipulation has gotten much, much easier, but I still find that data manipulation is where most new users get stuck.

This is one of the reasons why, when I designed [`weathercan`](http://github.com/ropensci/weathercan), I tried as hard as possible to make it simple and straightforward. `weathercan` is an R package designed to make it easy to access historical weather data from [Environment and Climate Change Canada (ECCC)](http://climate.weather.gc.ca/historical_data/search_historic_data_e.html). It downloads, combines, cleans, and transforms the data from multiple stations and across long time frames. So when you access ECCC data, you get everything in one dataset. Nifty, eh?

Maybe start this para with " weathercan is an R package ..." and move the first sentence up to be the last sentence of previous para

- Fix typos
- Condense introduction
- Clarify origin of RFID feedr data
- Omit pipes
- Override numbering
- Have a more relevant title
- Be more explicit at the beginning
- Clarify the missing snow/temperature data
- Some minor reorganizing
- Have a more relevant title
- Be more explicit at the beginning
- Some reorganization
 - add clear description of the purpose
 - rename to be more relevant
 - clarify CRS
- Make it more general
- Omit reproducibility paragraph, but add as another general point

steffilazerte commented Mar 5, 2018

Thanks for your 🚀 feedback, @stefaniebutland. I've addressed your edits (specifics below). I've tried to make the section titles a little less bland, but please let me know if you have any suggestions! I've also made an effort to explain the context and examples before getting into them. Sometimes this resulted in more than just a line or two, though. Let me know if you think it's getting overly verbose.

More than happy to hear of any more suggestions you have!

  • Date changed to 2018-03-06 (including figs and fig folder)
  • Edits to Introduction
  • Override numbering
  • "A simple example"
    • add clear description of the purpose
    • rename to be more relevant ("Finding local weather")
    • clarify the missing snow/temperature data
  • "Interpolation"
    • add clear description of the purpose
    • rename to be more relevant ("Small temporal scales")
  • "Spatial scales"
    • add clear description of the purpose
    • rename to be more relevant ("Wide geographic scales")
    • clarify CRS
  • Mention Thompson University?
    Yes, I think it's important to recognize groups that contribute to open data. I've clarified the project to give better credit where it's due. (On a side note, I'm not sure how I missed that you were in Kamloops, I'd just assumed you were in Europe as you have the European spelling of 'Stefanie', but obviously I'm another example of why one shouldn't make that assumption 😀)
  • Omit pipes
  • Fix typos
  • Generalize the general section
  • Omit reproducibility paragraph (added as another general point instead)
    This section definitely got a little out of hand, so I've just omitted it. Would it be relevant for a potential blog article in the future? I really love using this technique for marking up .R files rather than creating .Rmd files, but I don't think it's talked about much (But see: https://deanattali.com/2015/03/24/knitrs-best-hidden-gem-spin/)

stefaniebutland left a comment

@steffilazerte I really like the updates you made. I've added a few more comments that you are welcome to take or leave. I will publish the post and tweet from rOpenSci, tagging you, after you give your final permission to post.

Thank you again for all your work on this!

preserve_yaml: true
---

I love working with R and have been sharing the love with my friends and colleagues for almost seven years now. I'm one of those really annoying people whose response to most analysis-related questions is "You can do that in R! Five minutes, tops!" or "Three lines of code, I swear!" The problem was that I invariably spent an hour or more showing people how to get the data, load the data, clean the data, transform the data, and join the data, before we could even start the "five minute analysis". With the advent of [`tidyverse`](https://www.tidyverse.org) data manipulation has gotten much, much easier, but I still find that data manipulation is where most new users get stuck. This is one of the reasons why, when I designed [`weathercan`](http://github.com/ropensci/weathercan), I tried as hard as possible to make it simple and straightforward.

add comma after "tidyverse"


I love working with R and have been sharing the love with my friends and colleagues for almost seven years now. I'm one of those really annoying people whose response to most analysis-related questions is "You can do that in R! Five minutes, tops!" or "Three lines of code, I swear!" The problem was that I invariably spent an hour or more showing people how to get the data, load the data, clean the data, transform the data, and join the data, before we could even start the "five minute analysis". With the advent of [`tidyverse`](https://www.tidyverse.org) data manipulation has gotten much, much easier, but I still find that data manipulation is where most new users get stuck. This is one of the reasons why, when I designed [`weathercan`](http://github.com/ropensci/weathercan), I tried as hard as possible to make it simple and straightforward.

`weathercan` is an R package designed to make it easy to access historical weather data from [Environment and Climate Change Canada (ECCC)](http://climate.weather.gc.ca/historical_data/search_historic_data_e.html). It downloads, combines, cleans, and transforms the data from multiple stations and across long time frames. So when you access ECCC data, you get everything in one dataset. Nifty, eh?

This para is so clear now!! I like how the change reads.
I do think it would help to say "weather stations" here: "...data from multiple stations ..."


`weathercan` is an R package designed to make it easy to access historical weather data from [Environment and Climate Change Canada (ECCC)](http://climate.weather.gc.ca/historical_data/search_historic_data_e.html). It downloads, combines, cleans, and transforms the data from multiple stations and across long time frames. So when you access ECCC data, you get everything in one dataset. Nifty, eh?

However, there is a certain point at which the user has to take matters into their own hands. Although downloading data with `weathercan` is fairly straight forward, weather data often needs to be integrated into other data sets. You may want to combine `weathercan` data with other types of measurements (e.g., river water samples on a specific day), or summarize and join it with data on other scales (e.g. temporal or spatial). Depending on the other data this can be a tricky step. That's why I'm going to walk you through some different ways of integrating weather data from `weathercan` with other data sets.

You could probably omit this sentence: "However, there is a certain point at which the user has to take matters into their own hands." but I don't want to change your style so it's up to you

library(weathercan)
```

2) Look at the built in `stations` data set to find the specific stations you're interested in (you can also use the `stations_search()` function). Here, we'll use the [`dplyr`](http://dplyr.tidyverse.org/) package (part of [`tidyverse`](https://www.tidyverse.org)) to `filter()` stations to only those in the province of Manitoba, at daily intervals, and which have an end date of at least 2018 (which likely means it's still operational). Note that we'll also be removing some columns (`prov`, `climate_id`, `WMO_id`, `TC_id`) just for clarity.

"... at daily intervals, ..." add a word somewhere because it sounds like you're filtering stations at intervals but I think you're filtering weather data from stations


"...end date of at least 2018..." can you omit "at least" there? or say 2018 or later? I understand this is 'cause someone will hopefully still be reading this post in 2019 :-)
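
For context on the two comments above, the filtering step being discussed might look roughly like the sketch below. It runs on a small toy table, `stations_toy`, rather than the package's real `stations` data: the column names follow those mentioned in the quoted text, the two Manitoba daily rows reuse values from the station listing quoted later in this thread, and every other row and value is hypothetical. Note that `interval` is a property of each row in the stations table, so this really is a filter on stations by the type of data they offer.

```r
library(dplyr)

# Toy stand-in for the stations table (hypothetical rows except as noted)
stations_toy <- data.frame(
  prov = c("MB", "MB", "MB", "MB", "ON"),
  station_name = c("EMERSON AUTO", "EMERSON AUTO", "GRETNA (AUT)",
                   "CLOSED STATION", "SOME ONTARIO STATION"),
  station_id = c(48068, 48068, 3605, 11111, 22222),
  climate_id = c("a", "a", "b", "c", "d"),   # placeholder IDs
  interval = c("day", "hour", "day", "day", "day"),
  start = c(2009, 2009, 1885, 1950, 2000),
  end = c(2018, 2018, 2018, 1990, 2018)
)

mb_daily <- stations_toy %>%
  # Manitoba stations that record daily data and were active in 2018 or later
  filter(prov == "MB", interval == "day", end >= 2018) %>%
  # Drop columns that aren't needed for display
  select(-prov, -climate_id)
```

Writing the condition as `end >= 2018` matches the "2018 or later" wording suggested above and keeps working for readers who run it in later years.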

Finding local weather
---------------------

A common scenario is when you have observations or measurements taken at several different sites across different dates and you want to match these to local weather data. Perhaps you want to control for changes in ambient temperature, or perhaps you're interested in how precipitation affects your measurements. Here, we'll go through an example of how to combine weather data with stream data measured from multiple nearby sites. In this example, by adding local temperature data to our data set we could then go on to explore the relationship between air and water temperature across sites.

"A common scenario is when you have observations or measurements"
clarify using a couple of words - I know here you mean non-weathercan data but that's not exactly clear till later in the para where you talk about stream water temp data.

interval = "day", quiet = TRUE)
```

Finding local weather

I like the new headings and descriptions


A common scenario is when you have observations or measurements taken at several different sites across different dates and you want to match these to local weather data. Perhaps you want to control for changes in ambient temperature, or perhaps you're interested in how precipitation affects your measurements. Here, we'll go through an example of how to combine weather data with stream data measured from multiple nearby sites. In this example, by adding local temperature data to our data set we could then go on to explore the relationship between air and water temperature across sites.

Let's assume you have two sites and the following measurements made on specific dates:

Can you say "stream sites" or something, again to clarify this is about the non-weathercan data?

## 1 EMERSON AUTO 48068 49.0 -97.2 242 day 2009 2018 30.4
## 2 GRETNA (AUT) 3605 49.0 -97.6 253 day 1885 2018 31.4

We have a selection of stations for each site that are all about the same distance away. Before we choose any we should make sure they have the data we're interested in.

nice and clear!

<p>
Surprisingly Churchill, MB (the north-eastern, green area) was almost balmy compared to south-western Manitoba!

Combining data in general

General tips for combining data - suggested alternate heading

Acknowledgements
----------------

Over the course of this `weathercan` journey I've had some valuable assistance. In particular, [Sam Albers](https://github.com/boshek) has been a wonderful contributor to `weathercan` on code as well as with advice and suggestions for how to take the package to the next level. rOpenSci Reviewers [Joe Thorley](https://github.com/joethorley) and [Charles T. Gray](https://github.com/softloud), and editor [Scott Chamberlain](https://github.com/sckott) supplied wonderful comments and suggestions that were greatly appreciated.

consider linking "advice and suggestions" to ropensci/software-review#160 so people can get a real taste of the process

steffilazerte commented Mar 6, 2018

I like all of your suggestions and have adopted them all, except:

  • I clarified the stations filtering, but you actually do filter stations by the type of data they have (added some edits to make this more clear)
  • In the Acknowledgements, I assume you meant to link it to the "comments and suggestions" as that was the review

Please feel free to publish and twitter it up whenever you're ready :)

@stefaniebutland stefaniebutland merged commit e2ef27f into ropensci-archive:master Mar 6, 2018