copyedit 1-4 #102

Merged
merged 9 commits into from
Jun 7, 2022
9 changes: 5 additions & 4 deletions index.Rmd
@@ -14,19 +14,19 @@ url: 'https\://books.ropensci.org/http-testing/'

# Preamble

Are you working on a R package accessing resources on the web, be it a cat facts API, a scientific data source or your system for Customer relationship management?
As for all other packages, appropriate unit testing can make your code more robust.
Are you working on an R package accessing resources on the web, be it a cat facts API, a scientific data source or your system for Customer relationship management?
As with all other packages, appropriate unit testing can make your code more robust.
The unit testing of a package interacting with web resources, however, brings special challenges:
dependence of tests on a good internet connection, testing in the absence of authentication secrets, etc.
Having tests fail due to resources being down or slow, during development or on CRAN, means a time loss for everyone involved (slower development, messages from CRAN).
Although some packages accessing remote resources are well tested, there is a lack of resources around best practices.

This book is meant to be a free, central reference for developers of R packages accessing web resources, to help them have a faster and more robust development.
Our aim is to develop an useful guidance to go with the great recent tools that `{vcr}`, `{webmockr}`, `{httptest}`, `{httptest2}` and `{webfakes}` are.
Our aim is to develop a useful guide to go with the great recent tools `{vcr}`, `{webmockr}`, `{httptest}`, `{httptest2}` and `{webfakes}`.

We expect you to know [package development basics](https://r-pkgs.org/), and [git](https://happygitwithr.com/).

_Note related to previous versions: this book was intended as a detailed guide to using a particular suite of packages for HTTP mocking and testing in R code and/or packages, namely those maintained by Scott Chamberlain (`{crul}`, `{webmockr}`, `{vcr}`) but its scope has been extended to generalize the explanation of concepts to similar packages._
_Note related to previous versions: this book was intended as a detailed guide to using a particular suite of packages for HTTP mocking and testing in R code and/or packages, namely those maintained by Scott Chamberlain (`{crul}`, `{webmockr}`, `{vcr}`), but its scope has been extended to generalize the explanation of concepts to similar packages._

You can also read the [PDF version](/http-testing/main.pdf) or [epub version](/http-testing/main.epub) of this book.

@@ -36,6 +36,7 @@ _Thanks to contributors to the book:
[Christophe Dervieux](https://github.com/cderv),
[Daniel Possenriede](https://github.com/dpprdan),
[Hugo Gruson](https://github.com/Bisaloo),
[Jon Harmon](https://github.com/jonthegeek/),
[Lluís Revilla Sancho](https://github.com/llrs),
[Xavier A](https://github.com/xvrdm)._

43 changes: 21 additions & 22 deletions intro-general.Rmd
@@ -6,15 +6,15 @@

HTTP means Hypertext Transfer Protocol, but you were probably not just looking for a translation of the abbreviation.
HTTP is a way for you to exchange information with a remote server.
In your package, if information is going back and forth between the R session and internet, you are using some sort of HTTP tooling.
In your package, if information is going back and forth between the R session and the internet, you are using some sort of HTTP tooling.
Your package is making _requests_ and receives _responses_.

### HTTP requests

The HTTP request is what your package makes.
It has a method (are you fetching information via `GET`? are you sending information via `POST`?), different parts of an URL (domain, endpoint, query string), headers (containing e.g. your secret identifiers).
It can contain a body, for instance you might be sending data as JSON.
In that case one of the header will describe the content.
It has a method (are you fetching information via `GET`? are you sending information via `POST`?), different parts of a URL (domain, endpoint, query string), and headers (containing e.g. your secret identifiers).
It can contain a body. For instance, you might be sending data as JSON.
In that case one of the headers will describe the content.
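
As a minimal sketch of these parts, the request below is built with `{httr2}` but never sent. The host `api.example.com`, the `facts` endpoint, the `per_page` parameter, the token and the body are placeholders chosen for illustration; they do not describe a real API.

```r
# Build (but do not send) a request; every value below is a placeholder.
req <- httr2::request("https://api.example.com")   # domain
req <- httr2::req_url_path_append(req, "facts")    # endpoint
req <- httr2::req_url_query(req, per_page = 10)    # query string
# Header carrying a (fake) secret identifier
req <- httr2::req_headers(req, Authorization = "Bearer my-secret-token")
# JSON body, the kind of content a header describes once the request is sent
req <- httr2::req_body_json(req, list(fact = "Cats sleep a lot."))
req <- httr2::req_method(req, "POST")              # method

# Inspect what would be sent
req$url
req$method
```

Nothing goes over the network until you call `httr2::req_perform(req)`, so you can inspect the request object while reading the API docs.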

How do you know what request to make from your package?
Hopefully you are interacting with a well documented web resource that will explain to you what methods are associated with what endpoints.
@@ -33,11 +33,11 @@ How do you get started with interacting with HTTP in R?

#### General HTTP resources

* [Mozilla Developer Network docs about HTTP](https://developer.mozilla.org/en-US/docs/Web/HTTP) (recommended in the zine mentioned thereafter)
* [Mozilla Developer Network docs about HTTP](https://developer.mozilla.org/en-US/docs/Web/HTTP) (recommended in the zine mentioned hereafter)
* (_not free_) [Julia Evans' Zine "HTTP: Learn your browser's language!"](https://wizardzines.com/zines/http/)
* The docs of the web API you are aiming to work with, and a search engine to understand the words that are new.

### HTTP with R
#### HTTP with R

* The docs of the R package you end up choosing!
* Digging into the source code of another package that does similar things.
@@ -49,20 +49,20 @@ In R, to interact with web resources, it is recommended to use `{curl}`; or its

Do not use RCurl, because it is not actively maintained!

When writing a package interacting with web resources, you will probably use either `{httr2}`, `{httr}` or `{crul}`.
When writing a package interacting with web resources, you will probably use `{httr2}`, `{httr}` or `{crul}`.

* httr is the most popular and oldest of the two, and supports OAuth.
httr docs feature a vignette called [Best practices for API packages](https://httr.r-lib.org/articles/api-packages.html)
* `{httr}` is the more popular and older of the two httr packages, and supports OAuth.
`{httr}` docs feature a vignette called [Best practices for API packages](https://httr.r-lib.org/articles/api-packages.html).

* httr2 _"is a ground-up rewrite of httr that provides a pipeable API with an explicit request object that solves more problems felt by packages that wrap APIs (e.g. built-in rate-limiting, retries, OAuth, secure secrets, and more)"_ so it might be a good idea to adopt it rather than httr for a new package. It has a vignette about [Wrapping APIs](https://httr2.r-lib.org/articles/wrapping-apis.html).
* `{httr2}` _"is a ground-up rewrite of httr that provides a pipeable API with an explicit request object that solves more problems felt by packages that wrap APIs (e.g. built-in rate-limiting, retries, OAuth, secure secrets, and more)"_ so it might be a good idea to adopt it rather than `{httr}` for a new package. It has a vignette about [Wrapping APIs](https://httr2.r-lib.org/articles/wrapping-apis.html).

* crul does not support OAuth but it uses an object-oriented interface, which you might like.
crul has a set of [clients, or ways to perform requests](https://docs.ropensci.org/crul/articles/choosing-a-client.html), that might be handy. crul also has a vignette about [API package best practices
* `{crul}` does not support OAuth but it uses an object-oriented interface, which you might like.
`{crul}` has a set of [clients, or ways to perform requests](https://docs.ropensci.org/crul/articles/choosing-a-client.html), that might be handy. `{crul}` also has a vignette about [API package best practices
](https://docs.ropensci.org/crul/articles/best-practices-api-packages.html).

Below we will try to programmatically access the [status of GitHub](https://www.githubstatus.com/api/#status), the open-source platfrom provided by the company of the same name.
We will access the same information with httr2 and crul.
If you decide for the low-level curl, feel free to contribute an example.
Below we will try to programmatically access the [status of GitHub](https://www.githubstatus.com/api/#status), the open-source platform provided by the company of the same name.
We will access the same information with `{httr2}` and `{crul}`.
If you decide to try the low-level `{curl}`, feel free to contribute an example.
The internet has enough examples for `{httr}`.

```{r}
@@ -71,7 +71,7 @@ github_url <- "https://kctbh9vrtdwd.statuspage.io/api/v2/status.json"

The URL above leaves no doubt as to what format the data is provided in, JSON!

Let's first use httr2.
Let's first use `{httr2}`.

```{r}
library("magrittr")
@@ -81,7 +81,7 @@ response <- httr2::request(github_url) %>%
# Check the response status
httr2::resp_status(response)

# Or in a package you'd just write
# Or in a package you'd write
httr2::resp_check_status(response)

# Parse the content
@@ -91,27 +91,26 @@ httr2::resp_body_json(response)
httr2::resp_header(response, "content-type")
```

Now, the same with crul.
Now, the same with `{crul}`.

```{r}
# Create a client and get a response
client <- crul::HttpClient$new(github_url)
response <- client$get()


# Check the response status
response$status_http()

# Or in a package you'd just write
# Or in a package you'd write
response$raise_for_status()

# Parse the content
response$parse()
jsonlite::fromJSON(response$parse())
```

Hopefully these very short snippets give you an idea of what syntax to expect when choosing one of those packages.
Hopefully these very short snippets give you an idea of what syntax to expect when choosing one of these packages.

Note that the choice of a package will constrain the HTTP testing tools you can use.
However, the general ideas will remain the same.
You could switch your package backend from say crul to httr _without changing your tests_, if your tests do not test too many specifities of internals.
You could switch your package backend from, say, `{crul}` to `{httr}` _without changing your tests_, if your tests do not test too many specificities of internals.
26 changes: 13 additions & 13 deletions intro-graceful.Rmd
@@ -1,51 +1,51 @@
# Graceful HTTP R packages {#graceful}

Based on the previous chapter, your package interacting with a web resource has a dependency on `{curl}`, `{httr}`, `{httr2}` or `{crul}`. You have hopefully read the docs of the dependency you chose, including, in the case of httr, httr2 and crul, the vignette about best practice for HTTP packages. Now, in this chapter we want to give more tips aimed at making your HTTP R package graceful, part of which you'll learn more about in this very book!
Based on the previous chapter, your package interacting with a web resource has a dependency on `{curl}`, `{httr}`, `{httr2}` or `{crul}`. You have hopefully read the docs of the dependency you chose, including, in the case of `{httr}`, `{httr2}` and `{crul}`, the vignette about best practices for HTTP packages. Now, in this chapter we want to give more tips aimed at making your HTTP R package graceful, part of which you'll learn more about in this very book!

**Why** write a *graceful* HTTP R package? First of all, graceful is a nice adjective. 💃🕺Then, graceful is the adjective used in [CRAN repository policy](https://cran.r-project.org/web/packages/policies.html) *"Packages which use Internet resources should fail gracefully with an informative message if the resource is not available or has changed (and not give a check warning nor error)."* Therefore, let's review how to make your R package graceful from this day forward, in success and in failure.
**Why** write a *graceful* HTTP R package? First of all, graceful is a nice adjective. 💃🕺Second, graceful is the adjective used in [CRAN repository policy](https://cran.r-project.org/web/packages/policies.html) *"Packages which use Internet resources should fail gracefully with an informative message if the resource is not available or has changed (and not give a check warning nor error)."* Therefore, let's review how to make your R package graceful from this day forward, in success and in failure.

## Choose the HTTP resource wisely

First of all, your life and the life of your package's users will be easier if the web service you're wrapping is well maintained and well documented. When you have a choice, try not to rely on a fragile web service. Moreover, if you can, try to communicate with the API providers (telling them about your package; reporting feature requests and bug reports in their preferred way).

## User-facing grace (how your package actually works)

0. If you can, do not request the API every time the user asks for something but cache data instead. No API call, no API call failure! 😉 To remember answers within a session check out [memoise](<https://github.com/r-lib/memoise>). To remember answers across sessions, see approaches presented in the R-hub blog post ["Persistent config and data for R packages"](<https://blog.r-hub.io/2020/03/12/user-preferences/>). Caching behavior should be well documented for users, and there should probably be an expiration time for caches that's based on how often data is updated on the remote service.
0. If you can, do not request the API every time the user asks for something; cache data instead. No API call, no API call failure! 😉 To remember answers within a session check out [memoise](<https://github.com/r-lib/memoise>). To remember answers across sessions, see approaches presented in the R-hub blog post ["Persistent config and data for R packages"](<https://blog.r-hub.io/2020/03/12/user-preferences/>). Caching behavior should be well documented for users, and there should probably be an expiration time for caches that's based on how often data is updated on the remote service.

1. Try to send correct requests, at the correct rate, by knowing what the API expects and validating user inputs.
* For instance, don't even try interacting with a web API requiring authentication if the user does not provide authentication information.
* For limiting rate i.e. not sending too many requests, automatically wait or, if the API docs allow you to define an ideal or maximal rate, set the request rate in advance using the [ratelimitr](https://github.com/tarakc02/ratelimitr) package, or for httr2 `httr2::req_throttle()`.
* For limiting rate (not sending too many requests), automatically wait. If the API docs allow you to define an ideal or maximal rate, set the request rate in advance using the [ratelimitr](https://github.com/tarakc02/ratelimitr) package (or, with `{httr2}`, `httr2::req_throttle()`).

2. If there's a status API i.e. a separate API indicating whether the web resource is up or down, use it. If it tells you the API is down, `stop()` (or `rlang::abort()`) with an informative error message.
2. If there's a status API (a separate API indicating whether the web resource is up or down), use it. If it tells you the API is down, `stop()` (or `rlang::abort()`) with an informative error message.

3. If the API indicates an error, depending on the actual error,

- If the *server* seems to be having issues, [re-try with an exponential back-off](<https://blog.r-hub.io/2020/04/07/retry-wheel/>). In httr2 there is `httr2::req_retry()`.
- If the *server* seems to be having issues, [re-try with an exponential back-off](<https://blog.r-hub.io/2020/04/07/retry-wheel/>). In `{httr2}` there is `httr2::req_retry()`.

- Otherwise, [transform the error into an useful error](https://httr2.r-lib.org/articles/wrapping-apis.html#error-handling-1).
- Otherwise, [transform the error into a useful error](https://httr2.r-lib.org/articles/wrapping-apis.html#error-handling-1).

- If you used retry and nothing was sent after the maximal number of retries, have an informative error message.
- If you used retry and nothing was sent after the maximal number of retries, show an informative error message.
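
The points above can be sketched as follows, assuming `{memoise}` (with `{cachem}`) for caching and `{httr2}` for the request policy. `fetch_fact()`, the `api.example.com` URL, the one-hour expiration and the rate of 30 requests per minute are made up for illustration; pick values that match the API you wrap.

```r
# 1. Cache answers within a session (item 0).
# `fetch_fact()` is a hypothetical stand-in for a function calling the API;
# here it only counts how often it really runs.
n_calls <- 0
fetch_fact <- function(id) {
  n_calls <<- n_calls + 1
  paste("fact", id)
}
# Cached values expire after one hour (an arbitrary choice).
cached_fetch_fact <- memoise::memoise(
  fetch_fact,
  cache = cachem::cache_mem(max_age = 60 * 60)
)
cached_fetch_fact(1)
cached_fetch_fact(1) # served from the cache, fetch_fact() does not run again
n_calls

# 2. Throttle, retry and informative errors with {httr2} (items 1 and 3).
# The URL is a placeholder; the request is built here but not performed.
req <- httr2::request("https://api.example.com/facts")
# At most 30 requests per minute (an assumed limit).
req <- httr2::req_throttle(req, rate = 30 / 60)
# Up to 3 attempts; by default {httr2} treats statuses 429 and 503 as
# transient and waits with an exponential back-off between attempts.
req <- httr2::req_retry(req, max_tries = 3)
# Turn an HTTP error into a message meant for the user.
req <- httr2::req_error(req, body = function(resp) {
  "The (hypothetical) Example API could not answer; please try again later."
})
```

Defining the policy once, in a helper that builds the request, means every function of your package inherits the same throttling, retries and error messages.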

That was it for aspects the user will care about. Now, what might be more problematic for your package's fate on CRAN are the automatic checks that happen there at submission and then [regularly](https://blog.r-hub.io/2019/04/25/r-devel-linux-x86-64-debian-clang/#cran-checks-101).

## Graceful vignettes and examples

4. [Pre-compute vignettes](https://blog.r-hub.io/2020/06/03/vignettes/#how-to-include-a-compute-intensive--authentication-dependent-vignette) in some way. Don't use them as tests, they are a showcase. Of course have a system to prevent them from going stale, maybe even simple reminders (potentially in the [unexported `release_questions()` function](https://devtools.r-lib.org/reference/release.html#details)). Don't let vignettes run on a system where a failure has bad consequences.
4. [Pre-compute vignettes](https://blog.r-hub.io/2020/06/03/vignettes/#how-to-include-a-compute-intensive--authentication-dependent-vignette) in some way. Don't use them as tests; they are a showcase. Of course have a system to prevent them from going stale, maybe even simple reminders (potentially in the [unexported `release_questions()` function](https://devtools.r-lib.org/reference/release.html#details)). Don't let vignettes run on a system where a failure has bad consequences.
5. Don't run [examples](https://blog.r-hub.io/2020/01/27/examples/) on CRAN. Now, for a first submission, CRAN maintainers might complain if there is no example. In that case, you might want to add some minimal example, e.g.

```r
if (crul::ok("some-url")) {
foo_bar() # some eg that uses some-url
my_fun() # some eg that uses some-url
}
```

These two precautions ensure that CRAN checks won't end with some WARNINGs e.g. because an example failed when the API was down.
These two precautions ensure that CRAN checks won't end with some WARNINGs, e.g. because an example failed when the API was down.

## Graceful code

To simplify your own life and that of contributors, make sure to re-use code in your package, e.g. by defining helper functions for making requests, handling responses, etc.
It will make it easier for you to support interactions with more parts of the web API.
Writing DRY (don't repeat yourself) code means less lines of code to test, less API calls to make or fake!
Writing DRY (don't repeat yourself) code means fewer lines of code to test, and fewer API calls to make or fake!

Also, were you to export a function à la `gh::gh()`, you'll help users call any endpoint of the web API even if you haven't written any high-level helper for it yet.

@@ -54,7 +54,7 @@ Also, were you to export a function à la `gh::gh()`, you'll help users call any
We're getting closer to the actual topic of this book!

6. Read the rest of this book! Your tests should ideally run without needing an actual internet connection or the API being up. Your tests that do need to interact with the API should be skipped on CRAN. `testthat::skip_on_cran()` will ensure that.
7. Do not only test success behavior! Test for the behavior of your package in case of API errors, which shall also be covered later in the book.
7. Do not only test "success" behavior! Test for the behavior of your package in case of API errors, which shall also be covered later in the book.
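
Here is a rough sketch of both points with `{testthat}` and `{httr2}`. `check_api_response()` is a hypothetical helper, not part of any package; the fake response is built locally with `httr2::response()`, which is only one of several ways to avoid a real request (the tools presented later in the book offer more). The live test reuses the status URL from the previous chapter.

```r
library(testthat)

# Hypothetical package helper: fail with an informative message on HTTP errors.
check_api_response <- function(resp) {
  if (httr2::resp_is_error(resp)) {
    stop(
      "The API returned HTTP status ", httr2::resp_status(resp),
      ". Please try again later.",
      call. = FALSE
    )
  }
  invisible(resp)
}

# Runs anywhere: the responses are built locally, no connection needed.
test_that("check_api_response() fails informatively on a 503", {
  fake_response <- httr2::response(status_code = 503)
  expect_error(check_api_response(fake_response), "HTTP status 503")
})

test_that("check_api_response() lets a 200 through", {
  ok_response <- httr2::response(status_code = 200)
  expect_s3_class(check_api_response(ok_response), "httr2_response")
})

# Needs the real API: skipped on CRAN and when there is no connection.
test_that("the status API answers", {
  skip_on_cran()
  skip_if_offline()
  req <- httr2::request("https://kctbh9vrtdwd.statuspage.io/api/v2/status.json")
  resp <- httr2::req_perform(req)
  expect_equal(httr2::resp_status(resp), 200)
})
```

The first two tests cover failure and success without touching the network, so they can run on CRAN; only the last one depends on the API being up.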

## Conclusion
