diff --git a/http-web-services.html b/http-web-services.html
index f07ac409..9652561b 100755
--- a/http-web-services.html
+++ b/http-web-services.html
@@ -54,7 +54,7 @@

Caching

-HTTP is designed with caching in mind. There is an entire class of devices (called “caching proxies”) whose only job is to sit between you and the rest of the world and minimize network access. Your company or ISP almost certainly maintains caching proxies, even if you’re unaware of them. They work because caching built into the HTTP protocol.
+HTTP is designed with caching in mind. There is an entire class of devices (called “caching proxies”) whose only job is to sit between you and the rest of the world and minimize network access. Your company or ISP almost certainly maintains caching proxies, even if you’re unaware of them. They work because caching is built into the HTTP protocol.
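To see those cache-friendly headers for yourself, a few lines of Python suffice. This is only a sketch using the standard library’s urllib.request, and the URL is a stand-in, not one from this chapter:

```python
# A minimal sketch: fetch a resource and print the headers that caches
# care about. The URL is a placeholder.
from urllib.request import urlopen

with urlopen('http://example.com/') as response:
    # Cache-Control and Expires tell any cache (browser or proxy) how long
    # it may reuse this response without talking to the server again.
    print(response.getheader('Cache-Control'))
    print(response.getheader('Expires'))
```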

Here’s a concrete example of how caching works. You visit diveintomark.org in your browser. That page includes a background image, wearehugh.com/m.jpg. When your browser downloads that image, the server includes the following HTTP headers:
@@ -264,10 +264,10 @@

What’s On The Wire?

  • This response includes an ETag header.
  • The data is 3070 bytes long. Notice what isn’t here: a Content-encoding header. Your request stated that you only accept uncompressed data (Accept-encoding: identity), and sure enough, this response contains uncompressed data.
  • This response includes caching headers that state that this feed can be cached for up to 24 hours (86400 seconds).
-  • And finally, download the actual data by calling response.read(). As you can tell from the len() function, this downloads all 3070 bytes at once.
+  • And finally, download the actual data by calling response.read(). As you can tell from the len() function, this fetched a total of 3070 bytes.

-    As you can see, this code is already inefficient: it asked for (and received) uncompressed data. I know for a fact that this server supports gzip compression, but HTTP compression is opt-in. We didn’t ask for it, so we didn’t get it. That means we’re downloading 3070 bytes when we could have just downloaded 941. Bad dog, no biscuit.
+    As you can see, this code is already inefficient: it asked for (and received) uncompressed data. I know for a fact that this server supports gzip compression, but HTTP compression is opt-in. We didn’t ask for it, so we didn’t get it. That means we’re fetching 3070 bytes when we could have fetched 941. Bad dog, no biscuit.
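Opting in is cheap even without a smarter library. The sketch below asks for gzip with a single request header and decompresses the body by hand; the feed URL is a placeholder, not the server from this chapter:

```python
# A sketch of opting in to HTTP compression with only the standard library.
# The feed URL is a placeholder.
import gzip
from urllib.request import Request, urlopen

request = Request('http://example.com/feed.xml',
                  headers={'Accept-Encoding': 'gzip'})
with urlopen(request) as response:
    raw = response.read()
    # The server compresses only because we asked; check before decompressing.
    if response.getheader('Content-Encoding') == 'gzip':
        raw = gzip.decompress(raw)
print(len(raw))  # the uncompressed size; far fewer bytes crossed the wire
```

One request header is the difference between 3070 bytes and 941 on the wire.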

    But wait, it gets worse! To see just how inefficient this code is, let’s request the same feed a second time.
@@ -307,8 +307,8 @@

    What’s On The Wire?

    True
    1. The server is still sending the same array of “smart” headers: Cache-Control and Expires to allow caching, Last-Modified and ETag to enable “not-modified” tracking. Even the Vary: Accept-Encoding header hints that the server would support compression, if only you would ask for it. But you didn’t.
-    2. Once again, fetching this data downloads the whole 3070 bytes…
-    3. …the exact same 3070 bytes you downloaded last time.
+    2. Once again, this request fetches the whole 3070 bytes…
+    3. …the exact same 3070 bytes you got last time.
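That Last-Modified/ETag pair in item 1 is all you need to do “not-modified” tracking yourself. Here is a sketch of a conditional request with urllib; the URL and ETag value are placeholders for whatever the first response gave you. Note that urllib’s default handlers surface a 304 response as an HTTPError, so you catch it rather than check a status code:

```python
# A sketch of "not-modified" tracking by hand. The URL and ETag value are
# placeholders; send back whatever validators the first response included.
from urllib.error import HTTPError
from urllib.request import Request, urlopen

request = Request('http://example.com/feed.xml',
                  headers={'If-None-Match': '"placeholder-etag"'})
try:
    with urlopen(request) as response:
        data = response.read()  # 200 OK: the feed changed, here is the new copy
except HTTPError as e:
    if e.code == 304:
        data = None  # 304 Not Modified: reuse your cached copy, nothing downloaded
    else:
        raise
```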

    HTTP is designed to work better than this. urllib speaks HTTP like I speak Spanish — enough to get by in a jam, but not enough to hold a conversation. HTTP is a conversation. It’s time to upgrade to a library that speaks HTTP fluently.
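For what it’s worth, one library that speaks HTTP fluently is httplib2; the sketch below assumes it, since the text here doesn’t name a specific library. Give its constructor a directory name and it keeps an on-disk cache, sends Accept-Encoding: gzip automatically, and replays ETags for you:

```python
# A sketch with httplib2, a third-party library (pip install httplib2).
# Assumption: this or something like it is the "fluent" library meant here.
import httplib2

h = httplib2.Http('.cache')  # directory for httplib2's on-disk cache
response, content = h.request('http://example.com/feed.xml')
print(response.status)       # 200: fetched over the network
response, content = h.request('http://example.com/feed.xml')
print(response.fromcache)    # True: the second copy never touched the network
```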