---
layout: post
title: "HTTP/REST is not big, nor healthy. It's hell no!"
categories: [consume, serve]
tags: [software, specification, kitt, rfc, pegjs]
published: true
comments: true
share: true
---

To get the small things out of the way, the title is intentionally controversial (credits: Gabriel Iglesias). But beyond that, it only says that HTTP is more than what the average developer can handle - after all, HTTP/REST is software design on the scale of decades.

Another thing to get out of the way - this is not another lame post that can be summarized as "I make no attempt to satisfy a standard if it doesn't feel right." If anything, it's just another lame post, pointing out some lack of context.

Now that I got your attention, I'll take it step by step - details and views.

Instead of assuming that people are dumb, ignorant, and making mistakes, assume they are smart, doing their best, and that you lack context.

— Nicholas C. Zakas (@slicknet) February 10, 2013
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>

(NOT) Simple

When you hear that HTTP/REST is simple, chances are high that you're reading something about Rorschach-REST or similar; the least you can do is stay on your toes.

It is not simple. If it were, then people wouldn't argue about simplifying it, about levels, etc. That doesn't make it rocket science either, but that's another story.

(NOT) Easy/trivial

Just like there is complex and complicated, there is also simple and easy.

Despite being complex rather than simple, HTTP seems rather trivial - after all, that is the convenience people seek. Most people understand an HTTP method, a URI, a payload, an HTTP status code, etc. Some people understand safe methods, idempotent methods, HTTP headers, etc. Using or consuming HTTP is indeed trivial. So is scratching the surface of serving HTTP.

But implementing HTTP by the book is not.

How many know that an HTTP response can have more than one HTTP status code? Or that some responses, based on the status code, may or may not have a body, while others must or must not have one?
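(If the first one sounds made up, here is presumably the kind of exchange it hints at: a request sent with an Expect: 100-continue header gets an interim status line before the final one. A sketch, with example.com as a placeholder host.)

```
POST /upload HTTP/1.1
Host: example.com
Expect: 100-continue
Content-Length: 12

HTTP/1.1 100 Continue      <- interim status code

HTTP/1.1 200 OK            <- final status code, for the same request
```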

How many OTW (one thing well) libraries have you seen out there to handle conneg (content negotiation), dates, ETags, parsing and stringifying headers, etc.?
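To make the conneg point a bit more concrete, here is a deliberately naive sketch of what such a one-thing-well library would have to get right - illustrative only, not a compliant implementation (it ignores quoted strings, extension parameters, wildcard precedence and more):

```js
// Naive Accept parsing and q-value ordering.
function parseAccept(header) {
  return header.split(',').map(function (part) {
    var pieces = part.trim().split(';');
    var type = pieces.shift().trim();
    var q = 1;
    pieces.forEach(function (piece) {
      var kv = piece.trim().split('=');
      if (kv[0] === 'q') { q = parseFloat(kv[1]); }
    });
    return {type: type, q: q};
  }).sort(function (a, b) { return b.q - a.q; });
}

parseAccept('text/html;q=0.8, application/json, */*;q=0.1');
// -> [ {type: 'application/json', q: 1},
//      {type: 'text/html', q: 0.8},
//      {type: '*/*', q: 0.1} ]
```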

(NOT) Clear

With so much specification around, most of it available for a couple of decades, one would think that every scenario is covered and that there is no decision left to take, because everything has been thought out by professionals. It is actually quite the opposite: there are plenty of moments when you'll go WAT?, or just go blank and resign. This is what feeds the whole "it doesn't feel right" philosophy. If you think this is not possible, then please send me a link to a perfect HTTP client/server. AFAIK there is no implementation so far that aggregates, and is compliant with, all the relevant HTTP specifications.

In 2005, web services were hurting our heads, but in 2013, HTTP isn't any different if you take it for its literal specifications. Sure, maybe you don't need all of the pieces to get started, but you will need more and more as you build out an API. It's either that, or you'll start reinventing the wheel, ignore the specifications altogether (e.g. ?_method=), and argue against them with your feelings.
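(For reference, the ?_method= trick tunnels a method that an HTML form or a limited client cannot send through a plain POST, and the framework rewrites it server-side. A sketch, with a placeholder host:)

```
POST /things/1?_method=DELETE HTTP/1.1
Host: example.com
```

Nothing in the HTTP specifications says what a ?_method= query parameter means - which is exactly the point.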

.@darrel_miller @veesahni your "feel right" >>> "opinions [...] academic discussions [...] subjective [...] fuzzy" ?! buff.ly/112YTwo

— Andrei Neculau (@andreineculau) May 30, 2013
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>

How do you treat duplicate query keys? There isn't any specification for that. Maybe there isn't any immediate need for one, but leaving things like this unspecified doesn't bring any consistent gain either. On the contrary - it hurts interoperability.
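A small sketch of the interoperability cost - Node's querystring module is used here only as one example of a defensible behaviour:

```js
// Two equally defensible readings of the same query string - and no
// specification to say which one is right.
var qs = require('querystring');

qs.parse('tag=a&tag=b');
// Node's querystring collects duplicates: { tag: ['a', 'b'] }
// A "last one wins" implementation would instead yield: { tag: 'b' }
```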

Regular expressions always come in pairs, but even so - why haven't we simplified emails by publishing a simple best-practice document? Then we could all rant "you're not following doc X, one page long, with a 10-line implementation". Being too liberal and having to make too many choices ends up in the paradox of choice - no choice, or a poor one. If the ad-hoc technical solution for validating emails became a regular expression, then what advantage is there in keeping the format so liberal that nobody gets it right? Some fail out of ignorance - they don't know what is right; some because they don't know how to do it right; and some because they cannot argue in favor of doing it right, no matter how much they'd like to.

The same goes for URIs; a flashy example: http://klarna.com = http://88.80.182.205 = http://1481684685 -> WAT?!
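For the curious, the third form is just the dotted quad written as a single 32-bit integer; a quick sketch of the arithmetic (the klarna.com mapping is the example above and may of course change over time):

```js
// The dotted quad is a 32-bit integer written in base 256.
var octets = [88, 80, 182, 205];
var asInteger = octets.reduce(function (acc, octet) {
  return acc * 256 + octet;
}, 0);
console.log(asInteger); // 1481684685
```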

Simplify to amplify.

Solutions

Let me first rephrase and summarize the problem, and only then follow up with some solutions given the humble context that I have.

The HTTP specification is so broad, so dispersed - or hyperlinked, if you will - and yet so loosely defined, that we are constantly teaching ourselves and one another what HTTP is and how it is supposed to be implemented or consumed. If you want an analogy, think of a recipe that doesn't give you the steps, just the ingredients. You can follow grand chefs in making that recipe come alive, but you'll still not know how to make it on your own - sometimes you will mess up the order, sometimes the timing, sometimes you won't have the necessary tools and you'll give up or improvise incorrectly, etc.

The too-late solution

Things inevitably change. One can already notice some simplifications in the HTTP 2.0 draft (e.g. all header names are lowercase; the message meta is made entirely of headers, with special colon-prefixed headers replacing the method, path, host, scheme and status code). Things like these need to be applauded, and I think there are more in the pipeline. Whatever brings no considerable gain should be stripped away. But many years will pass before the majority upgrades from HTTP/1.1 to HTTP/2.0 and we are safe to ignore HTTP/1.1 implementations, just as we eventually came to ignore some browsers. What do we do until then?
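To make the simplification concrete, here is a sketch of how the HTTP/1.1 request and status lines map to lowercase, colon-prefixed fields (the exact field names shuffled around between drafts):

```
# HTTP/1.1 request line and status line
GET /index.html HTTP/1.1
Host: example.com

HTTP/1.1 200 OK

# the same information as colon-prefixed (pseudo-)header fields in HTTP/2.0
:method: GET
:path: /index.html
:scheme: https
:authority: example.com

:status: 200
```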

The worst of it all is the possibility that we repeat history, just under different names, and still write RFCs that are too liberal and leave too many open ends, and still write code willy-nilly as if Internet standards had regular expressions, not ABNFs.

The by-the-book solution

We need to forget about HTTP methods, headers, status codes, resources, and the faster we do that, the better.

<iframe src="http://player.vimeo.com/video/45768176?title=0&byline=0&portrait=0" width="500" height="375" frameborder="0" webkitAllowFullScreen mozallowfullscreen allowFullScreen></iframe>

Promote Knowledge Management (2) from Andrei Neculau on Vimeo.

And what is the best way to forget, but "not even knowing, in the first place"!

We can't have it all, and worse yet, the desire to have it all and the illusion that we can is one of the principal sources of torture of modern affluent free and autonomous thinkers. Barry Schwartz #

My efforts go in this direction, as it is rather clear that HTTP APIs will grow in number, that I will contribute to that by both consuming and serving them, and that I really don't want to repeatedly implement, nor discuss, the same stuff over and over again.

#httpdd

  1. Meet the HTTP decision diagram. Introduced several years ago by Alan Dean as http-headers-status and since built into Erlang, Ruby and Clojure, the diagram is an abstract way to think of an HTTP server implementation. It allows a binary-transition Finite State Machine to decide on the proper HTTP status code and the required HTTP headers. To do that, it makes use of defined callbacks - some with boolean, some with mixed outputs - to decide what the next proper transition is (a minimal sketch of this callback-driven approach follows after this list). You can watch Sean Cribbs at the Øredev Conference speaking on how this is helpful and how it (v3; Alan Dean's version) is implemented in Webmachine (Erlang).

  2. Meet an implementation of the v4 diagram's FSM in NodeJS, under the name hyperrest-machine. An Erlang/Elixir implementation is coming.

  3. Meet a simple iteration of an HTTP server that is using the v4 diagram's FSM, under the name hyperrest-server.

  4. My plan is to do the same thing (HTTP decision diagram and implementation) for the HTTP client (most probably integrated into my hypermedia client) and the HTTP cache-proxy.

  5. Meet my first iteration of parsers & generators for what I call tokenized HTTP headers. When I started, I didn't know any better, so despite being smarter software than the average, it is still far from 100% compliant with the specification (I take the liberty, in spite of my own use of regular expressions, to label the weaknesses as being victims of too-liberal mumbo-jumbo; too harsh? then have a look at a Set-Cookie header and the comma in the Expires parameter).

  6. Meet my second iteration of parsers & generators & more. Again, I don't know any better. I'm using PEGjs for parsing, after translating a core piece of the HTTP ABNFs (why I didn't stick to ABNFs is another bed-time story, but not for tonight). This is really in an infant stage, with code that hasn't yet been pushed and needs testing.
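As promised above, here is a deliberately tiny sketch of the callback-driven idea behind the decision diagram. The callback names and the handful of decisions are illustrative only, not the actual hyperrest-machine or Webmachine callbacks:

```js
// The resource supplies (or inherits) simple callbacks with sensible defaults,
// and the FSM maps the answers to a status code.
var defaults = {
  serviceAvailable: function ()    { return true; },
  knownMethod:      function (req) { return ['GET', 'HEAD', 'POST', 'PUT', 'DELETE', 'OPTIONS'].indexOf(req.method) !== -1; },
  methodAllowed:    function (req) { return ['GET', 'HEAD'].indexOf(req.method) !== -1; },
  resourceExists:   function ()    { return true; }
};

function decide(resource, req) {
  var cb = {};
  for (var name in defaults) {
    cb[name] = resource[name] || defaults[name];
  }
  if (!cb.serviceAvailable(req)) { return 503; }
  if (!cb.knownMethod(req))      { return 501; }
  if (!cb.methodAllowed(req))    { return 405; }
  if (!cb.resourceExists(req))   { return 404; }
  return 200;
}

decide({}, {method: 'GET'});     // 200
decide({}, {method: 'DELETE'});  // 405
decide({resourceExists: function () { return false; }}, {method: 'GET'});  // 404
```

The point is that the resource author only answers simple questions; picking the status code (and, in the real diagram, the required headers) is the FSM's job.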

Note One comment regarding testing. It's a major problem for my parsers. Why? Well, firstly, mea culpa - the reality is that all of this is being done in my spare time ~= no time, so I focus on what brings quick gain and blindly hope that I don't write interfaces that are too bad (what's behind them, you can refactor). But secondly, and somewhat more importantly, it's because, except for uritemplate-test, there is no other collection of test data for HTTP messages. Am I wrong? So, since there is none, my goal would be to write something simple along those lines. I don't want to write tests for my code that can never be ported easily to another library or another language. HTTP is not about libraries, nor languages. But that will take time, and I'll take it without any trendy sense of urgency.
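To illustrate, language-agnostic test data for header parsing could be as plain as the following JSON - a hypothetical format with hypothetical expected shapes, not an existing suite:

```json
[
  { "header": "Accept",
    "input": "text/html;q=0.8, */*;q=0.1",
    "expected": [ {"type": "text/html", "q": 0.8}, {"type": "*/*", "q": 0.1} ] },
  { "header": "Content-Type",
    "input": "text/html; charset=utf-8",
    "expected": {"type": "text/html", "params": {"charset": "utf-8"}} }
]
```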

None of the above software is in a state that I consider production-ready, just to be clear. The philosophy backing it, though, has been production-ready for years and will be for years to come - it's a safe bet for me, but feel free to challenge that in the comments.

I will follow up on these in the weeks to come, but it suffices to say for now that if you agree HTTP is not simple, nor easy, then join the KITT (~KISS) ride: abstract the specifications into customizable tools with sensible defaults.

Otherwise, I wish you good luck in your never-ending fight to teach everyone everything about HTTP/REST.

The utopian solution

There is one more solution that I put forward, a solution that is not in my power.

Most specifications follow the rule that they only become standards once they have two independent implementations. I won't stop at asking myself what the implementations of HTTP and email looked like, but will actually ask for something... outrageous?!... namely that from now on each specification comes with a technical solution alongside. We all (should) know that the language doesn't matter as long as it is not pseudocode - we need software that compiles, runs and passes tests. If you only wrote ABNFs and words - no standard for you, mister! If you write language-agnostic test data (XML, JSON, CSV, etc.) and software (Fortran, Assembler, Prolog, Ruby, whatever) to match your ABNFs and words - I bow, master!

And that's not because programmers can't read or program, but because it simplifies and amplifies - it lets people focus on other matters at hand, matters that are closer to their domain. I'm lucky to be in a work environment that isn't fifty-fifty, but the global environment sounds more like ninety-ten when it comes to skilled computer scientists, especially on mundane topics like APIs (yes, APIs are mundane; ask computer scientists, not brogrammers).

Repetition needs to be eliminated - you can't just go around with microwave-oven schematics, praying that everyone makes the time to understand them and finds the resources to build one. It's not only time-consuming; at times, it's really impossible to argue. Have you ever heard of the Status header? Answer: "what do you have in mind?". And don't be fooled, it's not just the small guys. Heck, the next in line could even be me.

seriously, @github? --- curl -sI github.com | grep "Status:" --- #http

— Andrei Neculau (@andreineculau) May 18, 2013
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>

Above all, software and tests are a great way of eating your own dog food.

You write a specification for URI syntax.

Oh, it has an ABNF specification? then write a validator and a parser.

Oh, it's about sending data? then write a generator.

Oh, you need to test your software? then write some language-agnostic test data.

Oh, some tests fail? then don't forget that ABNFs are not the whole specification - sometimes not strict enough, sometimes in need of context, sometimes with corner-cases that are just too complex.
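A minimal sketch of that loop, with made-up test data and a naive scheme check standing in for a real RFC 3986 validator:

```js
var assert = require('assert');

// Hypothetical language-agnostic test data...
var testData = [
  {input: 'http://example.com/a?b=c', valid: true},
  {input: 'http//example.com',        valid: false}
];

// ...and a placeholder validator: a real one would implement the full
// RFC 3986 ABNF, not a one-line scheme check.
function isAbsoluteURI(candidate) {
  return /^[a-z][a-z0-9+.-]*:/i.test(candidate);
}

testData.forEach(function (testCase) {
  assert.equal(isAbsoluteURI(testCase.input), testCase.valid);
});
```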

TL;DR

HTTP/REST is not as simple as some people tend to advertise it.

Put together, HTTP specifications and extensions run to tens, if not hundreds, of A4 pages. And HTTP/REST is not only complex; it also requires you to change paradigms, to stop thinking about code, and to rethink the problem to fit a sound paradigm.

Software is the litmus test of each specification. No software? Then stop writing poetry, expecting everybody else to arrive at one and the same interpretation, and complaining that nobody understands you. Make the spec a black box that true professionals know how to use; export simple APIs. Make it usable. Delegate and be delegated to. Do one thing well.

<iframe width="420" height="315" src="https://www.youtube-nocookie.com/embed/E_rwwEo5YhY?rel=0" frameborder="0" allowfullscreen></iframe>

#RESTful or #RESTfool or #RESTfoul or #RESTfail ? now pick. same, same, but different

— Andrei Neculau (@andreineculau) March 14, 2013
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>

Errata

  1. As Mike Amundsen pointed out on good grounds, this post munched HTTP and REST together in one go. The two do NOT depend on one another. Theoretically, and from an adding-noise perspective, this is bad, but I took the decision to go with the flow - 9 out of 10 posts that I read don't make the distinction. They speak about RESTful APIs, and then they explain HTTP terms. The goal of this post is to touch on both worlds - both are fluffy implementation-wise. So until 1) there is a massive decoupling (in perception) between HTTP and REST (i.e. use a different protocol, but keep to a REST architectural style) and 2) both the protocol and the style have hands-on artifacts in line with the author's vision - I might be wrong, dead wrong even, but I do not see a convincing point in highlighting the distinction beyond an academic and an educational one. First let's focus on additions and subtractions, and then on integrals and primitives. There's a pretty long and widening road ahead.