
Providing Async method with Java 8 CompletableFuture #208

Open
pprun opened this issue Nov 13, 2014 · 44 comments

Comments

@pprun

pprun commented Nov 13, 2014

The Spark API is currently very simple and blocking. Since Java 8 introduced the async style via CompletableFuture, we hope Spark can scale to very high-throughput applications.
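For readers unfamiliar with the style being requested, a minimal JDK-only sketch of the Java 8 CompletableFuture async model (no Spark API involved):

```java
import java.util.concurrent.CompletableFuture;

// Minimal illustration of the Java 8 async style the issue asks for:
// start work on another thread, compose a transformation without blocking,
// and only block at the very edge when the result is actually needed.
public class AsyncStyle {
    public static void main(String[] args) {
        CompletableFuture<String> greeting = CompletableFuture
                .supplyAsync(() -> "Hello")     // runs on ForkJoinPool.commonPool()
                .thenApply(s -> s + " World");  // composed, not blocking
        System.out.println(greeting.join());    // blocks only here, at the edge
    }
}
```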

@aplatypus

If you're interested, we got pretty good turnaround with a simple async model. The essential structure is outlined here:

Response times really dropped. Unfortunately, JAXB (XML) processing remains the real processor hog, especially when we timed both the data-comm and the XML. HTTP GETs and PUTs will be a lot less intensive; it's a good fit.

@o0x2a

o0x2a commented Nov 15, 2015

It would be great if Spark offered non-blocking APIs. 😍

@yeshodhan

👍

@naivefun

+1 for non blocking

@suzel

suzel commented Dec 7, 2015

👍

6 similar comments
@vrcca

vrcca commented Jan 7, 2016

+1

@EtienneK

+1

@krrg

krrg commented Jan 27, 2016

+1

@dirkharrington

+1

@mgivney

mgivney commented Mar 20, 2016

👍

@dinesh707

+1

@ruurd
Contributor

ruurd commented Mar 22, 2016

Actually, -1. Let spark be simple. And blocking.

Non-blocking introduces a lot of other complexities that would have to be handled also. I say NO.

@krrg

krrg commented Mar 22, 2016

@ruurd In the spirit of an open discussion, could you expand on this? What complexities did you have in mind?

@ruurd
Contributor

ruurd commented Mar 22, 2016

Lots of threading, for example? How many requests would you want to handle simultaneously as a process? What do you do if you pass that threshold? What do you do once you have passed it and the number of simultaneous requests drops below the threshold again? How can you simply and meaningfully configure this kind of stuff? Should the configuration be changeable on the fly? And and and...

Besides. If spark cannot process requests fast enough, it is simple enough to put it behind a load balancer.

@krrg

krrg commented Mar 22, 2016

Non-blocking != lots of threading. Although threads are one way of implementing a non-blocking server, they are not the only way. See https://docs.oracle.com/javase/8/docs/api/java/nio/channels/Selector.html and http://tutorials.jenkov.com/java-nio/selectors.html#why-use-a-selector, for instance.

The point here is that you can multiplex many requests on a single thread. Obviously this raises different implementation questions, but lots of threading doesn't have to be one of them.
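The multiplexing idea can be shown in a few lines of JDK-only code: one Selector, one thread, and channels registered against it (a minimal sketch, not a production event loop):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// One thread, one Selector, many channels: the single-threaded
// multiplexing described above. The selector sleeps until any
// registered channel has an event ready; nothing here spawns threads.
public class SelectorDemo {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0)); // ephemeral port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        // A client connects; the selector wakes the single event-loop thread.
        SocketChannel client = SocketChannel.open(server.getLocalAddress());
        selector.select(); // blocks until at least one channel is ready
        for (SelectionKey key : selector.selectedKeys()) {
            if (key.isAcceptable()) {
                SocketChannel accepted = server.accept(); // ready, so no blocking
                System.out.println("accepted: " + (accepted != null));
            }
        }
        client.close();
        server.close();
        selector.close();
    }
}
```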

@luolong

luolong commented Mar 23, 2016

@ruurd: Actually, the blocking version causes more threading, since the only way to scale a blocking API is to feed it more threads, while the non-blocking variant scales quite nicely with a relatively small number of threads.

@ruurd
Contributor

ruurd commented Mar 23, 2016

Nice trick, zeroing in on the threading stuff :-) The main point is that non-blocking IO is going to make Spark bigger and more difficult to configure. I think of Spark as small, lightweight, easy to get running, short time to market: a microservice framework. If your problem does not fit, find another tool.

@northlander

+1

@yeshodhan

@ruurd stfu!
+1, big time! again!

@ruurd
Contributor

ruurd commented May 4, 2016

@yeshodhan stfu yourself!
-30000.

@o0x2a

o0x2a commented May 5, 2016

@ruurd Using nio instead of io will not make things hard for you buddy, so just chill.
Use your time to read on the topic instead.

@ruurd
Contributor

ruurd commented May 5, 2016

@Code-guru 1) I'm not your buddy, 2) tell @yeshodhan to chill, he started this, and 3) if you really want to use something that embraces async and experience the related difficulties, use node. Using an asynchronous IO paradigm will make Spark harder to use, harder to maintain, harder to debug, will increase the number of failure modes it has to deal with, and just plain does not fit in with what Spark wants to be: easy, small, lightweight.

@tipsy
Contributor

tipsy commented May 5, 2016

For the people who want this: just how large are your applications?

Making Spark async is not on the roadmap currently, mostly because of the reasons @ruurd just mentioned. We think that ease of use is the main selling point of Spark, so we're very wary of changing the current paradigm into something more complex.
We'll have a look at it for Spark 3, maybe we can find a way to make it extremely simple to use.

@krrg

krrg commented May 5, 2016

My service was ~2000 lines of code, servicing about 100,000 HTTP requests a day, usually within a 12 hour window.

We ended up using Vertx, since it supported async, and had the words "Lightweight" "Easy" "Fast" and "Simple" on its homepage.

@tipsy
Contributor

tipsy commented May 5, 2016

@krrg Thanks. Did you have performance issues with Spark, or was it a 'better safe than sorry' decision? Did you do a comparison test?

@LeifW

LeifW commented May 5, 2016

A Scala version of this framework, Scalatra, added non-blocking IO support, using Servlet 3.0+. It's not in the core, but an add-on module.
Given the current Spark syntax get("/hello", (req, res) -> "Hello World"), a version using Java 8 CompletableFuture might return a CompletableFuture<String> instead of simply a String, e.g. get("/hello", (req, res) -> CompletableFuture.completedFuture("Hello World")) or get("/hello", (req, res) -> someAsyncHttpRequestTo("http://google.com/?q=foo"))

In my opinion, an async version can be less work to configure, as I don't have to pick a number of request threads for the servlet container pool ahead of time (it usually just runs one thread per CPU core).

Some other JVM web frameworks supporting async: Finagle, Netty, JAX-RS, Scalatra, Servlets, Spray, etc...
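To make the proposal concrete, here is a self-contained mock-up of the suggested signature. Nothing below is real Spark API; AsyncRoute, get, and dispatch are hypothetical names invented purely for illustration:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of the API shape suggested above. None of these
// names exist in Spark; they only illustrate a CompletableFuture-returning
// route handler and how a framework could consume it without blocking.
public class AsyncRouteSketch {
    @FunctionalInterface
    interface AsyncRoute {
        CompletableFuture<String> handle(String request, String response);
    }

    private final Map<String, AsyncRoute> routes = new HashMap<>();

    // Mirrors: get("/hello", (req, res) -> CompletableFuture.completedFuture("Hello World"))
    void get(String path, AsyncRoute route) {
        routes.put(path, route);
    }

    // The framework would attach the response write as a callback
    // instead of blocking the request thread on the result.
    CompletableFuture<String> dispatch(String path, String request) {
        return routes.get(path).handle(request, null)
                .thenApply(body -> body); // e.g. write body to the response here
    }

    public static void main(String[] args) {
        AsyncRouteSketch app = new AsyncRouteSketch();
        app.get("/hello", (req, res) ->
                CompletableFuture.supplyAsync(() -> "Hello World"));
        System.out.println(app.dispatch("/hello", "GET /hello").join());
    }
}
```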

@krrg

krrg commented May 6, 2016

@tipsy It was more a "better safe than sorry" approach. Unfortunately I don't have any performance results.

@ruurd
Contributor

ruurd commented May 6, 2016

@krrg and if you are a Vertx user, what is your interest in turning Spark into Vertx? And @LeifW, I think that Scalatra is a Scala framework patterned after Sinatra.

@ruurd
Contributor

ruurd commented May 6, 2016

@LeifW not having to pick the number of request threads introduces unexpected behavior in that case. What if you have to use your server for additional tasks? How are those tasks going to deal with a program that just hogs all CPUs because it feels like it? So instead of configuring Spark you will need to configure something else NOT to hog your CPU. I'm a big believer in convention over configuration, but in this case it most probably will bite you in the proverbial behind the moment your service is used outside of a development environment. Having to configure the number of request threads forces you to plan ahead for the case where that number is insufficient.

@luolong

luolong commented May 6, 2016

@ruurd I must admit I have not had any reason to configure it, but from what I understand, the underlying fork-join API backing the async servlet stuff has some knobs for tuning the threading behavior.

As a consumer of the async API I really don't have to do anything too different. Basic async servlet examples make the difference clear and very easy:

  1. Get the AsyncContext from a request.
  2. Run your processing on a separate thread with the attached async context.
  3. Call asyncContext.complete() when done.

Servlet 3.0 api itself doesn't really impose any specific threading strategies on you.

Most of the very simple samples on the internet use simple thread executor to execute a long running task off the servlet request processing thread, which makes it very malleable to thread pool configuration and execution strategies.

As for the Spark API surface area, I imagine that if I register a handler for an endpoint that returns a CompletableFuture instead of a plain result, that should be enough to signal that I really want it to be run asynchronously. I imagine there's really no more complexity required.
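A JDK-only stand-in for the three servlet steps above, with a CompletableFuture playing the role of the AsyncContext (a sketch under that assumption; no servlet container is involved):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// JDK-only stand-in for the async-servlet pattern:
// 1. capture a completion handle (here a CompletableFuture, playing the
//    role of the AsyncContext obtained from the request),
// 2. run the work on a separate thread from a configurable pool,
// 3. signal completion when done (the asyncContext.complete() analogue).
public class OffloadPattern {
    public static void main(String[] args) throws Exception {
        ExecutorService workers = Executors.newFixedThreadPool(2);
        CompletableFuture<String> response = new CompletableFuture<>();

        workers.submit(() -> {
            // long-running task, off the "request" thread
            response.complete("done"); // step 3: complete()
        });

        System.out.println(response.get()); // the container would write this
        workers.shutdown();
    }
}
```

Because the work runs through a plain ExecutorService, the thread pool configuration and execution strategy stay fully in the application's hands, as noted above.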

@ruurd
Contributor

ruurd commented May 9, 2016

@luolong OK, the scenario I see before me is that you fork off a long-running process, then rip through the handler in no time flat, and spend the rest of the time waiting for the forked process to return the end result. Where did my gains go? And how long is the requestor waiting for a result?
There is only one scenario in which I can imagine this making sense at all: the case where no one is waiting for a result at short notice (most websites have an NFR that specifies 3 seconds max waiting time for all top-level requests in the 99th percentile). Even long-running processes are hampered by the fact that the browser will close the connection after a given amount of time. So max runtime would be what? 30 seconds?
I think that microservices should be engineered to yield a result in something on the order of 100 ms tops, and to run only a single task per request. Anything long-running should be handed off to a different process over a bus as fire-and-forget. Synchronous IO makes it much easier to measure performance, is easier from a development and testing perspective, and the resulting services have more deterministic behavior, meaning it is easier to derive how the service should be horizontally scaled. If you need to scale, then use something like Kong. That is specially made for managing microservices and allows you to keep microservices what they are: micro. Simple. Fast. Synchronous :-)

@vietj

vietj commented May 9, 2016

@ruurd indeed, to fully benefit from non-blocking / async you need non-blocking services as well, otherwise there is no real gain, just more complexity. In the scenario you describe (async request / blocking service), you merely move the blocked thread from the IO layer to another thread (usually a worker pool). However, your users could use a non-blocking service like a Cassandra client. That being said, to me the fundamental problem is that servlet technology is blocking by nature, and the non-blocking programming model provided by the servlet spec is not trivial (frameworks should make it easier).

Don't get me wrong, I'm not advocating for async support in SparkJava, you are the boss; I'm just shedding some light on the benefits / drawbacks of async.

@luolong

luolong commented May 9, 2016

Well, @ruurd you can certainly do as you like with this framework. It seems that you have thoroughly thought about this issue and decided against it. I might not share your views, but I do respect them.

The reason I was interested in async support in Spark was that my use case was an intermediate service set up to translate web requests from an internal service API to external services that had a very high probability of being slow. In addition, the internal API was heavily asynchronous.

Having async support in this situation was highly desirable. Anyway, that project is now long done and forgotten -- I ended up simply implementing bare bones Servlets and using the async support provided by Servlet spec instead.

@tipsy
Contributor

tipsy commented May 9, 2016

Just to clear things up, @ruurd is not a Spark maintainer.

@vietj

vietj commented May 9, 2016

@tipsy sorry for the misunderstanding; anyway, I just gave my opinionated view, whoever the boss is :-)

@kivan-mih

+1 for async APIs. But blocking APIs should remain as well; the engineer should choose between them.

@paulakimenko

+1

@shenliuyang

+1

@mj1618

mj1618 commented Feb 12, 2018

PR submitted in this thread if anyone is interested in reviewing (it's a bit of a spike and not merge-ready yet): #549

@vinyfalcao

+1

2 similar comments
@ghost

ghost commented Jul 29, 2018

+1

@foldik

foldik commented Oct 4, 2018

+1

@kran

kran commented Oct 12, 2018

-1 of course.

@tipsy tipsy pinned this issue Mar 16, 2019
@perwendel perwendel unpinned this issue Mar 22, 2019
@buckelieg

My five cents:
On the sync/async request-handling discussion: since we are talking about a framework, wouldn't it be reasonable to offer both options
and leave the decision about which to use to the developer?
Whenever one has to implement a feature, it is great to have the technical capability to do it without leaving the framework.
