[SPARK-17631] [SQL] Add HttpStreamSink for structured streaming. #15197

@zhangxinyu1 zhangxinyu1 commented Sep 22, 2016

What changes were proposed in this pull request?

Add a class HttpStreamSink for structured streaming. This class extends StreamSinkProvider and DataSourceRegister. Streaming query results can be written to an HTTP server by configuring the DataStreamWriter with .format("http").option("url", yourHttpUrl).
For example:

val query = counts.writeStream
  .outputMode("append")
  .format("http")
  .option("url", "yourHttpUrl")
  .start()
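
For readers unfamiliar with the sink API, here is a minimal sketch of what such a sink could look like. It is not the code from this patch. It assumes the Spark 2.x internal `Sink` trait (`org.apache.spark.sql.execution.streaming.Sink`), a single-column input, and plain `java.net.HttpURLConnection` for the POST. The class names `HttpStreamSink` and `HttpStreamSinkProvider` and the choice to send one request per row are illustrative assumptions.

```scala
import java.io.IOException
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets

import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.execution.streaming.Sink
import org.apache.spark.sql.sources.{DataSourceRegister, StreamSinkProvider}
import org.apache.spark.sql.streaming.OutputMode

// Hypothetical sink: POSTs the first column of every row in a micro-batch to `url`.
class HttpStreamSink(url: String) extends Sink {

  override def addBatch(batchId: Long, data: DataFrame): Unit = {
    // collect() brings the whole batch to the driver, so this only suits small batches.
    data.collect().foreach(row => post(String.valueOf(row.get(0))))
  }

  private def post(body: String): Unit = {
    val conn = new URL(url).openConnection().asInstanceOf[HttpURLConnection]
    try {
      conn.setRequestMethod("POST")
      conn.setDoOutput(true)
      conn.setRequestProperty("Content-Type", "text/plain; charset=UTF-8")
      val out = conn.getOutputStream
      try out.write(body.getBytes(StandardCharsets.UTF_8)) finally out.close()
      // Reading the status code forces the request to be sent.
      val status = conn.getResponseCode
      if (status >= 400) {
        throw new IOException(s"POST to $url failed with HTTP $status")
      }
    } finally {
      conn.disconnect()
    }
  }
}

// Hypothetical provider: lets .format("http") resolve to the sink above.
class HttpStreamSinkProvider extends StreamSinkProvider with DataSourceRegister {

  override def shortName(): String = "http"

  override def createSink(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      partitionColumns: Seq[String],
      outputMode: OutputMode): Sink = {
    val url = parameters.getOrElse("url",
      throw new IllegalArgumentException("Option 'url' is required"))
    new HttpStreamSink(url)
  }
}
```

For the short name "http" to resolve, the provider class would also have to be listed in a `META-INF/services/org.apache.spark.sql.sources.DataSourceRegister` file; otherwise the fully qualified class name can be passed to `.format(...)`.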

How was this patch tested?

Tested with HttpStreamSinkSuite.

@AmplabJenkins

Can one of the admins verify this patch?

@zhangxinyu1
Author

@marmbrus
I am proposing this feature mainly because we need to output streaming query results via an HTTP API. One use case is real-time alerting: when we use structured streaming to analyze logs and find exceptions in them, it is essential to send alarm messages in real time via an HTTP API.
What do you think?

@zhangxinyu1 zhangxinyu1 changed the title [SPARK-17631] [SQL] Add HttpStreamSink for structured streaming. Streaming query results can be sinked to http server [SPARK-17631] [SQL] Add HttpStreamSink for structured streaming. Sep 22, 2016
@marmbrus
Contributor

Thanks for working on this, it does seem like it could be useful. I'm not sure if this should go into Spark or into a separate package. It really depends on how many people want this feature.

Regardless, a few high level comments on this PR:

  • Check out the contributing to Spark guide. Patches need to have tests and follow the style guide.
  • I would not define a new HttpDataFormat interface. Instead I would mandate that the input is a single string column (similar to what we do for df.write.text). Users can use all of the existing DataFrame/Dataset operations to convert their data into a string.
  • It would be good to write up a short design on JIRA and debate there. A few things that I can think of off the top of my head:
    • should we support https too?
    • do we need to set any headers (e.g. maybe the batch id?)
  • We'd also need to add docs for this feature.
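
The single-string-column suggestion above could be sketched as follows. This is an illustration rather than part of the patch: it assumes Spark 2.1 or later (where `to_json` is available), and the object and method names `SingleStringColumn.toJsonValue` are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, struct, to_json}

object SingleStringColumn {
  // Packs all columns of `df` into one JSON string column named "value",
  // which is the shape a single-string-column sink would accept.
  def toJsonValue(df: DataFrame): DataFrame =
    df.select(to_json(struct(df.columns.map(col): _*)).as("value"))
}

// Possible usage with the sink proposed in this PR ("http" is not a built-in format):
//
//   val query = SingleStringColumn.toJsonValue(counts)
//     .writeStream
//     .outputMode("append")
//     .format("http")
//     .option("url", "yourHttpUrl")
//     .start()
```

With this approach the serialization format stays in user code, so the sink itself needs no format-specific interface.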

zhangxinyu1 added 2 commits September 26, 2016 18:54
@zhangxinyu1
Author

zhangxinyu1 commented Sep 26, 2016

@marmbrus
Thanks for all your suggestions! I have done the following:

  • Wrote a short design on JIRA.
  • Replaced HttpDataFormat with an input that has a single string column. I like this idea very much, thanks again.
  • Completed HttpStreamSinkSuite as the test. This is my first time writing one; could you please help me test it?

@maropu maropu mentioned this pull request Apr 23, 2017
maropu added a commit to maropu/spark that referenced this pull request Apr 23, 2017
@asfgit asfgit closed this in e9f9715 Apr 24, 2017
peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
This PR proposes to close stale PRs. Currently, we have 400+ open PRs, and some of them are stale PRs whose JIRA tickets have already been closed or whose JIRA tickets do not exist (and they do not seem to be minor issues).

// Open PRs whose JIRA tickets have already been closed
Closes apache#11785
Closes apache#13027
Closes apache#13614
Closes apache#13761
Closes apache#15197
Closes apache#14006
Closes apache#12576
Closes apache#15447
Closes apache#13259
Closes apache#15616
Closes apache#14473
Closes apache#16638
Closes apache#16146
Closes apache#17269
Closes apache#17313
Closes apache#17418
Closes apache#17485
Closes apache#17551
Closes apache#17463
Closes apache#17625

// Open PRs whose JIRA tickets do not exist and that are not minor issues
Closes apache#10739
Closes apache#15193
Closes apache#15344
Closes apache#14804
Closes apache#16993
Closes apache#17040
Closes apache#15180
Closes apache#17238

N/A

Author: Takeshi Yamamuro <yamamuro@apache.org>

Closes apache#17734 from maropu/resolved_pr.

Change-Id: Id2e590aa7283fe5ac01424d30a40df06da6098b5

3 participants