# [Proposal] Streaming execution support roadmap

**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
Adding streaming support to DataFusion, so that queries can execute continuously on unbounded datasets, is a frequent topic of discussion. Streaming is also an item [on the roadmap](https://arrow.apache.org/datafusion/contributor-guide/quarterly_roadmap.html), is discussed in [Ballista #30](https://github.com/apache/arrow-ballista/issues/30), and is part of the general desiderata.

There have been some recent attempts and PoC implementations exploring how this could be done. Some examples are:

- [DataFusion Streams](https://github.com/datafusion-contrib/datafusion-streams)
- https://github.com/apache/arrow-datafusion/issues/1544

We would like to use this issue to coordinate a fresh re-think and a disciplined push toward achieving the streaming support goals and making progress on the roadmap.

**Describe the solution you'd like**
We have a proposal-stage roadmap that breaks streaming support down into a sequence of tasks. [You can find our proposal here](https://synnada.notion.site/EPIC-Long-running-stateful-execution-support-for-unbounded-data-with-mini-batches-a416b29ae9a5438492663723dbeca805).

Within this proposal, you can find design discussions, code snippets, and individual task/issue descriptions paving the way for full support.

We experimented with many candidate approaches and built a few PoC implementations during the design process. Still, this is a huge topic, and we are sure there are subtleties, perspectives, and challenges we have missed.

We look forward to hearing the community's thoughts on this proposal. Thanks!

**Describe alternatives you've considered**
We studied @hntd187's valuable contributions toward a Kafka provider in https://github.com/apache/arrow-datafusion/issues/1544.

**Additional context**
If this proposal is found to be a sensible path forward, we are happy to turn it into an epic GitHub issue and track progress there. We are also happy to take on the implementation work for a significant number of the tasks in this proposal.

[Proposal] Streaming execution support roadmap #4285
