I want to be able to process/query big datasets reproducibly without needing to download to my local machine #308

Open
@petebachant

Description

Concepts

  1. Users can define a distributed processing environment (e.g., Spark, Dask, or Trino), and commands could run in that environment.
  2. We could spin up a large machine for them, clone the project onto it, run a stage or the whole pipeline there, then commit and push so they could pull down the result. This wouldn't be a great interactive workflow, though, and it is probably a requirement that the pipeline can be built up interactively.
