Investigate and potentially add support for spark connect #284

Closed
@razvan

Description

Spark Connect

Spark Connect is a client-server architecture that was introduced in Spark 3.4 and extended in Spark 3.5.

The use case seems to be thin clients that connect to an already running Spark driver.

This probably means that the operator needs to be able to start Spark Connect servers independently of Spark applications and publish a Kubernetes Service through which Connect clients can reach them.
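
For illustration, a thin client talking to such a Service could look roughly like the sketch below. The Service DNS name is hypothetical, and port 15002 is assumed because it is Spark Connect's default gRPC port. The session helper needs the `pyspark` package with its Connect dependencies installed, so it is only defined here, not called.

```python
"""Sketch: a thin client reaching a Spark Connect server through a Service."""

DEFAULT_CONNECT_PORT = 15002  # Spark Connect's default gRPC port


def connect_url(host: str, port: int = DEFAULT_CONNECT_PORT) -> str:
    """Build a Spark Connect connection string of the form sc://host:port."""
    if not host:
        raise ValueError("host must not be empty")
    return f"sc://{host}:{port}"


def open_session(url: str):
    """Open a remote session against a running Spark Connect server."""
    # Imported lazily so connect_url() stays usable without pyspark installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.remote(url).getOrCreate()


# Hypothetical in-cluster DNS name of the Service the operator would publish.
url = connect_url("spark-connect-server.default.svc.cluster.local")
print(url)
```

With a server running behind that name, `open_session(url).range(5).show()` would execute on the remote driver while the client process stays lightweight.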

Roadmap

Rough roadmap to GA:

  • POC: set up a Spark Connect server with Kubernetes as the resource manager, plus a basic integration test
  • minimal CRD: drop the StatefulSet, minimal configuration for the server (JVM properties, logging)
    • server
      • deployment with one replica
      • JVM argument overrides
      • config overrides
      • env overrides
      • log configuration and aggregation with Vector
      • pod overrides
      • resource requests
      • status and transition events
      • reconciliation operations (paused, stopped, etc.)
    • executor
      • JVM argument overrides
      • config overrides
      • env overrides
      • log configuration and aggregation
      • resource requests
      • pod affinity
  • add preliminary documentation
  • expose Prometheus metrics
  • integrate with the history server (see "doc: comment on spark history integration" #559)
  • integrate with the listener operator
  • create a new demo
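
To make the "deployment with one replica" and published-service items more concrete, here is a rough sketch of the two Kubernetes objects an operator might render. This is not the operator's actual output: the object names, the image, the `/opt/spark` path and the chosen `--conf` settings are all assumptions. It also assumes the image already has the Spark Connect jar on its classpath.

```python
"""Sketch: manifests for a single-replica Spark Connect server and its Service."""
import json

CONNECT_PORT = 15002  # Spark Connect's default gRPC port


def server_deployment(name: str, image: str, namespace: str = "default") -> dict:
    """Render a one-replica Deployment that runs the Connect server in the foreground."""
    labels = {"app": name}
    args = [
        # Kubernetes as resource manager; executors are started as pods.
        "--master", "k8s://https://kubernetes.default.svc",
        "--conf", f"spark.kubernetes.container.image={image}",
        "--conf", f"spark.kubernetes.namespace={namespace}",
        "--conf", f"spark.connect.grpc.binding.port={CONNECT_PORT}",
    ]
    container = {
        "name": "spark-connect",
        "image": image,
        # Assumed install path; start-connect-server.sh ships with Spark.
        "command": ["/opt/spark/sbin/start-connect-server.sh"],
        "args": args,
        # Keep the launcher script in the foreground so the container stays alive.
        "env": [{"name": "SPARK_NO_DAEMONIZE", "value": "true"}],
        "ports": [{"name": "grpc", "containerPort": CONNECT_PORT}],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": namespace, "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


def server_service(name: str, namespace: str = "default") -> dict:
    """Render the Service that Connect clients would use to reach the server."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "selector": {"app": name},
            "ports": [{"name": "grpc", "port": CONNECT_PORT, "targetPort": "grpc"}],
        },
    }


# Hypothetical name and image, for illustration only.
deployment = server_deployment("spark-connect-server", "example.org/spark:3.5.0")
service = server_service("spark-connect-server")
print(json.dumps([deployment, service], indent=2))
```

A real setup would also need settings this sketch leaves out, such as a service account allowed to create executor pods and a driver address that executors can reach.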

Related PRs
