spark-connect-js

TypeScript client for Apache Spark Connect

Note

This project is in early development (v0.4.0) and is not recommended for production usage, but feedback is very welcome on GitHub.

About

spark-connect-js is a TypeScript client for Spark Connect, the thin client protocol introduced in Spark 3.4 and expanded in Spark 4.0. It provides a Spark-like DataFrame API with full TypeScript types.

Query plans are built client-side in TypeScript and executed on the Spark server over gRPC. The core package has no runtime dependencies, so it is not tied to a single JavaScript runtime. At present the only adapter is for Node.js; it uses gRPC for transport and Apache Arrow for decoding results.
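
To illustrate what "building a plan" means, here is a minimal, self-contained sketch. It does not use spark-connect-js at all: `Plan`, `LazyFrame`, and `describe` are hypothetical names invented for this example, and the real client builds protobuf messages rather than these toy objects. The point is only that each method call wraps the previous plan in a new node, and nothing runs until the finished tree is handed to something that executes it.

```typescript
// Hypothetical, simplified plan nodes. Not the library's actual types.
type Plan =
  | { kind: "table"; name: string }
  | { kind: "filter"; input: Plan; condition: string }
  | { kind: "sort"; input: Plan; column: string; descending: boolean };

// Immutable builder: every method returns a new frame wrapping a larger plan.
class LazyFrame {
  readonly plan: Plan;

  constructor(plan: Plan) {
    this.plan = plan;
  }

  static table(name: string): LazyFrame {
    return new LazyFrame({ kind: "table", name });
  }

  filter(condition: string): LazyFrame {
    return new LazyFrame({ kind: "filter", input: this.plan, condition });
  }

  sort(column: string, descending = false): LazyFrame {
    return new LazyFrame({ kind: "sort", input: this.plan, column, descending });
  }
}

// Render the plan tree as text, outermost operation first.
function describe(plan: Plan): string {
  switch (plan.kind) {
    case "table":
      return `table(${plan.name})`;
    case "filter":
      return `filter[${plan.condition}](${describe(plan.input)})`;
    case "sort": {
      const direction = plan.descending ? "desc" : "asc";
      return `sort[${plan.column} ${direction}](${describe(plan.input)})`;
    }
  }
}

// No data is touched here; only a description of the query is assembled.
const frame = LazyFrame.table("employees").filter("age > 30").sort("age", true);
console.log(describe(frame.plan));
// prints: sort[age desc](filter[age > 30](table(employees)))
```

Because the builder in this sketch is plain data with no I/O, it runs anywhere JavaScript runs. That is the property the dependency-free core package relies on, with the transport left to a runtime-specific adapter.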

Quick Start (Node.js)

npm install @spark-connect-js/node

import { connect, col, sum, desc } from "@spark-connect-js/node";

const spark = connect("sc://localhost:15002");

const result = await spark
  .table("employees")
  .filter(col("age").gt(30))
  .groupBy("dept")
  .agg(sum("salary").alias("total"))
  .sort(desc("total"))
  .collect();

await spark.stop();

Requires a running Spark Connect server (Spark 4.0+). The Node.js adapter requires Node >= 22.
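
The `sc://localhost:15002` argument above follows the Spark Connect connection string convention: `sc://host:port`, optionally followed by `/;key=value` parameters, with 15002 as the default port. The sketch below shows how such a string breaks down. `parseConnectUrl` and `ConnectTarget` are hypothetical names for illustration only; this is not the parser used by spark-connect-js, and which parameters this client honours is not stated here. The sketch also ignores IPv6 literals.

```typescript
// Hypothetical result shape for this example only.
interface ConnectTarget {
  host: string;
  port: number;
  params: Record<string, string>;
}

// Default Spark Connect server port.
const DEFAULT_PORT = 15002;

function parseConnectUrl(url: string): ConnectTarget {
  const prefix = "sc://";
  if (!url.startsWith(prefix)) {
    throw new Error(`expected a URL starting with ${prefix}, got: ${url}`);
  }
  const rest = url.slice(prefix.length);

  // Everything before the first "/" is host[:port];
  // anything after it holds ";key=value" pairs.
  const slash = rest.indexOf("/");
  const authority = slash === -1 ? rest : rest.slice(0, slash);
  const paramText = slash === -1 ? "" : rest.slice(slash + 1);

  const colon = authority.lastIndexOf(":");
  const host = colon === -1 ? authority : authority.slice(0, colon);
  const port = colon === -1 ? DEFAULT_PORT : Number(authority.slice(colon + 1));
  if (host === "" || !Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid host or port in: ${url}`);
  }

  const params: Record<string, string> = {};
  for (const pair of paramText.split(";")) {
    if (pair === "") continue; // skip the empty piece before the first ";"
    const eq = pair.indexOf("=");
    if (eq <= 0) throw new Error(`invalid parameter: ${pair}`);
    params[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return { host, port, params };
}

console.log(parseConnectUrl("sc://localhost:15002"));
```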

Packages

| Package | Description |
| --- | --- |
| @spark-connect-js/node | Node.js runtime: gRPC transport + Arrow decoding |
| @spark-connect-js/core | DataFrame API and plan builder (platform-agnostic) |
| @spark-connect-js/connect | Generated protobuf types |

Documentation

Full docs at prustic.github.io/spark-connect-js.

Contribution

spark-connect-js is free and open source, licensed under Apache-2.0.

Security

If you discover a security vulnerability, please see SECURITY.md for how to report it responsibly.
