TypeScript client for Apache Spark Connect
Note: This project is in early development (v0.4.0) and is not yet recommended for production use. Feedback is very welcome on GitHub.
spark-connect-js is a TypeScript client for Spark Connect, the thin client protocol introduced in Spark 3.4 and expanded in Spark 4.0. It provides a Spark-like DataFrame API with full TypeScript types.
Plans are built in TypeScript and executed on the server over gRPC. The core package has no runtime dependencies, so it is not tied to any particular JavaScript runtime. Currently there is a Node.js adapter that uses gRPC for transport and Apache Arrow for decoding results.
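To illustrate what "plans are built in TypeScript and executed on the server" means, here is a minimal, hypothetical sketch of a lazy plan builder. It is not this library's implementation: the `PlanNode` shape, the `Execute` callback and the `LazyFrame` class are assumptions made for illustration, and the real client builds Spark Connect protobuf messages rather than plain objects. Each transformation returns a new frame wrapping a larger tree, and nothing is sent until an action such as `collect()` hands the finished tree to a transport.

```typescript
// Hypothetical, simplified plan nodes. Real Spark Connect plans are protobuf
// messages; these plain objects only mimic their nested shape.
type PlanNode =
  | { op: "read"; table: string }
  | { op: "filter"; condition: string; input: PlanNode }
  | { op: "limit"; n: number; input: PlanNode };

// Hypothetical transport callback: the only place where execution happens.
type Execute = (plan: PlanNode) => Promise<unknown[]>;

class LazyFrame {
  readonly plan: PlanNode;
  private readonly execute: Execute;

  constructor(plan: PlanNode, execute: Execute) {
    this.plan = plan;
    this.execute = execute;
  }

  // Transformations wrap the current plan in a new node; no I/O happens here.
  filter(condition: string): LazyFrame {
    return new LazyFrame({ op: "filter", condition, input: this.plan }, this.execute);
  }

  limit(n: number): LazyFrame {
    return new LazyFrame({ op: "limit", n, input: this.plan }, this.execute);
  }

  // Only an action hands the finished tree to the transport.
  collect(): Promise<unknown[]> {
    return this.execute(this.plan);
  }
}

// Entry point that starts a plan from a table read.
function table(name: string, execute: Execute): LazyFrame {
  return new LazyFrame({ op: "read", table: name }, execute);
}
```

Keeping frames immutable means a partially built frame can be reused as the base of several different queries without one affecting another.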
```sh
npm install @spark-connect-js/node
```

```typescript
import { connect, col, sum, desc } from "@spark-connect-js/node";

const spark = connect("sc://localhost:15002");

const result = await spark
  .table("employees")
  .filter(col("age").gt(30))
  .groupBy("dept")
  .agg(sum("salary").alias("total"))
  .sort(desc("total"))
  .collect();

await spark.stop();
```

Requires a running Spark Connect server (Spark 4.0+). The Node.js adapter requires Node >= 22.
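The `sc://localhost:15002` argument follows Spark's Connect connection-string format, `sc://host:port/;key=value;key=value`, where the port defaults to 15002 and parameters such as `token` or `use_ssl` are passed after `/;`. The sketch below is a hypothetical parser, written only to show how such a string breaks down. It is not part of this library's API, and which parameters the client actually honors is an assumption not covered here.

```typescript
// Hypothetical helper: splits a Spark Connect URL of the form
// sc://host[:port][/;key=value;key=value] into its parts.
// IPv6 literals (e.g. [::1]) are deliberately not handled in this sketch.
interface ConnectionInfo {
  host: string;
  port: number;
  params: Record<string, string>;
}

const DEFAULT_PORT = 15002;

function parseConnectionString(url: string): ConnectionInfo {
  const prefix = "sc://";
  if (!url.startsWith(prefix)) {
    throw new Error(`expected a URL starting with ${prefix}: ${url}`);
  }
  const rest = url.slice(prefix.length);

  // Everything before the first "/" is host[:port]; parameters follow it.
  const slash = rest.indexOf("/");
  const authority = slash === -1 ? rest : rest.slice(0, slash);
  const paramPart = slash === -1 ? "" : rest.slice(slash + 1);

  // Fall back to the default port when none is given.
  const colon = authority.lastIndexOf(":");
  const host = colon === -1 ? authority : authority.slice(0, colon);
  const port = colon === -1 ? DEFAULT_PORT : Number(authority.slice(colon + 1));
  if (host === "" || !Number.isInteger(port) || port <= 0) {
    throw new Error(`invalid host or port in ${url}`);
  }

  // Parameters are ";"-separated key=value pairs; the leading ";" yields an empty piece.
  const params: Record<string, string> = {};
  for (const pair of paramPart.split(";")) {
    if (pair === "") continue;
    const eq = pair.indexOf("=");
    if (eq === -1) throw new Error(`malformed parameter "${pair}" in ${url}`);
    params[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return { host, port, params };
}
```

For example, `parseConnectionString("sc://spark.example.com/;token=abc")` would yield host `spark.example.com`, the default port 15002, and a `token` parameter of `abc`.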
| Package | Description |
|---|---|
| `@spark-connect-js/node` | Node.js runtime: gRPC transport + Arrow decoding |
| `@spark-connect-js/core` | DataFrame API and plan builder (platform-agnostic) |
| `@spark-connect-js/connect` | Generated protobuf types |
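The split between a platform-agnostic core and a runtime-specific adapter can be pictured as a small interface that the core depends on and each adapter implements. The sketch below is hypothetical: `Transport`, `collectAll` and `InMemoryTransport` are illustrative names, not exports of these packages, and a real adapter would stream Arrow batches from a gRPC call rather than yield rows from memory.

```typescript
// Hypothetical boundary between a platform-agnostic core and a runtime adapter.
type Row = Record<string, unknown>;

interface Transport {
  // Sends a serialized plan and yields decoded row batches as they stream back.
  execute(plan: Uint8Array): AsyncIterable<Row[]>;
}

// Core-side logic: no gRPC, no Arrow, no Node.js APIs, only the interface.
async function collectAll(transport: Transport, plan: Uint8Array): Promise<Row[]> {
  const rows: Row[] = [];
  for await (const batch of transport.execute(plan)) {
    rows.push(...batch);
  }
  return rows;
}

// An in-memory adapter, e.g. for tests or runtimes without a gRPC client.
class InMemoryTransport implements Transport {
  private readonly batches: Row[][];
  received: Uint8Array[] = [];

  constructor(batches: Row[][]) {
    this.batches = batches;
  }

  // Records the plan it was given, then yields each stored batch in order.
  async *execute(plan: Uint8Array): AsyncIterable<Row[]> {
    this.received.push(plan);
    for (const batch of this.batches) {
      yield batch;
    }
  }
}
```

Because the core only sees `Transport`, supporting another runtime would mean writing a new adapter rather than changing the DataFrame API.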
Full documentation is available at prustic.github.io/spark-connect-js.
spark-connect-js is free and open source, licensed under Apache-2.0.
If you discover a security vulnerability, please see SECURITY.md for how to report it responsibly.