Flechette

Flechette is a JavaScript library for reading the Apache Arrow columnar in-memory data format. It provides a faster, lighter, zero-dependency alternative to the Arrow JS reference implementation.

Flechette performs fast extraction of data columns in the Arrow binary IPC format, supporting ingestion of Arrow data (from sources such as DuckDB) for downstream use in JavaScript data analysis tools like Arquero, Mosaic, Observable Plot, and Vega-Lite.

Why Flechette?

In the process of developing multiple data analysis packages that consume Arrow data (including Arquero, Mosaic, and Vega), we've had to develop workarounds for the performance and correctness of the Arrow JavaScript reference implementation. Instead of workarounds, Flechette addresses these issues head-on.

Speed. Flechette provides faster decoding. Across varied datasets, initial performance tests show 1.3-1.6x faster value iteration, 2-7x faster array extraction, and 5-9x faster row object extraction.
Size. Flechette is ~16k minified (~6k gzip'd), versus 163k minified (~43k gzip'd) for Arrow JS.
Coverage. Flechette supports multiple data types unsupported by the reference implementation at the time of writing, including decimal-to-number conversion and support for month/day/nanosecond time intervals (as used, for example, by DuckDB).
Flexibility. Flechette includes options to control data value conversion, such as numerical timestamps vs. Date objects for temporal data, and numbers vs. bigint values for 64-bit integer data.
Simplicity. Our goal is to provide a smaller, simpler code base in the hope that it will make it easier for ourselves and others to improve the library. If you'd like to see support for additional Arrow data types or features, please file an issue or open a pull request.

That said, no tool is without limitations or trade-offs. Flechette is consumption oriented: it does yet support encoding (though feel free to upvote encoding support!). Flechette also requires simpler inputs (byte buffers, no promises or streams), has less strict TypeScript typings, and at times has a slightly slower initial parse (as it decodes dictionary data upfront for faster downstream access).

What's with the name?

The project name stems from the French word fléchette, which means "little arrow" or "dart". 🎯

Examples

Load and Access Arrow Data

import { tableFromIPC } from '@uwdata/flechette';

const url = 'https://vega.github.io/vega-datasets/data/flights-200k.arrow';
const ipc = await fetch(url).then(r => r.arrayBuffer());
const table = tableFromIPC(ipc);

// print table size: (231083 x 3)
console.log(`${table.numRows} x ${table.numCols}`);

// inspect schema for column names, data types, etc.
// [
//   { name: "delay", type: { typeId: 2, bitWidth: 16, signed: true }, ...},
//   { name: "distance", type: { typeId: 2, bitWidth: 16, signed: true }, ...},
//   { name: "time", type: { typeId: 3, precision: 1 }, ...}
// ]
// typeId: 2 === Type.Int, typeId: 3 === Type.Float
console.log(JSON.stringify(table.schema.fields, 0, 2));

// convert a single Arrow column to a value array
// when possible, zero-copy access to binary data is used
const delay = table.getChild('delay').toArray();

// data columns are iterable
const time = [...table.getChild('time')];

// data columns provide random access
const time0 = table.getChild('time').at(0);

// extract all columns into a { name: array, ... } object
// { delay: Int16Array, distance: Int16Array, time: Float32Array }
const columns = table.toColumns();

// convert Arrow data to an array of standard JS objects
// [ { delay: 14, distance: 405, time: 0.01666666753590107 }, ... ]
const objects = table.toArray();

// create a new table with a selected subset of columns
// use this first to limit toColumns or toArray to fewer columns
const subtable = table.select(['delay', 'time']);

Customize Data Extraction

Data extraction can be customized using options provided to the table generation method. By default, temporal data is returned as numeric timestamps, 64-int integers are coerced to numbers, and map-typed data is returned as an array of [key, value] pairs. These defaults can be changed via conversion options that push (or remove) transformations to the underlying data batches.

const table = tableFromIPC(ipc, {
  useDate: true,   // map temporal data to Date objects
  useBigInt: true, // use BigInt, do not coerce to number
  useMap: true     // create Map objects for [key, value] pair lists
});

Build Instructions

To build and develop Flechette locally:

Clone https://github.com/uwdata/flechette.
Run npm i to install dependencies.
Run npm test to run test cases, npm run perf to run performance benchmarks, and npm run build to build output files.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
perf		perf
src		src
test		test
.gitignore		.gitignore
.npmignore		.npmignore
LICENSE		LICENSE
README.md		README.md
esbuild.js		esbuild.js
eslint.config.js		eslint.config.js
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Flechette

Why Flechette?

What's with the name?

Examples

Load and Access Arrow Data

Customize Data Extraction

Build Instructions

About

Releases

Packages

Languages

License

manzt/flechette

Folders and files

Latest commit

History

Repository files navigation

Flechette

Why Flechette?

What's with the name?

Examples

Load and Access Arrow Data

Customize Data Extraction

Build Instructions

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages