GH-44246: [JS] Add a lightweight array view to Table #44247

Closed
wants to merge 3 commits into from
Changes from 2 commits
4 changes: 4 additions & 0 deletions js/perf/index.ts
@@ -205,6 +205,10 @@ for (const { name, table, counts } of config) {
table.toArray();
}),

b.add(`toArrayView, dataset: ${name}, numRows: ${formatNumber(table.numRows)}`, () => {
table.toArrayView();
}),

b.add(`get, dataset: ${name}, numRows: ${formatNumber(table.numRows)}`, () => {
for (let i = -1, n = table.numRows; ++i < n;) {
table.get(i);
4 changes: 3 additions & 1 deletion js/src/row/struct.ts
@@ -39,7 +39,7 @@ export class StructRow<T extends TypeMap = any> {
constructor(parent: Data<Struct<T>>, rowIndex: number) {
this[kParent] = parent;
this[kRowIndex] = rowIndex;
- return new Proxy(this, new StructRowProxyHandler());
+ return new Proxy(this, structRowProxyHandler);
}

public toArray() { return Object.values(this.toJSON()); }
@@ -157,3 +157,5 @@ class StructRowProxyHandler<T extends TypeMap = any> implements ProxyHandler<Str
return false;
}
}

const structRowProxyHandler = new StructRowProxyHandler();
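
The change above replaces a per-row `new StructRowProxyHandler()` with one module-level instance. A minimal sketch of why sharing is safe when the handler keeps no per-row state; `Row`, `rowHandler`, and `makeRow` are hypothetical names for illustration, not Arrow APIs:

```typescript
// Hypothetical stand-in for a row object; not the Arrow StructRow class.
class Row {
    constructor(public readonly values: Record<string, unknown>) {}
}

// The handler reads everything it needs from `target`, so it holds no
// per-row state and a single instance can serve every proxy.
const rowHandler: ProxyHandler<Row> = {
    get(target, key) {
        return typeof key === 'string' ? target.values[key] : undefined;
    },
    has(target, key) {
        return typeof key === 'string' && key in target.values;
    },
};

function makeRow(values: Record<string, unknown>): Row {
    // No handler allocation per row: only the Proxy itself is created.
    return new Proxy(new Row(values), rowHandler);
}

const r1 = makeRow({ a: 42 }) as any;
const r2 = makeRow({ a: 12 }) as any;
console.log(r1.a, r2.a); // 42 12
```

Each proxy still resolves fields against its own target, so the only thing shared is the trap logic.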
91 changes: 89 additions & 2 deletions js/src/table.ts
@@ -238,17 +238,30 @@ export class Table<T extends TypeMap = any> {
*
* @returns An Array of Table rows.
*/
- public toArray() {
+ public toArray(): Array<Struct<T>['TValue']> {
Member

Does this improve the type returned by this method or just make the type explicit?

Member Author

It just makes it explicit. I added it to know precisely what toArrayView() was supposed to return. And also because I like explicit types 😉

Contributor

@trxcllnt, Sep 30, 2024

Unlike in other languages, explicitly annotating return types is somewhat of an anti-pattern in TypeScript. Unless there's a special case where we need to inform the compiler what the real return type is, I'm with @domoritz that we should default to letting the compiler infer the type.
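
A minimal sketch of the inference point, using a hypothetical `Rows` class rather than Arrow's `Table`: the compiler infers the same return type an explicit annotation would spell out.

```typescript
// Hypothetical container, for illustration only.
class Rows<T> implements Iterable<T> {
    constructor(private readonly items: T[]) {}

    *[Symbol.iterator](): Iterator<T> {
        yield* this.items;
    }

    // No return annotation: the compiler infers `T[]` from the spread.
    toArray() {
        return [...this];
    }
}

const rows = new Rows([{ a: 42 }, { a: 12 }]);
// Assignable to the explicit type without a cast, so the inferred
// return type is already `{ a: number }[]`.
const arr: Array<{ a: number }> = rows.toArray();
console.log(arr.length); // 2
```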

Member Author

Ok, no problem, I removed it in 4b706a9.

return [...this];
}

/**
* Return a JavaScript Array view of the Table rows.
*
* It is a lightweight read-only proxy that delegates to the table. Accessing elements has some
* overhead compared to the regular array returned by `toArray()` because of this indirection,
* but it avoids potentially large memory allocation.
*
* @returns An Array proxy to the Table rows.
*/
public toArrayView(): Array<Struct<T>['TValue']> {
return new Proxy([] as Array<Struct<T>['TValue']>, new TableArrayProxyHandler(this));
}

/**
* Returns a string representation of the Table rows.
*
* @returns A string representation of the Table rows.
*/
public toString() {
- return `[\n ${this.toArray().join(',\n ')}\n]`;
+ return `[\n ${this.toArrayView().join(',\n ')}\n]`;
Member

Since we are stringifying the whole table, is it worth using an array view here?

Member Author

We need an array (or array-like) to use join(). This was creating an array, and with toArrayView() this now just creates a proxy. Or did I misunderstand your comment?

Member

@domoritz, Sep 28, 2024

I understand the difference but I wonder whether this change introduces a performance regression since "the proxy adds some overhead to direct array access."

I'm not against this change but want to make sure we have thought through the implications.

Member Author

Got it. I don't think this is an issue here, as toString() is generally used for debugging purposes. Actually I'm wondering if it's practically usable for Table beyond tests as it can create huge strings.

Member Author

After some benchmarking, toString is a perfect candidate for using toArrayView(): the access overhead is actually the same as with toArray(). It is just spread out over the iteration with toArrayView(), whereas it's paid once upfront with toArray(). Since we're not reusing the array after join, a view that doesn't allocate memory is a better choice.
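
The trade-off can be sketched by counting row reads. This is an illustrative toy, not the Arrow implementation: `RowSource` and the free functions `toArray` and `toArrayView` below are hypothetical stand-ins. Followed by `join()`, both paths decode each row once; only the eager path allocates an intermediate array of rows.

```typescript
// Hypothetical row source that counts how often a row is decoded.
class RowSource {
    reads = 0;
    constructor(readonly numRows: number) {}
    get(i: number): string {
        this.reads++;
        return `row${i}`;
    }
}

// Eager: decodes every row up front into a real array.
function toArray(src: RowSource): string[] {
    return Array.from({ length: src.numRows }, (_, i) => src.get(i));
}

// Lazy: an empty array behind a proxy that decodes rows on access.
function toArrayView(src: RowSource): string[] {
    return new Proxy([] as string[], {
        get(target, p, receiver) {
            if (p === 'length') return src.numRows;
            if (typeof p === 'string' && /^\d+$/.test(p)) return src.get(Number(p));
            return Reflect.get(target, p, receiver);
        },
    });
}

const eagerSrc = new RowSource(3);
const eager = toArray(eagerSrc);      // all 3 reads happen here
const eagerText = eager.join(', ');   // no further reads

const lazySrc = new RowSource(3);
const lazyView = toArrayView(lazySrc); // 0 reads so far
const lazyText = lazyView.join(', ');  // 3 reads happen during join

console.log(eagerText === lazyText, eagerSrc.reads, lazySrc.reads); // true 3 3
```

The counter only shows that the number of row reads is equal. It says nothing about the per-access cost of going through the proxy, which is what the benchmark measures.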

}

/**
@@ -444,3 +457,77 @@ export function tableFromArrays<I extends Record<string | number | symbol, Typed
}
return new Table<T>(vecs);
}

class TableArrayProxyHandler<T extends TypeMap = any> implements ProxyHandler<Array<Struct<T>['TValue']>> {
table: Table<T>;

constructor(table: Table<T>) {
this.table = table;
}

// Traps that aren't implemented:
// - apply
// - construct
// - defineProperty
// - deleteProperty
// - getPrototypeOf
// - isExtensible
// - preventExtensions
// - set
// - setPrototypeOf

get(target: Array<Struct<T>['TValue']>, p: string | symbol, receiver: any): any {
if (typeof p === 'string') {
const i = Number(p);
if (Number.isInteger(i)) {
return this.table.get(i);
}
if (p === 'at') {
return (i: number): Struct<T>['TValue'] | undefined => {
return this.table.at(i) ?? undefined;
};
}
if (p === 'length') {
return this.table.numRows;
}
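// NOTE: `Symbol('keys')` creates a new, unique symbol on every call, so this branch can never match.
} else if (p === Symbol('keys')) {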
const end = this.table.numRows;
return function * () {
let i = 0;
while(i < end) {
yield i++;
}
return;
};
}
return Reflect.get(target, p, receiver);
}

getOwnPropertyDescriptor(target: Array<Struct<T>['TValue']>, p: string | symbol): PropertyDescriptor | undefined {
if (typeof p === 'string') {
const i = Number(p);
if (Number.isInteger(i) && i >= 0 && i < this.table.numRows) {
return { enumerable: true, configurable: true };
}
}
return Reflect.getOwnPropertyDescriptor(target, p);
}

has(target: Array<Struct<T>['TValue']>, p: string | symbol): boolean {
if (typeof p === 'string') {
const i = Number(p);
if (Number.isInteger(i)) {
return i >= 0 && i < this.table.numRows;
}
}
return Reflect.has(target, p);
}

ownKeys(_target: Array<Struct<T>['TValue']>): ArrayLike<string | symbol> {
// Can be expensive as we allocate an array with all index numbers as strings
const keys = Array.from({length: this.table.numRows}, (_, i) => String(i));
keys.push('length');
return keys;
}
}
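
Two details of a handler like this can be checked in isolation. The sketch below uses a hypothetical `makeView` helper over a plain length and a `getItem` callback, not the Arrow `Table`. It shows that `ownKeys` together with `getOwnPropertyDescriptor` is what makes `Object.keys()` work on the proxy, that `has` drives the `in` operator, and that every call to `Symbol(description)` returns a new symbol, so comparing a property key against `Symbol('keys')` is always false.

```typescript
// Hypothetical helper: an array-like read-only view over `length` items.
function makeView<V>(length: number, getItem: (i: number) => V): V[] {
    // True only for canonical index strings inside [0, length).
    const isIndex = (p: string | symbol): p is string =>
        typeof p === 'string' && /^(0|[1-9]\d*)$/.test(p) && Number(p) < length;

    return new Proxy([] as V[], {
        get(target, p, receiver) {
            if (p === 'length') return length;
            if (isIndex(p)) return getItem(Number(p));
            return Reflect.get(target, p, receiver);
        },
        has(target, p) {
            return isIndex(p) || Reflect.has(target, p);
        },
        ownKeys() {
            // 'length' must be reported: it is a non-configurable own
            // property of the array target.
            const keys: string[] = Array.from({ length }, (_, i) => String(i));
            keys.push('length');
            return keys;
        },
        getOwnPropertyDescriptor(target, p) {
            if (isIndex(p)) {
                // Object.keys() only lists keys reported as enumerable.
                return { value: getItem(Number(p)), writable: true, enumerable: true, configurable: true };
            }
            return Reflect.getOwnPropertyDescriptor(target, p);
        },
    });
}

const keyView = makeView(2, (i) => `row${i}`);
console.log(Object.keys(keyView));              // [ '0', '1' ]
console.log(1 in keyView, 2 in keyView);        // true false
console.log(Symbol('keys') === Symbol('keys')); // false
```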

58 changes: 58 additions & 0 deletions js/test/unit/table/table-test.ts
@@ -79,3 +79,61 @@ describe('tableFromJSON()', () => {
expect(table.getChild('c')!.type).toBeInstanceOf(Dictionary);
});
});

describe('table views', () => {
const table = tableFromJSON([{
a: 42,
b: true,
c: 'foo',
}, {
a: 12,
b: false,
c: 'bar',
}]);

function checkArrayValues(arr: Array<any>) {
expect(arr).toHaveLength(2);
expect(arr[0].a).toBe(42);
expect(arr[0].b).toBe(true);
expect(arr[0].c).toBe('foo');
expect(arr[1].a).toBe(12);
expect(arr[1].b).toBe(false);
expect(arr[1].c).toBe('bar');
}

function checkArray(arr: Array<any>) {
test('Wrapper', () => checkArrayValues(arr));

test('Iterator', () => {
const arr2 = [];
for (let item of arr) {
arr2.push(item);
}
checkArrayValues(arr2);
});

test('Array index', () => {
const arr2 = new Array(arr.length);
for (let i = 0; i < arr2.length; i++) {
arr2[i] = arr[i];
}
checkArrayValues(arr2);
});

test('Keys', () => {
const arr2: any[] = new Array(arr.length);
let keys = Object.keys(arr);
for (let k in keys) {
arr2[k] = arr[k];
}
checkArrayValues(arr2);
});
}

describe('table.toArray()', () => {
checkArray(table.toArray());
});
describe('table.toArrayView()', () => {
checkArray(table.toArrayView());
Member

Why can't we make this work?

Suggested change
- checkArray(table.toArrayView());
+ checkArray(table);

Member Author

See the discussion here. We cannot make Table behave like an array.

});
});