ExtendedIterable v2 #1

cb1kenobi · 2025-06-10T08:08:44Z

This ExtendedIterable is derived from the RangeIterable from the rocksdb-js project. The goals of this implementation were to maximize performance while maintaining code readability and extensibility. In addition, the implementation uses TypeScript with generics and comes with a complete test suite with 749 tests and 100% code coverage.

The ExtendedIterable can be initialized with any iterable-like type including arrays, strings, maps, sets, iterators, async iterators, iterable objects, generators, and async generators.

Supported methods include:

asArray
at()
concat()
drop()
every()
filter()
find()
flatMap()
map()
mapError()
reduce()
slice()
some()
take()
toArray()

These methods can be chained together and values are lazy evaluated as the consumer calls the next() method.

All methods that accept a callback support both sync and async callbacks. The control flow is sync until it detects a Promise in the iterable-like stream or callback result, then switches to async mode.

Errors are bubbled up to the caller. If in async mode, a rejected promise is returned. The mapError() method can be added to the chain to catch errors and allow iteration to continue.

…erplate

Ethan-Arrowood

looking excellent!

.github/workflows/pr.yaml

README.md

src/extended-iterable.ts

heskew · 2025-06-10T21:36:28Z

is there push and pull_requests running on each update?

this looks great btw 🎉

kriszyp

This seems to be fundamentally missing the entire functionality for calling return() (or throw()) on the source iterator. I added a couple tests in this branch https://github.com/HarperDB/extended-iterable/tree/test-return to verify this, but this seems to have been lost from the original version that handled this correctly. Am I missing something here?

I am also curious about the philosophy of using TypeScript (or enforcing types) for testing. Doesn't this limit the tests to type-safe JavaScript? We want this to work with regular JavaScript, and this seems like it is fundamentally limiting the breadth of the testing, when we actually want tests to be doing things that defy expectations (and defy type constraints) and verify that everything this works.

Anyway, there is certainly a lot of awesome stuff, great job with all the new methods!

README.md

cb1kenobi · 2025-06-11T03:59:28Z

This seems to be fundamentally missing the entire functionality for calling return() (or throw()) on the source iterator. I added a couple tests in this branch https://github.com/HarperDB/extended-iterable/tree/test-return to verify this, but this seems to have been lost from the original version that handled this correctly. Am I missing something here?

@kriszyp it wasn't my intention to lose support for return() and throw(). I know what I need to do. Should be easy to fix. FWIW, I see your branch, but no commits. Sit tight and I'll get things fixed up pronto.

I am also curious about the philosophy of using TypeScript (or enforcing types) for testing. Doesn't this limit the tests to type-safe JavaScript? We want this to work with regular JavaScript, and this seems like it is fundamentally limiting the breadth of the testing, when we actually want tests to be doing things that defy expectations (and defy type constraints) and verify that everything this works.

I don't believe this is anything to be concerned about. I have tests that pass in invalid data of which I needed to cast it as any. I have tested the transpiled ExtendedIterable in vanilla JS and it works as expected. Is there a specific scenario that you're worried about?

cb1kenobi · 2025-06-19T04:55:53Z

@kriszyp I've added full support for return() along with over 300 new tests. I've meticulously combed through every line and every practical test case I can think of. I've traced every flow to only touch promises if needed. I'm pretty sure every possible error handling scenario is covered. I've added support for sync and async callbacks. I've simplified the iterator implementation pattern to make it easy to add new iterators.

@Ethan-Arrowood I've removed the unnecessary transformer logic. It was an ill-designed AI suggestion that I wasn't smart enough at the time to recognize.

Ethan-Arrowood

Looking great! On Monday I'm going to check this out locally and see if all the type stuff makes sense. Type casting is generally a code smell for me. But it has its place so I'll confirm on a second pass.

Otherwise I appreciate the exhaustiveness of the tests.

This is definitely close to done!

README.md

Ethan-Arrowood · 2025-06-20T22:48:16Z

README.md

+* `callback`: `(value: T, index: number) => boolean | Promise<boolean>` - The
+  callback function to call for each result.
+
+##### Return value
+
+* `T | Promise<T> | undefined | Promise<undefined>` - The first item of the


Does the return type become a Promise if the callback parameter returns a Promise? If so I think you may want to consider overloaded types. Right now, this could be interpreted as "regardless if callback returns boolean or Promise<boolean>, the return type could be any one of T | Promise<T> | undefined | Promise<undefined>".

Or does it have to do with the items in the iterable already?

And thus, this type is actually correct and the user should always consider that an iteration could be a promise

This ExtendedIterable support sync and async iterators as well as sync and async callbacks for the few methods that accept a callback. That's the type I came up with that made TypeScript happy, but certainly open to doing it better.

I actually use function overloading for reduce(). I just played around with function overloading for find() and I can define a sync version, async version, but I still need the combo version to make TS happy like this:

find(callback: (value: T, index: number) => boolean): T | undefined; find(callback: (value: T, index: number) => Promise<boolean>): Promise<T | undefined>; find(callback: (value: T, index: number) => boolean | Promise<boolean>): T | Promise<T | undefined> | undefined {

I dunno how to make this situation better. The code is too flexible.

The general rule of thumb is you don't need to await the iterable if you know that both the iterator and the callback(s) are all sync. In rocksdb-js, both the iterator and callbacks are sync. I'm not sure where the requirement to support both sync and async things came from, but we have it now and it may come in handy.

Ethan-Arrowood · 2025-06-20T23:04:29Z

src/extended-iterable.ts

+		while (!result.done) {
+			if (currentIndex === index) {
+				if (iterator.return) {
+					return (iterator.return(result.value) as IteratorReturnResult<T>).value;


Seeing a type cast here seems like a code smell to me. I'll try pulling this locally another time and taking a closer look to understand it better. You don't have to change anything here necessarily, I trust you have a good reason for it.

Another TypeScript issues. iterator.return() can return T or Promise<T> depending if we're chained to a sync Iterator or AsyncIterator. It's impossible for an async iterator to reach this line as it branches above to the #asyncAt() flow. The problem is I couldn't find a way to convince TypeScript that result.value is indeed T. There's no way to detect if iterator is sync or async until you call next(), return(), or throw().

Consumers won't need to worry about this as for-of and for-await-of will get the appropriate iterator.

So it was either cast it as IteratorReturnResult<T> or capture the return value from return(), then test to see if it's a promise and then await it and return the value. I opted to avoid the bloat.

Ethan-Arrowood · 2025-06-20T23:16:14Z

src/iterators/base-iterator.ts

+		}
+	}
+
+	next(): IteratorResult<T> | Promise<IteratorResult<T>> | any {


I don't like the | any - i think that basically neutralizes the other types you've listed here. But I'll have to follow up after I run this locally and inspect the type system a bit closer

I concur. I need the any to make TypeScript happy because this is both an Iterator and AsyncIterator. JavaScript has no problem with this, but in TypeScript, Iterator and AsyncIterator conflict. You're not supposed to do both. The best solution I could come up with is any.

Ethan-Arrowood · 2025-06-20T23:17:42Z

src/iterators/drop-iterator.ts

+import { BaseIterator, DONE } from './base-iterator.js';
+
+export class DropIterator<T> extends BaseIterator<T> {
+	#count!: number;


why is !: it seems like you assign it definitely in the constructor

I do this in 3 places: drop(), take(), and in concat(). I move the assignment above the validation in both drop() and take().

I don't want to/cant't do that for concat(other) as I would need to make the property that holds the other iterable be undefined, which it will never be, then I'd have assert the iterable is valid every where when I know it will be post constructor.

src/types.ts

test/.asArray.test.ts

cb1kenobi · 2025-06-30T23:34:28Z

src/extended-iterable.ts

+	#iterator: (Iterator<T> | AsyncIterator<T>) & { async?: boolean };
+
+	constructor(iterator: IterableLike<T>) {
+		this.#iterator = resolveIterator(iterator);


We need to resolveIterator() to lazily create the iterator.

kriszyp · 2025-07-07T03:08:32Z

I looked through some of this PR. Some of the issues I found with what I have looked at so far:

iterable.map(...).map(...).mapError(...) doesn't seem to be tested and is not functioning correctly (not properly continuing with iteration)
.filter(...).mapError(...) doesn't seem to be tested and is not functioning correctly (is continuing with iteration when iterator should stop and cleanup).
I believe it is allowing an error thrown from next() in an iterator to continue iteration, when it should not (should immediately cleanup).
And as discussed earlier, this is prematurely creating an iterable before asked for an iterable (and unable to create multiple usable iterables, although it is worth noting that that is actually challenging to do with transaction management).

I think we have discussed this before, so sorry if I am rehashing this, but this does not have support for the SKIP constant for single step map/filtering like done here: https://github.com/HarperDB/harperdb/blob/main/resources/search.ts#L332-L340
Is your plan to modify the code here and break it up into a map and filter?

Generally this an increased amount of code to maintain. V1 accomplished everything we currently need with over 50% less code, AFAICT.

I have tests that pass in invalid data of which I needed to cast it as any

So currently if someone finds a bug and wants to submit a PR for a new unit test for JS their code that is failing, if their code is rejected by TS and lefthook's commit hooks, their commit will be rejected? Why this level of strictness when tests should be as broad? This seems kind of hostile to encouraging broader testing to me.

This PR is extremely difficult to review in terms of how the existing code was modified and updated to support new functionality. As noted, thre are regressions in functionality which are very difficult to understand and correlate with the underlying changes in the code. This PR conflates module refactors, JS -> TS conversions, behavioral/logical changes, workflow changes, directory structural changes, and more all in the same PR.
I think a good PR should:

Have a JIRA ticket describing it
- For significant refactors, document proposed design plans
Separate out code style refactors, behavior changes, and other changes into separate PRs that are properly ticketed and described

Anyway, I certainly understand and appreciate that you have put significant time and effort into these upgrades. I believe there is indeed significant value here and I have no intention of seeing any of this go to waste. However, this needs to properly planned (with appropriate time to review and test outside of unit tests) and this is not the right time to move forward with these changes, as we want to focus efforts on higher priorities.

I will go ahead and create the appropriate Jira tickets for this work, so we can properly track and plan these efforts.
These have been created as CORE-2807, CORE-2808, and CORE-2809 with appropriate timeline/priority.

cb1kenobi · 2025-07-07T16:16:39Z

@kriszyp thank you for finding time to review this PR!

iterable.map(...).map(...).mapError(...) doesn't seem to be tested and is not functioning correctly (not properly continuing with iteration)
.filter(...).mapError(...) doesn't seem to be tested and is not functioning correctly (is continuing with iteration when iterator should stop and cleanup).
I believe it is allowing an error thrown from next() in an iterator to continue iteration, when it should not (should immediately cleanup).

Correct, I do not have tests for all methods that return an iterable. These will need to be added.

Regarding the bug, the code was relying on the iterator's throw() to throw the error, but in the case where the generator catches the error, the map() method needs to check the result of the iterator's throw() and re-throw. I suspect other callback-based methods will have the same bug.

And as discussed earlier, this is prematurely creating an iterable before asked for an iterable (and unable to create multiple usable iterables, although it is worth noting that that is actually challenging to do with transaction management).

We agreed last week that the iterator should only be created when next() is first called. I believe this is an easy fix.

I think we have discussed this before, so sorry if I am rehashing this, but this does not have support for the SKIP constant for single step map/filtering like done here: HarperDB/harperdb@main/resources/search.ts#L332-L340
Is your plan to modify the code here and break it up into a map and filter?

It was my understanding that we were going to break this up into a map and filter in Harper.

Generally this an increased amount of code to maintain. V1 accomplished everything we currently need with over 50% less code, AFAICT.

I recognize there's a bit of code in v2. Much of it exists to handle both sync and async iterables along with error handling. My code also includes types, includes docs, and whitespace. I think there is opportunities to combine result/error handling and reduce the amount of code.

I have tests that pass in invalid data of which I needed to cast it as any
So currently if someone finds a bug and wants to submit a PR for a new unit test for JS their code that is failing, if their code is rejected by TS and lefthook's commit hooks, their commit will be rejected? Why this level of strictness when tests should be as broad? This seems kind of hostile to encouraging broader testing to me.

Casting parameters with invalid values as any in tests is common practice. Both source and tests must pass the linter and ts compiler, otherwise it's not valid code. I don't think there's a reason to allow invalid code to be merged. The lint rules do not forbid the use of any. Some of the tests explicitly define the iterable value's type because the test data is returned as any.

This PR is extremely difficult to review in terms of how the existing code was modified and updated to support new functionality. As noted, thre are regressions in functionality which are very difficult to understand and correlate with the underlying changes in the code. This PR conflates module refactors, JS -> TS conversions, behavioral/logical changes, workflow changes, directory structural changes, and more all in the same PR.
I think a good PR should:

Have a JIRA ticket describing it
For significant refactors, document proposed design plans
Separate out code style refactors, behavior changes, and other changes into separate PRs that are properly ticketed and described

I agree, but remember that the extended-iterable v2 code originated in the key ranges PR as apart of CORE-2646. We never had a ticket for extended-iterable.

Anyway, I certainly understand and appreciate that you have put significant time and effort into these upgrades. I believe there is indeed significant value here and I have no intention of seeing any of this go to waste. However, this needs to properly planned (with appropriate time to review and test outside of unit tests) and this is not the right time to move forward with these changes, as we want to focus efforts on higher priorities.

I will go ahead and create the appropriate Jira tickets for this work, so we can properly track and plan these efforts.
These have been created as CORE-2807, CORE-2808, and CORE-2809 with appropriate timeline/priority.

In the spirit of your previous paragraph, I strongly feel we should not have a single ticket CORE-2807 to add all the remaining methods and a single ticket CORE-2808 to add the tests. This is backwards. We should have a ticket for each method and it's tests.

You say you have no intention of seeing any of v2 go to waste, but I don't see a path forward with v2 especially given all of those tickets are a low priority.

kriszyp · 2025-07-08T03:45:18Z

Regarding the bug, the code was relying on the iterator's throw() to throw the error, but in the case where the generator catches the error, the map() method needs to check the result of the iterator's throw() and re-throw.

Hmm, I could be wrong about this, but I don't think this is quite right. From what I recall, the challenge here is that you have to determine if the transaction gets closed before propagating the error to the handlers (you can't make this decision when catching the upstream error), because there is no subsequent return() or anything to close. In trying to compare your code with the original code, I think you have kinda renamed continueOnRecoverableError to KEEP_ALIVE, but it is not done correctly because you need to transitively propagate the state up to the iterator (which is more doable with proper timing of the initialization of iterator). But with the respect to the bug with next errors, I also don't think the logic is properly capturing the distinction between recoverable and non-recoverable types of error. I think there are several critical states of error logic that don't seem to be preserved here or clear path for how they will be, AFAICT.

It was my understanding that we were going to break this up into a map and filter in Harper.

That's fine, but we do need to be cognizant of the extra work that is entailed in this so we can plan for it.

must pass the linter and ts compiler, otherwise it's not valid code

TypeScript says that iter.map('foo') is not valid code. Should we believe that?

In the spirit of your previous paragraph, I strongly feel we should not have a single ticket CORE-2807 to add all the remaining methods and a single ticket CORE-2808 to add the tests. This is backwards. We should have a ticket for each method and it's tests.

You say you have no intention of seeing any of v2 go to waste, but I don't see a path forward with v2 especially given all of those tickets are a low priority.

These tickets are marked as version 5.1, so they are scheduled, and 5.1 will happen. I changed these to medium priority.
But you think I should replace these with 7 separate Jira tickets for at(), drop(), every(), find(), reduce(), some(), and take()? I can certainly do that if you think that is helpful and more organized.

cb1kenobi · 2025-07-08T14:40:36Z

must pass the linter and ts compiler, otherwise it's not valid code

TypeScript says that iter.map('foo') is not valid code. Should we believe that?

In what scenario would iter.map('foo') be valid? map() accepts a single function argument. If it's not a function, it's not valid.

cb1kenobi added 18 commits June 4, 2025 03:02

Started adding async support, added unit tests, added a bunch of boil…

5c2adbb

…erplate

Finished the async work and added a bunch of tests

0b3e401

More test

345f27d

More code coverage

f79fe1d

Rename index

63e90c7

Lint

aa29f92

More tests

05f1d42

More code coverage

0dd954d

Flat map tests

144b004

Finished code coverage

787504c

Added for loop tests

9b88805

Update deps

e0e85fd

Add publish workflow, add more async callback support

1490a22

Finish async callbacks

fc3b8e0

Update deps, fix lint

a2005f8

Fix slack notification message

8326115

Update deps

9584728

Docs and cleanup

b1420ac

cb1kenobi requested a review from a team June 10, 2025 15:10

Ethan-Arrowood reviewed Jun 10, 2025

View reviewed changes

cb1kenobi added 3 commits June 10, 2025 13:46

Add Node.js 24 to the CI matrix

b458ace

Remove unnecessary variable assignments

b792201

Simplify GeneratorFunction definition

2aa5b36

cb1kenobi added 2 commits June 10, 2025 16:40

Adding dependabot

b3d1653

Remove push in ci

fa92cf7

heskew approved these changes Jun 10, 2025

View reviewed changes

kriszyp requested changes Jun 10, 2025

View reviewed changes

README.md Show resolved Hide resolved

cb1kenobi mentioned this pull request Jun 11, 2025

Add support for key ranges HarperFast/rocksdb-js#40

Merged

cb1kenobi requested a review from kriszyp June 19, 2025 04:42

Fix dist build

67fec52

cb1kenobi added 2 commits June 19, 2025 15:45

Fix mapError handling

a292ec3

Update deps

194d61f

Ethan-Arrowood reviewed Jun 20, 2025

View reviewed changes

cb1kenobi added 9 commits June 20, 2025 22:56

Remove type from invocation in readme

ea1f50b

Fixed readme formatting

4db024b

Readme formatting fix

75a76cf

Improve wording

7268ae4

Remove left over text

c2db7fd

Initialize take limit and drop count before validation

87769ed

Fix indentation

080dcfc

Rename tests

6ac0f83

Add back type

a22b7c1

cb1kenobi requested a review from a team June 26, 2025 14:22

cb1kenobi added 4 commits June 26, 2025 10:53

Update deps

eb90253

Remove unnecessary code from mapError and add toArray()

2dfb81d

toArray() docs

18c4f55

toArray() test

25dbd32

cb1kenobi commented Jun 30, 2025

View reviewed changes

cb1kenobi added 3 commits July 7, 2025 10:34

Add mapError() test for multiple map() calls in the chain

6828928

Fix

fffdf71

Undo fix

a01d7b2

Move error handling to base

3c15e2e

cb1kenobi marked this pull request as draft July 8, 2025 23:39

ExtendedIterable v2 #1

Are you sure you want to change the base?

ExtendedIterable v2 #1

Uh oh!

Conversation

cb1kenobi commented Jun 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Ethan-Arrowood left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

heskew commented Jun 10, 2025

Uh oh!

kriszyp left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cb1kenobi commented Jun 11, 2025

Uh oh!

cb1kenobi commented Jun 19, 2025

Uh oh!

Ethan-Arrowood left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kriszyp commented Jul 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cb1kenobi commented Jul 7, 2025

Uh oh!

kriszyp commented Jul 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cb1kenobi commented Jul 8, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

cb1kenobi commented Jun 10, 2025 •

edited

Loading

kriszyp left a comment •

edited

Loading

kriszyp commented Jul 7, 2025 •

edited

Loading

kriszyp commented Jul 8, 2025 •

edited

Loading