Skip to content

Conversation

@cb1kenobi
Copy link
Contributor

@cb1kenobi cb1kenobi commented Jun 10, 2025

This ExtendedIterable is derived from the RangeIterable from the rocksdb-js project. The goals of this implementation were to maximize performance while maintaining code readability and extensibility. In addition, the implementation uses TypeScript with generics and comes with a complete test suite with 749 tests and 100% code coverage.

The ExtendedIterable can be initialized with any iterable-like type including arrays, strings, maps, sets, iterators, async iterators, iterable objects, generators, and async generators.

Supported methods include:

  • asArray
  • at()
  • concat()
  • drop()
  • every()
  • filter()
  • find()
  • flatMap()
  • map()
  • mapError()
  • reduce()
  • slice()
  • some()
  • take()
  • toArray()

These methods can be chained together and values are lazy evaluated as the consumer calls the next() method.

All methods that accept a callback support both sync and async callbacks. The control flow is sync until it detects a Promise in the iterable-like stream or callback result, then switches to async mode.

Errors are bubbled up to the caller. If in async mode, a rejected promise is returned. The mapError() method can be added to the chain to catch errors and allow iteration to continue.

@cb1kenobi cb1kenobi requested a review from a team June 10, 2025 15:10
Copy link

@Ethan-Arrowood Ethan-Arrowood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking excellent!

@heskew
Copy link

heskew commented Jun 10, 2025

is there push and pull_requests running on each update?

this looks great btw 🎉

Copy link
Member

@kriszyp kriszyp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be fundamentally missing the entire functionality for calling return() (or throw()) on the source iterator. I added a couple tests in this branch https://github.com/HarperDB/extended-iterable/tree/test-return to verify this, but this seems to have been lost from the original version that handled this correctly. Am I missing something here?

I am also curious about the philosophy of using TypeScript (or enforcing types) for testing. Doesn't this limit the tests to type-safe JavaScript? We want this to work with regular JavaScript, and this seems like it is fundamentally limiting the breadth of the testing, when we actually want tests to be doing things that defy expectations (and defy type constraints) and verify that everything this works.

Anyway, there is certainly a lot of awesome stuff, great job with all the new methods!

@cb1kenobi
Copy link
Contributor Author

This seems to be fundamentally missing the entire functionality for calling return() (or throw()) on the source iterator. I added a couple tests in this branch https://github.com/HarperDB/extended-iterable/tree/test-return to verify this, but this seems to have been lost from the original version that handled this correctly. Am I missing something here?

@kriszyp it wasn't my intention to lose support for return() and throw(). I know what I need to do. Should be easy to fix. FWIW, I see your branch, but no commits. Sit tight and I'll get things fixed up pronto.

I am also curious about the philosophy of using TypeScript (or enforcing types) for testing. Doesn't this limit the tests to type-safe JavaScript? We want this to work with regular JavaScript, and this seems like it is fundamentally limiting the breadth of the testing, when we actually want tests to be doing things that defy expectations (and defy type constraints) and verify that everything this works.

I don't believe this is anything to be concerned about. I have tests that pass in invalid data of which I needed to cast it as any. I have tested the transpiled ExtendedIterable in vanilla JS and it works as expected. Is there a specific scenario that you're worried about?

@cb1kenobi cb1kenobi requested a review from kriszyp June 19, 2025 04:42
@cb1kenobi
Copy link
Contributor Author

@kriszyp I've added full support for return() along with over 300 new tests. I've meticulously combed through every line and every practical test case I can think of. I've traced every flow to only touch promises if needed. I'm pretty sure every possible error handling scenario is covered. I've added support for sync and async callbacks. I've simplified the iterator implementation pattern to make it easy to add new iterators.

@Ethan-Arrowood I've removed the unnecessary transformer logic. It was an ill-designed AI suggestion that I wasn't smart enough at the time to recognize.

Copy link

@Ethan-Arrowood Ethan-Arrowood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great! On Monday I'm going to check this out locally and see if all the type stuff makes sense. Type casting is generally a code smell for me. But it has its place so I'll confirm on a second pass.

Otherwise I appreciate the exhaustiveness of the tests.

This is definitely close to done!

Comment on lines +296 to +301
* `callback`: `(value: T, index: number) => boolean | Promise<boolean>` - The
callback function to call for each result.

##### Return value

* `T | Promise<T> | undefined | Promise<undefined>` - The first item of the

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the return type become a Promise if the callback parameter returns a Promise? If so I think you may want to consider overloaded types. Right now, this could be interpreted as "regardless if callback returns boolean or Promise<boolean>, the return type could be any one of T | Promise<T> | undefined | Promise<undefined>".

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or does it have to do with the items in the iterable already?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And thus, this type is actually correct and the user should always consider that an iteration could be a promise

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ExtendedIterable support sync and async iterators as well as sync and async callbacks for the few methods that accept a callback. That's the type I came up with that made TypeScript happy, but certainly open to doing it better.

I actually use function overloading for reduce(). I just played around with function overloading for find() and I can define a sync version, async version, but I still need the combo version to make TS happy like this:

find(callback: (value: T, index: number) => boolean): T | undefined;
find(callback: (value: T, index: number) => Promise<boolean>): Promise<T | undefined>;
find(callback: (value: T, index: number) => boolean | Promise<boolean>): T | Promise<T | undefined> | undefined {

I dunno how to make this situation better. The code is too flexible.

The general rule of thumb is you don't need to await the iterable if you know that both the iterator and the callback(s) are all sync. In rocksdb-js, both the iterator and callbacks are sync. I'm not sure where the requirement to support both sync and async things came from, but we have it now and it may come in handy.

while (!result.done) {
if (currentIndex === index) {
if (iterator.return) {
return (iterator.return(result.value) as IteratorReturnResult<T>).value;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing a type cast here seems like a code smell to me. I'll try pulling this locally another time and taking a closer look to understand it better. You don't have to change anything here necessarily, I trust you have a good reason for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another TypeScript issues. iterator.return() can return T or Promise<T> depending if we're chained to a sync Iterator or AsyncIterator. It's impossible for an async iterator to reach this line as it branches above to the #asyncAt() flow. The problem is I couldn't find a way to convince TypeScript that result.value is indeed T. There's no way to detect if iterator is sync or async until you call next(), return(), or throw().

Consumers won't need to worry about this as for-of and for-await-of will get the appropriate iterator.

So it was either cast it as IteratorReturnResult<T> or capture the return value from return(), then test to see if it's a promise and then await it and return the value. I opted to avoid the bloat.

}
}

next(): IteratorResult<T> | Promise<IteratorResult<T>> | any {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the | any - i think that basically neutralizes the other types you've listed here. But I'll have to follow up after I run this locally and inspect the type system a bit closer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I concur. I need the any to make TypeScript happy because this is both an Iterator and AsyncIterator. JavaScript has no problem with this, but in TypeScript, Iterator and AsyncIterator conflict. You're not supposed to do both. The best solution I could come up with is any.

import { BaseIterator, DONE } from './base-iterator.js';

export class DropIterator<T> extends BaseIterator<T> {
#count!: number;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is !: it seems like you assign it definitely in the constructor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do this in 3 places: drop(), take(), and in concat(). I move the assignment above the validation in both drop() and take().

I don't want to/cant't do that for concat(other) as I would need to make the property that holds the other iterable be undefined, which it will never be, then I'd have assert the iterable is valid every where when I know it will be post constructor.

@cb1kenobi cb1kenobi requested a review from a team June 26, 2025 14:22
#iterator: (Iterator<T> | AsyncIterator<T>) & { async?: boolean };

constructor(iterator: IterableLike<T>) {
this.#iterator = resolveIterator(iterator);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to resolveIterator() to lazily create the iterator.

@kriszyp
Copy link
Member

kriszyp commented Jul 7, 2025

I looked through some of this PR. Some of the issues I found with what I have looked at so far:

  • iterable.map(...).map(...).mapError(...) doesn't seem to be tested and is not functioning correctly (not properly continuing with iteration)
  • .filter(...).mapError(...) doesn't seem to be tested and is not functioning correctly (is continuing with iteration when iterator should stop and cleanup).
  • I believe it is allowing an error thrown from next() in an iterator to continue iteration, when it should not (should immediately cleanup).
  • And as discussed earlier, this is prematurely creating an iterable before asked for an iterable (and unable to create multiple usable iterables, although it is worth noting that that is actually challenging to do with transaction management).

I think we have discussed this before, so sorry if I am rehashing this, but this does not have support for the SKIP constant for single step map/filtering like done here: https://github.com/HarperDB/harperdb/blob/main/resources/search.ts#L332-L340
Is your plan to modify the code here and break it up into a map and filter?

Generally this an increased amount of code to maintain. V1 accomplished everything we currently need with over 50% less code, AFAICT.

I have tests that pass in invalid data of which I needed to cast it as any

So currently if someone finds a bug and wants to submit a PR for a new unit test for JS their code that is failing, if their code is rejected by TS and lefthook's commit hooks, their commit will be rejected? Why this level of strictness when tests should be as broad? This seems kind of hostile to encouraging broader testing to me.

This PR is extremely difficult to review in terms of how the existing code was modified and updated to support new functionality. As noted, thre are regressions in functionality which are very difficult to understand and correlate with the underlying changes in the code. This PR conflates module refactors, JS -> TS conversions, behavioral/logical changes, workflow changes, directory structural changes, and more all in the same PR.
I think a good PR should:

  • Have a JIRA ticket describing it
    • For significant refactors, document proposed design plans
  • Separate out code style refactors, behavior changes, and other changes into separate PRs that are properly ticketed and described

Anyway, I certainly understand and appreciate that you have put significant time and effort into these upgrades. I believe there is indeed significant value here and I have no intention of seeing any of this go to waste. However, this needs to properly planned (with appropriate time to review and test outside of unit tests) and this is not the right time to move forward with these changes, as we want to focus efforts on higher priorities.

I will go ahead and create the appropriate Jira tickets for this work, so we can properly track and plan these efforts.
These have been created as CORE-2807, CORE-2808, and CORE-2809 with appropriate timeline/priority.

@cb1kenobi
Copy link
Contributor Author

@kriszyp thank you for finding time to review this PR!

  • iterable.map(...).map(...).mapError(...) doesn't seem to be tested and is not functioning correctly (not properly continuing with iteration)
    .filter(...).mapError(...) doesn't seem to be tested and is not functioning correctly (is continuing with iteration when iterator should stop and cleanup).
    I believe it is allowing an error thrown from next() in an iterator to continue iteration, when it should not (should immediately cleanup).

Correct, I do not have tests for all methods that return an iterable. These will need to be added.

Regarding the bug, the code was relying on the iterator's throw() to throw the error, but in the case where the generator catches the error, the map() method needs to check the result of the iterator's throw() and re-throw. I suspect other callback-based methods will have the same bug.

And as discussed earlier, this is prematurely creating an iterable before asked for an iterable (and unable to create multiple usable iterables, although it is worth noting that that is actually challenging to do with transaction management).

We agreed last week that the iterator should only be created when next() is first called. I believe this is an easy fix.

I think we have discussed this before, so sorry if I am rehashing this, but this does not have support for the SKIP constant for single step map/filtering like done here: HarperDB/harperdb@main/resources/search.ts#L332-L340
Is your plan to modify the code here and break it up into a map and filter?

It was my understanding that we were going to break this up into a map and filter in Harper.

Generally this an increased amount of code to maintain. V1 accomplished everything we currently need with over 50% less code, AFAICT.

I recognize there's a bit of code in v2. Much of it exists to handle both sync and async iterables along with error handling. My code also includes types, includes docs, and whitespace. I think there is opportunities to combine result/error handling and reduce the amount of code.

I have tests that pass in invalid data of which I needed to cast it as any
So currently if someone finds a bug and wants to submit a PR for a new unit test for JS their code that is failing, if their code is rejected by TS and lefthook's commit hooks, their commit will be rejected? Why this level of strictness when tests should be as broad? This seems kind of hostile to encouraging broader testing to me.

Casting parameters with invalid values as any in tests is common practice. Both source and tests must pass the linter and ts compiler, otherwise it's not valid code. I don't think there's a reason to allow invalid code to be merged. The lint rules do not forbid the use of any. Some of the tests explicitly define the iterable value's type because the test data is returned as any.

This PR is extremely difficult to review in terms of how the existing code was modified and updated to support new functionality. As noted, thre are regressions in functionality which are very difficult to understand and correlate with the underlying changes in the code. This PR conflates module refactors, JS -> TS conversions, behavioral/logical changes, workflow changes, directory structural changes, and more all in the same PR.
I think a good PR should:

Have a JIRA ticket describing it
For significant refactors, document proposed design plans
Separate out code style refactors, behavior changes, and other changes into separate PRs that are properly ticketed and described

I agree, but remember that the extended-iterable v2 code originated in the key ranges PR as apart of CORE-2646. We never had a ticket for extended-iterable.

Anyway, I certainly understand and appreciate that you have put significant time and effort into these upgrades. I believe there is indeed significant value here and I have no intention of seeing any of this go to waste. However, this needs to properly planned (with appropriate time to review and test outside of unit tests) and this is not the right time to move forward with these changes, as we want to focus efforts on higher priorities.

I will go ahead and create the appropriate Jira tickets for this work, so we can properly track and plan these efforts.
These have been created as CORE-2807, CORE-2808, and CORE-2809 with appropriate timeline/priority.

In the spirit of your previous paragraph, I strongly feel we should not have a single ticket CORE-2807 to add all the remaining methods and a single ticket CORE-2808 to add the tests. This is backwards. We should have a ticket for each method and it's tests.

You say you have no intention of seeing any of v2 go to waste, but I don't see a path forward with v2 especially given all of those tickets are a low priority.

@kriszyp
Copy link
Member

kriszyp commented Jul 8, 2025

Regarding the bug, the code was relying on the iterator's throw() to throw the error, but in the case where the generator catches the error, the map() method needs to check the result of the iterator's throw() and re-throw.

Hmm, I could be wrong about this, but I don't think this is quite right. From what I recall, the challenge here is that you have to determine if the transaction gets closed before propagating the error to the handlers (you can't make this decision when catching the upstream error), because there is no subsequent return() or anything to close. In trying to compare your code with the original code, I think you have kinda renamed continueOnRecoverableError to KEEP_ALIVE, but it is not done correctly because you need to transitively propagate the state up to the iterator (which is more doable with proper timing of the initialization of iterator). But with the respect to the bug with next errors, I also don't think the logic is properly capturing the distinction between recoverable and non-recoverable types of error. I think there are several critical states of error logic that don't seem to be preserved here or clear path for how they will be, AFAICT.

It was my understanding that we were going to break this up into a map and filter in Harper.

That's fine, but we do need to be cognizant of the extra work that is entailed in this so we can plan for it.

must pass the linter and ts compiler, otherwise it's not valid code

TypeScript says that iter.map('foo') is not valid code. Should we believe that?

In the spirit of your previous paragraph, I strongly feel we should not have a single ticket CORE-2807 to add all the remaining methods and a single ticket CORE-2808 to add the tests. This is backwards. We should have a ticket for each method and it's tests.

You say you have no intention of seeing any of v2 go to waste, but I don't see a path forward with v2 especially given all of those tickets are a low priority.

These tickets are marked as version 5.1, so they are scheduled, and 5.1 will happen. I changed these to medium priority.
But you think I should replace these with 7 separate Jira tickets for at(), drop(), every(), find(), reduce(), some(), and take()? I can certainly do that if you think that is helpful and more organized.

@cb1kenobi
Copy link
Contributor Author

must pass the linter and ts compiler, otherwise it's not valid code

TypeScript says that iter.map('foo') is not valid code. Should we believe that?

In what scenario would iter.map('foo') be valid? map() accepts a single function argument. If it's not a function, it's not valid.

@cb1kenobi cb1kenobi marked this pull request as draft July 8, 2025 23:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants