Skip to content

Conversation

@JoergBudi
Copy link
Contributor

…() get invoked, thus reducing call load on hasNext() drastically in nested chained iterator scenarios.

Before this change, the call-load on hasNext() grew exponentially with the number of nested IteratorChains.
@ppkarwasz: I think, the fix is indeed this easy, please check. It raises the question, if the fix for collections-722 is even needed anymore ... (the discussion in parallel on PR 628).

Thanks for your contribution to Apache Commons! Your help is appreciated!

Before you push a pull request, review this list:

  • Read the contribution guidelines for this project.
  • [-] Read the ASF Generative Tooling Guidance if you use Artificial Intelligence (AI).
  • [-] I used AI to create any part of, or all of, this pull request.
  • Run a successful build using the default Maven goal with mvn; that's mvn on the command line by itself.
  • Write unit tests that match behavioral changes, where the tests fail if the changes to the runtime are not applied. This may not always be possible, but it is a best-practice.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Each commit in the pull request should have a meaningful subject line and body. Note that a maintainer may squash commits during the merge process.

…() get invoked, thus reducing call load on hasNext() dratically especially in nested chained iterator scenarios
Copy link
Contributor

@ppkarwasz ppkarwasz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @JoergBudi,

Thanks for working on this! 💯

I think we could take the optimization a step further by caching both the result of hasNext() and the value returned by next() inside updateCurrentIterator, similarly to what FilterIterator already does:

/** The next object in the iteration. */
private E nextObject;
/** Whether the next object has been calculated yet. */
private boolean nextObjectSet;

The reason I originally thought caching the next element from the nested iterator wasn’t possible was because of this comment:

// set last used iterator here, in case the user calls remove
// before calling hasNext() or next() (although they shouldn't)
lastUsedIterator = currentIterator;

However, that comment doesn’t seem accurate, and in fact we may not even need lastUsedIterator.

Here’s what I propose:

  1. Remove lastUsedIterator and just track the iterator used in the last hasNext() or next() call (i.e., currentIterator).
  2. Add two fields: hasNextObject (Boolean) and nextObject (E) — to cache the upcoming element.
  3. Since updateCurrentIterator is protected but only modifies private fields (no observable effects), deprecate it and make it a no-op.
  4. Introduce a new setNextObject() method that handles all hasNext() and next() calls on the chained iterators, ensuring no duplicate hasNext() invocations (the current cause of the exponential performance hit). It could return true if an object was set.
  5. Keep remove() simple: check if next() has been called and throw IllegalStateException otherwise, following the same contract as, for example, ArrayList’s iterator.

This approach should solve the double hasNext() problem cleanly and bring the complexity down without making the code harder to maintain.

What do you think?

@JoergBudi
Copy link
Contributor Author

@ppkarwasz : Your suggestion with setNextObject() doesn't work, because you can't "prefetch" the next element without losing the possibility to properly invoke "remove()" down the chain.
However with the last commits, i think, there are now no double hasNext() calls anymore.
While working on this, I also discovered and fixed a bug with remove() (see the new Testcase IteratorChainTest#testHasNextIsInvokedOnEdgeBeforeRemove, which fails with 4.5.0 . I also followed your suggestion to directly throw an exeption in remove() when the contract is bluntly violated.
What do you think ? Any suggestions (e.g. for more testcases or so) ?

Copy link
Contributor

@ppkarwasz ppkarwasz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @JoergBudi,

Your suggestion with setNextObject() doesn't work, because you can't "prefetch" the next element without losing the possibility to properly invoke remove() down the chain.

You’re right: with prefetching, a sequence like next(), hasNext(), remove() would end up removing the wrong element from the nested iterator.

Your changes look good to me. Could you please add an entry to src/changes/changes.xml with dev="pkarwasz" (with one “p”) as the merging committer?

@ppkarwasz ppkarwasz merged commit 87408a0 into apache:master Aug 13, 2025
9 checks passed
@ppkarwasz
Copy link
Contributor

@JoergBudi, thanks! 🎉

@JoergBudi JoergBudi deleted the bugfix/collections-838-iterator-chain-cache branch August 14, 2025 19:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants