Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Fix ParallelIterable memory leak where queue continues to be populated even after iterator close #9402

Merged
merged 1 commit into from
Jul 1, 2024

Conversation

Heltman
Copy link
Contributor

@Heltman Heltman commented Jan 3, 2024

This patch is a part for fix #7843 , we will need to add BlockingParallelIterable to fix the whole problem.

@github-actions github-actions bot added the core label Jan 3, 2024
@Heltman
Copy link
Contributor Author

Heltman commented Jan 3, 2024

see #7844 for whole discuss

@Heltman
Copy link
Contributor Author

Heltman commented Jan 15, 2024

@findepi @electrum cc

@etra
Copy link

etra commented May 21, 2024

Any plans for when this will be released?

Thank you

@raunaqmorarka
Copy link

@Heltman could you please rebase your PR
cc: @nastra

@Heltman Heltman force-pushed the parallel_memory_leak_fix branch from 5e40a9e to 27f74d1 Compare June 28, 2024 06:38
@nastra
Copy link
Contributor

nastra commented Jun 28, 2024

Overall the change makes sense, just a few small comments to address

@Heltman Heltman force-pushed the parallel_memory_leak_fix branch from 27f74d1 to 52c19ae Compare June 28, 2024 07:37
@Heltman Heltman force-pushed the parallel_memory_leak_fix branch from 52c19ae to bb2888d Compare June 28, 2024 07:39
@Heltman
Copy link
Contributor Author

Heltman commented Jun 28, 2024

@nastra @raunaqmorarka Thanks for review and attention about this.

Copy link
Contributor

@nastra nastra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I verified that this fixes the underlying issue. I'll leave the PR open for a bit to give others a chance to review this as well

@nastra nastra added this to the Iceberg 1.6.0 milestone Jun 28, 2024
@amogh-jahagirdar amogh-jahagirdar changed the title Fix ParallelIterable memory leak because queue continues to be added even if iterator exited Core: Fix ParallelIterable memory leak where queue continues to be populated even after iterator close Jul 1, 2024
Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Heltman for fixing this leak, I also agree with this solution since the population of the queue can't really be "interrupted". I'll go ahead and merge.

@amogh-jahagirdar amogh-jahagirdar merged commit d3cb1b6 into apache:main Jul 1, 2024
49 checks passed
@amogh-jahagirdar
Copy link
Contributor

and thanks @nastra @Fokko @findepi for the reviews !

szehon-ho pushed a commit to szehon-ho/iceberg that referenced this pull request Sep 16, 2024
* Core: Fix ParallelIterable memory leak where queue continues to be populated even after iterator close (apache#9402)

(cherry picked from commit d3cb1b6)

* Core: Limit ParallelIterable memory consumption by yielding in tasks (apache#10691)

ParallelIterable schedules 2 * WORKER_THREAD_POOL_SIZE tasks for
processing input iterables. This defaults to 2 * # CPU cores.  When one
or some of the input iterables are considerable in size and the
ParallelIterable consumer is not quick enough, this could result in
unbounded allocation inside `ParallelIterator.queue`. This commit bounds
the queue. When queue is full, the tasks yield and get removed from the
executor. They are resumed when consumer catches up.

(cherry picked from commit 7831a8d)

* Drop ParallelIterable's queue low water mark (apache#10978)

As part of the change in commit
7831a8d, queue low water mark was
introduced. However, it resulted in increased number of manifests being
read when planning LIMIT queries in Trino Iceberg connector. To avoid
increased I/O, back out the change for now.

(cherry picked from commit bcb3281)

---------

Co-authored-by: Helt <heltman@qq.com>
Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Manifests reader queue in ParallelIterable is unlimited caused OOM
7 participants