Parquet: close zstd input stream early to avoid memory pressure #5681
Conversation
} else {
  optionsBuilder = ParquetReadOptions.builder();
  optionsBuilder.withCodecFactory(new ParquetCodecFactory(new Configuration(), 0));
Is it safe to pass 0 for page size? How is that option used?
The page size isn't used for decompressors according to this comment. This is what is currently being set by default in the options builder.
I added comments to make that more clear
Idea: Maybe making this a named constant, like UNUSED_PARQUET_DECOMPRESSOR_PAGE_SIZE or something, would be a good way of indicating that?
Given this solution is temporary, I don’t have a strong feeling either way, but I do like avoiding magic numbers.
Maybe a link to the upstream parquet-mr PR or an Iceberg issue related to zstd decompression would be more informative than just mentioning that page size is essentially ignored? The decompressor class itself might be sufficient documentation though.
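For illustration, the named-constant suggestion might look roughly like the sketch below. The constant name, the surrounding class/method, and the package of ParquetCodecFactory are assumptions based on this thread and the snippet above, not the actual Iceberg code; the claim that the page size is ignored for decompressors comes from the discussion here.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.parquet.ParquetCodecFactory; // wrapper class from this PR; package assumed
import org.apache.parquet.ParquetReadOptions;

class ReadOptionsSketch {
  // Hypothetical constant: the codec factory built here is only used to create
  // decompressors, which ignore the page size, so the value is arbitrary.
  private static final int UNUSED_PARQUET_DECOMPRESSOR_PAGE_SIZE = 0;

  static ParquetReadOptions readOptions() {
    return ParquetReadOptions.builder()
        .withCodecFactory(
            new ParquetCodecFactory(new Configuration(), UNUSED_PARQUET_DECOMPRESSOR_PAGE_SIZE))
        .build();
  }
}
```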
I’m +1 to patching for Iceberg until the parquet-mr release is made available. One of the benefits of controlling the parquet version and writers is that we can do these things. Thanks @bryanck! Will review this ASAP. Have been out pretty sick but occasionally I wake up and am able to review etc.
Looks great to me. Thanks, @bryanck!
…und for PARQUET-2160

### What changes were proposed in this pull request?

SPARK-41952 has been open for a while, but unfortunately the Parquet community has not published the patched version yet; as a workaround, we can fix the issue on the Spark side first.

We encountered this memory issue when migrating data from parquet/snappy to parquet/zstd: Spark executors always occupy an unreasonable amount of off-heap memory and are at high risk of being killed by the NM. See more discussion at apache/parquet-java#982 and apache/iceberg#5681.

### Why are the changes needed?

The issue is fixed in the Parquet community as [PARQUET-2160](https://issues.apache.org/jira/browse/PARQUET-2160), but the patched version is not available yet.

### Does this PR introduce _any_ user-facing change?

Yes, it's a bug fix.

### How was this patch tested?

The existing UTs should cover the correctness check; I also verified this patch by scanning a large parquet/zstd table.

```
spark-shell --executor-cores 4 --executor-memory 6g --conf spark.executor.memoryOverhead=2g
```

```
spark.sql("select sum(hash(*)) from parquet_zstd_table ").show(false)
```

- Before this patch: all executors get killed by the NM quickly.

```
ERROR YarnScheduler: Lost executor 1 on hadoop-xxxx.****.org: Container killed by YARN for exceeding physical memory limits. 8.2 GB of 8 GB physical memory used. Consider boosting spark.executor.memoryOverhead.
```

- After this patch: the query runs well and no executor gets killed.

Closes #40091 from pan3793/SPARK-41952.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Chao Sun <sunchao@apple.com>
This PR adds a workaround for memory issues encountered when reading Parquet files compressed with zstd. During some load testing on Spark, we encountered various OOM kills when reading from zstd compressed tables. One suggested solution was to set the environment variable `MALLOC_TRIM_THRESHOLD_` to something lower than the default, like 8192. This helped in some cases but not all. Upon further investigation, it appeared that buffers were accumulating...

Disabling the buffer pool resulted in finalizers accumulating instead...

The solution is the same as the one being proposed in parquet-mr. The current version of Parquet leaves the decompression stream open. Instead of leaving it open, this PR changes the behavior to read the stream fully into a buffer and then close it, allowing native resources to be freed immediately rather than waiting for garbage collection, and the buffer to be returned to the pool for reuse.
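As a rough illustration of that idea (not the actual patch; the helper below is hypothetical and the real change operates on Parquet's internal decompression path), draining the stream and closing it inside a try-with-resources releases the native zstd resources eagerly:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class EagerDecompressSketch {
  private EagerDecompressSketch() {}

  /**
   * Drains the decompression stream into an on-heap buffer and closes it
   * immediately, so native resources are freed right away instead of
   * lingering until finalization/GC.
   */
  static byte[] readFullyAndClose(InputStream decompressed) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] chunk = new byte[8192];
    try (InputStream in = decompressed) {
      int n;
      while ((n = in.read(chunk)) != -1) {
        out.write(chunk, 0, n);
      }
    } // stream closed here, before the decompressed bytes are consumed
    return out.toByteArray();
  }
}
```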
`MALLOC_TRIM_THRESHOLD_` should no longer need to be lowered with this change. Anecdotally, this resulted in better performance (compared to setting `MALLOC_TRIM_THRESHOLD_`), but more testing would be needed to validate that. Alternatively, we could wait for the Parquet PR to be merged, but this is a more targeted fix. We could also add a flag of some sort if desired. Ideally we would backport this to 0.14.x.
Here's a viz of the heap dump with this change...
