

@LuciferYang
Contributor

@LuciferYang LuciferYang commented Oct 13, 2023

What changes were proposed in this pull request?

This PR lowers the default -Xmx of build/mvn from 4g to 3g to reduce peak memory usage during Maven compilation.
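For local experiments, the heap cap can usually be changed without editing the script. The sketch below assumes that build/mvn applies its built-in JVM options only when MAVEN_OPTS is unset or empty, so check your copy of the script before relying on it. `DEFAULT_MAVEN_OPTS` and `effective_maven_opts` are hypothetical names, and every flag other than -Xmx is illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the script's built-in options after this PR
# (-Xmx lowered from 4g to 3g); the other flags are illustrative only.
DEFAULT_MAVEN_OPTS="-Xss128m -Xmx3g -XX:ReservedCodeCacheSize=128m"

# Prints the JVM options a build would use: the caller's MAVEN_OPTS when it is
# set and non-empty, otherwise the default above. This is the standard
# "${VAR:-default}" parameter-expansion idiom.
effective_maven_opts() {
  printf '%s\n' "${MAVEN_OPTS:-$DEFAULT_MAVEN_OPTS}"
}

effective_maven_opts
```

If the script does honor MAVEN_OPTS this way, a one-off build with a different cap would look like `MAVEN_OPTS="-Xss128m -Xmx4g -XX:ReservedCodeCacheSize=128m" build/mvn clean install -DskipTests`.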

Why are the changes needed?

This can potentially fix the failing snapshot build: https://github.com/apache/spark/actions/runs/6502277099

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • Manual check.

Ran the following command:

build/mvn clean install -DskipTests -Pyarn -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Pspark-ganglia-lgpl -Phadoop-cloud

Before

Peak memory usage was 6.1 GB.

After

Peak memory usage was 5 GB, but compilation time increased by 10%.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the BUILD label Oct 13, 2023
@LuciferYang LuciferYang changed the title from "Lower the default -Xmx of build/mvn to 3g." to "Lower the default -Xmx of build/mvn to 3g" Oct 13, 2023
@LuciferYang
Contributor Author

@HyukjinKwon Is there a way to manually trigger Publish snapshot using this PR, or do we have to wait until it's merged to get validation?

@LuciferYang LuciferYang changed the title from "Lower the default -Xmx of build/mvn to 3g" to "[BUILD] Lower the default -Xmx of build/mvn to 3g" Oct 13, 2023
@LuciferYang LuciferYang changed the title from "[BUILD] Lower the default -Xmx of build/mvn to 3g" to "[SPARK-45536][BUILD] Lower the default -Xmx of build/mvn to 3g" Oct 13, 2023
@HyukjinKwon
Member

We can merge and try. I made another PR, #43365, with a different approach.

@EnricoMi
Contributor

EnricoMi commented Oct 13, 2023

> @HyukjinKwon Is there a way to manually trigger Publish snapshot using this PR, ...

You could, if you added this to the publish_snapshot.yml workflow:

```yaml
on:
  workflow_dispatch
```

https://docs.github.com/en/actions/using-workflows/manually-running-a-workflow

@beliefer
Contributor

@LuciferYang I don't understand "This can potentially fix the snapshot build being failed: https://github.com/apache/spark/actions/runs/6502277099". Could you explain why you reduced the -Xmx?

@EnricoMi
Contributor

EnricoMi commented Oct 13, 2023

The virtual machine building the releases (the host, not the JVM) has 7 GB of memory, and the build process uses 6.1 GB. The suspicion is that the build process is killed because it uses too much memory.
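A minimal sketch of the headroom arithmetic behind this suspicion. It takes the 7 GB host figure and the 6.1 GB and 5 GB peaks from the PR description at face value. Those peaks were measured on the author's machine, not on the CI runner, so the real headroom on the runner may differ. `headroom` and `pct_reduction` are illustrative helper names.

```shell
#!/usr/bin/env bash
# Figures quoted in this thread, in GB. The two peaks were measured on the
# PR author's machine, not on the CI runner.
host_gb=7.0
peak_xmx4g_gb=6.1
peak_xmx3g_gb=5.0

# headroom HOST PEAK: prints HOST - PEAK to one decimal place.
headroom() {
  awk -v h="$1" -v p="$2" 'BEGIN { printf "%.1f\n", h - p }'
}

# pct_reduction BEFORE AFTER: prints the relative drop in percent, rounded.
pct_reduction() {
  awk -v b="$1" -v a="$2" 'BEGIN { printf "%.0f\n", (b - a) / b * 100 }'
}

echo "headroom with -Xmx4g: $(headroom "$host_gb" "$peak_xmx4g_gb") GB"
echo "headroom with -Xmx3g: $(headroom "$host_gb" "$peak_xmx3g_gb") GB"
echo "peak reduction: $(pct_reduction "$peak_xmx4g_gb" "$peak_xmx3g_gb")%"
```

On these numbers, the change more than doubles the nominal headroom (0.9 GB to 2.0 GB) in exchange for a roughly 18% lower peak.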

@LuciferYang
Contributor Author

LuciferYang commented Oct 13, 2023

> @LuciferYang I don't understand This can potentially fix the snapshot build being failed: https://github.com/apache/spark/actions/runs/6502277099. Could you give an explanation why you reduce the -Xmx?

It's just a guess. Based on past experience, the compilation container may be killed because of memory overuse (Java 17 seems to use more metaspace during the Maven build), but I don't have concrete evidence for this case. Do you have any better suggestions? @beliefer

@HyukjinKwon
Member

Let's just try. If it doesn't work, we can revert.

@LuciferYang
Contributor Author

[info] *** 1 TEST FAILED ***
[error] Failed tests:
[error] 	org.apache.spark.sql.kafka010.KafkaSourceStressSuite
[error] (sql-kafka-0-10 / Test / test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 2132 s (35:32), completed Oct 13, 2023, 12:07:56 PM

Only KafkaSourceStressSuite failed; this is a known flaky test.

@LuciferYang
Contributor Author

Merged into master to observe the Publish Snapshot job. If it doesn't work, we can revert it tomorrow.

Thanks @HyukjinKwon @beliefer @EnricoMi

@dongjoon-hyun
Member

Thank you, @LuciferYang and all.

Since the default GC in Java 17 differs from the old ParallelGC, we can optimize further.
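One way to act on this is to pin the collector explicitly. On server-class machines, Java 9 and later default to G1, while Java 8 defaulted to ParallelGC. `-XX:+UseParallelGC` is a standard HotSpot flag. The sketch below appends that flag to a JVM options string unless a flag of the form `-XX:+Use*GC` is already present. The helper `with_parallel_gc` is hypothetical. Whether ParallelGC actually lowers peak memory for this build is an untested assumption, not something established in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append -XX:+UseParallelGC to a JVM options string
# unless one of its words already has the form -XX:+Use*GC (e.g. -XX:+UseG1GC).
with_parallel_gc() {
  local opts="$1" flag
  local -a flags
  # Split the options on whitespace without triggering glob expansion.
  read -r -a flags <<< "$opts"
  for flag in "${flags[@]}"; do
    case "$flag" in
      -XX:+Use*GC)
        # A collector is already selected; leave the options unchanged.
        printf '%s\n' "$opts"
        return 0 ;;
    esac
  done
  # ${opts:+...} avoids a leading space when opts is empty.
  printf '%s\n' "${opts:+$opts }-XX:+UseParallelGC"
}

with_parallel_gc "-Xss128m -Xmx3g -XX:ReservedCodeCacheSize=128m"
```

Assuming build/mvn honors a caller-supplied MAVEN_OPTS, this could be tried with `MAVEN_OPTS="$(with_parallel_gc "-Xss128m -Xmx3g -XX:ReservedCodeCacheSize=128m")" build/mvn clean install -DskipTests`.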

@LuciferYang
Contributor Author

LuciferYang commented Oct 14, 2023

@dongjoon-hyun

https://github.com/apache/spark/actions/runs/6514229181/job/17696846279

It still doesn't seem to work. Do you have any ideas or suggestions for optimizing the compilation options?

@LuciferYang
Contributor Author

LuciferYang commented Oct 14, 2023

I tried running mvn deploy against a local Nexus, and no failures occurred...

LuciferYang added a commit that referenced this pull request Oct 15, 2023
…to 3g"

This reverts commit 3e2470d.

### What changes were proposed in this pull request?
This PR reverts the change in #43364.

### Why are the changes needed?
It seems to have no effect on fixing `Publish snapshot`; it still failed:
- https://github.com/apache/spark/actions/runs/6514229181/job/17696846279

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43372 from LuciferYang/revert-SPARK-45536.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
@LuciferYang LuciferYang deleted the r-xmx-3g branch October 18, 2023 05:23
@EnricoMi
Contributor

Another attempt to fix this in #43512 / SPARK-45651.

HyukjinKwon pushed a commit that referenced this pull request Oct 24, 2023
### What changes were proposed in this pull request?
With a manual trigger, the workflow can be run manually after a fix to the workflow is merged to master. It also allows running the workflow on only a subset of branches (e.g. those that failed).

### Why are the changes needed?
Sometimes publishing snapshots fails. If the workflow file needs a fix, that change can only be tested by waiting until the next day, when the cron event triggers the next publish. This is a long turnaround for testing fixes to that workflow (see #43364).

### Does this PR introduce _any_ user-facing change?
No, this is purely build CI related.

### How was this patch tested?
This can only be tested in master. The GitHub workflow syntax was tested in a private repo.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43512 from EnricoMi/publish-snapshot-manually.

Authored-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

5 participants