
ByteBuddy use contextClassLoader #1087

Merged: richox merged 2 commits into apache:master from XorSum:validate-injector-loader on Aug 4, 2025

Conversation

XorSum (Contributor) commented Aug 1, 2025

Which issue does this PR close?

Follow-up to PR #1047.


wForget (Member) commented Aug 1, 2025

Does ForceApplyShuffledHashJoinInjector also have a similar issue?

XorSum (Contributor, author) commented Aug 1, 2025

> Does ForceApplyShuffledHashJoinInjector also have a similar issue?

It seems to have a similar issue. Do you have any code that triggers forceApplyShuffledHashJoin so I can verify it?

wForget (Member) commented Aug 1, 2025

> It seems to have a similar issue. Do you have any code that triggers forceApplyShuffledHashJoin so I can verify it?

We can simply add the spark.blaze.forceShuffledHashJoin=true configuration to verify that ForceApplyShuffledHashJoinInjector.inject() is triggered correctly.

XorSum (Contributor, author) commented Aug 1, 2025

> We can simply add the spark.blaze.forceShuffledHashJoin=true configuration to verify that ForceApplyShuffledHashJoinInjector.inject() is triggered correctly.

I tried running the following command, and it executed normally without any errors. I'm unable to determine whether forceApplyShuffledHashJoin wasn't triggered, or whether forceApplyShuffledHashJoin isn't affected by this issue.

/opt/app/spark/spark-3.5.5-bin-hadoop3/bin/spark-sql \
--conf "spark.master=local[4]" \
--conf spark.blaze.enable=true \
--conf spark.blaze.forceShuffledHashJoin=true \
--conf spark.memory.offHeap.enabled=false \
--conf spark.executor.memoryOverhead=2g \
--conf spark.shuffle.manager=org.apache.spark.sql.execution.blaze.shuffle.BlazeShuffleManager \
--conf spark.jars=./target/blaze-engine-spark-3.5-release-5.0.0-SNAPSHOT.jar \
--conf spark.sql.extensions=org.apache.spark.sql.blaze.BlazeSparkSessionExtension
spark-sql (default)> explain select * from t2 join t3 where t2.a = t3.a;
25/08/01 19:49:40 WARN NativeHelper: memory total: 1476395008, onheap: 954728448, offheap: 521666560
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- BroadcastHashJoin [a#5], [a#6], Inner, BuildLeft, false
   :- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)),false), [plan_id=24]
   :  +- Filter isnotnull(a#5)
   :     +- FileScan orc spark_catalog.default.t2[a#5] Batched: true, DataFilters: [isnotnull(a#5)], Format: ORC, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark_perf/warehouse/t2], PartitionFilters: [], PushedFilters: [IsNotNull(a)], ReadSchema: struct<a:int>
   +- Filter isnotnull(a#6)
      +- FileScan orc spark_catalog.default.t3[a#6] Batched: true, DataFilters: [isnotnull(a#6)], Format: ORC, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark_perf/warehouse/t3], PartitionFilters: [], PushedFilters: [IsNotNull(a)], ReadSchema: struct<a:int>

wForget (Member) commented Aug 4, 2025

> I tried running the following command, and it executed normally without any errors. I'm unable to determine whether forceApplyShuffledHashJoin wasn't triggered, or whether forceApplyShuffledHashJoin isn't affected by this issue.

Could you add spark.sql.autoBroadcastJoinThreshold=-1 to disable broadcast and try again?
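
For reference, the earlier invocation with broadcast disabled would look like this (a sketch reusing the confs already shown above; the Spark path and jar name are environment-specific):

```shell
/opt/app/spark/spark-3.5.5-bin-hadoop3/bin/spark-sql \
  --conf "spark.master=local[4]" \
  --conf spark.blaze.enable=true \
  --conf spark.blaze.forceShuffledHashJoin=true \
  --conf spark.sql.autoBroadcastJoinThreshold=-1 \
  --conf spark.shuffle.manager=org.apache.spark.sql.execution.blaze.shuffle.BlazeShuffleManager \
  --conf spark.jars=./target/blaze-engine-spark-3.5-release-5.0.0-SNAPSHOT.jar \
  --conf spark.sql.extensions=org.apache.spark.sql.blaze.BlazeSparkSessionExtension
```

With the broadcast threshold set to -1, the planner can no longer pick BroadcastHashJoin, so the shuffled-hash-join selection path (and hence the injector) should be exercised.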

XorSum (Contributor, author) commented Aug 4, 2025

> Could you add spark.sql.autoBroadcastJoinThreshold=-1 to disable broadcast and try again?

Thanks! The reproduction succeeded, so I also switched the classloader used by ForceApplyShuffledHashJoinInjector.

java.lang.NoClassDefFoundError: org/apache/spark/sql/blaze/ForceApplyShuffledHashJoinInterceptor
        at org.apache.spark.sql.catalyst.optimizer.JoinSelectionHelper.forceApplyShuffledHashJoin(joins.scala)
        at org.apache.spark.sql.catalyst.optimizer.JoinSelectionHelper.getShuffleHashJoinBuildSide(joins.scala:326)
        at org.apache.spark.sql.catalyst.optimizer.JoinSelectionHelper.getShuffleHashJoinBuildSide$(joins.scala:313)
        at org.apache.spark.sql.execution.SparkStrategies$JoinSelection$.getShuffleHashJoinBuildSide(SparkStrategies.scala:172)
        at org.apache.spark.sql.execution.SparkStrategies$JoinSelection$.createShuffleHashJoin$1(SparkStrategies.scala:250)
        at org.apache.spark.sql.execution.SparkStrategies$JoinSelection$.$anonfun$apply$4(SparkStrategies.scala:286)
        at scala.Option.orElse(Option.scala:447)
        at org.apache.spark.sql.execution.SparkStrategies$JoinSelection$.createJoinWithoutHint$1(SparkStrategies.scala:286)
        at org.apache.spark.sql.execution.SparkStrategies$JoinSelection$.apply(SparkStrategies.scala:301)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:63)
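
The NoClassDefFoundError above is a classloader-visibility problem: the injected bytecode references a class (here ForceApplyShuffledHashJoinInterceptor) that is visible to the thread's context classloader but not to the loader the injector previously resolved against. A minimal, self-contained sketch of that visibility difference, using a hypothetical isolated loader rather than Blaze's actual injector code:

```java
import java.net.URL;
import java.net.URLClassLoader;

public class LoaderVisibilityDemo {
    // Returns true if `name` can be resolved through `loader`.
    static boolean visibleTo(ClassLoader loader, String name) {
        try {
            Class.forName(name, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // An isolated loader with no URLs and the bootstrap loader as
        // parent: it sees core JDK classes but no application classes.
        ClassLoader isolated = new URLClassLoader(new URL[0], null);
        // The context classloader (normally the application loader) does
        // see application classes such as this one.
        ClassLoader ctx = Thread.currentThread().getContextClassLoader();

        System.out.println(visibleTo(isolated, "LoaderVisibilityDemo")); // false
        System.out.println(visibleTo(ctx, "LoaderVisibilityDemo"));      // true
    }
}
```

Resolving injected classes through Thread.currentThread().getContextClassLoader(), as this PR does, makes them visible in environments (such as spark-sql sessions) where the loader that defined the target class differs from the one that loaded the injector.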

wForget (Member) left a comment

LGTM, thanks.

richox merged commit 70dd273 into apache:master on Aug 4, 2025
1235 of 1237 checks passed