Skip to content

Commit 1decb5d

Browse files
mingkangli-dbilicmarkodb
authored andcommitted
[SPARK-48928] Log Warning for Calling .unpersist() on Locally Checkpointed RDDs
### What changes were proposed in this pull request? This pull request proposes logging a warning message when the `.unpersist()` method is called on RDDs that have been locally checkpointed. The goal is to inform users about the potential risks associated with unpersisting locally checkpointed RDDs without changing the current behavior of the method. ### Why are the changes needed? Local checkpointing truncates the lineage of an RDD, preventing it from being recomputed from its source. If a locally checkpointed RDD is unpersisted, it loses its data and cannot be regenerated, potentially leading to job failures if subsequent actions or transformations are attempted on the RDD (which was seen on some user workloads). Logging a warning message helps users avoid such pitfalls and aids in debugging. ### Does this PR introduce _any_ user-facing change? Yes, this PR adds a warning log message when .unpersist() is called on a locally checkpointed RDD, but it does not alter any existing behavior. ### How was this patch tested? This PR does not change any existing behavior and therefore no tests are added. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#47391 from mingkangli-db/warning_unpersist. Authored-by: Mingkang Li <mingkang.li@databricks.com> Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
1 parent 299fddd commit 1decb5d

File tree

1 file changed

+5
-0
lines changed
  • core/src/main/scala/org/apache/spark/rdd

1 file changed

+5
-0
lines changed

core/src/main/scala/org/apache/spark/rdd/RDD.scala

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -211,6 +211,11 @@ abstract class RDD[T: ClassTag](
211211
* @return This RDD.
212212
*/
213213
def unpersist(blocking: Boolean = false): this.type = {
214+
if (isLocallyCheckpointed) {
215+
// This means its lineage has been truncated and cannot be recomputed once unpersisted.
216+
logWarning(log"RDD ${MDC(RDD_ID, id)} was locally checkpointed, its lineage has been" +
217+
log" truncated and cannot be recomputed after unpersisting")
218+
}
214219
logInfo(log"Removing RDD ${MDC(RDD_ID, id)} from persistence list")
215220
sc.unpersistRDD(id, blocking)
216221
storageLevel = StorageLevel.NONE

0 commit comments

Comments
 (0)