@pranavdev022 pranavdev022 commented Oct 17, 2025

What changes were proposed in this pull request?

This PR optimizes memory management for cached local relations when cloning Spark sessions by implementing reference counting instead of data replication.

Current behavior:

  • When a session is cloned, cached local relation data stored in the block manager is replicated.
  • Each clone creates a duplicate copy of the data with a new block ID.
  • This causes unnecessary memory pressure.

Proposed changes:

  • Implement reference counting for cached local relations during session cloning
  • Retain the same block ID and data reference when cloning sessions, incrementing a reference count instead of copying the data
  • Add a hash-to-blockId mapping in ArtifactManager for efficient block lookup
  • Clean up blocks from block manager memory when the reference count reaches zero
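The scheme above can be sketched as a small, self-contained model. This is only an illustration of the reference-counting idea, not the actual ArtifactManager code: the names `BlockRegistry`, `register`, `retain`, and `release` are hypothetical, and the real implementation would operate on Spark's BlockManager and its block IDs.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch of reference-counted cached blocks shared across
// cloned sessions. Names are illustrative, not the real ArtifactManager API.
object BlockRegistry {
  // hash of the local relation data -> block ID (efficient lookup on clone)
  private val hashToBlockId = new ConcurrentHashMap[String, String]()
  // block ID -> number of sessions currently referencing it
  private val refCounts = new ConcurrentHashMap[String, AtomicInteger]()

  /** Register a newly cached block under its data hash with a ref count of 1. */
  def register(dataHash: String, blockId: String): Unit = {
    hashToBlockId.put(dataHash, blockId)
    refCounts.put(blockId, new AtomicInteger(1))
  }

  /** On session clone: reuse the existing block and bump its ref count
    * instead of copying the data under a new block ID. */
  def retain(dataHash: String): Option[String] =
    Option(hashToBlockId.get(dataHash)).map { blockId =>
      refCounts.get(blockId).incrementAndGet()
      blockId
    }

  /** On session close: drop one reference; free the block when it hits zero. */
  def release(dataHash: String): Unit = {
    val blockId = hashToBlockId.get(dataHash)
    if (blockId != null && refCounts.get(blockId).decrementAndGet() == 0) {
      refCounts.remove(blockId)
      hashToBlockId.remove(dataHash)
      // the real code would remove the block from block manager memory here
    }
  }

  /** Current reference count for a block (0 if unknown or already freed). */
  def refCount(blockId: String): Int =
    Option(refCounts.get(blockId)).map(_.get()).getOrElse(0)
}
```

Under this model, cloning a session calls `retain` (no data copy) and closing a clone calls `release`; the block is evicted only when the last referencing session goes away. A production version would also need to make the retain/release pair atomic with respect to concurrent clones.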

Why are the changes needed?

Cloning sessions is a common operation in Spark applications (e.g., for creating isolated execution contexts). The current approach of duplicating cached data can significantly increase memory footprint, especially when:

  • Sessions are cloned frequently
  • Cached relations contain large datasets
  • Multiple clones exist simultaneously

This optimization reduces memory pressure and improves performance by avoiding unnecessary data copies.

Does this PR introduce any user-facing change?

No. This is an internal optimization that improves memory efficiency without changing user-facing APIs or behavior.

How was this patch tested?

  • Added unit tests to verify the reference-counting logic.
  • Verified that existing unit tests for ArtifactManager and session cloning still pass.

Was this patch authored or co-authored using generative AI tooling?

No
