-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Spilling Strategies #15173
Conversation
00ac981
to
5347917
Compare
presto-main/src/main/java/com/facebook/presto/execution/MemoryRevokingScheduler.java
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/execution/MemoryRevokingScheduler.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/execution/MemoryRevokingScheduler.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/sql/analyzer/FeaturesConfig.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/execution/MemoryRevokingScheduler.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/execution/MemoryRevokingScheduler.java
Outdated
Show resolved
Hide resolved
I'll actually update this with PER_TASK_MEMORY_THRESHOLD as it's own class. Right now MemoryRevokingScheduler is doing too much conditional logic. |
5347917
to
3d0be00
Compare
Still finishing up integration tests, updating revocable memory at the task level without having an SqlTaskManager has been a bit of a challenge. Everything else is complete however. |
import static java.util.Objects.requireNonNull; | ||
import static java.util.concurrent.TimeUnit.SECONDS; | ||
|
||
public class TaskThresholdMemoryRevokingScheduler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry if it's a dumb question. I just realized memory revoking scheduler (and MemoryRevokingScheduler
) doesn't implement any interface. Curious how are they get called? I assume it's related to the @PostConstruct
and @PreDestroy
annotation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup you got it. They run as a separate thread and are created at injection time. On creation (@PostConstruct
), we register a periodically executing method to check if revoking is necessary (based on whichever spilling strategy we are using). As for which scheduler is created, that is determined in ServerMainModule, using installModuleIf
. I was wondering too if it's worth having an interface but figured there wasn't enough reason to abstract it yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sachdevs : I see. Is this a mechanism from Guice? (e.g. automatically call methods annotated with @PostConstruct
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Javax annotation: https://docs.oracle.com/javaee/7/api/javax/annotation/PostConstruct.html.
presto-main/src/main/java/com/facebook/presto/execution/MemoryRevokingScheduler.java
Outdated
Show resolved
Hide resolved
0b33842
to
46ed059
Compare
public enum TaskSpillingStrategy | ||
{ | ||
ORDER_BY_CREATE_TIME, // When spilling is triggered, revoke tasks in order of oldest to newest | ||
ORDER_BY_REVOCABLE_BYTES, // When spilling is triggered, revoke tasks by most allocated revocable memory to least allocated revocable memory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we treat join and aggregation spill the same way? The cost of spilling them is not uniform.
With join if you spill it then the entire probe side has to be spilled incurring a 3x IO cost.
With aggregation if you spill it will have to merge in one more run, but it can continue to use memory to avoid creating more sorted runs.
In practice whether this distinction matters often???
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two levels of spilling strategy:
-
Prioritizing which operator to spill within a given task.
This would be easy to add on top of the existing implementation. Modifying theVoidTraversingQueryContextVisitor
to make a list of operators and then rank them by priority would accomplish this. This refers to how we choose to spill operators within a task. So far, these implementations do not try to distinguish between operators when choosing to spill which isn't ideal. I can look into adding this in as we see fit depending on how spilling works in production for our workload as we start rolling out soon. -
Prioritizing which task to spill in a list of currently running tasks.
As for distinguishing between revocable bytes allocated by join operator vs agg operator and using that to prioritize which tasks to spill, that would be a bit more work. Let's circle back on this if we see issues with this in during shadow.
46ed059
to
400f1fb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments only. Can @wenleix also help to take a look?
presto-main/src/test/java/com/facebook/presto/execution/TestMemoryRevokingScheduler.java
Outdated
Show resolved
Hide resolved
presto-main/src/test/java/com/facebook/presto/execution/TestMemoryRevokingScheduler.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/sql/analyzer/FeaturesConfig.java
Show resolved
Hide resolved
...o-main/src/main/java/com/facebook/presto/execution/TaskThresholdMemoryRevokingScheduler.java
Outdated
Show resolved
Hide resolved
private final long maxRevocableMemoryPerTask; | ||
|
||
@Nullable | ||
private ScheduledFuture<?> scheduledFuture; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this variable is not thread-safe? Maybe have synchronized methods/blocks + "GuardedBy("this")"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it's just copied from existing MemoryRevokingScheduler
😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment on lines 95-98
if (scheduledFuture != null) { | ||
scheduledFuture.cancel(true); | ||
scheduledFuture = null; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not thread-safe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this class is injected as a SINGLETON + this part is only called on @PreDestroy
, this method won't be called until this class is destroyed (We never call stop()
manually). This won't happen until Presto shuts down? I can't imagine any other case in which we will destroy these schedulers. This method only exists to stop leaking this future for tests I'm guessing (since a similar method is also present in MemoryRevokingScheduler
). Let me know if we still need thread safety for this variable because of this context.
...o-main/src/main/java/com/facebook/presto/execution/TaskThresholdMemoryRevokingScheduler.java
Outdated
Show resolved
Hide resolved
...o-main/src/main/java/com/facebook/presto/execution/TaskThresholdMemoryRevokingScheduler.java
Outdated
Show resolved
Hide resolved
c56cc07
to
43a9b80
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Publish some comments since there are changes to the commit
presto-main/src/main/java/com/facebook/presto/execution/MemoryRevokingScheduler.java
Outdated
Show resolved
Hide resolved
private final long maxRevocableMemoryPerTask; | ||
|
||
@Nullable | ||
private ScheduledFuture<?> scheduledFuture; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it's just copied from existing MemoryRevokingScheduler
😂
...o-main/src/main/java/com/facebook/presto/execution/TaskThresholdMemoryRevokingScheduler.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
...o-main/src/main/java/com/facebook/presto/execution/TaskThresholdMemoryRevokingScheduler.java
Outdated
Show resolved
Hide resolved
d6e2c5b
to
4be43f5
Compare
Rebase master due to unrelated test fail. |
This adds ordering tasks by revocable bytes and per task memory threshold alongside ordering by create time.
4be43f5
to
5bf9207
Compare
Tests are green, only added comment. Good to go 👍 |
This PR adds multiple different spilling strategies that can be swapped between using the config
experimental.spiller.task-spilling-strategy
.ORDER_BY_CREATE_TIME - current default strategy. Watch memory pools for revocable memory exceeding threshold, sort tasks by create time, revoke individual operators until we reach the lower threshold.
ORDER_BY_REVOCABLE_BYTES - NEW. Watch memory pools for revocable memory exceeding threshold, sort tasks by most allocated revocable bytes, revoke individual operators until we reach the lower threshold.
PER_TASK_MEMORY_THRESHOLD - NEW. Watch revocable memory pool for memory exceeding per task memory threshold defined by
experimental.spiller.max-revocable-task-memory
. Spill operators in task until it lowers to this threshold.TODO
finish integration testsbetter commit messageRelease note