Conversation
d2091d9 to
215f8db
Compare
|
🤔 there appears to be some sort of deadlock in the tests https://github.com/apache/arrow-datafusion/actions/runs/3702235488/jobs/6272289305 |
|
Lol, even more reason to remove these locks #4617 |
| LogicalPlan::SetVariable(SetVariable { | ||
| variable, value, .. | ||
| }) => { | ||
| let config_options = &self.state.write().config.config_options; |
There was a problem hiding this comment.
I'm actually somewhat surprised this was compiling, there must be some magic going on to extend the lifetime of the temporary lock guard, which is wild
| // original state after it has been cloned, they will not be picked up by the | ||
| // clone but that is okay, as it is equivalent to postponing the state update | ||
| // by keeping the lock until the end of the function scope. | ||
| state.clone() |
There was a problem hiding this comment.
In practice the lock meant that we were still cloning SessionState fairly frequently, better to just be explicit about it and optimise from there
| } | ||
|
|
||
| state_cloned.create_physical_plan(&self.plan).await | ||
| /// Temporary pending #4626 |
| state_cloned.create_physical_plan(&self.plan).await | ||
| /// Temporary pending #4626 | ||
| async fn create_physical_plan_impl(&mut self) -> Result<Arc<dyn ExecutionPlan>> { | ||
| self.session_state.execution_props.start_execution(); |
There was a problem hiding this comment.
This is nice because we are no longer mutating state on the actual SessionContext
alamb
left a comment
There was a problem hiding this comment.
I like it -- thank you @tustvold
There is likely a change of behavior here:
Specifically prior to this PR if you made a dataframe and then modified the SessionContext the dataframe might see some of those changes.
However I think the old behavior would be quite surprising and so I like this change a lot
| impl DataFrame { | ||
| /// Create a new Table based on an existing logical plan | ||
| pub fn new(session_state: Arc<RwLock<SessionState>>, plan: LogicalPlan) -> Self { | ||
| pub fn new(session_state: SessionState, plan: LogicalPlan) -> Self { |
|
|
||
| /// Write a `DataFrame` to a CSV file. | ||
| pub async fn write_csv(self, path: &str) -> Result<()> { | ||
| let state = self.session_state.read().clone(); |
There was a problem hiding this comment.
yeah all the explicit cloning is definitely a warning sign
|
Benchmark runs are scheduled for baseline = 414487c and contender = 42b3a6c. 42b3a6c is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
|
I'm a little surprised by this change. In my opinion, |
|
The idea is that DataFrame contains a snapshot of SessionContext, i.e. a clone of SessionState? This will allow moving planning and execution off SessionContext to both resolve the current circular dependency, and also make the state consistent across planning and execution |
|
@mingmwang I think the key proposal by @tustvold is that once a query is planned, it has only a read-only copy of the This has several seemingly nice properties, such as a configuration value changes that are made after a query is planned, will not affect the plan (it will only affect subsequently planned queries). Do you agree this sounds like reasonable behavior? Update: I didn't see #4617 (comment) so it sounds like this is reasonable from your perspective Do you know of any usecases where running a query needs to modify the session context? |
Which issue does this PR close?
Part of #4617
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?