Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Broadly speaking:
- SessionContext/SessionState - state used to plan a query
- ExecutionProps - state used to lower a logical expression to a physical expression
- TaskContext - state used to execute a query
We then have the following:
- RuntimeEnv - "global" configuration available at plan and query time
- SessionConfig - session configuration available at plan and query time
Of these, RuntimeEnv, SessionState and SessionConfig are interior mutable; that is, they can be modified without a mutable reference.
The result is that queries can and do modify the session and runtime configuration during execution. This is important to support statements like CREATE TABLE, SET, etc. This is fine; however, the use of shared mutable state means that these modifications also impact in-flight queries. This feels surprising at best, and there is a fairly high probability that consistency bugs already exist as a result.
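The following is a minimal sketch of the problem, using only `std`. `Config`, `SharedSession` and `run_query` are hypothetical stand-ins, not DataFusion types. The config sits behind `Arc<RwLock<_>>`, so a SET-like call through a cloned handle needs only `&self`. A query that reads the config at planning time and again during execution can observe two different values.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for session configuration (not DataFusion's SessionConfig).
struct Config {
    batch_size: usize,
}

// Shared, interior-mutable state: any clone of the handle can mutate it via &self.
#[derive(Clone)]
struct SharedSession {
    config: Arc<RwLock<Config>>,
}

impl SharedSession {
    // Models a statement like `SET ...`: note it only needs &self.
    fn set_batch_size(&self, value: usize) {
        self.config.write().unwrap().batch_size = value;
    }

    fn batch_size(&self) -> usize {
        self.config.read().unwrap().batch_size
    }
}

// Models an in-flight query that reads the config when planning and again when
// executing; `concurrent` stands for another statement running in between.
fn run_query(session: &SharedSession, concurrent: impl FnOnce()) -> (usize, usize) {
    let at_plan_time = session.batch_size();
    concurrent();
    let at_execution_time = session.batch_size();
    (at_plan_time, at_execution_time)
}

fn main() {
    let session = SharedSession {
        config: Arc::new(RwLock::new(Config { batch_size: 8192 })),
    };
    let other = session.clone();
    let (planned, executed) = run_query(&session, || other.set_batch_size(1024));
    // prints "planned with 8192, executed with 1024"
    println!("planned with {planned}, executed with {executed}");
}
```

Nothing in the types prevents the second handle from mutating the config while the query holds the first. That is exactly the kind of inconsistency described above.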
Describe the solution you'd like
I would ideally like to use Rust's borrow checker to handle this for us, as this would not only eliminate a non-trivial amount of locking complexity from the DataFusion codebase, but would also more clearly communicate what state can be altered when.
This would require separating DDL from DML, with the former requiring mutable access to the SessionContext. I'm inclined to think this is fine for a couple of reasons:
- Some of the methods on SessionContext still take &mut self (refactor: relax the signature of register_* in SessionContext #4612)
- Most use-cases aren't using SessionContext in parallel
- Those that are using SessionContext in parallel will need async state management regardless
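Below is a hedged sketch of the borrow-checker approach, assuming a simplified, hypothetical `Session` type rather than DataFusion's real `SessionContext` API. State-changing, DDL-like methods take `&mut self`. A query borrows the session immutably for its whole lifetime, so the compiler rejects any mutation while a query is still alive, and no locks are needed.

```rust
use std::collections::HashMap;

// Hypothetical, simplified session; not DataFusion's SessionContext API.
struct Session {
    tables: HashMap<String, Vec<i64>>,
    batch_size: usize,
}

// A query holds a shared borrow of the session, so the state it observes
// cannot change until the query is dropped.
struct Query<'a> {
    session: &'a Session,
    table: String,
}

impl Session {
    fn new() -> Self {
        Session { tables: HashMap::new(), batch_size: 8192 }
    }

    // DDL-like statements mutate state, so they require exclusive access.
    fn create_table(&mut self, name: &str, rows: Vec<i64>) {
        self.tables.insert(name.to_string(), rows);
    }

    fn set_batch_size(&mut self, value: usize) {
        self.batch_size = value;
    }

    // Queries only need &self, so any number can coexist.
    fn query(&self, table: &str) -> Query<'_> {
        Query { session: self, table: table.to_string() }
    }
}

impl Query<'_> {
    // Returns how many batches scanning the table would produce (ceiling division).
    fn execute(&self) -> usize {
        let rows = self.session.tables.get(&self.table).map_or(0, |r| r.len());
        (rows + self.session.batch_size - 1) / self.session.batch_size
    }
}

fn main() {
    let mut session = Session::new();
    session.create_table("t", (0..10).collect());
    session.set_batch_size(4);
    let q1 = session.query("t");
    let q2 = session.query("t"); // shared borrows can coexist
    // session.set_batch_size(2); // would not compile here: q1 and q2 still borrow `session`
    println!("{} {}", q1.execute(), q2.execute()); // prints "3 3"
    session.set_batch_size(2); // fine: q1 and q2 are no longer used
}
```

As the list above notes, callers that share one session across tasks would then need to supply their own synchronization around it, for example an async read-write lock.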
It isn't a fully formed thought, but something that came out of #4607 is the need to be able to pre-parse a SQL statement. Perhaps we could provide some sort of SqlStatement wrapper containing a parsed SQL statement. This would facilitate delegation of specific handling of mutating queries to the downstream system, which is far better placed to determine the desired semantics.
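One possible shape for such a wrapper is sketched below. `SqlStatement`, its variants and methods are all hypothetical. A real version would hold an AST produced by a SQL parser (DataFusion uses sqlparser-rs). Here, a toy keyword check stands in for parsing so the example stays self-contained. The point is that the downstream system can inspect the statement before executing it, and decide itself how to handle mutations.

```rust
// Hypothetical pre-parsed statement; a real version would wrap a parser AST.
#[derive(Debug, PartialEq)]
enum SqlStatement {
    // Read-only: could be planned against a shared &SessionContext.
    Query(String),
    // Changes session state (CREATE, DROP, SET, ...): would need &mut access.
    Mutating(String),
}

impl SqlStatement {
    // Toy classifier on the leading keyword; stands in for real parsing.
    fn parse(sql: &str) -> SqlStatement {
        let trimmed = sql.trim();
        let first = trimmed.split_whitespace().next().unwrap_or("").to_ascii_uppercase();
        match first.as_str() {
            "CREATE" | "DROP" | "SET" => SqlStatement::Mutating(trimmed.to_string()),
            _ => SqlStatement::Query(trimmed.to_string()),
        }
    }

    fn is_mutating(&self) -> bool {
        matches!(self, SqlStatement::Mutating(_))
    }

    fn sql(&self) -> &str {
        match self {
            SqlStatement::Query(s) | SqlStatement::Mutating(s) => s,
        }
    }
}

fn main() {
    for sql in ["SELECT * FROM t", "set my_option = 1", "CREATE TABLE t AS VALUES (1)"] {
        let stmt = SqlStatement::parse(sql);
        // The downstream system chooses the semantics, e.g. waiting for
        // in-flight queries to finish first, or rejecting the statement.
        if stmt.is_mutating() {
            println!("needs exclusive access: {}", stmt.sql());
        } else {
            println!("can run concurrently: {}", stmt.sql());
        }
    }
}
```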
Describe alternatives you've considered
Additional context
#4517 #3887 #4349 track improvements to DataFusion's configuration
#3777 tracks async catalog support, which introduces another dimension to out-of-band state modification