Configuration Mutation Isolation #4617

@tustvold

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Broadly speaking, DataFusion has three kinds of state:

  • SessionContext / SessionState - state used to plan a query
  • ExecutionProps - state used to lower a logical expression to a physical expression
  • TaskContext - state used to execute a query

We then have the following configuration:

  • RuntimeEnv - "global" configuration available at plan and query time
  • SessionConfig - session configuration available at plan and query time

Of these, RuntimeEnv, SessionState and SessionConfig are interior mutable; that is, they can be modified without a mutable reference.

The result is that queries can and do modify the session and runtime configuration during execution. This is important to support things like CREATE TABLE, SET, etc. That in itself is fine; however, because the state is shared and mutable, these modifications also affect in-flight queries. This is at best surprising, and there is a fairly high probability that consistency bugs already exist as a result.
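To make the hazard concrete, here is a minimal sketch using only the standard library. `SharedConfig` and `InFlightQuery` are hypothetical stand-ins, not DataFusion types; the assumption is only that the configuration sits behind a shared lock, so it can be changed through `&self`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for an interior-mutable session configuration.
/// Cloning it copies the `Arc`, so all clones share the same options.
#[derive(Clone, Default)]
struct SharedConfig {
    options: Arc<RwLock<HashMap<String, String>>>,
}

impl SharedConfig {
    /// Takes `&self`: no mutable reference is needed to change a value.
    fn set(&self, key: &str, value: &str) {
        self.options
            .write()
            .unwrap()
            .insert(key.to_string(), value.to_string());
    }

    fn get(&self, key: &str) -> Option<String> {
        self.options.read().unwrap().get(key).cloned()
    }
}

/// Hypothetical in-flight query that holds a handle to the shared config.
struct InFlightQuery {
    config: SharedConfig,
}

impl InFlightQuery {
    fn batch_size(&self) -> Option<String> {
        self.config.get("batch_size")
    }
}

/// Returns the value the query observes before and after an unrelated SET.
fn observe_mutation() -> (Option<String>, Option<String>) {
    let session = SharedConfig::default();
    session.set("batch_size", "8192");

    // The query starts and keeps a handle to the session's config.
    let query = InFlightQuery { config: session.clone() };
    let before = query.batch_size();

    // Another statement changes the setting while the query is running.
    session.set("batch_size", "1024");
    let after = query.batch_size();

    (before, after)
}
```

In this sketch the running query sees two different values for the same option, which is the kind of out-of-band change described above.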

Describe the solution you'd like

I would ideally like to use Rust's borrow checker to handle this for us, as this would not only eliminate a non-trivial amount of locking complexity from the DataFusion codebase, but would also more clearly communicate what state can be altered when.
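A rough sketch of what this could look like, assuming plain owned state instead of locks. `Context`, `Config` and `Plan` are hypothetical names for illustration and do not reflect DataFusion's actual API: state-changing operations take `&mut self`, while planning takes `&self` and hands the plan its own snapshot of the configuration.

```rust
use std::collections::HashMap;

/// Hypothetical plain (non-shared) configuration.
#[derive(Clone, Debug, Default, PartialEq)]
struct Config {
    options: HashMap<String, String>,
}

/// Hypothetical session context that owns its configuration directly.
#[derive(Default)]
struct Context {
    config: Config,
}

/// Hypothetical plan that owns a snapshot of the configuration
/// taken at planning time.
struct Plan {
    config: Config,
}

impl Context {
    /// State changes require `&mut self`, so the compiler rejects them
    /// while any shared borrow of the context is still alive.
    fn set(&mut self, key: &str, value: &str) {
        self.config
            .options
            .insert(key.to_string(), value.to_string());
    }

    /// Planning only needs `&self` and copies what it needs.
    fn plan(&self) -> Plan {
        Plan { config: self.config.clone() }
    }
}

/// Returns (value seen by the plan, value now held by the context).
fn snapshot_isolation() -> (String, String) {
    let mut ctx = Context::default();
    ctx.set("batch_size", "8192");

    // The plan captures the configuration as it was at planning time.
    let plan = ctx.plan();

    // A later change to the context does not reach the existing plan.
    ctx.set("batch_size", "1024");

    (
        plan.config.options["batch_size"].clone(),
        ctx.config.options["batch_size"].clone(),
    )
}
```

Cloning the configuration per plan is only one possible design; the point of the sketch is that the `&self` / `&mut self` split makes it visible in the signatures which operations can alter state.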

This would require separating DDL from DML, with the former requiring mutable access to the SessionContext. I'm inclined to think this is fine for a couple of reasons:

It isn't a fully formed thought, but something that came out of #4607 is the need to be able to pre-parse a SQL statement. Perhaps we could provide some sort of SqlStatement wrapper containing a parsed SQL statement. This would facilitate delegation of specific handling of mutating queries to the downstream system, which is far better placed to determine the desired semantics.
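As a loose illustration of the wrapper idea, the sketch below classifies a statement up front so that a caller can decide how to handle mutating statements. Everything here is hypothetical: the `SqlStatement` and `StatementKind` shapes are invented for this example, and the keyword check is a placeholder for a real SQL parser, not a proposal for how parsing should work.

```rust
/// Hypothetical classification of a pre-parsed statement.
#[derive(Debug, PartialEq)]
enum StatementKind {
    /// Does not change session state.
    Query,
    /// Changes session state (e.g. CREATE TABLE, SET).
    Mutation,
}

/// Hypothetical wrapper around a statement that has been inspected once.
struct SqlStatement {
    sql: String,
    kind: StatementKind,
}

impl SqlStatement {
    /// Placeholder "parse": looks only at the first keyword.
    /// A real implementation would use a proper SQL parser.
    fn parse(sql: &str) -> Self {
        let first = sql
            .split_whitespace()
            .next()
            .unwrap_or("")
            .to_ascii_uppercase();
        let kind = match first.as_str() {
            "CREATE" | "DROP" | "SET" => StatementKind::Mutation,
            _ => StatementKind::Query,
        };
        SqlStatement { sql: sql.to_string(), kind }
    }

    fn kind(&self) -> &StatementKind {
        &self.kind
    }

    fn sql(&self) -> &str {
        &self.sql
    }
}

/// Example of a downstream system choosing its own handling.
fn needs_exclusive_access(stmt: &SqlStatement) -> bool {
    *stmt.kind() == StatementKind::Mutation
}
```

A downstream system could call something like `needs_exclusive_access` before execution and route mutating statements through whatever path matches its own semantics.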

Describe alternatives you've considered

Additional context

#4517, #3887 and #4349 track improvements to DataFusion's configuration.

#3777 tracks async catalog support, which introduces another dimension to out-of-band state modification.
