-
Notifications
You must be signed in to change notification settings - Fork 501
Backend for EXPLAIN (FORMAT TPL) queries. #1589
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some highlighted parts.
// Set any compilation settings based on the original query type. | ||
if (portal->GetStatement()->GetQueryType() == network::QueryType::QUERY_EXPLAIN) { | ||
execution::compiler::CompilerSettings settings; | ||
// if (portal->GetStatement()->RootStatement().CastManagedPointerTo<parser::ExplainStatement>().GetFormat()...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Integration with #1588, check if TPL / TBC should be collected.
@@ -358,6 +357,24 @@ TrafficCopResult TrafficCop::ExecuteExplainStatement( | |||
std::vector<planner::OutputSchema::Column> output_columns; | |||
output_columns.emplace_back("QUERY PLAN", type::TypeId::VARCHAR, nullptr); | |||
|
|||
// TODO(WAN): Integrate with Matt's PR. | |||
#if 0 | |||
// Codegen must happen for certain types of EXPLAIN metadata to be collected, e.g., collection of TPL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Integration with #1588, check the FORMAT to decide whether you're dumping the plan node, or codegenning the physical plan and then dumping the compile time metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love it when a plan comes together.
std::string GetTBC() const { return map_.find("tbc") == map_.end() ? "" : map_.at("tbc"); } | ||
|
||
private: | ||
// TODO(WAN): The value-type may change in the future depending on what tagged dictionaries need. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@turingcompl33t Any thoughts here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the flexibility of the string
-> string
map. In my current implementation for the profiling subsystem, instances of tagging dictionaries are owned by the Statement
for the associated query. This works from a lifetime perspective, but isn't great from a "separation of concerns" perspective - profiling information like the tagging dictionary is obviously far too low-level of a detail to push all the way up to the Statement
.
The refactor I was considering goes something like this:
- Make the profiling subsystem its own "top-level" layer, i.e. add
ExecutionProfilingLayer
toDBMain
. This layer is then responsible for two main things:- Starting and stopping the dedicated task that collects samples at query execution time
- Managing instances of tagging dictionaries that are populated by lower-level components of the subsystem
With this approach, we could then generate a unique identifier for each query that is used during compilation / execution to lookup the appropriate tagging dictionary that needs to be updated. It looks like all queries are already assigned a query_id
, but I'm not sure of the uniqueness constraints on this identifier? Could we just invoke std::to_string()
on this identifier and store it as the value for the "tags"
key in the CompileTimeModuleMetadata
? Just one idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how query ids work right now.
And to check that I understand this
CompileTimeModuleMetadata: {"tags": query id}
ExecutionProfilingLayer: {query id -> tagging dict}
?
Performance Boost!Nice job! This PR has increased the throughput of the system. Could not find any performance results to compare for this commit. |
1 similar comment
Performance Boost!Nice job! This PR has increased the throughput of the system. Could not find any performance results to compare for this commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, overall a big fan of the design. None of my comments should stop the PR from moving forward so I'm marking as an Approve
. I wrote a quick response to the question of the data structure / types for compile-time metadata collection but we can talk more about this at the meeting I imagine.
std::string GetTBC() const { return map_.find("tbc") == map_.end() ? "" : map_.at("tbc"); } | ||
|
||
private: | ||
// TODO(WAN): The value-type may change in the future depending on what tagged dictionaries need. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the flexibility of the string
-> string
map. In my current implementation for the profiling subsystem, instances of tagging dictionaries are owned by the Statement
for the associated query. This works from a lifetime perspective, but isn't great from a "separation of concerns" perspective - profiling information like the tagging dictionary is obviously far too low-level of a detail to push all the way up to the Statement
.
The refactor I was considering goes something like this:
- Make the profiling subsystem its own "top-level" layer, i.e. add
ExecutionProfilingLayer
toDBMain
. This layer is then responsible for two main things:- Starting and stopping the dedicated task that collects samples at query execution time
- Managing instances of tagging dictionaries that are populated by lower-level components of the subsystem
With this approach, we could then generate a unique identifier for each query that is used during compilation / execution to lookup the appropriate tagging dictionary that needs to be updated. It looks like all queries are already assigned a query_id
, but I'm not sure of the uniqueness constraints on this identifier? Could we just invoke std::to_string()
on this identifier and store it as the value for the "tags"
key in the CompileTimeModuleMetadata
? Just one idea.
@@ -358,6 +357,24 @@ TrafficCopResult TrafficCop::ExecuteExplainStatement( | |||
std::vector<planner::OutputSchema::Column> output_columns; | |||
output_columns.emplace_back("QUERY PLAN", type::TypeId::VARCHAR, nullptr); | |||
|
|||
// TODO(WAN): Integrate with Matt's PR. | |||
#if 0 | |||
// Codegen must happen for certain types of EXPLAIN metadata to be collected, e.g., collection of TPL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love it when a plan comes together.
Performance Boost!Nice job! This PR has increased the throughput of the system. Could not find any performance results to compare for this commit. |
But nooo. C++. Legacy designs. grumble.
Performance Boost!Nice job! This PR has increased the throughput of the system. Could not find any performance results to compare for this commit. |
2 similar comments
Performance Boost!Nice job! This PR has increased the throughput of the system. Could not find any performance results to compare for this commit. |
Performance Boost!Nice job! This PR has increased the throughput of the system. Could not find any performance results to compare for this commit. |
Minor Decrease in PerformanceBe warned: this PR may have decreased the throughput of the system slightly.
|
Codecov Report
@@ Coverage Diff @@
## master #1589 +/- ##
==========================================
+ Coverage 81.12% 81.21% +0.08%
==========================================
Files 745 747 +2
Lines 52414 52453 +39
==========================================
+ Hits 42523 42599 +76
+ Misses 9891 9854 -37
Continue to review full report at Codecov.
|
Heading
First pass at supplying backend for EXPLAIN (FORMAT TPL) queries.
Description
#1588 provides the frontend for EXPLAIN (FORMAT ...) queries.
This PR attempts to provide the backend to that, structured as follows:
ExecutionSettings
is currently described as "ExecutionSettings stores settings that are passed down from the upper layers.", so we are passing down a newCompilerSettings
that determines what information should be gathered at compile-time. This can be set from the tcop upon detecting that this is an EXPLAIN statement of a relevant type. See comment below.CompilerSettings
just describes whether to collect TPL and/or TBC, and is threaded down intocompiler.cpp
.CompileTimeModuleMetadata
which is currently just aunordered_map<string, string>
. Two main considerations went into this decision:compiler.cpp
, so it might as well be serialized to string there instead of having its lifetime extended.