Backend for EXPLAIN (FORMAT TPL) queries. #1589

lmwnshn · 2021-05-23T02:43:14Z

Heading

First pass at supplying backend for EXPLAIN (FORMAT TPL) queries.

Description

#1588 provides the frontend for EXPLAIN (FORMAT ...) queries.

This PR attempts to provide the backend to that, structured as follows:

ExecutionSettings is currently described as "ExecutionSettings stores settings that are passed down from the upper layers.", so we are passing down a new CompilerSettings that determines what information should be gathered at compile-time. This can be set from the tcop upon detecting that this is an EXPLAIN statement of a relevant type. See comment below.
CompilerSettings just describes whether to collect TPL and/or TBC, and is threaded down into compiler.cpp.
Information is maintained in CompileTimeModuleMetadata which is currently just a unordered_map<string, string>. Two main considerations went into this decision:
- You will either need to extend the lifetime of the objects you want metadata for, or grab the relevant information before the objects get destroyed. In this case, the AST for example is destroyed after compilation is done.
- I cannot foresee any usages of manipulating the AST after compiler.cpp, so it might as well be serialized to string there instead of having its lifetime extended.

lmwnshn

Some highlighted parts.

lmwnshn · 2021-05-23T02:44:18Z

src/traffic_cop/traffic_cop.cpp

+  // Set any compilation settings based on the original query type.
+  if (portal->GetStatement()->GetQueryType() == network::QueryType::QUERY_EXPLAIN) {
+    execution::compiler::CompilerSettings settings;
+    // if (portal->GetStatement()->RootStatement().CastManagedPointerTo<parser::ExplainStatement>().GetFormat()...)


Integration with #1588, check if TPL / TBC should be collected.

lmwnshn · 2021-05-23T02:45:13Z

src/traffic_cop/traffic_cop.cpp

@@ -358,6 +357,24 @@ TrafficCopResult TrafficCop::ExecuteExplainStatement(
  std::vector<planner::OutputSchema::Column> output_columns;
  output_columns.emplace_back("QUERY PLAN", type::TypeId::VARCHAR, nullptr);

+  // TODO(WAN): Integrate with Matt's PR.
+#if 0
+    // Codegen must happen for certain types of EXPLAIN metadata to be collected, e.g., collection of TPL.


Integration with #1588, check the FORMAT to decide whether you're dumping the plan node, or codegenning the physical plan and then dumping the compile time metadata.

I love it when a plan comes together.

lmwnshn · 2021-05-23T02:45:41Z

src/include/execution/vm/module_metadata.h

+  std::string GetTBC() const { return map_.find("tbc") == map_.end() ? "" : map_.at("tbc"); }
+
+ private:
+  // TODO(WAN): The value-type may change in the future depending on what tagged dictionaries need.


@turingcompl33t Any thoughts here?

I like the flexibility of the string -> string map. In my current implementation for the profiling subsystem, instances of tagging dictionaries are owned by the Statement for the associated query. This works from a lifetime perspective, but isn't great from a "separation of concerns" perspective - profiling information like the tagging dictionary is obviously far too low-level of a detail to push all the way up to the Statement.

The refactor I was considering goes something like this:

Make the profiling subsystem its own "top-level" layer, i.e. add ExecutionProfilingLayer to DBMain. This layer is then responsible for two main things:

Starting and stopping the dedicated task that collects samples at query execution time

Managing instances of tagging dictionaries that are populated by lower-level components of the subsystem

With this approach, we could then generate a unique identifier for each query that is used during compilation / execution to lookup the appropriate tagging dictionary that needs to be updated. It looks like all queries are already assigned a query_id, but I'm not sure of the uniqueness constraints on this identifier? Could we just invoke std::to_string() on this identifier and store it as the value for the "tags" key in the CompileTimeModuleMetadata? Just one idea.

I don't know how query ids work right now.

And to check that I understand this
CompileTimeModuleMetadata: {"tags": query id}
ExecutionProfilingLayer: {query id -> tagging dict}
?

noisepage-checks · 2021-05-25T00:00:46Z

Performance Boost!

Nice job! This PR has increased the throughput of the system.

Could not find any performance results to compare for this commit.

noisepage-checks · 2021-05-25T01:19:30Z

Performance Boost!

Nice job! This PR has increased the throughput of the system.

Could not find any performance results to compare for this commit.

turingcompl33t

Looks great, overall a big fan of the design. None of my comments should stop the PR from moving forward so I'm marking as an Approve. I wrote a quick response to the question of the data structure / types for compile-time metadata collection but we can talk more about this at the meeting I imagine.

src/execution/compiler/compilation_context.cpp

src/execution/compiler/compiler.cpp

src/execution/compiler/executable_query.cpp

src/include/execution/compiler/compilation_context.h

src/include/execution/compiler/compiler_settings.h

turingcompl33t · 2021-05-25T13:49:28Z

src/include/execution/vm/module_metadata.h

+  std::string GetTBC() const { return map_.find("tbc") == map_.end() ? "" : map_.at("tbc"); }
+
+ private:
+  // TODO(WAN): The value-type may change in the future depending on what tagged dictionaries need.


I like the flexibility of the string -> string map. In my current implementation for the profiling subsystem, instances of tagging dictionaries are owned by the Statement for the associated query. This works from a lifetime perspective, but isn't great from a "separation of concerns" perspective - profiling information like the tagging dictionary is obviously far too low-level of a detail to push all the way up to the Statement.

The refactor I was considering goes something like this:

Make the profiling subsystem its own "top-level" layer, i.e. add ExecutionProfilingLayer to DBMain. This layer is then responsible for two main things:

Starting and stopping the dedicated task that collects samples at query execution time

Managing instances of tagging dictionaries that are populated by lower-level components of the subsystem

With this approach, we could then generate a unique identifier for each query that is used during compilation / execution to lookup the appropriate tagging dictionary that needs to be updated. It looks like all queries are already assigned a query_id, but I'm not sure of the uniqueness constraints on this identifier? Could we just invoke std::to_string() on this identifier and store it as the value for the "tags" key in the CompileTimeModuleMetadata? Just one idea.

turingcompl33t · 2021-05-25T13:56:15Z

src/traffic_cop/traffic_cop.cpp

@@ -358,6 +357,24 @@ TrafficCopResult TrafficCop::ExecuteExplainStatement(
  std::vector<planner::OutputSchema::Column> output_columns;
  output_columns.emplace_back("QUERY PLAN", type::TypeId::VARCHAR, nullptr);

+  // TODO(WAN): Integrate with Matt's PR.
+#if 0
+    // Codegen must happen for certain types of EXPLAIN metadata to be collected, e.g., collection of TPL.


I love it when a plan comes together.

noisepage-checks · 2021-05-25T18:32:38Z

Performance Boost!

Nice job! This PR has increased the throughput of the system.

Could not find any performance results to compare for this commit.

But nooo. C++. Legacy designs. grumble.

noisepage-checks · 2021-05-25T21:41:16Z

Performance Boost!

Nice job! This PR has increased the throughput of the system.

Could not find any performance results to compare for this commit.

noisepage-checks · 2021-05-25T22:01:58Z

Performance Boost!

Nice job! This PR has increased the throughput of the system.

Could not find any performance results to compare for this commit.

noisepage-checks · 2021-05-25T23:09:40Z

Performance Boost!

Nice job! This PR has increased the throughput of the system.

Could not find any performance results to compare for this commit.

noisepage-checks · 2021-05-26T04:29:25Z

Minor Decrease in Performance

Be warned: this PR may have decreased the throughput of the system slightly.

tps (%change)	benchmark_type	wal_device	details
0.63%	tpcc	RAM disk	Details master tps=22426.47, commit tps=22568.82, query_mode=extended, benchmark_type=tpcc, scale_factor=32.0000, terminals=32, client_time=60, weights={'Payment': 43, 'Delivery': 4, 'NewOrder': 45, 'StockLevel': 4, 'OrderStatus': 4}, wal_device=RAM disk, max_connection_threads=32
-3.61%	tpcc	None	Details master tps=29426.38, commit tps=28363.0, query_mode=extended, benchmark_type=tpcc, scale_factor=32.0000, terminals=32, client_time=60, weights={'Payment': 43, 'Delivery': 4, 'NewOrder': 45, 'StockLevel': 4, 'OrderStatus': 4}, wal_device=None, max_connection_threads=32
-0.79%	tpcc	HDD	Details master tps=21850.1, commit tps=21677.48, query_mode=extended, benchmark_type=tpcc, scale_factor=32.0000, terminals=32, client_time=60, weights={'Payment': 43, 'Delivery': 4, 'NewOrder': 45, 'StockLevel': 4, 'OrderStatus': 4}, wal_device=HDD, max_connection_threads=32
4.85%	tatp	RAM disk	Details master tps=6537.49, commit tps=6854.35, query_mode=extended, benchmark_type=tatp, scale_factor=1.0000, terminals=16, client_time=60, weights={'GetAccessData': 35, 'UpdateLocation': 14, 'GetNewDestination': 10, 'GetSubscriberData': 35, 'DeleteCallForwarding': 2, 'InsertCallForwarding': 2, 'UpdateSubscriberData': 2}, wal_device=RAM disk, max_connection_threads=32
0.15%	tatp	None	Details master tps=7393.76, commit tps=7404.58, query_mode=extended, benchmark_type=tatp, scale_factor=1.0000, terminals=16, client_time=60, weights={'GetAccessData': 35, 'UpdateLocation': 14, 'GetNewDestination': 10, 'GetSubscriberData': 35, 'DeleteCallForwarding': 2, 'InsertCallForwarding': 2, 'UpdateSubscriberData': 2}, wal_device=None, max_connection_threads=32
6.83%	tatp	HDD	Details master tps=6406.76, commit tps=6844.42, query_mode=extended, benchmark_type=tatp, scale_factor=1.0000, terminals=16, client_time=60, weights={'GetAccessData': 35, 'UpdateLocation': 14, 'GetNewDestination': 10, 'GetSubscriberData': 35, 'DeleteCallForwarding': 2, 'InsertCallForwarding': 2, 'UpdateSubscriberData': 2}, wal_device=HDD, max_connection_threads=32

codecov · 2021-05-26T04:29:27Z

Codecov Report

Merging #1589 (57ed045) into master (f2fd647) will increase coverage by 0.08%.
The diff coverage is 62.50%.

@@            Coverage Diff             @@
##           master    #1589      +/-   ##
==========================================
+ Coverage   81.12%   81.21%   +0.08%     
==========================================
  Files         745      747       +2     
  Lines       52414    52453      +39     
==========================================
+ Hits        42523    42599      +76     
+ Misses       9891     9854      -37

Impacted Files	Coverage Δ
src/execution/compiler/executable_query.cpp	`50.70% <0.00%> (-0.73%)`	⬇️
...c/include/execution/compiler/compilation_context.h	`100.00% <ø> (ø)`
src/include/execution/compiler/executable_query.h	`88.88% <ø> (ø)`
...lude/execution/compiler/executable_query_builder.h	`100.00% <ø> (ø)`
src/include/execution/vm/module.h	`93.33% <0.00%> (-2.13%)`	⬇️
src/include/traffic_cop/traffic_cop.h	`100.00% <ø> (ø)`
src/include/execution/exec/execution_settings.h	`83.33% <33.33%> (-16.67%)`	⬇️
src/traffic_cop/traffic_cop.cpp	`73.68% <45.45%> (-1.96%)`	⬇️
src/include/execution/compiler/compiler_settings.h	`50.00% <50.00%> (ø)`
src/include/execution/vm/module_metadata.h	`60.00% <60.00%> (ø)`
... and 17 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f2fd647...57ed045. Read the comment docs.

First pass at supplying backend for EXPLAIN (FORMAT TPL) queries.

27fd2c2

lmwnshn added in-progress This PR is being actively worked on and not ready to be reviewed or merged. Mark PRs with this. ready-for-ci Indicate that this build should be run through CI. labels May 23, 2021

lmwnshn self-assigned this May 23, 2021

lmwnshn requested review from mbutrovich and turingcompl33t May 23, 2021 02:46

lmwnshn commented May 23, 2021

View reviewed changes

lmwnshn added 3 commits May 22, 2021 22:53

Fix lint.

de03e31

Fix tidy.

dccca59

Fix tests. Store CompilerSettings in Compiler::Input.

7e1f6de

turingcompl33t approved these changes May 25, 2021

View reviewed changes

lmwnshn added 3 commits May 25, 2021 14:34

Add nodiscard.

4176aa6

Revert GeneratePlan API change.

2eb0e37

You would think that nodiscard should be the default.

4f272df

But nooo. C++. Legacy designs. grumble.

lmwnshn added ready-to-merge This PR is ready to be merged. Mark PRs with this. and removed in-progress This PR is being actively worked on and not ready to be reviewed or merged. Mark PRs with this. labels May 26, 2021

Merge branch 'master' into explain_tpl_backend

57ed045

lmwnshn merged commit dbc4383 into cmu-db:master May 26, 2021

lmwnshn deleted the explain_tpl_backend branch May 26, 2021 17:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Backend for EXPLAIN (FORMAT TPL) queries. #1589

Backend for EXPLAIN (FORMAT TPL) queries. #1589

lmwnshn commented May 23, 2021 •

edited

Loading

lmwnshn left a comment

lmwnshn May 23, 2021

lmwnshn May 23, 2021

turingcompl33t May 25, 2021

lmwnshn May 23, 2021

turingcompl33t May 25, 2021

lmwnshn May 25, 2021

noisepage-checks bot commented May 25, 2021

noisepage-checks bot commented May 25, 2021

turingcompl33t left a comment

turingcompl33t May 25, 2021

turingcompl33t May 25, 2021

noisepage-checks bot commented May 25, 2021

noisepage-checks bot commented May 25, 2021

noisepage-checks bot commented May 25, 2021

noisepage-checks bot commented May 25, 2021

noisepage-checks bot commented May 26, 2021

codecov bot commented May 26, 2021

Backend for EXPLAIN (FORMAT TPL) queries. #1589

Backend for EXPLAIN (FORMAT TPL) queries. #1589

Conversation

lmwnshn commented May 23, 2021 • edited Loading

Heading

Description

lmwnshn left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

noisepage-checks bot commented May 25, 2021

Performance Boost!

noisepage-checks bot commented May 25, 2021

Performance Boost!

turingcompl33t left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

noisepage-checks bot commented May 25, 2021

Performance Boost!

noisepage-checks bot commented May 25, 2021

Performance Boost!

noisepage-checks bot commented May 25, 2021

Performance Boost!

noisepage-checks bot commented May 25, 2021

Performance Boost!

noisepage-checks bot commented May 26, 2021

Minor Decrease in Performance

codecov bot commented May 26, 2021

Codecov Report

lmwnshn commented May 23, 2021 •

edited

Loading