Skip to content

Conversation

@psvri
Copy link
Contributor

@psvri psvri commented Aug 27, 2022

Which issue does this PR close?

Closes #2848.

Rationale for this change

Show create table works for external tables

What changes are included in this PR?

Show create table now works with external tables

Are there any user-facing changes?

Yes

@github-actions github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions sql SQL Planner labels Aug 27, 2022
@codecov-commenter
Copy link

codecov-commenter commented Aug 27, 2022

Codecov Report

Merging #3279 (5cc896d) into master (92110dd) will decrease coverage by 0.42%.
The diff coverage is 80.72%.

@@            Coverage Diff             @@
##           master    #3279      +/-   ##
==========================================
- Coverage   85.89%   85.47%   -0.43%     
==========================================
  Files         294      294              
  Lines       53373    54099     +726     
==========================================
+ Hits        45845    46241     +396     
- Misses       7528     7858     +330     
Impacted Files Coverage Δ
datafusion/expr/src/logical_plan/plan.rs 77.37% <ø> (-1.36%) ⬇️
datafusion/proto/src/logical_plan.rs 17.12% <0.00%> (-0.24%) ⬇️
datafusion/core/src/execution/context.rs 78.87% <71.42%> (+0.63%) ⬆️
...fusion/optimizer/src/pre_cast_lit_in_comparison.rs 94.00% <85.71%> (+10.67%) ⬆️
datafusion/core/src/datasource/listing/table.rs 90.21% <100.00%> (+1.08%) ⬆️
datafusion/core/tests/sql/information_schema.rs 98.60% <100.00%> (+0.05%) ⬆️
datafusion/sql/src/parser.rs 84.38% <100.00%> (+0.41%) ⬆️
datafusion/sql/src/planner.rs 80.36% <100.00%> (-0.30%) ⬇️
datafusion/expr/src/expr_visitor.rs 53.68% <0.00%> (-10.07%) ⬇️
datafusion/expr/src/expr.rs 75.83% <0.00%> (-9.57%) ⬇️
... and 46 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@liukun4515
Copy link
Contributor

Thanks @psvri
Is there any SQL syntax for this function?
It's better to add doc for this feature and it's friendly for reviewers

@psvri
Copy link
Contributor Author

psvri commented Aug 27, 2022

This dosent add new syntax.
Previous create show table <external_table> was not showing the table definition.
This PR is intended to fix that.

Copy link
Contributor

@avantgardnerio avantgardnerio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for the syntax being show create external table vs just show external table? Show Create has two verbs in one sentence and seems like a confusing choice.

Also, I'm wary of passing duplicated information (the raw SQL + the result of that raw SQL) - it seems better to store the information once (in the registered table), and regenerate the SQL from that in a normalized fashion.

@avantgardnerio
Copy link
Contributor

Is there a reason for the syntax being

Oh, interesting, I see this follows the spark precedent.

@liukun4515
Copy link
Contributor

This dosent add new syntax. Previous create show table <external_table> was not showing the table definition. This PR is intended to fix that.

Can you add some SQL level test for this fix? @psvri

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in reviewing @psvri -- I have been away. We are working on increasing the review capacity for this project as well as those who can commit PRs

All in all, I really like the feature -- I have one style suggestion

"+---------------+--------------+------------+-------------------------------------------------------------------------------------------------+",
"| table_catalog | table_schema | table_name | definition |",
"+---------------+--------------+------------+-------------------------------------------------------------------------------------------------+",
"| datafusion | public | abc | CREATE EXTERNAL TABLE abc STORED AS CSV LOCATION ../../testing/data/csv/aggregate_test_100.csv |",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️ this is great

pub fn try_new(config: ListingTableConfig) -> Result<Self> {
pub fn try_new(
config: ListingTableConfig,
definition: Option<String>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only thing I would prefer to change in this PR. Specifically, rather than change the API of try_new I think we could add a function like

pub fn with_definition(mut self, defintion: Option<String>) -> Self {
...
}

This change I think would reduce the size of this PR significantly

Since I realize this PR has been waiting for a week for feedback I will propose this change via PR

@alamb
Copy link
Contributor

alamb commented Sep 3, 2022

Here is my proposed change (targeting this PR) : psvri#1

Use builder-style API to set ListingTable definition
@psvri
Copy link
Contributor Author

psvri commented Sep 3, 2022

NP @alamb

I too was on a break for a while. Thanks for PR. I will merge it.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @psvri

@alamb alamb requested a review from avantgardnerio September 4, 2022 11:46
@alamb
Copy link
Contributor

alamb commented Sep 4, 2022

@avantgardnerio you had an outstanding comment on this PR, I believe -- can you please review again and see if this meets your expectations now?

@avantgardnerio
Copy link
Contributor

review again and see if this meets your expectations now?

@alamb and @psvri , my concern was regarding the implementation based upon retaining the original raw SQL string. The issue this is resolving had a suggested implementation:

This will probably require implementing Display for DFStatement to convert it back into SQL, or generating SQL in register_listing_table.

And that was more in line with my thoughts:

  • we parse it into fields, and store those fields
  • we also store the original parse string

This seems strange to me, in that it will carry over irrelevant syntax differences, and also stores redundant information. When I ask postgres to show a table, I don't get my raw SQL back, I get back generated SQL that matches the definition postgres has an internal representation for.

It bothers me, but I'm not sure if I'm being pedantic I'll leave it to you @alamb .

@alamb
Copy link
Contributor

alamb commented Sep 5, 2022

This seems strange to me, in that it will carry over irrelevant syntax differences, and also stores redundant information. When I ask postgres to show a table, I don't get my raw SQL back, I get back generated SQL that matches the definition postgres has an internal representation for.

I agree that most other systems store some sort of "canonicalized" view definition. The sqlparser-rs code offers that the output of Display can be parsed back into the same structure, so I think we could simply use that property (or potentially store the view as a serialized LogicalPlan).

It bothers me, but I'm not sure if I'm being pedantic I'll leave it to you @alamb .

I think the question of "how do we store a view definition" is somewhat orthogonal to the feature in this PR to show that view definition (though they are of course related!) and I think we can handle it in a follow on

Thus, I plan to file an issue where we can discus the "what to use / how to store a view definition" issue and then merge this PR.

This may also be related to the discussion we are having about view implementation on #3249 (comment)

@alamb
Copy link
Contributor

alamb commented Sep 5, 2022

🤔 so I actually tried out 'show create table' with a view and it already does canonicalize the SQL, it seems:

DataFusion CLI v11.0.0
❯ create table t as select * from (values (1), (2), (3)) as sq;
+---------+
| column1 |
+---------+
| 1       |
| 2       |
| 3       |
+---------+
3 rows in set. Query took 0.050 seconds.
❯ create view v as select column1 /*my random comment*/ from t where column1 != 1;
0 rows in set. Query took 0.001 seconds.
❯ show create table v;
+---------------+--------------+------------+-----------------------------------------------------------+
| table_catalog | table_schema | table_name | definition                                                |
+---------------+--------------+------------+-----------------------------------------------------------+
| datafusion    | public       | v          | CREATE VIEW v AS SELECT column1 FROM t WHERE column1 <> 1 |
+---------------+--------------+------------+-----------------------------------------------------------+
1 row in set. Query took 0.015 seconds.

(note the lack of /*my random comment*/ and the transformation of != to <>)

I believe this means that DataFusion is storing the canonicalized SQL string from sqlparser

@alamb
Copy link
Contributor

alamb commented Sep 5, 2022

Ok, so now I am really confused. When I tried to run show create table locally for an external table in the datafusion-cli that didn't work 🤔

❯ CREATE EXTERNAL TABLE abc STORED AS /*random comment*/ CSV LOCATION '../testing/data/csv/aggregate_test_100.csv';
0 rows in set. Query took 0.063 seconds.
❯ show create table abc;
+---------------+--------------+------------+------------+
| table_catalog | table_schema | table_name | definition |
+---------------+--------------+------------+------------+
| datafusion    | public       | abc        |            |
+---------------+--------------+------------+------------+

&self,
statement: CreateExternalTable,
) -> Result<LogicalPlan> {
let definition = Some(statement.to_string());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This to_string() call converts the parsed statement into a string.

Copy link
Contributor

@avantgardnerio avantgardnerio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@andygrove
Copy link
Member

Thanks @psvri. Could you rebase or upmerge to fix the conflicts?

@andygrove andygrove merged commit 0084aeb into apache:master Sep 7, 2022
@ursabot
Copy link

ursabot commented Sep 7, 2022

Benchmark runs are scheduled for baseline = 7c04964 and contender = 0084aeb. 0084aeb is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate logical-expr Logical plan and expressions sql SQL Planner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement "SHOW CREATE TABLE" for external tables

7 participants