Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a dialect-specific rule for unparsing an identifier with or without quotes #10573

Merged
merged 16 commits into from
May 22, 2024

Conversation

goldmedal
Copy link
Contributor

Which issue does this PR close?

Closes #10557

Rationale for this change

What changes are included in this PR?

Only implement the default dialect in this PR. We need other follow-up PR for other dialects.

Are these changes tested?

Yes

Are there any user-facing changes?

No

@@ -47,6 +47,7 @@ arrow-schema = { workspace = true }
datafusion-common = { workspace = true, default-features = true }
datafusion-expr = { workspace = true }
log = { workspace = true }
regex = { version = "1.8" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to move regex to top level, as it is used in much of packages. It can be done as followup

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved it in 32aa0e9.

@@ -504,6 +508,14 @@ impl Unparser<'_> {
.collect::<Result<Vec<_>>>()
}

pub(super) fn new_ident_quoted_if_needs(&self, ident: String) -> ast::Ident {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a method comments for a pub method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some comments in 7a534fb.

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @goldmedal
I'm thinking how this will work with whitespaces columns like

select 1 as "a a";

@goldmedal goldmedal force-pushed the feature/10557-dialect-need-qutoed branch from 4acde31 to 44e9baa Compare May 21, 2024 11:49
@github-actions github-actions bot added physical-expr Physical Expressions core Core DataFusion crate labels May 21, 2024
@goldmedal
Copy link
Contributor Author

Thanks @goldmedal I'm thinking how this will work with whitespaces columns like

select 1 as "a a";

Thanks @comphead :)
I'm not sure what you mean but I think it also works like other illegal char for SQL identifiers. I add a test case for it in 44e9baa.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @goldmedal -- I think this looks really nice

Thank you for the reviews @comphead

I left some suggestions for improvement but I think they could be done as follow on PRs as well.

cc @phillipleblanc and @devinjdangelo and @backkem

@@ -52,7 +52,7 @@ fn simple_expr_to_sql_demo() -> Result<()> {
let expr = col("a").lt(lit(5)).or(col("a").eq(lit(8)));
let ast = expr_to_sql(&expr)?;
let sql = format!("{}", ast);
assert_eq!(sql, r#"(("a" < 5) OR ("a" = 8))"#);
assert_eq!(sql, r#"((a < 5) OR (a = 8))"#);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this change, perhaps we can remove the next example in the file simple_expr_to_sql_demo_no_escape as I don't think it serves any purpose

@@ -145,7 +145,7 @@ postgres-protocol = "0.6.4"
postgres-types = { version = "0.2.4", features = ["derive", "with-chrono-0_4"] }
rand = { workspace = true, features = ["small_rng"] }
rand_distr = "0.4.3"
regex = "1.5.4"
regex = { workspace = true }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is certainly nice to use the same version of regex everywhere 👍

@@ -15,19 +15,30 @@
// specific language governing permissions and limitations
// under the License.

use regex::Regex;
use sqlparser::keywords::ALL_KEYWORDS;

/// Dialect is used to capture dialect specific syntax.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Dialect is used to capture dialect specific syntax.
/// `Dialect` to usse for Unparsing
///
/// The default dialect tries to avoid quoting identifiers unless necessary (e.g. `a` instead of `"a"`)
/// but this behavior can be overridden as needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Look nice.

@goldmedal
Copy link
Contributor Author

Thank you @goldmedal -- I think this looks really nice

Thank you for the reviews @comphead

I left some suggestions for improvement but I think they could be done as follow on PRs as well.

cc @phillipleblanc and @devinjdangelo and @backkem

Thanks @alamb !
I think the suggestions is very simple and reasonable. So, I just fixed them in this PR quickly.

Copy link
Contributor

@phillipleblanc phillipleblanc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Thanks @goldmedal 🥇

use regex::Regex;
use sqlparser::keywords::ALL_KEYWORDS;

/// `Dialect` to usse for Unparsing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// `Dialect` to usse for Unparsing
/// `Dialect` to use for Unparsing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// Note: this trait will eventually be replaced by the Dialect in the SQLparser package
///
/// See <https://github.com/sqlparser-rs/sqlparser-rs/pull/1170>
pub trait Dialect {
fn identifier_quote_style(&self) -> Option<char>;
fn identifier_needs_quote(&self, _: &str) -> bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above note said this trait will eventually be replaced by the Dialect in the SQLparser package. Seems this pr make this harder. Should we extend sqlparser Dialect using something like DialectExt trait?

Copy link
Contributor

@backkem backkem May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to note that this functionality could also be covered within the existing SQLparser::Dialect::identifier_quote_style. It's signature looks as follows:

identifier_quote_style(&self, _identifier: &str) -> Option<char>

It is passed the identifier and can optionally return a quote character if needed. This way the trait doesn't need extending at all. See also apache/datafusion-sqlparser-rs#1170.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@goldmedal let me know what you want to do here -- I can merge this PR and we can update this per @backkem 's suggestion in a follow on PR, or would you like to update this PR?

Copy link
Contributor Author

@goldmedal goldmedal May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@goldmedal let me know what you want to do here -- I can merge this PR and we can update this per
@backkem 's suggestion in a follow on PR, or would you like to update this PR?

Thanks @lewiszlw @backkem @alamb
I think I have time to fix it now. I can fix it in this PR.

Copy link
Contributor

@backkem backkem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one small nit.

}
pub struct DefaultDialect {}

impl Dialect for DefaultDialect {
fn identifier_quote_style(&self) -> Option<char> {
Some('"')
fn identifier_quote_style(&self, _identifier: &str) -> Option<char> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn identifier_quote_style(&self, _identifier: &str) -> Option<char> {
fn identifier_quote_style(&self, identifier: &str) -> Option<char> {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@backkem I want to check if I should also change the signature (L29) in the Dialect trait. I'm not familiar with naming conventions in Rust. I guess _identifier means this parameter is an identifier, but we ignore it in this method, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, prefixing the identifier with an _ is a convention for silencing a linter warning that the variable is unused. Since it is being used now, the _ prefix is no longer needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks :)

@goldmedal goldmedal force-pushed the feature/10557-dialect-need-qutoed branch from 654c836 to 8ed1525 Compare May 22, 2024 15:38
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you again @goldmedal and @backkem and @phillipleblanc and @lewiszlw and @comphead -- I think this PR looks really nice now and this makes unparsing much nicer looking for humans 🏆

@alamb alamb merged commit 7bd4b53 into apache:main May 22, 2024
25 checks passed
@goldmedal
Copy link
Contributor Author

Thanks again @alamb @backkem @phillipleblanc @lewiszlw @comphead :)

@goldmedal goldmedal deleted the feature/10557-dialect-need-qutoed branch May 23, 2024 00:13
@Omega359
Copy link
Contributor

This shouldn't have passed checks.

+ cargo fmt --all -- --check
`cargo metadata` exited with an error: error: failed to load manifest for workspace member `/opt/dev/datafusion/datafusion/core`
referenced by workspace at `/opt/dev/datafusion/Cargo.toml`

Caused by:
  failed to load manifest for dependency `datafusion-functions`

Caused by:
  failed to parse manifest at `/opt/dev/datafusion/datafusion/functions/Cargo.toml`

Caused by:
  dependency (regex) specified without providing a local path, Git repository, version, or workspace dependency to use

functions/Cargo.toml

regex = { worksapce = true, optional = true }

@alamb
Copy link
Contributor

alamb commented May 25, 2024

This shouldn't have passed checks.

+ cargo fmt --all -- --check
`cargo metadata` exited with an error: error: failed to load manifest for workspace member `/opt/dev/datafusion/datafusion/core`
referenced by workspace at `/opt/dev/datafusion/Cargo.toml`

Caused by:
  failed to load manifest for dependency `datafusion-functions`

Caused by:
  failed to parse manifest at `/opt/dev/datafusion/datafusion/functions/Cargo.toml`

Caused by:
  dependency (regex) specified without providing a local path, Git repository, version, or workspace dependency to use

functions/Cargo.toml

regex = { worksapce = true, optional = true }

Yeah, I don't know why that is a warning and not an error -- here is a PR to fix it: #10662

findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
… without quotes (apache#10573)

* add ident needs quote check

* implement the check for default dialect and fix tests

* add test for need-quoted cases

* update cargo lock

* fomrat cargo toml

* fix the example test

* move regex to top level

* add comments for new_ident_quoted_if_needs func

* fix typo and add test for space

* fix example test

* fix example test

* fix the test fail

* remove unused example and modified comments

* fix typo

* follow the latest Dialect trait in sqlparser

* fix the parameter name
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate physical-expr Physical Expressions sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make SQL strings generated from Exprs "prettier"
7 participants