Skip to content

[SPARK-33468][SQL] ParseUrl in ANSI mode should fail if input string is not a valid url #30399

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

ulysses-you
Copy link
Contributor

What changes were proposed in this pull request?

With ParseUrl, instead of return null we throw exception if input string is not a vaild url.

Why are the changes needed?

For ANSI mode.

Does this PR introduce any user-facing change?

Yes, user will get exception if set spark.sql.ansi.enabled=true.

How was this patch tested?

Add test.

@github-actions github-actions bot added the SQL label Nov 17, 2020
case e: URISyntaxException => null
// We fail on error if in ansi mode.
case e: URISyntaxException if SQLConf.get.ansiEnabled =>
throw new IllegalArgumentException(s"Find an invaild url string ${url.toString}", e)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, the raw exception message is something like Illegal character in query at index 33: https://a.b.c/index.php?params1=a|b&params2=x.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-33468][SQL] ParseUrl should fail if input string is not a valid url [SPARK-33468][SQL] ParseUrl in ANSI mode should fail if input string is not a valid url Nov 17, 2020
@@ -1404,7 +1404,10 @@ case class ParseUrl(children: Seq[Expression])
try {
new URI(url.toString)
} catch {
case e: URISyntaxException => null
// We fail on error if in ansi mode.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this comment.

@@ -943,6 +943,21 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
GenerateUnsafeProjection.generate(ParseUrl(Seq(Literal("\"quote"), Literal("\"quote"))) :: Nil)
}

test("SPARK-33468: ParseUrl should fail if input string is not a valid url") {
// fail on error if in ansi mode.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this.

@@ -943,6 +943,21 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
GenerateUnsafeProjection.generate(ParseUrl(Seq(Literal("\"quote"), Literal("\"quote"))) :: Nil)
}

test("SPARK-33468: ParseUrl should fail if input string is not a valid url") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ParseUrl -> ParseUrl in ANSI mode?

@SparkQA
Copy link

SparkQA commented Nov 17, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35829/

@ulysses-you
Copy link
Contributor Author

@dongjoon-hyun thanks for the review. Updated the comment.

@SparkQA
Copy link

SparkQA commented Nov 17, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35829/

@SparkQA
Copy link

SparkQA commented Nov 17, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35830/

@SparkQA
Copy link

SparkQA commented Nov 17, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35830/

@SparkQA
Copy link

SparkQA commented Nov 17, 2020

Test build #131226 has finished for PR 30399 at commit 62bf87f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

cc @cloud-fan and @maropu

@SparkQA
Copy link

SparkQA commented Nov 17, 2020

Test build #131227 has finished for PR 30399 at commit a408734.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1404,7 +1404,9 @@ case class ParseUrl(children: Seq[Expression])
try {
new URI(url.toString)
} catch {
case e: URISyntaxException => null
case e: URISyntaxException if SQLConf.get.ansiEnabled =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed here, updated !

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we follow other expressions and add a failOnError: Boolean parameter? See #30297 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get it !

@maropu
Copy link
Member

maropu commented Nov 17, 2020

Looks fine otherwise.

@github-actions github-actions bot added the DOCS label Nov 18, 2020
@SparkQA
Copy link

SparkQA commented Nov 18, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35848/

@SparkQA
Copy link

SparkQA commented Nov 18, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35848/

@SparkQA
Copy link

SparkQA commented Nov 18, 2020

Test build #131245 has finished for PR 30399 at commit 2202ca4.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -114,6 +114,7 @@ The behavior of some SQL functions can be different under ANSI mode (`spark.sql.
- `element_at`: This function throws `ArrayIndexOutOfBoundsException` if using invalid indices.
- `element_at`: This function throws `NoSuchElementException` if key does not exist in map.
- `elt`: This function throws `ArrayIndexOutOfBoundsException` if using invalid indices.
- `parse_url`: This function throws `IllegalArgumentException` if input string is not a valid url.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: input string -> an input string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry missed here. Updated !

@maropu
Copy link
Member

maropu commented Nov 20, 2020

Looks ok otherwise.

@SparkQA
Copy link

SparkQA commented Nov 20, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35980/

@SparkQA
Copy link

SparkQA commented Nov 20, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35980/

@SparkQA
Copy link

SparkQA commented Nov 20, 2020

Test build #131377 has finished for PR 30399 at commit 497015c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ParseUrl(children: Seq[Expression], failOnError: Boolean = SQLConf.get.ansiEnabled)

@SparkQA
Copy link

SparkQA commented Nov 20, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35989/

@SparkQA
Copy link

SparkQA commented Nov 20, 2020

Test build #131387 has finished for PR 30399 at commit ecd2143.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 20, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35989/

@SparkQA
Copy link

SparkQA commented Nov 20, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35995/

@SparkQA
Copy link

SparkQA commented Nov 20, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35995/

@SparkQA
Copy link

SparkQA commented Nov 20, 2020

Test build #131391 has finished for PR 30399 at commit d7e437b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 3384bda Nov 20, 2020
@ulysses-you
Copy link
Contributor Author

thanks for merging !

@ulysses-you ulysses-you deleted the SPARK-33468 branch November 22, 2021 10:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants