-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-27328][SQL] Add 'deprecated' in ExpressionDescription for extended usage and SQL doc #24259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The issue here is that this isn't quite sufficient, because users don't always read docs (as a matter of fact, for people that have something working, they don't need to look up the docs to get it work), and SQL users also don't see the warning messages in logs. |
Test build #104133 has finished for PR 24259 at commit
|
Thanks, @rxin. I agree that practically users tend to don't read the doc although they are supposed to read the documentation. I was thinking it can be a matter that we officially document the deprecation or not in a way (for instance, I don't think so many users would read release docs or actual JIRAs often ..). I quickly gave a rough try to add a warning with all default settings with
I think Scala, Python, SparkR users also see similar stuff. Yes, SQL itself can be different in its nature but I think we should better have another mechanism for users to encourage to read them in this case. In the way of throwing an error (proposed and being discussed at #24195), I think it can be problematic in the future to manage all those configurations. For instance, currently it sounded like we will go ahead with this approach from now on. As we know, Spark has a lot of configurations and flags, and I am worried of the maintenance overhead (I remember that I read an extreme example of flags). At least this PR encourages Spark developers to note deprecated SQL functions which can be accessed in runtime (as well as they are shown in SQL documentation). |
Or how about we have one configuration to enable and disable all deprecated SQL expressions? I think I can fix it here futher to add such mechanism if it's inevitable to throw an error. |
I don't mind this support. I also think a WARN level message is pretty good. |
* for example, "2.2.0". It should not start with non-number characters. | ||
* | ||
* `deprecated()` contains deprecation information for the expression optionally, for example, | ||
* "Deprecated since 2.2.0. Use something else instead". This property should have 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not related to this PR, but why do we have such a weird requirement for the text?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make it pretty in SQL documentation. Actually, we can loose the requirement by adding some logics to handle in gen-sql-markdown.py
; however, I thought it's better to leave it strict to have uniformed code style.
I'm fine with this approach, with the proposal |
""" | ||
|
||
if deprecated != "": | ||
deprecated = "\n".join(map(lambda n: n[4:], deprecated.split("\n"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan, for instance this logic matters.
If we put some strings in ..
@ExpressionDescription(
...
deprecated = """
blah blah blah
I am super long line
1. one
2. I am markdwn
- bullet marks
- aa
You know what? actually you can write a markdown table as well.
| Tables | Are | Cool |
| ------------- |:-------------:| -----:|
| col 3 is | right-aligned | $1600 |
| col 2 is | centered | $12 |
""")
gen-sql-markdown.py
here split the doc into each line and then get rid of the first 4 characters for each. Last newline makes a proper indentation between each field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will people use multi-line string literal in Scala? e.g.
deprecated = """
|xxx
|xxx
""".stripMargin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That cannot be used here due to compilation error. It seems annotation parameters only allow constants. There was a discussion about it before when I first add examples
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if the expression is inside an object? e.g.
object A {
@ExpressionDescription(
note = """
blabla
""" )
class B
}
Then the string has 6 spaces at the beginning.
It looks to me that we can relax the format a little bit: just trim all the spaces at the beginning. Then we can even use a single line string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in that case it might be a problem although there look no case so far after this fix (I skimmed the built doc).
The thing we should take into account is we shouldn't trim all the white spaces all because spacing matters in md rendering. For instance, if we have a doc like ..
object A {
@ExpressionDescription(
note = """
blabla
- first
blabla
- second
blabla
""" )
class B
}
We should only trim leading 6 spaces for each line. So, it will bring a bit of complication. It's doable though.
If you don't mind, I will handle it later. Was thinking it better have a single format for now. We should encourage to have more verbose and detailed doc anyway (rather than single line doc) in general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah that makes sense, it's more tricky than I thought...
So the issue with a single flag for all the deprecated functions is that the more functions we deprecate, the more likely one will just turn it on always, and as a result it becomes completely useless. Why do you think it's a lot of overhead to add new flags? Isn't it just a few lines of code? We don't need to define an extra function in the SQLConf object. |
It is a few pieces of codes, yes. I was worrying about we start to have many flags under Adding a configuration for each deprecation with throwing an exception looked like it makes harder to deprecate and get out of the legacy stuff. (for instance, the SparkR and PySpark tests had to be fixed few times in #24195). |
Or, how about adding a configuration that lists up deprecated functions? In this way, everything could be fine I suspect. |
Another point of view: how often do we deprecate SQL functions? |
My impression was that it's been difficult to deprecate SQL functions so it has not been deprecated often (I suspect there might be some more cases that we should deprecate). But .. okay ... we can revisit this when we see this issue again. BTW, the PR itself here can be merged orthogonally I believe anyway since it just adds some more notes for SQL function deprecation. |
+1, it makes the generated doc better |
Sorry guys your arguments don't really hold. If we deprecate only a small number of expressions, why does it matter than for each of them we'd need to add a few lines? If we don't want to pollute SQLConf because it is getting long, we can create a new object called SQLConfDeprecated or something like that. A single global deprecation flag doesn't work. Imagine a library you use that uses one of those deprecated configs. Now you will never get any warning anywhere ... |
Yes, this one just adds some more info about deprecation. For conf with failing, I'm okie with going ahead for now and talking about that later when we see it's often. |
Test build #104180 has finished for PR 24259 at commit
|
Test build #104201 has finished for PR 24259 at commit
|
Test build #104202 has finished for PR 24259 at commit
|
Test build #104203 has finished for PR 24259 at commit
|
retest this please |
Test build #104228 has finished for PR 24259 at commit
|
retest this please |
Test build #104238 has finished for PR 24259 at commit
|
Hmmmhm? I haven;t seen such test failures before. |
retest this please |
Test build #104245 has finished for PR 24259 at commit
|
@HyukjinKwon I'm not familiar with Python, so not sure that this is helpful. :) Isn't this failure related somehow to this? Was pytz version updated in this node? I could see some version updates here:
but these are related to Python 3, and the test failed with 2. |
See #24266 |
retest this please |
Test build #104388 has finished for PR 24259 at commit
|
I think this is ready for a look. |
LGTM except one comment: https://github.com/apache/spark/pull/24259/files#r273316356 |
thanks, merging to master! |
Thanks all! |
What changes were proposed in this pull request?
This PR proposes to two things:
Add
deprecated
field toExpressionDescription
so that it can be shown in our SQL function documentation (https://spark.apache.org/docs/latest/api/sql/), and it can be shown viaDESCRIBE FUNCTION EXTENDED
.While I am here, add some more restrictions for
note()
andsince()
. Looks some documentations are broken due to malformednote
:It should start with 4 spaces and end with a newline. I added some asserts, and fixed the instances together while I am here. This is technically a breaking change but I think it's too trivial to note somewhere (and we're in Spark 3.0.0).
This PR adds
deprecated
property intofrom_utc_timestamp
andto_utc_timestamp
(it's deprecated as of #24195) as examples of using this field.Now it shows the deprecation information as below:
SQL documentation is shown as below:
DESCRIBE FUNCTION EXTENDED from_utc_timestamp;
:How was this patch tested?
Manually tested via:
For documentation verification:
For checking description: