Skip to content

[SPARK-27328][SQL] Add 'deprecated' in ExpressionDescription for extended usage and SQL doc #24259

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -31,35 +31,58 @@
* `usage()` will be used for the function usage in brief way.
*
* These below are concatenated and used for the function usage in verbose way, suppose arguments,
* examples, note and since will be provided.
* examples, note, since and deprecated will be provided.
*
* `arguments()` describes arguments for the expression. This should follow the format as below:
* `arguments()` describes arguments for the expression.
*
* Arguments:
* * arg0 - ...
* ....
* * arg1 - ...
* ....
*
* `examples()` describes examples for the expression. This should follow the format as below:
*
* Examples:
* > SELECT ...;
* ...
* > SELECT ...;
* ...
* `examples()` describes examples for the expression.
*
* `note()` contains some notes for the expression optionally.
*
* `since()` contains version information for the expression. Version is specified by,
* for example, "2.2.0".
*
* We can refer the function name by `_FUNC_`, in `usage`, `arguments` and `examples`, as it's
* registered in `FunctionRegistry`.
* `deprecated()` contains deprecation information for the expression optionally, for example,
* "Deprecated since 2.2.0. Use something else instead".
*
* The format, in particular for `arguments()`, `examples()`,`note()`, `since()` and
* `deprecated()`, should strictly be as follows.
*
* <pre>
* <code>@ExpressionDescription(
* ...
* arguments = """
* Arguments:
* * arg0 - ...
* ....
* * arg1 - ...
* ....
* """,
* examples = """
* Examples:
* > SELECT ...;
* ...
* > SELECT ...;
* ...
* """,
* note = """
* ...
* """,
* since = "3.0.0",
* deprecated = """
* ...
* """)
* </code>
* </pre>
*
* We can refer the function name by `_FUNC_`, in `usage()`, `arguments()` and `examples()` as
* it is registered in `FunctionRegistry`.
*
* Note that, if `extended()` is defined, `arguments()`, `examples()`, `note()`, `since()` and
* `deprecated()` should be not defined together. `extended()` exists for backward compatibility.
*
* Note that, if `extended()` is defined, `arguments()`, `examples()`, `note()` and `since()` will
* be ignored and `extended()` will be used for the extended description for backward
* compatibility.
* Note this contents are used in the SparkSQL documentation for built-in functions. The contents
* here are considered as a Markdown text and then rendered.
*/
@DeveloperApi
@Retention(RetentionPolicy.RUNTIME)
Expand All @@ -70,4 +93,5 @@
String examples() default "";
String note() default "";
String since() default "";
String deprecated() default "";
}
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ public class ExpressionInfo {
private String examples;
private String note;
private String since;
private String deprecated;

public String getClassName() {
return className;
Expand Down Expand Up @@ -63,6 +64,10 @@ public String getNote() {
return note;
}

public String getDeprecated() {
return deprecated;
}

public String getDb() {
return db;
}
Expand All @@ -75,13 +80,15 @@ public ExpressionInfo(
String arguments,
String examples,
String note,
String since) {
String since,
String deprecated) {
assert name != null;
assert arguments != null;
assert examples != null;
assert examples.isEmpty() || examples.startsWith(System.lineSeparator() + " Examples:");
assert note != null;
assert since != null;
assert deprecated != null;

this.className = className;
this.db = db;
Expand All @@ -91,32 +98,52 @@ public ExpressionInfo(
this.examples = examples;
this.note = note;
this.since = since;
this.deprecated = deprecated;

// Make the extended description.
this.extended = arguments + examples;
if (this.extended.isEmpty()) {
this.extended = "\n No example/argument for _FUNC_.\n";
}
if (!note.isEmpty()) {
if (!note.contains(" ") || !note.endsWith(" ")) {
throw new IllegalArgumentException("'note' is malformed in the expression [" +
this.name + "]. It should start with a newline and 4 leading spaces; end " +
"with a newline and two spaces; however, got [" + note + "].");
}
this.extended += "\n Note:\n " + note.trim() + "\n";
}
if (!since.isEmpty()) {
if (Integer.parseInt(since.split("\\.")[0]) < 0) {
throw new IllegalArgumentException("'since' is malformed in the expression [" +
this.name + "]. It should not start with a negative number; however, " +
"got [" + since + "].");
}
this.extended += "\n Since: " + since + "\n";
}
if (!deprecated.isEmpty()) {
if (!deprecated.contains(" ") || !deprecated.endsWith(" ")) {
throw new IllegalArgumentException("'deprecated' is malformed in the " +
"expression [" + this.name + "]. It should start with a newline and 4 " +
"leading spaces; end with a newline and two spaces; however, got [" +
deprecated + "].");
}
this.extended += "\n Deprecated:\n " + deprecated.trim() + "\n";
}
}

public ExpressionInfo(String className, String name) {
this(className, null, name, null, "", "", "", "");
this(className, null, name, null, "", "", "", "", "");
}

public ExpressionInfo(String className, String db, String name) {
this(className, db, name, null, "", "", "", "");
this(className, db, name, null, "", "", "", "", "");
}

// This is to keep the original constructor just in case.
public ExpressionInfo(String className, String db, String name, String usage, String extended) {
// `arguments` and `examples` are concatenated for the extended description. So, here
// simply pass the `extended` as `arguments` and an empty string for `examples`.
this(className, db, name, usage, extended, "", "", "");
this(className, db, name, usage, extended, "", "", "", "");
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -621,7 +621,7 @@ object FunctionRegistry {
val clazz = scala.reflect.classTag[Cast].runtimeClass
val usage = "_FUNC_(expr) - Casts the value `expr` to the target data type `_FUNC_`."
val expressionInfo =
new ExpressionInfo(clazz.getCanonicalName, null, name, usage, "", "", "", "")
new ExpressionInfo(clazz.getCanonicalName, null, name, usage, "", "", "", "", "")
(name, (expressionInfo, builder))
}

Expand All @@ -641,7 +641,8 @@ object FunctionRegistry {
df.arguments(),
df.examples(),
df.note(),
df.since())
df.since(),
df.deprecated())
} else {
// This exists for the backward compatibility with old `ExpressionDescription`s defining
// the extended description in `extended()`.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -964,7 +964,9 @@ case class ArraySort(child: Expression) extends UnaryExpression with ArraySortLi
> SELECT _FUNC_(array(1, 20, null, 3));
[20,null,3,1]
""",
note = "The function is non-deterministic.",
note = """
The function is non-deterministic.
""",
since = "2.4.0")
case class Shuffle(child: Expression, randomSeed: Option[Long] = None)
extends UnaryExpression with ExpectsInputTypes with Stateful with ExpressionWithRandomSeed {
Expand Down Expand Up @@ -1048,7 +1050,9 @@ case class Shuffle(child: Expression, randomSeed: Option[Long] = None)
[3,4,1,2]
""",
since = "1.5.0",
note = "Reverse logic for arrays is available since 2.4.0."
note = """
Reverse logic for arrays is available since 2.4.0.
"""
)
case class Reverse(child: Expression) extends UnaryExpression with ImplicitCastInputTypes {

Expand Down Expand Up @@ -2062,7 +2066,9 @@ case class ElementAt(left: Expression, right: Expression)
> SELECT _FUNC_(array(1, 2, 3), array(4, 5), array(6));
[1,2,3,4,5,6]
""",
note = "Concat logic for arrays is available since 2.4.0.")
note = """
Concat logic for arrays is available since 2.4.0.
""")
case class Concat(children: Seq[Expression]) extends ComplexTypeMergingExpression {

private def allowedTypes: Seq[AbstractDataType] = Seq(StringType, BinaryType, ArrayType)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -288,6 +288,7 @@ object CreateStruct extends FunctionBuilder {
"",
"",
"",
"",
"")
("struct", (info, this))
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1018,7 +1018,10 @@ case class TimeAdd(start: Expression, interval: Expression, timeZoneId: Option[S
> SELECT from_utc_timestamp('2016-08-31', 'Asia/Seoul');
2016-08-31 09:00:00
""",
since = "1.5.0")
since = "1.5.0",
deprecated = """
Deprecated since 3.0.0. See SPARK-25496.
""")
// scalastyle:on line.size.limit
case class FromUTCTimestamp(left: Expression, right: Expression)
extends BinaryExpression with ImplicitCastInputTypes {
Expand Down Expand Up @@ -1229,7 +1232,10 @@ case class MonthsBetween(
> SELECT _FUNC_('2016-08-31', 'Asia/Seoul');
2016-08-30 15:00:00
""",
since = "1.5.0")
since = "1.5.0",
deprecated = """
Deprecated since 3.0.0. See SPARK-25496.
""")
// scalastyle:on line.size.limit
case class ToUTCTimestamp(left: Expression, right: Expression)
extends BinaryExpression with ImplicitCastInputTypes {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -125,7 +125,9 @@ case class CurrentDatabase() extends LeafExpression with Unevaluable {
> SELECT _FUNC_();
46707d92-02f4-4817-8116-a4c3b23e6266
""",
note = "The function is non-deterministic.")
note = """
The function is non-deterministic.
""")
// scalastyle:on line.size.limit
case class Uuid(randomSeed: Option[Long] = None) extends LeafExpression with Stateful
with ExpressionWithRandomSeed {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -78,7 +78,9 @@ trait ExpressionWithRandomSeed {
> SELECT _FUNC_(null);
0.8446490682263027
""",
note = "The function is non-deterministic in general case.")
note = """
The function is non-deterministic in general case.
""")
// scalastyle:on line.size.limit
case class Rand(child: Expression) extends RDG with ExpressionWithRandomSeed {

Expand Down Expand Up @@ -118,7 +120,9 @@ object Rand {
> SELECT _FUNC_(null);
1.1164209726833079
""",
note = "The function is non-deterministic in general case.")
note = """
The function is non-deterministic in general case.
""")
// scalastyle:on line.size.limit
case class Randn(child: Expression) extends RDG with ExpressionWithRandomSeed {

Expand Down
33 changes: 31 additions & 2 deletions sql/gen-sql-markdown.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@
from collections import namedtuple

ExpressionInfo = namedtuple(
"ExpressionInfo", "className name usage arguments examples note since")
"ExpressionInfo", "className name usage arguments examples note since deprecated")


def _list_function_infos(jvm):
Expand All @@ -42,7 +42,8 @@ def _list_function_infos(jvm):
arguments=jinfo.getArguments().replace("_FUNC_", name),
examples=jinfo.getExamples().replace("_FUNC_", name),
note=jinfo.getNote(),
since=jinfo.getSince()))
since=jinfo.getSince(),
deprecated=jinfo.getDeprecated()))
return sorted(infos, key=lambda i: i.name)


Expand Down Expand Up @@ -136,6 +137,27 @@ def _make_pretty_note(note):
return "**Note:**\n%s\n" % note


def _make_pretty_deprecated(deprecated):
"""
Makes the deprecated description pretty and returns a formatted string if `deprecated`
is not an empty string. Otherwise, returns None.

Expected input:

...

Expected output:
**Deprecated:**

...

"""

if deprecated != "":
deprecated = "\n".join(map(lambda n: n[4:], deprecated.split("\n")))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan, for instance this logic matters.

If we put some strings in ..

@ExpressionDescription(
  ...
  deprecated = """
    blah blah blah
    I am super long line
    
    1. one
    2. I am markdwn

    - bullet marks
    - aa

    You know what? actually you can write a markdown table as well.

    | Tables        | Are           | Cool  |
    | ------------- |:-------------:| -----:|
    | col 3 is      | right-aligned | $1600 |
    | col 2 is      | centered      |   $12 |
  """)

gen-sql-markdown.py here split the doc into each line and then get rid of the first 4 characters for each. Last newline makes a proper indentation between each field.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will people use multi-line string literal in Scala? e.g.

deprecated = """
  |xxx
  |xxx
""".stripMargin

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That cannot be used here due to compilation error. It seems annotation parameters only allow constants. There was a discussion about it before when I first add examples.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if the expression is inside an object? e.g.

object A {
  @ExpressionDescription(
    note = """
      blabla
    """ )
  class B
}

Then the string has 6 spaces at the beginning.

It looks to me that we can relax the format a little bit: just trim all the spaces at the beginning. Then we can even use a single line string.

Copy link
Member Author

@HyukjinKwon HyukjinKwon Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in that case it might be a problem although there look no case so far after this fix (I skimmed the built doc).

The thing we should take into account is we shouldn't trim all the white spaces all because spacing matters in md rendering. For instance, if we have a doc like ..

object A {
  @ExpressionDescription(
    note = """
      blabla
      - first
          blabla
      - second
          blabla
    """ )
  class B
}

We should only trim leading 6 spaces for each line. So, it will bring a bit of complication. It's doable though.

If you don't mind, I will handle it later. Was thinking it better have a single format for now. We should encourage to have more verbose and detailed doc anyway (rather than single line doc) in general.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah that makes sense, it's more tricky than I thought...

return "**Deprecated:**\n%s\n" % deprecated


def generate_sql_markdown(jvm, path):
"""
Generates a markdown file after listing the function information. The output file
Expand All @@ -162,6 +184,10 @@ def generate_sql_markdown(jvm, path):

**Since:** SINCE

**Deprecated:**

DEPRECATED

<br/>

"""
Expand All @@ -174,6 +200,7 @@ def generate_sql_markdown(jvm, path):
examples = _make_pretty_examples(info.examples)
note = _make_pretty_note(info.note)
since = info.since
deprecated = _make_pretty_deprecated(info.deprecated)

mdfile.write("### %s\n\n" % name)
if usage is not None:
Expand All @@ -186,6 +213,8 @@ def generate_sql_markdown(jvm, path):
mdfile.write(note)
if since is not None and since != "":
mdfile.write("**Since:** %s\n\n" % since.strip())
if deprecated is not None:
mdfile.write(deprecated)
mdfile.write("<br/>\n\n")


Expand Down