[SPARK-26982][SQL] Enhance describe framework to describe the output of a query. #23883

Closed · wants to merge 12 commits
1 change: 1 addition & 0 deletions docs/sql-reserved-and-non-reserved-keywords.md
@@ -156,6 +156,7 @@ The list of reserved and non-reserved keywords can change according to the confi
<tr><td>DEREF</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
<tr><td>DESC</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
<tr><td>DESCRIBE</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
<tr><td>QUERY</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
<tr><td>DESCRIPTOR</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
<tr><td>DETERMINISTIC</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
<tr><td>DFS</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
@@ -166,6 +166,7 @@ statement
| (DESC | DESCRIBE) DATABASE EXTENDED? identifier #describeDatabase
| (DESC | DESCRIBE) TABLE? option=(EXTENDED | FORMATTED)?
tableIdentifier partitionSpec? describeColName? #describeTable
| (DESC | DESCRIBE) QUERY? queryToDesc #describeQuery
| REFRESH TABLE tableIdentifier #refreshTable
| REFRESH (STRING | .*?) #refreshResource
| CACHE LAZY? TABLE tableIdentifier
@@ -255,6 +256,10 @@ query
: ctes? queryNoWith
;

queryToDesc
: queryTerm queryOrganization
;

insertInto
: INSERT OVERWRITE TABLE tableIdentifier (partitionSpec (IF NOT EXISTS)?)? #insertOverwriteTable
| INSERT INTO TABLE? tableIdentifier partitionSpec? #insertIntoTable
@@ -780,7 +785,7 @@ ansiNonReserved
| INPUTFORMAT | INSERT | INTERVAL | ITEMS | KEYS | LAST | LATERAL | LAZY | LIKE | LIMIT | LINES | LIST | LOAD
| LOCAL | LOCATION | LOCK | LOCKS | LOGICAL | MACRO | MAP | MSCK | NO | NULLS | OF | OPTION | OPTIONS | OUT
| OUTPUTFORMAT | OVER | OVERWRITE | PARTITION | PARTITIONED | PARTITIONS | PERCENT | PERCENTLIT | PIVOT | PRECEDING
| PRINCIPALS | PURGE | QUERY | RANGE | RECORDREADER | RECORDWRITER | RECOVER | REDUCE | REFRESH | RENAME | REPAIR | REPLACE
| RESET | RESTRICT | REVOKE | RLIKE | ROLE | ROLES | ROLLBACK | ROLLUP | ROW | ROWS | SCHEMA | SEPARATED | SERDE
| SERDEPROPERTIES | SET | SETS | SHOW | SKEWED | SORT | SORTED | START | STATISTICS | STORED | STRATIFY | STRUCT
| TABLES | TABLESAMPLE | TBLPROPERTIES | TEMPORARY | TERMINATED | TOUCH | TRANSACTION | TRANSACTIONS | TRANSFORM
@@ -805,7 +810,7 @@ nonReserved
| LATERAL | LAZY | LEADING | LIKE | LIMIT | LINES | LIST | LOAD | LOCAL | LOCATION | LOCK | LOCKS | LOGICAL | MACRO
| MAP | MSCK | NO | NOT | NULL | NULLS | OF | ONLY | OPTION | OPTIONS | OR | ORDER | OUT | OUTER | OUTPUTFORMAT
| OVER | OVERLAPS | OVERWRITE | PARTITION | PARTITIONED | PARTITIONS | PERCENTLIT | PIVOT | POSITION | PRECEDING
| PRIMARY | PRINCIPALS | PURGE | QUERY | RANGE | RECORDREADER | RECORDWRITER | RECOVER | REDUCE | REFERENCES | REFRESH
| RENAME | REPAIR | REPLACE | RESET | RESTRICT | REVOKE | RLIKE | ROLE | ROLES | ROLLBACK | ROLLUP | ROW | ROWS
| SELECT | SEPARATED | SERDE | SERDEPROPERTIES | SESSION_USER | SET | SETS | SHOW | SKEWED | SOME | SORT | SORTED
| START | STATISTICS | STORED | STRATIFY | STRUCT | TABLE | TABLES | TABLESAMPLE | TBLPROPERTIES | TEMPORARY
@@ -883,6 +888,7 @@ WITH: 'WITH';
VALUES: 'VALUES';
CREATE: 'CREATE';
TABLE: 'TABLE';
QUERY: 'QUERY';
DIRECTORY: 'DIRECTORY';
VIEW: 'VIEW';
REPLACE: 'REPLACE';
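For reference, here is a minimal sketch of the syntax the new #describeQuery alternative accepts (assumptions: a Spark build containing this patch; the session setup and table name are illustrative only):

import org.apache.spark.sql.SparkSession

object DescribeQuerySyntaxDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("desc-query-demo").getOrCreate()
    spark.sql("CREATE TABLE t (key INT, val STRING) USING PARQUET")
    // The QUERY keyword is optional: (DESC | DESCRIBE) QUERY? queryToDesc
    spark.sql("DESC SELECT key, val FROM t").show()
    spark.sql("DESCRIBE QUERY SELECT key + 1 AS plusone FROM t").show()
    // queryTerm also covers TABLE and VALUES statements
    spark.sql("DESC QUERY TABLE t").show()
    spark.sql("DESC QUERY VALUES (1, 'a') AS tab(c1, c2)").show()
    spark.stop()
  }
}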
@@ -117,6 +117,10 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
}
}

override def visitQueryToDesc(ctx: QueryToDescContext): LogicalPlan = withOrigin(ctx) {
plan(ctx.queryTerm).optionalMap(ctx.queryOrganization)(withQueryResultClauses)
}

/**
* Create a named logical plan.
*
@@ -47,8 +47,8 @@ class TableIdentifierParserSuite extends SparkFunSuite {
"cursor", "date", "decimal", "delete", "describe", "double", "drop", "exists", "external",
"false", "fetch", "float", "for", "grant", "group", "grouping", "import", "in",
"insert", "int", "into", "is", "pivot", "lateral", "like", "local", "none", "null",
"of", "order", "out", "outer", "partition", "percent", "procedure", "range", "reads", "revoke",
"rollup", "row", "rows", "set", "smallint", "table", "timestamp", "to", "trigger",
"of", "order", "out", "outer", "partition", "percent", "procedure", "query", "range", "reads",
"revoke", "rollup", "row", "rows", "set", "smallint", "table", "timestamp", "to", "trigger",
"true", "truncate", "update", "user", "values", "with", "regexp", "rlike",
"bigint", "binary", "boolean", "current_date", "current_timestamp", "date", "double", "float",
"int", "smallint", "timestamp", "at", "position", "both", "leading", "trailing", "extract")
@@ -22,7 +22,7 @@ import java.sql.{Date, Timestamp}

import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.util.{DateFormatter, DateTimeUtils, TimestampFormatter}
import org.apache.spark.sql.execution.command.{DescribeCommandBase, ExecutedCommandExec, ShowTablesCommand}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types._

@@ -35,7 +35,7 @@ object HiveResult {
* `SparkSQLDriver` for CLI applications.
*/
def hiveResultString(executedPlan: SparkPlan): Seq[String] = executedPlan match {
case ExecutedCommandExec(_: DescribeCommandBase) =>
// If it is a describe command for a Hive table, we want to have the output format
// be similar with Hive.
executedPlan.executeCollectPublic().map {
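With this change the Hive-style rendering path now covers every describe command, not just DESCRIBE TABLE. As a rough sketch of what "similar with Hive" means for one output row (an illustration only, not the exact elided body of hiveResultString): each of the three columns is padded to a fixed width and the fields are tab-joined.

// Illustrative only: approximates Hive-style formatting of one describe row.
def formatDescribeRow(name: String, dataType: String, comment: String): String =
  Seq(name, dataType, Option(comment).getOrElse(""))
    .map(s => String.format("%-20s", s)) // fixed-width, left-aligned columns
    .mkString("\t")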
@@ -369,6 +369,13 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
}
}

/**
* Create a [[DescribeQueryCommand]] logical command.
*/
override def visitDescribeQuery(ctx: DescribeQueryContext): LogicalPlan = withOrigin(ctx) {
DescribeQueryCommand(visitQueryToDesc(ctx.queryToDesc()))
}
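A quick sketch of the plan shape this rule produces (assumptions: a SparkSession `spark` built from this branch; SparkSession.sessionState is private[sql], so this would live in test code under the org.apache.spark.sql package):

import org.apache.spark.sql.execution.command.DescribeQueryCommand

val plan = spark.sessionState.sqlParser.parsePlan("DESC QUERY SELECT 1 AS col1")
assert(plan.isInstanceOf[DescribeQueryCommand])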

/**
* Type to keep track of a table header: (identifier, isTemporary, ifNotExists, isExternal).
*/
@@ -29,12 +29,12 @@ import org.apache.hadoop.fs.{FileContext, FsConstants, Path}

import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.analysis.{NoSuchPartitionException, UnresolvedAttribute, UnresolvedRelation}
import org.apache.spark.sql.catalyst.catalog._
import org.apache.spark.sql.catalyst.catalog.CatalogTableType._
import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec
import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference}
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.catalyst.util.{escapeSingleQuotedString, quoteIdentifier}
import org.apache.spark.sql.execution.datasources.{DataSource, PartitioningUtils}
import org.apache.spark.sql.execution.datasources.csv.CSVFileFormat
@@ -494,6 +494,34 @@ case class TruncateTableCommand(
}
}

abstract class DescribeCommandBase extends RunnableCommand {
override val output: Seq[Attribute] = Seq(
// Column names are based on Hive.
AttributeReference("col_name", StringType, nullable = false,
new MetadataBuilder().putString("comment", "name of the column").build())(),
AttributeReference("data_type", StringType, nullable = false,
new MetadataBuilder().putString("comment", "data type of the column").build())(),
AttributeReference("comment", StringType, nullable = true,
new MetadataBuilder().putString("comment", "comment of the column").build())()
)

protected def describeSchema(
schema: StructType,
buffer: ArrayBuffer[Row],
header: Boolean): Unit = {
if (header) {
append(buffer, s"# ${output.head.name}", output(1).name, output(2).name)
}
schema.foreach { column =>
append(buffer, column.name, column.dataType.simpleString, column.getComment().orNull)
}
}

protected def append(
buffer: ArrayBuffer[Row], column: String, dataType: String, comment: String): Unit = {
buffer += Row(column, dataType, comment)
}
}

/**
* Command that looks like
* {{{
@@ -504,17 +532,7 @@ case class DescribeTableCommand(
table: TableIdentifier,
partitionSpec: TablePartitionSpec,
isExtended: Boolean)
extends DescribeCommandBase {

override def run(sparkSession: SparkSession): Seq[Row] = {
val result = new ArrayBuffer[Row]
@@ -603,22 +621,31 @@ case class DescribeTableCommand(
}
table.storage.toLinkedHashMap.foreach(s => append(buffer, s._1, s._2, ""))
}
}

/**
* Command that looks like
* {{{
* DESCRIBE [QUERY] statement
* }}}
*
* Parameter 'statement' can be one of the following types :
* 1. SELECT statements
* 2. SELECT statements inside set operators (UNION, INTERSECT etc)
* 3. VALUES statement.
* 4. TABLE statement. Example : TABLE table_name
[Review comment — Contributor]: do you mean we support DESCRIBE QUERY TABLE t?
[Review comment — Contributor]: nvm, just checked and we do support it...
* 5. statements of the form 'FROM table SELECT *'
*
* TODO : support CTEs.
*/
case class DescribeQueryCommand(query: LogicalPlan)
extends DescribeCommandBase {

override def run(sparkSession: SparkSession): Seq[Row] = {
val result = new ArrayBuffer[Row]
val queryExecution = sparkSession.sessionState.executePlan(query)
describeSchema(queryExecution.analyzed.schema, result, header = false)
result
}
}
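Because run() simply describes the analyzed schema of the query (with header = false), the command returns one Row per output column. A hedged example against the desc_temp1 table defined in the new test file below:

// Assumes `spark` with this patch and:
//   CREATE TABLE desc_temp1 (key int COMMENT 'column_comment', val string) USING PARQUET
val rows = spark.sql("DESC QUERY SELECT key, COUNT(*) AS cnt FROM desc_temp1 GROUP BY key").collect()
// Expected shape: Array([key,int,column_comment], [cnt,bigint,null])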

27 changes: 27 additions & 0 deletions sql/core/src/test/resources/sql-tests/inputs/describe-query.sql
@@ -0,0 +1,27 @@
-- Test tables
CREATE table desc_temp1 (key int COMMENT 'column_comment', val string) USING PARQUET;
CREATE table desc_temp2 (key int, val string) USING PARQUET;

-- Simple Describe query
DESC SELECT key, key + 1 as plusone FROM desc_temp1;
DESC QUERY SELECT * FROM desc_temp2;
DESC SELECT key, COUNT(*) as count FROM desc_temp1 group by key;
DESC SELECT 10.00D as col1;
DESC QUERY SELECT key FROM desc_temp1 UNION ALL select CAST(1 AS DOUBLE);
DESC QUERY VALUES(1.00D, 'hello') as tab1(col1, col2);
[Review comment — @dongjoon-hyun (Member), Mar 1, 2019]: Shall we add DESC QUERY FROM desc_temp1 a SELECT * at line 13?
DESC QUERY FROM desc_temp1 a SELECT *;

-- Error cases.
DESC WITH s AS (SELECT 'hello' as col1) SELECT * FROM s;
[Review comment — @dongjoon-hyun (Member)]: Hi, @dilipbiswal. Is this the only exception case in SELECT syntax? Do we have a plan to support this SELECT query with WITH?
[Reply — @dilipbiswal (Contributor, Author)]: @dongjoon-hyun Yeah.. Wenchen suggested that we start with simple selects and then improve on it. I am planning to look into CTEs next.
[Reply — @dongjoon-hyun (Member)]: Thanks!

DESCRIBE QUERY WITH s AS (SELECT * from desc_temp1) SELECT * FROM s;
DESCRIBE INSERT INTO desc_temp1 values (1, 'val1');
DESCRIBE INSERT INTO desc_temp1 SELECT * FROM desc_temp2;
DESCRIBE
FROM desc_temp1 a
insert into desc_temp1 select *
insert into desc_temp2 select *;

-- cleanup
DROP TABLE desc_temp1;
DROP TABLE desc_temp2;