-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-6322][SQL] CTAS should consider the case where no file format or storage handler is given #5014
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-6322][SQL] CTAS should consider the case where no file format or storage handler is given #5014
Changes from all commits
a8d3756
d3c9f6b
6cf2cfe
5b45999
b7c1346
5b611cb
4389607
d5fcf67
b348094
7d82ede
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -25,6 +25,7 @@ import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache} | |
import org.apache.hadoop.hive.metastore.api.{FieldSchema, Partition => TPartition, Table => TTable} | ||
import org.apache.hadoop.hive.metastore.{TableType, Warehouse} | ||
import org.apache.hadoop.hive.ql.metadata._ | ||
import org.apache.hadoop.hive.ql.lib.Node | ||
import org.apache.hadoop.hive.ql.plan.CreateTableDesc | ||
import org.apache.hadoop.hive.serde.serdeConstants | ||
import org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe | ||
|
@@ -600,7 +601,7 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with | |
|
||
// TODO extra is in type of ASTNode which means the logical plan is not resolved | ||
// Need to think about how to implement the CreateTableAsSelect.resolved | ||
case CreateTableAsSelect(db, tableName, child, allowExisting, Some(extra: ASTNode)) => | ||
case CreateHiveTableAsSelect(db, tableName, child, allowExisting, extra) => | ||
val (dbName, tblName) = processDatabaseAndTableName(db, tableName) | ||
val databaseName = dbName.getOrElse(hive.sessionState.getCurrentDatabase) | ||
|
||
|
@@ -619,7 +620,7 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with | |
} | ||
} | ||
|
||
sa.analyze(extra, new Context(hive.hiveconf)) | ||
sa.analyze(extra.asInstanceOf[ASTNode], new Context(hive.hiveconf)) | ||
Some(sa.getQB().getTableDesc) | ||
} | ||
|
||
|
@@ -658,34 +659,6 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with | |
} | ||
|
||
case p: LogicalPlan if p.resolved => p | ||
|
||
case p @ CreateTableAsSelect(db, tableName, child, allowExisting, None) => | ||
val (dbName, tblName) = processDatabaseAndTableName(db, tableName) | ||
if (hive.convertCTAS) { | ||
if (dbName.isDefined) { | ||
throw new AnalysisException( | ||
"Cannot specify database name in a CTAS statement " + | ||
"when spark.sql.hive.convertCTAS is set to true.") | ||
} | ||
|
||
val mode = if (allowExisting) SaveMode.Ignore else SaveMode.ErrorIfExists | ||
CreateTableUsingAsSelect( | ||
tblName, | ||
hive.conf.defaultDataSourceName, | ||
temporary = false, | ||
mode, | ||
options = Map.empty[String, String], | ||
child | ||
) | ||
} else { | ||
val databaseName = dbName.getOrElse(hive.sessionState.getCurrentDatabase) | ||
execution.CreateTableAsSelect( | ||
databaseName, | ||
tableName, | ||
child, | ||
allowExisting, | ||
None) | ||
} | ||
} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Deleting this probably not right. Though There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is mostly duplicate with previous case. They are combined. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK, I see your point, but it seems more reasonable to keep both There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Basically they are almost the same codes except for checking file format or storage handler. So I think we don't need to keep two pattern matchings here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @chenghao-intel Why There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The change @viirya made seems reasonable for me here, the only concern is what if There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For data source table, it does not use For There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, I see. I'll agree with you to not mix the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah. Sounds good. |
||
} | ||
|
||
|
@@ -764,6 +737,14 @@ private[hive] case class InsertIntoHiveTable( | |
} | ||
} | ||
|
||
private[hive] case class CreateHiveTableAsSelect( | ||
databaseName: Option[String], | ||
tableName: String, | ||
child: LogicalPlan, | ||
allowExisting: Boolean, | ||
desc: Node) extends CreateTableAsSelect { | ||
} | ||
|
||
private[hive] case class MetastoreRelation | ||
(databaseName: String, tableName: String, alias: Option[String]) | ||
(val table: TTable, val partitions: Seq[TPartition]) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having this trait here, can we just make the implementation below a
Command
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we cannot use
Command
because aCommand
is aLeafNode
. If we make it aCommand
, the child logical plan will not be analyzed and optimized.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think it will be good to have a different type of
Command
that is not aLeafNode
.