[SPARK-49501][SQL] Fix double-escaping of table location
### What changes were proposed in this pull request?

Change the implementation of `createTable` to avoid escaping special characters in `UnresolvedTableSpec.location`. This field should contain the original user-provided `path` option, not the URI that is constructed by the `buildStorageFormatFromOptions()` call.

In addition, this commit extends `SparkFunSuite` and `SQLTestUtils` to allow creating temporary directories with a custom prefix, which can be used to create temporary directories whose names contain special characters.
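The escaping happens because building a `java.net.URI` from a raw path percent-encodes the path. A minimal standalone sketch of that behaviour in plain Scala (this is an illustration using the JDK `URI` constructor, not the actual `buildStorageFormatFromOptions()` code):

```scala
// Standalone illustration of the escaping behaviour; not Spark code.
object UriEscapingDemo extends App {
  val rawPath = "/tmp/test%prefix"
  // The multi-argument java.net.URI constructor always percent-encodes '%',
  // so a path that already contains '%' gets escaped again ("double-escaped").
  val uri = new java.net.URI("file", null, rawPath, null)
  println(uri.toString) // file:/tmp/test%25prefix
  // Converting that URI back to a string no longer yields the user's path.
  assert(uri.toString != s"file:$rawPath")
}
```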

### Why are the changes needed?

Bug fix. The following code would result in the creation of a table that is stored in `/tmp/test%20table` instead of `/tmp/test table`:
```scala
spark.catalog.createTable(
  "testTable",
  source = "parquet",
  schema = new StructType().add("id", "int"),
  description = "",
  options = Map("path" -> "/tmp/test table"))
```

Note that this was not consistent with the SQL API, e.g. `create table testTable(id int) using parquet location '/tmp/test table'`, which stores the path unescaped.

### Does this PR introduce _any_ user-facing change?

Yes. Previously, special characters in the table path would be percent-escaped. After this change the path is stored exactly as provided.

### How was this patch tested?

Updated existing test in `CatalogSuite`.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#47976 from cstavr/location-double-escaping.

Authored-by: Christos Stavrakakis <christos.stavrakakis@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cstavr authored and cloud-fan committed Sep 9, 2024
1 parent 532aaaf commit dc3333b
Showing 2 changed files with 6 additions and 7 deletions.
```diff
@@ -31,6 +31,7 @@ import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}
 import org.apache.spark.sql.catalyst.parser.ParseException
 import org.apache.spark.sql.catalyst.plans.logical.{ColumnDefinition, CreateTable, LocalRelation, LogicalPlan, OptionList, RecoverPartitions, ShowFunctions, ShowNamespaces, ShowTables, UnresolvedTableSpec, View}
 import org.apache.spark.sql.catalyst.types.DataTypeUtils
+import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap
 import org.apache.spark.sql.connector.catalog.{CatalogManager, SupportsNamespaces, TableCatalog}
 import org.apache.spark.sql.connector.catalog.CatalogV2Implicits.{CatalogHelper, MultipartIdentifierHelper, NamespaceHelper, TransformHelper}
 import org.apache.spark.sql.errors.QueryCompilationErrors
@@ -671,12 +672,9 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
     } else {
       CatalogTableType.MANAGED
     }
-    val location = if (storage.locationUri.isDefined) {
-      val locationStr = storage.locationUri.get.toString
-      Some(locationStr)
-    } else {
-      None
-    }
+
+    // The location in UnresolvedTableSpec should be the original user-provided path string.
+    val location = CaseInsensitiveMap(options).get("path")
 
     val newOptions = OptionList(options.map { case (key, value) =>
       (key, Literal(value).asInstanceOf[Expression])
```
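Because `createTable` options may arrive with any key casing, the fix looks the path up via `CaseInsensitiveMap`. A rough sketch of the lookup semantics in plain Scala (the helper name is illustrative, not Spark's implementation):

```scala
// Illustrative stand-in for CaseInsensitiveMap(options).get("path"); plain Scala.
def getPathOption(options: Map[String, String]): Option[String] =
  options.collectFirst { case (key, value) if key.equalsIgnoreCase("path") => value }

// The raw option value is returned untouched: no URI construction, no escaping.
println(getPathOption(Map("Path" -> "/tmp/test table"))) // Some(/tmp/test table)
println(getPathOption(Map("format" -> "parquet")))       // None
```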
```diff
@@ -700,7 +700,8 @@ class CatalogSuite extends SharedSparkSession with AnalysisTest with BeforeAndAf
     val description = "this is a test table"
 
     withTable("t") {
-      withTempDir { dir =>
+      withTempDir { baseDir =>
+        val dir = new File(baseDir, "test%prefix")
         spark.catalog.createTable(
           tableName = "t",
           source = "json",
```
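The test directory name contains a `%` precisely because that character changes under URI encoding while `java.io.File` keeps it literal. A small sketch of the distinction (the paths are illustrative):

```scala
// A File child name with '%' stays literal in the path string...
val dir = new java.io.File("/tmp/base", "test%prefix")
println(dir.getPath)        // /tmp/base/test%prefix
// ...but File.toURI percent-encodes, so '%' becomes %25 in the URI form.
println(dir.toURI.toString) // file:/tmp/base/test%25prefix
```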
