Commit 63bced9

Branden Smith authored and HyukjinKwon committed
[SPARK-26745][SQL][TESTS] JsonSuite test case: empty line -> 0 record count
## What changes were proposed in this pull request?

This PR consists of the `test` components of #23665 only, minus the associated patch from that PR.

It adds a new unit test to `JsonSuite` which verifies that the `count()` returned from a `DataFrame` loaded from JSON containing empty lines does not include those empty lines in the record count. The test runs `count` prior to otherwise reading data from the `DataFrame`, so as to catch future cases where a pre-parsing optimization might result in `count` results inconsistent with existing behavior.

This PR is intended to be deployed alongside #23667; `master` currently causes the test to fail, as described in [SPARK-26745](https://issues.apache.org/jira/browse/SPARK-26745).

## How was this patch tested?

Manual testing, existing `JsonSuite` unit tests.

Closes #23674 from sumitsu/json_emptyline_count_test.

Authored-by: Branden Smith <branden.smith@publicismedia.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
1 parent c624f5d commit 63bced9

sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala

Lines changed: 12 additions & 0 deletions
@@ -2426,6 +2426,18 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
     countForMalformedJSON(0, Seq(""))
   }
 
+  test("SPARK-26745: count() for non-multiline input with empty lines") {
+    withTempPath { tempPath =>
+      val path = tempPath.getCanonicalPath
+      Seq("""{ "a" : 1 }""", "", """ { "a" : 2 }""", " \t ")
+        .toDS()
+        .repartition(1)
+        .write
+        .text(path)
+      assert(spark.read.json(path).count() === 2)
+    }
+  }
+
   test("SPARK-25040: empty strings should be disallowed") {
     def failedOnEmptyString(dataType: DataType): Unit = {
       val df = spark.read.schema(s"a ${dataType.catalogString}")
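
The expectation encoded by the new test can be modeled without Spark. The following is a minimal plain-Scala sketch, not Spark's JSON reader: the object `EmptyLineCountSketch` and the helper `countRecords` are hypothetical names, and treating a line as a record only when it is non-empty after `trim` is an assumption made for illustration. It uses the same four input lines as the test.

```scala
// Hypothetical stand-in for the expectation the SPARK-26745 test encodes.
// This is NOT Spark's JSON reader, only a plain-Scala model of the expected count.
object EmptyLineCountSketch {

  // Assumption: a line counts as a record only if something is left
  // after trimming leading and trailing whitespace.
  def countRecords(lines: Seq[String]): Int =
    lines.count(_.trim.nonEmpty)

  def main(args: Array[String]): Unit = {
    // Same four lines the test writes to the temporary text file.
    val input = Seq("""{ "a" : 1 }""", "", """ { "a" : 2 }""", " \t ")

    // A tally of raw lines would give 4; skipping blank lines gives 2 records.
    println(s"lines=${input.size}, records=${countRecords(input)}")
  }
}
```

Under these assumptions the sketch prints `lines=4, records=2`, which matches the `=== 2` assertion in the test: the empty string and the whitespace-only line contribute nothing to the count.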
