[SPARK-10310] [SQL] Fixes script transformation field/line delimiters #8860

Conversation

liancheng
Contributor

Please attribute this PR to Zhichao Li <zhichao.li@intel.com>.

This PR is based on PR #8476 authored by @zhichao-li. It fixes SPARK-10310 by adding a field delimiter SerDe property to the default LazySimpleSerDe, and enabling default record reader/writer classes.

Currently, we only support LazySimpleSerDe, used together with TextRecordReader and TextRecordWriter, and don't support customizing record reader/writer using RECORDREADER/RECORDWRITER clauses. This should be addressed in separate PR(s).
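The default behavior described above, which this PR makes script transformation follow, is tab-delimited fields and newline-delimited records. As a rough illustration (a hypothetical Python sketch of what a `TextRecordWriter`/`TextRecordReader` pair does, not Spark's actual code), the round trip through the transformation script's stdin/stdout looks like:

```python
# Hypothetical sketch of TextRecordWriter/TextRecordReader-style framing:
# '\t' separates fields, '\n' terminates records (LazySimpleSerDe defaults).

FIELD_DELIM = "\t"
LINE_DELIM = "\n"

def write_records(rows):
    """Serialize rows the way the transformation script's stdin would see them."""
    return LINE_DELIM.join(
        FIELD_DELIM.join(str(f) for f in row) for row in rows
    ) + LINE_DELIM

def read_records(text):
    """Parse the script's stdout back into rows of string fields."""
    return [line.split(FIELD_DELIM) for line in text.split(LINE_DELIM) if line]

rows = [("1", "a"), ("2", "b")]
assert write_records(rows) == "1\ta\n2\tb\n"
assert read_records(write_records(rows)) == [["1", "a"], ["2", "b"]]
```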

@SparkQA

SparkQA commented Sep 22, 2015

Test build #42803 has finished for PR 8860 at commit 7c4b03b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhichao-li
Contributor

@liancheng I guess there's still an issue: if the user specifies the SerDe explicitly as "LazySimpleSerDe", it would use Text.write to serialize, which would not match the default behavior of tab as the field delimiter and \n as the line delimiter.
Maybe we can add more support in another PR, since most users depend only on the default behavior, and this covers the majority of usages. :)

@liancheng liancheng force-pushed the spark-10310/fix-script-trans-delimiters branch 2 times, most recently from f86b5bc to 8d36775 Compare September 22, 2015 18:59
@liancheng liancheng force-pushed the spark-10310/fix-script-trans-delimiters branch from 8d36775 to 387ac72 Compare September 22, 2015 19:02
@liancheng
Contributor Author

@zhichao-li I further special-cased LazySimpleSerDe, so that we always use TextRecordReader/TextRecordWriter together with it, and users can customize the field delimiter now. Please check this test case.
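The customization mentioned above, making the field delimiter a SerDe property instead of a hard-coded tab, can be sketched as follows (a hypothetical Python illustration of the idea, not Spark's API; the `field_delim` parameter plays the role of LazySimpleSerDe's `field.delim` property):

```python
# Hypothetical illustration: the field delimiter becomes a property, so a
# value other than the default '\t' can be used when serializing rows for
# the transformation script.

def serialize(rows, field_delim="\t", line_delim="\n"):
    # field_delim stands in for LazySimpleSerDe's 'field.delim' property.
    return line_delim.join(field_delim.join(row) for row in rows) + line_delim

# Default behavior: tab-separated fields, newline-terminated records.
assert serialize([["1", "a"]]) == "1\ta\n"
# Customized field delimiter, e.g. '|':
assert serialize([["1", "a"]], field_delim="|") == "1|a\n"
```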

@yhuai
Contributor

yhuai commented Sep 22, 2015

test this please

@SparkQA

SparkQA commented Sep 22, 2015

Test build #42846 has finished for PR 8860 at commit f86b5bc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 22, 2015

Test build #42848 has finished for PR 8860 at commit 387ac72.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 22, 2015

Test build #42855 has finished for PR 8860 at commit 387ac72.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class CountVectorizer(JavaEstimator, HasInputCol, HasOutputCol):
    • class CountVectorizerModel(JavaModel):
    • s"Failed to convert value $v (class of $
    • s"Failed to convert value $v (class of $
    • case class Sort(

@yhuai
Contributor

yhuai commented Sep 23, 2015

@zhichao-li Can you try this PR?

@zhichao-li
Contributor

LGTM

@yhuai
Contributor

yhuai commented Sep 23, 2015

Thanks! Merging to master and branch 1.5.

@asfgit asfgit closed this in 84f81e0 Sep 23, 2015
asfgit pushed a commit that referenced this pull request Sep 23, 2015
**Please attribute this PR to `Zhichao Li <zhichao.li@intel.com>`.**

This PR is based on PR #8476 authored by zhichao-li. It fixes SPARK-10310 by adding a field delimiter SerDe property to the default `LazySimpleSerDe`, and enabling default record reader/writer classes.

Currently, we only support `LazySimpleSerDe`, used together with `TextRecordReader` and `TextRecordWriter`, and don't support customizing record reader/writer using `RECORDREADER`/`RECORDWRITER` clauses. This should be addressed in separate PR(s).

Author: Cheng Lian <lian@databricks.com>

Closes #8860 from liancheng/spark-10310/fix-script-trans-delimiters.

(cherry picked from commit 84f81e0)
Signed-off-by: Yin Huai <yhuai@databricks.com>
@liancheng liancheng deleted the spark-10310/fix-script-trans-delimiters branch September 24, 2015 00:06
ashangit pushed a commit to ashangit/spark that referenced this pull request Oct 19, 2016
(cherry picked from commit 73d0621)