Support loading of RDDs (of case classes) from CSV. #52

rayortigas · 2015-04-19T01:55:47Z

I'm still in RDD-land, so I'd like something like this to avoid writing things like:

val rdd = sqlContext.csvFile(path, useHeader = false).map { row =>
  Foo(row.getString(0).toInt, row.getString(1).toInt, row.getString(2).toDouble)
}

So instead we can write:

val rdd = sqlContext.csvFileToRDD[Foo](path, useHeader = false)

I tried to be minimally invasive here by building on top of csvFile. With more refactoring, I probably would've teased out some stuff in CsvRelation, but I hope this PR is useful in its present form.
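For illustration only, here is a minimal sketch of how a row of string fields might be mapped onto a case class by reflection. It is not the code in this PR and does not touch Spark or spark-csv; the names `RowToCaseClass`, `fromFields`, and `parseField`, and the field names on `Foo`, are hypothetical. It assumes a top-level case class whose fields are all primitives or `String`, and it does not handle inner case classes.

```scala
import scala.reflect.ClassTag

// Hypothetical record type; the field names are assumptions for this sketch.
case class Foo(a: Int, b: Int, c: Double)

object RowToCaseClass {
  // Parse one raw string into the boxed value a constructor parameter expects.
  private def parseField(raw: String, target: Class[_]): AnyRef = target match {
    case c if c == classOf[Int]     => Int.box(raw.trim.toInt)
    case c if c == classOf[Long]    => Long.box(raw.trim.toLong)
    case c if c == classOf[Double]  => Double.box(raw.trim.toDouble)
    case c if c == classOf[Boolean] => Boolean.box(raw.trim.toBoolean)
    case c if c == classOf[String]  => raw
    case other =>
      throw new IllegalArgumentException(s"Unsupported field type: ${other.getName}")
  }

  // Build a T from string fields using the first public constructor of T.
  // Assumes T is a top-level case class, so that constructor has no outer-instance parameter.
  def fromFields[T: ClassTag](fields: Seq[String]): T = {
    val ctor = implicitly[ClassTag[T]].runtimeClass.getConstructors.head
    val paramTypes = ctor.getParameterTypes
    require(
      paramTypes.length == fields.length,
      s"Expected ${paramTypes.length} fields but got ${fields.length}")
    val args = fields.zip(paramTypes).map { case (raw, tpe) => parseField(raw, tpe) }
    ctor.newInstance(args: _*).asInstanceOf[T]
  }
}
```

With these definitions, `RowToCaseClass.fromFields[Foo](Seq("1", "2", "3.5"))` yields `Foo(1, 2, 3.5)`, which is the per-row work that the hand-written `map` above does explicitly.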

Regards,
Ray

Squashed commit of the following:

commit e75167f
Author: Ray Ortigas <rayo@linkedin.com>
Date: Sat Apr 18 15:39:30 2015 -0700

    Test for rejection of case classes with non-primitive fields.

commit c4a1de0
Author: Ray Ortigas <rayo@linkedin.com>
Date: Sat Apr 18 11:54:53 2015 -0700

    Don't inherit from csv.CsvContext.

commit 674672d
Author: Ray Ortigas <rayo@linkedin.com>
Date: Fri Apr 17 19:37:52 2015 -0700

    Add TSV support.

commit e93ec4c
Author: Ray Ortigas <rayo@linkedin.com>
Date: Fri Apr 17 19:22:52 2015 -0700

    Add comment about not handling inner case classes.

commit 1495f51
Author: Ray Ortigas <rayo@linkedin.com>
Date: Fri Apr 17 19:22:38 2015 -0700

    Add test for headerless CSV.

commit 6f7fcf3
Author: Ray Ortigas <rayo@linkedin.com>
Date: Fri Apr 17 19:12:19 2015 -0700

    Add test for permissive mode (which is invalid).

commit ccbb6ba
Author: Ray Ortigas <rayo@linkedin.com>
Date: Fri Apr 17 19:10:54 2015 -0700

    Add test for fail-fast mode.

commit fb0f50d
Author: Ray Ortigas <rayo@linkedin.com>
Date: Fri Apr 17 19:04:33 2015 -0700

    Add test.

commit 51a9868
Author: Ray Ortigas <rayo@linkedin.com>
Date: Fri Apr 17 17:21:13 2015 -0700

    Move RDD-related methods to own package.

commit f5a2c2c
Author: Ray Ortigas <rayo@linkedin.com>
Date: Fri Apr 17 16:31:10 2015 -0700

    Use TypeTag and ClassTag instead of manifest.

commit ffed4fc
Author: Ray Ortigas <rayo@linkedin.com>
Date: Fri Apr 17 15:41:32 2015 -0700

    Express csvFileToRDD() in terms of csvFile().

commit b52f582
Author: Ray Ortigas <rayo@linkedin.com>
Date: Fri Apr 17 15:38:43 2015 -0700

    First cut at typed RDD.

rxin · 2015-04-19T06:23:59Z

@rayortigas this seems like something that can easily live outside of the CSV package. There isn't anything specific to CSV about this one.

As a matter of fact it probably deserves to either be part of the DataFrame API, or just an implicit conversion on DataFrame to add the following:

// or called toTyped, or typedRDD
def toTypedRDD[T : scala.reflect.runtime.universe.TypeTag : scala.reflect.ClassTag]: RDD[T] = {
   ...
}

rayortigas · 2015-04-19T13:58:13Z

@rxin I'd love for DataFrames to support it directly... I picked CSV first because the conversion was more straightforward (just a row of primitives). :D

Maybe I'll put together a PR for spark proper that handles more complex objects? I see what ScalaReflection is doing (and I think I saw the latest refactoring), so I'll take a cue from that.

rayortigas · 2015-04-27T06:47:14Z

OK, I opened apache/spark#5713. Thanks for the suggestion @rxin!

rayortigas added 2 commits April 18, 2015 18:43

Address style checking by spark-csv Travis CI.

2649026

rayortigas mentioned this pull request Apr 27, 2015

[SPARK-7160][SQL] Support converting DataFrames to typed RDDs. apache/spark#5713

Closed

rayortigas closed this Apr 27, 2015
