Skip to content

[SPARK-16814][SQL] Fix deprecated parquet constructor usage #14419

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

holdenk
Copy link
Contributor

@holdenk holdenk commented Jul 30, 2016

What changes were proposed in this pull request?

Replace deprecated ParquetWriter with the new builders

How was this patch tested?

Existing tests

@SparkQA
Copy link

SparkQA commented Jul 30, 2016

Test build #63039 has finished for PR 14419 at commit e10c147.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class ParquetWriterBuilder() extends

@srowen
Copy link
Member

srowen commented Jul 30, 2016

Do we know if this works across versions of Parquet that are still in play for Spark builds? I know we've got some YARN and Scala warnings that we can't adjust for that reason. Just double checking. @HyukjinKwon did you look at this too or am I imagining it?

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Jul 30, 2016

Thank you for cc me! Yes, actually I took a look for this as well and it seems okay (I was thinking doing this with other issues together).
If my understanding is correct, this would not affect actual writing with Parquet in Spark but just tests.

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Jul 30, 2016

Could I cc @liancheng who has more insight about this please?

@holdenk
Copy link
Contributor Author

holdenk commented Jul 30, 2016

So yes as @HyukjinKwon says its just in the tests.
I figured this would be ok across the hadoop versions we target because I looked at the build deps files and they all listed parquet-hadoop-1.8.1 for the different hadoop profiles we build with (grep -r parquet-hadoop ./dev/deps/)

@holdenk
Copy link
Contributor Author

holdenk commented Jul 30, 2016

@HyukjinKwon do you have a PR for this? If so I can go ahead and close this one (and lets maybe sync some on the deprecation stuff we are both cleaning up :))

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Jul 30, 2016

No, I don't :). I just meant I wanted to say that this fix is reasonable and the reason I didn't submit a PR has not been because there is potential incompatability. It's just been because just I was thinking doing this with other ones in the future. I like this PR :).

Sorry for confusing you.

@@ -119,8 +119,19 @@ private[sql] object ParquetCompatibilityTest {
metadata: Map[String, String],
recordWriters: (RecordConsumer => Unit)*): Unit = {
val messageType = MessageTypeParser.parseMessageType(schema)
val writeSupport = new DirectWriteSupport(messageType, metadata)
val parquetWriter = new ParquetWriter[RecordConsumer => Unit](new Path(path), writeSupport)
val testWriteSupport = new DirectWriteSupport(messageType, metadata)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain what's going on here? This seems like pretty complicated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, so the parquetWriter constructors are deprecated now and its been replaced with a builder interface. For Avro and others there is a standard builder - but for sort "raw" formats you need to implement your own builder. This is equivalent to the old constructor we were using - you can see the deprecation in https://github.com/apache/parquet-mr/pull/199/files as well as how the builder interface ends up calling an equivalent (now protected) constructor. Also since our WriteSupport doesn't need to change based on the configuration we always return the same writesupport regardless of conf.

If it would be useful I can add some of this as a comment in the sourcecode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rxin Does this explanation make sense for you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might just go ahead and add it as a comment for good measure.

Isn't @Override the Java annotation? thought Scala needed @override but could be missing something.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know Java annotation passes Scala style checking. Should we consider adding a rule for this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, because I don't think it would actually cause the Scala compiler to verify it overrides. It's not illegal to use a Java annotation, just doesn't do anything in this case?

Copy link
Member

@HyukjinKwon HyukjinKwon Aug 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed here, https://issues.apache.org/jira/browse/SPARK-16877 but I will do some researches and take a look into this deeper first before opening a PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scala just do override, not even @OverRide.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack yeah that's what I meant, thank you.

@SparkQA
Copy link

SparkQA commented Jul 31, 2016

Test build #63051 has finished for PR 14419 at commit 5e92e6f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk
Copy link
Contributor Author

holdenk commented Jul 31, 2016

@HyukjinKwon no worries - just wanted to make sure we weren't accidentally duplicating eachothers efforts :)

@holdenk holdenk changed the title [SPARK-16814][SQL][WIP] Fix deprecated parquet constructor usage [SPARK-16814][SQL] Fix deprecated parquet constructor usage Aug 1, 2016
val writeSupport = new DirectWriteSupport(messageType, metadata)
val parquetWriter = new ParquetWriter[RecordConsumer => Unit](new Path(path), writeSupport)
val testWriteSupport = new DirectWriteSupport(messageType, metadata)
case class ParquetWriterBuilder() extends
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I am a bit wondering though is why this has to be case class instead of just class. Because it seems it is self-contained, so it seems not have to be serializable and it seems there is no pattern matching with this. Just curious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats a good point, they probably don't need to be case classes (just happened to write that way when I started the code) - I'll switch them to regular classes.

@SparkQA
Copy link

SparkQA commented Aug 1, 2016

Test build #63062 has finished for PR 14419 at commit 2067b63.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val testWriteSupport = new DirectWriteSupport(messageType, metadata)
class ParquetWriterBuilder() extends
ParquetWriter.Builder[RecordConsumer => Unit, ParquetWriterBuilder](new Path(path)) {
@Override def getWriteSupport(conf: org.apache.hadoop.conf.Configuration) = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import Configuration? and maybe just write the trivial body of both methods inline without braces on one line. Just personal taste really, to keep this anonymous class compact

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure :)

@SparkQA
Copy link

SparkQA commented Aug 3, 2016

Test build #63183 has finished for PR 14419 at commit 1a37b8f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Aug 4, 2016

Merged to master

@asfgit asfgit closed this in c5eb1df Aug 4, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants