
javaConversions doesn't work #19

Closed · Jacke opened this issue Apr 21, 2015 · 24 comments

@Jacke commented Apr 21, 2015

Hello. My environment: Scala 2.11, protobuf 3 alpha.

syntax = "proto3";

package grpc.playground.helloworld;

option java_multiple_files = true;
option java_package = "grpc.playground.helloworld";
option java_outer_classname = "HelloWorldProto";

// The greeting service definition.
service Greeter {
  rpc SayHello (HelloRequest) returns (HelloResponse) {}
}

// The request message containing the user's name.
message HelloRequest {
  string name = 1;
}

// The response message containing the greetings
message HelloResponse {
  string message = 1;
}

When I compiled it, the compiler showed me this:

HelloRequest.scala:50: type HelloRequest is not a member of object grpc.playground.helloworld.HelloWorldProto

This happens for every class. At first I thought it was an error with the package, but when I commented out the class name option in the proto file, it appeared again:


hello-world/target/scala-2.11/src_managed/main/compiled_protobuf/grpc/playground/helloworld/hello_world/HelloRequest.scala:50: type HelloRequest is not a member of object grpc.playground.helloworld.HelloWorld
[error] object HelloRequest extends com.trueaccord.scalapb.GeneratedMessageCompanion[HelloRequest] with com.trueaccord.scalapb.JavaProtoSupport[HelloRequest, grpc.playground.helloworld.HelloWorld.HelloRequest]  {


When I turn off java conversions it all works fine. Is this a bug?

@thesamet (Contributor)

ScalaPB is currently only compatible with proto2, not proto3 language level. Support for proto3 will be added soon.
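
For reference, a proto2 version of the messages above would look roughly like this (a minimal sketch; proto2 is the default when no syntax line is present, and it requires explicit field labels such as optional):

package grpc.playground.helloworld;

option java_multiple_files = true;
option java_package = "grpc.playground.helloworld";
option java_outer_classname = "HelloWorldProto";

message HelloRequest {
  optional string name = 1;
}

message HelloResponse {
  optional string message = 1;
}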

@Jacke (Author) commented May 2, 2015

Alright, we will wait for it. Thanks for the excellent library. We will use it in a new Google gRPC Scala project.

@vyshane commented Jul 13, 2015

Is there an ETA on proto3 support?

@thesamet (Contributor)

Proto3 is supported in v0.5.9. Can you try it and send feedback?
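
For anyone updating, a sketch of the corresponding plugin line in project/scalapb.sbt (this assumes the sbt-scalapb plugin version tracks the ScalaPB release; check the published versions):

addSbtPlugin("com.trueaccord.scalapb" % "sbt-scalapb" % "0.5.9")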


@vyshane commented Jul 13, 2015

Awesome, I'll try it out.

@thesamet (Contributor)

Closing. Please feel free to re-open if there are additional issues.

@skoppar commented May 23, 2016

I am facing a lot of errors like the one below:
/Users/skoppar/workspace/NrwlSbt/target/scala-2.11/src_managed/main/compiled_protobuf/general/general.scala:511: value general is not a member of object general.general

I am using protobuf 2.5.0, as sparksql-protobuf didn't work with 2.6.1. My Scala version, however, is 2.11.

I have changed scalapb.sbt to be:
libraryDependencies ++= Seq(
  "com.github.os72" % "protoc-jar" % "2.5.0.0"
)
addSbtPlugin("com.trueaccord.scalapb" % "sbt-scalapb" % "0.5.26")

and build.sbt to have:
PB.protobufSettings
PB.scalapbVersion := "0.5.26"
PB.runProtoc in PB.protobufConfig := (args => com.github.os72.protocjar.Protoc.runProtoc(args.toArray))

Without these settings, protobuf 3.0.0-beta was being used and it was not working.

@thesamet (Contributor)

Are you using sparksql-protobuf or sparksql-scalapb (https://github.com/trueaccord/sparksql-scalapb)? The latter is known to work with 2.6.1, and ScalaPB is no longer tested with 2.5.0.

I suggest using the latest protoc-jar, but telling it to use protoc 2.6.1:

PB.runProtoc in PB.protobufConfig := (args =>
  com.github.os72.protocjar.Protoc.runProtoc("-v261" +: args.toArray))

If this doesn't help, can you create a small project that demonstrates the issue?

@skoppar commented May 23, 2016

Thanks for the quick response. I am using https://github.com/saurfang/sparksql-protobuf. I wasn't aware of https://github.com/trueaccord/sparksql-scalapb. Thanks for the heads up; hopefully these two can mix better. I will try this out.


@nadavsr (Contributor) commented May 23, 2016

Some more hints here: http://trueaccord.github.io/ScalaPB/sparksql.html


@skoppar commented May 23, 2016

Thank you :) I will keep you posted with my progress


@skoppar commented May 23, 2016

Hi Nadav,

Just sent you the project as a tgz file. Also FYI, the -v261 option in com.github.os72.protocjar.Protoc.runProtoc("-v261" +: args.toArray) doesn't work. It throws the error below during sbt compile:

Unknown flag: -v
java.lang.RuntimeException: protoc returned exit code: 1
  at scala.sys.package$.error(package.scala:27)
  at sbtprotobuf.ProtobufPlugin$.sbtprotobuf$ProtobufPlugin$$compile(ProtobufPlugin.scala:88)
  at sbtprotobuf.ProtobufPlugin$$anonfun$sourceGeneratorTask$1$$anonfun$5.apply(ProtobufPlugin.scala:115)
  at sbtprotobuf.ProtobufPlugin$$anonfun$sourceGeneratorTask$1$$anonfun$5.apply(ProtobufPlugin.scala:114)

To work around it, I just removed the flag.

@thesamet (Contributor)

The -v261 flag will work if you update protoc-jar to the latest version. I verified it works with 3.0.0-b3.
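
For illustration, the corresponding change in scalapb.sbt would look roughly like this (assuming "3.0.0-b3" here refers to the protoc-jar artifact version; verify against the published artifacts):

libraryDependencies ++= Seq(
  "com.github.os72" % "protoc-jar" % "3.0.0-b3"
)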

For the original issue, the problem is that your proto files don't have a package declaration (or java_package option). Try adding package protos; to the top of each proto file. I'll add this to a future FAQ page.
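
For illustration, a minimal sketch of the fix at the top of each .proto file (the package name protos is just the example from above):

package protos;

// ... message definitions unchanged ...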

@skoppar commented May 23, 2016

Thank you. It works now.

FYI:
I could get it to compile, but when I try executing it, I get "object already defined" errors. I was able to resolve this with the help of http://stackoverflow.com/questions/16885489/intellij-compile-failures-is-already-defined-as

Just in case you would like to update the FAQ

Details below:

import com.xxx.proto.General

val sfSchema = ProtoSQL.schemaFor[General]

I get errors like:


@skoppar commented May 25, 2016

Hi Nadav,

I am trying to extend the DataSources API and facing challenges, as the library expects the protobuf in Parquet format. The file I have is encoded in base64.

override def buildScan(): RDD[Row] = {
  sqlContext.read.load(location).map(ProtoSQL.messageToRow(_)).rdd
}

This does not work, as messageToRow expects an object T and not a Row. Is there an easier way to do this? I can see that the library already has ProtoParquetWriter, so I am hoping I will be able to convert from base64 protobuf to protobuf Parquet. Appreciate your help.


@thesamet (Contributor)

You can do it in two steps: convert the base64 string to Array[Byte] using decodeBase64 (https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base64.html), then use protoMsg.parseFrom(arrayOfBytes) and finally map it through messageToRow.
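
Putting the two steps together, a minimal sketch (MyProto and location are hypothetical names; ProtoSQL is sparksql-scalapb's helper as used elsewhere in this thread):

import org.apache.commons.codec.binary.Base64

// base64 text file -> ScalaPB messages -> SparkSQL Rows
val rows = sqlContext.sparkContext.textFile(location)
  .map(line => MyProto.parseFrom(Base64.decodeBase64(line)))  // decode, then parse the bytes
  .map(ProtoSQL.messageToRow(_))                              // message -> Row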


@skoppar commented May 25, 2016

Thanks for your suggestions. I'm probably being dumb :(. It still doesn't work.

override def buildScan(): RDD[Row] = {
  sqlContext.sparkContext.textFile(location)
    .map { line => Accesslog.accesslog.parseFrom(Base64.decodeBase64(line)) }
    .map(ProtoSQL.messageToRow(_))
}

From the IDE, it seems to recognize the method calls and accept all signatures, but when I try to compile, I still get:

[error] /Users/skoppar/workspace/NrwlSbt/src/main/scala/com/aol/customreader/proto/ProtoRelation.scala:39: inferred type arguments [com.aol.proto.Accesslog.accesslog] do not conform to method messageToRow's type parameter bounds [T <: com.trueaccord.scalapb.GeneratedMessage with com.trueaccord.scalapb.Message[T]]
[error]   .map(ProtoSQL.messageToRow(_))
[error]        ^
[error] /Users/skoppar/workspace/NrwlSbt/src/main/scala/com/aol/customreader/proto/ProtoRelation.scala:39: type mismatch;
[error]  found   : com.aol.proto.Accesslog.accesslog
[error]  required: T
[error]   .map(ProtoSQL.messageToRow(_))
[error]        ^
[error] two errors found

The generated case class does have:

extends com.trueaccord.scalapb.GeneratedMessage with com.trueaccord.scalapb.Message[accesslog] with com.trueaccord.lenses.Updatable[accesslog]

Why is it not recognized as a GeneratedMessage?


@skoppar commented May 25, 2016

Also, I'm curious as to why the method resolution turns out to be the Java implementation. I thought the Java and Scala classes could be used independently. The method being invoked resolves to the definition below in AccessLog.java:

public static com.aol.proto.Accesslog.accesslog parseFrom(byte[] data)
    throws com.google.protobuf.InvalidProtocolBufferException {
  return PARSER.parseFrom(data);
}

The package structure (com.aol.proto.Accesslog.accesslog) is that of the Scala classes, but the actual definition is in the Java file.


@thesamet (Contributor)

Can you upload a test project?


@skoppar commented May 26, 2016

In the process of creating a test project, I realized that I was using the Java class name instead of the Scala package name, which differ in my case. Sorry for that; I could see the earlier errors resolve. However, it is still looking for a Parquet file, and it complains at runtime that the data is not in Parquet. I will create a test project for this shortly. I will have to mock up some data as well.

FYI, in the buildScan line from my earlier comment, the only change I made is accesslog.accesslog instead of Accesslog.accesslog.
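
For reference, a sketch of the corrected method (names as used earlier in this thread; the generated Scala companion lives under the file-derived package accesslog, not the Java outer class Accesslog):

override def buildScan(): RDD[Row] = {
  sqlContext.sparkContext.textFile(location)
    .map { line => accesslog.accesslog.parseFrom(Base64.decodeBase64(line)) }  // Scala companion, not the Java class
    .map(ProtoSQL.messageToRow(_))
}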

@skoppar commented May 28, 2016

Attaching a publicly sharable project for this exercise. Please note, I was not able to get a sample log file. The schema is a partial Person object from the Google Protocol Buffers tutorials:

|-- name: string (nullable = false)
|-- id: integer (nullable = false)

The error on executing this shows:
16/05/28 01:45:20 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).
  at com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89)

This is probably because I couldn't mock up the data well.

The actual error I see on the main project is:

java.io.IOException: Could not read footer: java.lang.RuntimeException: file:/Users/skoppar/workspace/pyspark-beacon/stream/allproto.log is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [55, 73, 67, 10]
  at org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:248)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anonfun$24.apply(ParquetRelation.scala:812)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anonfun$24.apply(ParquetRelation.scala:801)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$22.apply(RDD.scala:756)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$22.apply(RDD.scala:756)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
  at org.apache.spark.scheduler.Task.run(Task.scala:85)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: file:/Users/skoppar/workspace/pyspark-beacon/stream/allproto.log is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [55, 73, 67, 10]
  at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:423)
  at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:238)
  at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:234)
  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
  ... 3 more

Is only Parquet supported? After seeing the code for sparksql-protobuf, I thought probably only Parquet would be readable; however, the base project https://github.com/twitter/elephant-bird#hadoop-input-and-output-formats mentions the formats I am looking for.

NOTE: This is a streaming application, so any sort of compression is not advisable for us. I am currently doing a load() instead of stream() only because the Spark version I am testing with does not support the stream functionality yet.


@thesamet (Contributor)

Let's move the discussion to our mailing list (https://groups.google.com/forum/#!forum/scalapb), as we are using GitHub Issues for bug reports.

  • It looks like the attachment is missing.
  • From the exception, it seems like the parquet file you are reading is corrupt.

Can't say much more without seeing the code or how you generate the test data.

@skoppar commented May 29, 2016

Not sure which is the mailing list; hopefully it's the one I am replying to. I will upload it again in a day. For now, can you just confirm whether the data needs to be Parquet or not?

Appreciate your timely responses. I didn't expect them over a long weekend.


@thesamet (Contributor)

I linked to the mailing list above. I'm not sure I understand your question "if data needs to be parquet or not": SparkSQL can read Parquet or JSON, and ScalaPB can generate both.
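
For the JSON route, a rough sketch (this assumes the scalapb-json4s module of that era, which exposed com.trueaccord.scalapb.json.JsonFormat; messages and outputPath are hypothetical names):

import com.trueaccord.scalapb.json.JsonFormat

// Write each message as one JSON line; SparkSQL can then read the
// result back with sqlContext.read.json(outputPath).
messages.map(JsonFormat.toJsonString(_)).saveAsTextFile(outputPath)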

thesamet added a commit that referenced this issue May 29, 2018