create and use hadoop-palantir profile #54


Closed
sjrand wants to merge 3 commits from the sr/use-palantir-hadoop branch

Conversation

@sjrand (author) commented Nov 14, 2016

No description provided.

@sjrand (author) commented Nov 14, 2016

@ash211, @robert3005, @pwoody, here's the version of #51 that will (hopefully) not fail the build.

@sjrand (author) commented Nov 15, 2016

Some classpath fun to work out before this is g2g. I tried running Spark 2.1-palantir15 (I know, I should be using palantir16/17 instead) built against Hadoop 2.9.0-SNAPSHOT-palantir2, and got this when starting up a spark-shell session:

java.lang.IllegalAccessError: tried to access field org.apache.hadoop.hdfs.server.namenode.ha.AbstractNNFailoverProxyProvider.fallbackToSimpleAuth from class org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
  at org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.getProxy(ConfiguredFailoverProxyProvider.java:124)
  at org.apache.hadoop.io.retry.RetryInvocationHandler$ProxyDescriptor.<init>(RetryInvocationHandler.java:195)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:304)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:298)
  at org.apache.hadoop.io.retry.RetryProxy.create(RetryProxy.java:59)
  at org.apache.hadoop.hdfs.NameNodeProxiesClient.createHAProxy(NameNodeProxiesClient.java:308)
  at org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:135)
  at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:343)
  at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:287)
  at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:156)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2904)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:101)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2941)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2923)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:183)
  at org.apache.spark.deploy.yarn.Client$$anonfun$7.apply(Client.scala:122)
  at org.apache.spark.deploy.yarn.Client$$anonfun$7.apply(Client.scala:122)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.deploy.yarn.Client.<init>(Client.scala:122)
  at org.apache.spark.deploy.yarn.Client.<init>(Client.scala:69)
  at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:55)
  at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:154)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:501)
  at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2296)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:843)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:835)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:835)
  at org.apache.spark.repl.Main$.createSparkSession(Main.scala:95)
  ... 47 elided

Will (hopefully) report back with more info.
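An IllegalAccessError like the one above is the classic signature of mixing two Hadoop versions on one classpath: ConfiguredFailoverProxyProvider from one version ends up linking against AbstractNNFailoverProxyProvider from another, in which the field's visibility differs. A minimal diagnostic sketch (hypothetical, not part of this PR) that can be pasted into the same spark-shell session to see which JAR each class actually came from:

// Hypothetical diagnostic: print the source JAR of each suspect class.
// Two different locations would confirm a mixed Hadoop classpath.
Seq(
  "org.apache.hadoop.hdfs.server.namenode.ha.AbstractNNFailoverProxyProvider",
  "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
).foreach { name =>
  val cls = Class.forName(name)
  val src = Option(cls.getProtectionDomain.getCodeSource).map(_.getLocation)
  println(s"$name -> ${src.getOrElse("(bootstrap)")}")
}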

@robert3005 commented Nov 15, 2016
Looks like you have mismatched Hadoop libs. You should verify that you only bring in the version you want.


@sjrand (author) commented Nov 15, 2016

Yep, verbose classloading found the 2.7.3 JARs that were sneaking in. More weirdness now, but should be manageable. Will comment once it's working.

(Wasn't a problem with this PR, just with the way I was deploying it.)
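For reference, "verbose classloading" here is the JVM's -verbose:class flag, which logs every loaded class together with the JAR it came from. A minimal sketch of wiring it up, with hypothetical settings; note that the matching driver option has to be passed at launch rather than set programmatically:

// Minimal sketch: have executor JVMs log every class they load and its
// source JAR. The driver-side equivalent (spark.driver.extraJavaOptions)
// must be supplied at launch, e.g. in spark-defaults.conf or via --conf,
// since the driver JVM is already running by the time this executes.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.executor.extraJavaOptions", "-verbose:class")
  .getOrCreate()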

@sjrand (author) commented Nov 15, 2016

Bleh, now AMRMClientImpl is throwing an NPE on:

List<ResourceRequestInfo<T>> matchingRequests =
        remoteRequestsTable.getMatchingRequests(priority, resourceName,
            executionType, capability);

@sjrand (author) commented Nov 15, 2016

There's just a bug in YARN -- https://issues.apache.org/jira/browse/YARN-5753. They've fixed it on trunk but not in branch-2. I'm going to backport the fix to palantir-hadoop, and I've asked on the ticket whether they're willing to backport it to branch-2.

@sjrand (author) commented Nov 15, 2016

Whoo, more failures. The whack-a-mole continues.

16/11/15 08:11:29 INFO client.RMProxy: Connecting to ResourceManager at il-pg-alpha-569665.use1.palantir.global/10.0.25.166:8030
16/11/15 08:11:29 INFO yarn.YarnRMClient: Registering the ApplicationMaster
16/11/15 08:11:29 INFO yarn.YarnAllocator: Will request 2 executor container(s), each with 1 core(s) and 1408 MB memory (including 384 MB of overhead)
16/11/15 08:11:29 INFO yarn.YarnAllocator: Submitted 2 unlocalized container requests.
16/11/15 08:11:29 INFO yarn.ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
16/11/15 08:11:31 INFO retry.RetryInvocationHandler: Exception while invoking ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after sleeping for 30000ms.
java.io.EOFException: End of File Exception between local host is: "il-pg-alpha-569667.use1.palantir.global/10.0.26.72"; destination host is: "il-pg-alpha-569665.use1.palantir.global":8030; : java.io.EOFException; For more details see:  http://wiki.apache.org/hadoop/EOFException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:815)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:779)
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1487)
    at org.apache.hadoop.ipc.Client.call(Client.java:1429)
    at org.apache.hadoop.ipc.Client.call(Client.java:1339)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
    at com.sun.proxy.$Proxy15.allocate(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:335)
    at com.sun.proxy.$Proxy16.allocate(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:294)
    at org.apache.spark.deploy.yarn.YarnAllocator.allocateResources(YarnAllocator.scala:265)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$1.run(ApplicationMaster.scala:458)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1786)
    at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1157)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1054)

Not an RPC version mismatch -- version is 9 on both branch-2 and 2.6.0-cdh5.8.2.
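(For the record, the wire version being compared comes from Hadoop's IPC layer; a hypothetical one-liner to print it from whichever hadoop-common is on the classpath, assuming the constant lives at org.apache.hadoop.ipc.RpcConstants.CURRENT_VERSION:)

// Hypothetical sanity check, run against each side's Hadoop JARs:
// prints the IPC wire version this client was built with. Per the
// comment above, it is 9 for both branch-2 and 2.6.0-cdh5.8.2.
println(org.apache.hadoop.ipc.RpcConstants.CURRENT_VERSION)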

@sjrand (author) commented Nov 15, 2016

And now it inexplicably works, at least running a trivial job:

scala> sc.parallelize(1 to 10).collect()
res0: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

EDIT: It usually fails. Not sure what was happening the time it worked.

@sjrand (author) commented Nov 30, 2016

A note on trying to use more recent hadoop-aws code: taking a Spark build that's entirely against 2.7.3 and swapping out only the hadoop-aws JAR for the 2.9.0-SNAPSHOT version doesn't work. S3AFileSystem has a private S3AStorageStatistics storageStatistics; field, but S3AStorageStatistics extends StorageStatistics, which doesn't exist in 2.7.3.
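A minimal sketch of checking this from a 2.7.3 spark-shell (hypothetical probe, not part of the PR; StorageStatistics only arrived in Hadoop 2.8):

// Hypothetical probe on a pure 2.7.3 classpath: the parent type that
// S3AStorageStatistics extends is absent, so the 2.9.x S3AFileSystem
// cannot link against 2.7.3 hadoop-common.
try {
  Class.forName("org.apache.hadoop.fs.StorageStatistics")
  println("StorageStatistics present; a newer hadoop-aws might link")
} catch {
  case _: ClassNotFoundException =>
    println("StorageStatistics missing; 2.9.x hadoop-aws will fail to load")
}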

cc @pwoody since you were asking about this.

@robert3005 robert3005 closed this Jan 26, 2017
@robert3005 robert3005 deleted the sr/use-palantir-hadoop branch January 26, 2017 14:24
@sjrand (author) commented Jan 28, 2017

@robert3005 I'll make a new version of this once I have a patch for YARN-6013.

LorenzoMartini pushed a commit that referenced this pull request May 19, 2021
* Add flag to coerce rows to match schema

* Remove extra comment

* Fix python style
16pierre pushed a commit to 16pierre/spark that referenced this pull request May 24, 2021
* Add flag to coerce rows to match schema

* Remove extra comment

* Fix python style
rshkv added a commit that referenced this pull request May 25, 2021
* Add flag to coerce rows to match schema

* Remove extra comment

* Fix python style