
[Python] After upgrading pyarrow from 0.15 to 0.17.1, connecting to HDFS doesn't work with libhdfs (JNI) #17210

@asfimport

Description


Problem

After upgrading pyarrow from 0.15 to 0.17, I ran into some problems. I understand that libhdfs3 is no longer supported. However, in my case, libhdfs does not work either. See below.

My experience with the Hadoop ecosystem is limited, so I may have made some mistakes. I installed Hortonworks HDP through Ambari on a virtual machine running on my PC.

Here is what I tried:

1. Just connect:

%xmode Verbose
import pyarrow as pa

hdfs = pa.hdfs.connect(host='hdp.test.com', port=8020, user='hdfs')

FileNotFoundError: [Errno 2] No such file or directory: 'hadoop': 'hadoop' ([#1.txt])

2. Bypass the check if driver == 'libhdfs' by passing driver=None:

%xmode Verbose

import pyarrow as pa

hdfs = pa.HadoopFileSystem(host='hdp.test.com', port=8020, user='hdfs', driver=None)

OSError: Unable to load libjvm: /usr/java/latest//lib/server/libjvm.so: cannot open shared object file: No such file or directory ([#2.txt])

3. With libhdfs3, it works:

import hdfs3 

hdfs = hdfs3.HDFileSystem(host='hdp.test.com', port=8020, user='hdfs')

# List the remote folder
hdfs.ls('/data/', detail=False)

['/data/TimeSheet.2020-04-11', '/data/test', '/data/test.json']
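Both pyarrow errors above name something that could not be found from the Python process: step 1 names a `hadoop` executable, and step 2 names a `libjvm.so` path. The following is a minimal diagnostic sketch, not part of pyarrow, that reports whether a `hadoop` executable is visible via `HADOOP_HOME/bin` or `PATH`, and which `libjvm.so` files exist under `JAVA_HOME`. The helper names `find_hadoop_executable` and `find_libjvm` are hypothetical, and the candidate subdirectory list is an assumption that combines the `lib/server` layout from the step 2 error with the `jre/lib/amd64/server` layout from the `find` output under Environment. Run it in the same Python process or Jupyter kernel as pyarrow, since environment variables are per process.

```python
import os
import shutil
from typing import List, Mapping, Optional

# Assumed Java-home layouts that may contain libjvm.so.
# This is an illustrative list, not pyarrow's own search order.
LIBJVM_SUBDIRS = [
    "jre/lib/amd64/server",
    "lib/amd64/server",
    "jre/lib/server",
    "lib/server",
]


def find_hadoop_executable(env: Mapping[str, str]) -> Optional[str]:
    """Return a `hadoop` executable from $HADOOP_HOME/bin or $PATH, else None."""
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home:
        candidate = os.path.join(hadoop_home, "bin", "hadoop")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # shutil.which() returns None when PATH is empty or contains no match.
    return shutil.which("hadoop", path=env.get("PATH", ""))


def find_libjvm(java_home: Optional[str]) -> List[str]:
    """Return every existing libjvm.so under the assumed subdirectories."""
    if not java_home:
        return []
    candidates = (os.path.join(java_home, sub, "libjvm.so") for sub in LIBJVM_SUBDIRS)
    return [c for c in candidates if os.path.isfile(c)]


if __name__ == "__main__":
    env = os.environ
    print("hadoop executable:", find_hadoop_executable(env))
    print("JAVA_HOME:", env.get("JAVA_HOME"))
    print("libjvm.so found:", find_libjvm(env.get("JAVA_HOME")))
```

An empty result only means the file is absent from the locations this sketch inspects; pyarrow itself may search other places.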

Environment

Client PC:

OS: Debian 10. Development environment: Anaconda3 (Python 3.7.6), JupyterLab 2, pyarrow 0.17.1 (from conda-forge)

Hadoop (on VM – Oracle VirtualBox):

OS: Oracle Linux 7.6. Distribution: Hortonworks HDP 3.1.4

libhdfs.so:

[root@hdp /]# find / -name libhdfs.so
/usr/lib/ams-hbase/lib/hadoop-native/libhdfs.so
/usr/hdp/3.1.4.0-315/usr/lib/libhdfs.so
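The pyarrow documentation describes `ARROW_LIBHDFS_DIR` as an optional, explicit location of `libhdfs.so` for when it is not installed in `$HADOOP_HOME/lib/native`. A minimal sketch, assuming that lookup order, that lists which of those two locations actually contain `libhdfs.so` for a given environment. `find_libhdfs` is a hypothetical helper, not a pyarrow function.

```python
import os
from typing import List, Mapping


def find_libhdfs(env: Mapping[str, str]) -> List[str]:
    """Return existing libhdfs.so files, in the order they are checked here.

    Hypothetical helper: checks $ARROW_LIBHDFS_DIR first, then
    $HADOOP_HOME/lib/native, following the pyarrow documentation's
    description of where libhdfs.so is expected.
    """
    dirs = []
    if env.get("ARROW_LIBHDFS_DIR"):
        dirs.append(env["ARROW_LIBHDFS_DIR"])
    if env.get("HADOOP_HOME"):
        dirs.append(os.path.join(env["HADOOP_HOME"], "lib", "native"))
    found = []
    for d in dirs:
        candidate = os.path.join(d, "libhdfs.so")
        if os.path.isfile(candidate):
            found.append(candidate)
    return found


if __name__ == "__main__":
    print(find_libhdfs(os.environ) or "libhdfs.so not found")
```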

 

Java path:

[root@hdp /]# sudo alternatives --config java

------------------------------------------------
*+ 1           java-1.8.0-openjdk.x86_64 (/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre/bin/java)

 

libjvm:               

[root@hdp /]# find / -name libjvm.*
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre/lib/amd64/server/libjvm.so
/usr/jdk64/jdk1.8.0_112/jre/lib/amd64/server/libjvm.so

 

I tried many settings. The latest ones are below:

  1. /etc/profile:
    ...
    export JAVA_HOME=$(dirname $(dirname $(readlink $(readlink $(which javac)))))
    export JRE_HOME=$JAVA_HOME/jre
    export JAVA_CLASSPATH=$JAVA_HOME/jre/lib:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
    export HADOOP_HOME=/usr/hdp/3.1.4.0-315/hadoop
    export HADOOP_CLASSPATH=$(find $HADOOP_HOME -name '*.jar' | xargs echo | tr ' ' ':')
    export ARROW_LIBHDFS_DIR=/usr/lib/ams-hbase/lib/hadoop-native

    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    export CLASSPATH=.:$CLASSPATH:$JAVA_CLASSPATH:$HADOOP_CLASSPATH

    export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JRE_HOME/lib/amd64/server
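According to the pyarrow documentation, the libhdfs driver also needs `CLASSPATH` to contain the Hadoop jars. The sketch below is a hypothetical helper named `inspect_classpath`: it splits a classpath string, reports whether any entry looks like a `hadoop-common` jar (an assumed marker for the Hadoop client jars), and lists entries that do not exist on disk, which makes typos in an assembled value easy to spot.

```python
import os
import re
from typing import List, Tuple

# Assumption: a usable Hadoop client classpath includes a hadoop-common jar.
HADOOP_COMMON_JAR = re.compile(r"hadoop-common[^/]*\.jar$")


def inspect_classpath(classpath: str) -> Tuple[bool, List[str]]:
    """Return (has_hadoop_common, missing_entries) for a classpath string.

    Hypothetical diagnostic helper. Empty entries are ignored; entries that
    do not exist on disk are reported so that typos stand out.
    """
    entries = [e for e in classpath.split(os.pathsep) if e]
    has_common = any(HADOOP_COMMON_JAR.search(e) for e in entries)
    missing = [e for e in entries if not os.path.exists(e)]
    return has_common, missing


if __name__ == "__main__":
    ok, missing = inspect_classpath(os.environ.get("CLASSPATH", ""))
    print("hadoop-common jar on CLASSPATH:", ok)
    print("entries that do not exist:", missing)
```

This check is deliberately simple: wildcard entries such as `lib/*` are reported as missing because they are not literal paths.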

     
     

Reporter: Pavel Dourugyan

Original Issue Attachments:

Note: This issue was originally created as ARROW-8988. Please see the migration documentation for further details.

Metadata


Assignees: No one assigned
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
