connect hdfs error #29187

Open
asfimport opened this issue Aug 3, 2021 · 6 comments
Comments

@asfimport
Collaborator

When I use pyarrow to connect to my HDFS cluster, I get an error.

I use:

from pyarrow import fs
print(fs.FileSystem.from_uri("hdfs://"))

The error is:

loadFileSystems error:
(unable to get root cause for java.lang.NoClassDefFoundError)
(unable to get stack trace for java.lang.NoClassDefFoundError)
hdfsBuilderConnect(forceNewInstance=1, nn=hdfs://, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
(unable to get root cause for java.lang.NoClassDefFoundError)
(unable to get stack trace for java.lang.NoClassDefFoundError)
/arrow/cpp/src/arrow/filesystem/hdfs.cc:51: Failed to disconnect hdfs client: IOError: HDFS hdfsFS::Disconnect failed, errno: 9 (Bad file descriptor)
Traceback (most recent call last):
File "/home/tdops/fucheng.pan/ray-code/read.py", line 15, in <module>
print(fs.FileSystem.from_uri("hdfs:"))
File "pyarrow/_fs.pyx", line 347, in pyarrow._fs.FileSystem.from_uri
File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
OSError: HDFS connection failed

Reporter: cheng pan

Note: This issue was originally created as ARROW-13535. Please see the migration documentation for further details.

@asfimport
Collaborator Author

Ian Cook / @ianmcook:
To refer to the HDFS filesystem root without specifying the host, I think you need to use three slashes after the colon, like "hdfs:///"
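A rough illustration of why the number of slashes matters. This uses Python's standard `urllib.parse` as a stand-in, not Arrow's own URI parser, so treat it only as a sketch of generic URI splitting. The host name `nameservice1` and the path are example values.

```python
from urllib.parse import urlsplit


def split_hdfs_uri(uri):
    """Return (authority, path) for a URI using generic URI rules.

    Note: this is the standard-library parser, used here only to show
    where the host part ends and the path begins.
    """
    parts = urlsplit(uri)
    return parts.netloc, parts.path


if __name__ == "__main__":
    examples = (
        "hdfs:///user/tdops/1.parquet",              # empty authority, path from the root
        "hdfs://nameservice1/user/tdops/1.parquet",  # authority is the name service
        "hdfs:///nameservice1/user/tdops/1.parquet", # name service ends up inside the path
    )
    for uri in examples:
        authority, path = split_hdfs_uri(uri)
        print(f"{uri!r}: authority={authority!r}, path={path!r}")
```

With three slashes the authority is empty, so whatever follows is read as part of the path. With two slashes the first component is read as the host or name service.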

@asfimport
Collaborator Author

cheng pan:
I tried the three-slash form, "hdfs:///", like this:

fs.FileSystem.from_uri("hdfs:///nameservice1/user/tdops/1.parquet")

But it still failed. The error is:

loadFileSystems error:
(unable to get root cause for java.lang.NoClassDefFoundError)
(unable to get stack trace for java.lang.NoClassDefFoundError)
hdfsBuilderConnect(forceNewInstance=1, nn=hdfs://, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
(unable to get root cause for java.lang.NoClassDefFoundError)
(unable to get stack trace for java.lang.NoClassDefFoundError)
/arrow/cpp/src/arrow/filesystem/hdfs.cc:51: Failed to disconnect hdfs client: IOError: HDFS hdfsFS::Disconnect failed, errno: 9 (Bad file descriptor)
Traceback (most recent call last):
print(fs.FileSystem.from_uri("hdfs:///nameservice1/user/tdops/1.parquet"))
File "pyarrow/_fs.pyx", line 347, in pyarrow._fs.FileSystem.from_uri
File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
OSError: HDFS connection failed

@radiophysicist

Any updates?

@westonpace
Member

The Arrow HDFS filesystem is a pretty thin wrapper around a vendored copy of libhdfs. I'm afraid that many of the maintainers here aren't very familiar with how libhdfs works.

@radiophysicist

As described in pandas-dev/pandas#50639, downgrading fsspec to version 2022.8.2 helped in my case.

@ysbarney

You need to configure CLASSPATH for libjvm.so.

Add this command before running the script:
export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath --glob)
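If setting the variable in the shell is not convenient, the same thing can probably be done from Python before the first HDFS connection is made. This is only a sketch: the helper names `hadoop_classpath` and `configure_classpath` are made up for illustration, it assumes `HADOOP_HOME` is already set, and it assumes the JVM has not been started yet in the process (otherwise changing CLASSPATH is likely too late).

```python
import os
import subprocess


def hadoop_classpath(hadoop_home, run=subprocess.run):
    """Return the output of `$HADOOP_HOME/bin/hadoop classpath --glob`."""
    hadoop_bin = os.path.join(hadoop_home, "bin", "hadoop")
    result = run(
        [hadoop_bin, "classpath", "--glob"],
        check=True,           # raise if the hadoop command fails
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode stdout as str
    )
    return result.stdout.strip()


def configure_classpath(environ=os.environ, run=subprocess.run):
    """Set CLASSPATH in `environ` from HADOOP_HOME and return the new value.

    Hypothetical helper: raises KeyError if HADOOP_HOME is not set.
    """
    environ["CLASSPATH"] = hadoop_classpath(environ["HADOOP_HOME"], run=run)
    return environ["CLASSPATH"]
```

Calling `configure_classpath()` at the top of the script, before `fs.FileSystem.from_uri(...)`, would then mirror the `export` line above. The `environ` and `run` parameters exist only so the helper can be tried out without a Hadoop installation.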
