@zeroshade
Member

https://issues.apache.org/jira/browse/ARROW-2661

Both the JNI driver and libhdfs3 support hdfsBuilderConfSetStr, so we can use it to pass arbitrary configuration values for the HDFS connection, similar to how hdfs3 supports passing them (see https://hdfs3.readthedocs.io/en/latest/hdfs.html).

I've added a parameter called extra_conf to support this in pyarrow, for example:

import pyarrow
conf = {"dfs.nameservices": "nameservice1",
        "dfs.ha.namenodes.nameservice1": "namenode113,namenode188",
        "dfs.namenode.rpc-address.nameservice1.namenode113": "hostname_of_server1:8020",
        "dfs.namenode.rpc-address.nameservice1.namenode188": "hostname_of_server2:8020",
        "dfs.namenode.http-address.nameservice1.namenode113": "hostname_of_server1:50070",
        "dfs.namenode.http-address.nameservice1.namenode188": "hostname_of_server2:50070",
        "hadoop.security.authentication": "kerberos"
}
hdfs = pyarrow.hdfs.connect(host='nameservice1', driver='libhdfs3', extra_conf=conf)
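
Writing the per-namenode keys by hand makes it easy to repeat a key, and a repeated key in a Python dict literal silently keeps only the last value. As a rough sketch, the same dict could be generated from a mapping of namenode IDs to hostnames. The helper name build_ha_conf and its parameters are hypothetical and not part of pyarrow; only the property names come from the example above.

```python
def build_ha_conf(nameservice, namenodes, rpc_port=8020, http_port=50070,
                  kerberos=False):
    """Build an extra_conf-style dict for a single HA nameservice.

    Hypothetical helper, not part of pyarrow. `namenodes` maps a
    namenode ID to the hostname of the server running it.
    """
    conf = {
        "dfs.nameservices": nameservice,
        # Joining a dict yields its keys (the namenode IDs) in insertion order.
        "dfs.ha.namenodes.{}".format(nameservice): ",".join(namenodes),
    }
    for node_id, host in namenodes.items():
        # One RPC and one HTTP address entry per namenode ID.
        rpc_key = "dfs.namenode.rpc-address.{}.{}".format(nameservice, node_id)
        http_key = "dfs.namenode.http-address.{}.{}".format(nameservice, node_id)
        conf[rpc_key] = "{}:{}".format(host, rpc_port)
        conf[http_key] = "{}:{}".format(host, http_port)
    if kerberos:
        conf["hadoop.security.authentication"] = "kerberos"
    return conf


conf = build_ha_conf(
    "nameservice1",
    {"namenode113": "hostname_of_server1", "namenode188": "hostname_of_server2"},
    kerberos=True,
)
# The result can then be passed as extra_conf=conf in the connect() call above.
```

Because every key is derived from a distinct namenode ID, each namenode gets exactly one rpc-address and one http-address entry.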

Matthew Topol added 2 commits June 1, 2018 18:01
@codecov-io

codecov-io commented Jun 1, 2018

Codecov Report

Merging #2097 into master will decrease coverage by 0.02%.
The diff coverage is 12.5%.

@@            Coverage Diff             @@
##           master    #2097      +/-   ##
==========================================
- Coverage   86.39%   86.36%   -0.03%     
==========================================
  Files         230      230              
  Lines       40488    40414      -74     
==========================================
- Hits        34979    34904      -75     
- Misses       5509     5510       +1

| Impacted Files | Coverage Δ |
| cpp/src/arrow/io/hdfs.h | 50% <ø> (ø) ⬆️ |
| cpp/src/arrow/io/hdfs.cc | 0.33% <0%> (-0.01%) ⬇️ |
| python/pyarrow/hdfs.py | 36.84% <0%> (ø) ⬆️ |
| cpp/src/arrow/io/hdfs-internal.cc | 33.62% <0%> (-0.45%) ⬇️ |
| cpp/src/arrow/io/hdfs-internal.h | 100% <100%> (ø) ⬆️ |
| python/pyarrow/array.pxi | 66.12% <0%> (-2.42%) ⬇️ |
| python/pyarrow/types.pxi | 57.9% <0%> (-1.6%) ⬇️ |
| python/pyarrow/feather.pxi | 67.44% <0%> (-1.45%) ⬇️ |
| python/pyarrow/table.pxi | 68.69% <0%> (-1.07%) ⬇️ |
| python/pyarrow/io.pxi | 61.81% <0%> (-0.96%) ⬇️ |
... and 15 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 79a2207...047dd4b.

int port;
std::string user;
std::string kerb_ticket;
std::map<std::string, std::string> extra_conf;
Member

Is ordering relevant? If not, please use an std::unordered_map

@wesm wesm changed the title from "[ARROW-2661] Adding the ability to programmatically pass hdfs configration key/value pairs in the C++ and via pyarrow" to "ARROW-2661: [Python] Adding the ability to programmatically pass hdfs configration key/value pairs via pyarrow" Jun 2, 2018
@zeroshade
Member Author

@xhochy Switched over to unordered_map as requested. Honestly, I should have thought to do that in the first place, haha. 😃

Member

@xhochy xhochy left a comment

+1, LGTM

Thanks for adding this!

@xhochy xhochy closed this in b1d1633 Jun 4, 2018
@wesm
Member

wesm commented Jun 4, 2018

There's no documentation for this -- can we add to the docstring and/or Sphinx? Feel free to open a new JIRA so we don't forget

@wesm
Member

wesm commented Jun 5, 2018

@zeroshade zeroshade deleted the configs branch September 12, 2021 19:34