Column names are not read for ORC tables #1556
Comments
According to this Hive issue, this is a problem with ORC tables created through Hive. Given that the issue was reported in 2016 and is still open, I think it would be great to be able to assign column names on the fly/manually through the |
This isn't a high-priority issue for us right now, but you could accomplish this yourself by doing something like.
|
You could also use our Hive connection API. I believe that when we implemented this, we took into consideration the Hive issue you mention. |
Thanks for the swift replies as always! Regarding the hive connection, have you seen any performance difference between using the HDFS connector vs the Hive connector? @williamBlazing |
@wmalpica could you please follow up on this? |
Is your feature request related to a problem? Please describe.
Hi! I mostly work with parquet and csv files, but there are also some orc files in the db I use. I've noticed that the column names of my ORC tables are not inferred; instead, the column names default to _col0, _col1, _col2, ... Additionally, the names argument is only enabled for reading (/creating) csv tables (with create_table), hence I am not able to set the column names manually. My tables look like this when read with bsql
Describe the solution you'd like
I would like a names argument when file_format is set to orc in create_table: (https://github.com/BlazingDB/blazingsql/blob/92ed45f5af438fedc8cad82e4ef8ed3f3fb7eed6/docsrc/source/reference/python/tables/apache-orc.rst)
----For BlazingSQL Developers----
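To make the request concrete, here is a rough sketch of the mapping a names argument could perform for ORC tables. It is plain Python and not BlazingSQL code: the helper placeholder_rename_map is hypothetical, and it assumes that the number in a Hive placeholder such as _col1 is the position of that column in the supplied list of names.

```python
import re

# Hive-written ORC files may carry placeholder column names: _col0, _col1, ...
_PLACEHOLDER = re.compile(r"_col(\d+)")


def placeholder_rename_map(columns, names):
    """Build {placeholder: real_name} for every _colN entry in `columns`.

    Hypothetical helper for illustration only. Assumes N in `_colN` is the
    index of the matching entry in `names`.
    """
    mapping = {}
    for col in columns:
        match = _PLACEHOLDER.fullmatch(col)
        if match is None:
            continue  # already a real name, leave it untouched
        index = int(match.group(1))
        if index >= len(names):
            raise ValueError(f"no name supplied for column {col!r}")
        mapping[col] = names[index]
    return mapping


mapping = placeholder_rename_map(["_col0", "_col1", "_col2"], ["id", "price", "ts"])
print(mapping)  # {'_col0': 'id', '_col1': 'price', '_col2': 'ts'}
```

Until something like this exists in create_table, a mapping of this shape could presumably be applied by hand, for example by passing it to DataFrame.rename(columns=mapping) on a DataFrame loaded from the ORC file and then registering that DataFrame as a table. I have not tested that route against BlazingSQL.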
How and where should this be implemented?
In what part of the code should the feature be implemented? What should the APIs and/or classes look like?
Other design considerations
What components of the engine could be affected by this? What functions should we make sure we use/reuse?
Testing considerations?
What sort of unit tests and/or end-to-end tests should be implemented to test this?