-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compressed CSV/JSON support #3642
Conversation
This looks very cool @Licht-T -- thank you for the contribution. Hopefully we'll get a chance to review it in the next few days |
Thanks, @alamb. The test works on my Windows environment but still fails on the Windows CI. I will push a debug code. |
I found
|
Now all green. Ready to get reviewed. |
async-trait = "0.1.41" | ||
bytes = "1.1" | ||
bzip2 = "0.4.3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any ideas how many dependencies we are adding? I guess some of them are already transitive dependencies through parquet.
I took a quick look through this PR and it looks great -- thank you @Licht-T Can you please merge / rebase from master to resolve conflicts so that I can merge it? Thanks again! |
@alamb Merged from master. I believe we need some documentation about this, right? I am willing to do that. |
Yes please @Licht-T -- that would be super helpful. Perhaps https://arrow.apache.org/datafusion/user-guide/sql/ddl.html?highlight=external+table and https://arrow.apache.org/datafusion/user-guide/sql/index.html?highlight=external+table ? If you are not able to do so, we should at least file a ticket to document the feature |
@@ -101,6 +105,7 @@ ctor = "0.1.22" | |||
doc-comment = "0.3" | |||
env_logger = "0.9" | |||
fuzz-utils = { path = "fuzz-utils" } | |||
rstest = "0.15.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Dandandan what do we think about adding this new testing library?
Thanks again @Licht-T |
Benchmark runs are scheduled for baseline = 58afdf7 and contender = b8a3a78. b8a3a78 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
@Licht-T Were you able to test this change? I am trying these csv options and it simple returns nothing on a valid .gz file.
|
Which issue does this PR close?
Closes #3641.
Rationale for this change
Explained in #3641.
What changes are included in this PR?
FileCompressionType
as the text file compression type definition.Read/Stream
.COMPRESSION TYPE
SQL token.Are there any user-facing changes?
Yes.
I am not sure which are public APIs, but maybe yes.