support format in options of COPY command#9744
Conversation
tinfoil-knight
left a comment
There was a problem hiding this comment.
Left some notes for the reviewer.
| ---- | ||
| 1 | ||
|
|
||
| query error DataFusion error: Invalid or Unsupported Configuration: This feature is not implemented: Unknown FileType: NOTVALIDFORMAT |
There was a problem hiding this comment.
The error feels a bit long to me compared to others. We can reduce it to:
DataFusion error: This feature is not implemented: Unknown FileType: NOTVALIDFORMAT
I didn't do this because all the other branches were wrapping downstream errors with DataFusionError::Configuration & then returning them.
WDYT?
There was a problem hiding this comment.
I agree the error could be nicer, but in this case I think we should keep the same basic pattern as the rest of the code (and perhaps update the pattern in a follow on PR)
devinjdangelo
left a comment
There was a problem hiding this comment.
This looks good to me, thank you @tinfoil-knight!
We could perhaps add a test to validate the behavior when both stored as and format are specified in the same query. (i.e. stored as should take precedent).
@devinjdangelo I've added the test. Thank you for the review. |
There was a problem hiding this comment.
Thank you @tinfoil-knight and @devinjdangelo -- the code looks good to me
I also tested it out locally and it worked great 🙏
DataFusion CLI v36.0.0
❯ COPY (select * from (values (1))) to '/tmp/copy.dat' OPTIONS (format parquet);
+-------+
| count |
+-------+
| 1 |
+-------+
1 row in set. Query took 0.030 seconds.
❯ \q
(venv) andrewlamb@Andrews-MacBook-Pro:~/Software/arrow-datafusion/datafusion-cli$ file /tmp/copy.dat
/tmp/copy.dat: Apache Parquet
While reviewing the PR I think it would be good to verify that the files created are actually the specified format as well as update the documentation. I'll make a follow on PR
Update: #9753
This reverts commit 40fb1b8.
Which issue does this PR close?
Closes #9713 .
Rationale for this change
Please see the description of the issue.
What changes are included in this PR?
If a
formatkey is passed in the format / write options of theCOPYcommand, we're using it to determine thefile_typewhich is used to resolve format options for theCOPYstatement.Are these changes tested?
Yes. New tests have been added.
Are there any user-facing changes?
This PR is to improve backwards compatibility before the next release so no new changes from user's PoV.