Closed
Labels
bug, good first issue, help wanted
Description
Describe the bug
I have a data set created by Apache Spark and I tried to query it from the DataFusion CLI. It failed, saying that a parquet file was corrupt.
❯ CREATE EXTERNAL TABLE store_sales STORED AS PARQUET LOCATION 'store_sales.dat';
0 rows in set. Query took 0.002 seconds.
❯ select count(*) from store_sales;
Parquet reader thread terminated due to error: ParquetError(General("Invalid Parquet file. Corrupt footer"))
I added some debug logging and found that it was actually trying to read the following file, which is not a Parquet file.
store_sales.dat/.part-00005-5142b177-bacb-499d-b14f-12de4b94d9d9-c000.snappy.parquet.crc
To Reproduce
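Create a non-Parquet file with a non-Parquet extension (for example, a .crc checksum file) and put it in a directory along with some valid Parquet files. Then register that directory as an external Parquet table and query it.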
Expected behavior
DataFusion should only attempt to read files with the .parquet file extension.
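To illustrate the expected behavior, here is a minimal sketch of extension-based filtering using only the Rust standard library. The helper name list_files_with_extension is hypothetical and is not DataFusion's actual API. It scans a single directory level only and may differ from how DataFusion lists files internally.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of DataFusion): return the regular files
/// directly under `dir` whose file names end with `ext`, e.g. ".parquet".
fn list_files_with_extension(dir: &Path, ext: &str) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // Compare against the file name only, so directory names never match.
        let name_matches = path
            .file_name()
            .and_then(|name| name.to_str())
            .map(|name| name.ends_with(ext))
            .unwrap_or(false);
        if name_matches && path.is_file() {
            files.push(path);
        }
    }
    // read_dir gives no ordering guarantee; sort for deterministic results.
    files.sort();
    Ok(files)
}
```

With a filter like this, a Spark checksum file such as .part-00005-5142b177-bacb-499d-b14f-12de4b94d9d9-c000.snappy.parquet.crc would be skipped because its name ends in .crc rather than .parquet.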
Additional context
None