User guide is incorrect regarding using CLI to register CSV files using schema inference #3001

Closed
@andygrove

Description

Describe the bug

The user guide page at https://arrow.apache.org/datafusion/cli/index.html states: "It is necessary to provide schema information for CSV files since DataFusion does not automatically infer the schema when using SQL to query CSV files." This is not true. In the session below, the table is registered without a schema and the column names and types are inferred from the file:

DataFusion CLI v10.0.0
❯ create external table a stored as csv with header row location '/tmp/a.csv';
0 rows in set. Query took 0.017 seconds.
❯ select * from a;
+---+---+---+---+
| a | b | c | d |
+---+---+---+---+
| 1 | 2 | 3 | 4 |
+---+---+---+---+
1 row in set. Query took 0.011 seconds.
❯ describe a;
+-------------+-----------+-------------+
| column_name | data_type | is_nullable |
+-------------+-----------+-------------+
| a           | Int64     | NO          |
| b           | Int64     | NO          |
| c           | Int64     | NO          |
| d           | Int64     | NO          |
+-------------+-----------+-------------+
4 rows in set. Query took 0.017 seconds.

To Reproduce
See above.

Expected behavior
We should update the user guide to state that specifying a schema is optional.
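
As a rough sketch of what the corrected guide could show, the two forms might be placed side by side. This assumes the same /tmp/a.csv as in the session above. The table names a_inferred and a_explicit are hypothetical, and the BIGINT column types are an assumption chosen to match the Int64 types reported by describe.

```sql
-- Schema omitted: column names come from the header row and
-- types are inferred from the data, as in the session above.
CREATE EXTERNAL TABLE a_inferred
STORED AS CSV
WITH HEADER ROW
LOCATION '/tmp/a.csv';

-- Schema given explicitly: still supported, but optional.
-- Column names and types here are assumed to match the file.
CREATE EXTERNAL TABLE a_explicit (
    a BIGINT,
    b BIGINT,
    c BIGINT,
    d BIGINT
)
STORED AS CSV
WITH HEADER ROW
LOCATION '/tmp/a.csv';
```

Providing the schema explicitly may still be useful when the inferred types are not the ones wanted, which is one reason to describe it as optional rather than remove it from the guide.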

Additional context
None

Labels: bug, documentation, good first issue