-
Notifications
You must be signed in to change notification settings - Fork 33
Description
When reading an .xlsx file using ST_READ() from the duckdb npm package , the date column is inconsistently interpreted.
- Some dates are returned as strings (e.g., '22/05/2000')
- Others are returned as Excel serial numbers (e.g., '37349' for '04/03/2002')
Environment :
npm package :
duckdb - 1.3.1
duckdb-async - 1.3.1
Platform - macos
node version - v18.20.8
Excel Editor: Microsoft Excel on macOS
My node js code below :
duckDbConnection.executeQuery( CREATE TABLE ${excelTempTableName} AS SELECT * FROM ST_READ('${filePath}', layer='${sheet}',
open_options=['HEADERS=FORCE','CELL_TEXT=YES'], allowed_drivers=["XLSX"]), )
steps to reproduce :
- create an xlsx file
-
Upload the excel and try to read the excel using duckdb npm package like the code i shared above.
-
Observed output :
so when i write below query to get the type of data for that column, i receive a bigint type.
const res = await duckDbConnection.executeQuery(SELECT * FROM ${excelTempTableName}); console.log('table ============> ', res);
- Expected output :
All date1 values should be consistently interpreted as proper date type.
I tried below things to fix :
- I formatted the date1 column as date in excel file but when i upload the file then that date1 column is interpreted as varchar type.
[
{
column_name: 'id',
column_type: 'INTEGER',
min: '1',
max: '7',
approx_unique: 5n,
avg: '3.6',
std: '2.4083189157584592',
q25: '2',
q50: '3',
q75: '6',
count: 5n,
null_percentage: 0
},
{
column_name: 'date1',
column_type: 'VARCHAR',
min: '22/05/2000',
max: '38689',
approx_unique: 5n,
avg: null,
std: null,
q25: null,
q50: null,
q75: null,
count: 5n,
null_percentage: 0
}
]
Request :
- Consistently infer date columns (preferably to DATE or ISO string format)
- Provide an option like PARSE_DATES=TRUE to always convert serials to readable date strings
- Respect CELL_TEXT=YES to return visible Excel values, not underlying serials