-
-
Notifications
You must be signed in to change notification settings - Fork 209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Can't Parse Dates from Excel files into DataFrame Properly #463
Comments
This fix work for me const df = new DataFrame(excelToJson(file_path)); excelToJson(file_path: string) { |
This is because of the default format of date columns when reading excel files with the xlsx package which we use internally. This can be solved by passing a couple of parsingOptions down to xlsx (see https://stackoverflow.com/questions/53163552/format-date-with-sheetjs). I'm working on a fix so you can pass extra options down to xlsx read/export function. So you will be able to do this: const df = await readExcel('test.xlsx', {
parsingOptions: { cellDates: true}
}) |
Hello, I'm new to using Danfo.js, which is quite impressive. I'm hoping that I'm so new that I simply don't know how to do what I'm trying to do, but it appears that the functionality isn't there (after looking at the documentation). I'll elaborate below.
Is your feature request related to a problem? Please describe.
I'm trying to add functionality to a web app where the user can upload an excel file that bulk uploads data into the backend. To do this, I'm using Danfo and readExcel() to ready a DataFrame, manipulate the data, and then convert the df to JSON to send to the API. One of the columns contains date data, represented as strings (with '/' or '-' characters in it). The problem I'm having is that when the data is read in, Danfo doesn't seem to know how to interpret it, and decides to do so as a float32 (which I'm assuming is the default, see additional context for examples). This ends up making the data completely un-human-readable as a five-digit number instead of a date string, and it may also be losing information though the casting, but I can't tell.
Describe the solution you'd like
I would like to be able to tell readExcel to parse certain columns as certain data types (in this case, string), similar to how you can assign dtype in pandas. I see when I print the dataframes to console that $dtypes has this information stored, but through much trial and error and documentation combing I was unable to figure out how to pass params or otherwise set dtypes. I tried passing an object since you cannot pass params by name in JS (something like
readExcel(file, { dtypes: ['int', 'int', string'] })
), but that didn't work either. Perhaps this is how it can be implemented.Describe alternatives you've considered
At the moment I've figured out how to do this through parsing CSV files, which Danfo has no problem with (same data). My users are not tech savvy, so even though this works I'd like to avoid making them export to .csv before they upload, but this can likely be done if it needs to be. I've also considered doing some math and trying to restore the date information from the float data, but I'd have to rack my brain over how the bits are being set and I'm not sure it's worth the effort. If there's an easy formula to recover the string data and reconvert it after it's been interpreted as float, I'm open to that as well. I've tried using df.asType('target', 'string') as well, but it looks like that just casts the float by converting to int then to string (returning a 5-digit number instead of reverting to the original chars).
Additional context
Treat this as an appendix. Here are the data values I'm passing in and what I'm getting out:
The text was updated successfully, but these errors were encountered: