feat: Add built-in CSV loader #19167
Conversation
If Bun ever works well in Jupyter notebooks, that would be an awesome feature.
Very exciting. Thank you for this. Initial thoughts:
@Jarred-Sumner thank you for the quick feedback :D
Will do 🫡
I don't know the answers directly, but I'll try to find some time this week to do more research.
Parsing

Both my go-to CSV library and the creator of PapaParse agree with the RFC that leading/trailing whitespace is part of the field (see the short sketch after this comment).

There was already a discussion about Unicode; AFAICT, all JS-based CSV parsers support it, so there is no reason why we shouldn't. I'll update the code to support it as well.

I'll stay consistent with the RFC and just allow any of those line-ending symbols at the place where the RFC uses CRLF.

Another feature present in other parsers is dynamic typing. I think we should skip all nice-to-haves at least until we can pass options to imports; otherwise the number of loaders will become unmanageable.

Test Suite

I've found some sets of exhaustive tests we can use / get inspired by (will have to check licenses). I'll implement them as soon as I find time 😅
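As a concrete illustration of that whitespace rule (the values below are made up, not fixtures from this PR):

```ts
// Illustrative only: per RFC 4180, whitespace around delimiters belongs to the
// field, so a conforming parser does not trim it.
const input = 'a, b ,"c"\r\n';
const expected = [["a", " b ", "c"]]; // " b " keeps its surrounding spaces; the quotes around c are stripped
```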
That's awesome! It would be great to have attributes for using the header, but also attributes for the field and row delimiters. That way, all "csv-like" formats (TSV, Excel CSV with ';', and even ASCII-delimited files) would work with a single implementation. Thank you for your work!
What does this PR do?
Bun already supports natively importing various file types, which makes quick scripting much easier. However, one commonly used and straightforward format, CSV, was missing. CSV is a ubiquitous, very basic format¹, and having built-in support for it would be a helpful addition.
This pull request adds new loaders that allow importing CSV files as JavaScript arrays of records (objects) or arrays. This implementation is minimal and slightly constrained due to the current limitations in passing import options². For example, it's not yet possible to do:
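Something along these lines, shown as a hypothetical sketch only (the attribute names `type`, `header`, and `delimiter` are illustrative, not an API this PR adds):

```ts
// Hypothetical import attributes for configuring the CSV loader; not currently
// possible in Bun. The attribute names here are illustrative.
import rows from "./data.csv" with { type: "csv", header: "false", delimiter: ";" };

console.log(rows);
```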
I've based this implementation on the official CSV specification (RFC 4180), and extended it to also support TSV (tab-separated values) files.
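For readers less familiar with RFC 4180, here is a minimal sketch of its quoting and line-ending rules; it is an illustration in TypeScript, not the Zig code added by this PR:

```ts
// Minimal illustration of RFC 4180 parsing rules: quoted fields, "" as an escaped
// quote inside a quoted field, and CRLF (or LF) as the record terminator.
function parseRecords(text: string, delimiter = ","): string[][] {
  const records: string[][] = [];
  let fields: string[] = [];
  let field = "";
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];

    if (inQuotes) {
      if (ch === '"') {
        if (text[i + 1] === '"') { field += '"'; i++; } // "" escapes a quote
        else inQuotes = false;                          // closing quote
      } else {
        field += ch; // delimiters and newlines are literal inside quotes
      }
    } else if (ch === '"' && field.length === 0) {
      inQuotes = true; // opening quote at the start of a field
    } else if (ch === delimiter) {
      fields.push(field);
      field = "";
    } else if (ch === "\r" || ch === "\n") {
      if (ch === "\r" && text[i + 1] === "\n") i++; // treat CRLF as one terminator
      fields.push(field);
      field = "";
      records.push(fields);
      fields = [];
    } else {
      field += ch; // note: surrounding whitespace is kept, per the RFC
    }
  }

  if (field.length > 0 || fields.length > 0) { // last record without a trailing newline
    fields.push(field);
    records.push(fields);
  }
  return records;
}

// parseRecords('a, b ,"c ""d"""\r\n1,2,3')
//   → [["a", " b ", 'c "d"'], ["1", "2", "3"]]
```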
Design Choices and Rationale
One design decision worth noting is the inclusion of four new loaders, rather than just a single `csv` loader. Originally, I intended to provide one generic loader. However, due to the current architecture, which doesn't allow accessing import options from within the loader itself (see this relevant section of the code), it made more sense to cover the most common use cases explicitly. These loaders handle two variables:

- The delimiter: a comma (`,` for CSV) or a tab (`\t` for TSV)
- Whether the first row is a header: `true` (default) or `false`

By covering these combinations, we support the most typical use cases out of the box.
To enable this, I've added multiple new module types in `packages/bun-types/extensions.d.ts`. The type of the default export depends on the presence of a header row. To distinguish these cases, I've used the `?no_header` query string (e.g., `data.csv?no_header`).
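A rough sketch of what those declarations can look like (the exact types in the PR's `extensions.d.ts` may differ; this is only meant to convey the shape):

```ts
// Illustrative only, not copied from the PR. With a header row, each data row
// becomes a record keyed by the header fields; without one, each row is a string array.
declare module "*.csv" {
  const rows: Record<string, string>[];
  export default rows;
}

declare module "*.csv?no_header" {
  const rows: string[][];
  export default rows;
}

// The same pattern would apply to "*.tsv" and "*.tsv?no_header".
```

Usage then stays a plain import: `import records from "./data.csv"` for objects keyed by the header, or `import rows from "./data.csv?no_header"` for raw string arrays.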
This approach works because the query string is part of the import specifier, so the header and no-header variants can be distinguished without any new import syntax.

Edge Case: Empty File Import
While writing tests, I encountered a bug related to importing empty files: Issue #19164. Currently, importing an empty file results in an empty object. However, for CSV and TSV, I believe the correct behavior should be to return an empty array, as the default export.
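In other words (the file name is illustrative):

```ts
// empty.csv is a zero-byte file. Today this import yields an empty object;
// the behavior proposed here is an empty array instead.
import rows from "./empty.csv";
console.log(rows); // expected: []
```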
Checklist
How did you verify your code works?
I’ve written tests for CSV and TSV imports, following the pattern of the existing TOML import tests.
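A sketch of what such a test can look like (fixture contents and paths are made up for illustration and may not match the tests actually added in this PR):

```ts
import { describe, expect, test } from "bun:test";

describe("CSV loader", () => {
  // people.csv (fixture): name,age\nAlice,30\nBob,25
  test("CSV with a header row imports as an array of records", async () => {
    const { default: rows } = await import("./people.csv");
    expect(rows).toEqual([
      { name: "Alice", age: "30" },
      { name: "Bob", age: "25" },
    ]);
  });

  test("CSV without a header row imports as string arrays", async () => {
    const { default: rows } = await import("./people.csv?no_header");
    expect(rows).toEqual([
      ["name", "age"],
      ["Alice", "30"],
      ["Bob", "25"],
    ]);
  });
});
```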
If Zig files changed:
`bun-debug test test-file-name.test`
I'm still new to Zig, so I haven’t yet verified the memory handling manually. If someone can guide me on how to do that, I’d love to learn!
This is my first contribution to Bun, so feedback is very welcome. Please let me know if I’ve missed anything, done something incorrectly, or should add more context or documentation.
It should also address this issue: #6722
Footnotes
1. The format is so old that it doesn't really change anymore, so once the parser is working it should require no further work in the future, meaning it should be a net positive (more features, nothing new to maintain). Of course, in the long run one could think about iterators, streaming from disk, SIMD, and other optimizations, but for now, not having to install anything is more than enough :)
2. I spent around 10 hours exploring how import options might be accessed within the loader; see this section of the source. From what I understand, parsing and transpiling are currently decoupled, and the loader is chosen based solely on the file extension. That makes it difficult to pass custom import options to the parser. This might be worth discussing or exploring in a future PR.