
feat: Add built-in CSV loader #19167


Open · wants to merge 2 commits into main

Conversation

@mastermakrela commented Apr 21, 2025

What does this PR do?

Bun already supports natively importing various file types, which makes quick scripting much easier. However, one commonly used and straightforward format was missing: CSV. It is a ubiquitous, very basic format¹, and having built-in support for it would be a helpful addition.

This pull request adds new loaders that allow importing CSV files as JavaScript arrays of records (objects) or arrays of arrays. This implementation is minimal and slightly constrained by the current limitations in passing import options². For example, it's not yet possible to do:

import table from "./data.csv" with { type: "csv", header: "false" };

I've based this implementation on the official CSV specification (RFC 4180), and extended it to also support TSV (tab-separated values) files.

Design Choices and Rationale

One design decision worth noting is the inclusion of four new loaders, rather than just a single csv loader. Originally, I intended to provide one generic loader. However, due to the current architecture—which doesn't allow accessing import options from within the loader itself (see this relevant section of the code)—it made more sense to cover the most common use cases explicitly.

These loaders handle two variables:

  • The delimiter: either a comma (, for CSV) or a tab (\t for TSV)
  • The presence of a header row: either true (default) or false

By covering these combinations, we support the most typical use cases out of the box.
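
For illustration, the four covered combinations would be consumed roughly like this (file names are hypothetical; the ?no_header convention is explained below):

```ts
import records from "./data.csv";           // comma-delimited, header row -> array of objects
import rows from "./data.csv?no_header";    // comma-delimited, no header  -> array of arrays
import tsvRecords from "./data.tsv";        // tab-delimited, header row   -> array of objects
import tsvRows from "./data.tsv?no_header"; // tab-delimited, no header    -> array of arrays
```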

To enable this, I've added multiple new module types in packages/bun-types/extensions.d.ts. The type of the default export depends on the presence of a header row:

  • If headers are present: the loader returns an array of objects
  • If headers are absent: it returns an array of arrays

To distinguish these cases, I’ve used the ?no_header query string (e.g., data.csv?no_header). This approach works because:

  • It’s currently the only TypeScript-compatible way to define distinct types for the same file extension
  • The query string is otherwise ignored during import, making it a potential candidate for future enhancements
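
A minimal sketch of what such declarations can look like (the exact types shipped in this PR may differ; analogous declarations would exist for *.tsv):

```ts
declare module "*.csv" {
  // header row present: one object per record, keyed by column name
  const records: Record<string, string>[];
  export default records;
}

declare module "*.csv?no_header" {
  // no header row: one string array per row
  const rows: string[][];
  export default rows;
}
```

TypeScript matches the import specifier against the whole wildcard pattern, so the ?no_header suffix can carry its own declaration even though both patterns point at the same file extension.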

Edge Case: Empty File Import

While writing tests, I encountered a bug related to importing empty files: Issue #19164. Currently, importing an empty file results in an empty object. However, for CSV and TSV, I believe the correct behavior should be to return an empty array, as the default export.
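
A sketch of the difference (empty.csv is a hypothetical zero-byte fixture):

```ts
import empty from "./empty.csv"; // zero-byte file

console.log(empty); // current behavior: {} (see #19164)
// proposed behavior for CSV/TSV: [] - an empty array of records
```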


Checklist

  • Code changes
  • Documentation or TypeScript types (not required for this PR)

How did you verify your code works?

I’ve written tests for CSV and TSV imports, following the pattern of the existing TOML import tests.
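
For illustration, a test in that style (the fixture path and values are made up; bun:test is Bun's built-in test runner):

```ts
import { expect, test } from "bun:test";

test("csv import with header row", async () => {
  // hypothetical fixture people.csv: "name,age\nAda,36\nGrace,45"
  const { default: records } = await import("./fixtures/people.csv");
  expect(records).toEqual([
    { name: "Ada", age: "36" },
    { name: "Grace", age: "45" },
  ]);
});
```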

If Zig files changed:

  • I verified memory lifetimes (allocation and deallocation) where applicable
  • I included tests for the new code, or existing tests cover the changes
  • I wrote TypeScript/JavaScript tests, and they pass locally using bun-debug test test-file-name.test

I'm still new to Zig, so I haven’t yet verified the memory handling manually. If someone can guide me on how to do that, I’d love to learn!


This is my first contribution to Bun, so feedback is very welcome. Please let me know if I’ve missed anything, done something incorrectly, or should add more context or documentation.

It should also address this issue: #6722


Footnotes

  1. The format is so old it doesn't really change anymore, so once the parser is working it should require no further work in the future, meaning it should be a net positive (more features; no new stuff to maintain). Of course, in the long run, one could think about iterators, streaming from disk, SIMD, and other optimizations, but for now, not having to install anything is more than enough :)

  2. I spent around 10 hours exploring how import options might be accessed within the loader—see this section of the source. From what I understand, parsing and transpiling are currently decoupled, and the loader is chosen based solely on the file extension. That makes it difficult to pass custom import options to the parser. This might be worth discussing or exploring in a future PR.

@kravetsone

If Bun someday works well in Jupyter notebooks, that would be an awesome feature.

@Jarred-Sumner (Collaborator)

Very exciting. Thank you for this.

Initial thoughts:

  • How do PapaParse and other CSV parsers handle leading/trailing quotes and whitespace, both between cells and within cells? Do they handle non-ASCII newlines? If yes, we should assume that we need to as well, which means using strings.CodepointIterator instead of iterating byte by byte
  • Can you add about 50 more tests for various cases involving headers, no headers, trailing whitespace, leading whitespace, inconsistent number of commas?
  • what is in the test suite of other CSV parsers that we should copy?

@mastermakrela (Author)

@Jarred-Sumner thank you for the quick feedback :D

  • Can you add about 50 more tests for various cases involving headers, no headers, trailing whitespace, leading whitespace, inconsistent number of commas?

Will do 🫡

  • How do PapaParse and other CSV parsers handle leading/trailing quotes and whitespace, both between cells and within cells? Do they handle non-ASCII newlines? If yes, we should assume that we need to as well, which means using strings.CodepointIterator instead of iterating byte by byte
  • what is in the test suite of other CSV parsers that we should copy?

I don't know the answers directly, but I'll try to find some time this week to do more research.

@mastermakrela (Author)

Parsing

Both my go-to CSV library and the creator of PapaParse agree with the RFC that leading/trailing whitespace is part of the field:

Unfortunately, the CSV spec specifically says: "Spaces are considered part of a field and should not be ignored." - if your CSV files are created with spaces after the commas, then the spaces are errors in the input and the generator needs to be fixed.
~ mholt/PapaParse#241 (comment)

There was a discussion about whether there should exist an option to trim the whitespace, but it was decided against it. Someday it could be put behind a flag.

AFAICT, all JS-based CSV parsers support Unicode, so there is no reason why we shouldn't - I'll update the code to use strings.CodepointIterator.
That also means we should support all known types of line breaks:

  1. ASCII line breaks:

    • \n (LF, Line Feed, U+000A)
    • \r (CR, Carriage Return, U+000D)
    • \r\n (CRLF, Windows-style line endings)
  2. Non-ASCII Unicode line breaks:

    • U+0085 (NEL, Next Line)
    • U+2028 (LS, Line Separator)
    • U+2029 (PS, Paragraph Separator)

I'll stay consistent with the RFC, just allow any of those symbols at the place where RFC uses CRLF.
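
To make those semantics concrete, a small sketch (the input and expected output are illustrative, not actual fixtures from this PR):

```ts
// Spaces are part of the field (RFC 4180), and with the extended
// line-break set, U+2028 terminates a record just like \r\n does.
const csv = "a, b \r\nc, d \u2028e, f ";
// expected rows (no header row):
//   [["a", " b "], ["c", " d "], ["e", " f "]]
```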

Another feature present in other parsers is dynamic typing (dynamicTyping in PapaParse; infer in csv-simple-parser), which automatically parses fields into JS types that "make sense"¹.
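
For reference, this is roughly what that looks like in PapaParse, using its real dynamicTyping option (see the next paragraph for why it's out of scope here):

```ts
import Papa from "papaparse";

// dynamicTyping converts numeric and boolean strings into JS
// numbers/booleans; without it, every field stays a string.
const { data } = Papa.parse("id,active\n1,true", {
  header: true,
  dynamicTyping: true,
});
console.log(data); // [{ id: 1, active: true }]
```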

I think we should skip all nice-to-haves at least until we can pass options to imports - otherwise the number of loaders will become unmanageable (delimiter x header x trimWhitespace x dynamicTyping x escapeCharacter x ??? = a lot).

Test Suite

I've found some sets of exhaustive tests we can use / get inspired by:

(will have to check licenses)

I'll implement them as soon as I find time 😅

Footnotes

  1. It might be controversial (it even led to a PapaParse fork: https://www.npmjs.com/package/@simwrapper/papaparse), so it should definitely be opt-in

@A-D-E-A (Contributor) commented Apr 24, 2025

That's awesome!
If we can find a way to use import attributes, it would be even better!
I don't know how, but I know there's one type of import that actually checks the attributes: the SQLite database import. When the app is built as a single-file executable, the embed attribute can be read (https://bun.sh/docs/bundler/executables#embed-sqlite-databases). No matter how hard I tried to understand how it is fetched from the source code, I couldn't find/understand it.

It would be great to have an attribute for using the header, but also attributes for the field and row delimiters. That way, all "csv-like" formats (TSV, Excel CSV with ';', and even "ASCII-delimited files") would work with a single implementation.

Thank you for your work!
