Handle byte order marks (BOMs) in CSV files #250

Closed
simonw opened this issue Mar 22, 2021 · 3 comments
Labels
bug (Something isn't working), cli-tool

Comments

Owner

simonw commented Mar 22, 2021

I often find that sqlite-utils insert ... --csv creates a first column with a weird character at the start of its name, which turns out to be the UTF-8 BOM. Fix that.
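A minimal sketch of the symptom, assuming the CSV bytes are decoded as plain utf-8. The sample data and column names are made up for illustration, and this is not the sqlite-utils code itself:

```python
import codecs
import csv
import io

# Hypothetical CSV bytes as some tools export them: a UTF-8 BOM, then the data.
raw = codecs.BOM_UTF8 + b"id,name\r\n1,Cleo\r\n"

# Decoding as plain utf-8 keeps the BOM as the character U+FEFF,
# so it ends up glued to the front of the first column name.
reader = csv.DictReader(io.StringIO(raw.decode("utf-8"), newline=""))
columns = reader.fieldnames

print(columns)
```

The first entry in columns carries the invisible U+FEFF prefix, which is the "weird character" that shows up in the first column name.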

simonw added the bug and cli-tool labels Mar 22, 2021
Owner Author

simonw commented May 29, 2021

https://stackoverflow.com/a/44573867/6083 says:

There is no reason to check if a BOM exists or not, utf-8-sig manages that for you and behaves exactly as utf-8 if the BOM does not exist
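A short sketch of what that answer describes, using Python's built-in utf-8-sig codec. The read_header helper and the sample bytes are hypothetical names for illustration, not part of sqlite-utils:

```python
import codecs
import csv
import io


def read_header(raw: bytes) -> list:
    """Return the CSV header row, decoding with utf-8-sig so a leading BOM is dropped."""
    text = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8-sig", newline="")
    return next(csv.reader(text))


# Same data twice: once with a leading BOM, once without.
with_bom = codecs.BOM_UTF8 + b"id,name\n1,Cleo\n"
without_bom = b"id,name\n1,Cleo\n"

print(read_header(with_bom))
print(read_header(without_bom))
```

Both calls should give the same header, since utf-8-sig only skips a BOM when one is present.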

Owner Author

simonw commented May 29, 2021

The other option is to check if the file starts with codecs.BOM_UTF8 - which is b'\xef\xbb\xbf'.
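A rough sketch of that explicit check, assuming the file contents are available as bytes. The strip_utf8_bom name is made up for this example:

```python
import codecs


def strip_utf8_bom(raw: bytes) -> bytes:
    """Return raw without a leading UTF-8 BOM, or unchanged if there is none."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


print(strip_utf8_bom(codecs.BOM_UTF8 + b"id,name\n"))
print(strip_utf8_bom(b"id,name\n"))
```

Compared with relying on a codec, this keeps the decision visible in the calling code, at the cost of handling the three bytes by hand.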

Owner Author

simonw commented May 29, 2021

I needed to find some CSV files on my computer with a BOM at the beginning - I figured out this recipe:

% rg -U -E none '^(?-u:\xEF\xBB\xBF)' --glob '*.csv' .

TIL here: https://til.simonwillison.net/bash/finding-bom-csv-files-with-ripgrep
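For cases where ripgrep is not available, here is a rough Python approximation of that search. It is a sketch, not an exact equivalent: the csv_files_with_bom name is hypothetical, it only looks at the first three bytes of each file, and unlike rg it does not honour .gitignore rules:

```python
import codecs
import tempfile
from pathlib import Path


def csv_files_with_bom(root):
    """Yield .csv paths under root whose first three bytes are the UTF-8 BOM."""
    for path in sorted(Path(root).rglob("*.csv")):
        if not path.is_file():
            continue
        with path.open("rb") as f:
            if f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8:
                yield path


# Small demo in a throwaway directory with one BOM file and one plain file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "with_bom.csv").write_bytes(codecs.BOM_UTF8 + b"id\n1\n")
    Path(tmp, "plain.csv").write_bytes(b"id\n1\n")
    found = [p.name for p in csv_files_with_bom(tmp)]

print(found)
```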

simonw closed this as completed in 8de5595 May 29, 2021
simonw added a commit that referenced this issue May 29, 2021