sqlite-utils insert --convert option #356

Closed
simonw opened this issue Dec 11, 2021 · 11 comments
Labels: cli-tool, enhancement (New feature or request)

simonw commented Dec 11, 2021

Idea came to me while re-reading this: https://simonwillison.net/2021/Aug/6/sqlite-utils-convert/

This is a bit of a hack:

cat /tmp/log.txt | \
    jq --raw-input '{line: .}' --compact-output | \
    sqlite-utils insert /tmp/logs.db log - --nl

Would be great if you could pipe lines to insert and transform them on the way in.

A --convert python-code option, modeled after sqlite-utils convert, could do this.
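
As a rough sketch (hypothetical syntax, using the --lines flag and "line" variable that the rest of this thread converges on), the jq pipeline above could become:

cat /tmp/log.txt | \
    sqlite-utils insert /tmp/logs.db log - --lines \
    --convert '{"line": line.strip()}'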

simonw added the enhancement and cli-tool labels Dec 11, 2021
simonw commented Dec 11, 2021

By default this will accept single lines, but maybe there could be a --all option which instead grabs all of stdin into a single string against which the conversion function runs - like git-history file.
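
A hedged sketch of what that might look like (hypothetical invocation; the name of the variable exposed to the conversion code is debated below):

cat /tmp/log.txt | \
    sqlite-utils insert /tmp/logs.db log - --all \
    --convert '{"length": len(all), "content": all}'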

simonw commented Dec 19, 2021

This is going to need a multi-value --import option too.
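
For instance (a hypothetical invocation, assuming --import behaves as it does for sqlite-utils convert and can be passed more than once):

cat /tmp/log.txt | \
    sqlite-utils insert /tmp/logs.db log - --lines \
    --import re --import json \
    --convert '{"fields": json.dumps(re.split(r"\s+", line))}'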

simonw commented Dec 20, 2021

I think this should work on JSON, or CSV, or individual lines, or the entire content at once.

So I'll require --lines --convert ... to import individual lines, or --all --convert to run the conversion against the entire input at once.

What would --lines or --all do without --convert? Maybe insert records as {"line": "line of text"} or {"all": "whole input"}.
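
To illustrate (a hedged example of the behaviour proposed here when --convert is not used):

printf 'hello\nworld\n' | sqlite-utils insert /tmp/demo.db lines - --lines
# would insert two records: {"line": "hello"} and {"line": "world"}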

simonw commented Dec 20, 2021

The implementation of this gets a tiny bit complicated.

Ignoring --convert, the --lines option can internally produce {"line": ...} records and the --all option can produce {"all": ...} records.

But... when --convert is used, what should the code run against?

It could run against those already-converted records but that's a little bit strange, since you'd have to do this:

sqlite-utils insert blah.db blah myfile.txt --all --convert '{"item": s for s in value["all"].split("-")}'

Having to use value["all"] there is unintuitive. It would be nicer to have an all variable to work against.

But then for --lines should the local variable be called line? And how best to summarize these different names for local variables in the inline help for the feature?
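
One possible way to handle that (a minimal Python sketch, not the actual sqlite-utils implementation, assuming the user's expression is wrapped in a function whose single argument takes the mode-specific name):

def make_converter(code, arg_name="value"):
    # Wrap the user's expression in a function whose argument is
    # named "line" for --lines, "all" for --all, "value" otherwise.
    source = "def convert({}):\n    return {}".format(arg_name, code)
    namespace = {}
    exec(source, namespace)
    return namespace["convert"]

# --all mode: the snippet can refer to "all" directly
convert = make_converter('{"item": s for s in all.split("-")}', arg_name="all")
print(convert("a-b-c"))  # {'item': 'c'} - the comprehension keeps the last value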

simonw commented Dec 20, 2021

% sqlite-utils insert --help
Usage: sqlite-utils insert [OPTIONS] PATH TABLE JSON_FILE

  Insert records from JSON file into a table, creating the table if it does
  not already exist.

  Input should be a JSON array of objects, unless --nl or --csv is used.

Options:
  --pk TEXT                 Columns to use as the primary key, e.g. id
  --nl                      Expect newline-delimited JSON
  --flatten                 Flatten nested JSON objects
  -c, --csv                 Expect CSV
  --tsv                     Expect TSV
  --convert TEXT            Python code to convert each item
  --import TEXT             Python modules to import
  --delimiter TEXT          Delimiter to use for CSV files
  --quotechar TEXT          Quote character to use for CSV/TSV
  --sniff                   Detect delimiter and quote character
  --no-headers              CSV file has no header row
  --batch-size INTEGER      Commit every X records
  --alter                   Alter existing table to add any missing columns
  --not-null TEXT           Columns that should be created as NOT NULL
  --default <TEXT TEXT>...  Default value that should be set for a column
  --encoding TEXT           Character encoding for input, defaults to utf-8
  -d, --detect-types        Detect types for columns in CSV/TSV data
  --load-extension TEXT     SQLite extensions to load
  --silent                  Do not show progress bar
  --ignore                  Ignore records if pk already exists
  --replace                 Replace records if pk already exists
  --truncate                Truncate table before inserting records, if table
                            already exists
  -h, --help                Show this message and exit.

I can add a bunch of extra help at the top there to explain all of this stuff. That "Input should be a JSON array of objects" bit could be expanded to several paragraphs.

simonw commented Dec 20, 2021

--flatten could do with a better description too.

simonw commented Dec 20, 2021

Work-in-progress improved help:

Usage: sqlite-utils insert [OPTIONS] PATH TABLE FILE

  Insert records from FILE into a table, creating the table if it does not
  already exist.

  By default the input is expected to be a JSON array of objects. Or:

  - Use --nl for newline-delimited JSON objects
  - Use --csv or --tsv for comma-separated or tab-separated input
  - Use --lines to write each incoming line to a column called "line"
  - Use --all to write the entire input to a column called "all"

  You can also use --convert to pass a fragment of Python code that will be
  used to convert each input.

  Your Python code will be passed a "row" variable representing the imported
  row, and can return a modified row.

  If you are using --lines your code will be passed a "line" variable, and for
  --all an "all" variable.

Options:
  --pk TEXT                 Columns to use as the primary key, e.g. id
  --flatten                 Flatten nested JSON objects, so {"a": {"b": 1}}
                            becomes {"a_b": 1}
  --nl                      Expect newline-delimited JSON
  -c, --csv                 Expect CSV input
  --tsv                     Expect TSV input
  --lines                   Treat each line as a single value called 'line'
  --all                     Treat input as a single value called 'all'
  --convert TEXT            Python code to convert each item
  --import TEXT             Python modules to import
  --delimiter TEXT          Delimiter to use for CSV files
  --quotechar TEXT          Quote character to use for CSV/TSV
  --sniff                   Detect delimiter and quote character
  --no-headers              CSV file has no header row
  --batch-size INTEGER      Commit every X records
  --alter                   Alter existing table to add any missing columns
  --not-null TEXT           Columns that should be created as NOT NULL
  --default <TEXT TEXT>...  Default value that should be set for a column
  --encoding TEXT           Character encoding for input, defaults to utf-8
  -d, --detect-types        Detect types for columns in CSV/TSV data
  --load-extension TEXT     SQLite extensions to load
  --silent                  Do not show progress bar
  --ignore                  Ignore records if pk already exists
  --replace                 Replace records if pk already exists
  --truncate                Truncate table before inserting records, if table
                            already exists
  -h, --help                Show this message and exit.
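
For the default (non --lines / --all) mode described in that help text, a hedged usage sketch of the "row" variable (hypothetical database, table and CSV file names):

sqlite-utils insert /tmp/data.db items /tmp/items.csv --csv \
    --convert 'dict(row, name=row["name"].strip())'
# each CSV row is passed in as "row"; the expression returns the modified row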

simonw commented Dec 20, 2021

(This makes me want --extract from #352 even more.)

simonw commented Jan 6, 2022

Implemented in #361.

simonw closed this as completed Jan 6, 2022
simonw added a commit that referenced this issue Jan 6, 2022