Skip to content

Commit

Permalink
#307: Ingest cache (#354)
Browse files Browse the repository at this point in the history
- Support for ingest cache, download cache, and progress bars.
  • Loading branch information
neilotoole authored Jan 15, 2024
1 parent 315917d commit db55986
Show file tree
Hide file tree
Showing 262 changed files with 13,468 additions and 4,020 deletions.
4 changes: 2 additions & 2 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@ _testmain.go
/scratch
.envrc
**/*.bench
go.work*

# Some apps create temp files when editing, e.g. Excel with drivers/xlsx/testdata/~$test_header.xlsx
**/testdata/~*
Expand All @@ -54,5 +55,4 @@ goreleaser-test.sh
/cli/test.db
/*.db
/.CHANGELOG.delta.md
go.work
go.work.sum
/testh/progress-remove.test.sh
292 changes: 292 additions & 0 deletions .golangci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,9 @@ run:
# This package is such a mess, and needs to be rewritten completely.
- cli/output/tablew/internal

# The tint package is a fork of an upstream package.
- libsq/core/lg/devlog/tint

# Non-committed scratch dir
- scratch

Expand Down Expand Up @@ -187,6 +190,295 @@ linters-settings:
# Default: false
require-specific: true

revive:
# https://golangci-lint.run/usage/linters/#revive
rules:

# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#add-constant
- name: add-constant
disabled: true
arguments:
- maxLitCount: "3"
allowStrs: '""'
allowInts: "0,1,2"
allowFloats: "0.0,0.,1.0,1.,2.0,2."
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#argument-limit
- name: argument-limit
disabled: false
arguments: [ 7 ]
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#atomic
- name: atomic
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#banned-characters
- name: banned-characters
disabled: true
arguments: [ "Ω", "Σ", "σ", "7" ]
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#bare-return
- name: bare-return
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#blank-imports
- name: blank-imports
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#bool-literal-in-expr
- name: bool-literal-in-expr
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#call-to-gc
- name: call-to-gc
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#cognitive-complexity
- name: cognitive-complexity
disabled: true
arguments: [ 7 ]
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#comment-spacings
- name: comment-spacings
disabled: true
arguments:
- mypragma
- otherpragma
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#confusing-naming
- name: confusing-naming
disabled: true
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#confusing-results
- name: confusing-results
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#constant-logical-expr
- name: constant-logical-expr
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#context-as-argument
- name: context-as-argument
disabled: false
arguments:
- allowTypesBefore: "*testing.T,*github.com/user/repo/testing.Harness"
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#context-keys-type
- name: context-keys-type
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#cyclomatic
- name: cyclomatic
disabled: true
arguments: [ 3 ]
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#datarace
- name: datarace
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#deep-exit
- name: deep-exit
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#defer
- name: defer
disabled: false
arguments:
- [ "call-chain", "loop" ]
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#dot-imports
- name: dot-imports
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#duplicated-imports
- name: duplicated-imports
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#early-return
- name: early-return
disabled: false
arguments:
- "preserveScope"
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#empty-block
- name: empty-block
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#empty-lines
- name: empty-lines
disabled: true # Covered by "whitespace" linter.
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#enforce-map-style
- name: enforce-map-style
disabled: true
arguments:
- "make"
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#error-naming
- name: error-naming
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#error-return
- name: error-return
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#error-strings
- name: error-strings
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#errorf
- name: errorf
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#file-header
- name: file-header
disabled: true
# arguments:
# - This is the text that must appear at the top of source files.
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#flag-parameter
- name: flag-parameter
disabled: true
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#function-result-limit
- name: function-result-limit
disabled: false
arguments: [ 4 ]
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#function-length
- name: function-length
disabled: true
arguments: [ 10, 0 ]
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#get-return
- name: get-return
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#identical-branches
- name: identical-branches
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#if-return
- name: if-return
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#increment-decrement
- name: increment-decrement
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#indent-error-flow
- name: indent-error-flow
disabled: false
arguments:
- "preserveScope"
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#import-alias-naming
- name: import-alias-naming
disabled: false
arguments:
- "^[a-z][a-z0-9]{0,}$"
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#imports-blacklist
- name: imports-blacklist
disabled: false
arguments:
- "crypto/md5"
- "crypto/sha1"
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#import-shadowing
- name: import-shadowing
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#line-length-limit
- name: line-length-limit
disabled: true
arguments: [ 80 ]
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#max-public-structs
- name: max-public-structs
disabled: true
arguments: [ 3 ]
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#modifies-parameter
- name: modifies-parameter
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#modifies-value-receiver
- name: modifies-value-receiver
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#nested-structs
- name: nested-structs
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#optimize-operands-order
- name: optimize-operands-order
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#package-comments
- name: package-comments
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#range
- name: range
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#range-val-in-closure
- name: range-val-in-closure
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#range-val-address
- name: range-val-address
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#receiver-naming
- name: receiver-naming
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#redundant-import-alias
- name: redundant-import-alias
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#redefines-builtin-id
- name: redefines-builtin-id
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#string-of-int
- name: string-of-int
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#string-format
- name: string-format
disabled: false
arguments:
- - 'core.WriteError[1].Message'
- '/^([^A-Z]|$)/'
- must not start with a capital letter
- - 'fmt.Errorf[0]'
- '/(^|[^\.!?])$/'
- must not end in punctuation
- - panic
- '/^[^\n]*$/'
- must not contain line breaks
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#struct-tag
- name: struct-tag
arguments:
- "json,inline"
- "bson,outline,gnu"
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#superfluous-else
- name: superfluous-else
disabled: false
arguments:
- "preserveScope"
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#time-equal
- name: time-equal
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#time-naming
- name: time-naming
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#var-naming
- name: var-naming
disabled: false
arguments:
- [ "ID" ] # AllowList
- [ "VM" ] # DenyList
- - upperCaseConst: true
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#var-declaration
- name: var-declaration
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#unconditional-recursion
- name: unconditional-recursion
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#unexported-naming
- name: unexported-naming
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#unexported-return
- name: unexported-return
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#unhandled-error
- name: unhandled-error
disabled: false
arguments:
- "fmt.Printf"
- "fmt.Fprintf"
- "fmt.Fprint"
- "fmt.Fprintln"
- "bytes.Buffer.Write"
- "bytes.Buffer.WriteByte"
- "bytes.Buffer.WriteString"
- "bytes.Buffer.WriteRune"
- "strings.Builder.WriteString"
- "strings.Builder.WriteRune"
- "strings.Builder.Write"
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#unnecessary-stmt
- name: unnecessary-stmt
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#unreachable-code
- name: unreachable-code
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#unused-parameter
- name: unused-parameter
disabled: false
arguments:
- allowRegex: "^_"
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#unused-receiver
- name: unused-receiver
disabled: true
arguments:
- allowRegex: "^_"
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#useless-break
- name: useless-break
disabled: false
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#waitgroup-by-value
- name: waitgroup-by-value
disabled: false

rowserrcheck:
# database/sql is always checked
# Default: []
Expand Down
55 changes: 55 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,61 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

Breaking changes are annotated with ☢️, and alpha/beta features with 🐥.

## Upcoming

This is a significant release, focused on improving i/o, responsiveness,
and performance. The headline feature is caching of ingested data for
document sources such as CSV or Excel.

### Added

- Long-running operations (such as data ingestion, or file download) now result
in a progress bar being displayed. Display of the progress bar is controlled
by the new config options [`progress`](https://sq.io/docs/config#progress)
and [`progress.delay`](https://sq.io/docs/config#progressdelay). You can also use
the `--no-progress` flag to disable the progress bar.
- Ingested [document sources](https://sq.io/docs/concepts#document-source) (such as
[CSV](https://sq.io/docs/drivers/csv) or [Excel](https://sq.io/docs/drivers/xlsx))
now make use of an ingest cache DB. Previously, ingestion of document source data occurred
on each `sq` command. It is now a one-time cost; subsequent use of the document source utilizes
the cache DB. If the source document changes, the ingest cache DB is invalidated and
ingested again. This is a significantly improved experience for large document sources.
- There's several new commands to interact with the cache.
- [`sq cache enable`](https://sq.io/docs/cmd/cache_enable) and
[`sq cache disable`](https://sq.io/docs/cmd/cache_disable) control cache usage.
You can also instead use the new [`ingest.cache`](https://sq.io/docs/config#ingestcache)
config option.
- [`sq cache clear`](https://sq.io/docs/cmd/cache_clear) clears the cache.
- [`sq cache location`](https://sq.io/docs/cmd/cache_location) prints the cache location on disk.
- [`sq cache stat`](https://sq.io/docs/cmd/cache_stat) shows stats about the cache.
- [`sq cache tree`](https://sq.io/docs/cmd/cache_location) shows a tree view of the cache.
- Downloading of remote document sources (e.g. a CSV file at
[`https://sq.io/testdata/actor.csv`](https://sq.io/testdata/actor.csv)) has been completely
overhauled. Previously, `sq` would re-download the remote file on every command. Now, the
remote file is downloaded and cached locally. Subsequent commands check for staleness of
the cached download, and re-download if necessary.
- As part of the download revamp, new config options have been introduced:
- [`http.request.timeout`](https://sq.io/docs/config#httprequesttimeout) and
[`http.response.timeout`](https://sq.io/docs/config#httpresponsetimeout) control HTTP timeout.
- [`https.insecure-skip-verify`](https://sq.io/docs/config#httpsinsecureskipverify) controls
whether HTTPS connections verify the server's certificate. This is useful for remote files served
with a self-signed certificate.
- There are two more new config options introduced as part of the above work.
- [`cache.lock.timeout`](https://sq.io/docs/config#cachelocktimeout) controls the time that
`sq` will wait for a lock on the cache DB. The cache lock is introduced for when you have
multiple `sq` commands running concurrently, and you want to avoid them stepping on each other.
- Similarly, [`config.lock.timeout`](https://sq.io/docs/config#configlocktimeout) controls the
timeout for acquiring the (newly-introduced) lock on `sq`'s config file. This helps prevent
issues with multiple `sq` processes mutating the config concurrently.
- `sq`'s own [logs](https://sq.io/docs/config#logging) previously outputted in JSON
format. Now there's a new [`log.format`](https://sq.io/docs/config#logformat) config option
that permits setting the log format to `json` or `text`. The `text` format is more human-friendly, and
is now the default.

### Fixed

- Opening a DB connection now correctly honors [`conn.open-timeout`](https://sq.io/docs/config#connopen-timeout).

## [v0.46.1] - 2023-12-06

### Fixed
Expand Down
Loading

0 comments on commit db55986

Please sign in to comment.