v0.47.0
This is a significant release, focused on improving I/O, responsiveness, and performance. The headline features are caching of ingested data for document sources such as CSV or Excel, and download caching for remote document sources. There are a lot of under-the-hood changes, so please open an issue if you encounter any weirdness.
Added
- Long-running operations (such as data ingestion or file download) now display a progress bar. Display of the progress bar is controlled by the new config options `progress` and `progress.delay`. You can also use the `--no-progress` flag to disable the progress bar.
  - 👉 The progress bar is rendered on `stderr` and is always zapped from the terminal when command output begins. It won't corrupt the output.
- #307: Ingested document sources (such as CSV or Excel) now make use of an ingest cache DB. Previously, ingestion of document source data occurred on each `sq` command. It is now a one-time cost; subsequent use of the document source utilizes the cache DB. Until, that is, the source document changes: then the ingest cache DB is invalidated and the source is ingested again. This is a significantly improved experience for large document sources.
- There are several new commands to interact with the cache (although you shouldn't need to):
  - `sq cache enable` and `sq cache disable` control cache usage. You can also instead use the new `ingest.cache` config option.
  - `sq cache clear` clears the cache.
  - `sq cache location` prints the cache location on disk.
  - `sq cache stat` shows stats about the cache.
  - `sq cache tree` shows a tree view of the cache.
- #24: The download mechanism for remote document sources (e.g. a CSV file at `https://sq.io/testdata/actor.csv`) has been completely overhauled. Previously, `sq` would re-download the remote file on every command. Now, the remote file is downloaded and cached locally. Subsequent `sq` invocations check for staleness of the cached download, and re-download if necessary.
- As part of the download revamp, new config options have been introduced:
  - `http.request.timeout` is the timeout for the initial response from the server, and `http.response.timeout` is the timeout for reading the entire response body. We separate these two timeouts because it's possible that the server responds quickly, but then for a large file, the download takes too long.
  - `https.insecure-skip-verify` controls whether HTTPS connections verify the server's certificate. This is useful for remote files served with a self-signed certificate.
  - `download.cache` controls whether remote files are cached locally.
  - `download.refresh.ok-on-err` controls whether `sq` should continue with a stale cached download if an error occurred while trying to refresh the download. This is a sort of "Airplane Mode" for remote document sources: `sq` continues with the cached download when the network is unavailable.
- There are two more new config options introduced as part of the above work.
  - `cache.lock.timeout` controls the time that `sq` will wait for a lock on the cache DB. The cache lock is introduced for when you have multiple `sq` commands running concurrently, and you want to avoid them stepping on each other.
  - Similarly, `config.lock.timeout` controls the timeout for acquiring the (newly-introduced) lock on `sq`'s config file. This helps prevent issues with multiple `sq` processes mutating the config concurrently.
- `sq`'s own logs were previously output in JSON format. There's now a new `log.format` config option that sets the log format to `json` or `text`. The `text` format is more human-friendly, and is now the default.
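
The caching behavior described in the items above (ingest once, re-ingest only when the source document changes, and optionally fall back to a stale download when a refresh fails) can be illustrated with a small conceptual sketch. This is not `sq`'s implementation: `IngestCache`, `fingerprint`, `refresh`, and `demo` are hypothetical names, an in-memory dict stands in for the cache DB, and a content hash is assumed as the change-detection mechanism.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Hash the file contents; any change to the source yields a new value."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


class IngestCache:
    """Hypothetical stand-in for a cache DB: source path -> (fingerprint, rows)."""

    def __init__(self):
        self._entries = {}
        self.ingest_count = 0  # how many times the expensive ingest actually ran

    def _ingest(self, path: Path):
        # Stand-in for the expensive step (parsing a document into a DB).
        self.ingest_count += 1
        return [line.split(",") for line in path.read_text().splitlines()]

    def rows(self, path: Path):
        fp = fingerprint(path)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == fp:
            return cached[1]  # cache hit: the source is unchanged
        rows = self._ingest(path)  # miss, or invalidated by a changed source
        self._entries[path] = (fp, rows)
        return rows


def refresh(fetch, cached, ok_on_err=True):
    """Return fresh data from fetch(); on failure, reuse the stale copy if allowed."""
    try:
        return fetch()
    except OSError:
        if ok_on_err and cached is not None:
            return cached  # "Airplane Mode": continue with the stale copy
        raise


def demo():
    """Return the ingest count before and after the source file changes."""
    with tempfile.TemporaryDirectory() as d:
        src = Path(d) / "actor.csv"
        src.write_text("1,PENELOPE\n")
        cache = IngestCache()
        cache.rows(src)
        cache.rows(src)  # second call is served from the cache
        before_change = cache.ingest_count
        src.write_text("1,PENELOPE\n2,NICK\n")
        cache.rows(src)  # changed source triggers a re-ingest
        return before_change, cache.ingest_count
```

In this sketch, repeated reads of an unchanged file cost one ingest, and editing the file triggers exactly one more. The `ok_on_err` parameter loosely mirrors the idea behind `download.refresh.ok-on-err`: a failed refresh is tolerated only when a cached copy exists and the fallback is enabled.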
Changed
Fixed
- Opening a DB connection now correctly honors `conn.open-timeout`.