Commit 5b67de8

Update README.md
1 parent e1624c6 commit 5b67de8

1 file changed

+11
-8
lines changed

README.md

Lines changed: 11 additions & 8 deletions
@@ -1,6 +1,6 @@
 # sqlite_zstd_vfs

-This [SQLite VFS extension](https://www.sqlite.org/vfs.html) provides streaming storage compression with [Zstandard](https://facebook.github.io/zstd/), compressing [pages of the main database file](https://www.sqlite.org/fileformat.html) as they're written out, and later decompressing them as they're read in. It runs page compression on background threads and occasionally generates [dictionaries](https://github.com/facebook/zstd#the-case-for-small-data-compression) to improve subsequent compression.
+This [SQLite VFS extension](https://www.sqlite.org/vfs.html) provides streaming storage compression with [Zstandard](https://facebook.github.io/zstd/), compressing [pages of the main database file](https://www.sqlite.org/fileformat.html) as they're written out, and later decompressing them as they're read in. It runs page de/compression on background threads and occasionally generates [dictionaries](https://github.com/facebook/zstd#the-case-for-small-data-compression) to improve subsequent compression.

 Compressed page storage operates similarly to the [design of ZIPVFS](https://sqlite.org/zipvfs/doc/trunk/www/howitworks.wiki), the SQLite developers' proprietary extension. Because we're not as familiar with the internal "pager" module, we use a full-fledged SQLite database as the bottom-most layer. Where SQLite would write database page #P at offset P × page_size in the disk file, instead we `INSERT INTO outer_page_table(rowid,data) VALUES(P,compressed_inner_page)`, and later `SELECT data FROM outer_page_table WHERE rowid=P`. *You mustn't be afraid to dream a little smaller, darling...*
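The page-table idea described in the context line above can be illustrated with a minimal sketch. This is not the extension's code: it assumes an in-memory outer database, uses Python's standard-library `zlib` as a stand-in for Zstandard so it runs without extra packages, and the helpers `write_page` and `read_page` are hypothetical names introduced only for this illustration.

```python
import sqlite3
import zlib  # stand-in for Zstandard, chosen only so the sketch needs no third-party package

PAGE_SIZE = 4096  # uncompressed inner page size (the SQLite default)

# The "outer" database: an ordinary SQLite database holding compressed inner pages.
outer = sqlite3.connect(":memory:")
outer.execute("CREATE TABLE outer_page_table (data BLOB)")


def write_page(page_no, page):
    """Store inner page #page_no as a compressed blob keyed by rowid."""
    if len(page) != PAGE_SIZE:
        raise ValueError("inner pages must be exactly PAGE_SIZE bytes")
    outer.execute(
        "INSERT OR REPLACE INTO outer_page_table(rowid, data) VALUES(?, ?)",
        (page_no, zlib.compress(page)),
    )


def read_page(page_no):
    """Fetch and decompress inner page #page_no."""
    row = outer.execute(
        "SELECT data FROM outer_page_table WHERE rowid=?", (page_no,)
    ).fetchone()
    return zlib.decompress(row[0])


# Round-trip one repetitive (hence compressible) page.
page = (b"hello sqlite " * 400)[:PAGE_SIZE]
write_page(1, page)
stored = outer.execute(
    "SELECT length(data) FROM outer_page_table WHERE rowid=1"
).fetchone()[0]
print(f"page 1: {PAGE_SIZE} bytes uncompressed, {stored} bytes stored")
```

Keying each blob by page number replaces the fixed `P × page_size` file offset, which is what lets the stored pages vary in size.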

@@ -95,21 +95,24 @@ Repeat the query to see the update.

 ## Performance

-Here are some operation timings using a [1,195MiB TPC-H database](https://github.com/lovasoa/TPCH-sqlite) on my laptop. This isn't a thorough benchmark, just a rough indication that many applications should find the decompression overhead acceptable for the storage saved. (Credit to Zstandard!)
+Here are some operation timings using a [1,195MiB TPC-H database](https://github.com/lovasoa/TPCH-sqlite) on a Google N2 VM. This isn't a thorough benchmark, just a rough indication that many applications should find the de/compression overhead well worth the storage saved.

 | | db file size | bulk load<sup>1</sup> | Query 1 | Query 8 |
 | -- | --: | --: | --: | --: |
-| SQLite defaults | 1182MiB | 4.3s | 8.4s | 3.9s |
-| zstd_vfs defaults | 647MiB | 21.1s | 11.5s | 44.7s |
-| zstd_vfs tuned<sup>2</sup> | 500MiB | 14.9s | 10.6s | 7.1s |
+| SQLite defaults | 1182MiB | 2.4s | 6.7s | 3.0s |
+| zstd_vfs defaults | 647MiB | 25.0s | 8.8s | 35.7s |
+| zstd_vfs tuned<sup>2</sup> | 433MiB | 33.7s | 7.8s | 4.5s |
+| zstd_vfs tuned &threads=8 | 433MiB | 6.9s | 6.7s | 3.1s |

 <sup>1</sup> by VACUUM INTO<br/>
-<sup>2</sup> `&level=6&threads=8&outer_page_size=16384&outer_unsafe=true`; [`PRAGMA page_size=8192`](https://www.sqlite.org/pragma.html#pragma_page_size); [`PRAGMA cache_size=-102400`](https://www.sqlite.org/pragma.html#pragma_cache_size)
+<sup>2</sup> `&level=6&outer_page_size=2048&outer_unsafe=true`; [`PRAGMA page_size=65536`](https://www.sqlite.org/pragma.html#pragma_page_size); [`PRAGMA cache_size=-102400`](https://www.sqlite.org/pragma.html#pragma_cache_size)

-Query 1 is an aggregation satisfied by one table scan. Decompression slows it down by ~25% while the database file shrinks by 50-60%. (Each query starts with a hot filesystem cache and cold database page cache.)
+Query 1 is an aggregation satisfied by one table scan. Foreground Zstandard decompression slows it down by ~25% while the database file shrinks by 45-65%. (Each query starts with a hot filesystem cache and cold database page cache.)

 Query 8 is an [historically influential](https://www.sqlite.org/queryplanner-ng.html) eight-way join. SQLite's default ~2MB page cache is too small for its access pattern, leading to a disastrous slowdown from repeated page decompression; but simply increasing the page cache to 100MiB largely solves this problem. Evidently, we should prefer a much larger page cache in view of the increased cost to miss.

+Background threads can be used for both compression and "prefetching" during sequential scans. This greatly improves bulk load speed, and can sometimes fully hide decompression latency (given sequential access patterns, large pages, and spare CPU availability).
+
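As a quick cross-check of the percentages quoted in the updated Performance text, the sketch below recomputes them from the new ("+") table rows. The numbers are copied from the table; the dictionary layout and the helper `pct_change` are hypothetical, introduced only for this illustration.

```python
# Figures copied from the updated ("+") rows of the table above.
baseline = {"size_mib": 1182, "q1_s": 6.7, "q8_s": 3.0}
runs = {
    "zstd_vfs defaults": {"size_mib": 647, "q1_s": 8.8, "q8_s": 35.7},
    "zstd_vfs tuned": {"size_mib": 433, "q1_s": 7.8, "q8_s": 4.5},
    "zstd_vfs tuned &threads=8": {"size_mib": 433, "q1_s": 6.7, "q8_s": 3.1},
}


def pct_change(new, old):
    """Relative change of `new` versus `old`, in percent."""
    return 100.0 * (new - old) / old


summary = {}
for name, run in runs.items():
    summary[name] = {
        # Positive number = how much smaller the file is than the baseline.
        "size_saved_pct": -pct_change(run["size_mib"], baseline["size_mib"]),
        # Positive number = how much slower the query is than the baseline.
        "q1_slowdown_pct": pct_change(run["q1_s"], baseline["q1_s"]),
        "q8_slowdown_pct": pct_change(run["q8_s"], baseline["q8_s"]),
    }

for name, row in summary.items():
    print(
        f"{name}: file {row['size_saved_pct']:.0f}% smaller, "
        f"Query 1 {row['q1_slowdown_pct']:+.0f}%, "
        f"Query 8 {row['q8_slowdown_pct']:+.0f}%"
    )
```

On these rows the file shrinks by roughly 45% (defaults) and 63% (tuned), consistent with the stated 45-65% range. The single-threaded Query 1 slowdowns come out near 31% and 16%, so the quoted ~25% reads as a rough midpoint rather than either row exactly.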
 ## Tuning

 Some parameters are controlled from the file URI's query string opening the database, while others are set later through [PRAGMA statements](https://www.sqlite.org/pragma.html):
@@ -125,7 +128,7 @@ Some parameters are controlled from the file URI's query string opening the data
 * **&outer_unsafe=false**: set true to speed up bulk load by disabling transaction safety for outer database (app crash easily causes corruption)

 * **&outer_cache_size=-2000**: page cache size for outer database, in [PRAGMA cache_size](https://www.sqlite.org/pragma.html#pragma_cache_size) units. Limited effect if on SSD.
-* **&noprefetch=false**: set true to disable background prefetching/decompression even if threads>1. Prefetching is counterproductive if page size is too small or CPU cycles are scarce.
+* **&noprefetch=false**: set true to disable background prefetching/decompression even if threads>1. Prefetching is counterproductive if page_size is too small or CPU cycles are scarce.

 * **PRAGMA page_size=4096**: uncompressed [page size](https://www.sqlite.org/pragma.html#pragma_page_size) for the newly-created inner database. Larger pages are more compressible, but increase [read/write amplification](http://smalldatum.blogspot.com/2015/11/read-write-space-amplification-pick-2_23.html). YMMV but 8 or 16 KiB have been working well for us.
 * **PRAGMA auto_vacuum=NONE**: set to FULL or INCREMENTAL on a newly-created database if you expect its size to fluctuate over time, so that the file will [shrink to fit](https://www.sqlite.org/pragma.html#pragma_auto_vacuum). (The outer database auto-vacuums when it's closed.)
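The split between URI parameters and PRAGMAs in the list above can be sketched as follows. Several things here are assumptions rather than facts from this diff: the `vfs=zstd` name, the example path, and the helper `zstd_vfs_uri` are hypothetical. The sketch only builds the URI string and does not open it, since that would require the extension to be loaded. The `cache_size` part runs against a plain in-memory SQLite database to show that a negative value is a size in KiB.

```python
import sqlite3
from urllib.parse import urlencode


def zstd_vfs_uri(path, **params):
    """Build a SQLite file: URI carrying tuning parameters in its query string."""
    query = {"vfs": "zstd"}  # assumption: the VFS is registered under the name "zstd"
    for key, value in params.items():
        # Booleans are written in lowercase in the URI, e.g. outer_unsafe=true.
        query[key] = str(value).lower() if isinstance(value, bool) else str(value)
    return "file:" + path + "?" + urlencode(query)


# The "tuned" footnote settings from the table, on a hypothetical path.
uri = zstd_vfs_uri(
    "/tmp/example.db", level=6, outer_page_size=2048, outer_unsafe=True
)
print(uri)

# PRAGMA cache_size: a negative N requests about |N| KiB of page cache,
# so 100 MiB is written as -102400.
cache_mib = 100
cache_size = -(cache_mib * 1024)
conn = sqlite3.connect(":memory:")  # plain SQLite, no compression involved
conn.execute(f"PRAGMA cache_size={cache_size}")
reported = conn.execute("PRAGMA cache_size").fetchone()[0]
print(reported)
```

The URI would be passed to whatever call opens the database once the extension is loaded, and the PRAGMAs are issued on the resulting connection.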
