A very simple, naive implementation of a JSON document store with some bitmap indexes in the mix. The module is intended primarily, but not exclusively, for use with Canvas (https://github.com/canvas-ai/canvas-server)
- LMDB as the main KV backend, to be replaced by PouchDB or RxDB (https://www.npmjs.com/package/lmdb)
- Compressed (roaring) bitmaps (https://www.npmjs.com/package/roaring)
- FlexSearch for full-text search (https://www.npmjs.com/package/flexsearch)
- LanceDB (https://www.npmjs.com/package/@lancedb/lancedb)
- Simple LMDB KV store with enforced document schemas (see ./src/schemas for more details)
- Every data abstraction schema (File, Note, Browser tab, Email, etc.) defines its own indexing options; currently supported indexOptions:
  - checksumAlgorithms: Checksums to calculate
  - checksumFields: JSON document fields to calculate checksums for
  - searchFields: Full-text search fields
  - embeddingFields: Concatenated fields to calculate embedding vectors from
  - embeddingModel: Model to use for embedding
  - embeddingDimensions: Dimensions of the embedding vectors
  - embeddingProvider: Provider to use for embedding
  - embeddingProviderOptions: Options for the embedding provider
  - chunking: Chunking options
  - storageOptions:
    - supportedBackends: Array of backend names to use
    - defaultBackend: Default backend to use
    - defaultBackendOptions: Default backend options
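
To make the option list more concrete, here is a minimal sketch of what the options for a Note schema might look like. The option names come from the list above; every value, the dotted field-path notation, the shape of `chunking`, and the `unknownOptions` helper are assumptions made for illustration, not the module's actual schema format.

```javascript
// Hypothetical options for a "note" schema. Only the key names are taken
// from the list above; all values are illustrative placeholders.
const noteIndexOptions = {
  checksumAlgorithms: ['sha1', 'sha256'],
  checksumFields: ['data.title', 'data.content'], // field-path notation is assumed
  searchFields: ['data.title', 'data.content'],
  embeddingFields: ['data.title', 'data.content'],
  embeddingModel: 'example-embedding-model', // placeholder, not a real model name
  embeddingDimensions: 384,
  embeddingProvider: 'example-provider', // placeholder
  embeddingProviderOptions: {},
  chunking: { size: 1000, overlap: 100 }, // assumed shape
  storageOptions: {
    supportedBackends: ['lmdb'],
    defaultBackend: 'lmdb',
    defaultBackendOptions: {},
  },
};

// Option names listed in this README.
const SUPPORTED_OPTIONS = new Set([
  'checksumAlgorithms', 'checksumFields', 'searchFields',
  'embeddingFields', 'embeddingModel', 'embeddingDimensions',
  'embeddingProvider', 'embeddingProviderOptions',
  'chunking', 'storageOptions',
]);

// Hypothetical helper: return the option names that are not in the list above.
function unknownOptions(options) {
  return Object.keys(options).filter((key) => !SUPPORTED_OPTIONS.has(key));
}

console.log(unknownOptions(noteIndexOptions)); // []
```

A check like `unknownOptions` is one cheap way to catch typos in schema definitions before any indexing runs.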
- algorithm/checksum | docID (Example: sha1/4e1243.. => document ID)
- timestamp | docID (Example: 20250212082411.1234 => document ID)
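
The checksum entry above can be sketched as a plain lookup table. This is a minimal illustration, assuming a `Map` as a stand-in for the LMDB database and hypothetical helper names (`checksumKey`, `indexChecksums`, `findByContent`); how the module actually serializes fields before hashing is not specified here, so the `JSON.stringify` step is an assumption too.

```javascript
const { createHash } = require('node:crypto');

// Stand-in for an LMDB database: "<algorithm>/<checksum>" => document ID.
const checksumIndex = new Map();

// Build the index key for one algorithm, e.g. "sha1/<40 hex chars>".
function checksumKey(algorithm, payload) {
  const digest = createHash(algorithm).update(payload).digest('hex');
  return `${algorithm}/${digest}`;
}

// Serialize the selected fields in a fixed order so equal content yields equal keys.
function serializeFields(doc, checksumFields) {
  return JSON.stringify(checksumFields.map((field) => doc[field]));
}

// Hash the configured fields with every configured algorithm and
// point each resulting key at the document ID.
function indexChecksums(docId, doc, checksumAlgorithms, checksumFields) {
  const payload = serializeFields(doc, checksumFields);
  const keys = checksumAlgorithms.map((algorithm) => checksumKey(algorithm, payload));
  for (const key of keys) checksumIndex.set(key, docId);
  return keys;
}

// Look up a document ID by content, without knowing the ID up front.
function findByContent(doc, algorithm, checksumFields) {
  return checksumIndex.get(checksumKey(algorithm, serializeFields(doc, checksumFields)));
}

const note = { title: 'Groceries', content: 'Milk, eggs' };
indexChecksums(1001, note, ['sha1', 'sha256'], ['title', 'content']);
console.log(findByContent({ title: 'Groceries', content: 'Milk, eggs' }, 'sha1', ['title', 'content'])); // 1001
```

Because the key is derived from content, inserting the same document twice resolves to the same entry, which makes this kind of index a natural fit for deduplication.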
We could use composite keys and LMDB range queries instead (timestamp/docID => document), but for now separate index entries that map to a document ID are more practical.
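
For comparison, the composite-key alternative could look roughly like the sketch below. It uses a sorted in-memory array as a stand-in for an ordered KV store; `putEntry` and `scanRange` are hypothetical names, and the sample keys simply reuse the timestamp format from the example above.

```javascript
// Stand-in for an ordered KV store such as LMDB: entries are kept sorted by key.
const orderedEntries = [];

// Insert an entry and restore key order (plain string comparison).
function putEntry(key, value) {
  orderedEntries.push([key, value]);
  orderedEntries.sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
}

// Return the values whose key falls within [start, end), in key order.
function scanRange(start, end) {
  return orderedEntries
    .filter(([key]) => key >= start && key < end)
    .map(([, value]) => value);
}

// Composite keys: "<timestamp>/<docID>" => document.
putEntry('20250301090000.0000/1003', { id: 1003 });
putEntry('20250212082411.1234/1001', { id: 1001 });
putEntry('20250213101500.0000/1002', { id: 1002 });

// All documents from February 2025, using only key prefixes as bounds.
const february = scanRange('20250201', '20250301');
console.log(february.map((doc) => doc.id)); // [ 1001, 1002 ]
```

The range scan works because fixed-width timestamps sort lexicographically in chronological order; the trade-off is that the document itself lives under the composite key, so a lookup by ID alone needs a second index.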
The following bitmap index prefixes are enforced to organize and filter documents:
- internal/ - Internal bitmaps
- context/ - Context path bitmaps, used internally by Canvas (as context tree nodes, context/uuid)
- data/abstraction/<schema> - Schema type filters (incl. subtrees like data/abstraction/file/ext/json)
- data/mime/<type>
- data/content/encoding/<encoding>
- client/os/<os>
- client/application/<application>
- client/device/<device-id>
- client/network/<network-id> - We support storing documents on multiple backends (StoreD); when a Canvas application connects from a certain network, not all backends may be reachable (your home NAS from work, for example)
- user/
- tag/ - Generic tag bitmaps
- custom/ - Throw what you need here
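
As a rough illustration of how enforced prefixes and bitmap filtering fit together, the sketch below uses plain `Set` objects of numeric document IDs as a stand-in for roaring bitmaps. The prefix list is taken from the entries above; `assertValidBitmapKey`, `addToBitmap`, and `intersectBitmaps` are hypothetical names, not the module's API.

```javascript
// Prefixes from the list above; keys outside of them are rejected.
const ALLOWED_PREFIXES = [
  'internal/', 'context/',
  'data/abstraction/', 'data/mime/', 'data/content/encoding/',
  'client/os/', 'client/application/', 'client/device/', 'client/network/',
  'user/', 'tag/', 'custom/',
];

function assertValidBitmapKey(key) {
  if (!ALLOWED_PREFIXES.some((prefix) => key.startsWith(prefix))) {
    throw new Error(`Unsupported bitmap prefix: ${key}`);
  }
  return key;
}

// Stand-in for roaring bitmaps: bitmap key => Set of document IDs.
const bitmaps = new Map();

// Mark a document as a member of the given bitmap.
function addToBitmap(key, docId) {
  assertValidBitmapKey(key);
  if (!bitmaps.has(key)) bitmaps.set(key, new Set());
  bitmaps.get(key).add(docId);
}

// Return the sorted IDs present in every listed bitmap (logical AND).
// A key with no bitmap behaves like an empty bitmap here.
function intersectBitmaps(keys) {
  if (keys.length === 0) return [];
  const [first, ...rest] = keys.map((key) => bitmaps.get(key) ?? new Set());
  return [...first].filter((id) => rest.every((set) => set.has(id))).sort((a, b) => a - b);
}

addToBitmap('data/abstraction/file', 1);
addToBitmap('data/abstraction/file', 2);
addToBitmap('data/abstraction/file/ext/json', 2);
addToBitmap('client/os/linux', 1);
addToBitmap('client/os/linux', 2);
addToBitmap('tag/work', 2);

console.log(intersectBitmaps(['data/abstraction/file', 'client/os/linux', 'tag/work'])); // [ 2 ]
```

With real roaring bitmaps the intersection is computed on compressed containers rather than by iterating IDs, but the filtering logic stays the same.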
- add support for chunking
- add support for versioning
- add support for embeddings (we should calculate embeddings on the db side if none are provided)
- add support for vector search
- move the contextTree functionality from Canvas to this module (db will present a tree view on top of the dataset)
- switch to (andBitmapArray, orBitmapArray, filterArray) instead of contextBitmapArray and featureBitmapArray
- For 2.0 we should move entirely to Collections (prefix based, not dataset based)
- We should move all internal bitmaps out of view; list methods should not return them, nor should it be possible to edit them directly (maybe a dedicated dataset for internal bitmaps?)
- We need to implement an ignoreMissingBitmaps option for list methods; this module is consumed by tool calls from AI agents and minions, and the list of bitmaps they compile may not be very accurate
- Add proper stats() support
- We should keep track of bitmap usage
- The above implies having "static" and "dynamic" bitmaps; static ones would be kept regardless of their usage, while dynamic ones would be removed when not in use
- Implement nested bitmaps (simplest would be to just detect if a bitmap key ends with an ID or something like _nested:id or _ref:id)
- All of the above is a breeze with today's tools, which goes to show that in most scenarios the main limiting factor will be time!
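
Two of the planned items, the (andBitmapArray, orBitmapArray) arguments and the ignoreMissingBitmaps option, could be combined along the lines of the sketch below. Nothing here is implemented in the module: `listDocuments`, the `bitmapStore` map, and the choice to let an ignored missing bitmap simply drop out of the query are all assumptions, `Set` objects again stand in for roaring bitmaps, and filterArray is left out.

```javascript
// Stand-in bitmap store: bitmap key => Set of document IDs.
const bitmapStore = new Map([
  ['context/work', new Set([1, 2, 3])],
  ['tag/urgent', new Set([2, 3, 4])],
  ['tag/later', new Set([1])],
]);

// Hypothetical list method: documents must be in every AND bitmap and,
// if any OR bitmaps are given, in at least one of them.
function listDocuments(andBitmapArray = [], orBitmapArray = [], { ignoreMissingBitmaps = false } = {}) {
  const resolve = (key) => {
    if (bitmapStore.has(key)) return bitmapStore.get(key);
    if (ignoreMissingBitmaps) return null; // assumed: a missing bitmap is skipped
    throw new Error(`Bitmap not found: ${key}`);
  };
  const andSets = andBitmapArray.map(resolve).filter(Boolean);
  const orSets = orBitmapArray.map(resolve).filter(Boolean);

  let result = null;
  for (const set of andSets) {
    result = result === null ? new Set(set) : new Set([...result].filter((id) => set.has(id)));
  }
  if (orSets.length > 0) {
    const union = new Set(orSets.flatMap((set) => [...set]));
    result = result === null ? union : new Set([...result].filter((id) => union.has(id)));
  }
  return result === null ? [] : [...result].sort((a, b) => a - b);
}

console.log(listDocuments(['context/work', 'tag/urgent'])); // [ 2, 3 ]
console.log(listDocuments(['context/work', 'tag/typo'], [], { ignoreMissingBitmaps: true })); // [ 1, 2, 3 ]
```

Skipping a missing AND bitmap widens the result instead of emptying it, which seems closer to what an agent with a slightly wrong bitmap list would want; treating it as an empty bitmap would be the stricter alternative.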