canvas-ai/canvas-synapsd

SynapsD

A very simple, naive implementation of a JSON document store with bitmap indexes in the mix. The module is intended primarily, but not exclusively, for use with Canvas (https://github.com/canvas-ai/canvas-server).

Architecture

Components

JSON Document Store

  • Simple LMDB KV store with enforced document schemas (See ./src/schemas for more details)
  • Every data abstraction schema (File, Note, Browser tab, Email, etc.) defines its own indexing options. Currently supported indexOptions:
    • checksumAlgorithms: Checksums to calculate
    • checksumFields: JSON document fields to calculate checksums over
    • searchFields: Full text search fields
    • embeddingFields: Concatenated fields to calculate embedding vectors
    • embeddingModel: Model to use for embedding
    • embeddingDimensions: Dimensions of the embedding vectors
    • embeddingProvider: Provider to use for embedding
    • embeddingProviderOptions: Options for the embedding provider
    • chunking: Chunking options
  • storageOptions:
    • supportedBackends: Array of backend names to use
    • defaultBackend: Default backend to use
    • defaultBackendOptions: Default backend options
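
As a rough illustration of how a schema might declare these options, here is a minimal sketch. Only the option names come from the lists above; the field paths, backend names and the `unknown_options` helper are hypothetical, and the real definitions live in ./src/schemas.

```python
# Option names taken from the lists above.
INDEX_OPTION_KEYS = {
    "checksumAlgorithms", "checksumFields", "searchFields",
    "embeddingFields", "embeddingModel", "embeddingDimensions",
    "embeddingProvider", "embeddingProviderOptions", "chunking",
}
STORAGE_OPTION_KEYS = {"supportedBackends", "defaultBackend", "defaultBackendOptions"}

# Hypothetical options for a "note" schema; every value is an assumption.
note_schema_options = {
    "indexOptions": {
        "checksumAlgorithms": ["sha1", "sha256"],
        "checksumFields": ["data.title", "data.content"],
        "searchFields": ["data.title", "data.content"],
        "embeddingFields": ["data.title", "data.content"],
    },
    "storageOptions": {
        "supportedBackends": ["lmdb", "file"],
        "defaultBackend": "lmdb",
        "defaultBackendOptions": {},
    },
}

def unknown_options(options):
    """Return, sorted, any option names that are not in the lists above."""
    unknown = set(options.get("indexOptions", {})) - INDEX_OPTION_KEYS
    unknown |= set(options.get("storageOptions", {})) - STORAGE_OPTION_KEYS
    return sorted(unknown)
```

A check like `unknown_options(note_schema_options)` returns an empty list when a schema only uses the documented option names.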

Index implementation

Hashmaps / Inverted indexes

  • algorithm/checksum | docID
    Example: sha1/4e1243.. => document ID
  • timestamp | docID
    Example: 20250212082411.1234 => document ID
    We could use composite keys and LMDB range queries instead (timestamp/docID => document), but for now this way is more practical.
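
The two lookups above can be sketched with plain dictionaries standing in for the LMDB-backed maps. This is an illustration only: the helper names, document IDs and payloads are made up, and the sorted list with `bisect` merely imitates what a range query over composite keys would do.

```python
import bisect
import hashlib

checksum_index = {}   # "algorithm/checksum" -> document ID
timestamp_index = {}  # "timestamp" -> document ID

def checksum_key(algorithm, payload):
    """Build an "algorithm/checksum" key for a bytes payload."""
    return f"{algorithm}/{hashlib.new(algorithm, payload).hexdigest()}"

def index_document(doc_id, payload, timestamp):
    """Register a document in both inverted indexes."""
    checksum_index[checksum_key("sha1", payload)] = doc_id
    timestamp_index[timestamp] = doc_id

index_document(1001, b'{"title":"hello"}', "20250212082411.1234")
index_document(1002, b'{"title":"world"}', "20250213101500.0001")

# Deduplication-style lookup: same content, same checksum key, same document ID.
found = checksum_index.get(checksum_key("sha1", b'{"title":"hello"}'))

# Composite-key alternative: sorted "timestamp/docID" keys allow prefix range scans.
composite_keys = sorted(f"{ts}/{doc_id}" for ts, doc_id in timestamp_index.items())

def range_scan(keys, start, end):
    """Return all keys k with start <= k < end from a sorted key list."""
    return keys[bisect.bisect_left(keys, start):bisect.bisect_left(keys, end)]

one_day = range_scan(composite_keys, "20250212", "20250213")
```

Here `found` is the ID of the first document, and `one_day` holds only the composite key of the document created on 2025-02-12.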

Bitmap indexes

The following bitmap index prefixes are enforced to organize and filter documents:

  • internal/ - Internal bitmaps
  • context/ - Context path bitmaps, used internally by Canvas (as context tree nodes, context/uuid)
  • data/abstraction/<schema> - Schema type filters (incl subtrees like data/abstraction/file/ext/json)
  • data/mime/<type>
  • data/content/encoding/<encoding>
  • client/os/<os>
  • client/application/<application>
  • client/device/<device-id>
  • client/network/<network-id> - We support storing documents on multiple backends (StoreD). When a Canvas application connects from a certain network, not all backends may be reachable (your home NAS from work, for example)
  • user/
  • tag/ - Generic tag bitmaps
  • custom/ - Throw what you need here
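
To make the idea concrete, the sketch below models each bitmap as a Python integer whose set bits are document IDs, rejects keys outside the prefixes listed above, and filters with AND / OR. The class and method names (`BitmapIndex`, `tick`, `all_of`, `any_of`) are hypothetical, and the integer bitset only stands in for whatever bitmap implementation the module actually uses.

```python
ALLOWED_PREFIXES = (
    "internal/", "context/", "data/abstraction/", "data/mime/",
    "data/content/encoding/", "client/os/", "client/application/",
    "client/device/", "client/network/", "user/", "tag/", "custom/",
)

def _to_ids(bits):
    """Expand an integer bitset into a sorted list of document IDs."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

class BitmapIndex:
    def __init__(self):
        self._bitmaps = {}  # bitmap key -> int used as a bitset

    def tick(self, key, doc_id):
        """Set the bit for doc_id in the bitmap stored under key."""
        if not key.startswith(ALLOWED_PREFIXES):
            raise ValueError(f"unsupported bitmap prefix: {key}")
        self._bitmaps[key] = self._bitmaps.get(key, 0) | (1 << doc_id)

    def all_of(self, keys):
        """Document IDs present in every listed bitmap (AND)."""
        result = None
        for key in keys:
            bits = self._bitmaps.get(key, 0)
            result = bits if result is None else result & bits
        return _to_ids(result or 0)

    def any_of(self, keys):
        """Document IDs present in at least one listed bitmap (OR)."""
        result = 0
        for key in keys:
            result |= self._bitmaps.get(key, 0)
        return _to_ids(result)

index = BitmapIndex()
index.tick("data/abstraction/note", 1)
index.tick("data/abstraction/note", 2)
index.tick("tag/work", 2)
index.tick("tag/work", 3)

work_notes = index.all_of(["data/abstraction/note", "tag/work"])  # [2]
notes_or_work = index.any_of(["data/abstraction/note", "tag/work"])  # [1, 2, 3]
```

Keeping one bitmap per key makes combined filters (schema type, tag, client network and so on) a matter of a few bitwise operations rather than document scans.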

TODO

  • add support for chunking
  • add support for versioning
  • add support for embeddings (we should calculate embeddings on the db side if none are provided)
  • add support for vector search
  • move the contextTree functionality from Canvas to this module (db will present a tree view on top of the dataset)
  • switch to (andBitmapArray, orBitmapArray, filterArray) instead of contextBitmapArray and featureBitmapArray
  • For 2.0 we should move entirely to Collections (prefix based, not dataset based)
  • We should move all internal bitmaps out of view; list methods should not return them, nor should it be possible to edit them directly (maybe a dedicated dataset for internal bitmaps?)
  • We need to implement an ignoreMissingBitmaps option for list methods; this module is consumed by tool calls from AI agents and minions, and the bitmap lists they compile may not be very accurate
  • Add proper stats() support
    • We should keep track of bitmap usage
    • The above implies having "static" and "dynamic" bitmaps, static would be kept regardless of their usage but dynamic would be removed when not in use
  • Implement nested bitmaps (simplest would be to just detect if a bitmap key ends with an ID or something like _nested:id or _ref:id)
  • All of the above is a breeze with today's tools, which goes to show that in most scenarios the only limiting factor is time!
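
One possible shape for the planned andBitmapArray / orBitmapArray arguments and the ignoreMissingBitmaps option is sketched below. None of this exists in the module yet: the function name, the Python sets standing in for bitmaps, and the choice to skip a missing bitmap rather than treat it as empty are all assumptions.

```python
def list_documents(bitmaps, and_bitmap_array=(), or_bitmap_array=(),
                   ignore_missing_bitmaps=False):
    """Return sorted document IDs matching all AND bitmaps and any OR bitmap."""
    def resolve(key):
        if key in bitmaps:
            return bitmaps[key]
        if ignore_missing_bitmaps:
            return None  # assumption: an unknown key adds no constraint
        raise KeyError(f"bitmap not found: {key}")

    and_sets = [s for s in map(resolve, and_bitmap_array) if s is not None]
    or_sets = [s for s in map(resolve, or_bitmap_array) if s is not None]

    all_ids = set().union(*bitmaps.values())
    result = set.intersection(*and_sets) if and_sets else all_ids
    if or_sets:
        result &= set().union(*or_sets)
    return sorted(result)

# Hypothetical sample data: bitmap key -> set of document IDs.
sample_bitmaps = {
    "data/abstraction/note": {1, 2},
    "tag/work": {2, 3},
    "tag/urgent": {3},
}

strict = list_documents(sample_bitmaps, ["data/abstraction/note", "tag/work"])
lenient = list_documents(sample_bitmaps, ["tag/work", "tag/typo"],
                         ignore_missing_bitmaps=True)
```

With the option off, a misspelled key such as `tag/typo` raises an error; with it on, the query falls back to the bitmaps that do exist, which is the forgiving behavior an agent-driven caller would likely want.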
