Home

Dump of ideas for the project:

use cases
- large scale, single-sharded applications
- data structures
  - chat logs
  - document store (json w/ embedded types)
  - structured text
  - game states?
  - for some of these (chat logs/game states) we'll need to be able to randomly access parts of a document
high performance features
- op composing combined with snapshot framing -- useful for syncing game state between game clients that need 'frames' to determine source of truth, or hot documents that process a lot of changes
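
A rough sketch of how op composing and snapshot framing could fit together. The op format (a plain dict of key assignments), the `FramedDocument` class, and `frame_size` are all made up for illustration; real ops would be richer than "set key to value".

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical op format: a dict of {key: new_value} assignments.
Op = Dict[str, Any]


def compose(first: Op, second: Op) -> Op:
    """Return one op equivalent to applying `first` and then `second`."""
    combined = dict(first)
    combined.update(second)  # later writes to the same key win
    return combined


def apply_op(state: Dict[str, Any], op: Op) -> Dict[str, Any]:
    """Apply an op to a state without mutating the input."""
    new_state = dict(state)
    new_state.update(op)
    return new_state


@dataclass
class FramedDocument:
    frame_size: int = 3  # assumed: close a frame every N ops
    snapshot: Dict[str, Any] = field(default_factory=dict)
    frame_number: int = 0
    pending: List[Op] = field(default_factory=list)

    def submit(self, op: Op) -> None:
        """Buffer an op; close the frame once enough ops have piled up."""
        self.pending.append(op)
        if len(self.pending) >= self.frame_size:
            self.close_frame()

    def close_frame(self) -> Op:
        """Compose all pending ops into one and advance the snapshot."""
        composed: Op = {}
        for op in self.pending:
            composed = compose(composed, op)
        self.snapshot = apply_op(self.snapshot, composed)
        self.frame_number += 1
        self.pending = []
        return composed  # a single op that clients can sync against


doc = FramedDocument(frame_size=3)
doc.submit({"x": 1})
doc.submit({"x": 2, "y": 5})
doc.submit({"x": 3})
print(doc.frame_number, doc.snapshot)
```

With this, a hot document only has to ship one composed op per frame, and the snapshot at each frame number is the agreed source of truth.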
tools for data management
- schema versioning -- old clients still work with new data
  - in migration framework, define mapping from old version -> new version
  - calculate rollback automatically/have user specify rollback plan
  - roll back schema when sending old clients data
  - migrate schema forward when writing ops from old clients
  - could either upgrade on read, or migrate all at once (probably go with the former, since we don't need to implement queries yet)
  - should work in theory?
- computed properties / tools for managing denormalized data
- should be able to roll out new version of app without disconnecting clients
  - application logic will be decoupled from the database, so this should fall out naturally from the architecture
- should also handle the draft flow where you copy the document's current state -> make changes -> write the full document back
- handle state bundling problem for isomorphic js rendering -- be able to request documents at specific versions?
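
A minimal sketch of the schema versioning idea from the list above (upgrade on read, roll back for old clients). The `MIGRATIONS` registry, the `convert` helper, and the example `name` -> `first_name`/`last_name` change are all hypothetical; here the rollback is specified by hand rather than calculated automatically.

```python
from typing import Any, Callable, Dict, Tuple

Doc = Dict[str, Any]
Step = Callable[[Doc], Doc]


def v1_to_v2(doc: Doc) -> Doc:
    """Example forward migration: split `name` into two fields."""
    out = {k: v for k, v in doc.items() if k != "name"}
    first, _, last = doc["name"].partition(" ")
    out["first_name"], out["last_name"] = first, last
    return out


def v2_to_v1(doc: Doc) -> Doc:
    """Matching rollback: join the two fields back into `name`."""
    out = {k: v for k, v in doc.items()
           if k not in ("first_name", "last_name")}
    out["name"] = f'{doc["first_name"]} {doc["last_name"]}'.strip()
    return out


# Hypothetical registry: version n -> (upgrade n to n+1, rollback n+1 to n).
MIGRATIONS: Dict[int, Tuple[Step, Step]] = {1: (v1_to_v2, v2_to_v1)}


def convert(doc: Doc, from_version: int, to_version: int) -> Doc:
    """Walk the migration chain forward or backward, one version at a time."""
    while from_version < to_version:
        doc = MIGRATIONS[from_version][0](doc)
        from_version += 1
    while from_version > to_version:
        doc = MIGRATIONS[from_version - 1][1](doc)
        from_version -= 1
    return doc


stored = {"name": "Ada Lovelace"}        # written by an old (v1) client
upgraded = convert(stored, 1, 2)         # upgrade on read for new clients
rolled_back = convert(upgraded, 2, 1)    # roll back when sending to old clients
```

The same `convert` call would cover the "migrate forward when writing ops from old clients" case, as long as ops can be expressed against a whole document; migrating individual ops is a harder problem this sketch does not touch.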
some way to use directly from the outside via websockets
- alternatively, make it easy to integrate into a server stack
  - Browsers have limited number of HTTP connections per server so this might be better
- need to think through security implications
- middleware for access control
  - query validation
  - read/write permissions
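
One possible shape for the access-control middleware, as a sketch only. The request format, the `PERMISSIONS` table, and `build_pipeline` are invented for illustration and not tied to any particular websocket or server library.

```python
from typing import Any, Callable, Dict, List

Request = Dict[str, Any]
Response = Dict[str, Any]
Handler = Callable[[Request], Response]
Middleware = Callable[[Request, Handler], Response]

# Hypothetical permission table: user -> allowed actions.
PERMISSIONS = {"alice": {"read", "write"}, "bob": {"read"}}


def validate_query(request: Request, next_handler: Handler) -> Response:
    """Reject malformed requests before they reach the permission check."""
    if request.get("action") not in ("read", "write") or "doc_id" not in request:
        return {"ok": False, "error": "invalid request"}
    return next_handler(request)


def check_permissions(request: Request, next_handler: Handler) -> Response:
    """Reject actions the user is not allowed to perform."""
    allowed = PERMISSIONS.get(request.get("user"), set())
    if request["action"] not in allowed:
        return {"ok": False, "error": "forbidden"}
    return next_handler(request)


def build_pipeline(middlewares: List[Middleware], endpoint: Handler) -> Handler:
    """Wrap the endpoint so middlewares run in the order they are listed."""
    handler = endpoint
    for mw in reversed(middlewares):
        # Bind mw and the current handler as defaults to avoid late binding.
        handler = (lambda req, mw=mw, nxt=handler: mw(req, nxt))
    return handler


def endpoint(request: Request) -> Response:
    """Stand-in for the real database call."""
    return {"ok": True, "action": request["action"], "doc_id": request["doc_id"]}


handle = build_pipeline([validate_query, check_permissions], endpoint)
```

The same pipeline could sit behind a websocket handler or inside an existing server stack, since it only deals in plain request and response dicts.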
API for tagging ops related to user action so you can implement undo functionality (transactions)
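
A sketch of what tagging could look like: every op carries a tag for the user action it belongs to, and undoing a tag reverts all of its ops. `UndoableDoc` and its methods are hypothetical names. This naive version assumes no other action has touched the same keys since; handling interleaved edits would need real op transformation.

```python
from typing import Any, Dict, List, Tuple


class UndoableDoc:
    def __init__(self) -> None:
        self.state: Dict[str, Any] = {}
        # Each log entry: (tag, key, previous_value, key_existed_before)
        self.log: List[Tuple[str, str, Any, bool]] = []

    def set(self, tag: str, key: str, value: Any) -> None:
        """Record enough information to revert, then apply the write."""
        self.log.append((tag, key, self.state.get(key), key in self.state))
        self.state[key] = value

    def undo(self, tag: str) -> None:
        """Revert every op recorded under `tag`, newest first."""
        remaining = []
        for entry in reversed(self.log):
            entry_tag, key, previous, existed = entry
            if entry_tag != tag:
                remaining.append(entry)
                continue
            if existed:
                self.state[key] = previous
            else:
                self.state.pop(key, None)
        self.log = list(reversed(remaining))  # keep untouched entries in order


doc = UndoableDoc()
doc.set("action-1", "title", "Draft")
doc.set("action-2", "title", "Final")
doc.set("action-2", "body", "some text")
doc.undo("action-2")  # reverts both ops from the second user action
```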
Should it be postgres style, or riak style?
- could use ops as a mechanism for efficient sync if distributed between nodes
- if riak style:
  - how to query data globally? Maybe central elasticsearch index? Is that a robust solution?
    - How would subscriptions work? Would it be too chatty if a node had to publish things to other subscribed vnodes that are handling clients?
    - having aggregates done on an external service could induce query lag -- can do distributed queries on each node or use direct pubsub updates to mitigate
      - on the flip side, being able to asynchronously update an index would be awesome since operations can be fast-inserted, then we can update the index after a delay (similar to a debounce)
      - going down the debounce route would mean we need to make sure the index is up to date on startup though
  - what to do if ops get lost during a vnode failure / netsplit?
    - could just have good tuning for write availability, see http://docs.basho.com/riak/1.3.1/tutorials/fast-track/Tunable-CAP-Controls-in-Riak/
    - should probably not rely on tuning though, since offline data sync between a client and a server might be a nice feature in the future, and the issue is similar to the netsplit issue
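
A sketch of the debounced index idea from the notes above: ops are appended immediately, and the index only catches up once writes have gone quiet. `DebouncedIndex` is a made-up class with an in-memory log and a toy word index standing in for something like elasticsearch; time is passed in explicitly to keep the example deterministic.

```python
from typing import Dict, List, Optional, Set, Tuple


class DebouncedIndex:
    def __init__(self, delay: float) -> None:
        self.delay = delay
        self.ops: List[Tuple[str, str]] = []   # op log: (doc_id, text)
        self.index: Dict[str, Set[str]] = {}   # term -> doc ids
        self.indexed_upto = 0                  # ops already reflected in the index
        self.last_write: Optional[float] = None

    def insert(self, doc_id: str, text: str, now: float) -> None:
        """Fast path: append to the log and restart the debounce timer."""
        self.ops.append((doc_id, text))
        self.last_write = now

    def tick(self, now: float) -> None:
        """Update the index only after `delay` seconds without writes."""
        if self.last_write is not None and now - self.last_write >= self.delay:
            self.catch_up()

    def catch_up(self) -> None:
        """Index every op not yet reflected; also the thing to run on startup."""
        for doc_id, text in self.ops[self.indexed_upto:]:
            for term in text.lower().split():
                self.index.setdefault(term, set()).add(doc_id)
        self.indexed_upto = len(self.ops)
        self.last_write = None


idx = DebouncedIndex(delay=1.0)
idx.insert("a", "hello world", now=0.0)
idx.insert("b", "hello", now=0.5)
idx.tick(now=1.0)   # only 0.5s since the last write: index stays empty
idx.tick(now=1.5)   # quiet for 1.0s: index catches up
```

Persisting `indexed_upto` next to the op log would be one way to know how far behind the index is after a restart.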
Handle object replacement merge issue in a more systematic way
- CRDTs / OT and data sync
  - could have users provide a schema which dictates the underlying CRDT behavior (riak allows this -- see how they did it)
    - this way apps can declaratively describe how conflicts could be resolved
    - should probably force users to do this anyway so that we can support offline sync easily
  - what can objects represent at a high level? Can we differentiate between these representations in the data type?
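
A sketch of a schema that declares the merge behavior per field. This is not Riak's API; `SCHEMA`, `STRATEGIES`, and the three merge functions are illustrative names, and the three types shown (last-writer-wins register, grow-only set, grow-only counter) are only the simplest CRDTs.

```python
from typing import Any, Callable, Dict


def merge_lww(a: tuple, b: tuple) -> tuple:
    """Last-writer-wins register stored as (timestamp, value)."""
    return max(a, b)  # later timestamp wins; ties fall back to comparing values


def merge_gset(a: set, b: set) -> set:
    """Grow-only set: merging is set union."""
    return a | b


def merge_gcounter(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, int]:
    """Grow-only counter: per-node counts, merged by taking the max per node."""
    return {node: max(a.get(node, 0), b.get(node, 0))
            for node in a.keys() | b.keys()}


STRATEGIES: Dict[str, Callable[[Any, Any], Any]] = {
    "lww": merge_lww,
    "gset": merge_gset,
    "gcounter": merge_gcounter,
}

# Hypothetical app-provided schema: field name -> conflict resolution type.
SCHEMA = {"title": "lww", "tags": "gset", "views": "gcounter"}


def merge_docs(schema: Dict[str, str], left: Dict[str, Any],
               right: Dict[str, Any]) -> Dict[str, Any]:
    """Merge two replicas field by field using the declared strategy."""
    return {name: STRATEGIES[kind](left[name], right[name])
            for name, kind in schema.items()}


left = {"title": (1, "Draft"), "tags": {"a"}, "views": {"n1": 2}}
right = {"title": (2, "Final"), "tags": {"b"}, "views": {"n1": 1, "n2": 4}}
merged = merge_docs(SCHEMA, left, right)
```

Each strategy is commutative, associative, and idempotent, so replicas converge regardless of the order in which they merge, which is the property offline sync would lean on.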
serialization into LevelDB
- options include MsgPack and BSON
- might go with BSON, but should benchmark http://stackoverflow.com/questions/6355497/performant-entity-serialization-bson-vs-messagepack-vs-json

Things I still need to read about:

Dotted version vectors -- riak switched from vector clocks to these, and they're better for our use too
- https://github.com/ricardobcl/Dotted-Version-Vectors
CRDT algorithms
- https://github.com/aphyr/meangirls
- https://github.com/asonge/loom
look at how other people have handled the edit-conflict-during-sync issue in a product
- not all data can be reliably merged when conflicts come up, e.g. input to freeform textboxes, so learning how this case is handled could be useful
look up how DHTs work
learn graphql
read more about concurrency control algorithms
zookeeper / consul
http://fabiensanglard.net/quake3/network.php

Potentially useful reading:

Reference implementations:

Riak Core:
