Home
Will Riley edited this page Aug 21, 2016 · 28 revisions
Dump of ideas for the project:
- use cases
- large scale, single-sharded applications
- data structures
- chat logs
- document store (json w/ embedded types)
- structured text
- game states?
- for some of these (chat logs/game states) we'll need to be able to randomly access parts of a document
- high performance features
- op composition combined with snapshot framing -- useful for syncing game state between clients that need 'frames' to determine the source of truth, or for hot documents that process a lot of changes
- tools for data management
- schema versioning -- old clients still work with new data
- in migration framework, define mapping from old version -> new version
- calculate rollback automatically/have user specify rollback plan
- roll back schema when sending old clients data
- migrate schema forward when writing ops from old clients
- could either upgrade on read or migrate everything at once (probably go with the former, since we don't need to implement queries yet)
- should work in theory?
- computed properties / tools for managing denormalized data
- should be able to roll out new version of app without disconnecting clients
- application logic will be decoupled from the database, so this should happen naturally as a result of that architectural decision
- should handle the draft flow where you copy the document's current state -> make changes -> write the full document back as well
- handle the state bundling problem for isomorphic JS rendering -- be able to request documents at specific versions?
- some way to use directly from the outside via websockets
- alternatively, make it easy to integrate into a server stack
- browsers limit the number of concurrent HTTP connections per server, so this might be better
- need to think through security implications
- middleware for access control
- query validation
- read/write permissions
- API for tagging ops related to user action so you can implement undo functionality (transactions)
- Should it be Postgres-style or Riak-style?
- could use ops as a mechanism for efficient sync if distributed between nodes
- if riak style:
- how to query data globally? Maybe a central Elasticsearch index? Is that a robust solution?
- How would subscriptions work? Would it be too chatty if a node had to publish changes to other subscribed vnodes that are handling clients?
- having aggregates computed by an external service could introduce query lag -- could run distributed queries on each node or use direct pub/sub updates to mitigate
- on the flip side, being able to asynchronously update an index would be awesome since operations can be fast-inserted, then we can update the index after a delay (similar to a debounce)
- going down the debounce route would mean we need to make sure the index is up to date on startup though
- what to do if ops get lost during a vnode failure / netsplit?
- could just have good tuning for write availability, see http://docs.basho.com/riak/1.3.1/tutorials/fast-track/Tunable-CAP-Controls-in-Riak/
- should probably not rely on tuning though, since offline data sync between a client and a server might be a nice feature in the future, and the issue is similar to the netsplit issue
- Handle object replacement merge issue in a more systematic way
- CRDTs / OT and data sync
- could have users provide a schema that dictates the underlying CRDT behavior (Riak allows this -- see how they did it)
- this way apps can declaratively describe how conflicts should be resolved
- should probably force users to do this anyway so that we can support offline sync easily
- what can objects represent at a high level? Can we differentiate between these representations in the data type?
- serialization into LevelDB
- options include MessagePack and BSON
- might go with BSON, but should benchmark http://stackoverflow.com/questions/6355497/performant-entity-serialization-bson-vs-messagepack-vs-json
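For the chat-log case where we need random access to parts of a document, one option is to split a document into fixed-size chunks stored under separate keys, so reading a range only loads the chunks that cover it. A minimal sketch, assuming an ordered key-value store such as LevelDB (a plain dict stands in for it here); `ChunkedLog`, `chunk_key`, and `CHUNK_SIZE` are hypothetical names, not an existing API.

```python
from typing import Dict, List

CHUNK_SIZE = 4  # hypothetical; kept tiny so the example is easy to follow


def chunk_key(doc_id: str, chunk: int) -> str:
    # Zero-padded so lexicographic key order (as in LevelDB) matches chunk order
    return f"{doc_id}/{chunk:08d}"


class ChunkedLog:
    """Append-only message log split into CHUNK_SIZE-message chunks, one key per chunk."""

    def __init__(self, store: Dict[str, List[str]], doc_id: str) -> None:
        self.store = store
        self.doc_id = doc_id
        self.length = 0

    def append(self, message: str) -> None:
        key = chunk_key(self.doc_id, self.length // CHUNK_SIZE)
        self.store.setdefault(key, []).append(message)
        self.length += 1

    def read(self, start: int, end: int) -> List[str]:
        """Return messages [start, end), touching only the chunks that cover that range."""
        end = min(end, self.length)
        if start >= end:  # assumes start >= 0
            return []
        out: List[str] = []
        for chunk in range(start // CHUNK_SIZE, (end - 1) // CHUNK_SIZE + 1):
            base = chunk * CHUNK_SIZE
            messages = self.store[chunk_key(self.doc_id, chunk)]
            out.extend(messages[max(start - base, 0):end - base])
        return out
```

With a real store, `read` would turn into a key-range scan from the first to the last chunk key, which is why the zero padding matters.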
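A rough sketch of op composition combined with snapshot framing, assuming a hypothetical op format of `(key, value)` writes on a flat map, with `None` meaning "delete this key". Ops that arrive during a frame are composed into one op (the last write per key wins), and the composed op is applied once per tick to produce that frame's authoritative snapshot. `compose`, `apply_op`, and `FrameBuffer` are illustrative names; real text or JSON ops would need a much richer `compose`.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical op format: (key, value) on a flat map; value None means "delete key"
Op = Tuple[str, Optional[Any]]
_DELETE = object()  # sentinel marking a delete inside a composed op


def compose(ops: List[Op]) -> Dict[str, Any]:
    """Collapse a run of ops into one composite op: the last write per key wins."""
    composed: Dict[str, Any] = {}
    for key, value in ops:
        composed[key] = _DELETE if value is None else value
    return composed


def apply_op(snapshot: Dict[str, Any], composed: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new snapshot with the composite op applied (the input is not mutated)."""
    state = dict(snapshot)
    for key, value in composed.items():
        if value is _DELETE:
            state.pop(key, None)
        else:
            state[key] = value
    return state


class FrameBuffer:
    """Buffers ops between ticks and emits one authoritative snapshot per frame."""

    def __init__(self, initial: Dict[str, Any]) -> None:
        self.snapshot = dict(initial)
        self.frame = 0
        self._pending: List[Op] = []

    def submit(self, op: Op) -> None:
        self._pending.append(op)

    def tick(self) -> Tuple[int, Dict[str, Any]]:
        # Compose everything received during this frame, then apply it in one step
        composed = compose(self._pending)
        self._pending.clear()
        self.snapshot = apply_op(self.snapshot, composed)
        self.frame += 1
        return self.frame, self.snapshot
```

Applying the composed op once gives the same result as applying each op in order, which is what would let a hot document or game server absorb many ops per frame while clients only ever see one snapshot per frame.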
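For schema versioning, here is a minimal sketch of a migration framework where the user specifies both the forward mapping and the rollback for each version step (the "have user specify rollback plan" option, not automatic rollback calculation). `MIGRATIONS`, `migrate`, the `_v` version field, and the name-splitting change are all hypothetical. Upgrade-on-read would call `migrate(doc, latest)` when loading a document, and data sent to an old client would go through `migrate(doc, client_version)`.

```python
from typing import Any, Callable, Dict, Tuple

Doc = Dict[str, Any]
# (forward, rollback) pair; MIGRATIONS[n] converts between version n and n + 1
Migration = Tuple[Callable[[Doc], Doc], Callable[[Doc], Doc]]


def _v1_to_v2(doc: Doc) -> Doc:
    # Hypothetical change: v2 splits "name" into "first" and "last"
    first, _, last = doc["name"].partition(" ")
    out = {k: v for k, v in doc.items() if k != "name"}
    out.update(first=first, last=last)
    return out


def _v2_to_v1(doc: Doc) -> Doc:
    # User-specified rollback so old clients still get the shape they expect
    out = {k: v for k, v in doc.items() if k not in ("first", "last")}
    out["name"] = f"{doc['first']} {doc['last']}".strip()
    return out


MIGRATIONS: Dict[int, Migration] = {1: (_v1_to_v2, _v2_to_v1)}


def migrate(doc: Doc, target: int) -> Doc:
    """Walk the migration chain forward or backward until the doc is at `target`."""
    version = doc["_v"]
    while version < target:
        doc = {**MIGRATIONS[version][0](doc), "_v": version + 1}
        version += 1
    while version > target:
        version -= 1
        doc = {**MIGRATIONS[version][1](doc), "_v": version}
    return doc
```

This only handles whole documents; migrating individual ops written by old clients would need the same forward mapping applied per op.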
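For the idea of tagging ops with the user action they belong to so undo can be implemented, a minimal sketch: each op records the value it overwrote, and undoing a transaction applies those inverses newest-first. `Op`, `Document`, `put`, and the `txn` tag are hypothetical names on a flat key/value document.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

_MISSING = object()  # sentinel: the key did not exist before the op


@dataclass
class Op:
    txn: str       # hypothetical tag grouping the ops of one user action
    key: str
    value: Any
    previous: Any  # value before this op ran, kept so the op can be inverted


class Document:
    def __init__(self) -> None:
        self.state: Dict[str, Any] = {}
        self.log: List[Op] = []

    def _write(self, key: str, value: Any) -> None:
        if value is _MISSING:
            self.state.pop(key, None)
        else:
            self.state[key] = value

    def put(self, txn: str, key: str, value: Any) -> None:
        self.log.append(Op(txn, key, value, self.state.get(key, _MISSING)))
        self._write(key, value)

    def undo(self, txn: str) -> None:
        """Invert every op tagged with `txn`, newest first, and drop them from the log."""
        for op in reversed([op for op in self.log if op.txn == txn]):
            self._write(op.key, op.previous)
        self.log = [op for op in self.log if op.txn != txn]
```

This assumes the transaction being undone is the latest one touching those keys. Undoing an older transaction after later (or concurrent) edits would need the inverse ops transformed against those later ops, which is where the OT/CRDT machinery comes back in.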
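For the asynchronously updated index (fast-inserting ops and indexing them after a delay), a minimal sketch of the part that keeps the index correct across restarts: the indexer remembers the last op sequence number it has indexed and catches up from the log. A debounce timer that resets on every write would call `catch_up()` once writes go quiet, and startup would call it before serving queries. `OpLog`, `Indexer`, and `indexed_through` are hypothetical; the index only adds words (no removals) to keep it short.

```python
from typing import Dict, List, Set, Tuple

Entry = Tuple[int, str, str]  # (seq, doc_id, text)


class OpLog:
    """Append-only op log; writes return immediately without touching the index."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def append(self, doc_id: str, text: str) -> int:
        seq = len(self.entries) + 1  # dense, 1-based sequence numbers
        self.entries.append((seq, doc_id, text))
        return seq

    def since(self, seq: int) -> List[Entry]:
        # Entry i holds seq i + 1, so slicing at `seq` yields everything after it
        return self.entries[seq:]


class Indexer:
    """Word -> doc_id index that lags behind the log and catches up in batches."""

    def __init__(self, log: OpLog) -> None:
        self.log = log
        self.indexed_through = 0  # would be persisted next to the index in practice
        self.index: Dict[str, Set[str]] = {}

    def catch_up(self) -> int:
        """Index every entry newer than `indexed_through`; return how many were processed."""
        pending = self.log.since(self.indexed_through)
        for seq, doc_id, text in pending:
            for word in text.lower().split():
                self.index.setdefault(word, set()).add(doc_id)
            self.indexed_through = seq
        return len(pending)
```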
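For letting users provide a schema that dictates the underlying CRDT behavior, a minimal sketch using three standard state-based CRDTs (last-writer-wins register, grow-only set, grow-only counter) picked per field by a schema. The schema format and the names `MERGERS` and `merge` are hypothetical and not modeled on Riak's actual data type API.

```python
from typing import Any, Callable, Dict, FrozenSet, Tuple

# LWW register state: (timestamp, replica_id, value)
LWW = Tuple[int, str, Any]


def merge_lww(a: LWW, b: LWW) -> LWW:
    # Highest timestamp wins; replica_id breaks ties deterministically
    return max(a, b, key=lambda reg: (reg[0], reg[1]))


def merge_gset(a: FrozenSet[Any], b: FrozenSet[Any]) -> FrozenSet[Any]:
    # Grow-only set: merge is union
    return a | b


def merge_gcounter(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, int]:
    # Grow-only counter: per-replica max; the counter's value is the sum of entries
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}


MERGERS: Dict[str, Callable[[Any, Any], Any]] = {
    "lww": merge_lww,
    "gset": merge_gset,
    "gcounter": merge_gcounter,
}


def merge(schema: Dict[str, str], a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Any]:
    """Merge two replicas field by field, using the CRDT the schema picks for each field."""
    return {field: MERGERS[kind](a[field], b[field]) for field, kind in schema.items()}
```

Because each merge function is commutative, associative, and idempotent, replicas (or an offline client and the server) can exchange state in any order and still converge, which is the property offline sync needs. A free-form text field mapped to `"lww"` would still silently drop one side's edit, which is the textbox case noted in the reading list.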
Things I still need to read about:
- Dotted version vectors -- Riak switched from vector clocks to these, and they'd suit our use case better too
- CRDT algorithms
- look at how other people have handled the edit-conflict-during-sync issue in a product
- not all data can be reliably merged when conflicts come up, e.g. input to free-form text boxes, so learning how this case is handled could be useful
- look up how DHTs work
- learn GraphQL
- read more about concurrency control algorithms
- ZooKeeper / Consul
- http://fabiensanglard.net/quake3/network.php
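As a starting point for the dotted version vector reading, here is a sketch of plain version vectors, the baseline that DVVs extend. Roughly, a DVV adds one extra (node, counter) "dot" per stored value so that concurrent writes coordinated by the same server node can still be told apart; that part is not shown. `increment`, `join`, and `compare` are illustrative names.

```python
from typing import Dict

VersionVector = Dict[str, int]  # node id -> number of events seen from that node


def increment(vv: VersionVector, node: str) -> VersionVector:
    """Record a new local event on `node` (returns a new vector)."""
    return {**vv, node: vv.get(node, 0) + 1}


def join(a: VersionVector, b: VersionVector) -> VersionVector:
    """Pointwise max: the smallest vector that has seen everything a and b have seen."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a: VersionVector, b: VersionVector) -> str:
    """Return 'equal', 'before', 'after', or 'concurrent' (a relative to b)."""
    nodes = a.keys() | b.keys()
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"
```

A `"concurrent"` result is exactly the edit-conflict-during-sync (or netsplit) case: neither side has seen the other's write, so the values need a CRDT merge or must be kept as siblings.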
Potentially useful reading:
- http://arxiv.org/abs/1608.03960
- https://medium.com/@raphlinus/towards-a-unified-theory-of-operational-transformation-and-crdt-70485876f72f#.h6i0yj3bo
- https://github.com/teh-cmc/seq/blob/master/README.md
Reference implementations: