Skip to content

RFC: transactions! snapshots! merge operators! #209

Closed
@spacejam

Description

@spacejam

Here are some sketches for upcoming features I'd like to implement after the 0.15 release bakes in the reliability of the core system a bit more. If you have an opinion on the API for any of these, please comment! This is just a (not runnable, pseudocode) sketch I wrote up while thinking about these features. I will edit this as ideas solidify.

Transactions

enum TxResult {
  Ok,
  PredicateFailure,
  Abort,
}

impl Tx {
    // Predicates must all return true if a transaction is to succeed.
    fn predicate(&mut self, k: Key, f: F) where F: Fn(&Key, Option<&Value>) -> bool;

    // Retries the transaction in a loop until it succeeds or reaches PredicateFailure. 
    fn retry_on_abort(&mut self, bool);

    fn set(&mut self, k: Key, v: Value);
    fn del(&mut self, k: &Key);

    // runs the transaction
    fn execute(self) -> TxResult
}

let mut tx = tree.tx();
tx.predicate("cats", |k, v| v == Some(b"meow"));
tx.predicate("dogs", |k, v| v == Some(b"woof"));
tx.set("cats", "woof");
tx.set("dogs", "meow");

match tx.execute() {
  TxResult::Committed(reads) => assert!(reads.is_empty()),
  TxResult::PredicateFailure(reads) => println!("tx failed because either cats \
                                     is not meow, or dogs is not woof \
                                     in {:?}", reads),
  TxResult::Abort => println!("another transaction operated on state concurrently, \
                                                we may want to retry");
}

After the 0.15 release bakes in the stability of the pagecache, 0.16 will be the "feature release" which includes transactions, snapshots and merge operators! A goal is to implement these in a way that minimizes the performance impact to workloads that do not care about these features. Performance is not the primary goal of these features, however, and for long-running snapshot cleanup, we may encounter some latency hiccups, although these may be hidden by punting cleanup to a threadpool.

Basic idea behind transactions:

  • all updates (set, del, cas, merge) include their original LSN
  • when executing, we check into an epoch, then generate a timestamp, then bound all reads of the transaction by this timestamp
  • any time an item is deleted, it uses the epoch-dropper mechanism to delay removal from shared state until all threads that checked in early enough to witness it have checked out of their epochs.
  • mutations in transactions only become visible to others after their transactions commit
  • transactions may cause each other to fail, resulting in conflicts. They first read all data items to be read or written, then set each data item into a pending state that refers to a transaction object, then they set the transaction object to the completed state, then clean up the pending state. Any reads that encounter a pending object try to set their transaction object to the failed state.

Snapshots

These use the same epoch & lsn-bounding technique as shown above to ensure old state doesn't get dropped out before a thread is done with it. Supports long-running scan operations, but they may hiccup when they complete due to cleaning up all of the garbage that has amassed since they began.

Merge Operators

type MergeOperator = Fn(key: &Key, last_value: Option<&Value>, new_merges: Vec<&Value>) -> Option<Value>;

fn concat_merge(key: &Key, existing_val: Option<&Value>, operands: Vec<&Value>)  -> Option<Value> {
    let mut result: Vec<u8> = Vec::with_capacity(operands.len());
    existing_val.map(|v| {
        result.extend_from_slice(&*v);
    });           
    for op in operands {    
        result.extend_from_slice(&*op); 
    } 
    // returning None deletes the value
    Some(result)
}

config.set_merge_operator(concat_merge);

let tree == config.tree();

tree.merge("cats", "m");
tree.merge("cats", "e");
tree.merge("cats", "o");
tree.merge("cats", "w");

assert_eq!(tree.get("cats"), Some("meow"));

// set and cas clear previous state
tree.set("cats", "woof");
assert_eq!(tree.get("cats"), Some("woof"));

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions