Description
Here are some sketches for upcoming features I'd like to implement after the 0.15 release bakes in the reliability of the core system a bit more. If you have an opinion on the API for any of these, please comment! This is just a (not runnable, pseudocode) sketch I wrote up while thinking about these features. I will edit this as ideas solidify.
Transactions
enum TxResult {
Ok,
PredicateFailure,
Abort,
}
impl Tx {
// Predicates must all return true if a transaction is to succeed.
fn predicate(&mut self, k: Key, f: F) where F: Fn(&Key, Option<&Value>) -> bool;
// Retries the transaction in a loop until it succeeds or reaches PredicateFailure.
fn retry_on_abort(&mut self, bool);
fn set(&mut self, k: Key, v: Value);
fn del(&mut self, k: &Key);
// runs the transaction
fn execute(self) -> TxResult
}
let mut tx = tree.tx();
tx.predicate("cats", |k, v| v == Some(b"meow"));
tx.predicate("dogs", |k, v| v == Some(b"woof"));
tx.set("cats", "woof");
tx.set("dogs", "meow");
match tx.execute() {
TxResult::Committed(reads) => assert!(reads.is_empty()),
TxResult::PredicateFailure(reads) => println!("tx failed because either cats \
is not meow, or dogs is not woof \
in {:?}", reads),
TxResult::Abort => println!("another transaction operated on state concurrently, \
we may want to retry");
}
After the 0.15 release bakes in the stability of the pagecache, 0.16 will be the "feature release" which includes transactions, snapshots and merge operators! A goal is to implement these in a way that minimizes the performance impact to workloads that do not care about these features. Performance is not the primary goal of these features, however, and for long-running snapshot cleanup, we may encounter some latency hiccups, although these may be hidden by punting cleanup to a threadpool.
Basic idea behind transactions:
- all updates (set, del, cas, merge) include their original LSN
- when executing, we check into an epoch, then generate a timestamp, then bound all reads of the transaction by this timestamp
- any time an item is deleted, it uses the epoch-dropper mechanism to delay removal from shared state until all threads that checked in early enough to witness it have checked out of their epochs.
- mutations in transactions only become visible to others after their transactions commit
- transactions may cause each other to fail, resulting in conflicts. They first read all data items to be read or written, then set each data item into a pending state that refers to a transaction object, then they set the transaction object to the completed state, then clean up the pending state. Any reads that encounter a pending object try to set their transaction object to the failed state.
Snapshots
These use the same epoch & lsn-bounding technique as shown above to ensure old state doesn't get dropped out before a thread is done with it. Supports long-running scan operations, but they may hiccup when they complete due to cleaning up all of the garbage that has amassed since they began.
Merge Operators
type MergeOperator = Fn(key: &Key, last_value: Option<&Value>, new_merges: Vec<&Value>) -> Option<Value>;
fn concat_merge(key: &Key, existing_val: Option<&Value>, operands: Vec<&Value>) -> Option<Value> {
let mut result: Vec<u8> = Vec::with_capacity(operands.len());
existing_val.map(|v| {
result.extend_from_slice(&*v);
});
for op in operands {
result.extend_from_slice(&*op);
}
// returning None deletes the value
Some(result)
}
config.set_merge_operator(concat_merge);
let tree == config.tree();
tree.merge("cats", "m");
tree.merge("cats", "e");
tree.merge("cats", "o");
tree.merge("cats", "w");
assert_eq!(tree.get("cats"), Some("meow"));
// set and cas clear previous state
tree.set("cats", "woof");
assert_eq!(tree.get("cats"), Some("woof"));