Skip to content

Commit

Permalink
Merge branch 'master' of github.com:aosabook/500lines
Browse files Browse the repository at this point in the history
  • Loading branch information
nedbat committed Jun 15, 2014
2 parents f1e87a1 + 2f4ac33 commit 66decfb
Show file tree
Hide file tree
Showing 44 changed files with 39,513 additions and 0 deletions.
6 changes: 6 additions & 0 deletions .classpath
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
<?xml version="1.0" encoding="UTF-8"?>
<classpath>
<classpathentry kind="src" path="functionalDB/src"/>
<classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER"/>
<classpathentry kind="output" path="functionalDB/bin"/>
</classpath>
40 changes: 40 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -199,6 +199,7 @@ Contributors
</td>
<td><a href="https://github.com/haz">haz</a></td>
<td>christian.muise@gmail.com</td>
</tr>
<tr>
<td>Carlos Scheidegger</td>
<td>AT&amp;T Research</td>
Expand All @@ -211,4 +212,43 @@ Contributors
<td><a href="https://github.com/cscheid">cscheid</a></td>
<td>carlos.scheidegger@gmail.com</td>
</tr>
<tr>
<td>Marina Samuel</td>
<td>Mozilla</td>
<td>ocr</td>
<td>
<ul>
<li><a href="http://marinasamuel.com">www.marinasamuel.com</a></li>
<li><a href="https://twitter.com/emtwos">@emtwos</a></li>
</ul>
</td>
<td><a href="https://github.com/emtwo">emtwo</a></td>
<td>msamuel@mozilla.com</td>
</tr>
<tr>
<td>Cate Huston</td>
<td>Google</td>
<td>Image Filter app</td>
<td>
<ul>
<li><a href="http://www.accidentallyincode.com/">www.accidentallyincode.com/</a></li>
<li><a href="https://twitter.com/catehstn">@catehstn</a></li>
</ul>
</td>
<td><a href="https://github.com/catehstn">catehstn</a></td>
<td>catehuston@gmail.com</td>
</tr>
<tr>
<td>Yoav Rubin</td>
<td>Microsoft</td>
<td>In-memory functional database</td>
<td>
<ul>
<li><a href="https://twitter.com/yoavrubin">@yoavrubin</a></li>
</ul>
</td>
<td><a href="https://github.com/yoavrubin">yoavrubin</a></td>
<td></td>

</tr>
</table>
1 change: 1 addition & 0 deletions functionalDB/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
/bin
13 changes: 13 additions & 0 deletions functionalDB/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
# FDB

FDB (TODO - find a better name) is an in-memory, no-sql functional database, written in Clojure.

It is a modest attempt to provide part of the functionality that the Datomic database provides (the main omitted functionality is the durability part).

The core idea is to add the notion of time to the data, and provide on top of the standard CRUD operations, a set of time related operations.

The database itself is a collection of datums. A datum is built of an entity (think of a row in a table)that has its attributes (a column) and the value of that entity's attribute at a given time.
Any update does not overwrite the prvious value, but addes another datum to the database.

The database supports both DB "modifying" (modifying as in creating a new DB value) transactions and what-if actions.

58 changes: 58 additions & 0 deletions functionalDB/src/fdb/constructs.clj
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
(ns fdb.constructs
(:use fdb.storage))

(defrecord Database [timestamped top-id curr-time])
(defrecord Timestamped [storage VAET AVET VEAT EAVT])
(defrecord Entity [id attrs])
(defrecord Attr [name value ts prev-ts])

(defn make-index
"An index is a tree, implemented by nested maps, each level corresponds to either entity-id, attribute name or a value, where the leaves of the tree are sets.
The order of the levels changes from one index to another, to allow different quering on different indices. It is possible to reorder a path in the tree to an EAV structure
using the to-eav function, or transform an EAV to a specific index path using the from-eav, both function reside in the metadataa of the index. The leaf index is the index of
the leaf element in an EAV triplet. The usage-pred is a predicate to decide whether to index a specific attribute in an entity or not."
[from-eav to-eav usage-pred leaf-index]
(with-meta {} {:from-eav from-eav :to-eav to-eav :usage-pred usage-pred :leaf-index leaf-index}))

(defn from-eav [index] (:from-eav (meta index)))
(defn to-eav [index] (:to-eav (meta index)))
(defn usage-pred [index] (:usage-pred (meta index)))
(defn leaf-index [index] (:leaf-index (meta leaf-index)))

(defn single? [attr] (= :db/single (:cardinality (meta attr))))

(defn ref? [attr] (= :db/ref (:type (meta attr))))

(defn make-db
"Create an empty database"
[]
(atom (Database. [(Timestamped.
(initial-storage) ; storage
(make-index #(vector %3 %2 %1) #(vector %3 %2 %1) #(ref? %) 0) ; VAET - for graph queries and joins
(make-index #(vector %2 %3 %1) #(vector %3 %1 %2) #(not (not %)) 0) ; AVET - for filtering
(make-index #(vector %3 %1 %2) #(vector %2 %3 %1) #(not (not %)) 1) ; VEAT - for filtering
(make-index #(vector %1 %2 %3) #(vector %1 %2 %3) #(not (not %)) 2) )] ; EAVT - for filtering
0 0)))

(defn indices[] [:VAET :AVET :VEAT :EAVT])

(defn make-entity
"creates an entity, if id is not supplied, a running id is assigned to the entity"
([] (make-entity :db/no-id-yet ))
([id] (Entity. id {})))

(defn make-attr
"creation of an attribute. The name, value and type of an attribute are mandatory arguments, further arguments can be passed as named arguments.
The type of the attribute may be either :string, :number, :boolean or :db/ref . If the type is :db/ref, the value is an id of another entity and indexing of backpointing is maintained.
The named arguments are as follows:
:indexed - a boolean, can be either true or false - marks whether this attribute should be indexed. By defaults attributes are not indexed.
:cardinality - the cardinality of an attribute, can be either:
:db/single - which means that this attribute can be a single value at any given time (this is the default cardinality)
:db/multiple - which means that this attribute is actually a set of values. In this case updates of this attribute may be one of the following (NOTE that all these operations accept a set as argument):
:db/add - adds a set of values to the currently existing set of values
:db/reset-to - resets the value of this attribute to be the given set of values
:db/remove - removes the given set of values from the attribute's current set of values"
([name value type ; these ones are required
& {:keys [indexed cardinality] :or {indexed false cardinality :db/single}} ]
{:pre [(contains? #{true false} indexed) (contains? #{:db/single :db/multiple} cardinality)]}
(with-meta (Attr. name value -1 -1) {:type type :indexed indexed :cardinality cardinality} )))
260 changes: 260 additions & 0 deletions functionalDB/src/fdb/core.clj
Original file line number Diff line number Diff line change
@@ -0,0 +1,260 @@
(ns fdb.core
[:use [fdb storage query constructs]]
[:require [clojure.set :as CS :only (union difference intersection)]])

(defn- collify [x] (if (coll? x) x [x]))

(defn- next-ts [db] (inc (:curr-time db)))

(defn- next-id
"returns a pair composed of the id to use for the given entity and the next free running id in the database"
[db ent]
(let [top-id (:top-id db)
ent-id (:id ent)
inceased-id (inc top-id)]
(if (= ent-id :db/no-id-yet)
[(keyword (str inceased-id)) inceased-id]
[ent-id top-id])))

(defn entity-at
"the entity with the given ent-id at the given time (defaults to the latest time)"
([db ent-id] (entity-at db (:curr-time db) ent-id))
([db ts ent-id] (stored-entity (get-in db [:timestamped ts :storage]) ent-id)))

(defn attr-at
"The attribute of an entity at a given time (defaults to recent time)"
([db ent-id attr-name] (attr-at db ent-id attr-name (:curr-time db)))
([db ent-id attr-name ts]
(get-in (entity-at db ts ent-id) [:attrs attr-name])))

(defn value-of-at
"value of a datom at a given time, if no time is provided, we default to the most recent value"
([db ent-id attr-name] (:value (attr-at db ent-id attr-name)))
([db ent-id attr-name ts] (:value (attr-at db ent-id attr-name ts))))

(defn- update-attr-value
"updating the attribute value based on the kind of the operation, the cardinality defined for this attribute and the given value"
[attr value operation]
(cond
(single? attr) (assoc attr :value #{value})
; now we're talking about an attribute of multiple values
(= :db/reset-to operation) (assoc attr :value value)
(= :db/add operation) (assoc attr :value (CS/union (:value attr) value))
(= :db/remove operation) (assoc attr :value (CS/difference (:value attr) value))))

(defn add-attr
"adds an attribute to an entity"
[ent attr]
(let [attr-id (keyword (:name attr))]
(assoc-in ent [:attrs attr-id] attr)))

(defn- update-creation-ts
"updates the timestamp value of all the attributes of an entity to the given timestamp"
[ent ts-val]
(reduce #(assoc-in %1 [:attrs %2 :ts ] ts-val) ent (keys (:attrs ent))))

(defn- add-entry-to-index [index path operation]
(if (= operation :db/remove)
index
(let [update-path (butlast path)
update-value (last path)
to-be-updated-set (get-in index update-path #{})]
(assoc-in index update-path (conj to-be-updated-set update-value) ))))

(defn- update-attr-in-index [index ent-id attr-name target-val operation]
(let [colled-target-val (collify target-val)
add-entry-fn (fn [indx vl] (add-entry-to-index indx ((from-eav index) ent-id attr-name vl) operation))]
(reduce add-entry-fn index colled-target-val)))

(defn- add-entity-to-index [ent timestamped ind-name]
(let [ent-id (:id ent)
index (ind-name timestamped)
all-attrs (vals (:attrs ent))
relevant-attrs (filter #((usage-pred index) %) all-attrs )
add-in-index-fn (fn [ind attr] (update-attr-in-index ind ent-id (:name attr) (:value attr) :db/add))]
(assoc timestamped ind-name (reduce add-in-index-fn index relevant-attrs))))

(defn- fix-new-entity [db ent]
(let [[ent-id next-top-id] (next-id db ent)
new-ts (next-ts db)]
[(update-creation-ts (assoc ent :id ent-id) new-ts) next-top-id]))

(defn add-entity [db ent]
(let [[fixed-ent next-top-id] (fix-new-entity db ent)
new-timestamped (update-in (last (:timestamped db)) [:storage] update-storage fixed-ent)
add-fn (partial add-entity-to-index fixed-ent)
new-timestamped (reduce add-fn new-timestamped (indices))]
(assoc db :timestamped (conj (:timestamped db) new-timestamped) :top-id next-top-id)))

(defn add-entities [db ents-seq] (reduce add-entity db ents-seq))

(defn- update-attr-modification-time [attr new-ts] (assoc attr :ts new-ts :prev-ts ( :ts attr)))

(defn- update-attr [attr new-val new-ts operation]
{:pre [(if (single? attr)
(contains? #{:db/reset-to :db/remove} operation)
(contains? #{:db/reset-to :db/add :db/remove} operation))]}
(-> attr
(update-attr-modification-time new-ts)
(update-attr-value new-val operation)))

(defn- remove-path-values [old-vals target-val operation]
(case operation
:db/add [] ; nothing to remove
:db/reset-to old-vals ; removing all of the old values
:db/remove (collify target-val))) ; removing the values defined by the caller

(defn- remove-entry-from-index [index path]
(let [path-head (first path)
path-to-items (butlast path)
val-to-remove (last path)
old-entries-set (get-in index path-to-items)]
(cond
(not (contains? old-entries-set val-to-remove)) index ; the set of items does not contain the item to remove, => nothing to do here
(and (= 1 (count old-entries-set) ) (= 1 (count (index path-head)))) (dissoc index path-head) ; a path representing a single EAV - remove it entirely
(= (count old-entries-set) 1) (update-in index [path-head] dissoc (second path)) ; a path that splits at the second item - just remove the unneeded part of it
:else (update-in index path-to-items disj val-to-remove))))

(defn- remove-entries-from-index [ent-id operation index attr]
(if (= operation :db/add)
index
(let [attr-name (:name attr)
datom-vals (collify (:value attr))
paths (map #((from-eav index) ent-id attr-name %) datom-vals)]
(reduce remove-entry-from-index index paths))))

(defn- update-index [ent-id old-attr target-val operation timestamped ind-name]
(if-not ((usage-pred (get-in timestamped [ind-name])) old-attr)
timestamped
(let [index (ind-name timestamped)
old-vals (:value old-attr)
attr-name (:name old-attr)
remove-paths (remove-path-values old-vals target-val operation)
cleaned-index (remove-entries-from-index ent-id operation index old-attr)
updated-index (update-attr-in-index cleaned-index ent-id attr-name target-val operation)]
(assoc timestamped ind-name updated-index))))

(defn- update-entity [storage e-id new-attr]
(assoc-in (stored-entity storage e-id) [:attrs (:name new-attr)] new-attr))

(defn- update-timestamped
[timestamped ent-id old-attr updated-attr new-val operation]
(let [storage (:storage timestamped)
new-timestamped (reduce (partial update-index ent-id old-attr new-val operation) timestamped (indices))]
(assoc new-timestamped :storage (update-storage storage (update-entity storage ent-id updated-attr)))))

(defn update-datom
([db ent-id attr-name new-val]
(update-datom db ent-id attr-name new-val :db/reset-to ))
([db ent-id attr-name new-val operation ]
(let [update-ts (next-ts db)
timestamped (last (:timestamped db))
attr (attr-at db ent-id attr-name)
updated-attr (update-attr attr new-val update-ts operation)
fully-updated-timestamped (update-timestamped timestamped ent-id attr updated-attr new-val operation)]
(update-in db [:timestamped] conj fully-updated-timestamped))))

(defn- remove-entity-from-index [ent timestamped ind-name]
(let [ent-id (:id ent)
index (ind-name timestamped)
all-attrs (vals (:attrs ent))
relevant-attrs (filter #((usage-pred index) %) all-attrs )
remove-from-index-fn (partial remove-entries-from-index ent-id :db/remove)]
(assoc timestamped ind-name (reduce remove-from-index-fn index relevant-attrs))))

(defn- reffing-datoms-to [e-id timestamped]
(let [vaet (:VAET timestamped)]
(for [[attr-name reffing-set] (e-id vaet)
reffing reffing-set]
[reffing attr-name e-id])))

(defn- remove-back-refs [db e-id timestamped]
(let [refing-datoms (reffing-datoms-to e-id timestamped)
remove-fn (fn[d [e a v]] (update-datom db e a v :db/remove))
clean-db (reduce remove-fn db refing-datoms)]
(last (:timestamped db))))

(defn remove-entity [db ent-id]
(let [ent (entity-at db ent-id)
timestamped (remove-back-refs db ent-id (last (:timestamped db)))
retimed-timestamped (update-in timestamped [:VAET] dissoc ent-id)
no-ent-timestamped (assoc retimed-timestamped :storage (remove-entity-from-storage (:storage retimed-timestamped) ent))
new-timestamped (reduce (partial remove-entity-from-index ent) no-ent-timestamped (indices))]
(assoc db :timestamped (conj (:timestamped db) new-timestamped))))

(defn transact-on-db [initial-db txs]
(loop [[tx & rst-tx] txs transacted initial-db]
(if tx
(recur rst-tx (apply (first tx) transacted (rest tx)))
(let [initial-timestamped (:timestamped initial-db)
new-timestamped (last (:timestamped transacted))]
(assoc initial-db :timestamped (conj initial-timestamped new-timestamped) :curr-time (next-ts initial-db) :top-id (:top-id transacted))))))

(defmacro _transact [db op & txs]
(when txs
(loop [[frst-tx# & rst-tx#] txs res# [op db `transact-on-db] accum-txs# []]
(if frst-tx#
(recur rst-tx# res# (conj accum-txs# (vec frst-tx#)))
(list* (conj res# accum-txs#))))))

(defn- _what-if
"Operates on the db with the given transactions, but without eventually updating it"
[db f txs] (f db txs))

(defmacro what-if [db & txs] `(_transact ~db _what-if ~@txs))

(defmacro transact [db & txs] `(_transact ~db swap! ~@txs))

(defmacro q
"querying the database using datalog queries built in a map structure ({:find [variables*] :where [ [e a v]* ]}). (after the where there are clauses)
At the moment support only filtering queries, no joins is also assumed."
[db query]
`(let [query# (q-clauses ~(:where query)) ; extracting the query clauses from the query
ind# (choose-index ~db query#) ; selecting which index to use
q-res# (query-index ind# query#) ; actually quering, from now on just preparing results
binded-res# (bind-variables-to-query q-res# ind#) ; building a meta + real results structure
needed-vars# (symbol-col-to-set ~(:find query))] ; extracting out the needed to be reported variables
(map (partial locate-vars-in-query-res needed-vars# ) binded-res#))) ; picking the needed variables from the query result

(defn evolution-of
"The sequence of the values of an entity's attribute, as changed through time"
[db ent-id attr-name]
(loop [res [] ts (:curr-time db)]
(if (= -1 ts) (reverse res)
(let [attr (attr-at db ent-id attr-name ts)]
(recur (conj res {(:ts attr) (:value attr)}) (:prev-ts attr))))))

(defn db-before
"How the db was before a given timestamp"
[db ts]
(let [timestamped-before (subvec (:timestamped db) 0 ts )]
(assoc db :timestamped timestamped-before :curr-time ts)))

(defn outgoing-refs [db ts ent-id]
(if ent-id
(->> (entity-at db ts ent-id)
(:attrs) (vals) (filter ref?) (mapcat (comp collify :value)))
[]))

(defn- incoming-refs [db ts ent-id]
(let [vaet (ind-at db ts :VAET)]
(reduce into (vals (ent-id vaet)))))

(defn- remove-explored [pendings explored restruct-fn]
(if (contains? explored (first pendings))
(recur (rest pendings) explored restruct-fn)
(restruct-fn pendings)))

(defn- traverse [pendings explored out-reffing ent-at restruct-fn]
(let [cleaned-pendings (remove-explored pendings explored restruct-fn)
item (first cleaned-pendings)
next-pends (reduce conj (restruct-fn (rest cleaned-pendings)) (out-reffing item))]
(when item (cons (ent-at item)
(lazy-seq (traverse next-pends (conj explored item) out-reffing ent-at restruct-fn))))))

(defn traverse-db
([start-ent-id db algo direction] (traverse-db start-ent-id db algo direction (:curr-time db)))
([start-ent-id db algo direction ts]
(let [pend-struct (if (= :bfs algo) vec list*)
explor-fn (if (= :outgoing direction) outgoing-refs incoming-refs)]
(traverse [start-ent-id] #{} (partial explor-fn db ts) (partial entity-at db ts) pend-struct))))
Loading

0 comments on commit 66decfb

Please sign in to comment.