Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #381 Reduce the number of Eval calls #396

Merged
merged 15 commits into from
Aug 13, 2018

Conversation

EthanEChristian
Copy link
Contributor

I have changed both the Durability and Compiler evaluation of functions that are embedded in the rulebase, as part of this I also gave AlphaNodes an Id.

I have also added some performance tests that generate a simple rulebase and compile, serialize and deserialize the rulebase. Running performance locally (lein test :performance )
gave me a before of:

Running Performance tests for:
Generated Session Compilation
==========================================
Mean: 13412.32ms
Standard Deviation : 5924.659171429188ms

Running Performance tests for:
Generated Session serialization
==========================================
Mean: 205.94ms
Standard Deviation : 85.99474635115799ms

Running Performance tests for:
Generated Session deserialization
==========================================
Mean: 29966.56ms
Standard Deviation : 12047.334043945159ms

and an after of:

Running Performance tests for:
Generated Session Compilation
==========================================
Mean: 2682.92ms
Standard Deviation : 108.87604695248626ms


Running Performance tests for:
Generated Session serialization
==========================================
Mean: 223.9ms
Standard Deviation : 11.80211845390479ms


Running Performance tests for:
Generated Session deserialization
==========================================
Mean: 1683.98ms
Standard Deviation : 87.03987362123178ms

I chose to persist the functions on the rulebase, this allows for the functions to be easily pulled off for serialization and deserialization. Allowing for minimal traversal of the already constructed rulebase, memory usage seems comparable to what it was before and the performance numbers speak for themselves.

@WilliamParker @mrrodriguez @rbrush
When you get some time, could you look over this?

[type (with-meta
(eng/->AlphaNode env beta-children alpha-fn)
{:alpha-expr (:alpha-expr (meta alpha-map))})])
[type (eng/->AlphaNode id env beta-children alpha-fn)])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was torn here, i was temped to add the AlphaNodes to the id-to-node since alpha nodes have ids now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. I'm not sure what utility there would be in it. It may be useful at some point.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like it could be useful for debugging and REPL exploration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there seems to be value, i will add them

{:std (double std)
:mean (double mean)}))

(defn run-performance-test
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rolled my own here, rather than using a library like criterium because criterium makes assumptions based on the classes loaded and compiled. This lead to infinite looping.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"because criterium makes assumptions based on the classes loaded and compiled" => Could you elaborate? What sorts of assumptions?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WilliamParker I agree the comment should explain it so we don't forget why.

I am fairly sure the rationale behind this though is that criterium tries to warm up the JVM by stabilizing the JIT and GC metrics first prior to timing. I believe that class loading is one of the metrics that criterium waits to see become more stable.
This performance test intentionally is creating classes on the fly via eval, so that conflicts with what criterium is trying to ignore.

Ethan may have a clearer description of course.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WilliamParker as @mrrodriguez mentioned criterium does try to warm up for JIT, here. Again as Mike says, it seems that criterium is looking for a stable state before running benchmarks. While ideal for most situations, compilation in this case will never reach a steady state. Effectively making the test a really expensive while (true) block.

I will probably put a comment on this function mentioning the reasoning.

@EthanEChristian
Copy link
Contributor Author

Issue #381


record-holder (ArrayList.)
;; The rulebase should either be given from the base-session or found in
;; the restored session-state.
maybe-base-rulebase (when (and (not rulebase-only?) base-rulebase)
base-rulebase)

compilation-partition-size (or compilation-partition-size 1250)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

significance of 1250 here? what is it?

Copy link
Contributor Author

@EthanEChristian EthanEChristian Jun 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its pretty arbitrary, originally i had 1k but on testing i was able to load a larger rulebase with 1.25k.
Honestly, it could probably be a bit higher, it would just come down to the size of the functions being compiled.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be good to say what 1250 is; bytes? A comment as to its arbitrary nature would be good as well so the next one through feels better if it needs to be altered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer accurate, this has changed

"Takes a map in form produced by extract-exprs and evaluates the values(expressions) of the map in a batched manner.
This allows the eval calls to be more effecient, rather than evaluating each expression on its own."
[key->expr :- {(schema/tuple sc/Int sc/Keyword) schema/SExpr}
partiton-size :- sc/Int]
Copy link
Contributor Author

@EthanEChristian EthanEChristian Jun 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor spelling mistake, i'll clean that up

Copy link
Collaborator

@mrrodriguez mrrodriguez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just went through the testing for the most part. Working on the other parts.

(println description)
(println "==========================================")
(println (str "Mean: " mean "ms"))
(println (str "Standard Deviation : " std "ms" \newline))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consistency, do "Deviation:" instead of "Deviation :"

:let [next-fact (symbol (str seed-sym "prime"))
production (assoc base-production
:lhs [{:type (keyword seed-sym)
:constraints ['(= this ?binding) `(filter-fn ~'this)]}]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use syntax quote so that = is expanded. Even though Clara will accept the bare = symbol in many places, I think it is a bad practice due to it being able to be shadowed. Clara should be using clojure.core/=, not a bare =.

`(= ~'this ~'?binding)

This applies elsewhere in this ns too.

[clara.rules.durability.fressian :as dura])
(:import [java.io ByteArrayOutputStream ByteArrayInputStream]))

(defn filter-fn
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how this would work. gensym should return a new and unique symbols every time it is called. So this filter-fn will always be false.

If that is intentional, some sort of explanation doc string would be helpful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As these rules never fire, this is a placeholder.

(for [seed-sym seed-syms
:let [next-fact (symbol (str seed-sym "prime"))
production (assoc base-production
:lhs [{:accumulator '(clara.rules.accumulators/all)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clara.rules.accumulators needs to be required explicitly by the ns running these rules, which I believe is this one. There shouldn't be assumptions about other namespaces already requiring your direct dependencies. That can lead to compilation-order dependent runs that fail occasionally.

(into {}
(for [seed-sym seed-syms
:let [production (assoc base-production
:lhs [{:type [seed-sym]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the :type wrapped in a vector here? I think it should be :type seed-sym.

Copy link
Contributor Author

@EthanEChristian EthanEChristian Jun 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, (keyword seed-sym), but you are right

(testing "Session serialization performance"
(run-performance-test
{:description "Generated Session serialization"
:func #(.serialize (dura/create-session-serializer (ByteArrayOutputStream.))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't interact with the API via the impl detail that protocols may expose interfaces.
This should be called via the protocol fn var.

[clara.rules.durability :as dur]
...

(dur/serialize ...)

Also, I'd recommend not calling [clara.rules.durability.fressian :as dura] since that is confusing.
Perhaps, [clara.rules.durability.fressian :as fres] or something

(run-performance-test
{:description "Generated Session deserialization"
:func #(-> (dura/create-session-serializer (ByteArrayInputStream. session-bytes))
(.deserialize session-bytes {:rulebase-only? true}))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, call the protocol fn via it's var, not impl detail Java interop.

(run-performance-test
{:description "Generated Session deserialization"
:func #(-> (dura/create-session-serializer (ByteArrayInputStream. session-bytes))
(.deserialize session-bytes {:rulebase-only? true}))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is :rulebase-only? only used in the last deserialize case? Shouldn't it be set for all serialize and deserialize calls?


(let [session (com/mk-session rules)
os (ByteArrayOutputStream.)]
(testing "Session serialization performance"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be more appropriate: "Session rulebase durability performance". This isn't testing the memory serialization aspects.

Copy link
Collaborator

@mrrodriguez mrrodriguez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've went through all but
extract-exprs
and compile-exprs (big ones).

However, without those, I can see the picture of how this is working out. I will soon go through those. I am sure I'll have questions/comments there as well.

@@ -563,6 +550,11 @@
the session was serialized without the rulebase, i.e. :with-rulebase? = false, so it needs a rulebase
to be 'attached' back onto it to be usable.

* :compilation-partition-size - The maximum number of expressions that will be evaluated per call to eval.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if :forms-per-eval would be more revealing of a name. I'm not positive that I love this name.

It is also just an unfortunate configuration option. It's fairly hard to have any good idea what this number should be. I guess if it needs to be tweaked, it'd be via trial/error.

get-alphas-fn])
get-alphas-fn
;; A map of [node-id field-name] to function.
node-identifier-to-fn])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more consistent name would be node-id-to-fn. This is a weird clash with id-to-node above.
I guess you are trying to be more explicit that the id is represented as a [node-id field-name], but I don't think that "identifier" captures that since that's what "id" is an abbreviation for already.

Perhaps abandon the naming schema for this and go with something like node-expr-fn-lookup. I don't think that all associated structures have to be named in a way that explicitly describes the structure of their key-val pairs.

condition)]
condition)

retrieve-fn (fn [id field] (get id-key->expr [id field]))]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommended rename: compiled-expr-fn. This is too generic and confusing to read later on.

@@ -1263,105 +1439,70 @@
id :- sc/Int
is-root :- sc/Bool
children :- [sc/Any]
compile-expr ; Function to do the evaluation.
id-key->expr :- {(schema/tuple sc/Int sc/Keyword) IFn}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar as I mentioned above in this ns, perhaps calling it expr-fn-lookup is more meaningful. In this local scope, it is less important that where I mentioned above though.


;; Sort the ids to compile based on dependencies.
[{:keys [id-to-production-node id-to-condition-node id-to-new-bindings forward-edges backward-edges]} :- schema/BetaGraph
id-key->expr :- {(schema/tuple sc/Int sc/Keyword) IFn}]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same renaming suggestions as above.

[type (with-meta
(eng/->AlphaNode env beta-children alpha-fn)
{:alpha-expr (:alpha-expr (meta alpha-map))})])
[type (eng/->AlphaNode id env beta-children alpha-fn)])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. I'm not sure what utility there would be in it. It may be useful at some point.

@@ -1677,7 +1775,8 @@
fact-type-fn
ancestors-fn
activation-group-sort-fn
activation-group-fn]
activation-group-fn
expressions]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another place I'd consider renaming as mentioned above. expressions is probably the most vague.

env (assoc :env env))
{:alpha-expr alpha-expr})))
[alpha-nodes :- [schema/AlphaNode]
id-key->expr :- {(schema/tuple sc/Int sc/Keyword) IFn}]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same renaming suggestions as above.

productions)))
productions))

(sc/defn extract-exprs :- {(schema/tuple sc/Int sc/Keyword) schema/SExpr}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This schema is referred to a lot. It'd be helpful to extract it with a s/defschema so it is easier to read through the many places it is used.

Suggested NodeExprFnLookup

@EthanEChristian
Copy link
Contributor Author

I forgot to update the changelog, so i will be adding that shortly as well

condition)]
(case (or (:node-type beta-node)
;; If there is no :node-type then the node is the ::root-condition
;; however, creating a case for nil could potentially casue weird effects if something about the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: "casue" -> "cause"

(:id-to-condition-node beta-graph))))

(sc/defn compile-exprs :- schema/NodeFnLookup
"Takes a map in form produced by extract-exprs and evaluates the values(expressions) of the map in a batched manner.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grammar: "in form produced" -> "in the form produced"


(sc/defn compile-exprs :- schema/NodeFnLookup
"Takes a map in form produced by extract-exprs and evaluates the values(expressions) of the map in a batched manner.
This allows the eval calls to be more effecient, rather than evaluating each expression on its own."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a line after this should say
"See #381 for more details."

;; Grouping by ns, most expressions will not have a defined ns, only expressions from production nodes.
;; These expressions must be evaluated in the ns where they were defined, as they will likely contain code
;; that is namespace dependent.
(for [[ns pairs] (group-by (comp :ns meta key) key->expr)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Maybe avoid naming confusion here, call this [ns group]
and then below bind the partition-all result to expr-batch.

It was a bit confusing to read pairs so much and also not that informative of what it is.

;; Partitioning the number of forms to be evaluated, Java has a limit to the size of methods if we were
;; evaluate all expressions at once it would likely exceed this limit and throw an exception.
pairs (partition-all partition-size pairs)
:let [node-keys (map #(nth % 0) pairs)]]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you avoiding using first here for some reason? If it is perf, I believe that pairs in this line is a seq of map entries. If that is the case, key is the fastest.

;; evaluate all expressions at once it would likely exceed this limit and throw an exception.
pairs (partition-all partition-size pairs)
:let [node-keys (map #(nth % 0) pairs)]]
(mapv (fn [node-key expr]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this is mapv vector, the anonymous fn just reimplements it's 2-arity case.

(with-bindings (if ns
{#'*ns* (-> ns symbol the-ns)}
{})
(batching-try-eval node-keys (mapv #(nth % 1) pairs))))))))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above, second, or better, val if it is always a map entry.

;; If none of the rules are the issue, it is likely that the
;; size of the code trying to be evaluated has exceeded the limit
;; set by java.
(throw (ex-info "The batching size of expressions has exceeded the limit set by java."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ex-info has an 3-arity version designed to take the cause, ie ex-info [msg map cause]

So

(ex-info "The batching size of expressions has exceeded the limit set by java." {} e)

I'm not sure the :original-exception will add value. Perhaps if that exception has ex-data to include. Also, it may be useful to put the node-keys in this map.

;; Grouping by ns, most expressions will not have a defined ns, only expressions from production nodes.
;; These expressions must be evaluated in the ns where they were defined, as they will likely contain code
;; that is namespace dependent.
(for [[ns pairs] (group-by (comp :ns meta key) key->expr)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The group-by will result in a non-deterministic order here. It may be best (for debugging perhaps), to be deterministic.
Perhaps a sort-by key would be reasonable to add?

;; 1250 is an arbitrary number, this could be lower or higher depending on the
;; rulebase that is being compiled, with regards to the average size of the
;; forms being evaluated.
forms-per-eval (:forms-per-eval options 1250)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice to have the 1250 in a var
forms-per-eval-default somewhere and refer to that in all the places that you currently refer to 1250. That way the comments and the compiler vs deserialization impl doesn't get out of sync.

Also, how did you come up with 1250? Is this a threshold you found to work well in practice? It does depend on size of expressions of course.

As an aside to this discussion:
I think ideally, rules would be written in a way where there were not large inline code blocks in the LHS conditions, accumulators, and/or the RHS action forms.

I know from experience that people sometimes tend to make really large RHS forms though (I don't recommend it).

e.g.

Bad

(defrule my-rule
 
=>
(let [<lots of bindings and logic>]
   (insert! some-result))

Good
Good

(defrule my-rule
 
=>
;; `do-bigger-logic-fn` is evaluated under normal compilation conditions (better).
;; Also, things can be broken up and made more readable by naming functions around them.
;; Also, these pure fn's can be independently verified/tested.
(let [some-result (do-bigger-logic-fn <bindings needed from LHS>)]
   (insert! some-result))

If rules were written this way, I'd think the :forms-per-eval could be quite a bit higher. I realize that'd be a breaking change for impl's that have abused the RHS by making huge action expressions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EthanEChristian and I talked a lot about this offline. It actually may be the case that inline forms don't have an effect on the :forms-per-eval. This is nice if it proves to be the case.

Basically, the reason is that batching-try-eval eval's a vector of expressions of a fn form.
e.g.

[(fn [x] ...) (fn [y] ...) ...]

These each get their own classfile, so the big method size ends up just being in the method that creates this vector of fn references. There is constant overhead in terms of bytecodes for this method then. That constant overhead can be multiplied by teh :forms-per-eval to get a close approx of the correct value to default to. Keeping it configurable is likely still a good idea, due these sizes being possibly variable across Clj compiler versions, etc.

We can make sure to add those details here shortly once it is completely validated.

@WilliamParker
Copy link
Collaborator

@EthanEChristian @mrrodriguez I'll try to take an initial look at this over the weekend.

@EthanEChristian EthanEChristian changed the title Issue 381: Reduce the number of Eval calls Issue #381: Reduce the number of Eval calls Jun 22, 2018
@EthanEChristian EthanEChristian changed the title Issue #381: Reduce the number of Eval calls Issue #381 Reduce the number of Eval calls Jun 22, 2018
Copy link
Collaborator

@WilliamParker WilliamParker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took an initial look over this, I'll probably have more to say later.

project.clj Outdated
@@ -68,8 +68,15 @@
;; perhaps because Leiningen is using this as uneval'ed code.
;; For now just duplicate the line.
:test-selectors {:default (complement (fn [x]
(some->> x :ns ns-name str (re-matches #"^clara\.generative.*"))))
:generative (fn [x] (some->> x :ns ns-name str (re-matches #"^clara\.generative.*")))}
(let [backlisted-packages #{"generative" "performance"}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spelling: "backlisted" => blacklisted

:let [{:keys [type constraints fact-binding args]} condition]]

{:type (com/effective-type type)
{:id id
:type (com/effective-type type)
:alpha-fn (com/compile-condition type (first args) constraints fact-binding env)
:children (vec beta-children)})))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems odd to me that we build a map here and then use it to construct alpha nodes later in assemble-session while building the beta nodes directly in this namespace, but changing that is probably out of scope here. Maybe there's a reason for this that I'm not remembering offhand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that is a little weird the only thing i think that is forcing this is, the access to the compiled beta nodes. At this point the beta-children are just ids of the nodes that will be constructed later.

(:require [schema.core :as s])
#?(:clj
(:import [clojure.lang
IFn])))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: You could use the ifn? predicate instead if you want to avoid the need for conditionals. Looks like that has been in ClojureScript since 2012: commit

type (first args) constraints
fact-binding env)
;; Remove all metadata but file and line number
;; to protect from evaluating usafe metadata
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spelling: "usafe" => "unsafe"

:msg "compiling alpha node"}})))
{}
alpha-graph)
exprs (reduce-kv (fn [prev id production-node]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO a more specific name here than "exprs" would be helpful. This will be a map containing expressions if I'm understanding correctly?

(fn [prev id beta-node]
(let [condition (:condition beta-node)
condition (if (symbol? condition)
(.loadClass (clojure.lang.RT/makeClassLoader) (name condition))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming that anything that is a symbol is a class seems likely to be a bug - we'll need to test it though. Looks like there is some existing logic along these lines. Also will we enter this path from ClojureScript at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is previous functionality, just shuffled around to work in my new approach. It does seem a little questionable, but i assumed that maintaining the old way would be better than changing functionality.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it should be fine to leave as is with regards to this issue. I have a suspicion there is a way to break this (as in make it throw ClassNotFoundException) but I don't think this change would make it any more broken since the assumption is the same before and after. We can look at this separately.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think resolve would be more appropriate, however, resolve has a defect that can throw CNFE as well.
https://dev.clojure.org/jira/browse/CLJ-1403

this may be fixed in 1.10, but that is future thinking.

Even if resolve returns nil as expected here, handling the nil/exception case would be nicer than letting it bubble up without much context.

I agree though that this is not related to the work of this PR.

;; If none of the rules are the issue, it is likely that the
;; size of the code trying to be evaluated has exceeded the limit
;; set by java.
(throw (ex-info "The batching size of expressions has exceeded the limit set by java."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is OK to say the likely cause in an exception message, but IMO we shouldn't state it with quite so much certainty. For example, we could have a bug in our implementation that causes such an exception.

(try
(eval exprs)
(catch Exception e
(mapv (fn [expr node-keys]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"doseq" seems more in line with your intent here since this return value isn't used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used mapv to keep from having to zip the 2 lists back together, if you feel strongly i can use doseq instead

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, the anonymous function has two args. Never mind, this is OK and it is a minor deal anyway. It might be worth adding a quick comment that we want mapv rather than map to avoid laziness.

Do we have an existing test that validates this "rethrow if an individual expression actually is not compilable" workflow? If not we should probably add one.

;; anyway.
(try
(eval exprs)
(catch Exception e
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there are scenarios where the compilation could throw an Error other than a VirtualMachineError (which it wouldn't make sense to try to recover from, see the docs)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

catch Exception shouldn't catch these sorts of errors anyway right?

But you do make a good point, InterruptedException would be a good candidate to not catch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: determine what exceptions to not catch

Copy link
Collaborator

@mrrodriguez mrrodriguez Jun 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a reason why you shouldn't catch all exceptions and rethrow here. I think that the message needs to be clear that it could only "possibly" be a problem with compilation of the expression. However, I don't think it is beneficial to rethrow other exceptions in their "raw" form . It is more useful to wrap the exception and provide it with what context you can. Attach the root cause exception though so it is bubbles up.

More context should always be a better thing in this sort of case.

I think leaving these more critical JVM errors alone is a good idea though. They are a lower-level concern and are in a separate hierarchy.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually musing on whether there are any Error types that we should catch, with the idea being to rethrow them with the specific expression that caused them rather than a gigantic collections that probably isn't too useful to the user. I don't know that the traditional assumptions about not recovering from Error necessarily apply to a dynamic code generation and compilation framework - I particularly had LinkageError in mind. We definitely wouldn't want to catch any VirtualMachineError instances though since the JVM's behavior when you do is specifically called out as undefined by the spec.

In practice this probably isn't a huge deal and I don't want to block this on investigating it - I'd expect most bad cases to throw Exception. Maybe just a followup issue to think about later.

[type (with-meta
(eng/->AlphaNode env beta-children alpha-fn)
{:alpha-expr (:alpha-expr (meta alpha-map))})])
[type (eng/->AlphaNode id env beta-children alpha-fn)])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like it could be useful for debugging and REPL exploration.

(-> rdr
(read-record d/add-alpha-fn)
d/clj-record-holder-add-fact!)))}}
(create-cached-node-handler AlphaNode
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that AlphaNodes have ids we should be able to leverage the cached-node-handlers

@@ -2748,3 +2752,26 @@
(fire-rules))]

(is (= [{:?x inner-one}] (query session test-query)))))

(deftest test-maximum-forms-per-eval
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WilliamParker, A test demonstrating the batch eval failure with a method size too large.

This test is a little slow, i was considering putting it in the performance tests, but i would be worried that those test would not be run and we might miss changes to compilation that might break the limit.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How long does this take to run? Perhaps it should be in another test file - it would still run during Leiningen builds but not as part of the test-rules namespace. I've been trying to move away from having everything in test-rules from a code organization POV anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test takes ~41 secs while running in a REPL

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes, that is quite a long-running test. Perhaps we could put it in a "long_running_tests" file that wouldn't be part of a normal REPL workflow; if there are other instances of things like this it might be worthwhile to create a test profile that is run in Travis but exclude the file from "lein test" runs.

It also seems like this could assert on the exception message added about message size too large? Or by the exception info, you could add a key if needed. It would be pretty simple to just recursively get all the messages and make sure one of them matches a regex.

(for [{:keys [id condition beta-children env] :as node} alpha-nodes]
(cond-> {:id id
:type (effective-type (:type condition))
:alpha-fn (get expr-fn-lookup [id :alpha-expr])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest failing here if the map doesn't contain the value. Perhaps just a function like:

(defn safe-get
    [m k]
    (let [v (get m k ::not-found)]
        (if (= v ::not-found)
              (throw (ex-info "Key not found with safe-get") {:map map :key k})
              v))))

Since this is at compile time the cost should be quite minimal and it would be useful to fail quickly if there was a problem with the expression lookup.

Applies to compile-node as well.

@@ -563,6 +550,11 @@
the session was serialized without the rulebase, i.e. :with-rulebase? = false, so it needs a rulebase
to be 'attached' back onto it to be usable.

* :forms-per-eval - The maximum number of expressions that will be evaluated per call to eval.
This will be defaulted to 1250 if not provided. Larger batch sizes should see better performance compared
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be defaulted to 1250 if not provided.

Out of date I think - didn't you change this to 5000?

;; Grouping by ns, most expressions will not have a defined ns, only expressions from production nodes.
;; These expressions must be evaluated in the ns where they were defined, as they will likely contain code
;; that is namespace dependent.
(for [[ns group] (sort-by key (group-by (comp :ns meta key) key->expr))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not shadow the clojure.core/ns macro here.

;; easier because the metadata contains unevalable forms.
(with-meta [id expr-key] (assoc metadata expr-key s-expr))
s-expr))
is-root? #(= #{0} (get backward-edges %))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this a defined function and use everywhere we have this check?

;; set by java.
(throw (ex-info (str "There was a failure while batch evaling the node expressions, " \newline
"but wasn't present when evaling them individually. This likely indicates " \newline
"that the method size exceeded the maximum set by java, see the cause for the actual error.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: This is a limitation set by the JVM, not by Java, which aren't quite the same thing; Clojure compiles to bytecode, not an intermediate representation in Java.

get-alphas-fn])
get-alphas-fn
;; A map of [node-id field-name] to function.
node-expr-fn-lookup])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to not have a schema on this or was it just an oversight?

;; however, creating a case for nil could potentially cause weird effects if something about the
;; compilation of the beta graph changes. Therefore making an explicit case for ::root-condition
;; and if there was anything that wasn't ::root-condition this case statement will fail rather than
;; failing somewhere else.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to error checking like this

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like failing instead of waiting to later and being broken. However, I think a it'd be nicer to have that default case at the end that threw a more meaningful exception than "no branch of case matched".

eg

(case x
 y :foo
 z :bar
 (throw (ex-info "explanation"))

However, this isn't a critical change for the sake of this review. It's more of a thought about trying to give more meaningful compilation failure errors in the future.

@WilliamParker
Copy link
Collaborator

Looked over the compiler changes and left some comments, but overall looks good @EthanEChristian

I haven't been through the durability changes yet.

Copy link
Collaborator

@WilliamParker WilliamParker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks solid functionally - I've looked over the whole PR now - but with some maintenance concerns I'd like to address. My primary concerns are:

  • The reliance of durability on metadata that isn't part of the schemas.
  • The potentially unreliable use of identical? in the safe-get function.

(ThreadLocal.))

(defn- add-node-fn [node fn-key meta-key]
(assoc node
fn-key
((.get compile-expr-fn) (:id node) (meta-key (meta node)))))
(get (.get node-fn-cache) [(:id node) meta-key])))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "meta-key" seems like more of an "expr-key" now. Previously the argument name some sense as it was looked up on the metadata, but that is no longer the case correct?

(assoc id->expr
;; Letting the key carry all of the metadata, this makes the batch evaluation of the expressions
;; easier because the metadata contains unevalable forms.
(with-meta [id expr-key] (assoc metadata expr-key s-expr))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern of having the metadata carry fundamental semantic meaning of the data (without which durability would be broken) that isn't reflected in the NodeExprLookup schema seems like a suboptimal practice. I'd like to at least have the metadata expectations reflected in the schema. Alternatively you could have the key map to a tuple, map, etc. containing both the values, but my primary concern here is that as someone reading the code, I'd expect the schema to be a complete description of the data, or at least the mandatory parts of the data. This seems to break that expectation.

because the metadata contains unevalable forms.

Are you referring to the issues discussed in #243 ? If so could you reference it in the comment?

@@ -2748,3 +2752,26 @@
(fire-rules))]

(is (= [{:?x inner-one}] (query session test-query)))))

(deftest test-maximum-forms-per-eval
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes, that is quite a long-running test. Perhaps we could put it in a "long_running_tests" file that wouldn't be part of a normal REPL workflow; if there are other instances of things like this it might be worthwhile to create a test profile that is run in Travis but exclude the file from "lein test" runs.

It also seems like this could assert on the exception message added about message size too large? Or by the exception info, you could add a key if needed. It would be pretty simple to just recursively get all the messages and make sure one of them matches a regex.

blacklisted-packages)]
(some->> x :ns ns-name str (re-matches (re-pattern (apply str patterns)))))))
:generative (fn [x] (some->> x :ns ns-name str (re-matches #"^clara\.generative.*")))
:performance (fn [x] (some->> x :ns ns-name str (re-matches #"^clara\.performance.*")))}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having this live in the main Clara project seems reasonable to me, but there is an existing (old) clara-benchmark project. FYI @rbrush in case you have suggestions, objections, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of including this in the main Clara project. The test selectors are a nice solution to performance concerns here.

@@ -31,7 +31,9 @@
Accumulator
ISystemFact]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any problem here on review.

function will throw an exception."
[m k]
(let [v (get m k ::not-found)]
(if (identical? v ::not-found)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume the reason you used identical? instead of = here was for performance? @EthanEChristian : I realize that as we've discussed previously in another (non-Clara) context it is currently part of Clojure's implementation that equal keywords are identical, and so when comparing two things that are always keywords using identity checks is a way to gain marginally better performance. However, the number of operations here is just the number of nodes in the rules session compiled, and then only at compile-time, correct? I doubt that that will be any kind of serious problem for a number of checks (at compile time) on the order of thousands or tens of thousands, and this pattern isn't a part of Clara's codebase yet. As is, this introduces a maintenance burden of keeping track of whether Clojure has changed this; I don't believe that property is part of any API guarantees.

In this particular case, I think we can resolve this one of two ways

  • Using clojure.core/=. I don't think this is likely to have any meaningful performance cost, but if you have data to the contrary please post.
  • Creating a let-binding for the keyword so that it actually is guaranteed to be identical by Clojure's public contracts and commenting with the reasoning.

i.e.

(defn safe-get
  "A helper function for retrieving a given key from the provided map. If the key doesn't exist within the map this
   function will throw an exception."
  [m k]
  (let [not-found-keyword ::not-found
        v (get m k not-found-keyword)]
    (if (identical? v not-found-keyword)
      (throw (ex-info "Key not found with safe-get" {:map map :key k}))
      v)))

@EthanEChristian
Copy link
Contributor Author

@WilliamParker, i can see what you mean with the metadata.
I have changed the implementation to not rely on metadata, instead the lookups now us a tuple as the value, making the lookup something like:

{[Int Keyword] [SExpr/Fn CompilationContext]}

where CompilationContext is what the metadata used to be.

Copy link
Collaborator

@WilliamParker WilliamParker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noted one code comment that I think is out of date with the latest changes but otherwise this looks good. +1 to merge once that is addressed.

@mrrodriguez you reviewed this a while ago, is there anything left you wanted to address?

@@ -551,12 +554,12 @@
;; In this case there is nothing to do with memory, so just serialize immediately.
(if (:rulebase-only? opts)
;; The keys of the node-fns should contain enough data to reconstruct the entire map.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

;; The keys of the node-fns should contain enough data to reconstruct the entire map.

I think this comment is outdated with the latest changes.

[k (-> k meta (get (nth k 1)))])))
(for [[node-key compilation-ctx] expr-lookup]
[node-key [(-> compilation-ctx (get (nth node-key 1)))
compilation-ctx]])))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the line and file info will be included with the serialized rulebase correct? That will marginally increase the space consumed by the rulebase, but the size of sessions won't be impacted so I think this is OK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest of the :compile-ctx will be there too, such as :condition and :production. Previously this information was discarded after compiling the rulebase the first time. We then were making the assumption that it would recompile after deserialization. This way if for some reason a fn cannot be recompiled, it should give better context to what condition/production the function was being compiled for.

[m k]
(let [v (get m k ::not-found)]
(if (= v ::not-found)
(throw (ex-info "Key not found with safe-get" {:map map :key k}))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be :map m, not :map map. You are going to return the clojure.core/map fn as it is written.

function will throw an exception."
[m k]
(let [v (get m k ::not-found)]
(if (= v ::not-found)
Copy link
Collaborator

@mrrodriguez mrrodriguez Jul 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To @WilliamParker 's point:

The potentially unreliable use of identical? in the safe-get function.

The likelihood of this changing in clj is very slim. It is even slimmer that it'd change for keyword literals. However, if choosing to remain paranoid for a change like that to happen, I still think an identical?-based sentinel test is what would typically be done in this sort of case

(let [not-found ::not-found
       v (get m k not-found)]
  (if (identical? v not-found) <etc>))

Alternatively, I've even seen the more aggressive, guaranteed-unique sentinel used:

(let [not-found (Object.)      ;; <---- unique object identity
       v (get m k not-found)]
  (if (identical? v not-found) <etc>))

However, the keyword one has some potential to be "faster" since the keyword may be constructed as a static constant of the surround fn class, while the Object constructor may be called repeatedly.

You could work around this by getting even more intense and doing:

(def ^:private sentinel-value (Object.))

.... later ....
(let [
       v (get m k sentinel-value)
  (if (identical? v sentinel-value) <etc>))

However, there is more fun here that you now have a Var.getRawRoot in your runtime path; could be worked around with direct-linking enabled etc, but at that point, the compile-time consant keyword compilation is looking more appealing. :P

The namespaced keyword is a generally accepted safe clj idiom though.

Either way, in a fn like safe-get, I'd lean towards using an approach where identical? is used instead of =, which can be done by just using the same references in both places.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree that the likelihood of it changing is low, but not entirely trivial. I'd rather not take the risk when there is an easy way not to though. I'm OK with using the same reference to enforce identity as you describe.

@@ -1751,6 +1874,37 @@
productions
(throw (ex-info (str "Non-unique production names: " non-unique) {:names non-unique})))))

(def forms-per-eval-default
;; The default max number of forms that will be evaluated together as a single batch.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the analysis. The only thing that is odd to me is you used inline comments instead of making this a doc string of the var. I don't suppose it matters much though.

Copy link
Collaborator

@mrrodriguez mrrodriguez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me still.

@EthanEChristian
Copy link
Contributor Author

EthanEChristian commented Jul 28, 2018

@WilliamParker @mrrodriguez ,
I think everything has been addressed. 🎉

If you guys agree, i will probably merge this in sometime over the weekend.

Copy link
Collaborator

@WilliamParker WilliamParker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to merge

Creating and populating the next 5343 elements of the array: 64,116B
Creating the vector and the return statement: 4B

This sums to 65,527B just shy of the 65,536B method size limit."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocker, but this increases the importance of setting up our tests against all relevant Clojure versions in Travis since potentially different/future Clojure versions could have differences here. That is really something we should be doing anyway though.

(let [not-found ::not-found
v (get m k not-found)]
(if (identical? v not-found)
(throw (ex-info "Key not found with safe-get" {:map m :key k}))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to this approach to enforcing identity equality

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants