`clara.rules.platform/group-by-seq` used in engine may cause incorrect semantics #393
Comments
This bug was found and reported by dominicm (@SevereOverfl0w) in the #clara Slack channel.
@WilliamParker pointed out to me that this would just require that the perf is tested, to see if this extra object-wrapping allocation/GC is acceptable or not.
+1 to @mrrodriguez's diagnosis, and I agree that this looks like a bug. We've used Java wrappers around Clojure types to avoid this issue in the past; see AlphaRootsWrapper in the compiler. Not addressing it in group-by-seq looks like a miss. It seems like a similar strategy would work here, provided that the performance impacts are acceptable. This is a hotspot, so it is possible that they would be problematic, but allocation and GC are heavily optimized by the JVM, so it would likely be OK. An implementation might need some tweaking to take maximum advantage of those optimizations, and there are some options that could be explored to improve the performance of a wrapper.
However, if the simple approach of just adding a wrapper doesn't degrade performance too much, I'd suggest just doing that.
Agreed that simply wrapping this with something like AlphaRootsWrapper is our best bet here. Note that the underlying LinkedHashMap implementation already allocates at least one object per map entry, so we're not adding a significant allocation source where there previously was none. And as Will points out, the JVM is very optimized for this.
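The wrapper idea amounts to something like this minimal sketch (`CljEqualityWrapper` is an illustrative name, not the actual class from the PR):

```clojure
;; Minimal sketch of the wrapper approach: keys stored in the Java map are
;; wrapped so that the map's equals/hashCode calls hit Clojure semantics.
(deftype CljEqualityWrapper [wrapped]
  Object
  (equals [this other]
    (and (instance? CljEqualityWrapper other)
         ;; clojure.core/= supplies Clojure equality inside the Java collection.
         (= wrapped (.wrapped ^CljEqualityWrapper other))))
  (hashCode [this]
    ;; clojure.core/hash, likewise, supplies Clojure hashing.
    (hash wrapped)))
```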
Anyone working on this one?
Sorry for being slow here. This is on my to-do list, but unfortunately between traveling and other demands I haven't gotten back to it. I'll see if I can pick it up within the next week...I don't expect it to be a significant change.
The immediate issue is fixed in PR #397. I am wondering if this goes far enough, though. For instance, the tuned-group-by function would be prone to a similar issue, although that is in a hot code path, and the issue there would only be a problem if the fact type itself is a Clojure record or map. I don't mind merging this fix and we can consider whether to go after these other cases as well, since the current PR is a step forward if nothing else.
It may make sense to do them as two separate issues.
Alright, I went ahead and merged the immediate fix and logged #398 to consider the tuned-group-by -- which I agree isn't an immediate need. I'm also happy to push a fix release for this if that would be helpful.
@rbrush I'm fine with anything -- only wondering how you would approach releasing this change :)
@alex-dixon I agree that an alpha release is probably warranted here to make it easy to try out. There doesn't seem to be a measurable impact on complex rules (since this isn't the dominant time sink), but a trivial (and probably non-representative) microbenchmark I have that inserts and queries lots of facts repeatedly does show about a 20% hit. These microbenchmarks can be misleading, since this 20% penalty may only apply to 1% of the execution time of a realistic rule engine...but it raises enough concern that an alpha preview seems justified.
@rbrush is this microbenchmark something you can post? Maybe it is something we should add to the performance benchmarking build @EthanEChristian is adding with #396. Did you find any details on what is taking up the extra time? I've thought of some possible optimizations that would be relatively low-hanging fruit.
I've seen the type-based dispatch and conditionals of clojure.core/= cause performance problems before, probably because they confused the JIT compiler. If we know the type in question, we can directly link to the equiv method instead. The tradeoff would be the need for instance checks when wrapping and unwrapping, but those might be less significant, and in cases where the values aren't ever actually records they might be optimized away by the JVM anyway; to my understanding it will sometimes assume a path won't be taken when that is what it observes at runtime, with the ability to deoptimize ("rewind") if it ever is.
@WilliamParker The one I found that shows the most change is the old clara-benchmark project: https://github.com/rbrush/clara-benchmark. The most pronounced benchmark is clara.simple-query-large, run like this (after modifying the project.clj in clara-benchmark to try different versions): `lein run clara.simple-query-large`. Your idea of additional type checks here is interesting, and I might give it a shot under this workload.
I doubt that
I don't know that it is safe to "only wrap records". We want deep Clojure equality checking on the data structures; it isn't only the top-level object that we care about, e.g. `{:x user.MyRec{:y 10}}`. I agree that if we were able to directly target the `equiv` method it could help. I believe that the objects in this hot loop will all be the same type of object during the loop, so the interface call will be a monomorphic callsite. However, repeated calls through `clojure.core/=` still pay for its dispatch and conditionals, so perhaps:

```clojure
(if (instance? IPersistentCollection record)
  (.equiv ^IPersistentCollection record (.record ^ClojureRecordWrapper other))
  (= record (.record ^ClojureRecordWrapper other)))
```

This is assuming that the wrapped value is held in a `record` field on a `ClojureRecordWrapper` type.
@mrrodriguez I hadn't considered the case of records inside collections as in your example. I agree that is a problem that restricts our optimization opportunities. On the remaining opportunities, I think we would only need to know the type wrapped by the object on which equals is called in order to potentially direct-link. Something like the rough sketch below, where the names and details are just illustrative:
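```clojure
;; Rough sketch only -- DirectLinkWrapper is an illustrative name, not
;; proposed code. When the wrapped value is a Clojure map/record, call
;; equiv through the interface directly instead of going through
;; clojure.core/='s dispatch; everything else falls back to =.
(deftype DirectLinkWrapper [wrapped ^int hash-code]
  Object
  (equals [this other]
    (and (instance? DirectLinkWrapper other)
         (let [o (.wrapped ^DirectLinkWrapper other)]
           (if (instance? clojure.lang.IPersistentMap wrapped)
             ;; Direct interface call to equiv, avoiding = dispatch.
             (.equiv ^clojure.lang.IPersistentMap wrapped o)
             (= wrapped o)))))
  (hashCode [this]
    hash-code))
```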
We'd need to make completely sure we were hitting the correct method, though, and consider whether it is something likely to change in Clojure. This is getting fairly involved -- I don't know that we should block releasing the alpha preview on it.
Unfortunately I don't see a clear way to eliminate the overhead here. As a test I just had the wrapper call the Java .equals method, to separate the cost of creating the wrapper from that of Clojure equals semantics. It turns out just having a wrapper class accounts for about 60% of the overhead in my tests -- presumably because of the indirection and GC overhead being exposed by the microbenchmark. Using (= ...) instead of Java's (.equals ...) accounts for the rest. In any case, it's not clear this will significantly impact any real workloads, so I'm okay with doing this as an alpha/preview release to let people easily try it out. If this turns out to be a critical path, we might look at creating a more specialized implementation of group-by-seq using a bespoke data structure...although that would be more involved.
@rbrush I agree that this shouldn't be holding back this alpha/preview release. I had one idea to maybe cut down on the new object allocations, though I think it'd only be useful if the common case is looking up keys that are already present in the map. I haven't tested this at all; I'm just showing it in case it looks interesting for anyone else to explore. Here is the outline:

```clojure
(defprotocol SetWrapper
  (set-wrapped! [this wrapped wrapped-hash])
  (get-wrapped [this]))

(deftype JavaEqualityWrapper [^:unsynchronized-mutable wrapped
                              ^int ^:unsynchronized-mutable hash-code]
  SetWrapper
  (set-wrapped! [this new-wrapped wrapped-hash]
    (set! wrapped new-wrapped)
    (set! hash-code (int wrapped-hash))
    ;; Return this so the reused wrapper can be passed straight to .get.
    this)
  (get-wrapped [this]
    ;; Mutable deftype fields are private, so expose the value via the protocol.
    wrapped)

  Object
  (equals [this other]
    ;; Java equals delegates to Clojure = on the wrapped values.
    (and (instance? JavaEqualityWrapper other)
         (= wrapped (get-wrapped other))))
  (hashCode [this]
    hash-code))

(defn group-by-seq
  [f coll]
  (let [test-wrapper (JavaEqualityWrapper. nil 0)
        ^java.util.Map m (reduce (fn [^java.util.Map m x]
                                   (let [k (f x)
                                         k-hash (hash k)
                                         ;; Reuse the single mutable wrapper for the
                                         ;; lookup; nothing is allocated unless the
                                         ;; key turns out to be new.
                                         tester (set-wrapped! test-wrapper k k-hash)
                                         existing (.get m tester)
                                         ^java.util.ArrayList xs (or existing
                                                                     (java.util.ArrayList.))]
                                     (.add xs x)
                                     (when-not existing
                                       ;; Only now allocate a wrapper for the map to retain.
                                       (.put m (JavaEqualityWrapper. k k-hash) xs)))
                                   m)
                                 (java.util.LinkedHashMap.)
                                 coll)
        it (.iterator (.entrySet m))]
    ;; Explicitly iterate over a Java iterator in order to avoid running into issues as
    ;; discussed in http://dev.clojure.org/jira/browse/CLJ-1738
    (loop [coll (transient [])]
      (if (.hasNext it)
        (let [^java.util.Map$Entry e (.next it)]
          (recur (conj! coll [(get-wrapped (.getKey e)) (vec (.getValue e))])))
        (persistent! coll)))))
```
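For illustration, here is the behavior this is aiming for on some toy data (my example, not from the thread); the groups come back in insertion order, preserving the determinism guarantee discussed in the issue:

```clojure
(group-by-seq :a [{:a 1 :b 2} {:a 1 :b 3} {:a 2 :b 4}])
;=> [[1 [{:a 1, :b 2} {:a 1, :b 3}]]
;    [2 [{:a 2, :b 4}]]]
```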
I tried a few variations of this against the clara.simple-query-large benchmark. Interestingly enough, the most successful was calling .hashCode instead of clojure.core/hash when building the wrapper. I realize that this isn't null-safe and would need to be adjusted for real use. EDIT: I also haven't investigated whether using Java hashCode semantics together with Clojure equality semantics would work, or whether it would be safe. This reduced the overhead to no more than 5% (and maybe just random noise; I'd need to test more) on a couple of runs I did. I would want to validate this more extensively, and I haven't tracked down what exactly about the hashing is problematic. I also haven't tried any of the other benchmarks yet.
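Concretely, the variation amounts to something like this when constructing the wrapper (a hypothetical helper, not the benchmarked code as-is):

```clojure
;; Java .hashCode in place of clojure.core/hash for the wrapper's hash.
;; Not null-safe: (.hashCode nil) would throw a NullPointerException.
(defn wrap-with-java-hash [k]
  (JavaEqualityWrapper. k (.hashCode ^Object k)))
```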
@WilliamParker I don't think that'd be safe in a general case. Clojure's hash semantics deliberately differ from Java's, and I believe that is one of the reasons Clojure defines its own hash in the first place.
I saw the same results @WilliamParker saw when switching to .hashCode. Nice find! I hadn't thought that would be the most significant item. @mrrodriguez I agree with your comments in the general case, but I think this is safe in this specific case. The only requirement for using this in the Java HashMap is that if two objects are Clojure =, they will produce the same Java .hashCode, and I think that is the case. We will see more hash collisions using Java .hashCode, but the HashMap itself will still behave correctly.
Note this is not true if using Clojure == with Java .hashCode, since == handles numeric conversions and items with different hash values can be ==, but that doesn't apply here. All unit tests pass with the Java .hashCode, if nothing else. This seems promising to me, but please let me know if I'm missing something.
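A quick REPL illustration of that property (the values are just examples):

```clojure
;; Two Clojure-equal collections also agree on Java hashCode:
(= [1 2 3] '(1 2 3))                          ;=> true
(= (.hashCode [1 2 3]) (.hashCode '(1 2 3))) ;=> true

;; ...even though Java hashCode differs from Clojure hash:
(= (hash [1 2 3]) (.hashCode [1 2 3]))       ;=> false

;; The numeric caveat applies to ==, which is not used here:
(== 1 1.0)                                   ;=> true
(= (.hashCode 1) (.hashCode 1.0))            ;=> false
```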
@rbrush I can't think of any specific examples; numerics with == were the main thing that came to mind, and as you say that doesn't apply here. I suppose it is OK. I feel on edge about it, thinking that there are corner cases missing, but I don't know of any. Your assessment that "if two objects are Clojure =, they will produce the same Java .hashCode" seems accurate.
Alright, I took a pass at this and created PR #399 for comment. This brings the penalty at least into the mid single digits for this microbenchmark, and it is likely negligible for most real workloads. As an experiment I also tried a version where JavaEqualityWrapper was written in Java, but found no measurable difference there.
I think this change will be OK as long as the following holds: "if two objects are Clojure =, then they are Java .equals". In theory one could create a type of one's own for which this didn't hold true, although I don't know why one would do so. IMO this is an OK thing to do for now -- better than either the existing bug or a significant performance hit. If it ever is a problem, it might be possible to avoid it in many cases by having something other than a Clojure persistent data structure as the key. This function is used with bindings maps as the key to group by, so we'd have to use a Java map there (in Clojure only, of course) or define our own type.
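e.g. a hypothetical sketch of such a bespoke key type (`BindingsKey` and `bindings-key` are invented names, not Clara code):

```clojure
;; The container is a plain object array rather than a persistent map, so
;; only the bound values themselves pay for Clojure hashing/equality.
(deftype BindingsKey [varr ^int hash-code]
  Object
  (equals [this other]
    (and (instance? BindingsKey other)
         (let [^objects a varr
               ^objects b (.varr ^BindingsKey other)]
           (and (== (alength a) (alength b))
                (loop [i 0]
                  (cond
                    (== i (alength a)) true
                    ;; Clojure = per value, without map-structure overhead.
                    (= (aget a i) (aget b i)) (recur (inc i))
                    :else false))))))
  (hashCode [this]
    hash-code))

;; Build a key from the bound values in a fixed binding order, still using
;; the Clojure hash for the values.
(defn bindings-key [vs]
  (let [arr (object-array vs)]
    (BindingsKey. arr (hash (vec arr)))))
```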
You could still use the Clojure hash for the values while avoiding the extra cost for the binding structure, and avoid the cost when the values are primitives etc. Regarding the reason for the different Clojure hash, my (incomplete) understanding is that the motivation was to reduce hash collisions. Since this is a very temporary map, as a practical matter I expect we could reduce them by just allocating a larger map size upfront, since the extra memory would be garbage collected shortly anyway.
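For instance, something as simple as this when constructing the grouping map (the capacity value is arbitrary):

```clojure
;; Preallocate a larger bucket table for the short-lived grouping map; the
;; extra memory is reclaimed quickly since the map is ephemeral anyway.
(java.util.LinkedHashMap. 512)
```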
I don't know that I see how this would help. I'm not sure the outer-most object is the cause of the hash performance cost; my assumption has been that the larger data structures in the binding maps are in the values of the map. Anyway, it's a future consideration.
I believe that the primary reason Clojure used murmur3 for strings was to prevent "denial of service attacks through forced hash collisions". This was a hot topic that came up a few years ago and affected many mainstream programming languages; Java has not changed its String hashing in response. The reason that Clojure has a separate "hasheq" set of interfaces and contracts, in general, is not about collisions. It is about having different interpretations of equivalence across types (for example, collections) than what Java has. Also, there are other nuances there, such as Clojure records having their type considered in Clojure equality but not in Java equality (which is meant to serve a generic interop purpose).
@mrrodriguez There were issues with hashcode collisions on Clojure's highly nested data structures; see this design doc. Also see the map hasheq implementation; the differences aren't just in the primitives. So I actually do think it was in large part about collisions. As for whether the perf cost is coming from the primitive hashing or the larger data structures, I haven't done any testing to tease out exactly where the cost is. If the cost is in the primitives, it would indeed be harder to avoid using the strategy I described of grouping by a type other than a Clojure data structure; that's a good point. The benchmark referred to here uses primitive types for the fields in its generator. In practice, I suspect that running into the issues described in the Clojure hash design doc would be abnormal, though possible, for idiomatic Clara rules.
@WilliamParker I was referring to string hashing, though. String isn't mentioned in that design doc from what I can see, though it has good info about other relevant things. String is where I've seen the biggest difference in perf before, since string keys are common in things like JSON and Java interop, and strings also back keywords and symbols I believe (though some of those cache their hashes).
`clara.rules.platform/group-by-seq` can cause a subtle defect when Clojure records are involved. More generally, the problem is that `c.r.p/group-by-seq` uses a `java.util.LinkedHashMap` to perform the grouping operation. This is a Java-based collection type and is only aware of Java-based equality semantics, i.e. `Object.equals()` and `Object.hashCode()`. Clojure has its own equality semantics, embodied in `=` and `hash`, and there are occasions where the difference between the two can cause functionally different behavior.

`c.r.p/group-by-seq` is used internally within the Clara engine for various performance-sensitive operations, and the `j.u.LinkedHashMap` was chosen to streamline them. A `j.u.LinkedHashMap` is convenient since it is mutable, allowing fast construction of the ephemeral data structure needed, and it provides insertion-order determinism. This is useful to guarantee consistent execution through the Clara engine from one JVM instance to the next, given the same input.

A concrete example of a problem with `j.u.LinkedHashMap`'s use of Java-based equality semantics has come up with regard to Clojure record types. Clojure record types take their type information into account in `clojure.core/=` comparisons and in `clojure.core/hash`. However, they are also meant to interoperate with Java, which is unaware of their record types; from the Java interop perspective they pose as plain `java.util.Map` implementations. This means that Clojure records' `Object.equals()` and `Object.hashCode()` implementations do not take the type of the record into account.
This is a problem in the Clara engine. One example is that the engine uses `c.r.p/group-by-seq` on the `:bindings` of a rule. The `:bindings` may contain Clojure records. The Clojure records should be compared with Clojure equality semantics, but instead they are compared with Java semantics.

@thegeez in the #clara Slack channel gave a reproducing case that shows this defect manifesting. I'll copy it here:
To fix this, it seems likely that the `j.u.LinkedHashMap` should not be used. There aren't necessarily any Clojure equivalents that capture the performance characteristics and the order determinism desired here. And if there are such libraries, Clara will (and should) be reluctant to add more dependencies, due to the burden that places on consumers of the library who may use conflicting versions of those same dependencies (aka "jar hell").

Edit: The comments below mention a wrapper on the items in the `j.u.LinkedHashMap`, and I think that is likely a better approach than what I originally said below.

My first thought would be to consider a custom implementation of this function that has the correct behavior with respect to Clojure equality semantics and our ordering guarantees, yet retains comparable performance characteristics. This could probably be done with a combination of a Clojure transient collection and perhaps a separate structure for tracking the ordering of the incoming input. More thought would be needed.

Related issues for background:
#141
#207