clara.rules.platform/group-by-seq used in engine may cause incorrect semantics

`clara.rules.platform/group-by-seq` can cause a subtle defect when Clojure record are involved. More generally, the problem is that `c.r.p/group-by-seq` uses a `java.util.LinkedHashMap` to perform the grouping operation. This is a Java-based collection type and is only aware of Java based equality semantics, ie `Object.equals()` and `Object.hashCode()`.

Clojure has it's own equality semantics taken into account in `=` and `hash`. There are some occasions where the difference between the Clojure vs Java equality semantics can cause functionally different behavior.

`c.r.p/group-by-seq` is used internally within the Clara engine for various performance-sensitive operations. The `j.u.LinkedHashMap` was chosen to be used within `c.r.p/group-by-seq` to streamline this performance-sensitive operation. a `j.u.LinkedHashMap` is convenient since it is mutable, for fast building of the ephemeral data structure needed. It also provides insert-order ordering determinism. This is useful to guarantee consistent execution times through the Clara engine from one JVM instance to the next, given the same input.

A concrete example of a problem with `j.u.LinkedHashMap`'s use of Java-based equality semantics has came up with regard to Clojure record types. 
Clojure record types use their type information in `clojure.core/=` comparisons and `clojure.core/hash`. However, they also are meant to interop with Java, that is unaware of their record types, and instead they pose as plain `java.util.Map` impl's from the Java interop perspective. This means that Clojure records `Object.equals()` and `Object.hashCode()` implementations do not take the type of records into account.

e.g.
```clj
(defrecord Foo [])
(defrecord Bar[])

(= (->Foo) (->Bar)) ;;= false
(.equals (->Foo) (->Bar)) ;;= true
```

This is a problem in the Clara engine. One example is that the engine it uses `c.r.p/group-by-seq` on `:bindings` of a rule. The `:bindings` may contained Clojure records. The Clojure records should be compared with Clojure equality semantics, but instead they are using Java semantics.

@thegeez in the #clara Slack channel gave a reproducing case that can show this defect manifest itself. I'll copy it here:
```clj
(defrecord ClauseOne [b])
(defrecord ClauseTwo [b])

(defrule rule-one
  [ClauseOne (= ?b b)]
  [ClauseTwo (= ?b b)]
 =>
  (println "Match found:" ?b )
  )

(defrecord WrapAlice [num])
(defrecord WrapBob [num])

(comment
  (clear-ns-productions!)
  (let [alice (->WrapAlice 1)
        bob (->WrapBob 1)]
    (println "(= alice bob)" (= alice bob))
    (println "(.equals alice bob)" (.equals alice bob))
    (-> (mk-session)
        (insert
         (->ClauseOne alice)
         ;; Line below is important, without it the wrong behavior does not surface
         (->ClauseTwo alice)
         (->ClauseTwo bob)
         )
        (fire-rules)
        ))
  ;;(= alice bob) false
  ;;(.equals alice bob) true
  ;; Match found: #clara_bug.core.WrapAlice{:num 1}
  ;; Match found: #clara_bug.core.WrapBob{:num 1} ;; <- should not happen
  )
```


To fix this, it seems likely that the `j.u.LinkedHashMap` should not be used. There aren't necessarily any Clojure equivalents that can capture the performance characteristics, and the order determinism that is desired here. Also, if there are, Clara will (and should) be reluctant to add more dependencies, due to the burden that places on consumers of the library who may use conflicting versions of those same dependencies (aka "jar hell").


Edit: The comments below mention a wrapper on the items in the `j.u.LinkedHashMap` and I think that likely is the better approach than what I originally said below.

~~My first thought would be to perhaps just consider a custom implementation of this function that had the correct behavior with respect to Clojure equality semantics and our ordering guarantees, yet retained comparable performance characteristics. This probably could be done with a combination of a Clojure transient collection and perhaps a separate structure for tracking the ordering of the incoming input. More thought would be needed.~~


-----
Related issues to the background of this:

#141 
#207 



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

clara.rules.platform/group-by-seq used in engine may cause incorrect semantics #393

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

clara.rules.platform/group-by-seq used in engine may cause incorrect semantics #393

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions