Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance of Schema and Worker Creation #513

Closed
ikitommi opened this issue Aug 17, 2021 · 28 comments
Closed

Performance of Schema and Worker Creation #513

ikitommi opened this issue Aug 17, 2021 · 28 comments
Labels
enhancement New feature or request

Comments

@ikitommi
Copy link
Member

ikitommi commented Aug 17, 2021

Problem Statement

Malli is optimized selectively for runtime performance. The performance critical paths include: validation, explanation, transformation and parsing. Good runtime performance has been achieved by pushing all possible work into worker initialization, away from the runtime.

Creating a schema and a validator and applying it, all at once:

;; 5.7µs
(m/validate [:or :int :string] 42)
; => true

Building the schema and schema validator separately, ahead of time:

;; 3.0µs
(def schema (m/schema [:or :int :string]))

;; 1.7µs
(def validate (m/validator schema))

;; 4ns
(validate 42)
; => true

The validation is over 1000x faster! For production apps with low latency requirements, one should always precompute the workers if possible. Tools like reitit coercion do this already on behalf of the user.

As it stands out now, the Schema & worker creation performance is only optimised for lines-of-implementation-code, which is not the best metric here. The perf is bad, REALLY bad.

Having a 500ms schema worker compilation phase to emit 1000x faster runtime is totally ok on a hi-performance cloud server but can be a show-stopper on targets like on browsers, especially if running on slow mobile phones.

Flamegraphs (less is better) with m/validate and m/validator:

using m/validator:

Screenshot 2021-08-17 at 9 15 08

using m/validate:

Screenshot 2021-08-17 at 9 17 23

Can we do something to make it better?

Definetely.

Schema and worker creation code should be optimised to a reasonable level for performance (both CPU & Memory) and Malli should support lazy initialisation to avoid stop-the-world pre-calculation of all in single-threaded environments

What can be done

  1. Creating Schema instances
  2. Transforming Schemas
  3. Guidelines and Tools

1. Creating Schema instances

  1. m/schema is just not optimised. We merge registries, eagerly create forms, parse entries etc. Quick wins available.
;; 400ns
(m/schema :int)
  1. all schema instances are created separately - any larger domain model can have hundreds or thousands (or hundreds of thousands) of identically behaving leaf schemas for no good reason. With a smart cache, we could re-use once created instances.
(= (m/schema :int)
   (m/schema :int))
; => false

in the flamegraph below:

  1. just bad code
  2. schema format checking should be optional
  3. can be removed with caching, all :ints are the same

Screenshot 2021-08-17 at 8 57 52

  1. entry parsing is slow. should be easy to make it 10x faster.
;; 44µs
(m/schema
 [:map
  [:x boolean?]
  [:y {:optional true} int?]
  [:z [:map
       [:x boolean?]
       [:y {:optional true} int?]]]])

Oh, the mountains!

Screenshot 2021-08-17 at 9 02 55

2. Transforming Schemas

  1. m/into-schema, m/walk and LensSchema protocol are the secret sauce of enabling generic transformations for schemas. But, elegant functional immutability is not performant by default. Running Schema transformations also currently forces re-creation of schemas.

Identity walker, e.g. "no-op":

;; 26µs
(m/walk schema (m/schema-walker identity))

Screenshot 2021-08-17 at 9 03 01

As the tools are generic, it might be easy to get order(s) of magnitude improvements to this using smart diffing & caching and adding optional new -copy constructors.

3. Guidelines and Tools

  1. Malli allows schemas to persisted (and loaded back), one could run the heavy transformations at development-time, emit just the final results as EDN, to be read back as ready schemas.
;; transform & persist at development/compile
(->> [:map
      [:x boolean?]
      [:y {:optional true} int?]
      [:z [:map
           [:x boolean?]
           [:y {:optional true} int?]]]]
     (mu/closed-schema)
     ;; ... some heavy transformations
     (malli.edn/write-string)
     (spit "malli.edn"))

;; reading it back at runtime
(-> "malli.edn" (slurp) (malli.edn/read-string))
  1. Programming againt the map-syntax can be used like transient & persistent! in Clojure: instead of going through the Schema instance trees, one could just accumulate the map-syntax and screate a schema out of that it the end. See upcoming changes to map syntax.
(-> [:map
     [:x boolean?]
     [:y {:optional true} int?]
     [:z [:map
          [:x boolean?]
          [:y {:optional true} int?]]]]
    (mu/to-map-syntax)
    (assoc-in [:properties :closed] true)
    (assoc-in [:children 2] [:z nil :int])
    (mu/from-map-syntax))
;[:map {:closed true} 
; [:x boolean?] 
; [:y {:optional true} int?] 
; [:z :int]]

TODO

  • define performance targets, e.g.
    • 2-4x faster schema creation
    • 10-20x faster schema transformations
    • 75% reduced memory foonprint
  • work as subtasks into this issue
  • time / sponsor(s) for making this
@bsless
Copy link
Contributor

bsless commented Aug 17, 2021

Initial observations, regarding simple schema only (referring to this image)

  1. Got no explanation to this gap on the left, however, registy has room to improve:
    1.1. Currently uses satisfies?, bad for runtime
    1.2. Add IntoRegistry protocol
    1.3. Extend to map with simple-registry constructor
    1.4. simple-registry can cheat and assume it can access the map via .valAt
  2. separate min and max from opts, bind children count
  3. The entire tower you see is due to the destructuring of ?props
    3.1. I can speed it up significantly but it's ugly, unless you want to pull in https://github.com/joinr/structural
  4. Should implement Schemas protocol for all schemas

These get ~2x speedup for (m/schema :int)

@ikitommi
Copy link
Member Author

Thanks for observing!

I think 1 & 2 would be a great start for this. With Schema instance cache, the price for creating :int is paid just once, so I suggest we don't add new libs yet.

@ikitommi
Copy link
Member Author

You have been busy @bsless! I started to look the schema transformation perf, first 2 loc change made the identity walking of example drop from 26µs to 1.3µs :) Will check the -copy constructor next & re-visit entry parsing after that. I'll check all PRs after that and see how they stack up.

@bsless
Copy link
Contributor

bsless commented Aug 22, 2021

Two observations regarding -form:

  • do we need to preserve meta? It adds some overhead
  • forms can be wrapped in delay so they are not realized until the schema is printed.

@bsless
Copy link
Contributor

bsless commented Aug 23, 2021

@ikitommi could you share the work you've done until now and results / findings? for science if nothing else :)

@bsless
Copy link
Contributor

bsless commented Aug 23, 2021

Should we look towards adding some performance regression tests on both platforms?
What kind of things would we like to test? How would results be presented and compared?
Will Github hunt us down and kill us if we start burning CPU in github actions for performance regression tests?
Capabilities:

  • cold start and steady-state tests
  • scan Java, GC and JIT settings
  • for each test case report with confidence level
  • associate results with git sha & date

@ikitommi
Copy link
Member Author

more complete perf suite would be great++

@ikitommi
Copy link
Member Author

ikitommi commented Aug 23, 2021

related to the copy-constructor. A cheap trick to implement it:

  1. reify also IntoSchema alongside Schema
  2. modify -parent to return this

=> all calls to -into-schema are handled by the same instance, making it see all closed variables, one can pass in extra data (e.g. parsed results) etc.

^{:type ::schema}
(reify
  IntoSchema
  (-into-schema [_ properties children options]
    (prn "gonna do a cheap copy for :map")
    (-into-schema parent properties children (assoc options ::parsed parsed)))
  (-type [_] (-type parent))
  (-type-properties [_] (-type-properties parent))
  (-properties-schema [_ options] (-properties-schema parent options))
  (-children-schema [_ options] (-children-schema parent options))

  Schema
  ...
  (-parent [this] this)

the good

each schema can decide wether to implement IntoSchema

the bad

few lines of boilerplate.

@bsless
Copy link
Contributor

bsless commented Aug 23, 2021

Something to be wary of if you implement IntoSchema on Schema is that all checks of either Schema or IntoSchema check for Schema first or you'll do double work, no?

@ikitommi
Copy link
Member Author

It would foremost still be a Schema, as it is checked first on m/schema. It could be used with m/into-schema, which currently doesn't allow Schema instances. Not 100% sure would there be an issue, please tell if you know one!

But, only use case for this would be to reuse some heavy calculation like parsing. But, already implemented the child equality check, so this is not needed in the the case "nothing changed":

(def schema
  (m/schema
    [:map
     [:x boolean?]
     [:y {:optional true} int?]
     [:z [:map
          [:x boolean?]
          [:y {:optional true} int?]]]]))

;; 26µs => 1.3µs
(m/walk schema (m/schema-walker identity))

... for the case "something changed", this would make sense, the copy-constructor could pass in partial parse results. Not sure if that's worth all the extra code. It could be generic code, but still.

new flamegraph of the identity walker:

Screenshot 2021-08-23 at 14 55 57

@bsless
Copy link
Contributor

bsless commented Aug 23, 2021

I meant specifically fns like:

(defn- -schema [?schema options]
  (or (and (or (schema? ?schema) (into-schema? ?schema)) ?schema)
      (-lookup ?schema options)
      (-fail! ::invalid-schema {:schema ?schema})))

It should be okay, though.

But, already implemented the child equality check, so this is not needed in the the case "nothing changed":

Someone will invent something that will surprise us.

@ikitommi
Copy link
Member Author

Here's a btw a comparison of a clojure.core identity walker:

;; 10µs
(clojure.walk/postwalk
  identity
  [:map
   [:x boolean?]
   [:y {:optional true} int?]
   [:z [:map
        [:x boolean?]
        [:y {:optional true} int?]]]])

@ikitommi
Copy link
Member Author

urgh, the m/-schema is broken - it's used to look both Schema instances and IntoSchema instances. Depending on where it's called it either works or not. Here it fails:

(m/into-schema (m/schema :tuple) {} [:int])
; =throws=> No implementation of method: :-into-schema of protocol: #'malli.core/IntoSchema found for class: malli.core$_tuple_schema$reify$reify__24082

@bsless
Copy link
Contributor

bsless commented Aug 23, 2021

more complete perf suite would be great++

Here you go #525 :)

@ikitommi
Copy link
Member Author

ikitommi commented Aug 24, 2021

The current improvements from master in numbers:

(def ?schema
  [:map
   [:x boolean?]
   [:y {:optional true} int?]
   [:z [:map
        [:x boolean?]
        [:y {:optional true} int?]]]])
        
;; 44µs => 9.4µs (4x)
(def schema (m/schema ?schema))

;; 26µs => 1.3µs (20x)
(m/walk schema (m/schema-walker identity))

;; 51µs => 7.2µs (7x)
(mu/closed-schema schema)

Creation

Screenshot 2021-08-24 at 16 36 06

Closed Map

Screenshot 2021-08-24 at 16 36 22

@ikitommi
Copy link
Member Author

ikitommi commented Aug 24, 2021

TODO, maybe:

  • I have a spike on the instance caching, the perf gains were not good enough with old/slow parsing, now it could show up
  • the copy-constructor, for entry-schemas, might be good, or not
  • delayed forms / entries / heavy computations
  • check that children are not moved over a slow abstraction, e.g. first on vector, turning sequences to vectors (without a good reason) etc.

@ikitommi
Copy link
Member Author

some more improvements from #531 (faster parsing & utilities)

;; 1.7µs => 1.1µs
(bench (m/schema [:or :int :string]))

;; 400ns -> 280ns
(bench (m/schema :int))

(def ?schema
  [:map
   [:x boolean?]
   [:y {:optional true} int?]
   [:z [:map
        [:x boolean?]
        [:y {:optional true} int?]]]])

;; 9.4µs -> 8.5µs
(bench (m/schema ?schema))

@bsless
Copy link
Contributor

bsless commented Aug 25, 2021

some more improvements from #531 (faster parsing & utilities)

How does front-loading the destructuring work with taking functions as args in simple schema?

@ikitommi
Copy link
Member Author

I think it's bit slower with destructuring, but works (destructuring function returns nil) and it's not a very common case. Relevant tests: https://github.com/metosin/malli/blob/master/test/malli/core_test.cljc#L2045-L2089

@ikitommi
Copy link
Member Author

[metosin/malli "0.7.0-20210826.064206-1"] has all the currently cumulated improvements in.

@ikitommi
Copy link
Member Author

ikitommi commented Sep 12, 2021

One more small improvement (#539) coming, will check open PRs, but then closing this as the low-hanging fruits are collected, I think performance is more of a journey than a goal. That PR removes the last calls to satisfies?, making things like m/entries, m/deref and m/deref-all order of magnitude faster on JVM, less on the JS runtimes.

Also, entry-schemas can cache their parse-results and m/-set-entries and m/-set-properties can use that. As mostly everything is built on top of those, the perf gains span over most utilities.

Cumulative gains so far, on the JVM:

(def schema
  [:map
   [:x boolean?]
   [:y {:optional true} int?]
   [:z [:map
        [:x boolean?]
        [:y {:optional true} int?]]]])

(def schema (m/schema ?schema))

;; 44µs -> 3.4µs (13x)
(bench (m/schema ?schema))

;; 4.2µs -> 830ns (4.5x)
(bench (mu/assoc schema :w :string))

;; 134µs -> 15µs (9x)
(bench (mu/merge schema schema))

;; 51µs -> 3.9µs (13x)
(bench (mu/closed-schema schema))

There is clearly a need to be a way to create large amount of "raw" schemas programmatically without checking all the intermediate steps for correctness, pre-create validators etc. I think the way to go is to add the first class support, described in #406. As it's all data, there is a clear separation of "constructing things" and "instantiating schemas. Also, I really like the clj-fx syntax, so malli should do that.

Will work on that with the new Clojurists Together funding.

@bsless
Copy link
Contributor

bsless commented Sep 12, 2021

It's only tangentially related but I imagine #498 could have benefits in the context of a long running application

@ikitommi
Copy link
Member Author

The entry-parsing was doing some things twice, and did a small tweak on merge. cumulative:

;; 134µs
;;  9µs (15x)
(bench (mu/merge schema schema))
;; 44µs
;; 3.4µs (13x)
(m/schema
 [:map
  [:x boolean?]
  [:y {:optional true} int?]
  [:z [:map
       [:x boolean?]
       [:y {:optional true} int?]]]])

@ikitommi
Copy link
Member Author

all the stuff in:

~ clj -Sforce -Sdeps '{:deps {metosin/malli {:mvn/version "0.7.0-SNAPSHOT"}}}'
Downloading: metosin/malli/0.7.0-SNAPSHOT/malli-0.7.0-20210914.182554-2.pom from clojars

@ikitommi
Copy link
Member Author

Forked #543 out of this with, first try already gave 15x better perf on schema creation (3.7µs -> 250ns).

@ikitommi
Copy link
Member Author

There are some open PRs, but they can be merged later, closing this one.

@ikitommi
Copy link
Member Author

ikitommi commented Oct 24, 2021

Master has now support for both memoized workers and lazy parsing (#550):

(def ?schema
  [:map
   [:x boolean?]
   [:y {:optional true} int?]
   [:z [:map
        [:x boolean?]
        [:y {:optional true} int?]]]])

(def schema (m/schema ?schema))

memoized workers:

;; 1.6µs -> 64ns (25x)
(p/bench (m/validate schema {:x true, :z {:x true}}))

lazy parsing:

;; 44µs -> 2.5µs (18x)
(bench (m/schema ?schema))

;; 44µs -> 240ns (180x, lazy)
(p/bench (m/schema ?schema {::m/lazy-entries true}))

Upcoming Schme AST (#544) will remove most of the parsing overhead, instead of just postponing it.

@ikitommi
Copy link
Member Author

Master also has now the Schema AST merged, yielding (hiccup) parse-free schema creation:

(def ast (m/ast ?schema))
;{:type :map,
; :keys {:x {:order 0, :value {:type boolean?}},
;        :y {:order 1, :value {:type int?}
;            :properties {:optional true}},
;        :z {:order 2,
;            :value {:type :map,
;                    :keys {:x {:order 0
;                               :value {:type boolean?}},
;                           :y {:order 1 
;                               :value {:type int?}
;                               :properties {:optional true}}}}}}}

;; 150ns (16x)
(p/bench (m/from-ast ast))

(-> ?schema
    (m/schema)
    (m/ast)
    (m/from-ast)
    (m/form)
    (= ?schema))
; => true

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants