Performance of Schema and Worker Creation #513
Comments
Initial observations, regarding simple schema only (referring to this image)
These get ~2x speedup for |
Thanks for observing! I think 1 & 2 would be a great start for this. With Schema instance cache, the price for creating |
You have been busy @bsless! I started to look at the schema transformation perf; a first 2-LOC change made the identity walking of the example drop from 26µs to 1.3µs :) Will check the |
Two observations regarding
|
@ikitommi could you share the work you've done until now and results / findings? for science if nothing else :) |
Should we look towards adding some performance regression tests on both platforms?
|
more complete perf suite would be great++ |
related to the copy-constructor. A cheap trick to implement it:
=> all calls to ^{:type ::schema}
(reify
IntoSchema
(-into-schema [_ properties children options]
(prn "gonna do a cheap copy for :map")
(-into-schema parent properties children (assoc options ::parsed parsed)))
(-type [_] (-type parent))
(-type-properties [_] (-type-properties parent))
(-properties-schema [_ options] (-properties-schema parent options))
(-children-schema [_ options] (-children-schema parent options))
Schema
...
(-parent [this] this)
The good: each schema can decide whether to implement it.
The bad: a few lines of boilerplate. |
Something to be wary of if you implement |
It would foremost still be a But, the only use case for this would be to reuse some heavy calculation like parsing. But I already implemented the child equality check, so this is not needed in the case "nothing changed":
(def schema
(m/schema
[:map
[:x boolean?]
[:y {:optional true} int?]
[:z [:map
[:x boolean?]
[:y {:optional true} int?]]]]))
;; 26µs => 1.3µs
(m/walk schema (m/schema-walker identity))
... for the case "something changed", this would make sense: the copy-constructor could pass in partial parse results. Not sure if that's worth all the extra code. It could be generic code, but still.
New flamegraph of the identity walker: |
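The child equality check mentioned above can be illustrated with a small sketch on plain hiccup-style vectors. This is a toy under stated assumptions, not Malli's implementation: `walk-preserving` is a hypothetical name, and it works on raw data rather than Schema instances. The idea is that a node whose children all come back pointer-identical is returned as-is instead of being rebuilt.

```clojure
;; Toy sketch: `walk-preserving` is a hypothetical helper, not part of Malli.
(defn walk-preserving
  "Post-order walk over nested vectors. When the walk leaves every child
  pointer-identical, the original node is passed to f instead of a copy."
  [f node]
  (if (vector? node)
    (let [children   (mapv #(walk-preserving f %) node)
          unchanged? (every? true? (map identical? children node))]
      (f (if unchanged? node children)))
    (f node)))

(def form
  [:map
   [:x boolean?]
   [:y {:optional true} int?]])

;; "Nothing changed": the very same object comes back.
(def same (walk-preserving identity form))

;; "Something changed": only the path to the change is rebuilt,
;; the untouched [:y ...] entry is shared with the original.
(def renamed (walk-preserving #(if (= % :x) :renamed %) form))
```

The sketch still allocates a temporary vector per node, so it demonstrates structural sharing of the result rather than an allocation-free walk.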
I meant specifically fns like:
(defn- -schema [?schema options]
(or (and (or (schema? ?schema) (into-schema? ?schema)) ?schema)
(-lookup ?schema options)
  (-fail! ::invalid-schema {:schema ?schema})))
It should be okay, though.
Someone will invent something that will surprise us. |
Here's btw a comparison of a
;; 10µs
(clojure.walk/postwalk
identity
[:map
[:x boolean?]
[:y {:optional true} int?]
[:z [:map
[:x boolean?]
[:y {:optional true} int?]]]]) |
urgh, the (m/into-schema (m/schema :tuple) {} [:int])
; =throws=> No implementation of method: :-into-schema of protocol: #'malli.core/IntoSchema found for class: malli.core$_tuple_schema$reify$reify__24082 |
Here you go #525 :) |
The current improvements from master in numbers:
(def ?schema
[:map
[:x boolean?]
[:y {:optional true} int?]
[:z [:map
[:x boolean?]
[:y {:optional true} int?]]]])
;; 44µs => 9.4µs (4x)
(def schema (m/schema ?schema))
;; 26µs => 1.3µs (20x)
(m/walk schema (m/schema-walker identity))
;; 51µs => 7.2µs (7x)
(mu/closed-schema schema)
(images: Creation, Closed Map) |
TODO, maybe:
|
some more improvements from #531 (faster parsing & utilities)
;; 1.7µs => 1.1µs
(bench (m/schema [:or :int :string]))
;; 400ns -> 280ns
(bench (m/schema :int))
(def ?schema
[:map
[:x boolean?]
[:y {:optional true} int?]
[:z [:map
[:x boolean?]
[:y {:optional true} int?]]]])
;; 9.4µs -> 8.5µs
(bench (m/schema ?schema)) |
How does front-loading the destructuring work with taking functions as args in simple schema? |
I think it's a bit slower with destructuring, but it works (the destructuring function returns nil) and it's not a very common case. Relevant tests: https://github.com/metosin/malli/blob/master/test/malli/core_test.cljc#L2045-L2089 |
|
One more small improvement (#539) coming, will check open PRs, but then closing this as the low-hanging fruits are collected; I think performance is more of a journey than a goal. That PR removes the last calls to Also, entry-schemas can cache their parse-results and
Cumulative gains so far, on the JVM:
(def ?schema
[:map
[:x boolean?]
[:y {:optional true} int?]
[:z [:map
[:x boolean?]
[:y {:optional true} int?]]]])
(def schema (m/schema ?schema))
;; 44µs -> 3.4µs (13x)
(bench (m/schema ?schema))
;; 4.2µs -> 830ns (4.5x)
(bench (mu/assoc schema :w :string))
;; 134µs -> 15µs (9x)
(bench (mu/merge schema schema))
;; 51µs -> 3.9µs (13x)
(bench (mu/closed-schema schema))
There clearly needs to be a way to create a large number of "raw" schemas programmatically without checking all the intermediate steps for correctness, pre-creating validators etc. I think the way to go is to add the first class support described in #406. As it's all data, there is a clear separation of "constructing things" and "instantiating schemas". Also, I really like the clj-fx syntax, so malli should do that. Will work on that with the new Clojurists Together funding. |
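The separation of "constructing things" and "instantiating schemas" can be sketched with plain data operations. This is only an illustration under assumptions: `map-form` is a hypothetical helper that works on the vector syntax used in this thread, not the map syntax discussed in #406, and nothing here calls into Malli.

```clojure
;; Toy sketch: `map-form` is a hypothetical helper, not part of Malli.
(defn map-form
  "Builds a hiccup-style :map form from [key properties-or-nil child] triples.
  Only plain data operations are used; nothing is validated or compiled."
  [entries]
  (persistent!
   (reduce (fn [acc [k props child]]
             (conj! acc (if props [k props child] [k child])))
           (transient [:map])
           entries)))

(def inner
  (map-form [[:x nil boolean?]
             [:y {:optional true} int?]]))

(def form
  (map-form [[:x nil boolean?]
             [:y {:optional true} int?]
             [:z nil inner]]))

;; `form` is plain data. A single (m/schema form) call at the very end
;; would instantiate it, so intermediate steps pay no schema-creation cost.
```

All intermediate values are ordinary vectors, so any number of them can be assembled cheaply before one instantiation step.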
It's only tangentially related but I imagine #498 could have benefits in the context of a long running application |
The entry-parsing was doing some things twice, and I did a small tweak on merge. Cumulative:
;; 134µs
;; 9µs (15x)
(bench (mu/merge schema schema))
;; 44µs
;; 3.4µs (13x)
(m/schema
[:map
[:x boolean?]
[:y {:optional true} int?]
[:z [:map
[:x boolean?]
[:y {:optional true} int?]]]]) |
all the stuff in:
➜ ~ clj -Sforce -Sdeps '{:deps {metosin/malli {:mvn/version "0.7.0-SNAPSHOT"}}}'
Downloading: metosin/malli/0.7.0-SNAPSHOT/malli-0.7.0-20210914.182554-2.pom from clojars |
There are some open PRs, but they can be merged later, closing this one. |
Master now has support for both memoized workers and lazy parsing (#550):
(def ?schema
[:map
[:x boolean?]
[:y {:optional true} int?]
[:z [:map
[:x boolean?]
[:y {:optional true} int?]]]])
(def schema (m/schema ?schema))
memoized workers:
;; 1.6µs -> 64ns (25x)
(p/bench (m/validate schema {:x true, :z {:x true}}))
lazy parsing:
;; 44µs -> 2.5µs (18x)
(bench (m/schema ?schema))
;; 44µs -> 240ns (180x, lazy)
(p/bench (m/schema ?schema {::m/lazy-entries true}))
Upcoming Schema AST (#544) will remove most of the parsing overhead, instead of just postponing it. |
Master also now has the Schema AST merged, yielding (hiccup) parse-free schema creation:
(def ast (m/ast ?schema))
;{:type :map,
; :keys {:x {:order 0, :value {:type boolean?}},
; :y {:order 1, :value {:type int?}
; :properties {:optional true}},
; :z {:order 2,
; :value {:type :map,
; :keys {:x {:order 0
; :value {:type boolean?}},
; :y {:order 1
; :value {:type int?}
; :properties {:optional true}}}}}}}
;; 150ns (16x)
(p/bench (m/from-ast ast))
(-> ?schema
(m/schema)
(m/ast)
(m/from-ast)
(m/form)
(= ?schema))
; => true |
Problem Statement
Malli is optimized selectively for runtime performance. The performance critical paths include: validation, explanation, transformation and parsing. Good runtime performance has been achieved by pushing all possible work into worker initialization, away from the runtime.
Creating a schema and a validator and applying it, all at once:
Building the schema and schema validator separately, ahead of time:
The validation is over 1000x faster! For production apps with low latency requirements, one should always precompute the workers if possible. Tools like reitit coercion do this already on behalf of the user.
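Why precomputing the worker pays off can be shown with a toy sketch in plain Clojure. It is not Malli's implementation: the mini-schema language (predicates plus `:or`) and the names `compile-validator` and `validate-once` are hypothetical stand-ins for what `m/validator` and `m/validate` do conceptually.

```clojure
;; Toy sketch with hypothetical names; not Malli's implementation.
(defn compile-validator
  "Turns a toy schema into a predicate once, so per-call work is minimal."
  [schema]
  (cond
    ;; a bare predicate function is already a validator
    (fn? schema) schema

    ;; [:or a b ...] succeeds when any child validator succeeds
    (and (vector? schema) (= :or (first schema)))
    (let [validators (mapv compile-validator (rest schema))]
      (fn [x] (boolean (some #(% x) validators))))

    :else (throw (ex-info "invalid toy schema" {:schema schema}))))

(defn validate-once
  "Compiles and applies in one go: pays the compilation cost on every call."
  [schema x]
  ((compile-validator schema) x))

(def toy-schema [:or int? string?])

;; Precomputed worker: the compilation happens once, here.
(def valid? (compile-validator toy-schema))

(valid? 42)
(valid? :nope)
(validate-once toy-schema "hi")
```

Calling `valid?` only runs the closures built ahead of time, while `validate-once` walks the schema data again on every call.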
As it stands now, the Schema & worker creation performance is only optimised for lines-of-implementation-code, which is not the best metric here. The perf is bad, REALLY bad.
Having a 500ms schema worker compilation phase to emit 1000x faster runtime is totally ok on a high-performance cloud server, but it can be a show-stopper on targets like browsers, especially if running on slow mobile phones.
Flamegraphs (less is better) with m/validate and m/validator:
using m/validator:
using m/validate:
Can we do something to make it better?
Definitely.
Schema and worker creation code should be optimised to a reasonable level for performance (both CPU & Memory), and Malli should support lazy initialisation to avoid stop-the-world pre-calculation of everything in single-threaded environments.
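One way to picture lazy initialisation is Clojure's `delay`: the worker is created on first use and cached afterwards. The sketch below is an assumption-laden toy, not how Malli does it; `expensive-compile`, `lazy-schema` and the counter are hypothetical names used only to make the deferred work observable.

```clojure
;; Toy sketch with hypothetical names; not Malli's implementation.
(def compile-count (atom 0))

(defn expensive-compile
  "Stands in for costly worker creation; counts how often it runs."
  [schema]
  (swap! compile-count inc)
  (fn [x] (every? (fn [[k pred]] (pred (get x k))) schema)))

(defn lazy-schema
  "Wraps schema data with a delayed validator: nothing is compiled until first use."
  [schema]
  {:form      schema
   :validator (delay (expensive-compile schema))})

(defn validate [lazy x]
  (@(:validator lazy) x))

(def user (lazy-schema {:id int? :name string?}))
;; compile-count is still 0 at this point

(validate user {:id 1 :name "a"})
(validate user {:id "x" :name "a"})
;; compile-count is now 1: the worker was built once, on first use
```

Schemas that are defined but never used cost nothing beyond holding their data, which is the point on slow single-threaded targets.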
What can be done
1. Creating Schema instances
m/schema is just not optimised. We merge registries, eagerly create forms, parse entries etc. Quick wins available. In the flamegraph below, the :ints are the same.
Oh, the mountains!
2. Transforming Schemas
m/into-schema, m/walk and the LensSchema protocol are the secret sauce of enabling generic transformations for schemas. But elegant functional immutability is not performant by default. Running Schema transformations also currently forces re-creation of schemas.
Identity walker, e.g. "no-op":
As the tools are generic, it might be easy to get order(s) of magnitude improvements to this using smart diffing & caching and adding optional new -copy constructors.
3. Guidelines and Tools
transient & persistent! in Clojure: instead of going through the Schema instance trees, one could just accumulate the map-syntax and create a schema out of that in the end. See upcoming changes to map syntax.
TODO