Proof of Concept: Automatically store strings as raw values #938
base: main
Context

When a value is already a String, there is little point using Marshal to serialize it. The only benefit is to properly preserve the String encoding, but this can instead be stored as a bitflag on the key.

Benchmark

On a simple benchmark reading a 1MB UTF-8 string, it's about twice as fast.

```ruby
require 'bundler/inline'

gemfile do
  source "https://rubygems.org"
  gem "dalli"
  gem "benchmark-ips"
end

require "dalli"
require "benchmark/ips"

client = Dalli::Client.new("localhost", compress: false)
payload = "B" * 1_000_000
client.set("key", payload)

Benchmark.ips do |x|
  x.report("get 1MB UTF-8") { client.get("key") }
end
```

```
$ ruby /tmp/benchmark-dalli.rb
Warming up --------------------------------------
       get 1MB UTF-8   156.000 i/100ms
Calculating -------------------------------------
       get 1MB UTF-8     1.582k (± 2.7%) i/s -     7.956k in 5.031764s

$ ruby -Ilib /tmp/benchmark-dalli.rb
Warming up --------------------------------------
       get 1MB UTF-8   280.000 i/100ms
Calculating -------------------------------------
       get 1MB UTF-8     2.798k (± 4.3%) i/s -    14.000k in 5.012061s
```

This is inspired by my work on our in-house serializer library: Shopify/paquito#20
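To make the idea concrete, here is a minimal Ruby sketch of such a codec, assuming the value is stored together with an integer flags field (as memcached items are). The module name `RawStringCodec`, the flag values, and the encoding table are hypothetical: they are not Dalli's actual constants or implementation. Exact `String` instances in a few common encodings skip Marshal and carry their encoding as a small id in the flags; everything else falls back to Marshal.

```ruby
# Hypothetical flag layout for illustration only; NOT Dalli's real constants.
module RawStringCodec
  FLAG_SERIALIZED = 0x1              # payload is a Marshal dump
  FLAG_RAW_STRING = 0x4              # payload is a String stored as-is
  ENCODING_SHIFT  = 3                # encoding id lives in bits 3-4
  ENCODING_MASK   = 0b11 << ENCODING_SHIFT

  # "Common" encodings that fit in the 2-bit id; index = id stored in flags.
  ENCODINGS = [Encoding::BINARY, Encoding::UTF_8, Encoding::US_ASCII].freeze

  module_function

  # Returns [payload, flags] ready to be written next to the key.
  def dump(value)
    id = value.instance_of?(String) ? ENCODINGS.index(value.encoding) : nil
    if id
      # Raw bytes plus the encoding id; no Marshal round-trip needed.
      [value.b, FLAG_RAW_STRING | (id << ENCODING_SHIFT)]
    else
      # Subclasses, other encodings and non-String objects keep using Marshal.
      [Marshal.dump(value), FLAG_SERIALIZED]
    end
  end

  # Rebuilds the original value from the stored payload and flags.
  def load(payload, flags)
    if flags & FLAG_RAW_STRING != 0
      id = (flags & ENCODING_MASK) >> ENCODING_SHIFT
      # dup so we never mutate (or trip over a frozen) network buffer.
      payload.dup.force_encoding(ENCODINGS.fetch(id))
    elsif flags & FLAG_SERIALIZED != 0
      Marshal.load(payload)
    else
      payload
    end
  end
end

payload, flags = RawStringCodec.dump("héllo")
restored = RawStringCodec.load(payload, flags)
puts restored.encoding # prints UTF-8
```

This sketch checks `instance_of?(String)` so String subclasses still go through Marshal; instance variables set on a plain String would be dropped on the raw path, which a real implementation would have to take into account.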
Any idea how that 2x benefit scales with payload size? I don't think storing 1 MB strings is particularly unusual, but I'm also not sure it's the highest-frequency case. And there's additional conceptual overhead in the API by adding these as explicit encoding options. How would this look in "real" apps? Would this be a big benefit if, for example, the Rails cache checked whether an object was a String before adding it and used the encoding flag? Thoughts?
It's more or less linear.
Well, since I refactored it in Rails 7.0, Rails' … Another advantage of this feature is that it allows preserving common string encodings when using …
Here's an updated benchmark. On my machine (M1 Pro), the difference starts to be significant at 150KB, and then grows more or less linearly from there. Note that this is pretty much a …
Benchmark source

```ruby
# frozen_string_literal: true

# Run once as-is (released gem), then with PATCH=1 from the repo root
# so that this branch's lib/ is loaded instead.
version = ENV["PATCH"] ? "patched" : "baseline"
if ENV["PATCH"]
  $LOAD_PATH.unshift("lib")
end

require 'bundler/inline'

gemfile do
  source "https://rubygems.org"
  gem "dalli"
  gem "benchmark-ips"
end

require "dalli"
require "benchmark/ips"

# Compression is disabled so the string content doesn't affect the result.
client = Dalli::Client.new("localhost", compress: false)

[100, 150, 250, 500, 1_000].each do |size|
  puts "== #{size}kB =="
  payload = "B" * 1_000 * size
  client.set("key", payload)

  Benchmark.ips do |x|
    x.report(version) { client.get("key") }
    # Persist results so the baseline and patched runs can be compared.
    x.save!("/tmp/dalli-bench-#{size}kb.data")
    x.compare!
  end
end
```
Hey @casperisfine, could I ask where the strings come from? Are they HTML, randomly generated strings, or query results?
@drinkbeer the benchmark source is provided; it's just … The content of the string doesn't matter here because I initialize Dalli with …
If this feature is deemed undesirable, I'd like to suggest an alternative, which is to allow passing a custom …
It looks like this is still an issue and would impact how Rails uses dalli, as it manages its own serializer / compressor...
Are folks good if we rebase and clean this up to make it easier for callers of dalli to work with strings when they handle encoding and such outside of the dalli gem?
NB: I'm opening this as a proof of concept because there are a number of specs that need to be updated. It's menial work that I'd rather not do if there is no interest in such a feature, but that I can easily do if the feature is desired.