Proof of Concept: Automatically store strings as raw values #938

Open: wants to merge 1 commit into base: main

casperisfine
Contributor

NB: I'm opening this as a proof of concept because there are a number of specs that need to be updated. It's menial work that I'd rather not do if there is no interest in such a feature, but that I can easily do if the feature is desired.

Context

When a value is already a String, there is little point in using Marshal to serialize it. The only benefit is that Marshal preserves the String encoding, but the encoding can instead be stored as a bitflag on the key.
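
To illustrate the idea described above, here is a minimal sketch of a codec that stores Strings as raw bytes and records their encoding in the flags. It is not the code from this PR: the `RawStringCodec` module name and the `FLAG_UTF8` / `FLAG_BINARY` bit values are assumptions made up for the example, and any other value falls back to Marshal.

```ruby
# Hypothetical sketch: names and flag values are illustrative, not Dalli's.
module RawStringCodec
  FLAG_MARSHALLED = 0x1 # value was serialized with Marshal
  FLAG_UTF8       = 0x4 # assumed bit: raw bytes that were a UTF-8 String
  FLAG_BINARY     = 0x8 # assumed bit: raw bytes that were a binary String

  module_function

  # Returns [bytes, flags]. Plain UTF-8 and binary Strings skip Marshal.
  def dump(value)
    if value.instance_of?(String)
      return [value.b, FLAG_UTF8] if value.encoding == Encoding::UTF_8
      return [value.b, FLAG_BINARY] if value.encoding == Encoding::BINARY
    end
    # Everything else (including Strings in other encodings) goes through Marshal.
    [Marshal.dump(value), FLAG_MARSHALLED]
  end

  # Rebuilds the original value from the stored bytes and flags.
  def load(bytes, flags)
    if (flags & FLAG_UTF8) != 0
      bytes.dup.force_encoding(Encoding::UTF_8)
    elsif (flags & FLAG_BINARY) != 0
      bytes.dup.force_encoding(Encoding::BINARY)
    else
      Marshal.load(bytes)
    end
  end
end

bytes, flags = RawStringCodec.dump("h\u00e9llo")
restored = RawStringCodec.load(bytes, flags)
puts restored.encoding
```

Strings in less common encodings still go through Marshal in this sketch, so only two extra flag bits are needed.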

Benchmark

On a simple benchmark reading a 1MB UTF-8 string, it's nearly twice as fast (about 1.6k vs 2.8k i/s).

require 'bundler/inline'

gemfile do
  source "https://rubygems.org"
  gem "dalli"
  gem "benchmark-ips"
end

require "dalli"
require "benchmark/ips"

client = Dalli::Client.new("localhost", compress: false)
payload = "B" * 1_000_000
client.set("key", payload)
Benchmark.ips do |x|
  x.report("get 1MB UTF-8") { client.get("key") }
end
$ ruby /tmp/benchmark-dalli.rb
Warming up --------------------------------------
       get 1MB UTF-8   156.000  i/100ms
Calculating -------------------------------------
       get 1MB UTF-8      1.582k (± 2.7%) i/s -      7.956k in   5.031764s
$ ruby -Ilib /tmp/benchmark-dalli.rb
Warming up --------------------------------------
       get 1MB UTF-8   280.000  i/100ms
Calculating -------------------------------------
       get 1MB UTF-8      2.798k (± 4.3%) i/s -     14.000k in   5.012061s

This is inspired by my work on our in-house serializer library: Shopify/paquito#20

@petergoldstein
Owner

Any idea how that 2x benefit scales with payload size? I don't think storing 1 MB strings is particularly unusual, but I'm also not sure it's the highest frequency case. And there's additional conceptual overhead in the API by adding these as explicit encoding options.

How would this look in "real" apps? Would this be a big benefit if, for example, the Rails cache checked if an object was a String before adding and used the encoding flag?

Thoughts?

@casperisfine
Contributor Author

> Any idea how that 2x benefit scales with payload size?

It's more or less linear. Marshal is relatively fast at serializing strings since most of it is just adding a prefix and then doing a memcpy. I'll expand the benchmark to test different string sizes.
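
As a rough illustration of the "prefix plus memcpy" point, the following standard-library snippet checks that `Marshal.dump` embeds the String bytes verbatim and measures how many framing bytes it adds. It is only a sketch for inspecting the format, not part of the benchmark.

```ruby
# Illustration of Marshal's framing around a String (standard library only).
payload = "B" * 1_000_000

dumped = Marshal.dump(payload)

# The serialized form is a binary string that embeds the payload bytes verbatim.
embeds_payload = dumped.include?(payload)

# The framing (version header, type tag, length, encoding marker) is small
# compared to the payload itself.
overhead = dumped.bytesize - payload.bytesize

puts "payload embedded verbatim: #{embeds_payload}"
puts "framing overhead: #{overhead} bytes"

# Loading copies the bytes back out and restores the encoding.
restored = Marshal.load(dumped)
puts "round trip ok: #{restored == payload && restored.encoding == payload.encoding}"
```

The cost that remains is the extra copy of the payload on each dump and load, which is why the gap only shows up for large values.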

> Would this be a big benefit if, for example, the Rails cache checked if an object was a String before adding and used the encoding flag?

Well, since I refactored it in Rails 7.0, Rails' MemCacheStore always passes a String to dalli. That said, we could initialize Dalli with serialize: false, but that wasn't done before, so it would mean breaking backward compat :/

Another advantage of this feature is that it allows preserving common string encodings when using raw: true.

@casperisfine
Contributor Author

casperisfine commented Nov 29, 2022

Here's an updated benchmark.

On my machine (M1 Pro), the difference starts to be significant at 150kB, and then it grows more or less linearly from there.

Note that this is pretty much a memcpy benchmark, so it might change quite a bit based on RAM speed etc.

== 100kB ==
Warming up --------------------------------------
             patched     1.236k i/100ms
Calculating -------------------------------------
             patched     13.394k (± 6.8%) i/s -     66.744k in   5.006565s

Comparison:
             patched:    13394.0 i/s
            baseline:    13216.6 i/s - same-ish: difference falls within error

== 150kB ==
Warming up --------------------------------------
             patched     1.115k i/100ms
Calculating -------------------------------------
             patched     11.133k (± 6.8%) i/s -     55.750k in   5.031202s

Comparison:
             patched:    11132.9 i/s
            baseline:     9636.4 i/s - 1.16x  (± 0.00) slower

== 250kB ==
Warming up --------------------------------------
             patched   780.000  i/100ms
Calculating -------------------------------------
             patched      6.828k (± 6.8%) i/s -     34.320k in   5.049507s

Comparison:
             patched:     6827.6 i/s
            baseline:     5140.8 i/s - 1.33x  (± 0.00) slower

== 500kB ==
Warming up --------------------------------------
             patched   398.000  i/100ms
Calculating -------------------------------------
             patched      3.950k (± 5.4%) i/s -     19.900k in   5.051562s

Comparison:
             patched:     3950.2 i/s
            baseline:     2593.9 i/s - 1.52x  (± 0.00) slower

== 1000kB ==
Warming up --------------------------------------
             patched   223.000  i/100ms
Calculating -------------------------------------
             patched      2.257k (± 4.4%) i/s -     11.373k in   5.048974s

Comparison:
             patched:     2256.8 i/s
            baseline:     1315.9 i/s - 1.71x  (± 0.00) slower
Benchmark source
# frozen_string_literal: true


version = ENV["PATCH"] ? "patched" : "baseline"
if ENV["PATCH"]
  $LOAD_PATH.unshift("lib")
end

require 'bundler/inline'

gemfile do
  source "https://rubygems.org"
  gem "dalli"
  gem "benchmark-ips"
end

require "dalli"
require "benchmark/ips"

client = Dalli::Client.new("localhost", compress: false)
[100, 150, 250, 500, 1_000].each do |size|
  puts "== #{size}kB =="
  payload = "B" * 1_000 * size
  client.set("key", payload)
  Benchmark.ips do |x|
    x.report(version) { client.get("key") }
    x.save!("/tmp/dalli-bench-#{size}kb.data")
    x.compare!
  end
end
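
To read the comparison lines side by side, the ratios can be recomputed from the throughput figures printed above. This is just arithmetic on the reported numbers; because the inputs are already rounded, the last digit may differ slightly from what benchmark-ips printed. The `RESULTS` constant and `speedup` helper are names introduced for this example.

```ruby
# Throughput figures (iterations/second) copied from the benchmark output above.
RESULTS = {
  100   => { patched: 13394.0, baseline: 13216.6 },
  150   => { patched: 11132.9, baseline: 9636.4 },
  250   => { patched: 6827.6,  baseline: 5140.8 },
  500   => { patched: 3950.2,  baseline: 2593.9 },
  1_000 => { patched: 2256.8,  baseline: 1315.9 }
}.freeze

# Ratio of patched to baseline throughput, rounded to two decimals.
def speedup(patched:, baseline:)
  (patched / baseline).round(2)
end

RESULTS.each do |size_kb, rates|
  puts format("%5dkB  %.2fx", size_kb, speedup(**rates))
end
```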

@drinkbeer

Hey @casperisfine, could I know where the strings come from? Are they HTML, randomly generated strings, or query results?

@casperisfine
Contributor Author

@drinkbeer the benchmark source is provided, it's just payload = "B" * 1_000 * size.

The content of the string doesn't matter here because I initialize Dalli with compress: false to not skew the results.

@casperisfine
Contributor Author

If this feature is deemed undesirable, I'd like to suggest an alternative, which is to allow passing a custom ValueMarshaller, so that users can implement this kind of logic using flags themselves.

@danmayer

It looks like this is still an issue and would impact how Rails uses dalli, as it manages its own serializer / compressor...

require 'dalli'
require 'json' # JSON is passed as a serializer below
require 'benchmark/ips'

##
# StringSerializer is a serializer that avoids the overhead of Marshal or JSON.
##
class StringSerializer
  def self.dump(value)
    value
  end

  def self.load(value)
    value
  end
end

client = Dalli::Client.new('localhost', compress: false)
json_client = Dalli::Client.new('localhost', serializer: JSON, compress: false)
string_client = Dalli::Client.new('localhost', serializer: StringSerializer, compress: false)

payload = 'B' * 1_000_000
client.set('key', payload)
json_client.set('json_key', payload)
string_client.set('string_key', payload)
Benchmark.ips do |x|
  x.report('get 1MB MARSHAL') { client.get('key') }
  x.report('get 1MB JSON') { json_client.get('json_key') }
  x.report('get 1MB STRING') { string_client.get('string_key') }
end

gives

Warming up --------------------------------------
     get 1MB MARSHAL   168.000 i/100ms
        get 1MB JSON    59.000 i/100ms
      get 1MB STRING   225.000 i/100ms
Calculating -------------------------------------
     get 1MB MARSHAL      1.755k (± 5.1%) i/s  (569.80 μs/i) -      8.904k in   5.087145s
        get 1MB JSON    604.734 (± 2.0%) i/s    (1.65 ms/i) -      3.068k in   5.075365s
      get 1MB STRING      2.411k (± 5.6%) i/s  (414.74 μs/i) -     12.150k in   5.056508s

Are folks good if we rebase and clean this up to make it easier for callers of dalli to work with strings when they handle encoding and such outside of the dalli gem?
