Skip to content

bad Dict performance for small integers, strings #1423

@binarybana

Description

@binarybana

In playing with dictionaries, I boiled down my code to a microbenchmark of simply filling a Dict{Int,Float64} with a million elements. Compared to Python, the results are disappointing. Here's the gist of the gist:

function stress0(N::Int)
    x = Dict{Int,Float64}()
    for i = 1:N
        x[i] = 0.0
    end
    return x
end

Full Julia and Python code

julia> load("test.jl")

julia> for i=1:5 @time stress0(10^6); end
elapsed time: 3.6805942058563232 seconds
elapsed time: 4.042525053024292 seconds
elapsed time: 4.042686939239502 seconds
elapsed time: 4.022021055221558 seconds
elapsed time: 4.173625946044922 seconds

In [1]: %run test.py

In [4]: %time stress0(10**6);
CPU times: user 0.17 s, sys: 0.08 s, total: 0.25 s
Wall time: 0.23 s

In [5]: %time stress0(10**6);
CPU times: user 0.15 s, sys: 0.07 s, total: 0.22 s
Wall time: 0.21 s

In [6]: %time stress0(10**6);
CPU times: user 0.18 s, sys: 0.08 s, total: 0.27 s
Wall time: 0.20 s

So roughly a 20x difference. And even if we preallocate the dictionary to a more appropriate size:

julia> for i=1:5 @time stress0prime(10^6); end
elapsed time: 0.8082170486450195 seconds
elapsed time: 0.9532439708709717 seconds
elapsed time: 0.9672729969024658 seconds
elapsed time: 0.9541919231414795 seconds
elapsed time: 0.9703891277313232 seconds

We are still around 5x. I heard somewhere that Julia has a much better hashing algorithm than Python (in terms of distribution), and if that's the cause of this, is it worth the 20x?

And to make sure this wasn't a read/write tradeoff, reading is not as bad, but still 5x slower:

julia> x = stress0(10^6);

julia> for i=1:5 @time stress2(x,10^6); end
elapsed time: 1.0894720554351807 seconds
elapsed time: 1.150738000869751 seconds
elapsed time: 1.089750051498413 seconds
elapsed time: 1.116157054901123 seconds
elapsed time: 1.0949499607086182 seconds

In [8]: x = stress0(10**6)

In [12]: %time stress2(x,10**6);
CPU times: user 0.26 s, sys: 0.03 s, total: 0.28 s
Wall time: 0.26 s

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions