Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fulfill https://github.com/nim-lang/Nim/pull/14995#issuecomment-66491… #15104

Merged
merged 1 commit into from
Jul 28, 2020

Conversation

c-blake
Copy link
Contributor

@c-blake c-blake commented Jul 28, 2020

(nim r tables; nim r sharedtables) work. Seems ready for CI testing. (This replaces
the now defunct #15103)

request.  This can be conceived as an alternate, more capable resolution of
  nim-lang#12200
than
  nim-lang#12208

The code re-org idea here is to upgrade tablimpl.nim:`delImpl`/`delImplIdx`
to abstract client code conventions for cell emptiness & cell hashing via
three new template arguments - `makeEmpty`, `cellEmpty`, `cellHash` which
all take a single integer argument and clear a cell, test if clear or
produce the hash of the key stored at that index in `.data[]`.

Then we update the 3 call sites (`Table`, `CountTable`, `SharedTable`) of
`delImpl`/`delImplIdx` by defining define those arguments just before the
first invocation as non-exported templates.

Because `CountTable` does not save hash() outputs as `.hcode`, it needs a
new tableimpl.nim:`delImplNoHCode` which simply in-lines the hash search
when no `.hcode` field is available for "prefix compare" acceleration.
It is conceivable this new template could be used by future variants, such
as one optimized for integer keys where `hash()` and `==` are fast and
`.hcode` is both wasted space & time (though a small change to interfaces
there for a sentinel key meaning "empty" is needed for maximum efficiency).

We also eliminate the old O(n) `proc remove(CountTable...)` in favor of
simply invoking the new `delImpl*` templates and take care to correctly
handle the case where `val` is either zero for non-existent keys in `inc`
or evolves to zero over time in `[]=` or `inc`.

The only user-visible changes from the +-42 delta here are speed, iteration
order post deletes, and relaxing the `Positive` constraint on `val` in
`proc inc` again, as indicated in the `changelog.md` entry.
@c-blake
Copy link
Contributor Author

c-blake commented Jul 28, 2020

While I think it's fairly self-evident from source code and algos, some like to see benchmarks. These are based on #14995 (comment) but reduced.

import tables, times

template timeIt(label1, label2, value2, scale, body) =
  var t0 = epochTime()
  body
  let dt = epochTime() - t0
  echo label1, dt * 1e9 / scale, label2, value2

proc runtest(els: int) =
  var ot = initCountTable[int](els)
  timeIt("Create " & $(els div 1_000) & "Kitem table (ns/elt): ",
         " len ", ot.len, float(els)):
    for i in 0 ..< els: ot[i] = i
  timeIt("Delete99% (ns/elt): ", " len ", ot.len, float(els)):
    for i in 0 ..< els:
      if i mod 100 != 0: ot.del(i)
for els in @[ 10_000, 50_000 ]: runtest(els)

On an (Intel i7-6700k@4.7GHz, Linux, nim bbb250f, d:danger, etc.), I get these results.
Before (manually rounded times to 0.1 ns):

Create 10Kitem table (ns/elt): 21.5 len 9999
Delete99% (ns/elt): 78735.2 len 99
Create 50Kitem table (ns/elt): 26.7 len 49999
Delete99% (ns/elt): 457257.2 len 499

After (manually rounded times to 0.1 ns):

Create 10Kitem table (ns/elt): 10.7 len 9999
Delete99% (ns/elt): 28.2 len 99
Create 50Kitem table (ns/elt): 9.5 len 49999
Delete99% (ns/elt): 14.3 len 499

So, we see deletes get 78735/28 =~ 2800x faster and 457257/14 =~ 33,000x faster. Obviously, since it's O(n) expected vs O(1) that could be made as big as you want. (It's 1.7x and 1.5x less than 5000x and 10,000x since branch prediction/pre-fetching/etc.)

A careful eye might notice inserts getting faster across the PR, though no relevant code changed. I am not sure what that is, but suspect gcc code auto-inlining/layout/branch prediction/instruction cache type explanations that might not even generalize to an AMD x64 CPU. In any case, it's in the right direction, and also only O(2x). PGO can make up to 2x deltas, anyway. So a less half-assed study would be needed to really know. (I mentioned it so you know I'm aware. It might have been wiser to delete the timing for the insert phase...)

@c-blake
Copy link
Contributor Author

c-blake commented Jul 28, 2020

All tests pass. Also, if you would like we could put a 15-20 line explanatory note in front of delImplIdx to perhaps prevent people from changing without understanding. Let me know about that before you merge.

@Araq
Copy link
Member

Araq commented Jul 28, 2020

All tests pass. Also, if you would like we could put a 15-20 line explanatory note in front of delImplIdx to perhaps prevent people from changing without understanding. Let me know about that before you merge.

More comments are always nice but can be done in a follow-up PR. I'm merging this already because it's so beautiful. :-)

@Araq Araq merged commit b2a1944 into nim-lang:devel Jul 28, 2020
@c-blake
Copy link
Contributor Author

c-blake commented Jul 28, 2020

Ok. Thanks. :-)

@c-blake c-blake deleted the CountTable_del_fix2 branch July 28, 2020 21:52
@c-blake
Copy link
Contributor Author

c-blake commented Jul 29, 2020

#15108

narimiran pushed a commit that referenced this pull request Jul 29, 2020
request.  This can be conceived as an alternate, more capable resolution of
  #12200
than
  #12208

The code re-org idea here is to upgrade tablimpl.nim:`delImpl`/`delImplIdx`
to abstract client code conventions for cell emptiness & cell hashing via
three new template arguments - `makeEmpty`, `cellEmpty`, `cellHash` which
all take a single integer argument and clear a cell, test if clear or
produce the hash of the key stored at that index in `.data[]`.

Then we update the 3 call sites (`Table`, `CountTable`, `SharedTable`) of
`delImpl`/`delImplIdx` by defining define those arguments just before the
first invocation as non-exported templates.

Because `CountTable` does not save hash() outputs as `.hcode`, it needs a
new tableimpl.nim:`delImplNoHCode` which simply in-lines the hash search
when no `.hcode` field is available for "prefix compare" acceleration.
It is conceivable this new template could be used by future variants, such
as one optimized for integer keys where `hash()` and `==` are fast and
`.hcode` is both wasted space & time (though a small change to interfaces
there for a sentinel key meaning "empty" is needed for maximum efficiency).

We also eliminate the old O(n) `proc remove(CountTable...)` in favor of
simply invoking the new `delImpl*` templates and take care to correctly
handle the case where `val` is either zero for non-existent keys in `inc`
or evolves to zero over time in `[]=` or `inc`.

The only user-visible changes from the +-42 delta here are speed, iteration
order post deletes, and relaxing the `Positive` constraint on `val` in
`proc inc` again, as indicated in the `changelog.md` entry.

(cherry picked from commit b2a1944)
mildred pushed a commit to mildred/Nim that referenced this pull request Jan 11, 2021
request.  This can be conceived as an alternate, more capable resolution of
  nim-lang#12200
than
  nim-lang#12208

The code re-org idea here is to upgrade tablimpl.nim:`delImpl`/`delImplIdx`
to abstract client code conventions for cell emptiness & cell hashing via
three new template arguments - `makeEmpty`, `cellEmpty`, `cellHash` which
all take a single integer argument and clear a cell, test if clear or
produce the hash of the key stored at that index in `.data[]`.

Then we update the 3 call sites (`Table`, `CountTable`, `SharedTable`) of
`delImpl`/`delImplIdx` by defining define those arguments just before the
first invocation as non-exported templates.

Because `CountTable` does not save hash() outputs as `.hcode`, it needs a
new tableimpl.nim:`delImplNoHCode` which simply in-lines the hash search
when no `.hcode` field is available for "prefix compare" acceleration.
It is conceivable this new template could be used by future variants, such
as one optimized for integer keys where `hash()` and `==` are fast and
`.hcode` is both wasted space & time (though a small change to interfaces
there for a sentinel key meaning "empty" is needed for maximum efficiency).

We also eliminate the old O(n) `proc remove(CountTable...)` in favor of
simply invoking the new `delImpl*` templates and take care to correctly
handle the case where `val` is either zero for non-existent keys in `inc`
or evolves to zero over time in `[]=` or `inc`.

The only user-visible changes from the +-42 delta here are speed, iteration
order post deletes, and relaxing the `Positive` constraint on `val` in
`proc inc` again, as indicated in the `changelog.md` entry.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants