Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CountTable is not reliable. #12200

Closed
khchen opened this issue Sep 16, 2019 · 6 comments
Closed

CountTable is not reliable. #12200

khchen opened this issue Sep 16, 2019 · 6 comments

Comments

@khchen
Copy link
Contributor

khchen commented Sep 16, 2019

Today I got a very strange output from my app. After digging, I found the bug is in the CountTable.

Example

import tables

var ct = initCountTable[int]()
ct.inc(130, 1)
ct.inc(132, 1)
ct.inc(258, 1)

ct.inc(131, 100)
echo ct[131] # should output 100

ct.inc(132, -1)
echo ct[131] # should output 100, too

Current Output

100
0

Expected Output

100
100

Additional Information

Nim Compiler Version 0.20.99 [Windows: amd64]
@khchen khchen changed the title Bug in CountTable. CountTable is not reliable. Sep 16, 2019
@narimiran
Copy link
Member

Thanks for using the issue template and coming up with the precise* small example of the problem. It really makes debugging much easier.

* because even small changes don't exhibit this behaviour — this bug has been with us for a very long time, and only now somebody has spotted it.

narimiran added a commit to narimiran/Nim that referenced this issue Sep 16, 2019
We cannot use `while t.data[h].val != 0:` in `rawGet` because zero can be a
valid value for an existing key.
@Araq
Copy link
Member

Araq commented Sep 16, 2019

You really can't count negative values like that... The CountTable is designed that 0 occurrences are not stored.

@khchen
Copy link
Contributor Author

khchen commented Sep 16, 2019

You really can't count negative values like that... The CountTable is designed that 0 occurrences are not stored.

You mean CountTable cannot inc() negative value or cannot store negative value?
However, I still cannot understand why access [132] will influence the value in [131].

@Araq
Copy link
Member

Araq commented Sep 17, 2019

inc should take a Positive number, yes.

@andreaferretti
Copy link
Collaborator

That should be documented, and the argument of inc should be a Natural then

narimiran added a commit to narimiran/Nim that referenced this issue Sep 17, 2019
@narimiran
Copy link
Member

That should be documented, and the argument of inc should be a Natural then

Done. #12208

@Araq Araq closed this as completed in 618316b Sep 17, 2019
c-blake added a commit to c-blake/Nim that referenced this issue Jul 28, 2020
request.  This can be conceived as an alternate, more capable resolution of
  nim-lang#12200
than
  nim-lang#12208

The code re-org idea here is to upgrade tablimpl.nim:`delImpl`/`delImplIdx`
to abstract client code conventions for cell emptiness & cell hashing via
three new template arguments - `makeEmpty`, `cellEmpty`, `cellHash` which
all take a single integer argument and clear a cell, test if clear or
produce the hash of the key stored at that index in `.data[]`.

Then we update the 3 call sites (`Table`, `CountTable`, `SharedTable`) of
`delImpl`/`delImplIdx` by defining define those arguments just before the
first invocation as non-exported templates.

Because `CountTable` does not save hash() outputs as `.hcode`, it needs a
new tableimpl.nim:`delImplNoHCode` which simply in-lines the hash search
when no `.hcode` field is available for "prefix compare" acceleration.
It is conceivable this new template could be used by future variants, such
as one optimized for integer keys where `hash()` and `==` are fast and
`.hcode` is both wasted space & time (though a small change to interfaces
there for a sentinel key meaning "empty" is needed for maximum efficiency).

We also eliminate the old O(n) `proc remove(CountTable...)` in favor of
simply invoking the new `delImpl*` templates and take care to correctly
handle the case where `val` is either zero for non-existent keys in `inc`
or evolves to zero over time in `[]=` or `inc`.

The only user-visible changes from the +-42 delta here are speed, iteration
order post deletes, and relaxing the `Positive` constraint on `val` in
`proc inc` again, as indicated in the `changelog.md` entry.
Araq pushed a commit that referenced this issue Jul 28, 2020
request.  This can be conceived as an alternate, more capable resolution of
  #12200
than
  #12208

The code re-org idea here is to upgrade tablimpl.nim:`delImpl`/`delImplIdx`
to abstract client code conventions for cell emptiness & cell hashing via
three new template arguments - `makeEmpty`, `cellEmpty`, `cellHash` which
all take a single integer argument and clear a cell, test if clear or
produce the hash of the key stored at that index in `.data[]`.

Then we update the 3 call sites (`Table`, `CountTable`, `SharedTable`) of
`delImpl`/`delImplIdx` by defining define those arguments just before the
first invocation as non-exported templates.

Because `CountTable` does not save hash() outputs as `.hcode`, it needs a
new tableimpl.nim:`delImplNoHCode` which simply in-lines the hash search
when no `.hcode` field is available for "prefix compare" acceleration.
It is conceivable this new template could be used by future variants, such
as one optimized for integer keys where `hash()` and `==` are fast and
`.hcode` is both wasted space & time (though a small change to interfaces
there for a sentinel key meaning "empty" is needed for maximum efficiency).

We also eliminate the old O(n) `proc remove(CountTable...)` in favor of
simply invoking the new `delImpl*` templates and take care to correctly
handle the case where `val` is either zero for non-existent keys in `inc`
or evolves to zero over time in `[]=` or `inc`.

The only user-visible changes from the +-42 delta here are speed, iteration
order post deletes, and relaxing the `Positive` constraint on `val` in
`proc inc` again, as indicated in the `changelog.md` entry.
narimiran pushed a commit that referenced this issue Jul 29, 2020
request.  This can be conceived as an alternate, more capable resolution of
  #12200
than
  #12208

The code re-org idea here is to upgrade tablimpl.nim:`delImpl`/`delImplIdx`
to abstract client code conventions for cell emptiness & cell hashing via
three new template arguments - `makeEmpty`, `cellEmpty`, `cellHash` which
all take a single integer argument and clear a cell, test if clear or
produce the hash of the key stored at that index in `.data[]`.

Then we update the 3 call sites (`Table`, `CountTable`, `SharedTable`) of
`delImpl`/`delImplIdx` by defining define those arguments just before the
first invocation as non-exported templates.

Because `CountTable` does not save hash() outputs as `.hcode`, it needs a
new tableimpl.nim:`delImplNoHCode` which simply in-lines the hash search
when no `.hcode` field is available for "prefix compare" acceleration.
It is conceivable this new template could be used by future variants, such
as one optimized for integer keys where `hash()` and `==` are fast and
`.hcode` is both wasted space & time (though a small change to interfaces
there for a sentinel key meaning "empty" is needed for maximum efficiency).

We also eliminate the old O(n) `proc remove(CountTable...)` in favor of
simply invoking the new `delImpl*` templates and take care to correctly
handle the case where `val` is either zero for non-existent keys in `inc`
or evolves to zero over time in `[]=` or `inc`.

The only user-visible changes from the +-42 delta here are speed, iteration
order post deletes, and relaxing the `Positive` constraint on `val` in
`proc inc` again, as indicated in the `changelog.md` entry.

(cherry picked from commit b2a1944)
mildred pushed a commit to mildred/Nim that referenced this issue Jan 11, 2021
request.  This can be conceived as an alternate, more capable resolution of
  nim-lang#12200
than
  nim-lang#12208

The code re-org idea here is to upgrade tablimpl.nim:`delImpl`/`delImplIdx`
to abstract client code conventions for cell emptiness & cell hashing via
three new template arguments - `makeEmpty`, `cellEmpty`, `cellHash` which
all take a single integer argument and clear a cell, test if clear or
produce the hash of the key stored at that index in `.data[]`.

Then we update the 3 call sites (`Table`, `CountTable`, `SharedTable`) of
`delImpl`/`delImplIdx` by defining define those arguments just before the
first invocation as non-exported templates.

Because `CountTable` does not save hash() outputs as `.hcode`, it needs a
new tableimpl.nim:`delImplNoHCode` which simply in-lines the hash search
when no `.hcode` field is available for "prefix compare" acceleration.
It is conceivable this new template could be used by future variants, such
as one optimized for integer keys where `hash()` and `==` are fast and
`.hcode` is both wasted space & time (though a small change to interfaces
there for a sentinel key meaning "empty" is needed for maximum efficiency).

We also eliminate the old O(n) `proc remove(CountTable...)` in favor of
simply invoking the new `delImpl*` templates and take care to correctly
handle the case where `val` is either zero for non-existent keys in `inc`
or evolves to zero over time in `[]=` or `inc`.

The only user-visible changes from the +-42 delta here are speed, iteration
order post deletes, and relaxing the `Positive` constraint on `val` in
`proc inc` again, as indicated in the `changelog.md` entry.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants