CountTable is not reliable. #12200

khchen · 2019-09-16T16:04:03Z

Today I got a very strange output from my app. After digging, I found the bug is in the CountTable.

Example

import tables

var ct = initCountTable[int]()
ct.inc(130, 1)
ct.inc(132, 1)
ct.inc(258, 1)

ct.inc(131, 100)
echo ct[131] # should output 100

ct.inc(132, -1)
echo ct[131] # should output 100, too

Current Output

100
0

Expected Output

100
100

Additional Information

Nim Compiler Version 0.20.99 [Windows: amd64]

narimiran · 2019-09-16T17:12:42Z

Thanks for using the issue template and coming up with the precise* small example of the problem. It really makes debugging much easier.

* because even small changes don't exhibit this behaviour — this bug has been with us for a very long time, and only now somebody has spotted it.

We cannot use `while t.data[h].val != 0:` in `rawGet` because zero can be a valid value for an existing key.
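To make the failure mode concrete, here is a minimal sketch of a toy open-addressing counter that reuses the loop condition quoted above. It is not the stdlib implementation: `ToyCounter`, `slotOf`, `get` and `bump` are hypothetical names, the table has a fixed 8 slots, and `key and 7` stands in for the real hash.

```nim
# Toy open-addressing counter in which "val == 0" doubles as "slot is empty".
# ToyCounter, slotOf, get and bump are hypothetical names, not stdlib code.
type
  ToyCounter = object
    data: array[8, tuple[key, val: int]]

proc slotOf(key: int): int =
  key and 7                      # stand-in for `hash(key) and mask`

proc get(t: ToyCounter, key: int): int =
  var h = slotOf(key)
  while t.data[h].val != 0:      # same loop condition as quoted above
    if t.data[h].key == key:
      return t.data[h].val
    h = (h + 1) and 7            # linear probing
  return 0                       # hit a zero-valued slot: treated as "absent"

proc bump(t: var ToyCounter, key, val: int) =
  var h = slotOf(key)
  while t.data[h].val != 0 and t.data[h].key != key:
    h = (h + 1) and 7
  t.data[h].key = key
  t.data[h].val += val

var t: ToyCounter
t.bump(1, 1)      # lands in slot 1
t.bump(9, 100)    # also hashes to slot 1, so it is pushed to slot 2
let before = t.get(9)
t.bump(1, -1)     # slot 1 now holds val == 0 and looks empty
let after = t.get(9)
echo before, " ", after   # prints "100 0"
```

Key 9 is still stored in slot 2, but the probe starting at slot 1 stops as soon as it sees a zero count, so the lookup never reaches it.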

Araq · 2019-09-16T18:18:26Z

You really can't count negative values like that... The CountTable is designed that 0 occurrences are not stored.

khchen · 2019-09-16T18:55:09Z

> You really can't count negative values like that... The CountTable is designed that 0 occurrences are not stored.

Do you mean CountTable cannot inc() by a negative value, or that it cannot store a negative value?
However, I still cannot understand why changing [132] influences the value in [131].
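One way to see the connection between 132 and 131 is to trace which slot each key from the example ends up in. The sketch below assumes a table of 64 slots, an identity hash for `int`, and linear probing with `(h + 1) and mask`. These are assumptions about the implementation at the time, and `probeSeq` is a hypothetical helper written only for this illustration.

```nim
# Assumptions: 64 slots, hash(int) is the identity, linear probing.
const mask = 63

proc probeSeq(keys: openArray[int]): seq[int] =
  ## Slot each key lands in when inserted, in order, into an empty table.
  var used: array[mask + 1, bool]
  for k in keys:
    var h = k and mask
    while used[h]:
      h = (h + 1) and mask
    used[h] = true
    result.add h

let slots = probeSeq([130, 132, 258, 131])
echo slots   # @[2, 4, 3, 5]
```

Under these assumptions, key 131 starts at slot 3 (taken by 258), steps over slot 4 (taken by 132) and settles in slot 5. Once the count for 132 drops to zero, slot 4 looks empty, so a lookup for 131 would stop there before reaching slot 5.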

Araq · 2019-09-17T05:47:22Z

inc should take a Positive number, yes.

andreaferretti · 2019-09-17T08:10:49Z

That should be documented, and the argument of inc should be a Natural then

narimiran · 2019-09-17T13:17:50Z

> That should be documented, and the argument of inc should be a Natural then

Done. #12208
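For readers less familiar with Nim's range types, the constraint discussed here can be sketched with a small wrapper. `incChecked` is a hypothetical name used only for illustration; it is not part of the `tables` module and is not necessarily how #12208 is written.

```nim
import tables

# incChecked is a hypothetical wrapper, not part of the tables module.
# `Positive` is the built-in range type covering 1 .. high(int).
proc incChecked[A](t: var CountTable[A], key: A, by: Positive = 1) =
  t.inc(key, by)

var ct = initCountTable[int]()
ct.incChecked(131, 100)
ct.incChecked(131)          # default increment of 1
echo ct[131]                # 101
# ct.incChecked(132, -1)    # expected to be rejected: -1 is outside Positive
```

With the parameter typed as `Positive`, a count can only grow, so a stored entry can never reach zero through `inc`.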

request. This can be conceived as an alternate, more capable resolution of nim-lang#12200 than nim-lang#12208.

The code re-org idea here is to upgrade tableimpl.nim:`delImpl`/`delImplIdx` to abstract client code conventions for cell emptiness and cell hashing via three new template arguments: `makeEmpty`, `cellEmpty`, and `cellHash`. Each takes a single integer argument and, respectively, clears a cell, tests whether it is clear, or produces the hash of the key stored at that index in `.data[]`. We then update the 3 call sites (`Table`, `CountTable`, `SharedTable`) of `delImpl`/`delImplIdx` by defining those arguments just before the first invocation as non-exported templates.

Because `CountTable` does not save `hash()` outputs as `.hcode`, it needs a new tableimpl.nim:`delImplNoHCode`, which simply in-lines the hash search when no `.hcode` field is available for "prefix compare" acceleration. It is conceivable this new template could be used by future variants, such as one optimized for integer keys where `hash()` and `==` are fast and `.hcode` is both wasted space and time (though a small change to interfaces there for a sentinel key meaning "empty" is needed for maximum efficiency).

We also eliminate the old O(n) `proc remove(CountTable...)` in favor of simply invoking the new `delImpl*` templates, and take care to correctly handle the case where `val` is either zero for non-existent keys in `inc` or evolves to zero over time in `[]=` or `inc`.

The only user-visible changes from the +-42 delta here are speed, iteration order post deletes, and relaxing the `Positive` constraint on `val` in `proc inc` again, as indicated in the `changelog.md` entry.
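The deletion scheme described in this commit message can be sketched on a toy table. The following is an illustration under stated assumptions, not the code from tableimpl.nim: the names `cellEmpty`, `makeEmpty` and `cellHash` are borrowed from the message but written here as plain procs rather than template arguments, `ToyCounter`, `find`, `delAt`, `bump` and `get` are hypothetical, and the table has a fixed 8 slots that must never all be occupied. An entry whose count reaches zero is removed by shifting later members of its collision cluster back, so a zero count can safely mean "empty".

```nim
const Mask = 7   # toy table with 8 slots; must never become completely full

type
  Cell = tuple[key, val: int]
  ToyCounter = object
    data: array[Mask + 1, Cell]

proc cellEmpty(t: ToyCounter, i: int): bool = t.data[i].val == 0
proc makeEmpty(t: var ToyCounter, i: int) = t.data[i] = (key: 0, val: 0)
proc cellHash(t: ToyCounter, i: int): int = t.data[i].key and Mask

proc find(t: ToyCounter, key: int): int =
  ## Index of `key`, or -1 if it is not stored.
  var i = key and Mask
  while not t.cellEmpty(i):
    if t.data[i].key == key: return i
    i = (i + 1) and Mask
  return -1

proc delAt(t: var ToyCounter, start: int) =
  ## Backward-shift deletion for linear probing: empty the slot, then pull
  ## later cluster members back so that no probe path is interrupted.
  var i = start
  while true:
    let j = i                        # current hole
    t.makeEmpty(j)
    while true:
      i = (i + 1) and Mask
      if t.cellEmpty(i): return      # end of the collision cluster
      let r = t.cellHash(i)          # home slot of the entry at i
      # The entry may stay only if r lies cyclically in (j, i].
      if not ((i >= r and r > j) or (r > j and j > i) or (j > i and i >= r)):
        break
    t.data[j] = t.data[i]            # fill the hole; slot i is the next hole

proc bump(t: var ToyCounter, key: int, by = 1) =
  var i = t.find(key)
  if i < 0:
    if by == 0: return               # never store a zero count
    i = key and Mask
    while not t.cellEmpty(i): i = (i + 1) and Mask
    t.data[i] = (key: key, val: by)
  else:
    t.data[i].val += by
    if t.data[i].val == 0: t.delAt(i)

proc get(t: ToyCounter, key: int): int =
  let i = t.find(key)
  result = if i >= 0: t.data[i].val else: 0

var t: ToyCounter
t.bump(1); t.bump(9, 100); t.bump(17, 5)   # all three hash to slot 1
t.bump(1, -1)                              # count hits zero: entry is deleted
echo t.get(1), " ", t.get(9), " ", t.get(17)   # prints "0 100 5"
```

Because the remaining cluster members are shifted back into the hole, the keys that collided with the deleted one stay reachable, which is the property the original report found missing.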

(cherry picked from commit b2a1944)

khchen changed the title ~~Bug in CountTable.~~ CountTable is not reliable. Sep 16, 2019

Araq added the Showstopper label Sep 16, 2019

narimiran added a commit to narimiran/Nim that referenced this issue Sep 16, 2019

fix nim-lang#12200, wrong behaviour of CountTable

b37c0ff

We cannot use `while t.data[h].val != 0:` in `rawGet` because zero can be a valid value for an existing key.

Araq added Severe Standard Library and removed Showstopper labels Sep 17, 2019

narimiran added a commit to narimiran/Nim that referenced this issue Sep 17, 2019

fix nim-lang#12200, cannot 'inc' CountTable by a negative value

d1cb4ac

Araq closed this as completed in 618316b Sep 17, 2019
